Memrise2Anki Replacement

losingisfun · October 17, 2022, 5:00pm

It actually is very smart! I was hesitant at first but then tried the Basic with Media.apkg and even tidied it up a little in Anki’s default fashion with example cards for the future users of your extension! I sent you a pull request.

Please note that Google will enforce Manifest V3 for all Chrome extensions starting January 2023, so the extension will probably stop working in a few months. I hope it won’t require a complete rewrite.

Eltaurus · October 17, 2022, 10:10pm

Thank you for this addition, and also for helping with filling up the readme file.

Yes, I’m aware of the manifest version issue. I initially tried to write the extension using v3, but it was too hard to filter out the much more prominent v2 search results, so I decided to deal with the changing API later.
However, it seems that v2 support was recently announced to continue until 2024. I believe that Memrise will break something much sooner than that.

Eltaurus · October 17, 2022, 10:51pm

Languages I hoped Memrise would help me learn:

Languages Memrise actually makes me learn:

Eltaurus · October 21, 2022, 2:25pm

It seems that in the middle of all this we forgot to answer the original question.

Yes, it is possible to extract audio with this extension now)

clstrife · October 24, 2022, 1:54am

@Eltaurus I opened some issues for a bad course and a suggestion. Do you want me to continue to do that or just post suggestions here?

Eltaurus · October 24, 2022, 5:07am

It’s totally up to you, but if you post your suggestions here, I think more people would see them and be able to share their opinions.

clstrife · October 24, 2022, 7:54am

Suggestion: Make an option for the extension to always download media instead of asking every time. That’s because it can take 1-5 seconds for the dialog to come up and if you move away to another tab, the dialog will close.

Also, an option to never suggest help with importing into Anki.

Thanks

clstrife · October 24, 2022, 10:24am

Another suggestion: By default, insert the UTF-8 BOM into the downloaded CSV; then you won’t have people asking all the time why Excel is showing garbage.

Here are some simple scripts that will batch convert:

gist.github.com

https://gist.github.com/dogancelik/2a88c81d309a753cecd8b8460d3098bc

ansi-utf8-conversion.md

## Using [Uni2Me](http://web.archive.org/web/20090418063933/http://alf-li.pcdiscuss.com/e_index.html)

* It's free but discontinued.

## Using [UTFCast](http://www.rotatingscrew.com/utfcast.aspx)

* Proprietary software
* Allows conversion from ANSI to UTF-8 with or without BOM

## Using Notepad++


I personally used Notepad++ with the Python script. Read the comments for a properly working version, but it does the job.
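For reference, adding the BOM at export time is a small change when the CSV is built as a string. This is a minimal sketch, assuming the CSV text is already assembled in memory; `withBom` is a hypothetical helper name, not a function from the extension.

```javascript
// U+FEFF is the byte order mark. Encoded as UTF-8 it becomes the bytes EF BB BF,
// which Excel uses to recognize a CSV file as UTF-8.
const UTF8_BOM = "\uFEFF";

// Hypothetical helper: prepend the BOM unless the text already starts with one.
function withBom(csvText) {
  return csvText.startsWith(UTF8_BOM) ? csvText : UTF8_BOM + csvText;
}

const csv = withBom("word,translation\n안녕하세요,hello\n");
```

Checking for an existing BOM keeps the helper safe to call more than once on the same text.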

Eltaurus · October 24, 2022, 3:36pm

That’s neat! I’ve added that to the exporting settings and Excel seems to read the files properly now.
I also added global constants to disable popups, and the course ID is now included in the saved output.

clstrife · October 27, 2022, 4:27am

I noticed something when downloading multiple courses with audio at the same time. If the load is too high, it’ll slow your computer to a near freeze, but that’s not the problem. The problem is that the audio files in the folders will have duplicates. The total number is correct, but I’ll see files with (1) in the name, meaning a file had to be renamed, and it is highly likely that some audio files are missing because their place was taken by a duplicate.

I’m guessing that courses often have audio files that share the same name, so when you hit that from 2 courses, it’s random which will get copied.

The problem doesn’t exist when you download 1 at a time, so it’s not a huge deal, but if you want to queue a bunch and go away, it may not work.

If it’s easy, I would suggest modifying the download filename by prepending/appending course ID, then when you copy into the media folder, remove the course ID to match the csv.

Or modify the csv link source and filename with course ID and then there shouldn’t be any issues.

Eltaurus · October 27, 2022, 9:47pm

The media files are downloaded directly into their respective subfolders and are not copied there from somewhere else. So I’m not sure how modifying names would solve this issue.

If you are looking for a way to run several instances of the script at the same time, you can try modifying the line

await sleep(100);

in the background.js file, replacing the number with some larger value like

await sleep(1000);

This will increase the interval between downloads in each thread, so it should minimize the chance of the threads clashing.
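To illustrate why the delay matters, here is a minimal sketch of a throttled loop. The definition of `sleep` is an assumption (the actual one in background.js is not shown in this thread), and `downloadAll` and `startDownload` are hypothetical names used only for illustration.

```javascript
// Assumed definition: returns a promise that resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical loop: start one download, pause, then start the next.
// A larger delayMs means fewer downloads start in any given second.
async function downloadAll(urls, startDownload, delayMs) {
  for (const url of urls) {
    startDownload(url);
    await sleep(delayMs);
  }
}
```

With `delayMs` set to 1000 instead of 100, each running instance starts at most about one download per second.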

Btw, how many courses have you managed to download already?

clstrife · October 28, 2022, 4:46am

My Chrome download folder is on a spinning disk, so it’s slow enough that I could see hundreds of *.tmp files there before my laptop finally caught up and moved them into the media folders. So I’m not sure whether it’s a Chrome thing or a JavaScript thing where a file is temporarily downloaded to your download folder and then moved as part of an atomic operation.

I’ll try the sleep change.

I’ve downloaded dozens of the most popular courses for Korean.

LangAddict · October 28, 2022, 7:27pm

Can I just say that you guys are the best? I thought I’d never get to import a course from Memrise again, but I’m glad I was wrong. I know there are more important things to tweak in this extension, but is there a way to add Memrise levels as Anki tags the way the old add-on used to? Preferably as hierarchical tags, for example, with “German_1” as the parent tag and “German_1::01_The_Basics”, “German_1::02_Asking_Yes/No_Questions”, and so on as its child tags.
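The tag format in this request can be sketched as follows. This is only an illustration of the naming scheme, not the extension's code; `toTagPart` and `levelTag` are hypothetical helpers.

```javascript
// Anki separates tags with spaces, so spaces inside a name are replaced with underscores.
function toTagPart(text) {
  return text.trim().replace(/\s+/g, "_");
}

// Build "Course::NN_Level_Name"; "::" is Anki's separator for nested tags.
function levelTag(courseName, levelIndex, levelName) {
  const number = String(levelIndex).padStart(2, "0");
  return `${toTagPart(courseName)}::${number}_${toTagPart(levelName)}`;
}

levelTag("German 1", 1, "The Basics"); // "German_1::01_The_Basics"
```

Zero-padding the level number keeps the levels in course order when Anki sorts the tags alphabetically.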

Eltaurus · October 29, 2022, 3:44am

That’s strange, for me the temporary download files appear in their media subfolders (which is the expected behavior), not in the root download directory. As far as the script is concerned, a subfolder name is just a part of a downloaded media file name. So the clashing of the names between different courses as you describe is rather puzzling.

I’m thinking about making a community-accessible collection of downloaded courses, as we did previously with the mems, and your contribution would be very much appreciated. Would you mind uploading what you have gathered somewhere?

Eltaurus · October 29, 2022, 3:49am

Thank you

I’ve added the hierarchical tags to the new version of the extension, so you are welcome to try it out.

clstrife · October 29, 2022, 4:34am

I wouldn’t mind at all, but what are the legalities of sharing content that we didn’t personally create? Kinda like the BitTorrent issue where it’s much worse to share than to download.

Question: How difficult would it be to add an option to download links from a text file? Check for the existence of a links.txt and just parse and download every course from there instead of manually doing GUI work.

When you make updates like the levels just added, I would want to re-download the csv files. I would like to just throw the whole list to be redownloaded (easy to copy out links using any number of extensions that copy the highlighted links).

Thanks

LangAddict · October 29, 2022, 8:00am

Just tested it out on a couple of courses and works perfectly. Thank you for the really quick response, guys!

Eltaurus · October 30, 2022, 3:49am

I’m not an expert in these matters, of course, but community courses are made to be freely available online. I don’t think preserving them for the future in their current form, protecting them from further mishandling by Memrise, would violate anything. The way I see it, this is more of a Wayback Machine type of thing than a BitTorrent one.

I’m not sure what the best way to store this data is, or how much space it would require, but perhaps Google Drive would suffice for the time being. If you could PM me your Gmail address, I’ll create a public directory and add you as a contributor.

It would probably be too big of a task to make a proper interface for such a feature, but I wrote a minimal working version that seems to do the job (although I didn’t test it on large lists of courses).
To download several courses, you need to list their full URLs in the queue.txt file (it is found in the extension folder) and set BATCH = true in coursedump2022.js. All dialog windows are suppressed during batch download, so whether media is downloaded depends on the ALWAYS_DWLD_MEDIA constant. Change it to true to download all media files.
In batch mode, the extension doesn’t need any particular course page to be open, but it still needs to be run on an open Memrise tab in order to establish a connection with the site.

With batch downloading the media files for all courses from the list are handled by a single process, so it should also solve your previous issue with conflicting filenames.

If you don’t want to redownload media files and only want to make new CSVs with updated formatting, you can additionally set FAKE_DWLD = true. This will prevent downloading media files without removing their respective columns from the spreadsheets.
I had to restructure the script quite a bit this time, so I hope I didn’t break anything and this works.
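The batch settings described in this post can be summarized in a short sketch. The constant names BATCH, ALWAYS_DWLD_MEDIA, and FAKE_DWLD come from the post; `parseQueue` is a hypothetical helper, the one-URL-per-line layout of queue.txt is an assumption, and the URLs are placeholders. The extension's actual parsing may differ.

```javascript
// Settings named in the post, with values for a CSV-only refresh.
const BATCH = true;             // take course URLs from queue.txt
const ALWAYS_DWLD_MEDIA = true; // dialogs are suppressed in batch mode, so this decides media handling
const FAKE_DWLD = true;         // keep the media columns in the CSV but skip the downloads

// Hypothetical helper, assuming one URL per line:
// trims whitespace and ignores blank lines.
function parseQueue(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

const urls = parseQueue(
  "https://example.com/course/1/\n\n  https://example.com/course/2/  \n"
);
```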

clstrife · October 30, 2022, 11:30am

I just tested out the batch download and it works well! The first time Chrome will ask you to allow the extension to download multiple files, but not for future executions.

I have made URL lists for all the courses I wanted, and it’s now very easy to download all of them or to update only the CSVs.

I think when there are no more iterations on the extension and the csv won’t change further, we can discuss the archiving.

If I have time, I would like to look at BeautifulSoup with Python (I have limited experience with it from other projects) to scrape the main course lists and pull the author, description, and approximate study time for each course, linked to its URL. Otherwise it’s just a bunch of random CSV and zip files (of MP3s).

I think that would be necessary for proper naming of the courses as well. If the course name is in Unicode, the course name in the URL may just be some random number like 428. Without scraping the webpage data, 428.csv won’t tell us anything about the course (unless you manually search for the course ID in your list of URLs and open the page).

eg 세종한극어 3 - by tmilo - Memrise

clstrife · October 31, 2022, 4:16am

Another suggestion: You may want to make the CSV delimiter a variable at the top so users can change it to | or \t or something else. Some cards have commas in the text, which messes up the formatting of the CSV (shifts the columns over). Thanks.
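A configurable delimiter could look like the sketch below, which also quotes any field containing the delimiter so that commas in card text no longer shift columns. `DELIMITER`, `escapeField`, and `toCsvRow` are hypothetical names, and the quoting follows the common RFC 4180 convention rather than anything confirmed about the extension.

```javascript
// Hypothetical setting: change to "|" or "\t" if preferred.
const DELIMITER = ",";

// Quote a field if it contains the delimiter, a double quote, or a line break.
// Embedded double quotes are doubled, as in RFC 4180.
function escapeField(value, delimiter) {
  const text = String(value);
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') || /[\r\n]/.test(text);
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join the escaped fields of one row with the chosen delimiter.
function toCsvRow(fields, delimiter = DELIMITER) {
  return fields.map((field) => escapeField(field, delimiter)).join(delimiter);
}

toCsvRow(["네, 알겠습니다", "Yes, I understand"]);
// '"네, 알겠습니다","Yes, I understand"'
```

With a tab delimiter the same fields would be written unquoted, since they contain no tabs.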