Offline alternative Memento

Another thing. There seems to be an issue with the automatic course DIRECTORY NAMING.
Yesterday I scraped, among others, this course:

Although the course is named
Manual of Standard Tibetan (Tournadre & Dorje)
the directory got named
Manual of Standard Tibetan (Tournadre &…
(including the three dots in the end)
Memento did not display this course at all.
Only after I noticed that it had in fact been downloaded, and renamed the directory to remove the & and the dots, did it get displayed.

Apart from the above issue with automatic directory naming (the shortened name that Memento does not see), there is the obvious issue that courses like the following three cannot all be downloaded at once:

The script names the directory for the first course just “Eyes of Worlds” and exits when it reaches the second course with the message “directory Eyes of Worlds already exists”.

If it has to name the dirs automatically, could it not simply check for a directory name conflict BEFORE it starts downloading all the courses, and/or in case of conflict, automatically RENAME a conflicting dir as “…01” “…02” ?
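The renaming idea above could look roughly like the following. This is only a sketch of the suggestion, not code from the actual script; the function name `unique_course_dir` and the " 01", " 02" suffix format are assumptions made for illustration.

```python
from pathlib import Path


def unique_course_dir(parent: Path, name: str) -> Path:
    """Return a path under `parent` that does not exist yet.

    If `name` is free it is used as-is; otherwise a two-digit suffix
    (" 01", " 02", ...) is appended until an unused name is found.
    """
    candidate = parent / name
    counter = 1
    while candidate.exists():
        candidate = parent / f"{name} {counter:02d}"
        counter += 1
    return candidate
```

Calling this once per course before creating its directory would give three same-named courses three distinct folders instead of stopping at the second one.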

Also, how about taking not the name of the course as displayed, but the name used in the URL as the name of the dir?
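A minimal sketch of the URL-based naming idea, assuming the course URL ends in a readable slug segment. The example URL below is made up, and `dir_name_from_url` is a hypothetical helper, not part of the real script.

```python
import re
from urllib.parse import unquote, urlparse


def dir_name_from_url(url: str) -> str:
    """Derive a directory name from the last path segment of a course URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no path segments in {url!r}")
    slug = unquote(segments[-1])
    # Replace characters that Windows forbids in file names.
    slug = re.sub(r'[<>:"/\\|?*]', "_", slug)
    # Windows also dislikes names ending in dots or spaces.
    return slug.rstrip(". ") or "course"


# Made-up URL, for illustration only.
print(dir_name_from_url("https://example.com/course/12345/manual-of-standard-tibetan/"))
```

Because the slug is plain ASCII in most cases, this would also sidestep the truncated "…" names described earlier.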

I see. I overlooked that those courses that were downloaded before the one-level course actually did get saved.

Tibetan i-vowel sign not displaying in tapping test:
འི་
The above character (syllable) consists of a consonant sign འ and a diacritic ི
Ideally, the tapping test should treat such combinations as one character. At the moment it splits them, and only an empty box gets displayed instead of this top diacritic ི
In the original version of the course, the sentences get split into whole words. It is OK if Memento splits them into individual characters (though that is a bit less effective, since tapping is then hardly different from typing), but one syllabic character needs to be treated as one character. (There are the vowel signs above, ི ེ ོ, and one vowel sign below, ུ, but there is also a multitude of stacked characters, e.g. སྒྱ, which consists of ས ག ཡ.)
I do not expect you to deal with this, as you are not learning languages using India-derived abugidas.
It’s just FYI
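For illustration, here is one way to keep such combinations together: attach every Unicode combining mark to the base character before it. This is a rough approximation of proper grapheme-cluster segmentation, written in Python only because the helper scripts are; how Memento itself splits text is not known here, and `split_clusters` is a hypothetical name.

```python
import unicodedata


def split_clusters(text: str) -> list[str]:
    """Split text into base characters with their combining marks attached."""
    clusters: list[str] = []
    for ch in text:
        # Categories Mn, Mc and Me are combining marks. Tibetan vowel signs
        # and subjoined (stacked) letters fall into Mn.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# U+0F60 (letter) + U+0F72 (vowel sign i) + U+0F0B (tsheg)
print(split_clusters("\u0f60\u0f72\u0f0b"))
```

With this grouping the syllable from the example stays in one tile with its vowel sign, and the tsheg becomes a separate tile.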

Got banned from adding more replies into this topic,
so I will give you some rest now :smiley:

While downloading a Hindi course (Complete Hindi Course (audio) - by INDIAN_YOGI - Memrise), something happened again (the

Scraping item 337 of 1278
Traceback (most recent call last):
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\response.py", line 438, in _error_catcher
yield
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\response.py", line 519, in read
data = self._fp.read(amt) if not fp_closed else b""
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 459, in read
n = self.readinto(b)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 503, in readinto
n = self.fp.readinto(b)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [WinError 10054] The existing connection was forcibly terminated by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\models.py", line 753, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\response.py", line 576, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\response.py", line 541, in read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\response.py", line 455, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(10054, 'The existing connection was forcibly terminated by the remote host', None, 10054, None)", ConnectionResetError(10054, 'The existing connection was forcibly terminated by the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Program Files (x86)\Memento_Windows_10_64-bit_v0.5\helper_scripts\scrape_memrise.py", line 74, in <module>
course.autoScrape(destination, minLevel, maxLevel, skipAudio, skipMnemonics)
File "C:\Program Files (x86)\Memento_Windows_10_64-bit_v0.5\helper_scripts\MemriseCourse.py", line 369, in autoScrape
self.buildSeedbox(skipAudio, skipMnemonics)
File "C:\Program Files (x86)\Memento_Windows_10_64-bit_v0.5\helper_scripts\MemriseCourse.py", line 263, in buildSeedbox
open(join(self.courseDir, "assets", "audio", audioName), "wb").write(requests.get(audio["url"]).content)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 697, in send
r.content
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\models.py", line 831, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "C:\Users\jakub\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\models.py", line 756, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(10054, 'The existing connection was forcibly terminated by the remote host', None, 10054, None)", ConnectionResetError(10054, 'The existing connection was forcibly terminated by the remote host', None, 10054, None))

It looks like the connection failed or was refused on the Memrise side; I’m not sure. Either way, I suggest you wait a little, because right now I’m improving the script so that it handles more kinds of situations and can continue to the next course when it encounters an error instead of crashing altogether.
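A dropped connection like the one in the traceback is usually transient, so one common approach is to retry the download a few times before giving up. The helper below is a generic sketch and not the change actually made to the script; `with_retries` and its parameters are hypothetical. It catches `OSError`, which `ConnectionResetError` derives from, and, as far as I know, so do the `requests` exceptions such as `ChunkedEncodingError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call `fetch` until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: let the caller decide what to do
            time.sleep(delay * attempt)  # wait a little longer each time
    raise ValueError("attempts must be at least 1")
```

In the scraper, `fetch` would be something like `lambda: requests.get(url).content`, and the caller could catch the final exception to skip that item or move on to the next course.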

I suggest trying this now

It should hopefully perform better. Let me know how it goes.

line 7

^
SyntaxError: invalid syntax

Did you download both files?

I pushed a new version for those interested.

Great. Thanks! This seems to work well.
How about adding the helper_scripts folder to the .zip package meant for user installation?

If there is a reason not to do so, I suggest adding a second .zip file to the release page that contains only the helper_scripts folder, named “CourseDownloadUtility” or “DownloadThisToo” or something to that effect :slight_smile:

What exactly are the steps to scrape the single-level courses?

For some reason, Memrise requires you to be logged in to view single level courses. So for the script to work, you need to provide it with a login cookie for a Memrise account. It doesn’t have to be your main account, it just has to be a registered account.

To get the cookie, for example from a Chrome-like browser, you have to

  1. open the page of the course
  2. open the developer tools (usually F12 or Ctrl+Shift+c)
  3. go to the Network tab
  4. refresh the page
  5. right-click on the first item that appeared on the list of the Network tab after refreshing and select the option Copy > Copy as cURL
  6. paste the text somewhere, like notepad on Windows

The output should look something like this

etc etc
-H 'sec-ch-ua-mobile: ?1' \
-H 'upgrade-insecure-requests: 1' \
-H 'sec-fetch-site: none' \
etc etc

You’re interested in the line that starts with -H 'cookie:. You need to copy everything after the colon up to the single quote ' character near the end of the line. That is your login cookie for your Memrise account that you shouldn’t share with anyone because it can be used to log into your account without knowing your username + password.
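If picking the value out by hand is error-prone, the same extraction can be sketched in a few lines of Python. This assumes the copied text uses the single-quoted `-H 'cookie: ...'` form shown above; `extract_cookie` is a hypothetical helper and the cookie names in the sample are invented.

```python
import re
from typing import Optional


def extract_cookie(curl_command: str) -> Optional[str]:
    """Return the value of the cookie header from a copied cURL command."""
    match = re.search(r"-H\s+'cookie:\s*([^']*)'", curl_command, flags=re.IGNORECASE)
    return match.group(1) if match else None


# Invented sample in the same shape as the browser output.
sample = (
    "curl 'https://example.com/' \\\n"
    "  -H 'upgrade-insecure-requests: 1' \\\n"
    "  -H 'cookie: sessionid=abc123; other=xyz' \\\n"
    "  -H 'sec-fetch-site: none'"
)
print(extract_cookie(sample))
```

The same caution applies to the extracted value: treat it like a password.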

Finally, you call the script like this

python scrape_memrise.py -c 'LOGIN_COOKIE' URL

Note that the cookie should be in quotes, just like the output directory that you pass with the -d option.
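Quoting rules differ between shells (the Windows command prompt, for instance, does not treat single quotes as quotes), so one way to avoid quoting problems altogether is to start the script from Python with an argument list. This is a sketch that uses only the `-c` and `-d` options mentioned in this thread; `build_scrape_command` is a hypothetical helper.

```python
import sys
from typing import List, Optional


def build_scrape_command(
    url: str, cookie: Optional[str] = None, destination: Optional[str] = None
) -> List[str]:
    """Build the argument list for scrape_memrise.py without any shell quoting."""
    command = [sys.executable, "scrape_memrise.py"]
    if cookie:
        command += ["-c", cookie]
    if destination:
        command += ["-d", destination]
    command.append(url)
    return command


# To actually run it from the helper_scripts folder:
# import subprocess
# subprocess.run(build_scrape_command(url, cookie, destination), check=True)
```

Each list element reaches the script as exactly one argument, regardless of spaces or semicolons in the cookie or the path.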

The script is easy to download straight from the source to update it. For example, I’ve already updated it, so if I had placed a zip of it in the releases page, it would already be outdated.

Sure, the script is easy to download. But one must KNOW that one has to download the source-code and get this particular directory out of it in addition to the main release package. So if someone only gets the link to the github release page, they will probably not be able to figure out how to get any courses into the app. That is what I meant …

UNKNOWN SCRAPING ISSUE:

Scraped a few courses successfully.
Then, after scraping item 108 of 4000 of a course (Thai frequency, top 4000 words - by BlueRock68 - Memrise), the script suddenly started printing “Switching database” messages in an endless cycle, so after a few minutes I aborted it.

MINOR MEMENTO RUNNING ISSUE:
When downloading a course fails in the middle of the process for whatever reason, a folder with an incomplete course database is left behind. The program displays it in the list of courses with the buttons WATER and REFRESH. When the WATER button is pressed, the program just crashes and closes without any message like “The course is corrupt” or anything. This might be confusing to someone who is not aware that their download of that course was never completed.
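One possible countermeasure on the download side is to build the course in a temporary folder and rename it to its final name only once everything has succeeded, so an interrupted download never leaves a folder that looks like a finished course. This is a sketch of the general technique, not of how the script currently works; the `.partial` suffix and the name `staged_course_dir` are assumptions.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def staged_course_dir(final_dir: Path) -> Iterator[Path]:
    """Yield a working folder that becomes `final_dir` only on success."""
    partial = final_dir.with_name(final_dir.name + ".partial")
    if partial.exists():
        shutil.rmtree(partial)  # leftover from an earlier failed run
    partial.mkdir(parents=True)
    try:
        yield partial
    except BaseException:
        # Covers errors and Ctrl+C alike: discard the half-built course.
        shutil.rmtree(partial, ignore_errors=True)
        raise
    partial.rename(final_dir)
```

The scraper would write all files inside `with staged_course_dir(course_dir) as work:`; if anything raises, nothing is left for the app to list.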

Damn this one was annoying. It turns out this was a problem from the Memrise or course side, because the third item in level 8 (คุณ - you) had an attribute “khoon” which had an ID that doesn’t exist in the course’s database. Even Memrise itself does not show this attribute in the item preview or the test page.

I updated the script so that it does not enter an infinite loop in this oddly specific situation, but it still downloads the unknown attribute and gives it “Unnamed” as a name that you can see in the preview page. You can get the new script here
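The general shape of such a fix is a lookup that visits each candidate database at most once and falls back to a placeholder. The sketch below is purely illustrative: the real script's data structures are not shown in this thread, so the list-of-dicts layout and the name `attribute_name` are assumptions.

```python
from typing import Dict, List


def attribute_name(attribute_id: str, databases: List[Dict[str, str]]) -> str:
    """Look up an attribute's name, falling back to "Unnamed" if no database has it."""
    # Each database is visited at most once, so the search always terminates.
    for database in databases:
        if attribute_id in database:
            return database[attribute_id]
    return "Unnamed"
```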

Hopefully that course doesn’t have any other surprises.

PS. I know Memento can crash in certain situations that aren’t ideal. I haven’t yet programmed any countermeasures against things like incomplete or corrupted courses.

Tried scraping another course at last.
Now, when I write (I believe) the same structure of the command as before, namely:

C:\Program Files (x86)\Memento5-1\helper_scripts>python scrape_memrise.py -m -d "C:\Users\jakub\Documents\memrise-kurzy_naMEMENTO\MOREtoLEARN" 德格话 - GPA 1B - by Yaks4Life - Memrise

The script does not even start scraping, and immediately I get the following message:

Traceback (most recent call last):
File "C:\Program Files (x86)\Memento5-1\helper_scripts\scrape_memrise.py", line 22, in <module>
from MemriseCourse import MemriseCourse
File "C:\Program Files (x86)\Memento5-1\helper_scripts\MemriseCourse.py", line 7

^
SyntaxError: invalid syntax

Did you download both files without modifying them? It’s not possible for an error to exist in line 7 because there’s no code there.

What is written for you there in line 7?

The MemriseCourse.py that is currently at the github page starts with six empty lines and the seventh line consists of the following:

!DOCTYPE html

enclosed in < > brackets

(I did not notice when I copied that error message that this forum website does not show a line starting with < at all, that is why only the second and third line of the error message were seen above)

The error message seems to point by the arrow to the left < which is the first char of that line.
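A file that begins with an HTML doctype is a saved web page rather than the Python source, which typically happens when the browser saves the repository page instead of the raw file. A quick check along these lines can catch that before running anything; `looks_like_html` is a hypothetical helper, not part of the scripts.

```python
def looks_like_html(source_text: str) -> bool:
    """Return True if the first non-blank line looks like the start of an HTML page."""
    for line in source_text.splitlines():
        stripped = line.strip().lower()
        if stripped:
            return stripped.startswith(("<!doctype html", "<html"))
    return False


# Usage, assuming the file sits in the current folder:
# with open("MemriseCourse.py", encoding="utf-8") as f:
#     if looks_like_html(f.read()):
#         print("This is a web page, not the script; download the raw file again.")
```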

That’s not right at all. How did you download the files?

The easiest way to download the latest 2 script files is by downloading a zip of the source code from the development (not master) branch

from the green code button

(I downloaded them from the link you sent above, the last link you sent before this one.)

Yes, having now downloaded the whole development code, I extracted the two files and they look different. (The one above did not even have the intro “about” lines.) Now it works normally, hurrah.

Thanks.
Have a good summer
(I may remain silent as I am soon to depart for travels mostly offline)

One more request, hopefully last :slight_smile:

Would you mind re-sending all the links to what one has to download (Python, some libraries, or whatever that soup was) for setting up on Windows, together with the link to Memento (and, ideally, some basic installation and scraping instructions repeated), so that all of this is together in one message?
Thanks a lot!!