Request to become a contributor to Ortegam's "Edexcel GCSE Spanish Vocabulary List" course

WBLund41 · July 7, 2023, 3:46am

This is the course: Edexcel GCSE Spanish Vocabulary List

The course has a lot of useful vocabulary but is not organized well. The usage of commas differs from one word to another, the method of specifying gender and plural differs from one word to another, and many flashcards cram multiple synonyms together without denoting this in any way (among other issues). These issues mean this course is of much lower quality than similar competing courses.

If made a contributor, I’d like to standardize these aspects of the course and rearrange the vocab as needed to fix these problems.

As far as I can tell, @ortegam doesn’t have a forum account and likely hasn’t been active for many years on the main site either. If possible, could you please try to contact them directly for me?

Thanks in advance

ian_mn · July 9, 2023, 2:47pm

Hi William,

Unfortunately, editing the course would generate a substantial number of “phantom entries”.

An alternative approach would be to create your own version of ortegam’s course by first generating a text file using:
*Memrise Scraper - tech189's Website
then pasting the text file into a spreadsheet for editing. After making all necessary changes, you would then “bulk add” the content of the spreadsheet to a new Memrise course.

Another option would be to find an existing, similar Spanish course that is in good shape, and learn Spanish using that course instead. An example would be:
*AQA GCSE Spanish Vocabulary - by EllieGirgis - Memrise

WBLund41 · July 10, 2023, 8:05pm

Hey Ian, thanks for your suggestions. I have a few points I’d like to clarify, though, in response. I think these will make it more apparent why I want to fix up this course specifically. And I have a few questions I’d like to ask, as you likely have a more in-depth understanding than I do as to the nature of phantom entries, etc. I’ll go point-by-point below:

Unfortunately, editing the course would generate a substantial number of “phantom entries”

Is this not something that contributors have control over? As far as I understand it, contributors can add and remove entries as necessary, even from the databases; they just can’t edit the format of the databases (add, modify, or delete columns). As far as I can tell, there’s a userscript that will attempt to do this automatically. I don’t intend to use this userscript as I doubt it will work effectively to the level needed, but I’m linking it anyways because it explicitly says it works “for course creators or course contributors”.

And as far as I can tell, this course already effectively has a large number of “phantom entries”, even if they aren’t phantom entries per the strict definition. As mentioned above, there are countless entries duplicated across the course that differ only in how gender and plurality are represented, duplicate entries with slightly differing numbers of spaces, and so on. These near-identical variants will be offered as multiple-choice options for one another unless you use an ignore function.

An alternative approach would be to create your own version of ortegam’s course

And yet this seems to not be an ideal solution. A duplicate course takes up server space on Memrise’s servers (though probably not very much since it’s text-only) while largely containing the exact same content. And it doesn’t fix the course for those who find it in the future.

This is the second full-length course that comes up if you search “GCSE” in the “English (UK) → Spanish (Spain)” category and it appears to be longer than the first course on the list (which is the one you suggested). (After having studied it I don’t think it is longer, due to the duplicates). It should be perfectly reasonable to assume the second course in the list would also be of good quality, but without a proper rating system or any kind of official quality control on the community-created courses, it’s impossible to just assume that of any course.

first generating a text file using Memrise Scraper - tech189’s Website then pasting

I’ve previously used this site to obtain all words for a similarly low-quality GCSE French course and a Python script that I’ve written to suggest duplicates. This allowed me to create an ignore-list for that course that is nearly 1000 words long, which made continuing to use the course manageable. If I truly can’t get contributor on this course, I will do something similar. This course seems to be in even worse shape, though, and thus I don’t think this is an ideal solution either. I’d probably lose half of the “good” content if I ignored every flashcard that has a minor problem.
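For anyone curious, here is a minimal sketch of how a duplicate-suggesting script of this kind might work. It is not the actual script mentioned above: the normalization rules (dropping bracketed annotations, leading articles, and extra spaces) and the function names are assumptions chosen for illustration, and it expects one entry per line from the scraped text file.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(entry: str) -> str:
    """Reduce an entry to a comparison key (hypothetical rules)."""
    text = unicodedata.normalize("NFC", entry).casefold()
    # Drop bracketed annotations such as "(m)", "(f pl)", or an unclosed "(m0".
    text = re.sub(r"\([^)]*(?:\)|$)", " ", text)
    # Drop a leading definite article so "el perro" matches "perro (m)".
    text = re.sub(r"^(el|la|los|las)\s+", "", text.strip())
    # Collapse runs of whitespace to a single space.
    return re.sub(r"\s+", " ", text).strip()


def suggest_duplicates(entries):
    """Group entries whose normalized keys collide; return groups of 2 or more."""
    groups = defaultdict(list)
    for entry in entries:
        groups[normalize(entry)].append(entry)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    sample = ["el perro", "perro (m)", "perro  (m )", "la casa", "gato (m0"]
    for group in suggest_duplicates(sample):
        print(group)
```

The groups are only suggestions. Each one would still need a manual look before anything goes on an ignore list.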

Another option would be to find an existing, similar Spanish course that is in good shape, and learn Spanish using that course instead.

I’ve already studied nearly half of this course and have a long streak. I don’t wish to start over. If I have to, I’ll just push through to the end and ignore a lot of stuff. I probably don’t really need this course anyways. I’ve finished the 5000 words course and have since started on your “top up” courses. This course seems like it could be good for phrases and drilling synonyms though, if it were fixed up a bit.

I’d like to hear if you can further clarify some of these points. You likely understand more than I do about how course creation works, about the exact nature of phantom entries, and about why these couldn’t simply be deleted.

ian_mn · July 12, 2023, 1:37am

Hi William,

Re. phantom entries.
“Is this not something that contributors have control over?”

My understanding is that editing existing items results in the database containing not only the revised item but also the corresponding item before the edit was made. As a result, during multiple choice questions the old answer (the “phantom entry”) can appear alongside the revised answer. The more items that have been edited, the more frequently phantom entries will appear when working through the course. This is a longstanding Memrise bug that course creators (and contributors) have no control over.
I’ve never tried using the user script you mention - and there have been times when the script stopped working due to Memrise updates. I don’t know what the current status is.

"And as far as I can tell, this course already effectively has a large number of “phantom entries” "

For duplicates with minor differences, you could painstakingly edit these (or delete duplicates) using the course edit mode. But then new, corresponding “phantom entries” will start appearing anyway during learning sessions.
You could also split items with more than one Spanish word. But this will likely confuse existing users.
I would suggest changing all nouns to include the definite article (e.g. “el” before the noun instead of “(m.)” after the noun). This is easy and quick to do using a text editor (e.g. Vim) and a spreadsheet (starting with a scraped text file). This would be a laborious, time-consuming task to do manually in Memrise edit mode.

General Comments

The Ortegam course is fairly old, and is almost certainly based on a 10+ year old Edexcel specification. There is a new draft specification being released by Edexcel on 13 July 2023 that might be of interest to you.
*July 2023 Languages Update | Pearson qualifications
I’m sure I’ve not answered all your comments. Let me know if more commentary from me would be helpful.

WBLund41 · July 12, 2023, 2:45am

Hey, thanks for your response. I’ll give my thoughts on your answers, almost all of which I agree with. And I’ll give a bit of info from my own research into phantom entries, etc.

My understanding is that editing existing items results in the database containing not only the revised item but also the corresponding item before the edit was made.

This is my understanding, too. I imagine this bug arises because it is computationally expensive to check every level to ensure the phantom entry is no longer used in any other level after being removed from a particular level. This could’ve been fixed a long time ago by implementing each database entry as a one-to-many field (connecting words in the database to words in the levels, which can repeat) and then modifying these lists as the word is added or removed in levels. Because of the existence of this bug, I assume it wasn’t implemented this way, or it would likely be trivial to fix. Fixing this now on Memrise’s side would likely be very computationally expensive, effectively requiring a refactor of all community courses. Or a manually-triggered one-by-one refactor of courses, triggered by creators/contributors.

The other possibility is that they keep it in the database in case you accidentally deleted something that you didn’t mean to. This would be a rather unusual design decision, though. (And even if this is the case, this could also be fixed with the one-to-many idea, since you could only pull database entries with a usage count greater than 0 when selecting for multiple choice). Overall, there’s little excuse for this bug unless they were trying to save server space by not turning each word into a list…
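To make the one-to-many idea concrete, here is a rough sketch. It is purely hypothetical and says nothing about how Memrise actually stores courses: the class names, fields, and methods are all invented for illustration. Each database entry tracks the set of levels that use it, and multiple choice only draws from entries that are still used somewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A database entry plus the levels that currently use it (hypothetical model)."""
    text: str
    levels: set = field(default_factory=set)


class CourseDB:
    def __init__(self):
        self.items = {}

    def add_to_level(self, text, level):
        self.items.setdefault(text, Item(text)).levels.add(level)

    def remove_from_level(self, text, level):
        item = self.items.get(text)
        if item is not None:
            # The entry itself is kept (e.g. to undo an accidental delete),
            # but it no longer counts as used by this level.
            item.levels.discard(level)

    def edit(self, old, new, level):
        """Editing an item is modeled as remove-old plus add-new."""
        self.remove_from_level(old, level)
        self.add_to_level(new, level)

    def distractor_pool(self, answer):
        """Only entries still used by at least one level may appear as options."""
        return sorted(
            item.text
            for item in self.items.values()
            if item.levels and item.text != answer
        )


db = CourseDB()
db.add_to_level("perro (m)", 1)
db.add_to_level("el gato", 1)
db.edit("perro (m)", "el perro", 1)
print(db.distractor_pool("el gato"))  # the old "perro (m)" entry is excluded
```

In this model the old entry still exists in `db.items`, so nothing is lost, but it can never surface as a phantom option because its usage count is zero.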

This is a longstanding Memrise bug that course creators (and contributors) have no control over.

There is a little bit of control, in that you can (carefully) delete them from the database.

While researching now, the least time-consuming method I’ve found to deal with them is to follow a procedure similar to the one outlined here. And the second post in that thread includes a means of doing so without resetting everyone’s progress.

and there have been times when the script stopped working due to Memrise updates.

That’s very, very true. Updates have very regularly broken userscripts over the years. And so I don’t know the state of it either. I was mostly linking it as proof that contributors could have control over the database and the bug, rather than just course creators. If contributors were entirely locked out of editing the database outside of the level editor this bug would be much more severe.

For duplicates with minor differences, you could painstakingly edit these (or delete duplicates) using the course edit mode. But then new, corresponding “phantom entries” will start appearing anyway during learning sessions.

I don’t think this is necessarily the case as long as you then delete the old version of the word from the database afterwards. This certainly adds an extra step, though. But I believe it should be possible to save that extra step until the end and handle it via one of the above procedures.

You could also split items with more than one Spanish word. But this will likely confuse existing users.

I actually don’t mind having multiple words as a single flashcard. I just want them to be standardized and indicated in some format. Perhaps for indication, a number in brackets next to the prompt word (or, ideally, a different field entirely but this would require Ortegam since contributors cannot edit the fields of the database).

And as for standardization: have the synonyms in alphabetical order; with a comma and single space between synonyms; and with equivalent means of indicating gender, plurality, and/or reflexivity, etc. between synonyms, except where differences are necessary (a translation that is always plural, for example).
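As a rough illustration of the separator and ordering part of this, here is a small sketch. The function name is made up, and it assumes a comma or semicolon never appears inside a single synonym, which would need checking against the real data. The output separator is a parameter so it can be swapped for whatever Memrise accepts, and the sort is a plain case-insensitive one rather than a proper Spanish collation.

```python
import re


def standardize_synonyms(field: str, sep: str = ", ") -> str:
    """Split a multi-synonym field, tidy each part, and rejoin in sorted order."""
    cleaned = []
    for part in re.split(r"[,;]", field):
        # Collapse internal whitespace and trim the ends.
        word = re.sub(r"\s+", " ", part).strip()
        # Skip empty fragments and exact repeats.
        if word and word not in cleaned:
            cleaned.append(word)
    return sep.join(sorted(cleaned, key=str.casefold))


print(standardize_synonyms("el coche ,el automóvil,  el  carro"))
# el automóvil, el carro, el coche
```

Gender and plurality markers are left untouched here. Those would need their own pass.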

I think this change in particular would make the course very useful for studying synonyms, especially for those who study on a computer with typed answers. (This is how I always study. I don’t use the app. I figure this is probably rather rare, though.)

I would suggest changing all nouns to include the definite article (e.g. “el” before the noun instead of “(m.)” after the noun).

I agree with this suggested change, and I agree with using Vim or similar to do so. It would still need to be done carefully to ensure nothing gets wrongly replaced, though (the classic Dawizard problem). Because the notation differs so heavily from word to word (some have (m), (m ), (m pl), (m/pl), typos such as (m0, feminine variants, etc.), it would likely need a regular expression of some kind. This is definitely something that should be done with multiple backups and careful manual double-checking.
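A sketch of what such a regular expression could look like, in Python rather than Vim. The pattern and helper names are assumptions for illustration and only cover the marker variants listed above. Anything that doesn't match cleanly (typos such as (m0, or cards with several synonyms) is set aside for manual review instead of being guessed at. It also doesn't know about feminine nouns that take “el” (agua, águila, etc.), so those would still need checking by hand.

```python
import re

# (gender, is_plural) -> definite article
ARTICLES = {
    ("m", False): "el",
    ("f", False): "la",
    ("m", True): "los",
    ("f", True): "las",
}

# Matches e.g. "perro (m)", "casa (f.)", "libro (m )", "zapatos (m pl)", "gafas (f/pl)".
MARKER = re.compile(
    r"""
    ^(?P<noun>[^(),;]+?)\s*              # a single noun or phrase, no separators
    \(\s*(?P<gender>[mf])\.?\s*          # opening bracket and gender letter
    (?:[/\s]\s*(?P<plural>pl)\.?\s*)?    # optional plural marker after "/" or space
    \)\s*$                               # closing bracket must be present
    """,
    re.VERBOSE | re.IGNORECASE,
)


def add_article(entry: str):
    """Return the entry with a leading article, or None if it needs manual review."""
    match = MARKER.match(entry)
    if match is None:
        return None
    key = (match["gender"].lower(), match["plural"] is not None)
    return f"{ARTICLES[key]} {match['noun'].strip()}"


def convert_all(entries):
    """Split entries into (converted, needs_review) lists."""
    converted, needs_review = [], []
    for entry in entries:
        result = add_article(entry)
        if result is None:
            needs_review.append(entry)
        else:
            converted.append(result)
    return converted, needs_review


if __name__ == "__main__":
    done, review = convert_all(["perro (m)", "gafas (f/pl)", "gato (m0"])
    print(done)    # ['el perro', 'las gafas']
    print(review)  # ['gato (m0']
```

Keeping the pattern strict and reviewing the leftovers by hand is the safer trade-off here, since a too-permissive replace is exactly how the Dawizard problem happens.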

The Ortegam course is fairly old, and is almost certainly based on a 10+ year old Edexcel specification. There is a new draft specification being released by Edexcel on 13 July 2023 that might be of interest to you.

This is very helpful, thank you. I’d either update the course to better fit this format (adding additional words/levels if necessary, while still keeping as much of the existing content as possible, even if rearranged) or change the name to something along the lines of “Spanish GCSE Supplement” to make it obvious that it’s not up-to-date and shouldn’t be used as the sole means of studying.

Let me know if more commentary from me would be helpful.

Your answers have been helpful, thank you. I hope my responses here are also of interest to you as well. And let me know if anything I’ve said here sounds wrong or incomplete or if you have additional info you think would be of use.

ian_mn · July 12, 2023, 3:26am

For cases with more than one Spanish item, the use of the comma as a separator stopped working several years ago. However, the use of a semicolon does still work and, when typing answers, allows the learner to choose to type any of the listed items or, alternatively, all of them.

Also, I just checked, and for ‘first teaching in 2024’ draft specifications, both Edexcel and AQA have produced spreadsheets containing their latest vocabulary lists that are currently available for download:

*July 2023 Languages Update | Pearson qualifications

*https://www.aqa.org.uk/subjects/languages/gcse/spanish-draft-8692/teaching-resources?f.Resource+type