I made this Google Sheets add-on to look up pinyin, zhuyin, simplified and traditional forms, and definitions. You can look up 50 words at once with a single drag.
I tested your add-on with the entire set of Simplified Chinese HSK words, trying to retrieve the pinyin for each one, and found the following issues:
After roughly 1,000 requests, it throws the following error:
"Error: Forbidden access. You have reached the daily quota, please try again tomorrow. (line 42)."
It looks like Google Sheets is fundamentally not the best place to run scrapers from.
In the 1,000-word sample, it found the pinyin in only 40% of cases.
This is probably because Wiktionary prioritises traditional characters and leaves the simplified entries empty. If your program fails to find the information on the original entry page, it should follow the "see [traditional word]" link and try again on that page.
The function cannot be used in a cell whose right-hand neighbour already has content, because it tries to write the Wiktionary message into that cell.
If I were you, I would remove the Wiktionary message.
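The "follow the see-link and retry" fallback suggested above can be sketched in a few lines. This is a minimal Python illustration, assuming a hypothetical in-memory `ENTRIES` store in which a simplified stub page only carries a `"see"` key pointing at its traditional headword; it is not how the add-on actually stores or fetches data.

```python
from typing import Dict, Optional

# Hypothetical stand-in for an entry store: a full entry has "pinyin", while a
# stub (like a simplified-form page saying "see [traditional word]") only has "see".
ENTRIES: Dict[str, Dict[str, str]] = {
    "學習": {"pinyin": "xuéxí"},
    "学习": {"see": "學習"},
}


def lookup_pinyin(word: str, entries: Dict[str, Dict[str, str]], max_hops: int = 2) -> Optional[str]:
    """Return the pinyin for `word`, following "see" pointers when the entry is a stub."""
    current = word
    for _ in range(max_hops + 1):
        entry = entries.get(current)
        if entry is None:
            return None  # no page at all for this headword
        if "pinyin" in entry:
            return entry["pinyin"]
        target = entry.get("see")
        if target is None:
            return None  # stub without a pointer: nothing else to try
        current = target  # retry the lookup on the referenced (traditional) page
    return None  # stop instead of looping forever on a cycle of "see" links
```

For example, `lookup_pinyin("学习", ENTRIES)` finds the stub, follows it to `學習`, and returns `"xuéxí"`. The hop limit is a defensive choice so that a malformed pair of pages pointing at each other cannot hang the lookup.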
Hi, thank you for your feedback. All of these issues are expected.
This is not a Wiktionary limit; in fact, the add-on does not connect to Wiktionary's servers at all. I set the daily limit (1,000 requests) myself because I expected it to be enough for normal use. For my current database, 1,000 requests per user per day is a lot compared to typical websites, and if users keep sending requests, the database could become a bottleneck and crash my server. Once I get enough funding from Patreon, I'll upgrade the database so I can raise the limit.
You're right; this is one of the known issues I'm going to fix soon. You cannot look up anything by its simplified form yet, because Wiktionary reduces duplicate information by pointing simplified entries to their traditional forms. But this is an easy fix.
The last column is the attribution. It's required when sharing copyrighted data under CC-BY-SA 3.0. You can delete the column for personal use, but if you distribute the data like I did, you need to include the attribution (or the copyright notice) somewhere. Otherwise, you may get sued by Wiktionary's contributors and authors.
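A per-user daily limit like the 1,000-request one described above is essentially a counter keyed by user and day. The sketch below is a minimal, in-memory Python illustration; the `DailyQuota` class and its `allow` method are hypothetical names, not the add-on's actual code, and a real server would keep the counts in shared storage so they survive restarts.

```python
import datetime as dt
from typing import Dict, Optional, Tuple


class DailyQuota:
    """Hypothetical per-user request counter that resets at the start of each UTC day."""

    def __init__(self, limit: int = 1000) -> None:
        self.limit = limit
        # user_id -> (day the count belongs to, requests accepted that day)
        self._counts: Dict[str, Tuple[dt.date, int]] = {}

    def allow(self, user_id: str, today: Optional[dt.date] = None) -> bool:
        """Record one request; return False once the user has hit today's limit."""
        day = today if today is not None else dt.datetime.now(dt.timezone.utc).date()
        last_day, used = self._counts.get(user_id, (day, 0))
        if last_day != day:
            used = 0  # a new day has started, so the user's count resets
        if used >= self.limit:
            self._counts[user_id] = (day, used)
            return False  # caller would answer with a "daily quota reached" error
        self._counts[user_id] = (day, used + 1)
        return True
```

With `DailyQuota(limit=3)`, four calls to `allow("alice", day)` on the same day return `True, True, True, False`, and the first call on the following day returns `True` again. Storing only the latest day per user keeps the table from growing with old dates.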
Oh, I assumed it was a limitation on Google's side.
You mean you downloaded the entire Wiktionary onto your server? Why not send requests directly from the user to Wiktionary to avoid these issues? (I just wrote a Python scraper for pinyin that does that, but for the Baidu dictionary.)
Yes. Wiktionary makes it easy to download its entire database. There are three reasons why I did that:
I'm not parsing their HTML pages; I'm parsing the raw page contents, which come from their database dumps.
It's more performant. The data are already parsed and stored in my database, so each request just queries the database instead of parsing the pages all over again.
I can do analysis. With all the data in hand, I can connect entries, for example by grouping all words by their categories.
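The three reasons above (parse page contents once, serve requests from a local database, then analyse the stored data) can be sketched together. This is a minimal Python illustration, not the add-on's implementation: the `PAGES` texts are made-up, heavily simplified stand-ins loosely modelled on wikitext (a `zh-pron` template with an `m=` parameter and `[[Category:...]]` links), real Wiktionary pages are considerably messier, and SQLite is used only for illustration.

```python
import re
import sqlite3
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical, heavily simplified page texts standing in for a database dump.
PAGES: Dict[str, str] = {
    "學習": "{{zh-pron|m=xuéxí}}\n[[Category:Education]]",
    "學校": "{{zh-pron|m=xuéxiào}}\n[[Category:Education]]",
    "蘋果": "{{zh-pron|m=píngguǒ}}\n[[Category:Fruits]]",
}

PINYIN_RE = re.compile(r"\{\{zh-pron\|[^}]*?\bm=([^|}]+)")  # assumed template shape
CATEGORY_RE = re.compile(r"\[\[Category:([^\]]+)\]\]")


def build_db(pages: Dict[str, str]) -> sqlite3.Connection:
    """Parse every page once and store the results (done offline, not per request)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (headword TEXT PRIMARY KEY, pinyin TEXT)")
    conn.execute("CREATE TABLE categories (headword TEXT, category TEXT)")
    for title, text in pages.items():
        match = PINYIN_RE.search(text)
        conn.execute("INSERT INTO words VALUES (?, ?)", (title, match.group(1) if match else None))
        conn.executemany(
            "INSERT INTO categories VALUES (?, ?)",
            [(title, cat) for cat in CATEGORY_RE.findall(text)],
        )
    conn.commit()
    return conn


def get_pinyin(conn: sqlite3.Connection, word: str) -> Optional[str]:
    """Serve a lookup with a single indexed query; no page parsing happens here."""
    row = conn.execute("SELECT pinyin FROM words WHERE headword = ?", (word,)).fetchone()
    return row[0] if row else None


def words_by_category(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Example analysis over the stored data: group headwords by category."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for category, headword in conn.execute("SELECT category, headword FROM categories"):
        groups[category].append(headword)
    return dict(groups)
```

Here `get_pinyin(build_db(PAGES), "蘋果")` returns `"píngguǒ"`, and `words_by_category` puts `學習` and `學校` under `Education`. The design point is that the expensive parsing runs once when the dump is imported, so each user request costs only a primary-key lookup.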