Reading Tutor and Diminishing Returns

First of all, check out this website:
http://language.tiu.ac.jp/index_e.html

Copy and paste (or type) some Japanese text, and it will analyze the text and provide vocabulary lists and translations. Handy little tool.

Lately I have been analyzing articles from various native Japanese websites and looking through the numbers, which I find very interesting.

The numbers below are from the NHK news website (not the easy version). The vocabulary lists generated for the latest articles follow a rough pattern:

N5 ------------ 50%
N4 ------------ 10%
N3+N2 --------- 20%
N1+級外 ------- 20%

Furthermore, the tool rates the article as 難しい (difficult). Now what does this mean?
It shows how sharply the article coverage gained for the time spent studying drops as you approach the upper intermediate level.

Studying the first 100 kanji and a few hundred vocabulary words (N5) covers 50% of advanced native articles. In contrast, studying the N1 kanji (over 1000) and vocabulary (several thousand words) yields roughly 20%.

In my opinion the point of diminishing returns is the N2 level (around 1000 kanji and 6000 vocabulary words). After that, language skills can be developed with minimal frustration (roughly 20% dictionary usage). Further reading and listening to native media will cement the core acquired through study, while the advanced level remains an ongoing uphill grind.

Back to studying!

I don’t disagree with the statistics, but I disagree with your interpretation and conclusion.

Imagine if you had to resort to the dictionary 20% of the time when reading the newspaper in your native tongue. Would you be satisfied with that level of unfamiliarity with the words you were reading? That’s 1 in 5 words that you wouldn’t understand. And if the average sentence is 10-15 words long, you would find yourself not knowing 2-3 words per sentence. I doubt that anyone would be able to comprehend the finer points of an article with that much information missing.
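As a rough back-of-the-envelope sketch of that arithmetic: the function name below is made up for illustration, and it assumes unknown words are spread evenly across sentences, which real text will not do exactly.

```python
def unknown_words_per_sentence(coverage, sentence_length):
    """Expected number of unknown words in one sentence.

    Assumes (for illustration only) that every word has the same
    chance, 1 - coverage, of being unknown to the reader.
    """
    return (1 - coverage) * sentence_length


# 80% coverage, with the 10 to 15 word sentences mentioned above.
coverage = 0.80
shortest = unknown_words_per_sentence(coverage, 10)
longest = unknown_words_per_sentence(coverage, 15)
print(f"{coverage:.0%} coverage: about {shortest:.1f} to {longest:.1f} unknown words per sentence")
```

Changing `coverage` to another value shows how quickly the number of lookups per sentence falls as recognition improves.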

Personally, I wouldn’t be comfortable with less than 97% sight-reading vocabulary recognition rate, so I would suggest continued and focused study until one attains that level.

But having said that, everyone has their own goals for learning a language, and if you just want to understand what an article is talking about, in general terms, then that’s fine too.

Here’s a link to the official JLPT web-page that talks about what each level of proficiency implies, in terms of how well you will be able to function on oral and written communication tasks.

http://www.jlpt.jp/e/about/levelsummary.html

I would certainly love to have knowledge that is as close as possible to 100%, and I wouldn’t stop studying either. The threshold I suggested is not a stopping point.

It is my suggested threshold for kick-starting the development of comprehension with minimal frustration.

The amount of advanced-level material is absolutely staggering. That is why I am partial toward developing good comprehension skills and picking up knowledge from context through extended reading and listening. That is the goal, after all.

What is your goal, memorizing the dictionary?

I agree with your goal of “developing comprehension with minimal frustration”, but I don’t agree that 80% vocabulary coverage is anywhere near enough to be able to consider yourself at the point where you can start to do that. Having to look up every fifth word in the dictionary when reading a daily newspaper is not “minimal frustration”, in my opinion.

Studies of language and vocabulary development show that the average 4-year old child knows about 5,000 words, and an 8-year old knows 10,000 words, so if one only knows 6,000 words, they would be at the level of a 5-6 year old child. I think that a 5-6 year old child would find reading the daily newspaper a very frustrating and unproductive experience.

The average adult has a vocabulary of anywhere from 20-35,000 words in their native language, and most college-level single-volume dictionaries have something in the order of 60-100,000 headwords.

So, if I were seriously studying for proficiency in a new language, my goal would not be to memorize the dictionary, but in order to be able to read and expand my vocabulary naturally, using contextual clues and daily interaction, I would want to start with a solid base of at least 50% of the vocabulary of a native adult, that is, at least 10,000 words. Once I reached that point, I would consider contextual learning minimally frustrating.

What is your level right now? I am only at N2 for vocab and kanji, with N3 grammar, but I am still enjoying (and benefiting from) many native materials such as video games and manga.

But perhaps you are right. I am sure everything would be more fun if I could recognize those N1 kanji faster (I finished Heisig though… so it’s not so bad).

I am interested in this course:

The actual problem with that chart is that it only looks at unique words and then rates those. What actually happens is that the N5/N4 words that represent 50% of unique words are likely used 90% of the time, because they appear multiple times. The N1+ words that represent 20% of unique words probably make up something like 1% of the total word count. Obviously I don’t know the real numbers in this case.

Programs like “Japanese Text Analysis Tools” give you both a ranking and the frequency of use, so you can get a better idea of how words are used and gauge difficulty.
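To make the difference between the two ways of counting concrete, here is a minimal sketch. The function, the toy word list and the level labels are all made up for illustration; they are not output from either tool or from a real JLPT list.

```python
from collections import Counter


def coverage_by_level(tokens, level_of):
    """Return {level: (share of all running words, share of unique words)}."""
    unique_words = set(tokens)
    token_counts = Counter(level_of[word] for word in tokens)
    type_counts = Counter(level_of[word] for word in unique_words)
    return {
        level: (token_counts[level] / len(tokens),
                type_counts[level] / len(unique_words))
        for level in type_counts
    }


# Hypothetical toy text: two common words repeated, two rare words used once.
tokens = ["の", "の", "の", "は", "は", "大使", "措置"]
# Illustrative labels only, not taken from a real JLPT vocabulary list.
level_of = {"の": "N5", "は": "N5", "大使": "N1", "措置": "N1"}

for level, (token_share, type_share) in sorted(coverage_by_level(tokens, level_of).items()):
    print(f"{level}: {token_share:.0%} of total words, {type_share:.0%} of unique words")
```

In this toy text each level accounts for half of the unique words, but the words labelled N5 make up five of the seven running words, so the two percentages can tell quite different stories.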

The tool shows a row for each: % of the total word count and % of the total unique words.

Here is the data from today’s top NHK article:
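韓国駐在の長嶺大使ら一時帰国へ 少女像設置で対抗措置 (Ambassador Nagamine and others stationed in South Korea to return home temporarily, a countermeasure over the installation of the girl statue)
January 6, 11:28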
http://language.tiu.ac.jp/result/jtool/EB21B5DF.html

The N5 percentages actually go the opposite way from what you said: out of the total word count it is 252/610 (47%), and out of the unique word count it is 68/234 (30%).

Ah, I see now. Nice program on that site. I tried using a drama transcript for a Japanese Drama Immersion course I created for Memrise. The results were interesting: http://language.tiu.ac.jp/result/jtool/E085E97B.html

The 級外 definitely should not be merged with N1 for stats. Those are just words that happen not to be categorized in the existing JLPT vocab lists. Also, it’s best not to rate small articles with it. Instead, give it a larger collection so you get a better overall idea of the type of articles that get posted there.

On your actual point about diminishing returns, you’re right. Structured studying of vocab lists has a huge benefit early in Japanese studies, but later it’s best to learn new words from the context of the material you read. The JTAT program can help you find new words in a text so you can front-load them before reading it in context, which can help a lot.
