Teango Triglot Winner TAC 2010 & 2012 Senior Member United States teango.wordpress.com Joined 5553 days ago 2210 posts - 3734 votes Speaks: English*, German, Russian Studies: Hawaiian, French, Toki Pona
| Message 26 of 38 09 September 2011 at 3:13pm | IP Logged |
I concur with Iversen that extensive reading, whilst very useful for learning and maintaining vocabulary, only forms part of the overall picture. For example, when I've trained language models on very large corpora in the past, they've been far from optimal for speech recognition and dialogue translation. This is because spoken language is very different to the type of language used in literature and news articles. When I then added even 10,000 words from a spoken corpus to my models, the results improved significantly.
I think it's also important to point out that television, films, music, Internet and radio, in addition to increasing our spoken vocabulary, all add to our expanding acoustic models and ability to deal with variation, not to mention our pragmatic and cultural understanding. This is more important than ever in these times of increased globalisation and travel, as languages and their various accents or dialects are in a continuous state of flux all around us. Including plenty of multimedia in my daily life over the years has enabled me, for example, to jump effortlessly from Rab's frustrations over the phone with automated helplines to Horsemouth's classic speech in the Jamaican comedy Rockers. Having access to a wide diversity of English is something I'm now really grateful for.
So whilst Professor Arguelles makes a valid point that there are generally more lower-frequency words in written material than in speech, and acknowledging that novels offer a faster route to more input, I still think the best way forward is to aim for a synergy of multiple resources, and to focus predominantly on having fun and developing a more rounded appreciation of the cultural variety and subtle creativity within language.
Edited by Teango on 09 September 2011 at 3:17pm
7 persons have voted this message useful
| sipes23 Diglot Senior Member United States pluteopleno.com/wprs Joined 4867 days ago 134 posts - 235 votes Speaks: English*, Latin Studies: Spanish, Ancient Greek, Persian
| Message 27 of 38 10 September 2011 at 12:32am | IP Logged |
kagemusha wrote:
I think one of the main points was that an active reading vocabulary is higher than an active speech vocabulary. Therefore by reading, you are forced to extend your vocabulary more quickly than through speech.
I wonder. Obviously the most frequent words, say the first 5-10 thousand, are the same for everyone. Everyone says "hear" or "see" or "foot". But as you move into less frequently used words, different people have different sets in active use. In other words, you say "wayward" and I prefer "errant". My neighbor, in turn, uses "wandering". Or whatever. Since you teach English, your students hear "wayward" but not "errant". I work with immigrants; they hear "errant" but never "wandering". To further complicate matters, you might only use the word once in a given month, so even if your English students knew me, I might never say "wayward" in front of them. As a native speaker, you know lots of native speakers and are thus statistically likely to hear all three of those words at some point or another. Your English learners eventually go back to their home country. The immigrant goes back to his ethnic neighborhood. Neither one spends enough time with the language to make enough of a statistical difference, and really, they've got a good enough command of the language for their purposes.
I'm guessing the people who hang out here are different. They *do* want to get enough exposure to make a statistical difference, but don't want to spend years of conversational immersion to get those low-frequency words. Extensive reading is the tool. You read lots of stuff by different people. And writers tend to like words (why else become a writer?), so they use lots of them. (For example, Steadman's introduction to Herodotus says that Herodotus uses 4,207 unique words in Book 1 of the Histories, 2,137 of which he uses only once. The statistic was handy.) Five books by five authors are likely to present more unusual low-frequency words than daily conversation with those same authors would.
Again, just my suspicion, and I'm too lazy to do the footwork to back it up with data.
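For anyone who does want to do the footwork on their own reading material, here's a rough sketch of how you could count unique words and one-offs in a plain-text file (Python; the file name and the deliberately naive tokenisation are my own assumptions, nothing from the talk):

import re
from collections import Counter

# Count running words, unique words, and words used only once in a
# plain-text file. "sample.txt" is a placeholder. Note this counts
# surface forms, not word families, so figures for inflected languages
# will run high.
with open("sample.txt", encoding="utf-8") as f:
    words = re.findall(r"[^\W\d_]+", f.read().lower())

counts = Counter(words)
hapaxes = sum(1 for n in counts.values() if n == 1)

print("running words:", len(words))
print("unique words:", len(counts))
print("used only once:", hapaxes)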
5 persons have voted this message useful
| Zwlth Super Polyglot Senior Member United States Joined 5223 days ago 154 posts - 320 votes Speaks: English*, German, Italian, Spanish, Russian, Arabic (Written), Dutch, Swedish, Portuguese, Latin, French, Persian, Greek
| Message 28 of 38 14 September 2011 at 6:03am | IP Logged |
He's put up the 2nd part of the talk:
Selecting Appropriate Texts for Expanding Vocabulary Range Through Extensive Reading.
It's just as well worth watching as the 1st part, if not even more so.
5 persons have voted this message useful
| sipes23 Diglot Senior Member United States pluteopleno.com/wprs Joined 4867 days ago 134 posts - 235 votes Speaks: English*, Latin Studies: Spanish, Ancient Greek, Persian
| Message 29 of 38 16 September 2011 at 2:28am | IP Logged |
Zwlth wrote:
It's just as well worth watching as the 1st part, if not even more so.
I'd say more so. Here he shows us the vocabulary analysis tool. The problem, and it's big, is that the tool only has databases for English. If you had the databases for another language, I think the program would chew through the text just as well. If.
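That "if" may be less of a hurdle than it sounds, at least for a first pass. Here's a rough sketch of how you could generate a frequency-ordered word list from any plain-text corpus (Python; the file names are placeholders, and the output is simply one word per line by descending frequency, which would still need massaging into whatever format the program actually expects):

import re
from collections import Counter

# Build a crude frequency list from a plain-text corpus.
# "corpus.txt" and "frequency_list.txt" are placeholders; the output
# format here is an assumption, not the program's real database format.
with open("corpus.txt", encoding="utf-8") as f:
    counts = Counter(re.findall(r"[^\W\d_]+", f.read().lower()))

with open("frequency_list.txt", "w", encoding="utf-8") as out:
    for word, freq in counts.most_common():
        out.write(f"{word}\t{freq}\n")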
2 persons have voted this message useful
| sundog66 Tetraglot Newbie United States Joined 5007 days ago 6 posts - 8 votes Speaks: English*, Spanish, Mandarin, Esperanto Studies: Russian
| Message 30 of 38 16 September 2011 at 6:00am | IP Logged |
sipes23 wrote:
The problem, and it's big, is that the tool only has databases for English.
What makes this problem especially insidious, I think, is that every single inflectional variant of every word in every word family has to be explicitly listed for this software to work. For languages with even modestly sophisticated morphology, Spanish with its verbs for example, I would think this would make the necessary word database intractably large. More sophisticated software, with automatic recognition of novel but regular inflectional forms, is probably needed.
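To give a feel for the blow-up, here's a tiny sketch that spells out the simple-tense endings of a single regular -ar verb ("hablar" is just an example; compound tenses, imperatives and participles are left out, so the real count per verb is higher still):

# Simple-tense endings for a regular Spanish -ar verb; only an
# illustration of how many database entries one lemma would need.
endings = {
    "present":     ["o", "as", "a", "amos", "áis", "an"],
    "preterite":   ["é", "aste", "ó", "amos", "asteis", "aron"],
    "imperfect":   ["aba", "abas", "aba", "ábamos", "abais", "aban"],
    "future":      ["aré", "arás", "ará", "aremos", "aréis", "arán"],
    "conditional": ["aría", "arías", "aría", "aríamos", "aríais", "arían"],
    "subjunctive": ["e", "es", "e", "emos", "éis", "en"],
}

stem = "habl"
forms = {stem + ending for tense in endings.values() for ending in tense}
print(len(forms), "distinct simple-tense forms for one verb")

That's already over thirty entries for one well-behaved verb, before irregular stems or compound tenses enter the picture.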
But anyone who wants to use this software for Mandarin is in luck (well, as long as you're content to run the statistics on characters rather than "words"). Here is a list of 9,933 Chinese characters in decreasing order of frequency. The characters can be pasted into files in the right format for use with the software Prof. Arguelles demonstrates. The only catch is that before you analyze a text, you need to insert a space between each character so that the software recognizes each character as a "word" for its calculations. This can be done in sed with the following command:
sed 's/./ &/g;s/^ //'
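If sed isn't handy (or you're unsure how it treats multi-byte characters on your system), the same spacing can be done in a couple of lines of Python, assuming the text is UTF-8; the file names are placeholders:

# Put a space between every character, roughly what the sed one-liner does.
with open("input.txt", encoding="utf-8") as f, \
        open("spaced.txt", "w", encoding="utf-8") as out:
    for line in f:
        out.write(" ".join(line.rstrip("\n")) + "\n")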
Another nice thing about using the software with Mandarin is that, because Mandarin has no inflectional morphology to worry about, you don't even need a character frequency database to calculate the total number of unique character types in a text. I think that's a useful statistic in and of itself when gauging the difficulty of a text.
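Counting those unique character types needs nothing more than a set, for example (Python again; "text.txt" is a placeholder, and the range check is a rough filter covering only the common CJK Unified Ideographs block):

# Count distinct Chinese characters in a UTF-8 text.
with open("text.txt", encoding="utf-8") as f:
    chars = {c for c in f.read() if "\u4e00" <= c <= "\u9fff"}

print(len(chars), "unique character types")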
3 persons have voted this message useful
| learnvietnamese Diglot Groupie Singapore yourvietnamese.com Joined 4946 days ago 98 posts - 132 votes Speaks: Vietnamese*, English (C2) Studies: French, Mandarin
| Message 31 of 38 16 September 2011 at 9:33am | IP Logged |
Thanks for the list of Chinese characters, Sundog66.
Sometimes, I'm pleasantly surprised to find that the language of programming is quite... "concise" :D
1 person has voted this message useful
| montmorency Diglot Senior Member United Kingdom Joined 4825 days ago 2371 posts - 3676 votes Speaks: English*, German Studies: Danish, Welsh
| Message 32 of 38 19 September 2011 at 1:54am | IP Logged |
learnvietnamese wrote:
Thanks for the list of Chinese characters, Sundog66.
Sometimes, I'm pleasantly surprised to find that the language of programming is quite... "concise" :D
Technically, that is the language of "regular expressions".
A gentleman called Jeffrey Friedl wrote an excellent book about them:
"Mastering Regular Expressions".
3 persons have voted this message useful