
How many English words do you know?

21 messages over 3 pages: 1 2
mrwarper
Diglot
Winner TAC 2012
Senior Member
Spain
Joined 5226 days ago

1493 posts - 2500 votes 
Speaks: Spanish*, EnglishC2
Studies: German, Russian, Japanese

 
 Message 17 of 21
30 January 2014 at 10:15pm | IP Logged 
patrickwilken wrote:
That's exactly why SDT was developed. The simple intuitive idea that you know something with either 100% or 0% certainty is just not plausible. SDT assumes an informationally noisy internal representation.
[...]
SDT was developed to try to provide a pure measure of sensitivity. It's not perfect, but it's a lot better than a simple percentage correct.
[...]
Another thought is that this test is really about estimating your ability to distinguish between two stimulus classes ('words' and 'non-words').
[...]
It might be that people score highly not because they know all the 'words', but because they have a really good internal model of what an English word is (or at least one that is really good at telling the words from the non-words on the test).

I haven't taken the test, but from what I've read, I think that people get too carried away with their mathematical mumbo-jumbo before getting the basics right. If we go to some levels of refinement, the problems with defining what is or isn't a word are not negligible either, and they may add more noise to measurements. Or so I think.

Let's start by saying 'words are in the dictionary, and non-words are not'. The 'problem' with words is that people can guess and say something is a word even if they're not sure. Non-words are tested to compensate for lucky guesses in finer measurements. But there are different kinds of non-words in real life, and people will respond to them in different ways.

Random strings are likely to produce zero false positives among non-words, sure. Random syllables that appear in English, combined in ways that don't appear in any dictionary? Maybe, maybe not. But presenting 'non-words' that are actually understandable is opening a can of worms. In real life, people use words to mean something different from what the dictionary says all the time, and they get away with it as long as they are understood and they don't bump into a 'dictionary nazi'. For the same reason, people make up words with varying degrees of success, which hints that words and non-words are not such clearly disjoint sets in real life. Is this taken into account in the test?

Are rare and obsolete words to be discarded as non-words? How would the test taker know? And if they did, how would they know which rare words should be marked as non-words? Most importantly, are test takers supposed to react to non-words in the test in the same way they would in real life? After all, in real life, test takers would only 'discard' what isn't understandable or whatever grates on their ears.
2 persons have voted this message useful



josepablo
Tetraglot
Senior Member
Portugal
Joined 3990 days ago

123 posts - 141 votes 
Speaks: German, Spanish, Dutch, Portuguese
Studies: Russian, Mandarin, Turkish

 
 Message 18 of 21
31 January 2014 at 12:43am | IP Logged 
First try: I said yes to 90% of the existing words and to 0% of the nonwords.
Second try: I said yes to 91% of the existing words.
                  and yes to 3% of the nonwords.
    "This gives you a corrected score of 91% - 3% = 88%."
I said yes to the nonword douseness, which sounded quite convincing to me.
The 6 words I did not know:
agenesis, agnatic, angostura, disodium, and two I did know: rocketry and arterially. I don't know why I pressed F; I must have been getting bored.
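
The arithmetic in the quoted feedback line is simply the percentage of real words accepted minus the percentage of nonwords accepted. A small Python sketch of it follows; the function name is made up here, and this is only the subtraction the feedback message shows, not the test's actual code.

```python
def corrected_score(yes_to_words_pct, yes_to_nonwords_pct):
    """Corrected score as shown in the test feedback: hits minus false alarms.

    Both arguments are percentages (0-100). The function name is a
    hypothetical one chosen for this sketch.
    """
    return yes_to_words_pct - yes_to_nonwords_pct


# The two attempts reported in this post.
first_try = corrected_score(90, 0)
second_try = corrected_score(91, 3)

print(f"First try:  {first_try}%")
print(f"Second try: {second_try}%")
```

Accepting one convincing nonword cost more than the extra real word gained, so the second try comes out lower than the first.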
1 person has voted this message useful



dampingwire
Bilingual Triglot
Senior Member
United Kingdom
Joined 4665 days ago

1185 posts - 1513 votes 
Speaks: English*, Italian*, French
Studies: Japanese

 
 Message 19 of 21
31 January 2014 at 1:24am | IP Logged 
josepablo wrote:

The 6 words I did not know:
agenesis, agnatic, angostura, disodium


The first three are specialised terms. Disodium isn't really a word, just something
that crops up in chemical names ... some of those part-names I'd count and some not
(dioxide I'd count, but I don't remember ever seeing disodium except in a chemistry
lesson).

I just had a go and got 87/0. The ones I failed to identify as English words were legal
terms, plants, a drug and provenience (apparently a US variation of provenance). I
noticed a fair number of chemical terms in the list that I did know. If I'd swung
towards the arts at school that would have significantly hampered me on this test (at
least for the words that came up for me this time).

I took the longest over "seeped" (3.482s) and the quickest was "son" (0.662s). There's
clearly some kind of competition waiting to spring to life here :-)



1 person has voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6703 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian

 
 Message 20 of 21
31 January 2014 at 2:44pm | IP Logged 
I have followed the link to the d' test, but on the machine where I tried it out the d' value always turned out to be 0.000 - maybe the test depends on the Java version? And the mathematical intricacies of information theory are beyond me. But I have grasped the general idea - that you shouldn't be able to optimize your score by either avoiding guesswork or guessing wildly. And some of the remarks in the research on the page which patrickwilken referred to indicate that you check the guessing propensity of a testee by comparing his/her scores on the one hand to scores based on totally random guesses and on the other to scores based on 100% perfect knowledge of the correct answers (correct answer = a word on the list compiled by the researchers, whether it is in practical use or not).
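
To make that general idea concrete, here is a minimal Python sketch of the usual signal detection theory sensitivity index, d' = z(hit rate) - z(false-alarm rate), where z is the inverse of the standard normal distribution function. It is not the code behind the linked d' test; the function names, the `eps` clipping for rates of exactly 0 or 1, and the example rates are all assumptions made up for illustration.

```python
from statistics import NormalDist


def _clip(p, eps):
    """Keep a rate strictly inside (0, 1) so its z-score stays finite."""
    return min(max(p, eps), 1.0 - eps)


def d_prime(hit_rate, false_alarm_rate, eps=0.005):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates are fractions between 0 and 1. The eps clipping is an
    arbitrary convention chosen for this sketch, not part of the test.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(_clip(hit_rate, eps)) - z(_clip(false_alarm_rate, eps))


# Hypothetical respondents, not data from this thread.
cautious = d_prime(0.8413, 0.1587)     # rarely says yes to a nonword
liberal = d_prime(0.9772, 0.5000)      # says yes to half of the nonwords
random_guesser = d_prime(0.70, 0.70)   # same yes-rate for words and nonwords

print(f"cautious: {cautious:.2f}")
print(f"liberal: {liberal:.2f}")
print(f"random guesser: {random_guesser:.2f}")
```

The cautious and the liberal respondent end up with about the same d' (close to 2), although the simple difference between their two rates would rank them quite differently (about 68 versus 48 percentage points). A respondent who says yes equally often to words and nonwords gets a d' of 0, however often that is.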

The central problem is that the notion of 'existing' isn't totally black and white. Some of the fictive words in the ugent test look like simple variations on existing words, and if you used them with people who knew the relevant existing words they would understand your neologisms. Actually I often find words in texts which aren't in my dictionaries, and many of these are probably invented by the author. As I have written earlier, I operate with a 'guessable' group in my latest word counts, and my goal should then be not to include non-existent words in the "exists" group - but it would be a bad sign to leave the "guessable" group empty.

My own vocabulary estimates (last updated in Nov. 2013) are based on selected pages in standard dictionaries, which means that there isn't a test group with constructed words which could be used to test the validity of the numbers. But now that I have a "guessable" category it turns out that it is much smaller than the "known" and "unknown" categories - in other words: I mostly do know whether I know a word or not. And I have just checked the numbers: my own tests also hover around 50% known words in Dutch, but in English my estimates from 2012-13 have an average around 69% (spanning the interval 64 to 78%), and I count 8% guessables (insofar as such a word exists). But from four earlier sessions I got an average of 73%, which however covered a span from just 31% to 92% - so it seems that having a special box for "guessables" has stabilized my scores. And funnily enough my own estimates in English lie significantly below those I got with the online test.


Edited by Iversen on 31 January 2014 at 2:51pm

1 person has voted this message useful



Hungringo
Triglot
Senior Member
United Kingdom
Joined 3988 days ago

168 posts - 329 votes 
Speaks: Hungarian*, English, Spanish
Studies: French

 
 Message 21 of 21
31 January 2014 at 2:47pm | IP Logged 
You said yes to 81% of the existing words.

You said yes to 0% of the nonwords.

This gives you a corrected score of 81% - 0% = 81%.

You are at the top level!

In theory this means that my English vocabulary is close to 50 000 words.

Edited by Hungringo on 31 January 2014 at 2:57pm



1 person has voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.