Basic Vocabulary (De Mauro)

Language Learning Forum : Philological Room
Emme
Triglot
Senior Member
Italy
Joined 5141 days ago

980 posts - 1594 votes 
Speaks: Italian*, English, German
Studies: Russian, Swedish, French

 
 Message 1 of 19
29 August 2011 at 10:34am | IP Logged 
Two words of introduction before I start. First, I know that on this forum this topic has been discussed ad nauseam, but I’d like to share with you a different approach which I’ve found interesting. Second, even though this post will focus in some detail only on Italian vocabulary, I believe the theories expounded here can be applied to other languages as well.

Some time ago I heard of an Italian linguist, Tullio De Mauro, but only recently have I had the time to read his work. I’m referring to

Tullio De Mauro, Guida all’uso delle parole: Parlare e scrivere semplice e preciso per capire e farsi capire (Roma: Editori Riuniti, 1980)

This book, whose 12th edition was published in 1997, is the basis for the GRADIT (the 8-volume Grande Dizionario Italiano dell'Uso) and the more manageable children’s dictionary DIB - Dizionario di base.

De Mauro has studied the Italian corpus to find out how many and which words constitute the basic Italian vocabulary (more about that at the bottom of this post). His main purpose is to improve readability of texts in a country where writers often seem to pride themselves on being obscure and difficult (and I’m not talking only about those using ‘burocratese’ or legalese). Leaving aside this problem, which may or may not be relevant for second language acquisition, I believe that what is really interesting is how he tackled the issue.

Back in 1980, in the appendix to Guida all’uso delle parole De Mauro published a first preliminary list of about 6700 words which he supposed to comprise the basic vocabulary of Italian: i.e. the words that any Italian with at least 8 years of schooling would master. That does not mean that such a person would know only 6700 words, but that these words would represent the common ‘core of concordance of the idiolects’ of people who have passed the ‘scuola media’ (middle school) exams no matter their region of origin or their social class.

Moreover, this basic vocabulary was divided into three categories: the core vocabulary was made up of 2000 words which Italians were expected to know after elementary school (5 years of schooling). 2900 words were high frequency words and finally 1800 words were deemed as highly available words (see definition below).

Over the two following decades De Mauro and his team have kept studying the corpus (paying attention both to frequency and dispersion of words) and what Italians of different social ranks and regional origins actually use and/or understand and have come up with several categories for the words in the GRADIT:

- fundamental words (‘lessico fondamentale’): 2049
- high frequency words (‘lessico di alto uso’): 2576
- highly available words (‘lessico di alta disponibilità’): 1897
- common words (‘lessico comune’): 47,060 i.e. known by most Italians no matter the socio-cultural class or the region of origin
- literary words (‘lessico letterario’): 5000 i.e. words in the literary canon from 1200-1900
- obsolete words (‘lessico obsoleto’): 13,000
- low frequency words (‘lessico di basso uso’): 22,000
- regional words (‘regionalismi’): 5000
- dialect words (‘dialettismi’): 338
- foreign words (‘esotismi’): 7000
- technical words (‘lessici tecnici’): > 100,000

Before we get bogged down in the debate ‘What is a word?’, let’s say that here word means a lemma, i.e. an entry in the dictionary, so that, for instance, adjectives are represented by the singular, masculine form (‘rosso’: red, but if you are learning Italian you need to know all the forms ‘rosso / rossa / rossi / rosse’), and verbs are represented by the infinitive (‘essere’: to be and not ‘sono / eravamo / foste / etc’).

Let’s now focus on the core vocabulary of the language. The first three categories mentioned above (fundamental, high frequency and highly available words), comprising almost 7000 words, constitute the basic vocabulary of the language and are estimated to cover about 98% of any text.

Fundamental words alone make up 90% of texts and speech and are the words that are essential when speaking a language. Examples: ‘il’ the; ‘mangiare’ to eat; ‘casa’ house; ‘buono’ good; ‘questo’ this.

High frequency words add another 6% and as their name implies, they recur frequently in written or spoken text. Examples: ‘assaggiare’ taste; ‘campana’ bell; ‘giornaliero’ daily; ‘selvaggio’ wild.

Finally, highly available words represent only 1-2% of the language corpus. This category is probably the most interesting, at least from my point of view. Here we find words that are only rarely written or actually spoken, but that are very present to anyone’s mind as they refer to common objects and basic ideas: ‘portafoglio’ wallet; ‘nuotare’ swim; ‘infermiere’ nurse; ‘asciugacapelli’ hair dryer; ‘inconsolabile’ inconsolable. De Mauro maintains that by their nature these important words ‘risk being excluded from frequency lists’.
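To make the arithmetic behind these three categories easier to follow, here is a minimal Python sketch that tallies the lemma counts and coverage shares quoted in this post. The names `CATEGORIES` and `cumulative` are only illustrative, and the figures are simply the ones reported above, not independently verified data.

```python
# Figures quoted in this post for the three basic-vocabulary categories.
# Coverage is (low %, high %) because the last share is given as a range.
CATEGORIES = [
    ("fundamental", 2049, (90, 90)),
    ("high frequency", 2576, (6, 6)),
    ("highly available", 1897, (1, 2)),
]


def cumulative(categories):
    """Return running totals of lemmas and coverage after each category."""
    rows = []
    lemmas = low = high = 0
    for name, count, (lo, hi) in categories:
        lemmas += count
        low += lo
        high += hi
        rows.append((name, lemmas, low, high))
    return rows


for name, lemmas, low, high in cumulative(CATEGORIES):
    print(f"{name:>16}: {lemmas:5d} lemmas, {low}-{high}% of running text")
```

With these inputs the running totals end at 6522 lemmas and 97-98% coverage, which shows how little the last 1897 words add in frequency terms even though they name everyday things.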

Why do I find this category particularly interesting? Firstly, because I don’t remember hearing about it before, and secondly because at last I’ve realised that it is my proficiency (or lack thereof) in this category that informs my entire relationship with the languages I’ve learned so far. I’m referring to English and German.

I don’t want to bore you excessively so I’ll try to describe how I’ve learned these two languages as quickly as possible. I started English at school when I was about 10; I fell in love with it and when I was a teenager I attended three years of evening classes and took the FCE. Both at school and at evening classes I’ve worked with traditional – and on this site often maligned – textbooks. They accompanied me to the B2 level and I’ve felt at home with English ever since. Later I went on to university to study language and literature and I’ve never had a problem.

Enter German. I started university with poor basics. For the first two years we didn’t have a textbook but were told to use a grammar book (for autonomous study) and in class we worked with the language assistant’s handouts, which consisted mainly of photocopies of magazine and newspaper articles. We also had to read literature in the original for the literary courses we had to follow. By the way, looking back I now realise there wasn’t much difference between the teaching approach in the English department and in the German one: it was just me, who was a different person in English and in German.

Finally, for third-year German we used a textbook (at CEFR level C1) and a couple of other books designed specifically for the preparation of the then C1 proficiency exam (ZMP). I passed the exam with excellent marks, so I’m supposed to know German at C1 level (ok, maybe now it’s a bit rusty!). And yet I sometimes feel I represent a paradox: an advanced learner unable to function in the language at a lower level (I sometimes doubt whether I could pass a B2 or B1 exam or whether I could have passed one even back then).

Putting self-esteem problems aside, the crux of my argument is that I’ve never reached the point where I feel at ease with the language. Even now, when I’ve almost forgotten the stressful university environment and can judge my languages with a more impartial outlook, if I think of German I feel overwhelmed.

I’ve come to believe that all my anxiety regarding German originates in never having mastered the highly available words of that language. As the definition of this category implies, these words are not easy to pick up even with extensive reading and listening because they don’t occur often either in text or speech and yet they represent objects, ideas, and concepts that help us make sense of the world around us.

When I was a teenager I learnt the English names for pieces of cutlery. Over the years, how many times have I read or heard the word ‘fork’? Maybe hundreds of times. How many times have I written or said it? Maybe dozens. Yet, the fact that I’m aware that I know this not incredibly common word gives me the feeling that I can grasp reality even through a second language. By reading extensively in German, I’ve passively learnt about ‘Gabel’, ‘Messer’, and ‘Löffel’, but not mastering these simple words means that even a kitchen can transform into a mental minefield if I try to navigate it in German.

Now I understand that if I want to get out of this impasse, I need to actively learn this kind of vocabulary. And since it’s difficult to acquire these words (given their relative infrequency) in authentic materials, I’m probably best off heading back to lower-intermediate/upper-intermediate textbooks where this kind of word is usually taught systematically. What I find somewhat ironic is that many learners can’t wait to get rid of textbooks to move on to authentic materials exactly because they find that textbooks present low-frequency vocabulary they have no use for in the real world.

I’m not implying that they are wrong and everybody should go back to textbooks and give up on authentic materials, as I may be the only person facing this problem with highly-available vocabulary, but I believe that if others feel that there’s something wrong with the progress they are making, it’s probably worth it to find out if this can be the source of the problem.

Sorry for the exceedingly long post.


P.S. I’ve read the 1983 edition of Guida all’uso delle parole, but I’ve searched online for the most recent numerical data. That’s why there may be some discrepancies in the figures I reported.


P.P.S. Learners of Italian may be happy to know that the publisher Paravia has made the almost 7000 words of basic vocabulary of the DIB - Dizionario di base della lingua Italiana (according to the De Mauro principles) available here:
- fundamental words
- high frequency words
- highly available words

Edited by Emme on 29 August 2011 at 10:36am

20 persons have voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6497 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian
Personal Language Map

 
 Message 2 of 19
29 August 2011 at 12:39pm | IP Logged 
You are definitely not the only one who faces the problem with highly-available vocabulary, but I doubt that a return to textbooks will solve it. If you want to learn about things in a kitchen then find some interesting books about kitchens and read about them. I'm a great fan of wordlists, but only those I make myself, and working through a chapter or two about kitchens while noting down all relevant new words will teach you a lot about things found in a kitchen - though there will always be a few rare objects whose names won't be found in a given source (time will tell whether you ever will need to learn these items).

A textbook is mostly a bland mixture of common words and uncommon words which just slipped into the text - better go directly for the kind of 'extra' vocabulary you want to learn in genuine texts about that subject.

De Mauro's categories make sense, and it is a very interesting observation that there are mid to low-frequency words which you should know and others which you can ignore for the time being. I would however have expected to find more words in the first group - 1897 "highly available" but uncommon words isn't much. I would include every object owned by a normal person in this category, plus all kinds of institutions and laws and job types and things you can buy in your local supermarket. I don't remember when I last said "skraldemand" in Danish (refuse collector) - probably last time they went on strike. But I definitely want to know the word for him. And the names for different cereals and herbs and fish and gardening tools, and I would expect to find more than 2000 of those names.

When I make wordlists directly from dictionaries (in addition to those I make with words from the texts I study) my main criterion is my gut feeling: is this a word I really would like to know? And my preferences are of course based on my interests and occupations, so they may differ from those of other learners. But one of the purposes of this activity is precisely to remind me of words which I would like to know, but which for some reason didn't pop up in the texts I have studied.


Edited by Iversen on 29 August 2011 at 5:29pm

4 persons have voted this message useful



Emme
Triglot
Senior Member
Italy
Joined 5141 days ago

980 posts - 1594 votes 
Speaks: Italian*, English, German
Studies: Russian, Swedish, French

 
 Message 3 of 19
29 August 2011 at 4:42pm | IP Logged 
@Iversen
About the textbooks: I suppose that using them may offer some sense of security to a person lacking in confidence like me.

Apart from the rather low number of highly-available words in the list (which I believe depends very much on the situation of Italian*), I find the real epiphany for me was understanding that there are words that are more important than others for reasons other than frequency of use.

People may not really need them to read a text or to watch a movie (or to write a dissertation), but they are necessary to grasp everyday life. It’s very likely that not every learner has the psychological need to master these words because s/he doesn’t meet them often enough in real life to make them worth memorizing, but other people’s minds (like mine) work differently. Subconsciously, I need those words to filter reality (there are theories that say that most forms of thought are based on language and that you need language to think). I wonder why I never realized that before and why the issue isn’t more widely discussed. Am I normal as a language learner? Am I finding problems where none exist?



*Italian suffers quite a lot from being a rather ‘recent’ national language (and mainly a written one to boot). Most people still speak a dialect or a regional variety at home, so it’s quite understandable that highly-available words denoting everyday objects or ideas for most Italians are probably in their idiolect. That limits the number of words which researchers can confidently say are known to every Italian with a modicum of education under his/her belt.

1 person has voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6497 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian
Personal Language Map

 
 Message 4 of 19
29 August 2011 at 5:38pm | IP Logged 
Emme wrote:
Most people still speak a dialect or a regional variety at home, so it’s quite understandable that highly-available words denoting everyday objects or ideas for most Italians are probably in their idiolect.


If the Italians really use regional terms for everyday objects then you would expect to see those terms in de Mauro's lists as regionalisms - but he has identified less than 5000* regionalisms. This is still somewhat mysterious.

But the conclusion is clear: you can use frequency tables to identify the most common words, but after the first 1000 or 2000 words the frequency isn't the most important thing to know.

* number corrected

Edited by Iversen on 30 August 2011 at 12:33am

1 person has voted this message useful



s_allard
Triglot
Senior Member
Canada
Joined 5224 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 5 of 19
29 August 2011 at 7:19pm | IP Logged 
What an interesting OP. It illustrates once again the statistical nature of (possibly all) vocabulary systems. We notice that approximately 2000 "words" represent 90% of usage in the corpus. Another 2500 words represent 6% and another 2000 represent 1-2% (the highly available words). This is something that we have all seen before. One could easily surmise that probably something like 300 words make up 50% of the usage in the corpus - (I'm just guessing, of course).

When we look at these figures, we must keep in mind that they represent a broad sample of usage and users. This does not mean that these figures reflect the usage of any one individual. To talk about a wide range of subjects, one needs a wide vocabulary, including the "highly available words" and even more technical words. At the same time, one can have a perfectly coherent conversation without ever using many words that are not relevant to the subject at hand. For example, since I don't have a garden, I would be very hard pressed to name in my native language some simple garden tools and plants that any gardener would know. Many people may not know the names of certain parts of a bicycle for example. This applies even more acutely in a foreign language where we can be very proficient in general but completely ignorant of common items like the contents of the kitchen and the bathroom.

We see this time and time again when foreigners are speaking our language very well but trip up on very simple or mundane subjects where they do not master the relevant vocabulary because they simply were not exposed to it. This is where correction by a native speaker is so important. This is also why it is so important to listen to how native speakers speak to us. You have to clue into the vocabulary being used and how it is used.
1 person has voted this message useful



Emme
Triglot
Senior Member
Italy
Joined 5141 days ago

980 posts - 1594 votes 
Speaks: Italian*, English, German
Studies: Russian, Swedish, French

 
 Message 6 of 19
29 August 2011 at 10:48pm | IP Logged 
Iversen wrote:

If the Italians really use regional terms for everyday objects then you would expect to see those terms in de Mauro's lists as regionalisms - but he has identified less than 1000 regionalisms. This is still somewhat mysterious.
[...]


Sorry if I wasn’t very clear in my explanation of ‘regionalisms’ and ‘dialect words’.

In the GRADIT there are about 5000 regionalisms and 338 dialect words. Regionalisms are defined as having their origins in a dialect and being now used in one regional variety of standard Italian (I know this sounds like an oxymoron, but in Italy there is such a thing as standard Italian with regional influences. Just think of the 'passato prossimo' vs 'passato remoto' to speak about the past).

The 338 dialect words are felt by native speakers to belong to a dialect but nevertheless they appear in the occasional ‘standard Italian’ text. I can imagine, for instance, that future dictionaries might contain the Sardinian word ‘accabadora’ after Michela Murgia’s novel Accabadora which won the prestigious Campiello literary prize last year.

One typical example of regionalism is the word ‘anguria / cocomero’. In northern Italy the watermelon is known as ‘anguria’. In central and southern Italy the same fruit is called ‘cocomero’ (a name, by the way, that in my northern dialect is actually used for ‘cucumber’). In any dictionary (not only the GRADIT) you can find both ‘anguria’ and ‘cocomero’. Not only are they both acceptable in standard Italian, actually they are both correct. You only need to remember that the first one is more likely to occur when the speaker / writer is someone from the North and the second one when s/he is from the South.

I presume that a word like ‘watermelon’ can be considered a highly-available word. After all we all know what fruit it is and we may think about it quite often, especially on hot summer days. Yet it’s not likely to recur very often in literature, in magazines, scripts or other texts, so it’s unlikely to appear among high-frequency words. But when a researcher has to classify this word, s/he can’t actually ascribe it to the highly-available words known to all Italians with at least 8 years of schooling because most of them will actually know and use just the one term for ‘watermelon’ common in their area.

That’s why neither ‘anguria’ nor ‘cocomero’ can be part of the list of highly-available words for native speakers. With either one of them, half the population would simply not recognize the term as a word they are likely to use (even just mentally).

I suppose that there must be hundreds or even thousands of similar words in a language like Italian where local varieties are still very important. And that’s why the list of highly-available words for all native speakers of Italian has just about 2000 items.

The rest of the highly-available words each speaker possesses probably don’t belong to any variety of acceptable ‘standard Italian’ (but to the dialect or idiolect of the single person or small group of people) and so they don’t belong in a dictionary. At least that’s my hypothesis.


Edited by Emme on 29 August 2011 at 11:01pm

5 persons have voted this message useful



Andrew C
Diglot
Senior Member
United Kingdom
naturalarabic.com
Joined 4984 days ago

205 posts - 350 votes 
Speaks: English*, Arabic (Written)

 
 Message 7 of 19
29 August 2011 at 11:09pm | IP Logged 
Emme wrote:
Am I finding problems where none exist?


I think you are!

To me, these "highly available" words are not important. If they are hardly ever used, why worry about them? For example, I think it is very easy to live without knowing the word "hairdryer".

Something that interested me on your list was the 47,000 common words that everyone knows. I think that if this shows anything, it shows what a truly mammoth task learning vocabulary is and that we should therefore focus on what we need, rather than trying to learn everything. It seems especially fruitless to learn words that are hardly ever used.

It's very interesting you feel more comfortable with English than German and maybe it is for the reason you said. But it might also be that English is a lot closer to Italian than German is (because of the French influence on English).

I would also say that it is very difficult to learn words just by reading or listening - you can't deduce what a "fork" is easily from the surrounding words in a text. Of course if you were in an actual English kitchen with someone asking you to "pass the fork", learning would be so much easier.

1 person has voted this message useful



s_allard
Triglot
Senior Member
Canada
Joined 5224 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 8 of 19
30 August 2011 at 7:49am | IP Logged 
I think there are some methodological issues that have to be clarified when one looks at vocabulary frequency lists. When one reads that the individual with 8 years of schooling would master at least 7000 words, there is a bit of a fallacy here. This does not mean that all individuals have to master 7000 words. First of all, you have to distinguish between available or passive vocabulary, in the sense of words that one understands and could use, and active vocabulary, the words that one actually uses. For example, to read any major newspaper over the course of a month requires a large vocabulary. But most people only use a small portion of the words they read.

Secondly, and perhaps more importantly, this statistical average of vocabulary size reflects the summed differences of smaller individual active vocabularies. Remember that in any language there is a very tiny core - in the area of 200 to 300 or even less - of word families that make up 50 to 60% of all usage. In French for example, three verbs make up around 20 to 30% of conversational verbs. We can assume that these words are common to all speakers. Then, as we know, as our vocabulary expands the incremental percentage of usage gets smaller.

All these corpus studies are based on large samples of texts or speakers. Any one text or speaker can have a relatively small vocabulary range. However, beyond that core vocabulary set, the actual words used can vary considerably from one speaker to the other. In other words, if ten speakers each have a vocabulary of 1000 active words, 800 of which are common and 200 are unique to each speaker, the total vocabulary range is 2800. This is the number of words needed for 100% coverage of all the words in the corpus of these 10 speakers. But a given speaker only uses 1000 words. So, to read a newspaper from cover to cover over a week, you might need 10,000 words, but in fact in your daily life you may never actually use more than 1000. Indeed, there may be certain common words that you may never use because they are not part of your universe.
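The ten-speaker example can be checked with a small Python sketch. This is only a toy model under the assumptions stated in the paragraph above (every speaker shares the same 800 words and has 200 words nobody else uses); the function name `pooled_vocabulary` and the placeholder word labels are invented for illustration.

```python
def pooled_vocabulary(speakers=10, shared=800, unique=200):
    """Build one vocabulary set per speaker and the union of all of them."""
    # Words every speaker has in common.
    shared_words = {f"shared_{i}" for i in range(shared)}
    vocabularies = []
    for s in range(speakers):
        # Words that belong to this speaker only.
        own_words = {f"speaker{s}_{i}" for i in range(unique)}
        vocabularies.append(shared_words | own_words)
    # The "corpus" vocabulary is everything any speaker uses.
    pooled = set().union(*vocabularies)
    return vocabularies, pooled


vocabularies, pooled = pooled_vocabulary()
print(len(vocabularies[0]))  # size of one speaker's active vocabulary
print(len(pooled))           # size of the pooled vocabulary
```

Each individual set has 1000 words while the pooled set has 800 + 10 × 200 = 2800, which is the gap between what one speaker uses and what full coverage of the whole sample requires.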

Active vocabulary size is closely related to education and work. We can assume that all adults in a given society have a basic lexical repertoire of words related to things like food and eating, the body, personal hygiene, the home, transportation, illness, etc. Beyond that, one's active vocabulary is closely related to one's profession. And for that very reason retired people will lose the work-related vocabulary because they no longer use it.

Let's take the example of an employee in a large hardware or renovation store. That person has a technical vocabulary that is at least 10 times what I have when it comes to things related to home renovation. The same thing with an employee who works in the automobile parts department of one of our large stores called Canadian Tire. Since I don't have an automobile, my knowledge of automobile parts terminology is nothing compared to that of the salesperson who is continuously looking at a computer database of car parts with thousands of entries.

This is exactly why I believe that most people can function in everyday conversations with very small vocabularies. But when you sum all these small vocabularies you get large "average" vocabularies that are misleading. Just a last example that may be more relevant here. If I look at the vocabulary of my 760 odd posts here at HTLAL, my vocabulary is not that wide, alas. In fact, it's rather limited. I pretty much keep using the same words. Whatever the actual figure is, to read all the other posts here you need a vocabulary larger than what you need for just my posts.





Edited by s_allard on 30 August 2011 at 2:13pm



2 persons have voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.