Grammar Frequency (General discussion) Language Learning Forum

Grammar Frequency
Tags: Grammar

DaraghM
Diglot
Senior Member
Ireland
Joined 6151 days ago
1947 posts - 2923 votes

Speaks: English*, Spanish
Studies: French, Russian, Hungarian

Message 1 of 8

07 September 2012 at 1:54pm | IP Logged

Does anyone know if the frequency of grammatical concepts has ever been derived for various languages?

In Spanish, the most frequent word is 'que'. However, this doesn't tell me about its actual usage. Is 'que' more likely to occur as a relative pronoun by itself, or combined as 'lo que'? Similarly, 'de' has numerous uses, but which is the most common: expressing possession, or in conjunction with verbs? Is the -ía ending for imperfect verbs more common than -aba? Is the imperfect more common than the preterite?

In Russian, what is the relative frequency of the cases, and of the case endings? Is the -у ending for verbs more common than -ю? E.g. я иду (I go), я читаю (I read).

I think an invaluable resource for language learners would be a frequency list of grammatical concepts. Most grammar books are either too brief or too detailed to learn from efficiently; but if the concepts were ordered by frequency, it would provide the greatest coverage in the shortest time possible.

What are your thoughts?

Edited by DaraghM on 07 September 2012 at 1:55pm
1 person has voted this message useful

Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6703 days ago
9078 posts - 16473 votes

Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian

Message 2 of 8

07 September 2012 at 3:32pm | IP Logged

I made a complete typological survey of subordinate phrases in Modern French as part of my final dissertation when I studied that language, and I remember that only one other person (whose name I don't remember now) had made something similar and published it in a book - I could probably find the name in my dissertation if need be.

As rebellious as ever, I had proposed my own reclassifications of the phrase types, and then I went through exactly 10,000 pages of literature (representing different genres and styles), marked all subordinates and some related constructions without finite verbs, and counted them. The main theme of the dissertation was something like "The correlative constructions in Modern French", but the statistics included much more than that, and I also described the development of the phrase types from Latin through the history of French, with ramifications into other Romance languages and beyond. I could easily have made it into a doctoral thesis, but I needed a thesis for my candidate degree first.

And then I discovered that I couldn't even get a job as a grammar teaching hireling at my own institute - I needed a course in pedagogics, but if I had taken that I would at best have ended up as a teacher in the 'gymnasium' (high school), most likely on a part-time basis. The positions at the universities were taken by the 68'er generation. Therefore I stopped doing serious scientific research, and I never got around to publishing the results of my statistical analysis.

Edited by Iversen on 07 September 2012 at 9:08pm
5 persons have voted this message useful

Chung
Diglot
Senior Member
Joined 7156 days ago
4228 posts - 8259 votes

Speaks: English*, French
Studies: Polish, Slovak, Uzbek, Turkish, Korean, Finnish

Message 3 of 8

07 September 2012 at 8:03pm | IP Logged

DaraghM wrote:

I've never seen anything systematic considering frequency of several features in a language. I have read essays, studies or lists for some feature in a language that also list frequency of that feature for the language in question. These often involve corpus analysis as you'd deduce the frequency of a feature by searching a large sample of the language in use.

Here are a few examples:

Cases in Finnish (including frequency of cases' employment based on analysis of articles from Helsingin Sanomat)
The use of the partitive case in Finnish learner language: A corpus study
Word order variation in German main clauses: A corpus analysis

In general, most language courses that I've come across roughly follow frequency of use in the contemporary language. For a canonical nominative-accusative language (the only type of language alignment that I've experienced) I invariably start with the present tense and nominative case, and then move on to the other tenses and cases respectively. However, I think this is hard to generalize across several languages, since teachers or course authors have to consider not only the frequency of a feature but also, to a certain degree, how difficult or time-consuming it can be for non-natives to figure out.
1 person has voted this message useful

Peregrinus
Senior Member
United States
Joined 4492 days ago
149 posts - 273 votes

Speaks: English*

Message 4 of 8

07 September 2012 at 8:51pm | IP Logged

For conjugated and inflected variations, surely the frequency of same can be derived from a non-lemmatized frequency list. Such lists in my experience tend to be heavily based on written sources, so perhaps there would be difference in common speech to some degree.
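As a rough illustration of that idea, here is a minimal Python sketch that sums the counts in a word-form frequency list by ending. The counts and the function name `ending_totals` are made up for the example; a real list would come from a corpus. Note that the noun "día" is deliberately included to show how a raw ending count picks up words that are not imperfect verbs at all.

```python
from collections import Counter

# Hypothetical word-form counts (invented numbers, not real corpus data).
form_counts = {
    "hablaba": 120,
    "comía": 95,
    "vivía": 80,
    "cantaba": 40,
    "día": 300,  # a noun, not an imperfect: raw endings over-count
}

def ending_totals(form_counts, endings):
    """Sum the counts of all word forms that end in each given ending."""
    totals = Counter()
    for form, count in form_counts.items():
        for ending in endings:
            if form.endswith(ending):
                totals[ending] += count
                break  # credit each form to at most one ending
    return totals

totals = ending_totals(form_counts, ["aba", "ía"])
print(totals)
```

With this toy data the -ía total is inflated by "día", so a plain suffix count can only be a first approximation.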

For phrases, as in the use of que in Spanish, you can look for studies of large corpora for lexical chunks ("n-grams" is the term often used in the academic studies that I have seen).

Here is such a site with files for English, Spanish and Portuguese:

http://www.ngrams.info/spanport.asp

It has a large number of files according to the number of words looked at, as in 2-grams, 3-grams, etc. The problem is that you have to manually select what you are looking for to remove mere collocations and other junk. For frequency purposes though, I don't remember if such data is included. Even if it is, I suspect it only applies to that specific number of words, but perhaps not.
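If you have raw text rather than a ready-made file, counting 2-grams yourself is not hard. The following is only a sketch in Python with an invented sentence and naive whitespace tokenization; it estimates what share of the occurrences of "que" are preceded by "lo". It says nothing about the format of the files on the site above.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent word pairs (2-grams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))

# Invented sample text; a real study would use a large corpus.
text = "lo que dices es lo que pienso y creo que sabes que es verdad"
tokens = text.split()  # naive tokenization, enough for this example

bigrams = bigram_counts(tokens)
que_total = tokens.count("que")
lo_que = bigrams[("lo", "que")]
share = lo_que / que_total

print(f"'lo que' accounts for {share:.0%} of all 'que' tokens")
```

The same counter can be queried for any other pair, so "de" plus a following verb or noun could be examined the same way once the text is tagged.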

1 person has voted this message useful

Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6703 days ago
9078 posts - 16473 votes

Message 5 of 8

07 September 2012 at 9:03pm | IP Logged

A non-lemmatized frequency list would lump words with similar endings, but different functions together - like the ending "ae" in Latin. To get something relevant you would have to do some grammatical analysis to separate those cases.
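To make that concrete, here is a small Python sketch that assumes the tokens have already been analysed by hand. The tag labels ("gen.sg", "nom.pl" and so on) and the sample tokens are invented for the illustration. Once each token carries its analysis, the single ending "ae" splits into separate counts per function.

```python
from collections import Counter

# Hypothetical hand-tagged tokens: (word form, grammatical analysis).
tagged = [
    ("puellae", "gen.sg"),
    ("puellae", "nom.pl"),
    ("rosae", "dat.sg"),
    ("puellae", "nom.pl"),
    ("viae", "gen.sg"),
    ("rosam", "acc.sg"),
]

def analyses_for_ending(tagged, ending):
    """Count how often each analysis occurs among forms with this ending."""
    return Counter(tag for form, tag in tagged if form.endswith(ending))

ae = analyses_for_ending(tagged, "ae")
print(ae)
```

The hard part is of course producing the tags, not counting them.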
1 person has voted this message useful

Peregrinus
Senior Member
United States
Joined 4492 days ago
149 posts - 273 votes

Speaks: English*

Message 6 of 8

07 September 2012 at 9:25pm | IP Logged

Iversen,

That is a good point. While it would be more pronounced in heavily inflected languages like Russian and Latin, it still would be present in Spanish with gender endings. Also in Spanish one would have to manually combine and mathematically derive a combined frequency for the alternate imperfect subjunctive endings and distinguish between those conjugated endings which are the same in more than one tense.

Edited by Peregrinus on 07 September 2012 at 9:27pm
1 person has voted this message useful

Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6597 days ago
9753 posts - 15779 votes

Speaks: Russian*, English, Finnish^C1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

Message 7 of 8

07 September 2012 at 10:40pm | IP Logged

It would be the same in German, with the same form of the article appearing in various cases. Seriously, it would be EASIER if there were no repeats!!! like in Finnish.

Gunnemark wrote a bit on this subject, but it was more in terms of what concepts/meanings you should start with, just like the 400-500 words of the "active minimum".
1 person has voted this message useful

Jeffers
Senior Member
United Kingdom
Joined 4909 days ago
2151 posts - 3960 votes

Speaks: English*
Studies: Hindi, Ancient Greek, French, Sanskrit, German

Message 8 of 8

16 September 2012 at 9:57pm | IP Logged

I had a flashcard program which had the 5000 most frequent word-forms in the Greek NT, arranged in order. It seemed like a good idea, but actually it was a bugger to learn the forms out of context. Many forms have multiple uses, so to answer the card correctly you should be able to name them all.

There is a lot of Bible software with each word grammatically tagged. I used to use one called Bible Works. It should be possible to create grammar frequency lists with software like this, but it would need to be with texts important enough for people to have gone through and tagged every single word (such as the Bible).

By the way, the software is sophisticated enough to find things such as, for example, a particular verb within two words of any infinitive. And I bought it nearly 12 years ago.
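A search of that kind is easy to imitate once a text is tagged. The Python sketch below is not how any particular Bible program works; the token format (form, lemma, tag), the tag names and the function `near_infinitive` are all assumptions made for the example. It returns the positions where a given verb lemma occurs within two words of any infinitive.

```python
def near_infinitive(tokens, lemma, window=2, inf_tag="INF"):
    """Return indexes where `lemma` occurs within `window` words of an infinitive."""
    hits = []
    for i, (form, tok_lemma, tag) in enumerate(tokens):
        if tok_lemma != lemma:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Look at the neighbours on both sides, skipping the token itself.
        if any(j != i and tokens[j][2] == inf_tag for j in range(lo, hi)):
            hits.append(i)
    return hits

# Hypothetical tagged tokens: (word form, lemma, tag).
tokens = [
    ("quiero", "querer", "V.PRES"),
    ("aprender", "aprender", "INF"),
    ("griego", "griego", "N"),
    ("y", "y", "CONJ"),
    ("quiero", "querer", "V.PRES"),
    ("un", "uno", "DET"),
    ("libro", "libro", "N"),
]

hits = near_infinitive(tokens, "querer")
print(hits)
```

Counting such hits over a whole tagged text would give exactly the sort of construction frequency asked about at the start of the thread.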

1 person has voted this message useful
