DaraghM Diglot Senior Member Ireland Joined 6151 days ago 1947 posts - 2923 votes Speaks: English*, Spanish Studies: French, Russian, Hungarian
| Message 1 of 8 07 September 2012 at 1:54pm | IP Logged |
Does anyone know if the frequency of grammatical concepts has ever been derived for various languages ?
In Spanish, the most frequent word is 'que'. However, this doesn't tell me about its actual usage. Is 'que' more likely to occur as a relative pronoun by itself, or combined as 'lo que'? Similarly, 'de' has numerous uses, but which is the most common: expressing possession, or in conjunction with verbs? Is the -ía ending for imperfect verbs more common than -aba? Is the imperfect more common than the preterite?
In Russian, what are the relative frequencies of the cases, and of the case endings? Is the -у ending for verbs more common than -ю? E.g. я иду (I go), я читаю (I read).
I think an invaluable resource for language learners would be a frequency list of grammatical concepts. Most grammar books are either too brief or too detailed to learn from efficiently; but if the concepts were ordered by frequency, it would provide the greatest coverage in the shortest time possible.
What are your thoughts ?
Edited by DaraghM on 07 September 2012 at 1:55pm
|
Iversen Super Polyglot Moderator Denmark berejst.dk Joined 6703 days ago 9078 posts - 16473 votes Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian Personal Language Map
| Message 2 of 8 07 September 2012 at 3:32pm | IP Logged |
I made a complete typological survey of subordinate phrases in Modern French as part of my final dissertation when I studied that language, and I remember that only one other person (whose name I don't remember now) had made something similar and published it in a book - I could probably find the name in my dissertation if need be.
As rebellious as ever, I had proposed my own reclassification of the phrase types, and then I went through exactly 10,000 pages of literature (representing different genres and styles), marked all subordinates and some related constructions without finite verbs, and counted them. The main theme of the dissertation was something like "The correlative constructions in Modern French", but the statistics included much more than that, and I also described the development of the phrase types from Latin through the history of French, with ramifications into other Romance languages and beyond. I could easily have made it into a doctoral thesis, but I needed a thesis for my candidate degree first.
And then I discovered that I couldn't even get a job as a grammar-teaching hireling at my own institute. I needed a course in pedagogics, but if I had taken that I would at best have ended up as a teacher in the 'gymnasium' (high school), most likely on a part-time basis. The positions at the universities were taken by the 68'er generation. Therefore I stopped doing serious scientific research, and I never got around to publishing the results of my statistical analysis.
Edited by Iversen on 07 September 2012 at 9:08pm
|
Chung Diglot Senior Member Joined 7156 days ago 4228 posts - 8259 votes 20 sounds Speaks: English*, French Studies: Polish, Slovak, Uzbek, Turkish, Korean, Finnish
| Message 3 of 8 07 September 2012 at 8:03pm | IP Logged |
I've never seen anything systematic considering frequency of several features in a language. I have read essays, studies or lists for some feature in a language that also list frequency of that feature for the language in question. These often involve corpus analysis as you'd deduce the frequency of a feature by searching a large sample of the language in use.
Here are a few examples:
Cases in Finnish (including frequency of cases' employment based on analysis of articles from Helsingin Sanomat)
The use of the partitive case in Finnish learner language: A corpus study
Word order variation in German main clauses: A corpus analysis
In general, most language courses that I've come across roughly follow frequency of use in the contemporary language. For a canonical nominative-accusative language (the only type of alignment that I've experienced) I invariably start with the present tense and nominative case, and then move on to the other tenses and cases. However, I think it's hard to treat this as something applicable across several languages, since teachers and course authors have to consider not only the frequency of a feature but also, to a certain degree, how difficult or time-consuming it is for non-natives to figure out.
|
Peregrinus Senior Member United States Joined 4492 days ago 149 posts - 273 votes Speaks: English*
| Message 4 of 8 07 September 2012 at 8:51pm | IP Logged |
For conjugated and inflected forms, surely the frequency can be derived from a non-lemmatized frequency list. Such lists in my experience tend to be heavily based on written sources, so there may be some difference from common speech.
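A rough sketch of that idea in Python, assuming the non-lemmatized list is available as a mapping from word form to token count. The counts below are made up for illustration, and `ending_frequencies` is a hypothetical helper, not part of any existing tool:

```python
from collections import Counter


def ending_frequencies(form_counts, endings):
    """Sum the token counts of word forms, grouped by the ending they match."""
    totals = Counter()
    # Try longer endings first so a short suffix cannot shadow a longer one.
    ordered = sorted(endings, key=len, reverse=True)
    for form, count in form_counts.items():
        for ending in ordered:
            if form.endswith(ending):
                totals[ending] += count
                break
    return totals


# Invented counts, for illustration only (not real corpus data).
sample = {"había": 120, "estaba": 95, "tenía": 60, "hablaba": 20, "día": 80}
print(ending_frequencies(sample, ["ía", "aba"]))
```

Note that the noun "día" is counted under -ía here, so plain suffix matching overcounts; the raw list alone cannot tell a verb ending from a lookalike.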
For phrases, as in the use of que in Spanish, you can look at studies of large corpora for lexical chunks ("n-grams" is the term often used in the academic studies that I have seen).
Here is such a site with files for English, Spanish and Portuguese:
http://www.ngrams.info/spanport.asp
It has a large number of files according to the number of words looked at, as in 2-grams, 3-grams, etc. The problem is that you have to manually select what you are looking for to remove mere collocations and other junk. For frequency purposes though, I don't remember if such data is included. Even if it is, I suspect it only applies to that specific number of words, but perhaps not.
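As a minimal sketch of what one could do with 2-gram data, assuming the rows do carry counts (which is not confirmed above) and have been loaded as `(first_word, second_word, count)` tuples. The layout, the helper name `left_neighbours` and the numbers are all assumptions for illustration:

```python
from collections import Counter


def left_neighbours(bigrams, target):
    """For each word that precedes `target`, return its share of those bigrams."""
    counts = Counter()
    for first, second, count in bigrams:
        if second == target:
            counts[first] += count
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


# Invented rows, for illustration only (not taken from the site above).
rows = [("lo", "que", 300), ("el", "que", 100), ("de", "la", 900), ("casa", "que", 100)]
for word, share in sorted(left_neighbours(rows, "que").items(), key=lambda kv: -kv[1]):
    print(f"{word} que: {share:.0%}")
```

This only compares 'lo que' against other two-word sequences ending in 'que'; it says nothing about which grammatical role 'que' plays in each, which still needs manual or tagged analysis.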
|
Iversen Super Polyglot Moderator Denmark berejst.dk Joined 6703 days ago 9078 posts - 16473 votes Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian Personal Language Map
| Message 5 of 8 07 September 2012 at 9:03pm | IP Logged |
A non-lemmatized frequency list would lump words with similar endings, but different functions together - like the ending "ae" in Latin. To get something relevant you would have to do some grammatical analysis to separate those cases.
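To illustrate the difference, here is a small sketch that assumes each token has already been given a grammatical tag by hand or by a tagger. The tag labels, the sample tokens and the helper `functions_of_ending` are hypothetical:

```python
from collections import Counter


def functions_of_ending(tagged_tokens, ending):
    """Count how often each grammatical tag occurs among forms with `ending`."""
    return Counter(tag for form, tag in tagged_tokens if form.endswith(ending))


# Hand-tagged toy sample, for illustration only.
tokens = [
    ("puellae", "GEN.SG"),
    ("puellae", "NOM.PL"),
    ("rosae", "DAT.SG"),
    ("puellae", "NOM.PL"),
    ("aquam", "ACC.SG"),
]
print(functions_of_ending(tokens, "ae"))
```

An untagged list would report four forms in -ae and nothing more; with tags, the same four tokens split into three different case functions.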
|
Peregrinus Senior Member United States Joined 4492 days ago 149 posts - 273 votes Speaks: English*
| Message 6 of 8 07 September 2012 at 9:25pm | IP Logged |
Iversen,
That is a good point. While it would be more pronounced in heavily inflected languages like Russian and Latin, it still would be present in Spanish with gender endings. Also in Spanish one would have to manually combine and mathematically derive a combined frequency for the alternate imperfect subjunctive endings and distinguish between those conjugated endings which are the same in more than one tense.
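The combining step for the two imperfect subjunctive series could look roughly like this, assuming the counts have already been restricted to genuine imperfect subjunctive forms (the hard part, which this sketch does not attempt). The function name and all figures are invented for illustration:

```python
def combine_variants(counts_ra, counts_se, corpus_size):
    """Merge the -ra and -se series into one frequency and report the -ra share."""
    total_ra = sum(counts_ra.values())
    total_se = sum(counts_se.values())
    combined = total_ra + total_se
    return {
        # Frequency of the tense as a whole, per million tokens.
        "per_million": combined * 1_000_000 / corpus_size,
        # Proportion of the combined total that uses the -ra series.
        "ra_share": total_ra / combined if combined else 0.0,
    }


# Invented counts, for illustration only.
ra_forms = {"hablara": 40, "comiera": 30}
se_forms = {"hablase": 10, "comiese": 20}
print(combine_variants(ra_forms, se_forms, corpus_size=1_000_000))
```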
Edited by Peregrinus on 07 September 2012 at 9:27pm
|
Serpent Octoglot Senior Member Russian Federation serpent-849.livejour Joined 6597 days ago 9753 posts - 15779 votes 4 sounds Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish
| Message 7 of 8 07 September 2012 at 10:40pm | IP Logged |
It would be the same in German, with the same form of the article appearing in various cases. Seriously, it would be EASIER if there were no repeats!!! like in Finnish.
Gunnemark wrote a bit on this subject, but it was more in terms of what concepts/meanings you should start with, just like the 400-500 words of the "active minimum".
|
Jeffers Senior Member United Kingdom Joined 4909 days ago 2151 posts - 3960 votes Speaks: English* Studies: Hindi, Ancient Greek, French, Sanskrit, German
| Message 8 of 8 16 September 2012 at 9:57pm | IP Logged |
I had a flashcard program which had the 5000 most frequent word-forms in the Greek NT, arranged in order. It seemed like a good idea, but actually it was a bugger to learn the forms out of context. Many forms have multiple uses, so to answer the card correctly you should be able to name them all.
There is a lot of Bible software with each word grammatically tagged. I used to use one called Bible Works. It should be possible to create grammar frequency lists with software like this, but it would need to be with texts important enough for people to have gone through and tagged every single word (such as the Bible).
By the way, the software is sophisticated enough to find things such as, for example, a particular verb within two words of any infinitive. And I bought it nearly 12 years ago.
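For anyone wanting to try that kind of search on their own tagged text, here is a minimal Python sketch. It is not how Bible Works does it; it simply assumes tokens stored as `(form, lemma, tag)` tuples, with `"INF"` as a hypothetical tag for infinitives and a Spanish toy sentence standing in for real data:

```python
def near_infinitive(tokens, lemma, window=2):
    """Return indices of tokens of `lemma` with an infinitive within `window` words."""
    hits = []
    for i, (_form, tok_lemma, _tag) in enumerate(tokens):
        if tok_lemma != lemma:
            continue
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        # Look at the neighbours on both sides, skipping the token itself.
        if any(tokens[j][2] == "INF" for j in range(start, stop) if j != i):
            hits.append(i)
    return hits


# Toy tagged sentence, for illustration only: "quiero comer pan y quiero agua".
sentence = [
    ("quiero", "querer", "FIN"),
    ("comer", "comer", "INF"),
    ("pan", "pan", "N"),
    ("y", "y", "CONJ"),
    ("quiero", "querer", "FIN"),
    ("agua", "agua", "N"),
]
print(near_infinitive(sentence, "querer"))
```

Only the first "quiero" is reported, since the second has no infinitive within two words on either side. Counting such hits over a whole tagged text would give the kind of grammar frequency list discussed in this thread.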
|