16 messages over 2 pages: 1 2 Next >>
lichtrausch Triglot Senior Member United States Joined 5949 days ago 525 posts - 1072 votes Speaks: English*, German, Japanese Studies: Korean, Mandarin
| Message 1 of 16 07 May 2013 at 2:37am | IP Logged |
Or so this new study claims. I'm skeptical to put it mildly.
news article
study
3 persons have voted this message useful
| Volte Tetraglot Senior Member Switzerland Joined 6428 days ago 4474 posts - 6726 votes Speaks: English*, Esperanto, German, Italian Studies: French, Finnish, Mandarin, Japanese
| Message 2 of 16 07 May 2013 at 9:17am | IP Logged |
The news article is utterly unconvincing, but the paper is interesting, whether or not it's correct... and it's safe to say it isn't.
Edited by Volte on 09 May 2013 at 6:05pm
2 persons have voted this message useful
| patrickwilken Senior Member Germany radiant-flux.net Joined 4522 days ago 1546 posts - 3200 votes Studies: German
| Message 3 of 16 07 May 2013 at 12:01pm | IP Logged |
I found the paper really interesting (didn't read the news report).
One of the basic findings is that the probability that a word is a proposed cognate across multiple language groups correlates with its frequency of use. So words that are used a lot are much more likely to be conserved than less frequent words. This observation allows you to use the less frequent words as an estimator for how often similar-sounding words appear by chance across language groups.
The average estimated half-life for words is 2000-4000 years, but for highly conserved words this might be closer to 10k-15k years.
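To make the half-life figures concrete: if word replacement is treated as simple exponential decay (an assumption made here for illustration, not necessarily the exact model used in the paper), the chance that a word is still unreplaced after t years is 0.5^(t / half-life). A minimal Python sketch follows; the function name `survival_probability` and the 15,000-year time depth are hypothetical choices, not taken from the paper.

```python
def survival_probability(years: float, half_life: float) -> float:
    """Chance a word has not been replaced after `years`.

    Assumes plain exponential decay with the given half-life (in years).
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (years / half_life)


if __name__ == "__main__":
    # Half-lives come from the ranges quoted above; the 15,000-year
    # time depth is an illustrative assumption.
    time_depth = 15_000
    for half_life in (2_000, 4_000, 10_000, 15_000):
        p = survival_probability(time_depth, half_life)
        print(f"half-life {half_life:>6} y: P(unreplaced after {time_depth} y) = {p:.3f}")
```

Under this simplification, a word with a 2,000-year half-life has well under a 1% chance of surviving 15,000 years, while a word with a 15,000-year half-life has a 50% chance. That gap is why only the most conserved words could plausibly carry a signal that deep.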
The 23 words that show cognate class sizes of four or more Eurasiatic language groups include: I, who, not, that, we, to give, what, man, ye, old, mother, to hear, hand, fire, to pull, black, to flow, bark, ashes, to spit, worm.
Of course, any one of these cognates might be a chance occurrence, but their statistics assign probabilities across all words, which are then used to reconstruct language trees.
Interesting study. Thanks for the heads up.
1 person has voted this message useful
| emk Diglot Moderator United States Joined 5521 days ago 2615 posts - 8806 votes Speaks: English*, FrenchB2 Studies: Spanish, Ancient Egyptian Personal Language Map
| Message 4 of 16 07 May 2013 at 12:04pm | IP Logged |
Thank you, lichtrausch, for tracking down the paper itself. I've just finished skimming the paper, and here are my first reactions.
Cons:
1. The paper is written by biologists, with only a single linguist participating (judging from a quick skim of the author list).
2. The paper is published in PNAS, which is a general interest journal, not a linguistic journal.
3. The authors are claiming to revolutionize somebody else's field.
So far, this is a very old story: A bunch of biologists go nuts with a big database, and decide to tell historical linguists that they've been doing it all wrong for the past 150 years. This happens two or three times per year, and it never goes anywhere, because the biologists are doing sloppy bulk-processing of data which has already been carefully analyzed by hand (at least for Indo-European languages—the biologists have an easier time making useful contributions to less-studied language families).
Now, the good news:
1. The authors actually have a hypothesis for why they might be able to do better than other researchers in the field, related to unusually stable high-frequency vocabulary.
2. The authors appear to understand statistics, and they're careful not to make certain errors that typically undermine papers like this.
3. The data for the paper is drawn from a small set of Swadesh words which were chosen up front. This helps prevent the problem so eloquently described by xkcd.
The validity of this paper is ultimately going to turn on two things:
1. The credibility of the Languages of the World Etymological Database, which they use heavily.
2. The validity of their core statistical techniques, which I can't evaluate without a much closer reading of the paper.
Anyway, if you like statistics, this paper is probably worth reading carefully. There are some nice ideas here, and the authors bring some interesting techniques to the debate. Thank you once again, lichtrausch, for digging beyond the news reports and finding the original paper.
6 persons have voted this message useful
| patrickwilken Senior Member Germany radiant-flux.net Joined 4522 days ago 1546 posts - 3200 votes Studies: German
| Message 5 of 16 07 May 2013 at 12:28pm | IP Logged |
emk wrote:
Cons:
1. The paper is written by biologists, with only a single linguist participating (judging from a quick skim of the author list).
2. The paper is published in PNAS, which is a general interest journal, not a linguistic journal.
3. The authors are claiming to revolutionize somebody else's field.
|
|
|
Your comments are interesting, but I think your cons are a little unfair.
1. This is an interdisciplinary paper. It seems perfectly reasonable that people with good math skills team up with people with linguistic training. Generally multi-author papers are composed of people with different skills. For this you would want people with good maths/statistics and people with good linguistic backgrounds, plus probably someone to do the work (usually a grad student as first author).
2. PNAS is a general science journal, with a brief to publish papers that are both highly significant and have broad appeal. For what it's worth, it's in the top three science journals globally. "PNAS" is often jokingly called "Publication other than Nature or Science". If you have a cure for cancer, have compelling evidence for plate tectonics, etc., you don't submit to a specialized journal (you'd generally submit to either Nature/Science and then Science/Nature if rejected, then PNAS, then some specialized journal).
It is also the official publication of the National Academy of Sciences, and I think it's great that they think linguistics falls under its domain (does anyone know how many members of the academy are linguists?).
That's not to say the result won't turn out to be wrong, but it's certainly gone through stricter review than if it had gone to a lower impact linguistics journal. I can tell you that for the journals I used to work for (which were one step lower down than PNAS), the reviewers were often much more senior than the authors. I am confident that they would have had 3-4 very senior linguists and statisticians review this paper before publication. If any one of these people had raised doubts, it is highly likely that the paper would have been rejected.
3. Are they? At least one of them is a linguist, and I am not sure about the others, but I am pretty sure they have published previous papers in this area. Presumably Andreea Calude, at least, would take offense at that statement. Academics want to get maximum credit for each paper they write, so it makes no sense to write a paper with someone else with the same skills as your own.
My sense, having read this paper as a non-linguist, is that not only is the result interesting, but just as interesting are their experimental techniques, which seem novel and of broad interest. What will happen now is that lots of academics will jump on this paper, and try to publish a paper showing why everything is crap (you don't get a paper by agreeing - and it's much harder to create something positive than to criticize). So if we wait a few years it will be easy enough to see how true any of the results are.
Edited by patrickwilken on 07 May 2013 at 12:49pm
2 persons have voted this message useful
| Lykeio Senior Member United Kingdom Joined 4233 days ago 120 posts - 357 votes
| Message 6 of 16 07 May 2013 at 12:41pm | IP Logged |
I've flicked through the paper and glanced at the news article (which is odd: of the
words I can think of with easy cognates, "hand" is famously not one of them). Not much to
say really, it's the typical kind of paper one sees a few times a year. A revival of
oversimplified phylogenetic models, glottochronology etc., almost always by non-linguists.
This is the second popular paper this year.
Not much to say from a comparative philology perspective; all the data and all the
studies are there if they/anyone was interested. It's not at all surprising that Renfrew
was involved. I don't mean that to sound nasty towards him, he's nice in person and I
enjoy some of his archaeological work on Bronze Age sanctuaries but...there's a reason
why only (some) archaeologists and the general public are drawn to his work on PIE.
3 persons have voted this message useful
| emk Diglot Moderator United States Joined 5521 days ago 2615 posts - 8806 votes Speaks: English*, FrenchB2 Studies: Spanish, Ancient Egyptian Personal Language Map
| Message 7 of 16 07 May 2013 at 2:11pm | IP Logged |
patrickwilken wrote:
emk wrote:
Cons:
1. The paper is written by biologists, with only a single linguist participating (judging from a quick skim of the author list).
2. The paper is published in PNAS, which is a general interest journal, not a linguistic journal.
3. The authors are claiming to revolutionize somebody else's field.
|
|
|
Your comments are interesting, but I think your cons are a little unfair. |
|
|
My Bayesian priors for "historical linguistics paper written by biologists and published in a high impact factor journal" are very low, for reasons I've discussed elsewhere. These papers seem to get published a couple of times per year, and they're not generally acknowledged as credible by experts within the field of historical linguistics. There's actually a nice discussion of these issues in the current paper:
Quote:
Such evidence is often criticized for two reasons. First, most words are thought to suffer from too much semantic and phonetic erosion to allow secure identification of true cognates beyond 5,000 to 9,000 y (11, 12), and second, even if a number of apparent cognates can be identified, proponents of long-range relationships have been unable to provide statistical verification that the resemblances they have found are beyond what would be expected by chance between unrelated languages (11, 12). Where statistical tests have been used (9, 13), the results have been inconclusive because of the difficulty of establishing secure null models that estimate the number of resemblances expected to arise by chance. |
|
|
Essentially, trying to push historical linguistics beyond 9,000 years puts you firmly into "crank" territory. This is the judgement of researchers who've studied the proto-languages in extensive detail over the course of decades. So a paper like the current one bears a heavy burden of proof.
That said, this paper actually tries to address most of the obvious objections, and the authors can include or exclude certain suspicious word classes without changing the overall results. So that's why I suggested that this paper might reward a close reading.
A nice litmus test for any of these theories would be the Afro-Asiatic language family, which contains the Semitic languages, Ancient Egyptian, Berber and a variety of other North African languages. This is the oldest of the generally-acknowledged language families, and it has written evidence going back about 5,400 years. But Afro-Asiatic is a bit of a weird mess: the vocabulary varies more than it should between branches, despite surprising grammatical similarities.
Now, if some statisticians could actually provide testable hypotheses for Afro-Asiatic, I'd be impressed. This is a subject well within the frontiers of existing research, and there is tons of evidence and lots of open questions.
2 persons have voted this message useful
| patrickwilken Senior Member Germany radiant-flux.net Joined 4522 days ago 1546 posts - 3200 votes Studies: German
| Message 8 of 16 07 May 2013 at 3:03pm | IP Logged |
emk wrote:
My Bayesian priors for "historical linguistics paper written by biologists and published in a high impact factor journal" are very low, for reasons I've discussed elsewhere. |
|
|
Sure. I am not disputing your Bayesian priors, just your stated cons.
I only read the paper, and haven't bothered to check the affiliations of all the authors, but I still think your labeling this as "linguistics paper written by biologists" is inaccurate (at least insofar as it implies all the authors were biologists), and unfair (it's an interdisciplinary paper).
The way I read the paper (as a non-expert) is:
1. Words change over time.
2. Some words change more quickly than others.
3. Proposed cognates across language groups may be real or illusory.
4. The more common a word is in usage the more likely it will have a cognate in another language family.
5. Point 4 suggests that at least some of the cognates are real, though who knows which ones (the authors deal with three possible objections to this point).
6. Point 4 also allows you to estimate the likelihood that a word is a false cognate by comparing high- vs low-frequency word cognates (assuming low-frequency words are more likely to be false cognates), and so get a half-life (or mutation rate) for words.
7. All this allows you to put a probability estimate on the likelihood any word is a cognate.
8. Using statistics from Point 7 it's straightforward to generate an evolutionary tree.
Point 6 seems to be the tricky one (Point 5 less so to me).
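As a rough illustration of the reasoning in Point 6 (a toy sketch, not the paper's actual statistical model): if proposed cognates among low-frequency words are treated as mostly chance resemblances, their hit rate gives a baseline, and the ratio of that baseline to the hit rate among high-frequency words gives a crude estimate of how many high-frequency matches are spurious. The function names and all counts below are hypothetical.

```python
def chance_rate(low_freq_hits: int, low_freq_total: int) -> float:
    """Share of low-frequency meanings with a proposed cross-family cognate.

    Assumption: these matches are treated as (mostly) chance resemblances.
    """
    if low_freq_total <= 0:
        raise ValueError("low_freq_total must be positive")
    return low_freq_hits / low_freq_total


def estimated_false_fraction(high_freq_hits: int, high_freq_total: int,
                             baseline: float) -> float:
    """Crude share of high-frequency proposed cognates expected to be spurious."""
    if high_freq_total <= 0:
        raise ValueError("high_freq_total must be positive")
    observed = high_freq_hits / high_freq_total
    if observed == 0:
        return 0.0  # no proposed cognates at all, so none can be false
    # The chance baseline cannot explain more than all of the observed hits.
    return min(1.0, baseline / observed)


if __name__ == "__main__":
    # Invented counts, purely for illustration.
    baseline = chance_rate(low_freq_hits=5, low_freq_total=100)
    false_share = estimated_false_fraction(20, 50, baseline)
    print(f"chance baseline: {baseline:.3f}")
    print(f"estimated false share among high-frequency hits: {false_share:.3f}")
```

With these made-up numbers, roughly one in eight of the high-frequency matches would be attributed to chance. The estimate is only as good as the assumption that the low-frequency matches really are chance, which is why Point 6 is the tricky step.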
Edited by patrickwilken on 07 May 2013 at 3:23pm
1 person has voted this message useful