chrismerck Newbie United States quasiphysics.wordpre Joined 4528 days ago 2 posts - 4 votes
| Message 1 of 4 04 July 2012 at 12:05am | IP Logged |
Here's an idea for something to help out when learning several related languages. This came to me in the shower
last week. I think it's totally doable, but I want to put some feelers out to other interested people before plunging
into development.
I heartily welcome comments, feature ideas, constructive criticism, links to similar projects, and links to public-
domain or permissively licensed source material.
motivation:
Aspiring polyglots face the huge challenge of massive vocabulary acquisition. Even in closely related languages, it
is easy to miss cognates because of sound changes and different orthographies. I have found that using
etymological information from dictionary lookups has boosted my recall for learning from a single language --
but I conjecture that this boost will carry over to other languages of interest if the dictionary would provide
related words in every language of interest when they exist. However, in its present form, this procedure requires
searching through a number of different books! I believe an aggregated electronic version would be a huge boon
to the modern polyglot community for this reason.
intention:
To build a tool with an online (mobile-friendly) interface for simultaneously searching the etymological entries of
many trustworthy sources and giving a digest of the relationships between words of interest in a concise manner.
Hyperlinks will allow the user to jump to the original source dictionary.
input:
etymological dictionaries (or normal dictionaries with etymological information) in several languages
output:
a network of words with edges representing an etymological link as claimed by some dictionary. This may be
stored in XML.
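As a rough sketch of what that XML storage might look like, here is one possible layout written with Python's standard xml.etree.ElementTree module. The element and attribute names (etymnet, word, link, lang, form, source) and the ids are made up for illustration; they are not an established schema.

```python
import xml.etree.ElementTree as ET


def build_network_xml(words, links):
    """Serialize words and etymological links to an XML string.

    words: list of (id, language, form) tuples
    links: list of (from_id, to_id, source) tuples, where `source` names
           the dictionary that claims the link
    """
    root = ET.Element("etymnet")  # hypothetical root element name
    for word_id, lang, form in words:
        ET.SubElement(root, "word", id=word_id, lang=lang, form=form)
    for from_id, to_id, source in links:
        # "from" is a Python keyword, so the attributes go in a dict
        ET.SubElement(root, "link", {"from": from_id, "to": to_id, "source": source})
    return ET.tostring(root, encoding="unicode")


# Placeholder data: one link, attributed to a placeholder source name.
words = [("w1", "Eng", "hill"), ("w2", "O.E.", "hyll")]
links = [("w1", "w2", "dictionary A")]
xml_text = build_network_xml(words, links)
print(xml_text)
```

Keeping the source on each link, rather than on each word, means two dictionaries that disagree can both be recorded.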
usage:
The user searches for a word form in any language. The program finds all matching nodes, and then for each
node, lists all neighboring entries, regardless of language. The program may also search several levels deep,
filtering results by languages of interest. Think of a tree-like presentation.
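A minimal sketch of that lookup, assuming the network is held in a plain Python dict and links are followed only in the direction the dictionaries give them. The EDGES table, the search function and the max_depth parameter are all hypothetical names, and the toy data just mirrors the unverified "hill" example.

```python
from collections import deque

# Toy data: each node is a (language, form) pair, each list entry one claimed link.
EDGES = {
    ("Eng", "hill"): [("O.E.", "hyll")],
    ("O.E.", "hyll"): [("L.", "collis"), ("Pr.I.E.", "*kel-")],
    ("L.", "collis"): [("Fr.", "colline")],
    ("Pr.I.E.", "*kel-"): [("Sv.", "kulle")],
}


def search(form, languages, max_depth=3):
    """Return (depth, node) pairs reachable from any node spelled `form`.

    The walk passes through every language, but only nodes whose language
    is in `languages` are reported.
    """
    all_nodes = set(EDGES) | {n for targets in EDGES.values() for n in targets}
    starts = sorted(n for n in all_nodes if n[1] == form)
    seen = set(starts)
    queue = deque((node, 0) for node in starts)
    results = []
    while queue:
        node, depth = queue.popleft()
        if depth > 0 and node[0] in languages:
            results.append((depth, node))
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in EDGES.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return results


hits = search("hill", {"Fr.", "Sv."})
for depth, (lang, word) in hits:
    print(depth, lang, word)
```

The breadth-first order lists the closest relatives first, which would suit a tree-like display.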
example:
So, for example, if I search "hill", and I have told the program that I'm also interested in French and Swedish, then
I should see the Swedish "kulle" and French "colline" as results, with links showing how they are interconnected:
Eng: "hill" << O.E. "hyll" cf. L. "collis" >> Fr. "colline"
Eng: "hill" << O.E. "hyll" cf. Pr.I.E. *"kel-" >> Sv. "kulle"
Or something like that (I'm not claiming this etymology is right or whatever.)
The point is that you get just enough info on the screen so you can make a mental link between the
vocabularies of the various languages and learn the word as a SINGLE word, rather than having to relearn
repeatedly. -- Plus, I firmly believe that such interconnected knowledge is more easily retained than simple, rote
pairs like "kulle=hill".
Edited by chrismerck on 04 July 2012 at 12:07am
2 persons have voted this message useful
|
vermillon Triglot Senior Member United Kingdom Joined 4679 days ago 602 posts - 1042 votes Speaks: French*, EnglishC2, Mandarin Studies: Japanese, German
| Message 2 of 4 04 July 2012 at 12:31pm | IP Logged |
Hello!
I'm very enthusiastic about this kind of idea. If I'm correct, there is a website that acts as a sort of Indo-European etymology dictionary... I can't point to it, I just have the feeling I've visited something like that some weeks ago, perhaps someone here will be able to post a link to it.
I have several questions:
-do you know sources? I usually use the wiktionary, which is surprisingly good for that task and easy to access. If there exists a better source, then it's probably already what you're looking to develop...
-the data structure? It seems to me that most etymologies work as trees, even though there are sometimes several hypotheses for reconstructed roots. In terms of display, a tree seems more interesting to me.
-why "just enough information"? If you went with this, perhaps there could be several settings? Personally I like the long lists of the wiktionary, giving many cognates and often developing the tree from a Germanic language to the whole (Indo-)European family.
Not very useful input sorry. I mainly wanted to say that I'd really like to have a tool to make my etymology search more efficient and that I believe that, indeed, it would be of tremendous help in my learning of Germanic languages' vocabulary.
2 persons have voted this message useful
|
chrismerck Newbie United States quasiphysics.wordpre Joined 4528 days ago 2 posts - 4 votes
| Message 3 of 4 04 July 2012 at 2:48pm | IP Logged |
Hi, vermillon, thanks for your comments!
> there is a website that acts as a sort of indo-european etymology dictionary...
> do you know sources?
For my personal ambitions, I want sources for the following languages. I'll indicate where I have them:
French - Littre
German - Grimm's DWB (unfortunately the spellings are archaic)
English - etymonline.com
Swedish - Svenska Akademiens Ordbok
The Sinitic languages could also be interesting, although this may be a totally separate project. Here we are interested in the loosely coupled development of the written and spoken forms of words. I know that there exist many Zidian that cover the pronunciation and calligraphic considerations in the various languages, but I don't think that working at the character level is so useful for making progress in learning: it's better to work with whole words (whether they are single-character words, dual-character words, chengyu, kanji + okurigana, etc.).
This would apply to (at least):
Mandarin
Cantonese
Korean
Japanese
Vietnamese
> the wiktionary, which is surprisingly good for that task and easy to access.
I have used Wiktionary. It is the most extensive resource I have seen overall, providing etymologies for words in many languages, but it does not do the sort of tree-walking deep lookups that I want.
That said, if I could get access to the "edit" page format for the etymologies, it would be a very easy job of parsing! This would be an excellent source for this project. (I could potentially even give back to Wiktionary if I have high confidence in my database.)
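To give an idea of how small that parsing job might be, here is a sketch that pulls template calls out of a piece of wiki markup with a regular expression. The sample string and the template names in it (inh, cog) are assumptions for illustration only; the real edit-page format would have to be checked before relying on any of this.

```python
import re

# Matches a non-nested {{name|arg1|arg2|...}} template call.
# Templates without any arguments are deliberately ignored.
TEMPLATE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def extract_templates(wikitext):
    """Return (template_name, [args]) for each template call found."""
    found = []
    for match in TEMPLATE.finditer(wikitext):
        name = match.group(1).strip()
        args = [arg.strip() for arg in match.group(2).split("|")]
        found.append((name, args))
    return found


# Made-up sample markup, not copied from any real entry.
sample = "From {{inh|en|ang|hyll}}; compare {{cog|la|collis}}."
templates = extract_templates(sample)
for name, args in templates:
    print(name, args)
```

Each extracted call could then become one edge in the network, with the template name hinting at the kind of link.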
> the data structure? It seems to me that most etymologies work as trees, even though there are sometimes several hypotheses for reconstructed roots. In terms of display, a tree seems more interesting to me.
The way I see it, words have four important properties. First, there are the extrinsic properties of time and space, where 'space' means the language/dialect/speakers who have the word in their active vocabularies. Second, there are the intrinsic properties of form and meaning. Over time, words are attested repeatedly at different places (in 'space') with potentially different forms (written or spoken) and in different contexts (and thus different shades of meaning).
So, in terms of this model, the purpose of an etymological entry is to shed some light on the trace of word attestations through time. This may take the form of a quotation for recent usages, or it may be a traditional etymological reference, like " from L. clavis 'key' ". In the quotation, we know the time, space, and written form variables with high precision, and we get a glimpse into the period meaning. In a traditional etymological reference, we have a vague idea of space (a whole language: Latin), a vaguer idea of time (given only implicitly by the language, which only existed within a certain period), and we get a written form plus a modern gloss to give some idea of the old meaning.
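One way this four-property model might be written down is as a small record type. This is only a sketch: the class name Attestation, its field names, and the choice of a year range for 'time' are my own assumptions, and the quotation record uses placeholder values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Attestation:
    # Extrinsic properties
    language: str                       # 'space': language, dialect or speaker group
    period: Optional[Tuple[int, int]]   # 'time': (earliest, latest) year, None if only implied
    # Intrinsic properties
    form: str                           # written (or transcribed spoken) form
    gloss: str                          # meaning, possibly only a modern gloss

    def is_dated(self):
        """True when the time of attestation is given explicitly."""
        return self.period is not None


# A traditional reference like "from L. clavis 'key'": vague space, implicit time.
reference = Attestation(language="Latin", period=None, form="clavis", gloss="key")

# A quotation pins down time, space and form; these values are placeholders.
quotation = Attestation(language="English", period=(2012, 2012),
                        form="hill", gloss="(meaning read from the quoted context)")

print(reference.is_dated(), quotation.is_dated())
```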
Physically, this evolution suggests a tree structure. However, as you point out, there is often a degree of uncertainty about the origins of various words. Other times, we cannot so clearly trace the development of the word, but we can still point to cognates in other languages.
For this reason, I believe that a graph (aka network) data structure is most appropriate. Vertices are word forms disambiguated as to language. Edges are directed from vertex A to B, meaning that a reference to B is found in the etymological entry for A, with extra information (an annotation) attached to the edge which links back to the source which provided that etymological link.
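A minimal sketch of such a graph, assuming nothing beyond what is described here. The names Vertex, Edge, EtymologyGraph, add_link and references are hypothetical, and the source labels are placeholders rather than real citations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    language: str   # disambiguates identical spellings across languages
    form: str


@dataclass(frozen=True)
class Edge:
    a: Vertex       # the entry whose etymology was consulted
    b: Vertex       # the word referenced inside that entry
    source: str     # annotation: which dictionary made the claim


class EtymologyGraph:
    def __init__(self):
        self._out = defaultdict(list)   # Vertex -> list of outgoing Edges

    def add_link(self, a, b, source):
        """Record that the entry for `a` references `b`, according to `source`."""
        edge = Edge(a, b, source)
        self._out[a].append(edge)
        return edge

    def references(self, a):
        """All edges leaving `a`, one per (target, source) claim."""
        return list(self._out.get(a, []))


graph = EtymologyGraph()
hill = Vertex("Eng", "hill")
hyll = Vertex("O.E.", "hyll")
graph.add_link(hill, hyll, "dictionary A")
graph.add_link(hill, hyll, "dictionary B")   # same link, second source

for edge in graph.references(hill):
    print(edge.a.form, "->", edge.b.form, "according to", edge.source)
```

Because edges are stored per source, parallel or conflicting claims sit side by side instead of being forced into a single tree.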
The user display may well appear to be tree-like. I'm not sure of the best way to visually represent this information...
> why "just enough information"? If you went with this, perhaps there could be several settings? Personally I like the long lists of the wiktionary, giving many cognates and often developing the tree from a Germanic language to the whole (Indo-)European family.
I find that I quickly get distracted. I like to look up words sometimes while doing extensive reading. However, if I'm given too much information I lose my train of thought.
Still, it would be nice to have a setting to give more info for people who are in a more exploratory mode at the time.
2 persons have voted this message useful
|
Chung Diglot Senior Member Joined 7157 days ago 4228 posts - 8259 votes 20 sounds Speaks: English*, French Studies: Polish, Slovak, Uzbek, Turkish, Korean, Finnish
| Message 4 of 4 04 July 2012 at 3:59pm | IP Logged |
I think that a starting point to build something more suitable to the average learner (instead of a comparative linguist) is this set of databases. You can extract much raw data even though it focuses more on items with established or proposed genetic links, and provides references to hard copies.
2 persons have voted this message useful
|