
Meta-Etymological Dictionary

  Tags: Etymology
Language Learning Forum : Philological Room
chrismerck
Newbie
United States
quasiphysics.wordpre
Joined 4308 days ago

2 posts - 4 votes

 
 Message 1 of 4
04 July 2012 at 12:05am
Here's an idea for something to help out when learning several related languages. This came to me in the shower
last week. I think it's totally doable, but I want to put some feelers out to other interested people before plunging
into development.

I heartily welcome comments, feature ideas, constructive criticism, links to similar projects, and links to public-domain or permissively licensed source material.

motivation:

Aspiring polyglots face the huge challenge of acquiring a massive vocabulary. Even in closely related languages, it is easy to miss cognates because of sound changes and different orthographies. I have found that using the etymological information from dictionary lookups has boosted my recall when learning a single language -- but I conjecture that this boost would carry over to other languages of interest if the dictionary provided related words in every language of interest where they exist. In its present form, however, this procedure requires searching through a number of different books! For this reason, I believe an aggregated electronic version would be a huge boon to the modern polyglot community.

intention:

To build a tool with an online (mobile-friendly) interface for simultaneously searching the etymological entries of many trustworthy sources and giving a concise digest of the relationships between words of interest. Hyperlinks will allow the user to jump to the original source dictionary.

input:

etymological dictionaries (or normal dictionaries with etymological information) in several languages

output:

a network of words with edges representing an etymological link as claimed by some dictionary. This may be stored in XML.
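As a rough illustration of the XML idea, here is a minimal Python sketch that writes such a network out with the standard-library `xml.etree.ElementTree`. The element and attribute names (`etymgraph`, `word`, `link`), the language codes, and the "dictionary A" label are hypothetical choices for illustration, not an existing schema or a real claim by any dictionary.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    lang: str  # language code (hypothetical choice, e.g. "en", "ang" for Old English)
    form: str  # the word form as written in that language


@dataclass(frozen=True)
class Link:
    entry: Word      # the word whose etymological entry makes the claim
    referent: Word   # the word that entry refers to
    dictionary: str  # which dictionary claims the link


def graph_to_xml(words, links):
    """Serialize words (vertices) and links (edges) to an XML string."""
    # Include every word mentioned by a link, even if it was not listed explicitly.
    all_words = set(words) | {ln.entry for ln in links} | {ln.referent for ln in links}
    root = ET.Element("etymgraph")
    ids = {}
    for i, w in enumerate(sorted(all_words, key=lambda w: (w.lang, w.form))):
        ids[w] = f"w{i}"
        ET.SubElement(root, "word", id=ids[w], lang=w.lang, form=w.form)
    for link in links:
        # "from" is a Python keyword, so these attributes are passed as a dict.
        ET.SubElement(root, "link", {"from": ids[link.entry], "to": ids[link.referent],
                                     "source": link.dictionary})
    return ET.tostring(root, encoding="unicode")


hill, hyll = Word("en", "hill"), Word("ang", "hyll")
print(graph_to_xml({hill}, [Link(hill, hyll, "dictionary A")]))
```

Giving each word an id and letting links point at ids keeps a word that appears in many dictionaries as a single node, which is what makes the network searchable later.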

usage:

The user searches for a word form in any language. The program finds all matching nodes, and then for each
node, lists all neighboring entries, regardless of language. The program may also search several levels deep,
filtering results by languages of interest. Think of a tree-like presentation.
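The lookup described above amounts to a breadth-first search that starts from every node matching the typed form, walks a limited number of links in any direction and through any language, and keeps only the results in the languages of interest. A minimal Python sketch, assuming words are stored as (language code, form) pairs; the sample edges restate the post's own, explicitly unverified, "hill" example, and the codes are assumptions.

```python
from collections import defaultdict, deque

# Illustrative edges from the "hill" example in this post (etymology not vouched for).
# Words are (language code, form) pairs; the codes are hypothetical choices.
EDGES = [
    (("en", "hill"), ("ang", "hyll")),
    (("ang", "hyll"), ("la", "collis")),
    (("ang", "hyll"), ("ine-pro", "*kel-")),
    (("fr", "colline"), ("la", "collis")),
    (("sv", "kulle"), ("ine-pro", "*kel-")),
]


def build_adjacency(edges):
    """Undirected adjacency: neighbours are listed regardless of edge direction or language."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def search(adj, form, max_depth=1, languages=None):
    """Breadth-first search from every node whose form matches, up to max_depth links away.

    Intermediate words in other languages are walked through, but when `languages`
    is given only results in those languages are returned, with their distance.
    """
    starts = [w for w in adj if w[1] == form]
    depth = {w: 0 for w in starts}
    queue = deque(starts)
    while queue:
        word = queue.popleft()
        if depth[word] == max_depth:
            continue
        for nb in adj[word]:
            if nb not in depth:
                depth[nb] = depth[word] + 1
                queue.append(nb)
    return {w: d for w, d in depth.items()
            if d > 0 and (languages is None or w[0] in languages)}


adj = build_adjacency(EDGES)
print(search(adj, "hill", max_depth=3, languages={"fr", "sv"}))
```

With a depth of 3, both the French and Swedish words are found via Old English and the intermediate roots; with a depth of 2 neither is, which is one way the "several levels deep" setting could be exposed to the user.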

example:

So, for example, if I search "hill", and I have told the program that I'm also interested in French and Swedish, then I should see the Swedish "kulle" and French "colline" as results, with links showing how they are interconnected:

Eng: "hill" << O.E. "hyll" cf. L. "collis" >> Fr. "colline"

Eng: "hill" << O.E. "hyll" cf. Pr.I.E. *"kel-" >> Sv. "kulle"

Or something like that (I'm not claiming this etymology is right or whatever.)
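One way to produce lines like the two above is to find the shortest chain of links between the searched word and each result, then print each step with an arrow chosen by the link type and the direction it was walked. A hedged Python sketch, assuming each edge records a relation ("from" for "derived from", "cf" for a comparison); the labels, codes, and data simply restate the post's unverified example.

```python
from collections import defaultdict, deque

LABELS = {"en": "Eng", "ang": "O.E.", "la": "L.", "fr": "Fr.",
          "sv": "Sv.", "ine-pro": "Pr.I.E."}

# (entry, referent, relation): "from" = derived from, "cf" = compare/cognate.
# The data reproduces the post's own, explicitly unverified, example.
EDGES = [
    (("en", "hill"), ("ang", "hyll"), "from"),
    (("ang", "hyll"), ("la", "collis"), "cf"),
    (("ang", "hyll"), ("ine-pro", "*kel-"), "cf"),
    (("fr", "colline"), ("la", "collis"), "from"),
    (("sv", "kulle"), ("ine-pro", "*kel-"), "from"),
]


def build_adjacency(edges):
    adj = defaultdict(list)
    for entry, referent, rel in edges:
        adj[entry].append((referent, rel, True))    # walked in the stated direction
        adj[referent].append((entry, rel, False))   # walked backwards
    return adj


def find_path(adj, start, goal):
    """Shortest chain of links from start to goal (BFS), or None if unconnected."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            break
        for nb, rel, forward in adj[word]:
            if nb not in parent:
                parent[nb] = (word, rel, forward)
                queue.append(nb)
    if goal not in parent:
        return None
    steps, node = [], goal
    while parent[node] is not None:
        prev, rel, forward = parent[node]
        steps.append((rel, forward, node))
        node = prev
    return steps[::-1]


def render(start, steps):
    parts = [f'{LABELS[start[0]]}: "{start[1]}"']
    for rel, forward, word in steps:
        # "<<" = comes from, ">>" = gave rise to, "cf." = compare.
        arrow = "cf." if rel == "cf" else ("<<" if forward else ">>")
        parts.append(f'{arrow} {LABELS[word[0]]} "{word[1]}"')
    return " ".join(parts)


adj = build_adjacency(EDGES)
hill = ("en", "hill")
print(render(hill, find_path(adj, hill, ("fr", "colline"))))
```

For the French result this prints the first example line above exactly; the Swedish line differs only in putting the asterisk inside the quotes.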

The point is that you get just enough info on the screen to make a mental link between the vocabularies of the various languages and learn the word as a SINGLE word, rather than having to relearn it repeatedly. -- Plus, I firmly believe that such interconnected knowledge is more easily retained than simple, rote pairs like "kulle=hill".

Edited by chrismerck on 04 July 2012 at 12:07am

2 persons have voted this message useful



vermillon
Triglot
Senior Member
United Kingdom
Joined 4459 days ago

602 posts - 1042 votes 
Speaks: French*, English C2, Mandarin
Studies: Japanese, German

 
 Message 2 of 4
04 July 2012 at 12:31pm
Hello!

I'm very enthusiastic about this kind of idea. If I'm correct, there is a website that acts as a sort of Indo-European etymology dictionary... I can't point to it; I just have the feeling I visited something like that a few weeks ago. Perhaps someone here will be able to post a link to it.

I have several questions:
- Do you know of any sources? I usually use Wiktionary, which is surprisingly good for this task and easy to access. If a better source exists, then it's probably already what you're looking to develop...
- What about the data structure? It seems to me that most etymologies work as trees, even though there are sometimes several hypotheses for reconstructed roots. In terms of display, a tree seems more interesting to me.
- Why "just enough information"? If you went with this, perhaps there could be several settings? Personally, I like the long lists on Wiktionary, which give many cognates and often develop the tree from a Germanic language out to the whole (Indo-)European family.

Not very useful input, sorry. I mainly wanted to say that I'd really like to have a tool to make my etymology searches more efficient, and that I believe it would indeed be of tremendous help in learning the vocabulary of the Germanic languages.
2 persons have voted this message useful



chrismerck
Newbie
United States
quasiphysics.wordpre
Joined 4308 days ago

2 posts - 4 votes

 
 Message 3 of 4
04 July 2012 at 2:48pm
Hi, vermillon, thanks for your comments!

> there is a website that acts as a sort of indo-european etymology dictionary...

> do you know sources?

For my own goals, I want sources for the following languages. I'll indicate the sources I have:

French - Littré
German - Grimm's DWB (unfortunately the spellings are archaic)
English - etymonline.com
Swedish - Svenska Akademiens Ordbok

The Sinitic languages (and, more broadly, the languages written now or historically with Chinese characters) could also be interesting, although this may be a totally separate project. Here the interest lies in the loosely coupled development of the written and spoken forms of words. I know that there exist many zidian (character dictionaries) covering the pronunciation and calligraphic considerations in the various languages, but I don't think that working at the character level is very useful for making progress in learning: it's better to work with whole words (whether they are single-character words, two-character words, chengyu, kanji + okurigana, etc.).

This would apply to (at least):
Mandarin
Cantonese
Korean
Japanese
Vietnamese
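To make the word-level (rather than character-level) idea concrete, here is a small data sketch in Python: one entry per whole word, with each language keeping its own written form and reading, loosely tied to a shared character spelling. The entry layout, the sample word 學生 "student", and the choice of romanizations are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    written: str  # spelling in the language's everyday script
    reading: str  # pronunciation in a common romanization


# Hypothetical layout: one entry per whole word (not per character), keyed by the
# traditional characters. Romanizations: pinyin, Jyutping, Revised Romanization,
# Hepburn, and ordinary Vietnamese spelling.
ENTRIES = {
    "學生": {
        "Mandarin": Form("学生", "xuésheng"),
        "Cantonese": Form("學生", "hok6 saang1"),
        "Korean": Form("학생", "haksaeng"),
        "Japanese": Form("学生", "gakusei"),
        "Vietnamese": Form("học sinh", "học sinh"),
    },
}


def lookup(query):
    """Return (key, forms) for every entry matching the key, a written form, or a reading."""
    return [(key, forms) for key, forms in ENTRIES.items()
            if query == key
            or any(query in (f.written, f.reading) for f in forms.values())]


for language, form in lookup("gakusei")[0][1].items():
    print(f"{language}: {form.written} ({form.reading})")
```

Searching from any one language (here a Japanese reading) brings up the same whole word in all five, which is the kind of cross-language link the post is after.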


> the wiktionary, which is surprisingly good for that task and easy to access.

I have used Wiktionary. It is the most extensive resource I have seen overall, providing etymologies for words in many languages, but it does not do the sort of deep, tree-walking lookups that I want.

That said, if I could get access to the "edit" page format of the etymologies, parsing them would be a very easy job! This would be an excellent source for this project. (I could potentially even give back to Wiktionary if I have high confidence in my database.)
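To give an idea of how easy that parsing could be, here is a hedged Python sketch that pulls edges out of etymology wikitext (the source behind the "edit" page, which MediaWiki can also serve raw). It assumes English Wiktionary's `{{inh}}`, `{{der}}`, `{{bor}}` and `{{cog}}` templates with the argument order shown in the comments; those shapes should be checked against the template documentation, and the sample string is made up, not copied from Wiktionary.

```python
import re

# Assumed shapes of a few English Wiktionary etymology templates:
#   {{inh|<entry lang>|<source lang>|<term>}}  inherited from
#   {{der|<entry lang>|<source lang>|<term>}}  derived from
#   {{bor|<entry lang>|<source lang>|<term>}}  borrowed from
#   {{cog|<lang>|<term>}}                      cognate
# Nested templates inside the arguments are not handled by this regex.
TEMPLATE = re.compile(r"\{\{(inh|der|bor|cog)\|([^{}]*)\}\}")


def parse_etymology(entry, wikitext):
    """Extract (entry, referent, relation) edges from an etymology section's wikitext."""
    edges = []
    for name, args in TEMPLATE.findall(wikitext):
        # Keep positional arguments only; named ones such as t=... carry glosses.
        pos = [a.strip() for a in args.split("|") if "=" not in a]
        if name == "cog":
            if len(pos) >= 2 and pos[1]:
                edges.append((entry, (pos[0], pos[1]), "cf"))
        elif len(pos) >= 3 and pos[2]:
            edges.append((entry, (pos[1], pos[2]), "from"))
    return edges


sample = "From {{inh|en|ang|hyll}}. Compare {{cog|la|collis|t=hill}}."
print(parse_etymology(("en", "hill"), sample))
```

The edges come out in the same (entry, referent, relation) shape used for a word graph, so each parsed page could be added to the database directly, with the page URL kept as the annotation.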


> the data structure? It seems to me that most etymologies work as trees, even though there are sometimes several hypotheses for reconstructed roots. In terms of display, a tree seems more interesting to me.

The way I see it, words have four important properties. First, there are the extrinsic properties of time and space, where 'space' means the language/dialect/speakers who have the word in their active vocabularies. Second, there are the intrinsic properties of form and meaning. Over time, words are attested repeatedly at different places (in 'space') with potentially different forms (written or spoken) and in different contexts (and thus different shades of meaning).

So, in terms of this model, the purpose of an etymological entry is to shed some light on the trace of word attestations through time. This may take the form of a quotation for recent usages, or it may be a traditional etymological reference, like " from L. clavis 'key' ". In the quotation, we know the time, space, and written form variables with high precision, and we get a glimpse of the period meaning. In a traditional etymological reference, we have a vague idea of space (a whole language: Latin), a vaguer idea of time (given only implicitly by the language, which existed only within a certain period), and we get a written form plus a modern gloss to give some idea of the old meaning.
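The four-property model can be sketched as a single record type in which a dated quotation and a traditional etymological reference differ only in how precisely the fields are known. This is a minimal Python illustration; the field names and the quotation's values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attestation:
    form: str                       # intrinsic: written (or transcribed) form
    meaning: str                    # intrinsic: gloss or context of use
    space: str                      # extrinsic: language, dialect, or speaker community
    earliest: Optional[int] = None  # extrinsic: time window in years, if known
    latest: Optional[int] = None
    source: str = ""

    def time_window(self):
        """Width of the dating window in years, or None when only the language dates it."""
        if self.earliest is None or self.latest is None:
            return None
        return self.latest - self.earliest


# A dated quotation pins down time, space, and form precisely (hypothetical values).
quotation = Attestation("key", "context of the quoted sentence", "English",
                        1850, 1850, "hypothetical quotation")
# A traditional reference such as "from L. clavis 'key'" only names a language and a gloss.
reference = Attestation("clavis", "key", "Latin", source="etymological dictionary")

print(quotation.time_window(), reference.time_window())  # 0 None
```

Leaving the dates empty, rather than guessing a range for "Latin", keeps the vagueness of the traditional reference visible in the data.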

Physically, this evolution suggests a tree structure. However, as you point out, there is often a degree of uncertainty about the origins of various words. Other times, we cannot so clearly trace the development of the word, but we can still point to cognates in other languages.

For this reason, I believe that a graph (aka network) data structure is most appropriate. Vertices are word forms disambiguated by language. Edges are directed from vertex A to vertex B, meaning that a reference to B is found in the etymological entry for A, with extra information (an annotation) attached to the edge that links back to the source that provided that etymological link.
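The graph described above might look like this as a minimal Python sketch: vertices are (language, form) pairs, each directed edge A to B carries the dictionary that made the claim, and edges are indexed both ways so neighbours can be listed regardless of direction. The class and method names, the optional `url` field, and the "dictionary A/B" labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    language: str
    form: str  # the same spelling in two languages gives two distinct vertices


@dataclass(frozen=True)
class Edge:
    source: Vertex    # A: the entry whose etymology mentions B
    target: Vertex    # B: the word mentioned
    dictionary: str   # annotation: which dictionary made the claim
    url: str = ""     # annotation: link back to the original entry (hypothetical field)


class EtymGraph:
    def __init__(self):
        self.out_edges = defaultdict(list)  # A -> edges A->B
        self.in_edges = defaultdict(list)   # B -> edges A->B

    def add_link(self, a, b, dictionary, url=""):
        edge = Edge(a, b, dictionary, url)
        self.out_edges[a].append(edge)
        self.in_edges[b].append(edge)
        return edge

    def claims(self, a, b):
        """Dictionaries that link A to B; parallel edges keep each source's claim."""
        return [e.dictionary for e in self.out_edges.get(a, []) if e.target == b]

    def neighbours(self, v):
        """Words linked to v in either direction."""
        return ({e.target for e in self.out_edges.get(v, [])}
                | {e.source for e in self.in_edges.get(v, [])})


g = EtymGraph()
colline, collis = Vertex("fr", "colline"), Vertex("la", "collis")
g.add_link(colline, collis, "dictionary A")
g.add_link(colline, collis, "dictionary B")
print(g.claims(colline, collis), g.neighbours(collis))
```

Keeping one edge per dictionary, rather than merging them, lets competing hypotheses (two different origins proposed for the same word) coexist as separate outgoing edges, which a strict tree cannot represent.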

The user display may well appear tree-like. I'm not sure how best to represent this information visually...
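One simple way to get a tree-like display out of a graph is to take a breadth-first spanning tree around the searched word: every word appears once, at its shortest distance, and already-placed words are skipped, which also breaks cycles. A Python sketch under those assumptions, reusing the post's unverified "hill" example with assumed language codes.

```python
from collections import deque

# Undirected adjacency for the post's illustrative "hill" example (language codes assumed).
PAIRS = [
    (("en", "hill"), ("ang", "hyll")),
    (("ang", "hyll"), ("la", "collis")),
    (("ang", "hyll"), ("ine-pro", "*kel-")),
    (("la", "collis"), ("fr", "colline")),
    (("ine-pro", "*kel-"), ("sv", "kulle")),
]
ADJ = {}
for a, b in PAIRS:
    ADJ.setdefault(a, set()).add(b)
    ADJ.setdefault(b, set()).add(a)


def bfs_tree(adj, root, max_depth):
    """Spanning tree around root: each word appears once, at its shortest distance."""
    children, depth = {root: []}, {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue
        for nb in sorted(adj.get(node, ())):
            if nb not in depth:  # skip words already placed, which also breaks cycles
                depth[nb] = depth[node] + 1
                children[node].append(nb)
                children[nb] = []
                queue.append(nb)
    return children


def render(children, node, indent=0):
    """Indented text view of the tree, one word per line."""
    lines = ["  " * indent + f"{node[0]}: {node[1]}"]
    for child in children[node]:
        lines.extend(render(children, child, indent + 1))
    return lines


print("\n".join(render(bfs_tree(ADJ, ("en", "hill"), 3), ("en", "hill"))))
```

Breadth-first rather than depth-first traversal is the design choice here: it places each cognate as close to the searched word as the data allows, and `max_depth` doubles as the "more info" setting discussed below.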

> why "just enough information"? If you went with this, perhaps there could be several settings? Personally I like the long lists of the wiktionary, giving many cognates and often developing the tree from a Germanic language to the whole (Indo-)European family.

I find that I quickly get distracted. I sometimes like to look up words while doing extensive reading, but if I'm given too much information, I lose my train of thought.

Still, it would be nice to have a setting to give more info for people who are in a more exploratory mode at the time.
2 persons have voted this message useful



Chung
Diglot
Senior Member
Joined 6937 days ago

4228 posts - 8259 votes 
20 sounds
Speaks: English*, French
Studies: Polish, Slovak, Uzbek, Turkish, Korean, Finnish

 
 Message 4 of 4
04 July 2012 at 3:59pm
I think that a good starting point for building something more suitable for the average learner (rather than a comparative linguist) is this set of databases. You can extract a lot of raw data from it, even though it focuses on items with established or proposed genetic links and provides references to print sources.


2 persons have voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.