
Spanish: A little subs2srs experiment

BOLIO
Senior Member
United States
Joined 4650 days ago

253 posts - 366 votes 
Speaks: English*
Studies: Spanish

 
 Message 17 of 147
28 October 2014 at 2:50pm | IP Logged 
EMK, this experiment of yours is captivating. I am an idiot concerning computers (My wife insists the list is much longer than computers). You have my full attention with this project.

Your idea could become the Netflix for language learners. You could have a site where learners watch movies in their L2 with all the features you are describing. Really it could be limitless. I would be your first customer!

Thanks for doing this,

BOLIO


1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5524 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 18 of 147
29 October 2014 at 12:06pm | IP Logged 
Thank you for your interest, Bolio! I'm trying to write this experiment up in detail so that if it works reasonably well, other people can see what I did. Or if it fails, so they can avoid my mistakes. :-)

Today I'm going to talk about my reviewing process in detail.

Unfair advantages from French

When Sprachprofi started her subs2srs experiment, she didn't know any Japanese at all. So she decided to search out specific cards to learn: very short cards, generally in groups illustrating a specific, important word. But I can afford to start at the beginning, and just suspend cards until I find something interesting. Why? Because I have the unfair advantage of knowing French and English. For example (in Anki's "night mode"):



Here, I can immediately guess that su "his (sg)" and sus "his (pl)" are possessive adjectives, because they sound like son/sa/ses in French. Similarly, I can guess that de is "of", and so on. I also get compañeros, frecuencia, refería and so on from English. Oh, and hija "daughter" I got from my single episode of Destinos. If I didn't get this "discount", I'd need to be a lot more strategic about choosing which cards to learn.

Masculine professions that end in -a

You remember how I noticed un pediatra especialista en alergias… a page ago, and Crush responded?

Crush wrote:
And i know you didn't ask for an answer, but words derived from Greek that end in 'a' will generally be masculine. Usually this will end in -ma or -ema (sistema, tema, problema, poema, etc.). But here you've also got the fact that the pediatrician is a male pediatrician. A female "pediatra" would use la/una. This is common of many Spanish professions, in particular the -ista's.

Well, I've just spotted another example, periodista "journalist":



Again, I don't feel any need to explain the difference between especialista and convertido yet. At this point, I'm just content to notice things. But I take an interest in things like word endings, and I try to see patterns.

Overnight audio comprehension improvements

I really struggled to understand the audio on this card yesterday, and I could almost hold it in my head after quite a few replays. So I went ahead and passed it on faith:



This morning, the audio was crystal clear and I understood it without any effort. It's always nice when that happens. :-) This is the great power of subs2srs: It allows me to take difficult audio, review it several times, consolidate that understanding overnight, and then hold onto it until it matures after about a month. It's roughly the same process as reviewing Assimil lessons, but the chunk size is much smaller, and the scheduling is better optimized thanks to the SRS algorithm.

Great examples with quiero

As you can see, quiero appears 4 times in 3 lines of subtitles here:



It's clear that quiero means "I want". And this is a wonderful thing to know, because it gives me a fair bit of bad, fake Spanish:

- quiero + noun: Asking for stuff.
- quiero + infinitive: A handy stand-in for "I want to VERB", "I will VERB", "I am VERBing", etc. Who needs to conjugate anything? :-)

Now, as I keep arguing in the vocabulary threads, I think you need about 2,000 words before you can really talk about stuff. But quiero plus a dozen-odd nouns and infinitives is actually enough to be useful.

More dictionary work

Here, I used Google's Translate app to look up ya, and added the definition to a card:



So this is what reviewing feels like: some guesswork, some dictionary lookups, some handy greetings, a useful verb or two, some nice overnight improvements in audio comprehension, and so on. I know from prior experience with Assimil that 15 or 30 hours of this will make a huge difference.
5 persons have voted this message useful





emk
Diglot
Moderator
United States
Joined 5524 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 19 of 147
30 October 2014 at 4:00am | IP Logged 
For those of you who are following along, here's a much more detailed import tutorial using Matando Cabos. (I don't necessarily recommend this film—it looks good, but it uses a lot of words that aren't in my spell-check dictionary.) To make this work, we'll use four pieces of software:

1. Handbrake.
2. Subtitle Edit.
3. Subs2srs.
4. Anki.

This tutorial assumes that all the subtitle tracks you need are on the DVD. If you're lucky, and you choose the right source material, you may actually be able to do all this in an evening, and not a weekend.

A note for Linux (and Mac?) users

Subtitle Edit and subs2srs are open source Windows applications. They run quite well under VirtualBox.

1. Handbrake

We'll use Handbrake to convert DVDs into MKV video files, because Handbrake will quietly solve about 20 different annoying problems that we really don't want to know about.



The key settings here:

1. MKV format.
2. Manually specify each subtitle track we want.

2. Subtitle Edit

First, we need to import a subtitle track from our MKV file:



After a minute or so, this will show the OCR dialog. We want to choose OCR via image compare, because this gives us the best chance of getting correctly spelled subtitles in a language we don't know.



We also want to specify a Spanish dictionary, and enable all the cleanup options, including Prompt for unknown words:



Once this is done, we will be prompted to identify each character the first time it appears:



If a word is unknown, we will be prompted to confirm or correct it:



Note that you can also find subtitles at opensubtitles.org, and that Subtitle Edit has numerous options for aligning, spell-checking and converting subtitles. Don't hesitate to explore, and make sure everything is aligned correctly. The Synchronization menu, in particular, can help tremendously. Also, make sure your subtitles are in UTF-8 format before continuing. This will save you hassles later.
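
If you would rather script the UTF-8 step than do it by hand, here is a minimal Python sketch. It is not part of Subtitle Edit or subs2srs. The function names and the list of fallback encodings (cp1252, then latin-1) are assumptions that suit typical Western European subtitle files, and the guess can be wrong for other languages, so check the accented characters in the output.

```python
from pathlib import Path


def decode_subtitles(raw: bytes) -> str:
    """Decode subtitle bytes, guessing among a few common encodings.

    The candidate list is an assumption, not something subs2srs prescribes.
    """
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            # "utf-8-sig" also strips a leading byte-order mark if present.
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this fallback never fails.
    return raw.decode("latin-1")


def convert_to_utf8(source: Path, target: Path) -> None:
    """Rewrite a subtitle file as UTF-8 without a byte-order mark."""
    text = decode_subtitles(source.read_bytes())
    target.write_bytes(text.encode("utf-8"))
```

Typical use would be something like `convert_to_utf8(Path("film.es.srt"), Path("film.es.utf8.srt"))`, where both file names are hypothetical.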

3. Subs2srs

The next step is to configure subs2srs:



Here, the critical settings are:

1. Pad timings: About 1250 milliseconds on both ends, but you can adjust this if needed. This saves many otherwise unusable cards.
2. Snapshots enabled.
3. Video disabled. Enabling video clips makes your Anki deck huge, and gains almost nothing.

You also want to specify 1 line of context before and after the clip in the Advanced subtitle options. This also saves many otherwise unusable cards.
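
To make the two settings concrete, here is a small Python sketch of what padding and context lines mean for a single card. This is only an illustration, not how subs2srs is implemented; the `Cue` class and both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle line with its start and end time in milliseconds."""
    start_ms: int
    end_ms: int
    text: str


def pad_cue(cue: Cue, pad_ms: int = 1250) -> tuple[int, int]:
    """Return the audio clip range: the cue widened by pad_ms on both ends."""
    # Clamp at zero so the first cue of the film never gets a negative start.
    return max(0, cue.start_ms - pad_ms), cue.end_ms + pad_ms


def with_context(cues: list[Cue], index: int, lines: int = 1):
    """Return (lines before, the card's own line, lines after)."""
    before = [c.text for c in cues[max(0, index - lines):index]]
    after = [c.text for c in cues[index + 1:index + 1 + lines]]
    return before, cues[index].text, after
```

With 1250 ms of padding, a cue that runs from 3.0 s to 4.5 s yields a clip from 1.75 s to 5.75 s. That is why padding rescues cards whose subtitle timings clip the first or last syllable.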



Once everything is configured, select Preview:



Check this carefully. Make sure the clips at the beginning and end of the film are both correctly aligned, and that the padding is reasonable.

4. Import into Anki

I don't have a detailed tutorial for this yet. But you may want to see:

1. My earlier short tutorial.
2. My Anki "note" template and instructions.

Anyway, if you pick DVDs with good subtitles, and you're reasonably good with computers, you should be able to get through most of this process in an evening—assuming a language with a reasonably small alphabet that you know how to type.
8 persons have voted this message useful



rdearman
Senior Member
United Kingdom
rdearman.org
Joined 5228 days ago

881 posts - 1812 votes 
Speaks: English*
Studies: Italian, French, Mandarin

 
 Message 20 of 147
30 October 2014 at 1:17pm | IP Logged 
Just a quick note regarding VirtualBox and licensing. Obviously you must have a license for any operating system you put into a virtual machine. So if you don't have a copy of Windows and you are a Mac/Linux user, both Subtitle Edit and subs2srs run under Wine. Wine is a free and open source compatibility layer that aims to allow applications designed for Microsoft Windows to run on Unix-like operating systems.

You can download or get more information about Wine here.

I got both of these applications up and running but haven't yet used them in anger. I'm using Ubuntu, so your mileage may vary.

@EMK - I used Subtitle Edit on my virtual machine trying to line up the srt with the video (I had to chop up the video a little and the timings were off) but the video was just a white square. I converted between flv, mp4, avi, mpeg and while I could watch all of them in VLC or other video player, the Subtitle Edit display didn't work for any of those file types. Googled for help but didn't get much. I noticed in your instructions above that you output to mkv files when using handbrake? Is this where I've gone wrong? Does it need to be an mkv file?
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5524 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 21 of 147
31 October 2014 at 12:10pm | IP Logged 
rdearman wrote:
@EMK - I used Subtitle Edit on my virtual machine trying to line up the srt with the video (I had to chop up the video a little and the timings were off) but the video was just a white square. I converted between flv, mp4, avi, mpeg and while I could watch all of them in VLC or other video player, the Subtitle Edit display didn't work for any of those file types. Googled for help but didn't get much. I noticed in your instructions above that you output to mkv files when using handbrake? Is this where I've gone wrong? Does it need to be an mkv file?

I use MKV video instead of MP4 video because MKV has historically had better support for embedding subtitle tracks. This means that I can easily rip a video with embedded subtitles using Handbrake, and I can expect that other tools like Subtitle Edit will be able to import the subtitles. This saves a whole lot of steps.

I'm not using video playback within Subtitle Edit at all, at least not yet. As far as I know, Subtitle Edit has both an internal video playback system, and a mode where it uses Windows video codecs. In both cases, you need to download and install the codecs as instructed. But in my case, I've always been able to get at least one correctly aligned subtitle file, so I don't need video at all.

Your situation is a lot more complicated: You recorded a video with embedded commercials, edited those commercials out, and attempted to align subtitles, correct? And you're working in a less transparent language, too. This is much harder than anything I've ever attempted. I've always just looked for DVDs with embedded subtitle tracks, or at worst, DVDs where I can grab something from opensubtitles.org. I would recommend this approach for people who have a choice.

I'm impressed that you've gotten things to work with Wine and manually-edited video. But I have the unfair luxury of avoiding that path, so I won't be of much help. :-(

Conversations among peers

As usual, the hardest parts of this movie are the conversations between close friends, especially older teenagers and college students. They speak very quickly, they use lots of slang, and they speak over each other.



The card on the left was hopeless. The card on the right was the only card I managed to salvage from this entire scene. In general, for conversations like this, even a single usable card is a good result. I'm learning to suspend cards after only a couple seconds of consideration.

Narration and cross-generation conversations

Two much better sources of comprehensible input are narration and cross-generation conversation. I can often keep 2/3rds of the cards from these scenes (assuming I want them). Here are two nice examples:



I have my doubts as to whether Los veo raros means "You're acting strange." More evidence will be required. But you can see that I'm occasionally looking up words and adding them to cards.

Cards per day and time spent reviewing

Last night, I asked Anki to show me 10 new cards per day instead of 5. It takes me roughly 15 minutes to find and learn 10 new cards. Fortunately, reviews go much faster: I can make it through 15 "recent" cards in under 5 minutes, and they're getting faster as they mature. (In my mature French audio decks, I can easily review 75 cards in about 10 minutes. Mature subs2srs cards are fast.) Plus, with subs2srs cards, I typically use the "Easy" button a lot, and I continue to aggressively cull cards even once I've learned them. Taken all together, this means that I can afford to learn more cards up front without risking the usual massive SRS overload in 3 weeks.
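
As a rough back-of-the-envelope check on those numbers, here is a short Python sketch that turns the rates above into a daily time estimate. The function name is hypothetical, and the per-card rates are simply the figures quoted in this post (the mature rate comes from the French decks, so applying it to Spanish is an assumption).

```python
# Minutes per card, taken from the figures quoted above.
MINUTES_PER_NEW_CARD = 15 / 10      # about 15 minutes to find and learn 10 new cards
MINUTES_PER_RECENT_CARD = 5 / 15    # upper bound: 15 recent cards in under 5 minutes
MINUTES_PER_MATURE_CARD = 10 / 75   # 75 mature cards in about 10 minutes (French decks)


def daily_minutes(new_cards: int, recent_reviews: int, mature_reviews: int) -> float:
    """Estimate total daily Anki time in minutes from the three card counts."""
    return (
        new_cards * MINUTES_PER_NEW_CARD
        + recent_reviews * MINUTES_PER_RECENT_CARD
        + mature_reviews * MINUTES_PER_MATURE_CARD
    )
```

On these figures, 10 new cards plus 15 recent reviews come to roughly 20 minutes a day, and most of that time goes to the new cards rather than the reviews.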

What experience teaches me

One thing that's nice about this whole process: Even though this is an experiment, I've done most of the individual parts before. I've turned intermediate French into advanced French using a Super Challenge. I've done one-and-a-half Assimil courses and 33,006 Anki reviews. Mostly, all this experience is useful because it helps me chill out. I know that:

- I don't need to understand the hard stuff right away. Instead, I should focus on the easy stuff.
- Anki cards get better as they mature.
- Deleting cards is the secret to happiness. I need to resist the urge to fixate on lousy cards.
- It doesn't matter what button I choose.
- Little bits of knowledge add up if I just stick to it. I understand these two Khatzumoto essays in my bones.
2 persons have voted this message useful





emk
Diglot
Moderator
United States
Joined 5524 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 22 of 147
31 October 2014 at 3:03pm | IP Logged 
A bit of background listening

Right now, I have disc 1 of Harry Potter y la piedra filosofal looping quietly in the background while I work, just to get more exposure to the sounds. It's a rather nice edition, and it matches the DRM-free ebook sold at Pottermore.

I don't expect to learn any actual Spanish this way. How could I? I'm not even paying attention, and my level of understanding is too low for cheating and consolidating to kick in. But I do hope to get my ears accustomed to the rhythm, sounds and syllables of the language. Khatzumoto writes about his personal experiences and one very minor academic study, and I'm inclined to agree: listening seems to build up an "under layer" that makes things easier later on.

Interestingly, even after an hour or two of subs2srs with native materials, I'm already finding it easier to hear syllable and word boundaries in the audiobook. Nice.

Learning the sounds

Meanwhile, here are two web pages I will start poking at soon, probably this weekend.

Mexican Spanish phonetics on Wikipedia



Paul Meier's interactive Flash IPA chart



The Paul Meier chart has explanations of the rows and columns if you mouse over the names, and it plays the sounds. For detailed explanations, look the sounds up on Wikipedia.

My short-term goal here would be to familiarize myself with the sounds of Mexican Spanish. If I were really serious about this project, and if I had money to burn, then I might also spend $96.99 on Idahosa's Flow of Spanish course. I beta-tested his French course at a reduced price, and I appreciated the detailed phonetic explanations and personal feedback. If you want to get really solid on the phonetics, you could do a lot worse.

I figure it's worth paying attention to sounds early on. If I don't, I tend to mentally "lump together" sounds that should be distinct, and it's a pain to untangle them later.

Should I learn any grammar at this stage?

I'm quite happy puzzling out grammar for now, but at some point, I'll probably look some stuff up. I'm quite tempted by Essential Spanish Grammar, because the French version was short and cheap and provided a nice high-level overview without getting into the weeds. But it almost feels like overkill at this point. I'm tempted to get a quick reference chart at some point and hang it on the wall, because that's even shorter and more concise. But there's no rush. I'm happily just muddling along in Anki for now, puzzling things out. Any grammar study I do will be haphazard and lackadaisical, because I enjoy puzzling things out. Other folks clearly feel differently, and they like to see things laid out clearly.
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5524 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 23 of 147
01 November 2014 at 11:44am | IP Logged 
Mexican Spanish phonetics, take 1

All right, working through the Mexican Spanish Phonetics table mentioned above, I've identified the most interesting consonants. (You may need an IPA font to view these.)

- [β]: Bilabial approximant. Like a V with the lips. I need to learn this.
- [r] and [ɾ]: Trilled and tapped R. I have both of these already. (I'm good at Rs.)
- [ʝ]: Palatal fricative. Sort of like a rough Y. I need to work on this.
- [ɲ]: Palatal nasal. Like the English "-ng", but further forward. I need to work on this.
- [ɣ]: Velar approximant. Sort of like a G that doesn't close. I need to work on this.
- [x]: Velar fricative. One of those German CH sounds. (I have several of these.)

And two groups which seem to be less important:

- Miscellaneous affricates. Mostly ones I have already, or ones used in indigenous loan words. Low priority.
- Miscellaneous labio-velar forms. These look like a sound law applied to velar + U, and they may be dialectal. Need to keep my ears open for examples, and listen closely to the pronunciation of güey.

Vowels are tidy: Spanish uses the normal 5 vowel system, which I've had ever since I first studied Italian on my own in high school.

For each of these sounds, I use the interactive IPA chart to listen to them repeatedly, and I figure out how to pronounce them. Then I practice them here and there throughout the day, and I listen carefully for them with subs2srs.

Working directly on these sounds is very helpful, because I'm working with full-speed native speech from day 1. And full-speed native speech often assumes that I can hear and distinguish even hints of these sounds. But I'm not trying to master these right now—I'm only trying to create some new "hooks" in my head that I can use while listening. Also, the adult brain can happily gloss over this stuff for years if left to its own devices, so a little bit of attention up-front can help me a lot.

What about studying?

This is pretty much how I "study": I start with interesting input, and I focus most of my efforts on that. Every once in a while, though, a detail will capture my attention, and maybe I'll go look it up. Then I'll listen carefully for more examples of that detail in my input.

But the input always comes both first and last: it comes first, so that my brain can start seeing patterns. And it comes last, because I only study to help make the patterns more visible in the input.

There are people like Victor Hart and Khatzumoto who argue that any form of explicit grammar studying may be counter-productive. But then there are people like patrickwilken who managed to learn a huge amount of German grammar, though not all of it, from heroic amounts of input. Personally, I figure that I'm willing to occasionally do a small amount of haphazard studying. Maybe 6 verb endings or something like that. :-) Other people may prefer more.

Yes, I'm trying to speed up "natural" acquisition

Language learning is a lot of fun once I can actually understand a decent fraction of easy television (and books). The bits before that point are pretty fun, too, but I don't enjoy them so much I want to drag them out forever. So everything I'm doing is aimed towards understanding TV as soon as possible.

One interesting difference between Assimil and subs2srs: subs2srs involves much more aggressive listening practice from the very beginning, thanks to the intensive and repeated use of native audio. I'm very interested to find out how this will affect my overall progress.
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5524 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 24 of 147
02 November 2014 at 3:19pm | IP Logged 
Notes from today's reviews

On the left, a great card. I can guess that faro is "headlight", because of the French cognate phare. And this is the second or third time I've seen coche, and my brain has accepted that it means "car." If I see a word in several different contexts, and I can more-or-less understand it each time, I generally learn the word automatically. So it's best to keep moving and not get hung up on junk cards.



On the right, we have a genuinely difficult card that I really like. It illustrates about 5 different interesting things—temieron, which means "they feared", fracaso and suerte, which are cool words, plus a way of describing time, and so on. For this card, I'm going to work harder than usual to keep it. It's OK to have favorite cards if they're awesome.

Unfortunately, it looks like I have another group of cards that aren't sticking—mostly from one conversation between Tenoch and his mother. I marked most of these as "Hard" this morning, and if they don't improve, I'll suspend them.

Subs2srs & listening training

A typical subs2srs card contains maybe 3 to 5 seconds of native audio. With each card, my goal is to understand it, and ideally to understand it directly in Spanish. To do this, I need to do two things:

1. Distinguish native sounds, even when they're rapid and heavily reduced.
2. Hold 3 to 5 seconds of native audio in my echoic memory long enough to make sense of it.

So there are two separate things going on here: Not only am I using subs2srs to make input comprehensible, but I'm also aggressively training my brain to handle Spanish audio. This part feels less like Assimil, and more like a listening-only version of Idahosa's mimic method. He's never explained the later steps in detail, but as I understand it, it involves aggressive sound-training that allows accurate reproduction of native audio, even without complete comprehension.

This is interesting, and it might explain why Sprachprofi could follow a single Japanese series after 30 hours—she was training her listening and sound memory very aggressively, and she was picking up the important series-specific vocabulary. And of course, she's also an awesome polyglot. But now I'm very curious about where I'll be after 30 hours of Anki. :-)


2 persons have voted this message useful


