
Spanish: A little subs2srs experiment

Language Learning Forum : Language Learning Log
147 messages over 19 pages


emk
Diglot
Moderator
United States
Joined 5520 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian

 
 Message 1 of 147
25 October 2014 at 11:20pm
Why this experiment?

I was inspired by Sprachprofi's article Understand Your Favourite TV Series in 30 Days, where she used subs2srs and Anki to achieve pretty decent listening comprehension for a single Japanese TV series in 30 hours. Of course, as she points out, her skills were quite narrow.

Now, I've already used subs2srs with Amélie, and I once used a similar hand-made tool with Buffy and a bunch of MC Solaar songs. And I've spent plenty of time with Anki: I've done 31,965 reps across all my decks, and I'm using it to learn Middle Egyptian slowly. Given this experience, I think Sprachprofi's approach would be fun to try.

What's the plan?

I'm going to try to learn some beginner Spanish the way I learned intermediate French: Using native materials, and lots of video. Any grammar study will be purely haphazard. My primary source will be subs2srs cards, but I reserve the right to mix in a bit of Destinos or anything else which looks like fun.

Is there any sort of half-baked theory behind this?

After learning French and a bit of Egyptian, I'm a big believer in Krashen's hypothesis that "We acquire language when we understand messages." (Mind you, unlike Krashen, I also think other things are important.) Personally, I think of improving passive skills as a process of "cheating and consolidating", where I use various unfair advantages to understand things, and then I become used to them through sheer exposure:



In particular, I feel that it's worth being as creative as possible in the "cheating" step. I've generally found that watching incomprehensible video is largely useless, at least for me.

Why Spanish?

Spanish has been on my list for a while. Even in the northeastern US, I run into a fair number of Spanish speakers, and Borges is one of my all-time favorite authors. And of course, I get a huge discount, thanks to English and French.

What's the media?

I've got copies of Y tu mamá también (recommended by tastyonions) and Matando Cabos (recommended by iguanamon). These look like fun.



UPDATE: Useful links

A few highlights which might be of interest:

Technology:
Subs2srs tutorial
Subtitle edit tutorial
Spanish subtitles for use with subs2srs
Making the audio on each card longer (for more advanced learners)
Using LF Aligner and Aglona Reader to make parallel Android ebooks (because books have their own vocabulary)

Progress reports:
Watching TV after 5 or 6 hours of reviewing cards
Trying to read the graphic novel Blacksad after 29 days
30 days, 10 hours of Anki reviews!
Extensive watching after almost 60 days (watching 6 new episodes)


Edited by emk on 23 December 2014 at 6:00pm

6 persons have voted this message useful



Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6585 days ago

9753 posts - 15779 votes 
4 sounds
Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

 
 Message 2 of 147
25 October 2014 at 11:39pm
I don't know whether they fit your experiment, but I recommend the following semi-native* resources:
http://www.uiowa.edu/~acadtech/phonetics - web and mobile apps for learning phonetics. I remember how playing around with it was a huge boost to my comprehension.
LyricsTraining (more info here, see the whole article too)
GLOSS
*by semi-native I mean that L1 speakers don't use this - the recordings ARE native.

Now, emk already knows about these, but I'm sure he'll be gaining followers soon :-) And of course I second the recommendation of Destinos.

Edited by Serpent on 25 October 2014 at 11:41pm

3 persons have voted this message useful





emk
Diglot
Moderator
United States
Joined 5520 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian

 
 Message 3 of 147
26 October 2014 at 12:25am
The annoying thing about using subs2srs is getting all the data into Anki. This process has never worked the same way twice for me. But here's a rough overview of how I did it this time.

Converting a DVD to MKV format

I start out by buying actual, legal DVDs on Amazon. In this case, I have a DVD which contains Spanish audio and pretty good English subtitles. To use subs2srs, I'm going to need to extract the video, audio and subtitle tracks using Handbrake:



I need to configure Handbrake to output in the MKV format and to extract the subtitle track as an embedded VOBSUB track in the MKV file:



If I had multiple useful subtitle tracks, I'd extract all of them here. VOBSUB is an annoying, image-based subtitle format that's typically stored in two files: an *.idx file and a *.sub file. I can extract the tracks from the MKV with mkvextract, which is part of MKVToolNix. In the command below, 3 is the ID of the subtitle track in this particular file. (There's also a GUI version of this available somewhere.)

Code:
mkvextract tracks y_tu_mama_tambien.mkv 3:subs
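You don't have to guess the track ID. mkvmerge, also part of MKVToolNix, can list a file's tracks with `mkvmerge --identify`. Below is a minimal sketch that picks the VobSub tracks out of that listing. It assumes the output contains lines like `Track ID 2: subtitles (VobSub)`, and the exact wording may vary between MKVToolNix versions. Both function names are made up for illustration.

```shell
# List the IDs of VobSub subtitle tracks, one per line.
# Reads `mkvmerge --identify` output on stdin, assuming lines such as
#   Track ID 2: subtitles (VobSub)
vobsub_track_ids() {
  awk '/^Track ID [0-9]+: subtitles \(VobSub\)/ {
    id = $3            # the third field is e.g. "2:"
    sub(/:$/, "", id)  # drop the trailing colon
    print id
  }'
}

# Hypothetical convenience wrapper: extract every VobSub track of an MKV,
# using the same mkvextract argument order as the command above.
extract_all_vobsubs() {
  mkv=$1
  for id in $(mkvmerge --identify "$mkv" | vobsub_track_ids); do
    mkvextract tracks "$mkv" "$id:subs_$id"
  done
}
```

For example, `mkvmerge --identify y_tu_mama_tambien.mkv | vobsub_track_ids` should print only the subtitle track IDs, ready to plug into the mkvextract command.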

Finding More Subtitles

I need a Spanish subtitle track, so I search on opensubtitles.org and download various subtitle files until I find one which is more complete and better spelled than the others:



Pay attention to the red arrow. That's the easiest link to click to get the actual subtitle file without getting a complete, annoying video player application at the same time.

The good news: These subtitles are in *.srt format, which is a nice, text-based format that works well with subs2srs. The bad news: The subtitle timing is off, and so is the subtitle speed. This will be a headache.
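It helps to put numbers on "the timing is off, and so is the speed". One way is to read off when the first and last lines start in each file. If the gap between the two files grows from one end of the movie to the other, the speed differs, not just the offset. Below is a small sketch in POSIX sh and awk. The function name is made up, and it assumes standard SRT timing lines such as `00:01:02,345 --> 00:01:04,000`.

```shell
# Print the start times (in ms) of the first and last cues of an SRT file.
# Reads the named file(s), or stdin if none are given.
srt_first_last() {
  awk -F'[:,]| --> ' '
    / --> / {
      # fields 1-4 are HH, MM, SS and mmm of the start time
      ms = (($1 * 60 + $2) * 60 + $3) * 1000 + $4
      if (!seen) { first = ms; seen = 1 }
      last = ms
    }
    END { print first, last }' "$@"
}
```

Run it on both the Spanish and the English file and compare the two pairs of numbers. If the first and last lines really correspond, this shows both the constant offset and how much the drift grows over the film.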

Cleaning Up Subtitles

Now I have one subtitle file in VOBSUB image format (*.idx/*.sub, with accurate timing), and one in *.srt format (with wildly inaccurate timing). I want two *.srt files with accurate timing.

Normally, I prefer to convert VOBSUB to *.srt using SubRip, which asks me to identify each subtitle letter manually the first time it appears. This produces clean subtitles with very few OCR errors, which is ideal for language learning. Unfortunately, this particular VOBSUB file shows up with blank scanlines in SubRip, so I'm going to have to use an OCR-based tool and accept some minor problems.

Fortunately, there's an excellent open source tool called Subtitle Edit that can perform all sorts of subtitle conversions:



In this case, I import the *.sub file containing the English VOBSUB subtitles and use the OCR mode, because I don't mind a few spelling errors in the English output. Then I use Synchronization > Point sync via other subtitle… to adjust the timing of the Spanish subtitles to match the English ones I extracted from the DVD.
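Subtitle Edit's point sync does this for you, and this sketch is not its actual implementation. The underlying idea is easy to sketch if you assume the drift is linear, meaning a constant offset plus a constant speed difference (the kind of error a frame-rate mismatch would produce). You pick two matching lines, one near the start and one near the end, and fit t_good = scale · t_bad + shift through them. `resync_srt` is a hypothetical name.

```shell
# Retime SRT cues linearly from two sync points, all in milliseconds:
# a cue at BAD1 should start at GOOD1, and a cue at BAD2 at GOOD2.
# The two sync points must differ (BAD1 != BAD2).
# Usage: resync_srt BAD1 GOOD1 BAD2 GOOD2 < in.srt > out.srt
resync_srt() {
  awk -v b1="$1" -v g1="$2" -v b2="$3" -v g2="$4" '
    # "HH:MM:SS,mmm" -> milliseconds
    function to_ms(t,    p) {
      split(t, p, "[:,]")
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # milliseconds -> "HH:MM:SS,mmm", rounded and clamped at zero
    function fmt(ms,    h, m, s) {
      ms = int(ms + 0.5)
      if (ms < 0) ms = 0
      h = int(ms / 3600000); ms -= h * 3600000
      m = int(ms / 60000);   ms -= m * 60000
      s = int(ms / 1000);    ms -= s * 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    # Fit t_good = scale * t_bad + shift through the two sync points.
    BEGIN { scale = (g2 - g1) / (b2 - b1); shift = g1 - scale * b1 }
    { sub(/\r$/, "") }   # normalize CRLF line endings
    / --> / {
      split($0, t, " --> ")
      print fmt(scale * to_ms(t[1]) + shift) " --> " fmt(scale * to_ms(t[2]) + shift)
      next
    }
    { print }'
}
```

For example, `resync_srt 5000 6000 105000 110000 < es.srt > es_fixed.srt` moves a line at 0:05 to 0:06 and stretches everything after it by 4%. Real subtitles can drift non-linearly around reel changes or cuts, and in that case a tool with more sync points does better.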

Generating data for Anki

Now that I have a video file, and two *.srt subtitle files with correct timing, I can set up subs2srs as follows:



Some key settings include:

- Still images.
- 1250 milliseconds padding before and after each audio clip.
- One line of context before and after each line. (This is in a preference dialog somewhere.)

The last two items are really important, because the extra context allows me to salvage cards that would be lost to subtitle timing errors or different subtitle breaks in each language.
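The padding arithmetic is simple, but it shows why padding rescues cards. If a subtitle's timing is off by less than the padding, the spoken line still lands inside the clip, and a following line that starts soon afterwards often comes along too. Below is a sketch of the calculation. The clamp at zero is an assumption about how a clip at the very start of the film would be handled.

```shell
# Print the padded audio window "START END" (in ms) for one subtitle line.
# Usage: padded_window START_MS END_MS [PAD_MS]   (PAD_MS defaults to 1250)
padded_window() {
  pad=${3:-1250}
  start=$(( $1 - pad ))
  if [ "$start" -lt 0 ]; then
    start=0   # a clip cannot begin before the film does
  fi
  echo "$start $(( $2 + pad ))"
}
```

A line timed from 62345 to 64000 ms therefore becomes a clip from 61095 to 65250 ms, so up to 1.25 seconds of timing error on either side is absorbed.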

Finally, I preview the resulting cards:



Things to check:

1. Do the subtitles seem to line up correctly?
2. If I play the audio for a given line, is it usable?
3. Do lines in the middle and the end work as well as lines in the beginning?

Subs2srs will output a *.tsv file with multi-column data for Anki, and a media directory full of sound clips and thumbnail images. To import this into Anki, I'll use a custom note type. More on this later.
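Before importing, it can be worth checking that every row of the *.tsv file has the same number of columns, because a stray tab or broken line would shift fields during import. A small sketch follows; the function name is made up.

```shell
# Report how many rows of a tab-separated file have each field count.
# A consistent export should print exactly one line.
tsv_field_counts() {
  awk -F'\t' '{ n[NF]++ } END { for (k in n) print n[k], "rows with", k, "fields" }' "$@"
}
```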

Edited by emk on 22 December 2014 at 5:57pm

9 persons have voted this message useful



YnEoS
Senior Member
United States
Joined 4242 days ago

472 posts - 893 votes 
Speaks: English*
Studies: German, Russian, Cantonese, Japanese, French, Hungarian, Czech, Swedish, Mandarin, Italian, Spanish

 
 Message 4 of 147
26 October 2014 at 1:04am
Looks like an awesome experiment emk, I'm looking forward to reading your log.


You're probably already aware of this, but subs2SRS can use VOBSUB image files as well. Of course, having .srt-formatted subtitles is always nicer and means fewer files to sync between devices, but sometimes it's quicker and simpler to just use the image files. This is especially true with character-based languages like Cantonese, where OCR is much more difficult.



Edited by YnEoS on 26 October 2014 at 1:08am

2 persons have voted this message useful



Crush
Tetraglot
Senior Member
China
Joined 5853 days ago

1622 posts - 2299 votes 
Speaks: English*, Spanish, Mandarin, Esperanto
Studies: Basque

 
 Message 5 of 147
26 October 2014 at 3:43am
emk, i've been wanting to ask you a lot of questions about Anki, MCDs, SRS, subs2srs, etc. for a while now because i think you've got a really good (set of) method(s) that would be useful to me where i'm at in a lot of my languages, so i'm really happy you've started this experiment and will be following along closely. Thanks also for your last post, i'm going to try to convert some movies now and see if i can get subs2srs working.
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5520 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian

 
 Message 6 of 147
26 October 2014 at 4:10am
Serpent wrote:
I don't know whether they fit your experiment, but I recommend the following semi-native* resources:

Thank you! I'm mostly taking it easy this time, and limiting myself to comprehensible input. But I'll look at the phonetic app.

My ongoing Egyptian experiment has demonstrated that it's possible for me to learn to understand a language with minimal fuss using Assimil, MCD cards and Anki. But the data entry is a pain, and of course Egyptian is a purely written language.

I want to try another variation on this theme: If I'm just moving between Romance languages, can I learn like an intermediate student (native materials, etc.) starting from day one? And can I start with listening comprehension?

subs2srs is interesting for several reasons:

1. It allows me to stretch ridiculously far beyond my natural listening level.
2. It tends to burn the pronunciation and intonation into my head.
3. There's no ongoing data entry.
4. My parallel texts come nicely pre-aligned.
5. Thanks to the SRS algorithm, I can review things quite efficiently.

YnEoS wrote:
You're probably already aware of this, but subs2SRS can use VOBSUB image files as well.

I saw those options, but I'm too in love with AnkiDroid's ability to select text subtitles, choose "Share", and pass the text straight into Google's "Translate" app. :-) Plus I can imitate Sprachprofi and run searches for cards that illustrate interesting details.

I'm pleased to announce that my deck is now loaded. I'll try to post some more screenshots tomorrow, as well as my subs2srs Anki card templates.

How I chose my movies

I was looking for:

1. Conversational Mexican Spanish.
2. A nice mix of speech registers: slang, adults, voice-overs, etc.
3. Something worth watching a lot: Ideally, something both fun and good.
4. Something where I could assemble accurate bilingual subs without too much work.

It's entirely possible that a kids' cartoon would be an even better choice. And Sprachprofi has more advice on picking shows in her article.

But for me, this is a central problem in learning a language: Finding interesting things that are worth watching and reading. If I can't find anything, I consider that a bad sign. :-)

Importing into Anki

First I need to create a new "Note" type in Anki, and make sure it has the same fields as the *.tsv file exported by subs2srs. Note, however, that I've flipped the order of the Source and Sound fields, because "Sound" is our "key" field, and it needs to come first:



We also need to create our card template:



For a copy-and-pasteable version of the template, see here.

Now we can go ahead and import. Note that we need to manually flip "Source" and "Sound" once again, so that the order matches our *.tsv file:
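Instead of flipping the mapping by hand at every import, you could reorder the columns once in the file itself. This sketch assumes you know the 1-based positions of the Source and Sound columns in your subs2srs export. Those positions depend on your subs2srs settings, so check a line of the file first. `swap_tsv_columns` is a hypothetical name.

```shell
# Swap two columns (1-based) of a tab-separated file, keeping tabs as separators.
# Usage: swap_tsv_columns I J < in.tsv > out.tsv
swap_tsv_columns() {
  awk -v i="$1" -v j="$2" 'BEGIN { FS = OFS = "\t" }
    { tmp = $i; $i = $j; $j = tmp; print }'
}
```

After swapping, the file's column order matches the note type directly, so the import dialog's default mapping should line up without manual changes.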



Reviewing cards using desktop Anki

Shiny!



I'm making heavy use of the "R" key to replay clips (easily 10 times or more if needed), and I'm searching for easy, useful clips. I'll try to post some mobile screenshots tomorrow.

Edited by emk on 22 December 2014 at 5:59pm

3 persons have voted this message useful



Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6585 days ago

9753 posts - 15779 votes 
4 sounds
Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

 
 Message 7 of 147
26 October 2014 at 5:51am
Learning as an intermediate student from day one is exactly what I've done in Spanish and Italian :-) I've used the resources I mentioned for that, as well as obviously football, twitter etc.

Do you plan to read in Spanish? To me starting to read has been a key moment in both cases. I'm sure you'll have lots of success, but at some point you'll definitely need reading imo.
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5520 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian

 
 Message 8 of 147
26 October 2014 at 1:53pm
Crush wrote:
emk, i've been wanting to ask you a lot of questions about Anki, MCDs, SRS, subs2srs, etc. for a while now because i think you've got a really good (set of) method(s) that would be useful to me where i'm at in a lot of my languages, so i'm really happy you've started this experiment and will be following along closely.

Please feel free to ask questions!

Serpent wrote:
Do you plan to read in Spanish? To me starting to read has been a key moment in both cases. I'm sure you'll have lots of success, but at some point you'll definitely need reading imo.

Oh, maybe I'll read at some point. (I mean, beyond the bilingual reading that I'm already doing with subs2srs.) But for the moment, I still have no sense of Spanish sound, or rhythm, or anything else. So I prefer to work with nicely-chunked audio sources and bilingual text for now.

Front & back of a card

Here's what subs2srs cards look like in AnkiDroid. The "front" of the card is on the left, and the back of the card is on the right:



I hear the audio twice by default: Once when the front of the card shows, and once when the back of the card shows.

Looking up vocabulary with Google's Translate app

Here's one of the reasons why I like to have correctly-spelled, textual *.srt subtitles on my cards, and not just images captured from VOBSUB subtitles:



I can select a word or phrase, choose "Share" ("Partager"), and look it up using Google's Translate application. Even better, the Translate app keeps a history of everything I've looked up:



If I want to go back and improve the cards later, I can add these translations to the cards using desktop Anki. If I do that, I should probably add a "Hint" field to the front of each card and a "Note" field to the back.

Picking cards

Obviously, I don't want to learn every card in the deck—many of them are far above my level, or they use lots of random advanced vocabulary that I don't care about. Basically, I'm searching for sentences that are already "decipherable" or "i+1" content—stuff I can already comprehend with minimal effort. Remember my post about cheating and consolidating: I'm looking for stuff where the "cheating" is already sufficient without a lot of extra work. :-)

For example, the card on the left is pretty reasonable. It's part of the voice-over narration, so it's nice and clear, and the meaning is pretty much transparent thanks to cognates. But the card on the right is a total disaster, and I can delete or suspend it immediately:



When I'm going through the fast-paced dialog, however, I just keep discarding cards until I find something that looks really useful:



This card actually has audio of both the highlighted line, and the line after, thanks to that 1250 millisecond padding I applied earlier. And this is the second time I've seen ¡Aquí está! "It's right here." Definitely a useful phrase!

What button to choose?

When answering a card, should I mark it "hard" or "good" or "easy?" What counts as a failure? Do I have to understand all the audio on a card, or just a single useful snippet?

The answer is, none of this matters. I have 1,173 cards just from Y Tu Mamá También alone, and I haven't even made cards from Matando Cabos yet. I can afford to delete, and suspend, and forget, and answer randomly, and I'll still get lots of exposure to the high frequency stuff. If I "use up" both movies, I can go find a Spanish dub of Avatar or something. This is a really critical lesson about using Anki: No single card matters at all, and I should delete any card which annoys me in any way whatsoever. Khatzumoto sells an "MCD Kit" with 60 handy tips, and I think 27 of those tips are some variation on delete.
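A quick way to see how much built-in repetition a deck offers is to count how many cards contain a given phrase. This sketch uses grep and assumes the subtitle text sits in the exported *.tsv file. `count_cards` is a made-up name.

```shell
# Count the cards (rows) containing a phrase, ignoring case.
# Usage: count_cards PHRASE [FILE.tsv]   (reads stdin if no file is given)
count_cards() {
  phrase=$1
  shift
  grep -ciF -- "$phrase" "$@"   # -F: match the phrase literally, not as a regex
}
```

Something like `count_cards 'aquí está' y_tu_mama.tsv` (a hypothetical file name) shows whether a useful phrase appears often enough that deleting any single card costs nothing.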

I have noticed one thing, however: It's good to make heavy use of "Hard" and "Easy", and not just choose "Good" every single time. This spaces the cards out nicely, and it allows me to drastically increase the intervals between the reviews of easy cards.
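Here is a rough model of the SM-2-style scheduling Anki used by default at the time, to show why mixing in Hard and Easy spreads reviews out. The numbers below are assumptions about Anki's defaults:

- Hard multiplies the interval by 1.2.
- Hard lowers the ease by 0.15.
- Easy raises the ease by 0.15 and applies an Easy bonus of 1.3.
- The starting ease is 2.5.

The sketch ignores late-review bonuses, random fuzz, lapses, the minimum ease of 1.3, and the rule that each button must add at least a day over the previous one.

```shell
# Simplified next-interval model (assumed Anki defaults, see lead-in).
# Usage: next_review hard|good|easy INTERVAL_DAYS EASE  -> "interval ease"
next_review() {
  awk -v a="$1" -v ivl="$2" -v ease="$3" 'BEGIN {
    if (a == "hard")      { ivl = ivl * 1.2;        ease -= 0.15 }
    else if (a == "good") { ivl = ivl * ease }
    else if (a == "easy") { ivl = ivl * ease * 1.3; ease += 0.15 }
    # the interval uses the old ease; %d truncates, so +0.5 rounds
    printf "%d %.2f\n", ivl + 0.5, ease
  }'
}
```

With a 10-day interval and ease 2.5, Good gives 25 days. Hard gives 12 days and lowers the ease. Easy gives about 32 days and also raises the ease, so cards answered Easy repeatedly pull away from the rest very quickly.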

Results from last night's cards

Last night, I learned 6 cards. I'm just taking it easy. It's not like I don't have a 285-card backlog of reviews for Middle Egyptian right now. :-)

Reviewing the cards I learned last night, I already notice that some of them make sense, even without translation. The others I can mostly puzzle out with several listens. I expect the cards to mostly get easier for the first several reviews. Then, somewhere between the 20 and 30 day mark, a lot of them will just "click" and make perfect sense. If you're doing Anki, watch for that "click" once you've been studying for about a month. Let me know if that happens to you, too.

Edited by emk on 22 December 2014 at 6:09pm



4 persons have voted this message useful










Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.