
Spanish: A little subs2srs experiment

147 messages over 19 pages
rdearman
Senior Member
United Kingdom
rdearman.org
Joined 3636 days ago

881 posts - 1812 votes 
Speaks: English*
Studies: Italian, French, Mandarin

 
 Message 65 of 147
17 November 2014 at 10:56am
kujichagulia wrote:
emk and anybody else doing this, I have a question for you all.

With your subs2srs decks, do you have any cards where the subtitles do not match the audio? Perhaps one word is different, or the entire line is different from the audio. What do you do with such cards? Do you:
(a) Delete them right away, or
(b) Depending on the card, keep them and use the subtitles as hints to the audio?

The reason I ask is that I have some DVDs that I really like, that also come with Japanese and/or Portuguese subtitles. The problem is that, most of the time, the subtitles don't match the audio. A search on some of the subtitle websites mentioned above (OpenSubtitles.org, etc.) didn't find any files with subtitles that match the audio verbatim. (In fact, it seems like they just ripped the subtitles from the DVD and posted them up there.) I like those DVDs and would like to use them, if possible, for subs2srs, hence the question. It would be cool to have 3,000 cards of natural conversation with audio, but if 95% of them are going to be deleted, that leaves 150. Is that enough to understand a movie or a TV show?


If you have the subtitles and you understand enough of the language, you can line them up manually with Sub Edit. I did this for a Mandarin soap opera. It can be done if you use the waveforms, and I figure that if you know enough of the language to spot the mistakes, you should be able to align them. Or you can simply correct the card in the Anki deck when you notice a mistake.

Failing that, I personally don't see a problem with just deleting the card; after all, you could just do another 2-3 films. I think it proves the validity of this method that I have already spotted a number of mistakes in the subtitles, because I'm listening intensively to a short section.
2 persons have voted this message useful



kujichagulia
Senior Member
Japan
Joined 3247 days ago

1031 posts - 1571 votes 
Speaks: English*
Studies: Japanese, Portuguese

 
 Message 66 of 147
17 November 2014 at 1:36pm
rdearman wrote:
If you have the subtitles and you understand enough of the language, you can line them up manually with Sub Edit. I did this for a Mandarin soap opera. It can be done if you use the waveforms, and I figure that if you know enough of the language to spot the mistakes, you should be able to align them. Or you can simply correct the card in the Anki deck when you notice a mistake.

Failing that, I personally don't see a problem with just deleting the card; after all, you could just do another 2-3 films. I think it proves the validity of this method that I have already spotted a number of mistakes in the subtitles, because I'm listening intensively to a short section.

rdearman, thank you for the reply!

I apologize for not being clear in my last post. When I said the subtitles didn't match the audio, I didn't mean the timing. If it's a matter of shifting the subtitles to fix the timing, you guys have laid out a nice tutorial here. But I was referring to the language and vocabulary itself. With the DVDs I have, more often than not, what a character is saying differs greatly from what is written in the subtitle.

For example, imagine a character is saying:
"I'm thinking of going to this great Italian restaurant downtown. Care to join me?"

But the subtitle written on the screen says:
"I want to go to an Italian restaurant. Want to come?"

I'm wondering if this audio and subtitle - which has nearly the same meaning as the line in the audio, but is structured differently - would be useless as an Anki card, because what you hear is not exactly what you read.
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 3932 days ago

2615 posts - 8805 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 67 of 147
17 November 2014 at 4:07pm
kujichagulia wrote:
With the DVDs I have, more often than not, what a character is saying differs greatly from what is written in the subtitle.

For example, imagine a character is saying:
"I'm thinking of going to this great Italian restaurant downtown. Care to join me?"

But the subtitle written on the screen says:
"I want to go to an Italian restaurant. Want to come?"

I'm wondering if this audio and subtitle - which has nearly the same meaning as the line in the audio, but is structured differently - would be useless as an Anki card, because what you hear is not exactly what you read.

I've seen several different kinds of subtitles:

1. (Nearly-)Accurate subtitles: These work great with subs2srs, and you should be able to work with media 3 or 4 CEFR levels above your "real" level.

2. Abbreviated subtitles: These are mostly accurate, except that they often leave out a few words to better fit on the screen. These aren't too bad: Some lines are already accurate, and some of the inaccurate ones are close enough that you can guess the missing words correctly. The French dub of Le Trône de fer is like this.

3. Wildly-inaccurate subtitles OR native-only subtitles. You can use these with subs2srs, and they'll help a little. But it's pretty frustrating, unless you can almost understand the series without subtitles, and just need occasional hints.

If you're studying English, you should be able to find thousands of hours of movies and TV series with subtitles in categories (1) and (2), especially if you get your materials from the US. One easy way to check:

1. Search for a subtitle file on opensubtitles.org.
2. Search for a scene from the series or movie on YouTube.
3. Compare the subtitles against the series.

It's worth doing this, if at all possible—accurate subtitles make subs2srs pretty amazing.
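If you can transcribe a handful of lines by ear, the comparison in step 3 can be roughed out in code. This is only a minimal Python sketch: the function names are made up for illustration, and scoring with a word-level `difflib` ratio is an assumption of this example, not something subs2srs or substudy does. A score near 1.0 suggests category (1), and a low score suggests category (3).

```python
import difflib
import re


def normalize(line: str) -> list[str]:
    """Lowercase a line and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", line.lower())


def subtitle_similarity(heard: str, subtitle: str) -> float:
    """Word-level similarity between what was said and what the subtitle shows.

    Returns a value from 0.0 (nothing in common) to 1.0 (same words, same order).
    """
    matcher = difflib.SequenceMatcher(None, normalize(heard), normalize(subtitle))
    return matcher.ratio()


# The restaurant example from earlier in the thread.
heard = "I'm thinking of going to this great Italian restaurant downtown. Care to join me?"
shown = "I want to go to an Italian restaurant. Want to come?"
print(round(subtitle_similarity(heard, shown), 2))
```

Averaging the score over ten or twenty lines from one scene should give a fair idea of which category a subtitle file falls into before you build a whole deck from it.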

Also, some thoughts about deletion: My deletion rate varies hugely depending on the media I'm using: I was deleting about 2/3rds of the Y Tu Mamá También cards, but now I'm only deleting about 10% of the Avatar cards (though that number may go up if I get annoyed reviewing too many of them). When I'm working with French decks, I obviously delete a huge number of cards because they're too easy. As always, I'm a huge fan of card deletion.

More about "substudy"

As mentioned earlier, I'm working on public domain tools for subtitle processing. To use these, you'll probably need to be a serious geek: Everything is written in Rust, and you'll need to compile it from source.

To get the code, see the GitHub site for substudy. Here are the currently-supported features:

Quote:
Subtitle processing tools for students of foreign languages

Usage:
substudy clean <subtitles>
substudy combine <foreign-subtitles> <native-subtitles>
substudy --help

For now, all subtitles must be in *.srt format and encoded as UTF-8.

When combining subtitle files, substudy also adjusts the timing so you have a bit more time to read and think about the bilingual subtitles whenever possible.
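To give a feel for the timing adjustment, here is a minimal Python sketch of the general idea. It is not substudy's actual algorithm (substudy is written in Rust, and its rules may differ): the `Cue` type, the `extend_cues` name and the `padding` and `gap` values are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


def extend_cues(cues, padding=1.5, gap=0.1):
    """Lengthen each cue by up to `padding` seconds without touching the next one.

    `cues` must be sorted by start time. `padding` and `gap` are hypothetical
    defaults, chosen only for this example.
    """
    extended = []
    for i, cue in enumerate(cues):
        new_end = cue.end + padding
        if i + 1 < len(cues):
            # Stop a little before the next cue starts, but never shorten a cue.
            new_end = max(cue.end, min(new_end, cues[i + 1].start - gap))
        extended.append(Cue(cue.start, new_end, cue.text))
    return extended


cues = [
    Cue(0.0, 2.0, "first line"),
    Cue(2.5, 4.0, "second line"),
    Cue(10.0, 11.0, "third line"),
]
for cue in extend_cues(cues):
    print(cue.start, cue.end, cue.text)
```

The first cue can only grow until just before the second one starts, while the other two get the full extra time because nothing follows them closely.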
2 persons have voted this message useful



Stefan
Diglot
Senior Member
Sweden
stefannilsson.c
Joined 2727 days ago

22 posts - 29 votes
Speaks: Swedish*, EnglishC1
Studies: German

 
 Message 68 of 147
17 November 2014 at 6:23pm
I created decks for "Die Welle" and "Spirited Away" in German earlier today. Took me a few hours but now I have 2134 cards (1404 + 730) which will keep me occupied for a while.

Unfortunately I didn't look closely at the preview, so after I imported the first deck into Anki I discovered that the OCR had put spaces before punctuation, such as " ." and " !", in almost every sentence. It would've taken me forever to fix in Anki, since every card has 6 sentences (before, target and after, for both languages). So I opened the .srt files and did a quick search and replace (" !" -> "!") before creating a new deck.
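The same cleanup can be scripted so that every punctuation mark is handled in one pass. This is a minimal Python sketch, not the exact replace I ran by hand: the function name is made up, and it assumes the language never wants a space before these marks (true for German, not for French).

```python
import re

# One or more spaces or tabs sitting directly before . ! ? , ; :
# Assumption: the subtitle language never puts a space in that position.
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([.!?,;:])")


def fix_ocr_spacing(text: str) -> str:
    """Remove stray whitespace that OCR inserted before punctuation."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


sample = "1\n00:00:01,000 --> 00:00:04,000\nWas ist das ? Nein !\n"
print(fix_ocr_spacing(sample))
```

To clean a whole file, read the .srt as UTF-8 text, pass it through `fix_ocr_spacing` and write the result to a new file, so the original stays around in case the pattern catches something it shouldn't.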

Now I need to figure out the best method to study the cards. 20 cards a day (my usual schedule) would take 70 days just to go through "Die Welle". On the other hand, if I do too many at once, I'm sure it will become overwhelming with all the reviews.
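For anyone who wants to play with the numbers, here is the arithmetic as a minimal Python sketch. The function name and the `keep_fraction` parameter are made up for illustration; it only counts days needed to introduce new cards and ignores the review load entirely.

```python
import math


def days_to_finish(total_cards: int, new_per_day: int, keep_fraction: float = 1.0) -> int:
    """Days needed to introduce every kept card at a fixed number of new cards per day.

    `keep_fraction` is a hypothetical knob: the share of cards you expect
    to keep rather than delete (1.0 means nothing is deleted).
    """
    kept = math.ceil(total_cards * keep_fraction)
    return math.ceil(kept / new_per_day)


# "Die Welle": 1404 cards at 20 new cards a day is 70.2 days, rounded up.
print(days_to_finish(1404, 20))
# The same deck if only half of the cards are kept.
print(days_to_finish(1404, 20, keep_fraction=0.5))
```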
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 3932 days ago

2615 posts - 8805 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 69 of 147
17 November 2014 at 8:04pm
Stefan wrote:
Now I need to figure out the best method to study the cards. 20 cards a day (my usual schedule) would take 70 days just to go through "Die Welle". On the other hand, if I do too many at once, I'm sure it will become overwhelming with all the reviews.

Forgive me if I get philosophical for a minute. :-) This answer is probably a bit more detailed than you need, but it's for everybody who is reading along, and some folks have less Anki experience than you do.

TL;DR: Ruthless deletion and enthusiastic use of the "Easy" button.

Way back in the mists of time, I treated my Anki decks as big lists of things I was obliged to learn. If it was in my deck, I had to repeat it until I got it right. This led to marathon 40-minute SRS sessions of the purest misery, because my decks were full of crap.

Two things helped change how I thought about SRS.

1. Khatzumoto wrote an article about deletion.

Quote:
Even with a large backlog, a living deck is a healthy deck. A deck that experiences turnover is a healthy deck.

Deletion is the best kind of turnover.
Doing reps is the second best.
Deleting while you do reps gives you the best of both worlds. 一石二鳥 (one stone, two bird), if you will.

An SRS deck that doesn’t get cards deleted is like a house that doesn’t get the trash taken out. It doesn’t matter how nice the furniture is — how nice the stuff you add is. Sooner or later, if you don’t take out the trash, the trash takes over.

These are words of wisdom. Khatzumoto also sells an MCD Kit which gives the same advice in even stronger terms.

2. I generated some huge subs2srs decks.

The thing about subs2srs is there are always more cards. Hundreds more. Thousands more. Once you figure out how to generate them, you'll never run out of cards again. So you can abuse your deck any way you want, and nothing particularly bad will happen, because there's another thousand cards of native media just waiting to be learned.

So it's time to take control of your Anki experience. :-)

First step: Deletion!

1. Delete anything that's too hard.
2. Delete anything that's too easy.
3. Delete anything that's cut off, or which has a bad translation.
4. Delete anything which annoys you.
5. Delete anything which fails to make you smile.
6. Delete anything that makes you die a little bit inside.
7. Delete anything which teaches you nothing of interest.
8. Delete old cards that are no longer fun or worthwhile.
9. As Khatzumoto suggests, you can even delete randomly, or by accident. It's all good.

My initial deletion rate for Y Tu Mamá También was about 2 out of every 3 cards, mostly because it was too hard. My deletion rate for Amélie a few years ago was even higher, thanks to cards that were too easy, too hard, not actually useful or just plain boring. My deletion rate for Avatar, however, is surprisingly low—only about 1 in 10 or so. I'll probably delete a bunch of these cards as they mature.

For efficient deletion, I configure AnkiDroid so that a swipe to the right suspends a card. Then I can bulk delete all the suspended cards using desktop Anki at some later date.

Here's what my breakdown looks like. Yellow is suspended, light green is less than a month old, and black is unlearned:



I typically delete plenty of cards on first sight. I delete other cards after a couple of learning cycles, before I ever pass them. I delete still more cards when they fail to stick after several days. And I delete mature cards ruthlessly—if I don't get a little smile out of reviewing them, they're generally gone.

If you can't describe a card as "fun, compelling i+1 input" at any point, you might as well just chuck it, and draw another card from the deck. If that one's no fun, draw again. :-)

Second step: The "Easy" button!

I use the "Easy" button a lot when reviewing subs2srs cards. Here's a breakdown. Blue is cards that I'm learning, and light green is cards less than a month old:



You can see some interesting details in the center column here:

1. The "1" button is the "fail" button. I only use it for a handful of cool cards that are worth the extra effort. Normally, I either suspend failures or lump them in with "2", below.

2. The "2" button is the "hard" button. I use it for youngish cards that need extra work and that I like enough to offer a second chance.

3. The "3" button is the "normal" button. I use this for cards that I understand with a bit of work.

4. The "4" button is the "easy" button. I use it almost as much as "3", either for cards that require little work, or for cards which I don't care much about.

If it takes me more than 0.5 seconds to choose a button, I just hit something at random. If I choose a button that's "too far right," what's the worst that will happen? Well, the next time I see that card, it will be too hard. So I'll just go ahead and delete it. No problem!

Also, YeNoS has been experimenting with greatly increasing Anki's standard intervals. I haven't done this yet, but it sounds very interesting.

The key insight: We're using Anki to cherry-pick cards that are fun and ready to be learnt, and slashing and burning our way through everything else.
2 persons have voted this message useful





emk
Diglot
Moderator
United States
Joined 3932 days ago

2615 posts - 8805 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 70 of 147
18 November 2014 at 1:03pm
A big review day

There was a big spike in reviews today (just by chance), but things are going quite well indeed. As usual, watching Avatar with (or without) bilingual subtitles during the day gives me a huge boost.

In no particular order, here are some cards:



- Left: A short, easy card with an excellent example of aún "yet".
- Right: Tricky audio (his tongue is frozen to the staff), but comprehensible with several repetitions. Also, the third-person ven for "you see".



- Left: I'm about to start throwing out more cards for being too easy. Yay!
- Right: It's a bit mind-blowing that I can understand cards like this without the subtitles.



- Left: "No, it's not like that." Understood without subtitles.
- Right: "I was smiling?" The past progressive is starting to click, but I don't yet have a good idea of when this form is preferred over the imperfect.



- Left: Only about 3 of today's new cards were particularly challenging; this was one of them.
- Right: Understood it on the first listen without subtitles.



- I'm not sure what's going on with viste here; it's not a verb form that I would predict. A typo? Some weird idiomatic thing used in certain expressions? I'll keep my eyes open.

More and more, I can figure out what a new card is saying by listening to it a couple of times, and maybe looking up a key word on the back of the card. Today I probably used "Easy" for the majority of new cards. But when reviewing old cards, I find myself choosing "Normal" because I'm noticing new details I couldn't see before, and I want to see them again reasonably soon!

I really feel like things are starting to come together. Spanish verbs are actually starting to feel more natural, and I'm recognizing more patterns. This is really an exceptionally agreeable and efficient alternative to Assimil, though the underlying theory is pretty much the same.

Of course, I do have good days and bad days. Without an "official" road map, it's sometimes easy to fear that things just won't come together in a timely fashion. But then a day or two later, everything is going wonderfully.
1 person has voted this message useful



rdearman
Senior Member
United Kingdom
rdearman.org
Joined 3636 days ago

881 posts - 1812 votes 
Speaks: English*
Studies: Italian, French, Mandarin

 
 Message 71 of 147
18 November 2014 at 1:20pm
When are you going to change your Anki deck preference to Spanish, not French? :)
2 persons have voted this message useful



lorinth
Tetraglot
Senior Member
Belgium
Joined 2674 days ago

443 posts - 581 votes 
Speaks: French*, English, Spanish, Latin
Studies: Mandarin, Finnish

 
 Message 72 of 147
18 November 2014 at 2:23pm
emk, "viste" is the 2nd person singular of "ver" in the past tense: you saw.

While I'm here, thanks for inspiring me to join the subs2srs bandwagon :) As I feel my biggest problem is listening comprehension (in Mandarin), I have been using monolingual Mandarin subtitles: in many cases, revealing the written version is enough to make me understand what is being said. When it's not, I check the meaning with an external program. In my opinion, it's easier that way than to try to coax two srt files into aligning properly.


1 person has voted this message useful



Copyright 2020 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.