
Spanish: A little subs2srs experiment

147 messages over 19 pages: << Previous 1 2 3 4 5 6 7 ... 17 ... 18 19 Next >>
Crush
Tetraglot
Senior Member
China
Joined 5625 days ago

1622 posts - 2299 votes 
Speaks: English*, Spanish, Mandarin, Esperanto
Studies: Basque

 
 Message 129 of 147
23 December 2014 at 2:23pm
Honestly, I think hunalign (or LF Aligner, andras_farkas' tool based on hunalign) gives pretty decent results, especially for language pairs like English and Spanish. And that takes maybe a minute at most. andras_farkas also wrote a tool called Ebookmaker (Windows only), which requires Calibre and can be used to make interlaced parallel texts. I think those are more suitable for the smaller screens on phones/tablets/e-book readers.
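As a rough illustration of what an "interlaced" parallel text looks like, here is a minimal Python sketch. It is not how Ebookmaker works internally. It assumes you already have aligned output with one sentence pair per line, source and target separated by a tab (adjust the separator to whatever your aligner actually exports; extra columns are ignored). The function name `interlace` is hypothetical.

```python
def interlace(aligned_text: str) -> str:
    """Turn aligned sentence pairs into interlaced text.

    Assumes one aligned pair per line: source<TAB>target[<TAB>anything].
    Each source sentence is followed by its translation, with a blank
    line between pairs.
    """
    blocks = []
    for line in aligned_text.splitlines():
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # skip blank lines and lines without a tab
        source, target = columns[0].strip(), columns[1].strip()
        if source or target:
            blocks.append(f"{source}\n{target}")
    return "\n\n".join(blocks)


pairs = "Hola.\tHello.\n¿Qué tal?\tHow are you?\n"
print(interlace(pairs))
```

Putting each sentence directly above its translation keeps both versions on one small screen, which is the advantage Crush points to for phones and e-book readers.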
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5292 days ago

2615 posts - 8806 votes 
Speaks: English*, French (B2)
Studies: Spanish, Ancient Egyptian

 
 Message 130 of 147
23 December 2014 at 5:54pm
Ah, Crush, thank you. LF Aligner works very nicely here. So it's time for another tutorial, but be warned everybody—this tutorial requires more geek skills than the subs2srs tutorials. You'll need to know how to run Ruby scripts, and how to work with text and CSV files.

Preparing the text for LF Aligner

First, you'll need a DRM-free ebook in both your source and your target language. You might want to try Project Gutenberg for older ebooks, or Pottermore for Harry Potter. You'll need to convert your ebook to a plain-text file using the UTF-8 encoding. If you have an epub or a mobi file, you can convert it to text using Calibre, an open-source ebook manager.

Once your books are in text format, you'll need to clean them up. This involves three steps:

1. Delete any front-matter or back-matter that isn't part of the actual book. You want plain parallel text.
2. Put the title on the first line of the book.
3. Format the chapter titles in each language as "1. Name of chapter".

Step (1) is so you don't confuse LF Aligner. Steps (2) and (3) will help my dodgy Ruby script generate a table of contents.
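The Ruby script itself isn't reproduced here. As a minimal sketch of why steps (2) and (3) matter, the Python below assumes exactly those conventions (title on line 1, chapter headings of the form "1. Name of chapter") and pulls out a table of contents. The regular expression and the function name `table_of_contents` are assumptions for illustration; note that a numbered line inside the body text (say, "2. Something") would also be picked up as a chapter.

```python
import re

# Assumed convention from step (3): a chapter heading is a whole line
# of the form "<number>. <chapter name>".
CHAPTER_RE = re.compile(r"^(\d+)\.\s+(.+)$")


def table_of_contents(book_text: str) -> tuple[str, list[tuple[int, str]]]:
    """Return (title, [(chapter_number, chapter_name), ...]).

    Step (2): the title is taken from the first line of the file.
    """
    lines = book_text.splitlines()
    title = lines[0].strip() if lines else ""
    chapters = []
    for line in lines[1:]:
        match = CHAPTER_RE.match(line.strip())
        if match:
            chapters.append((int(match.group(1)), match.group(2).strip()))
    return title, chapters


sample = """Mi libro
1. Primer capítulo
Había una vez...
2. Segundo capítulo
Y después...
"""
print(table_of_contents(sample))
```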

Aligning the text

You want your input in UTF-8 text files:



Specify the two languages you're going to use:



Select your cleaned up input files:



Hope that the segmentation process actually works:



Decide what to do with the output. If you just want to look at it, choose "Use the graphical editor". If you want to actually export it, select "Generate an xls and open it for reviewing".



If you chose XLS, open it in Excel or OpenOffice, remove the header and the useless third column, then save it as a CSV file. If you chose the internal viewer, have fun looking at your aligned text:
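If you'd rather script this cleanup than do it by hand, here is a minimal Python sketch of the same transformation. It assumes you've first saved the spreadsheet unchanged as a UTF-8 CSV; it then drops the first (header) row and the third column. The function name is hypothetical, and the sketch doesn't know what LF Aligner puts in that third column: it simply removes column 3.

```python
import csv
import io


def strip_header_and_third_column(csv_text: str) -> str:
    """Drop the first (header) row and the third column from CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            continue  # skip the header row
        # Keep every column except index 2, i.e. the third column.
        writer.writerow(row[:2] + row[3:])
    return output.getvalue()


exported = 'Spanish,English,Info\n"Hola, amigo.",Hello friend.,x\nAdiós.,Bye.,y\n'
print(strip_header_and_third_column(exported))
```

To run it on real files, open them with `encoding="utf-8", newline=""`, as the `csv` module documentation recommends, so that commas and line breaks inside sentences survive the round trip.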



Converting from CSV to PBO

To do this, you'll need a Ruby interpreter installed. My script is available on GitHub, with instructions. Yes, PBO is only pretending to be XML—it actually barfs unless the input is exactly the way it wants it.

Feel free to open up your PBO in the desktop version of Aglona Reader.

Try reading in Aglona Reader on an Android tablet

You can get the app in the Google Play store. It seems to work well:



To scroll, click on the grayed-out text at the very bottom of the page.

You'll definitely find some alignment errors. But it looks pretty usable to me, especially on a 7" tablet. For the audio, Via Diva recommends Smart AudioBook Player, which I'll look at over the holidays.
7 persons have voted this message useful





emk
Diglot
Moderator
United States
Joined 5292 days ago

2615 posts - 8806 votes 
Speaks: English*, French (B2)
Studies: Spanish, Ancient Egyptian

 
 Message 131 of 147
24 December 2014 at 4:50am
OK, I've launched two mini-experiments in the last couple of days, so it's time for a preliminary report.

Experiment 1: Subs2srs cards with ~10 seconds of audio. So far, so good. I'm not forgetting the cards overnight, or anything like that. But actually understanding these cards is a fascinating challenge. I need to "blank" my mind, and completely resist any urge to translate or remember or explain. The only way to keep up is to understand the dialog as it arrives, without even using echoic memory. This feels like it might be a very useful mental exercise.

Experiment 2: Listening/Reading. I just went through chapter 1 of Harry Potter y la piedra filosofal using Aglona Reader and Smart AudioBook Player. It takes a bit of practice to scroll, pause and resume, but after fooling around for a few minutes, I had no problems.

Listening/Reading is a pretty hardcore listening exercise! I get hit hard and fast with native content, and I need to work very hard to keep up. Of course, I wind up losing a huge amount. Interestingly, I've found that I use the English text more than the Spanish text. But my eyes dart around quickly nonetheless. It's a very active experience.

Training listening hard. As you can see, I'm doing a lot of hard-core listening work very early on: Long-format subs2srs cards, trying to follow along with an audiobook, etc. I really like this early and aggressive focus on listening. It's one of the hardest skills, but also one of the most useful, I think. It can even replace speech skills to a certain extent, in my experience—people are quite happy to speak French to me as long as I understand what they're saying, even if my responses are short and simple. I want to see how far I can take this listening-heavy approach.

Edited by emk on 24 December 2014 at 4:50am

1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5292 days ago

2615 posts - 8806 votes 
Speaks: English*, French (B2)
Studies: Spanish, Ancient Egyptian

 
 Message 132 of 147
24 December 2014 at 1:02pm
Unfortunately, with my new, longer cards, I can't always show the English translation and the picture. But the picture is more fun, so I'll show that. :-)



- Left: Deep breath. Deep breath. Empty my mind. Live in the moment. I can do this.
- Right: Another card where I have no choice but to understand in real time, in Spanish.

Cards like the two above definitely represent a new and useful challenge, and I suspect they're going to help.



- Left: GOT IT ON THE FIRST TRY WITHOUT SUBTITLES!
- Right: OK, this one is just fun. And a veces "sometimes" is pretty useful.



- Left: You can see some alignment errors at the bottom of the page, but they're not a huge problem.
- Right: I finished chapter 1 last night!

I'm definitely glad I've changed things around a bit. Originally, the basic subs2srs cards were a real challenge, but during the last couple of weeks, it felt like I was just piling up more of the same. Adding the long-format cards and some L/R makes me feel like I'm stretching my brain again.
1 person has voted this message useful



Tupiniquim
Senior Member
Brazil
Joined 5843 days ago

184 posts - 217 votes 
Speaks: Portuguese*
Studies: English, Russian

 
 Message 133 of 147
24 December 2014 at 2:30pm
Hi emk, I have a question about your subs2srs routine.

What do you do when a card shows L1 and L2 sentences that match in context, but not without it? As a simple example, imagine that the L1 sentence is "Thank you for the gift!" and the L2 is "Thank you for the car!". Do you make a note next to the L1 sentence to remind you that the L2 words mean something different, or do you simply try to recall it without any editing? Or do you trash/suspend it?
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5292 days ago

2615 posts - 8806 votes 
Speaks: English*, French (B2)
Studies: Spanish, Ancient Egyptian

 
 Message 134 of 147
25 December 2014 at 7:03pm
Tupiniquim wrote:
What do you do when a card shows L1 and L2 sentences that match in context, but not without it? As a simple example, imagine that the L1 sentence is "Thank you for the gift!" and the L2 is "Thank you for the car!". Do you make a note next to the L1 sentence to remind you that the L2 words mean something different, or do you simply try to recall it without any editing? Or do you trash/suspend it?

Excellent question. Basically, I do whatever is easiest, and whatever avoids wasting my time on dodgy cards. See the example below on the left:



- Left: The Spanish actually says "What type of porquería do you believe we'll accept?" The English says, "What kind of slum do you think this is?" This is clear enough from context, so I kept the card and didn't bother with a note.

- Right: Another nice long card. It looks like jamás has the same doubled meaning of "ever/never" that jamais has in French.

A gear recommendation

I purchased these headphones a while back for Skyping with business clients, and they've been a huge help for subs2srs and L/R:



These are Logitech H800 wireless headphones. They're a pretty high-end model, and not the cheapest, but they work better than what I had before. These headphones work over Bluetooth and USB, and they also have a boom microphone. The sound quality is good, and they fit a wide range of heads. There's even a poorly documented trick that lets you switch them between multiple Bluetooth devices (hold down "volume up" and "skip back" at the same time, then connect to them from the phone or tablet you want to use).

If your phone has poor speakers, or if you want to do SRS reviews in bed at 5am without waking anybody up, they're an excellent choice (if rather on the pricey side).

Harry Potter, chapter 2

The setup with Smart AudioBook Player and Aglona Reader is working quite nicely:



I'm still trying to figure out the best way to use the Spanish text and the English text as I listen. If there's more background noise, I end up relying more on the Spanish text. As I mentioned before, L/R is a very active process—it requires paying close attention and trying to figure out as much as possible.

Grammar study! (1 hour)

While travelling for the holiday, I spent an hour reading through Essential Spanish Grammar, which costs a few dollars and provides a 50-page summary of Spanish grammar (with small pages and lots of whitespace). I really love these books—they summarize pretty much everything you need to know for B1, and most of what you need for B2, and you can read through them in a couple of hours. You could easily find much more complete grammars, but that's not what I need at my current level.

Anyway, I like to read these books after I've been exposed to the language. This means that much of the material will already be familiar. And reading through this book, I was impressed at how much Spanish I already knew:

1. I could understand the vocabulary in over half the example sentences.
2. When I read about things like ¿no es verdad?, I could actually hear Sokka's voice in my head.
3. I'd already seen examples for a large fraction of the grammatical points in the book.

I also picked up some nice details:

1. How Spanish object pronouns work.
2. How Spanish distinguishes this and that (it seems simpler than the French system).
3. What aquel and aquella are doing.
4. The weird, fake "reflexive" in Se lo dice "He tells it to him", which is actually just an indirect object.

And now I can keep my eyes open for more examples of what I've just learned, and I'll be able to make more effective use of my input. I like working this way: a lot of input, a bit of grammar study, more input, and so on. Grammar without input always feels like a nasty slog. But even a few hours of grammar here and there allow me to get more out of my input.

Anyway, I'm going to try to get in another chapter of Harry Potter now. :-)
2 persons have voted this message useful



eyðimörk
Triglot
Senior Member
France
goo.gl/aT4FY7
Joined 3859 days ago

490 posts - 1158 votes 
Speaks: Swedish*, English, French
Studies: Breton, Italian

 
 Message 135 of 147
25 December 2014 at 8:28pm
I really like the way your L/R set-up looks! I almost want a smartphone or tablet just to be able to replicate it.
1 person has voted this message useful



sfuqua
Triglot
Senior Member
United States
Joined 4525 days ago

581 posts - 977 votes 
Speaks: English*, Hawaiian, Tagalog
Studies: Spanish

 
 Message 136 of 147
25 December 2014 at 10:36pm
I've always tended to spend too much time setting up parallel texts for L-R, which is why I settled on my usual method: hunalign if I've got both texts, and Google Translate (only for English, of course) when I only have the Spanish. I have a file of regular expressions that I can quickly copy into Notepad++ to slice and dice the Spanish text into pieces that are easy to run through Google Translate. I usually make a bunch of Anki cards out of sentences from any book I'm going to L-R, even though I usually delete a bunch.
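sfuqua's actual regular expressions aren't given. As a rough stand-in, this Python sketch splits text into sentences with a deliberately naive pattern (break after `.`, `!` or `?` followed by whitespace) and then packs whole sentences into chunks below a size limit, so that no sentence is cut in half before translation. The 4500-character default is an arbitrary assumption, not a documented Google Translate limit, and the splitter will also break after abbreviations such as "Sr.".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Very rough sentence splitter: break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def chunk_for_translation(text: str, max_chars: int = 4500) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


texto = "Hola. ¿Qué tal? Bien, gracias."
for chunk in chunk_for_translation(texto, max_chars=15):
    print(chunk)
```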

If I am not careful, I spend way too much time "getting ready to study" and not enough time studying. I also do something that is anathema to many people: I use IVONA TTS voices if I can't get a decent audiobook. I just don't find high-quality TTS voices that annoying. I don't hesitate to slow down the audio if I feel like I'm getting swamped at "regular speed." There are many texts that I find difficult at 160-180 words a minute, which are trivial at 110 wpm. I think slowing the text is just another way of "cheating" to understand, and later it makes it easy to follow at full speed.
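The slow-down described above is simple arithmetic: the playback multiplier is the target speed divided by the original speed. A minimal Python sketch, assuming the narration runs at a steady 160 to 180 wpm (real narration varies from passage to passage):

```python
def playback_rate(original_wpm: float, target_wpm: float) -> float:
    """Playback speed multiplier needed to hear speech at target_wpm."""
    if original_wpm <= 0:
        raise ValueError("original_wpm must be positive")
    return target_wpm / original_wpm


for wpm in (160, 170, 180):
    print(f"{wpm} wpm -> {playback_rate(wpm, 110):.2f}x")
```

So bringing 160 to 180 wpm audio down to about 110 wpm means playing it at roughly 0.6x to 0.7x, ideally in a player that time-stretches the audio without lowering the pitch.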

I absolutely love L-R, Anki and shadowing, and I plan to spend a year doing a lot of it starting in a few months. I've recently reached the point where I can slowly read Gabriel Garcia Marquez, and easier material is becoming very easy, so L-R is pretty easy compared to where I was a year ago.
I think your progress using L-R is going to be very fast :)

Edited by sfuqua on 25 December 2014 at 10:44pm



1 person has voted this message useful










Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.