
Breakthrough: Closed captions capture

  Tags: Subtitles
tommus
Senior Member
Canada
Joined 5625 days ago

979 posts - 1688 votes 
Speaks: English*
Studies: Dutch, French, Esperanto, German, Spanish

 
 Message 1 of 12
14 June 2015 at 4:21pm
After years of struggle to capture and download closed captions, I have discovered an
excellent new method for CC download from online Flash videos. So far I have only used
it on Dutch sources but it should work for similar online Flash videos in other
languages.

So far, I have only managed to use it in Firefox on Windows 7, and could not get it to
work on Chrome. This is a work in progress but I thought others would want to try this
new method right away.

1. Install the Adblock Plus extension in Firefox (not Adblock, but Adblock Plus).

2. Open a video such as the NOS Journaal news (or other Dutch CC programs on
"http://www.uitzendinggemist.nl").

NOS Journaal

3. The red Adblock Plus (ABP) icon will appear on the Firefox extensions taskbar.

4. Click the down arrow on the ABP icon.

5. Click on "Open blockable items".

6. The URL of the closed captions will be one of these many blockable items. For these
Dutch videos, the URL will look like this: http://e.omroep.nl/tt888/POW_00942224. Note
the "tt888" part which means closed captions. On other sites, you will have to search
the blockable items for the closed caption URL.

7. Right-click that URL, select "Open in New Tab", and then choose "Save File".

8. Add .txt or .srt to the filename.

9. It downloads the entire closed captions file as a .srt text file.

10. I then strip out the times, etc. from the .srt file to make it a plain text file. I
also save a copy as an HTML file so I can use pop-up dictionaries.
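
Step 10 can also be scripted. Here is a rough Python sketch, not the exact procedure used above: it assumes the saved file has the usual layout of a cue number, a timing line containing "-->", one or more caption lines, and a blank line between cues. The function names `srt_to_text` and `text_to_html` are made up for illustration.

```python
import html
import re

# Timing line such as "00:01:25.003 --> 00:01:27.023"
# (a comma before the milliseconds is accepted too).
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[.,]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}"
)


def srt_to_text(srt):
    """Drop cue numbers and timing lines; keep one caption line per line."""
    lines = [ln.strip() for ln in srt.splitlines()]
    kept = []
    for i, line in enumerate(lines):
        if not line or TIMING.match(line):
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.isdigit() and TIMING.match(nxt):
            continue  # a cue number sits directly above a timing line
        kept.append(line)
    return "\n".join(kept)


def text_to_html(text, title="Transcript"):
    """Wrap each caption line in <p> so the page opens in a browser."""
    body = "\n".join(f"<p>{html.escape(line)}</p>" for line in text.splitlines())
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset=\"utf-8\"><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


sample = """7
00:01:25.003 --> 00:01:27.023
Goedenavond, dames en heren.

8
00:01:27.023 --> 00:01:30.017
Welkom bij deze speciale aflevering..
"""

print(srt_to_text(sample))
```

Writing the result of `text_to_html(...)` to a file with a .html extension gives the copy for pop-up dictionaries. A caption line that consists only of digits is kept unless a timing line follows it directly.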

That's it! Maybe it looks complicated but it is actually very simple.

I'd be interested to hear of successes with this technique in other browsers and on
other language sites.

7 persons have voted this message useful



dhoeffer
Pentaglot
Newbie
Netherlands
enoent.org
Joined 3232 days ago

14 posts - 23 votes
Speaks: Italian, German*, English, Spanish, Dutch

 
 Message 2 of 12
14 June 2015 at 6:22pm
If you don't want to use Adblock, you can also just substitute the URL, like from
http://www.npo.nl/nos-journaal/13-06-2015/POW_00940136 to
http://e.omroep.nl/tt888/POW_00940136,
to get the subtitles (i.e. just keep the POW_* part).
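
As a small sketch of that substitution in Python: the prefix `http://e.omroep.nl/tt888/` is taken from the URLs quoted in this thread, and the helper name `captions_url` is hypothetical. It assumes the programme ID is always the last path segment of the page URL, which holds for the example above but is not guaranteed for every page.

```python
from urllib.parse import urlparse

# Prefix copied from the caption URLs quoted in this thread.
CAPTION_BASE = "http://e.omroep.nl/tt888/"


def captions_url(page_url):
    """Build the caption URL by keeping only the last path segment (the POW_* part)."""
    program_id = urlparse(page_url).path.rstrip("/").rsplit("/", 1)[-1]
    if not program_id:
        raise ValueError(f"no programme ID found in {page_url!r}")
    return CAPTION_BASE + program_id


print(captions_url("http://www.npo.nl/nos-journaal/13-06-2015/POW_00940136"))
# prints http://e.omroep.nl/tt888/POW_00940136
```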

I'm curious though, what do you do with these?
1 person has voted this message useful



tommus
 
 Message 3 of 12
14 June 2015 at 8:49pm
dhoeffer wrote:
just keep the POW_* part

Thanks. Good suggestion, now that I know the format.

Quote:
I'm curious though, what do you do with these?

I read along with the text as I listen to/watch the video to improve my listening
comprehension and enlarge my vocabulary. Just looking at the closed captions in the video
is too frustrating, mainly because they lag behind the video, which makes them largely
useless and distracting. With the text, you can match the speed perfectly. As well, the
collection of closed captions provides a great corpus of spoken Dutch, not just for the news
but for the various other captioned programs and series.
1 person has voted this message useful



chaotic_thought
Diglot
Senior Member
United States
Joined 3301 days ago

129 posts - 274 votes 
Speaks: English*, German
Studies: Dutch, French

 
 Message 4 of 12
14 June 2015 at 11:17pm
I also prefer a text transcript over subtitles. If the subtitles are in SRT format, they are easy to convert into a text document which I can read, take notes in, highlight phrases, etc.

1 person has voted this message useful



menez93
Pro Member
Italy
Joined 4209 days ago

10 posts - 10 votes
Speaks: Italian*
Personal Language Map

 
 Message 5 of 12
14 June 2015 at 11:22pm
This is great, really great. The only thing I don't understand is where you say "strip out
the times, etc from the .srt to make it a plain text file". How can you do that? I have a
messy .srt file with XML code in it. I want to get rid of the XML and keep just the words and
the times, exactly like a normal .srt file taken from opensubtitles.
1 person has voted this message useful



tommus
 
 Message 6 of 12
15 June 2015 at 12:00am
Quote:
How can you do that?

You can easily remove the unwanted lines in the .srt text file with a text editor that has
a "macro" capability. Macros allow you to record keystrokes and play them back. For
example, in this bit of .srt text:

------------

7
00:01:25.003 --> 00:01:27.023
Goedenavond, dames en heren.

8
00:01:27.023 --> 00:01:30.017
Welkom bij deze speciale aflevering..

9
00:01:30.017 --> 00:01:33.012
Een uitzending die ......
----------

Experiment a bit first. Then put your cursor on the first blank line (above the 7). Start
the macro recording. Press the key combination for Delete to End of Line. Then push
the down arrow to go to the next line. Do another Delete to EOL, another down arrow, and then
a third Delete to EOL. Then press the down arrow twice to get to the next blank line. Stop the recorder.

With the cursor on a blank line, press and hold down the button to Play Recording. Watch
the unwanted lines disappear. Save the file.

Details will vary from editor to editor on what key strokes to use and on how to record and
play macros, but they are quite similar. Saves a whole lot of editing time.
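
For the other case asked about above (keeping the words and the times but dropping markup), a short script may be easier than a macro, because the markup sits inside the lines rather than on lines of its own. This is only a sketch under assumptions: the actual markup in that file was never shown, so the `<font ...>` tag in the example is hypothetical, and `strip_markup` is a made-up name. It removes anything that looks like a tag, decodes entities such as `&amp;`, and leaves cue numbers and timing lines untouched.

```python
import html
import re

# Anything that looks like <tag ...> or </tag>.
TAG = re.compile(r"<[^>]+>")


def strip_markup(srt):
    """Remove tag-like markup from caption lines; keep cue numbers and times."""
    cleaned = []
    for line in srt.splitlines():
        if "-->" in line:
            cleaned.append(line)  # timing line: keep verbatim
        else:
            cleaned.append(html.unescape(TAG.sub("", line)).rstrip())
    return "\n".join(cleaned)


# Hypothetical input: the real file may use different tags.
messy = (
    "7\n"
    "00:01:25.003 --> 00:01:27.023\n"
    '<font color="white">Goedenavond, dames en heren.</font>\n'
)

print(strip_markup(messy))
```

If the file is a complete XML document rather than .srt text with stray tags, this approach will not be enough and a proper XML parser would be the safer choice.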

Check out this Wiki review of text editors which lists those that support macros (about
half-way down the article under Extra features).

https://en.wikipedia.org/wiki/Comparison_of_text_editors

1 person has voted this message useful



mrwarper
Diglot
Winner TAC 2012
Senior Member
Spain
Joined 4985 days ago

1493 posts - 2500 votes 
Speaks: Spanish*, EnglishC2
Studies: German, Russian, Japanese

 
 Message 7 of 12
15 June 2015 at 1:00am
Some time ago, I used SRT subtitles for my classes and my own studies, so I started coding a small on-line utility to skip some repetitive work on the SRT files.

Currently, it can alter the timestamps of subtitles, and, more to the point, create human-readable transcripts as plain text, html and PDF. It can also do some crude vocabulary compilation, which will get links to external dictionaries from within the PDF or HTML transcripts.
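
To give an idea of what altering timestamps involves, here is a minimal Python sketch. It is not the code of the utility described above, only an illustration; the name `shift_timestamps` and the `offset_ms` parameter are made up. It assumes timestamps of the form HH:MM:SS.mmm (or with a comma) on lines containing "-->", and it clamps at zero so a large negative shift cannot produce negative times.

```python
import re

# One timestamp: hours, minutes, seconds, separator ("." or ","), milliseconds.
STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})([.,])(\d{3})")


def shift_timestamps(srt, offset_ms):
    """Move every timestamp on timing lines by offset_ms (negative = earlier)."""

    def shift(match):
        h, m, s, sep, ms = match.groups()
        total = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms) + offset_ms
        total = max(total, 0)  # never go below 00:00:00.000
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

    return "\n".join(
        STAMP.sub(shift, line) if "-->" in line else line
        for line in srt.splitlines()
    )


cue = "7\n00:01:25.003 --> 00:01:27.023\nGoedenavond, dames en heren."
print(shift_timestamps(cue, 1500))
```

Only lines containing "-->" are touched, so a caption that happens to mention a time is left alone.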

If you think you might have some use for it, you can have a look at it here.

Just my $0.02.

Edited by mrwarper on 15 June 2015 at 10:25am

4 persons have voted this message useful



tommus
 
 Message 8 of 12
15 June 2015 at 2:19am
mrwarper wrote:
so I started coding a small on-line utility

Wow! That is quite nice. I especially like the idea of the personal glossary.
I tried a file and it works great. Thanks.


1 person has voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.