tommus Senior Member Canada Joined 5625 days ago 979 posts - 1688 votes Speaks: English* Studies: Dutch, French, Esperanto, German, Spanish
Message 1 of 12, 14 June 2015 at 4:21pm
After years of struggling to capture and download closed captions, I have discovered an
excellent new method for downloading CC from online Flash videos. So far I have only used
it on Dutch sources, but it should work for similar online Flash videos in other
languages.
So far, I have only managed to use it in Firefox on Windows 7, and could not get it to
work on Chrome. This is a work in progress but I thought others would want to try this
new method right away.
1. Install the Adblock Plus extension in Firefox (not Adblock, but Adblock Plus).
2. Open a video such as the NOS Journaal news (or another Dutch captioned program on
"http://www.uitzendinggemist.nl").
3. The red Adblock Plus (ABP) icon will appear on the Firefox toolbar.
4. Click the down arrow on the ABP icon.
5. Click on "Open blockable items".
6. The URL of the closed captions will be one of these many blockable items. For these
Dutch videos, the URL will look like this: http://e.omroep.nl/tt888/POW_00942224. Note
the "tt888" part, which indicates closed captions. On other sites, you will have to search
the blockable items for the closed caption URL.
7. Right-click on that URL and select "Open in New Tab", and then "Save File".
8. Add .txt or .srt to the filename.
9. This saves the entire closed captions file as an .srt text file.
10. I then strip out the times, etc., from the .srt file to make it a plain text file. I
also save a copy as an HTML file so I can use pop-up dictionaries.
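For the HTML copy in step 10, here is a minimal Python sketch, assuming the captions have already been reduced to plain text lines. The function name `text_to_html` is hypothetical and not part of any tool mentioned in this thread. It escapes each line and declares UTF-8 so accented Dutch characters display correctly:

```python
import html


def text_to_html(lines: list[str], title: str = "Transcript") -> str:
    """Hypothetical helper: wrap plain caption lines in a minimal UTF-8 HTML page,
    one <p> per non-empty line."""
    body = "\n".join(f"<p>{html.escape(line)}</p>" for line in lines if line.strip())
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    page = text_to_html(
        ["Goedenavond, dames en heren.", "Welkom bij deze speciale aflevering.."],
        title="NOS Journaal",
    )
    print(page)
```

Save the returned string with `encoding="utf-8"` and open it in Firefox, where a pop-up dictionary extension can then be used on it like any other page.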
That's it! Maybe it looks complicated but it is actually very simple.
I'd be interested to hear of successes with this technique in other browsers and on
other language sites.
7 persons have voted this message useful
dhoeffer Pentaglot Newbie Netherlands enoent.org Joined 3232 days ago 14 posts - 23 votes Speaks: Italian, German*, English, Spanish, Dutch
Message 2 of 12, 14 June 2015 at 6:22pm
If you don't want to use Adblock Plus, you can also just edit the URL, e.g. change
http://www.npo.nl/nos-journaal/13-06-2015/POW_00940136 to
http://e.omroep.nl/tt888/POW_00940136
to get the subtitles (i.e., just keep the POW_* part).
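The same substitution can be scripted. A minimal Python sketch, assuming every episode URL contains an ID of the form POW_ followed by digits, as in the example above. `caption_url` is a hypothetical helper, and the e.omroep.nl/tt888 address pattern is taken from the posts in this thread:

```python
import re

# Programme ID as it appears in the npo.nl episode URL, e.g. POW_00940136
POW_ID = re.compile(r"POW_\d+")


def caption_url(episode_url: str) -> str:
    """Hypothetical helper: build the tt888 caption URL from an npo.nl episode URL."""
    match = POW_ID.search(episode_url)
    if match is None:
        raise ValueError(f"no POW_ id found in {episode_url!r}")
    # Keep only the POW_* part and append it to the caption address
    return f"http://e.omroep.nl/tt888/{match.group(0)}"


if __name__ == "__main__":
    print(caption_url("http://www.npo.nl/nos-journaal/13-06-2015/POW_00940136"))
```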
I'm curious though, what do you do with these?
1 person has voted this message useful
tommus Senior Member Canada Joined 5625 days ago 979 posts - 1688 votes Speaks: English* Studies: Dutch, French, Esperanto, German, Spanish
Message 3 of 12, 14 June 2015 at 8:49pm
dhoeffer wrote:
just keep the POW_* part).
Thanks. Good suggestion, now that I know the format.
Quote:
I'm curious though, what do you do with these?
I read along with the text as I listen to/watch the video, to improve my listening
comprehension and enlarge my vocabulary. Just watching the closed captions in the video
is too frustrating, mainly because they lag behind the video, which makes them largely
useless and distracting. With the text, you can match the speaker's pace exactly. As well, the
collection of closed captions provides a great corpus of spoken Dutch, not just for the news
but for the various other captioned programs and series.
1 person has voted this message useful
chaotic_thought Diglot Senior Member United States Joined 3301 days ago 129 posts - 274 votes Speaks: English*, German Studies: Dutch, French
Message 4 of 12, 14 June 2015 at 11:17pm
I also prefer a text transcript over subtitles. If the subtitles are in SRT format, they are easy to convert into a text document which I can read, take notes in, highlight phrases, etc.
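Captions split sentences into short fragments, which makes a raw conversion choppy to read. One possible approach, sketched in Python below, joins the fragments into running text and starts a new paragraph wherever there is a pause between cues. The function names and the 2-second threshold (`gap_ms=2000`) are hypothetical choices, and the sketch assumes standard SRT layout with a blank line after each cue's text:

```python
import re

# Start and end of a cue, e.g. "00:01:25.003 --> 00:01:27.023" (dot or comma)
CUE_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[.,](\d{3})"
)


def to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def srt_to_paragraphs(srt_text: str, gap_ms: int = 2000) -> str:
    """Hypothetical helper: join caption fragments into running text;
    a pause longer than gap_ms between cues starts a new paragraph."""
    lines = srt_text.lstrip("\ufeff").splitlines()
    paragraphs: list[str] = []
    current: list[str] = []
    prev_end = None
    i = 0
    while i < len(lines):
        match = CUE_TIMING.search(lines[i])
        i += 1
        if not match:
            continue  # cue numbers and blank lines
        start = to_ms(*match.groups()[:4])
        end = to_ms(*match.groups()[4:])
        text = []
        # Caption text runs until the next blank line
        while i < len(lines) and lines[i].strip():
            text.append(lines[i].strip())
            i += 1
        if current and prev_end is not None and start - prev_end > gap_ms:
            paragraphs.append(" ".join(current))
            current = []
        current.extend(text)
        prev_end = end
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

A larger `gap_ms` gives fewer, longer paragraphs; tune it to the program's pacing.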
1 person has voted this message useful
menez93 Pro Member Italy Joined 4209 days ago 10 posts - 10 votes Speaks: Italian*
Message 5 of 12, 14 June 2015 at 11:22pm
This is great, really great. The only thing I don't understand is what you mean by "strip out
the times, etc. from the .srt to make it a plain text file". How can you do that? I have a
messy .srt file full of XML code. I want to get rid of the code and keep just the words and
the times, exactly like a normal .srt file from OpenSubtitles. How can you do that?
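A minimal Python sketch for this case, assuming the file still has SRT-style timing lines ("-->") and that the "XML code" is markup tags mixed into the text lines. If the file is instead a fully XML-based caption format with no "-->" lines, this will not apply. `clean_srt` is a hypothetical name. It removes anything between `<` and `>`, writes timestamps with a comma before the milliseconds as standard SRT does, and renumbers the cues:

```python
import re
import sys

TAG = re.compile(r"<[^>]+>")  # any <...> markup, e.g. <font color="yellow"> or </i>
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2})[.,](\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2})[.,](\d{3})"
)


def clean_srt(raw: str) -> str:
    """Hypothetical helper: rebuild a plain SRT with renumbered cues,
    comma timestamps and tag-free text."""
    cues: list[tuple[str, list[str]]] = []
    timing, text = None, []
    # The extra "" at the end flushes the last cue
    for line in raw.lstrip("\ufeff").splitlines() + [""]:
        match = TIMING.search(line)
        if match:
            if timing and text:
                cues.append((timing, text))
            start, start_ms, end, end_ms = match.groups()
            timing, text = f"{start},{start_ms} --> {end},{end_ms}", []
        elif not line.strip():
            if timing and text:
                cues.append((timing, text))
            timing, text = None, []
        elif timing is not None:
            cleaned = TAG.sub("", line).strip()
            if cleaned:  # skip lines that held nothing but markup
                text.append(cleaned)
    blocks = [
        f"{number}\n{cue_timing}\n" + "\n".join(cue_lines)
        for number, (cue_timing, cue_lines) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Usage: python clean_srt.py messy.srt > clean.srt  (assumes UTF-8 input)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            sys.stdout.write(clean_srt(f.read()))
```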
1 person has voted this message useful
tommus Senior Member Canada Joined 5625 days ago 979 posts - 1688 votes Speaks: English* Studies: Dutch, French, Esperanto, German, Spanish
Message 6 of 12, 15 June 2015 at 12:00am
You can easily remove the unwanted lines in the .srt text file with a text editor that has
a "macro" capability. Macros allow you to record keystrokes and play them back. For
example, in this bit of .srt text:
------------

7
00:01:25.003 --> 00:01:27.023
Goedenavond, dames en heren.

8
00:01:27.023 --> 00:01:30.017
Welkom bij deze speciale aflevering..

9
00:01:30.017 --> 00:01:33.012
Een uitzending die ......
----------
Experiment a bit first. Then put your cursor on the first blank line (above the 7). Start
the macro recording. Press the key combination for Delete to End of Line. Then press
the down arrow to go to the next line. Another Delete to EOL. Another down arrow. Then a third
Delete to EOL. Then two down arrows to get to the next blank line. Stop the recorder.
With the cursor on a blank line, press and hold the key that plays the recording. Watch
the unwanted lines disappear. Save the file.
The exact keystrokes, and how to record and play back macros, vary from editor to editor,
but the procedure is much the same. It saves a whole lot of editing time.
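If your editor has no macro feature, the same clean-up can be done with a short script. A minimal Python sketch, not tied to any editor; `strip_srt` is a hypothetical name. It drops blank lines, timing lines, and any cue number that sits directly above a timing line, keeping only the caption text. It accepts either "." or "," before the milliseconds, since the sample above uses a dot:

```python
import re
import sys

# A timing line such as "00:01:25.003 --> 00:01:27.023" (dot or comma before the milliseconds)
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[.,]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}")


def strip_srt(srt_text: str) -> list[str]:
    """Hypothetical helper: keep only the caption text of an .srt file."""
    lines = [line.strip() for line in srt_text.lstrip("\ufeff").splitlines()]
    kept = []
    for i, line in enumerate(lines):
        if not line or TIMING.match(line):
            continue  # blank separator or timing line
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.isdigit() and TIMING.match(next_line):
            continue  # cue number directly above its timing line
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Usage: python strip_srt.py captions.srt > captions.txt  (assumes UTF-8 input)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print("\n".join(strip_srt(f.read())))
```

Checking that a number is followed by a timing line, rather than deleting every number, keeps caption lines that happen to be just a number, such as a year.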
Check out this Wikipedia comparison of text editors, which lists the ones that support macros (about
halfway down the article, under "Extra features"):
https://en.wikipedia.org/wiki/Comparison_of_text_editors
1 person has voted this message useful
mrwarper Diglot Winner TAC 2012 Senior Member Spain Joined 4985 days ago 1493 posts - 2500 votes Speaks: Spanish*, English (C2) Studies: German, Russian, Japanese
Message 7 of 12, 15 June 2015 at 1:00am
Some time ago, I used SRT subtitles for my classes and my own studies, so I started coding a small on-line utility to take care of some of the repetitive work on the SRT files.
Currently, it can alter the timestamps of subtitles and, more to the point, create human-readable transcripts as plain text, HTML and PDF. It can also do some crude vocabulary compilation, which adds links to external dictionaries within the PDF or HTML transcripts.
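For anyone curious how shifting subtitle timestamps works, here is a minimal Python sketch of the general idea. It is not mrwarper's code; `shift_timestamps` is a hypothetical name. It converts each timestamp on a "-->" timing line to milliseconds, adds an offset, clamps at zero, and writes it back in the same format:

```python
import re

# HH:MM:SS followed by "." or "," and milliseconds, e.g. 00:01:25.003
STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})([.,])(\d{3})")


def shift_timestamps(srt_text: str, offset_ms: int) -> str:
    """Hypothetical helper: shift every timestamp on a timing line by offset_ms
    (negative = earlier)."""

    def shift(match: re.Match) -> str:
        hours, minutes, seconds, sep, millis = match.groups()
        total = ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)
        total = max(total + offset_ms, 0)  # never move a cue before 00:00:00
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

    # keepends=True preserves the original line breaks; only timing lines are touched
    return "".join(
        STAMP.sub(shift, line) if "-->" in line else line
        for line in srt_text.splitlines(keepends=True)
    )


if __name__ == "__main__":
    cue = "7\n00:01:25.003 --> 00:01:27.023\nGoedenavond, dames en heren.\n"
    print(shift_timestamps(cue, 1500), end="")
```

A negative offset makes the captions appear earlier, which is the direction you need when they lag behind the video.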
If you think you might have some use for it, you can have a look at it here.
Just my $0.02.
Edited by mrwarper on 15 June 2015 at 10:25am
4 persons have voted this message useful
tommus Senior Member Canada Joined 5625 days ago 979 posts - 1688 votes Speaks: English* Studies: Dutch, French, Esperanto, German, Spanish
Message 8 of 12, 15 June 2015 at 2:19am
mrwarper wrote:
so I started coding a small on-line utility
Wow! That is quite nice. I especially like the idea of the personal glossary.
I tried a file and it works great. Thanks.
1 person has voted this message useful