
Breakthrough: Closed captions capture

  Tags: Subtitles
Language Learning Forum : Learning Techniques, Methods & Strategies
12 messages over 2 pages: 1
Doitsujin
Diglot
Senior Member
Germany
Joined 5079 days ago

1256 posts - 2363 votes 
Speaks: German*, English

 
 Message 9 of 12
15 June 2015 at 9:51am
tommus wrote:
You can easily remove the unwanted lines in the .srt text file with a text editor that has a "macro" capability.

You don't need a macro-enabled editor; any text editor with regular expression support, such as Notepad++, will do.

If you have a Windows machine, you can use Notepad++ and the following regular expression to delete the time stamps. Simply search for:

Code:
\d+\r\n\d+:\d+:\d+[.,]\d+ --> \d+:\d+:\d+[.,]\d+

and replace all occurrences with nothing. (Linux and Mac users probably need to delete '\r'.)

\d+ = placeholder for one or more digits
\r\n = carriage return + linefeed (Windows line-break sequence)
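
For anyone who would rather script it, here is a minimal Python sketch of the same search-and-replace. The function name `strip_srt_headers` and the sample text are made up for illustration. The pattern accepts either a comma or a period before the milliseconds and treats `\r` as optional; both are assumptions about what the files look like, not something tested on the files from this thread.

```python
import re

# A cue number line followed by its timing line, for example:
#   12
#   00:01:02,500 --> 00:01:05,000
# [.,] accepts either separator; \r? covers Windows and Unix line endings.
CUE_HEADER = re.compile(
    r"^\d+\r?\n\d+:\d+:\d+[.,]\d+ --> \d+:\d+:\d+[.,]\d+\r?\n",
    re.MULTILINE,
)

def strip_srt_headers(text: str) -> str:
    """Return the subtitle text with cue numbers and timing lines removed."""
    return CUE_HEADER.sub("", text)

sample = (
    "1\r\n00:00:01,000 --> 00:00:03,000\r\nHello there.\r\n\r\n"
    "2\r\n00:00:04,000 --> 00:00:06,500\r\nHow are you?\r\n"
)
print(strip_srt_headers(sample))
```

The blank lines between cues are left in place here; whether to keep them is a matter of taste.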



mrwarper
Diglot
Winner TAC 2012
Senior Member
Spain
Joined 4985 days ago

1493 posts - 2500 votes 
Speaks: Spanish*, EnglishC2
Studies: German, Russian, Japanese

 
 Message 10 of 12
15 June 2015 at 11:05am
Doitsujin wrote:
tommus wrote:
You can easily remove the unwanted lines in the .srt text file with a text editor that has a "macro" capability.

You don't need a macro-enabled editor; any text editor with regular expression support, such as Notepad++, will do.

You don't even need an 'editor' as such. Although I'd always advise manually confirming any 'search and replace' operation, as long as your files are well-formatted you could do all of that in a single, non-interactive pass using sed, the stream editor (available for all platforms since the dawn of time), or any similar tool.
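
As a rough illustration of what such a single, non-interactive pass looks like, here is a line-by-line filter sketched in Python rather than sed. The helper name `filter_srt_lines` and the rules for what counts as an unwanted line are assumptions made for the example, and like any unattended pass it assumes well-formatted files.

```python
import re
from typing import Iterable, Iterator

# A timing line such as "00:01:02,500 --> 00:01:05,000" (comma or period accepted).
TIMING = re.compile(r"^\d+:\d+:\d+[.,]\d+ --> \d+:\d+:\d+[.,]\d+")

def filter_srt_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the subtitle text, one line at a time, as a stream editor would."""
    previous_blank = True  # the first cue number has no blank line before it
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            previous_blank = True
            continue
        # Assumption: a digits-only line right after a blank line is a cue number.
        is_cue_number = previous_blank and line.strip().isdigit()
        previous_blank = False
        if is_cue_number or TIMING.match(line):
            continue
        yield line

demo = [
    "1\r\n", "00:00:01,000 --> 00:00:03,000\r\n", "Hello there.\r\n", "\r\n",
    "2\r\n", "00:00:04,000 --> 00:00:06,500\r\n", "42\r\n",
]
for text_line in filter_srt_lines(demo):
    print(text_line)
```

Because the function only needs something it can iterate over, passing an open file object instead of the list works the same way. Note that the subtitle line "42" survives, since it does not directly follow a blank line.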

tommus wrote:
mrwarper wrote:
so I started coding a small on-line utility

Wow! That is quite nice. I especially like the idea of the personal glossary.
I tried a file and it works great. Thanks.

Thank you :)

Development slowly stalled because the tool already worked well enough for me and my students lacked enthusiasm, so adding new vocabulary-related features was too much work (my own languages had either progressed beyond or stagnated below the point where this was most useful to me anyway). I could always consider suggestions for improvement, though.

For example, I thought it would be nice to try to de-inflect words, and there's a pseudo-mechanism to do that for known words, but it's really crude right now. Basically, instead of one item per line, you put all the forms of a word on a single line, separated by tabs. That, however, only keeps the listed forms from being picked up; different forms of any given new word will still be added as separate new words.
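
To make that mechanism concrete, here is a rough Python sketch of the idea; it is not the utility's actual code. The names `load_known_forms` and `new_words`, the tokenizer, and the sample data are all invented for illustration.

```python
import re

def load_known_forms(lines):
    """Collect every tab-separated form from a known-words list into one set."""
    known = set()
    for line in lines:
        for form in line.rstrip("\r\n").split("\t"):
            form = form.strip().lower()
            if form:
                known.add(form)
    return known

def new_words(text, known):
    """Return the words in text that are not known, in order of first appearance."""
    seen = set()
    result = []
    # Runs of letters only; a deliberately crude tokenizer (assumption).
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        if word not in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result

# One known word per line, with all of its forms separated by tabs.
known = load_known_forms(["go\tgoes\twent\tgone\n", "the\n", "to\n"])
print(new_words("She went to the market and goes to markets.", known))
# ['she', 'market', 'and', 'markets']
```

The listed forms "went" and "goes" are skipped, while "market" and "markets" both come out as separate new words, which is the limitation described above.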

The last thing I seriously planned was batch processing of multiple files (thinking of series and such), so that when compiling glossaries anything that appeared in previous files would be ignored, but I didn't implement that either.
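
Since the batch idea was never implemented, the following is only a guess at how it might work, sketched in Python. The function name `batch_glossaries` and the episode texts are hypothetical. Files are processed in order, and a word enters a file's glossary only if it is not already known and has not appeared in an earlier file.

```python
import re

def batch_glossaries(texts, known=()):
    """Map each name to the words first seen in that text, in processing order.

    texts: list of (name, text) pairs, for example one pair per episode.
    known: words to ignore from the start (a personal glossary, say).
    """
    seen = {word.lower() for word in known}
    glossaries = {}
    for name, text in texts:
        fresh = []
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if word not in seen:
                seen.add(word)  # later files will ignore this word
                fresh.append(word)
        glossaries[name] = fresh
    return glossaries

episodes = [
    ("ep01", "The ship leaves port."),
    ("ep02", "The ship reaches a new port."),
]
print(batch_glossaries(episodes, known=["the", "a"]))
# {'ep01': ['ship', 'leaves', 'port'], 'ep02': ['reaches', 'new']}
```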

Edited by mrwarper on 15 June 2015 at 11:15am




Crush
Tetraglot
Senior Member
China
Joined 5624 days ago

1622 posts - 2299 votes 
Speaks: English*, Spanish, Mandarin, Esperanto
Studies: Basque

 
 Message 11 of 12
17 June 2015 at 6:53am
mrwarper, that's a great tool, thanks for sharing it! De-inflecting words is something I've thought a lot of language learning platforms could benefit from, but I doubt it's trivial to implement, though perhaps it'd be possible using the hunspell morphological dictionaries.
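
To give a feel for why this is not trivial, here is a toy Python sketch of rule-based de-inflection. It does not use Hunspell or its dictionary files; the hand-written suffix rules, the tiny lexicon, and the function name `deinflect` are all assumptions made for illustration, only loosely in the spirit of the suffix rules such dictionaries contain.

```python
# Hand-written (suffix, replacement) rules for English; purely illustrative.
SUFFIX_RULES = [
    ("ies", "y"),  # studies -> study
    ("ing", ""),   # walking -> walk
    ("ing", "e"),  # making  -> make
    ("ed", ""),    # walked  -> walk
    ("ed", "e"),   # baked   -> bake
    ("es", ""),    # watches -> watch
    ("s", ""),     # cats    -> cat
]

def deinflect(word, lexicon):
    """Return the base forms in lexicon that the word could derive from."""
    word = word.lower()
    candidates = []
    if word in lexicon:
        candidates.append(word)
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[: -len(suffix)] + replacement
            # Only accept a candidate that is a real entry in the lexicon.
            if base in lexicon and base not in candidates:
                candidates.append(base)
    return candidates

lexicon = {"study", "walk", "make", "watch", "cat", "go"}
for form in ["studies", "walking", "making", "watches", "cats", "went"]:
    print(form, "->", deinflect(form, lexicon))
```

Regular forms resolve to their base, but "went" comes back empty: irregular forms need explicit dictionary entries, and every language needs its own rule set.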



sfuqua
Triglot
Senior Member
United States
Joined 4524 days ago

581 posts - 977 votes 
Speaks: English*, Hawaiian, Tagalog
Studies: Spanish

 
 Message 12 of 12
25 June 2015 at 3:03pm
Using Notepad++, you can just do a search and replace (in regular expression mode) on:
^[0-9].*
and it will remove every line that starts with a number. You have to uncheck the ". matches newline" box at the bottom or you'll delete the whole file. Removing every line that starts with a number usually takes care of .srt files.
It's just a crude version of the other tools.
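
The same crude approach can be sketched in Python, which also shows why that checkbox matters. With `re.MULTILINE`, `^` anchors at the start of every line and `.` stops at the line break; adding `re.DOTALL` (Python's counterpart of ". matches newline") lets `.*` run to the end of the file. The sample text is made up, and the blank-line cleanup at the end is an extra step not mentioned above.

```python
import re

sample = (
    "1\n00:00:01,000 --> 00:00:03,000\nHello there.\n\n"
    "2\n00:00:04,000 --> 00:00:06,500\nHow are you?\n"
)

# Per-line matching: every line that starts with a digit is emptied.
per_line = re.sub(r"^[0-9].*", "", sample, flags=re.MULTILINE)

# With DOTALL as well, the first match swallows everything to the end of the text.
whole_file = re.sub(r"^[0-9].*", "", sample, flags=re.MULTILINE | re.DOTALL)

# The emptied lines are still there as blank lines; drop them (extra step).
cleaned = "\n".join(line for line in per_line.splitlines() if line.strip())

print(cleaned)
print(repr(whole_file))
```

As with the Notepad++ version, any subtitle line that itself begins with a digit is lost too.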





Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.