nystagmatic Triglot Groupie Brazil Joined 4313 days ago 47 posts - 58 votes Speaks: Portuguese*, English, French Studies: German
| Message 1 of 5 21 October 2015 at 1:12am | IP Logged |
I've been doing this lately and it's been quite useful. Figured it might help others. The main point is to use regular expressions to turn an aligned .txt from lfaligner into a properly formatted .html file. I use Vim, and the commands are pretty crazy; I tried to make the instructions accessible to less tech-savvy people, but there's probably an easier way to do this, using software I don't know about. Feel free to share if you know of any.
First, create a parallel text with lfaligner. Align it as well as you can (or as well as you're willing to - perfect alignment isn't necessary, though it helps).
Then open the aligned .txt with GVim. Vim is a text editor for hardcore mofos such as you and I. You can drag and drop the file into the program window, or type:
:e C:\path-to-your\file.txt
Now copy the following line from here:
%s/\([^\t]*\)\t\([^\t] *\).*/<table><tr><td>\1<\/td><td& gt;\2<\/td><\/tr><\/table>
Then in Vim press the following keys:
q:"+p
And press Return. It should add a bunch of tags to your file. Now press i to enter insert mode and add the following line to the top of your file:
<html><body>
And the following line to the bottom:
</body></html>
Press Esc, then type
:sav C:\path-to-your\file.html
And press Return. You now have an .html file which you can add to Calibre and send to your Kindle. :)
The point of having each row be a separate table, in case you're wondering, is that the Kindle gets really sluggish when dealing with large tables. So we use a bunch of tiny ones instead.
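For anyone who would rather not touch Vim, here is a rough Python sketch of the same substitution. The function names and the sample text are made up for illustration, and this is just one way to mirror the Vim command, not a tested replacement for it. Note that it does not escape characters like < or & inside the text itself.

```python
import re

# Mirrors the Vim pattern: first column, a tab, second column, and then
# whatever is left on the line (such as lfaligner's third column) is dropped.
ROW_PATTERN = re.compile(r"^([^\t\n]*)\t([^\t\n]*).*$", re.MULTILINE)
ROW_TEMPLATE = r"<table><tr><td>\1</td><td>\2</td></tr></table>"


def rows_to_tables(aligned_text: str) -> str:
    """Wrap every tab-separated line in its own one-row table.

    Lines without a tab are left untouched, as in the Vim command.
    """
    return ROW_PATTERN.sub(ROW_TEMPLATE, aligned_text)


def wrap_in_html(rows_html: str) -> str:
    """Add the <html><body> line on top and </body></html> at the bottom."""
    return "<html><body>\n" + rows_html.rstrip("\n") + "\n</body></html>\n"


if __name__ == "__main__":
    # Hypothetical two-line sample in lfaligner's tab-separated layout.
    sample = "Hello\tBonjour\ten-fr\nGoodbye\tAu revoir\ten-fr\n"
    print(wrap_in_html(rows_to_tables(sample)))
```

To use it on a real file, you would read the aligned .txt as UTF-8, pass its contents through both functions, and write the result out with an .html extension.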
Enjoy!
Edited by nystagmatic on 21 October 2015 at 3:06am
andras_farkas Tetraglot Groupie Hungary Joined 4904 days ago 56 posts - 165 votes Speaks: Hungarian*, Spanish, English, Italian
| Message 2 of 5 22 January 2016 at 11:24am | IP Logged |
1. Most people don't use vim in 2016 so it would be much simpler to give the instructions
as a series of search and replace operations, although you'd still need a text editor that
supports regex*. BTW you surely meant <td>\2<\/td>, not <td& gt;\2<\/td>
2. I'm not sure if you know that I have a bilingual ebook generator program on my site at http://www.farkastranslations.com/bilingual_books.php
It uses an interlaced format, as I felt it was easier and more reliable, and tables don't work too well on 7" screens. Anyway, please post an HTML file and a screenshot of what a table like this
looks like on an actual kindle. If it looks usable I might add it as an output format - it
would be trivial to do. Does this require that you go through Amazon's email
syncing/conversion service or can the kindle read the html natively?
* Come to think of it, MS Word might do the job. Enable wildcards and replace \p with
[whatever needs to be at the end of the line]\p[whatever needs to be at the start of the
line]. Then replace tab as required, and fix the first and last row if needed.
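As a rough illustration of that "series of search and replace operations" idea, here is a Python sketch that uses only plain string replacement and no regex. The function name is hypothetical, and it assumes a file with exactly two tab-separated columns; a third column would simply become a third cell.

```python
def tables_by_plain_replace(aligned_text: str) -> str:
    """Build one-row tables using only literal search-and-replace steps."""
    # Drop trailing line breaks so the last row is not followed by an empty one.
    text = aligned_text.rstrip("\n")
    # Step 1: each line break becomes "end of row", line break, "start of row".
    text = text.replace("\n", "</td></tr></table>\n<table><tr><td>")
    # Step 2: each tab becomes the boundary between two cells.
    text = text.replace("\t", "</td><td>")
    # Step 3: fix the first and last row, which the steps above leave open.
    return "<table><tr><td>" + text + "</td></tr></table>\n"


if __name__ == "__main__":
    # Hypothetical sample text.
    print(tables_by_plain_replace("Hello\tBonjour\nGoodbye\tAu revoir\n"))
```

The same three steps should carry over to any editor with an ordinary find-and-replace that can match line breaks and tabs.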
Edited by andras_farkas on 22 January 2016 at 11:28am
nystagmatic Triglot Groupie Brazil Joined 4313 days ago 47 posts - 58 votes Speaks: Portuguese*, English, French Studies: German
| Message 3 of 5 18 February 2016 at 1:40pm | IP Logged |
(Everyone: see below for easier, corrected instructions.)
Hey Andras,
You're right - I didn't give it enough thought, and this is a bit more complicated than it had to be. :P The point is that every line has to begin with <table><tr> and end with </tr></table>, and each cell - so the stuff that in each line is separated by a tab - has to begin with <td> and end with </td>. So each line should look like this:
<table><tr> <td>FIRST LANGUAGE TEXT</td><td>SECOND LANGUAGE TEXT</td></tr></table>
You're right about the <, naturally. It looks like the forum messes it up for some reason unless I add a space between the tags. Here's the correct command for Vim (I can't edit my original post anymore):
%s/\([^\t]*\)\t\([^\t]*\).*/<table><tr><td>\1<\/td> <td>\2<\/td><\/tr><\/table>
One weird thing I hadn't noticed, though, since I'd just been letting Calibre do all the heavy lifting for me, is that the Amazon service requires headers or metatags* - otherwise it just bounces back with an error message. So if you want to mail the raw .html to the Amazon address, you have to open it in Word, save it as .docx, then mail that one. This also produces a slightly different result on the Kindle: it shows lines dividing each cell and, most importantly, it doesn't require that any given cell be entirely contained in one screen (which means that, if you don't convert to .docx and send this way, cells which are too long get cropped off).
* I don't have time to test it right now to figure out what exactly it wants, but it worked after I tried adding a doctype, a charset metatag, and html (with lang attribute) and body tags. I guess it must be the doctype.
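Here is a small Python sketch of a wrapper that adds all four of those things at once: the doctype, the charset metatag, the html tag with a lang attribute, and the body tags. The function name and the default language code are made up, and which of these pieces the Amazon service actually insists on is still untested, so treat it as a guess at a safe superset.

```python
def wrap_as_document(rows_html: str, lang: str = "en") -> str:
    """Wrap the generated rows in a complete HTML document.

    Includes a doctype, a charset metatag, and html (with lang) and body
    tags, since it is not known which of them the service requires.
    """
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}">\n'
        '<head><meta charset="utf-8"></head>\n'
        "<body>\n"
        f"{rows_html.strip()}\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical single row, already converted to a one-row table.
    row = "<table><tr><td>Hello</td><td>Hallo</td></tr></table>"
    print(wrap_as_document(row, lang="de"))
```

The file should then be saved as UTF-8 so that it matches the declared charset.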
I didn't know about your ebook generator, and I think this might be a good addition to it. I can't take pictures right now, but the result looks pretty good and has been extremely useful to me in language learning.
As for doing it with Word: after quite a bit of friendly banging my head against the wall, I figured out an efficient way. First, Catch #1, the file must end with a new line. So go to the very end and press Return. After you do that, just go on Search and Replace, enable Wildcards, and replace this:
(*)^t([!^t]{1,})*^13
With this (the spaces are just so that the HTLAL forum won't screw it up, they make no difference):
^l<table> <tr><td>\1</td><td>\2</td>< /tr> </table>^l
Then, Catch #2, you have to save the file as .txt (Unformatted Text) and rename it to .html, since just saving as .html will make Word treat the tags as text.
This only works for files which still have that third column from lfaligner, that small tag with the name of the file and the languages. If your file doesn't have it, use (*)^t(*)^13 instead.
Lastly, you don't need to add <html> and <body> tags at the beginning and end of the file if you're using Calibre or converting to .docx with Word. This also goes for Vim or any other method.
Cheers!
- Caio
Rhian Moderator France Joined 6501 days ago 265 posts - 288 votes Speaks: English* Personal Language Map
| Message 4 of 5 18 February 2016 at 8:53pm | IP Logged |
You should consider posting this on www.forum.language-learners.org. After various technical problems here, the aforementioned site was set up, and it's a lot more active. It might be nice for more people to see your helpful posts!
andras_farkas Tetraglot Groupie Hungary Joined 4904 days ago 56 posts - 165 votes Speaks: Hungarian*, Spanish, English, Italian
| Message 5 of 5 25 February 2016 at 4:22pm | IP Logged |
I tried your method and all the html did was crash my kindle repeatedly. If you want to help
people use a method that you think is useful, post a sample HTML file and some instructions
that a human can follow. I have no idea what you mean by amazon needing meta tags and I'm
really confused by how saving an html file as docx would add the requisite tags. I also don't
know how you were using calibre for this method, which would perhaps be relevant as most
people who would want to do this would probably prefer using calibre to some manual hack.