Sep 23 2008

First public-y LaTeX PDF book

Charlene @ 1:13 pm

So, the Arabian Nights is still a work in progress, as it’s a pretty technically complicated first project.  By comparison, I whipped through this Collected Stories in about 24 hours, and I believe there are a lot fewer text-transcription errors in it too.  Still, it could do with a bit more formatting around the in-text “evidence” bits Lovecraft uses, such as letters, interviews, etc.  Regardless, here’s Lovecraft’s Collected Works, as well as the LaTeX package with the useful bits if you want to edit and recompile.  If anyone actually felt like submitting corrections, I’m happy to take ’em, as I’ve found it’s hard to enjoy a book while wearing a serious editor hat.  Especially for a book I prolly won’t read at night as it’s scary…

Props to Project Gutenberg for the initial digitization, too.



Aug 19 2008

Project Gutenberg continued…

Charlene @ 3:31 pm

After hand-formatting the first 50 pages or so of the Project Gutenberg version of “1001 Nights”, I came across this neat project called GutenMark that takes the plaintext and converts it into either HTML or LaTeX-formatted documents for easier reading.  I’m downloading LaTeX now to edit said document, as that gives a more print-style layout, and then I can export to PDF where I can read it happily.  GutenMark’s author has an awesome reformatting of Alice’s Adventures in Wonderland where he re-inserted the original (public domain) illustrations (other texts he’s reformatted are here).

Here’s the comparison, with the top being the modified version in Sumatra PDF and the bottom being the plaintext viewed in Firefox (click it to enlarge):

I was talking (complaining) to Bonnie about this - and about the essence of Project Gutenberg, which seems to be preservation of the written product with ultimate forward compatibility, hence plaintext.  However, the text alone isn’t the true product - the layout, the formatting, and the illustrations are the true product that should be preserved.  You lose so much context and enjoyment, even if you can get yourself through a plaintext version of the book.  Layouts are designed for humans, while I think the plaintext was designed for machines.  Accessibility, at least in public health-land, can be described as “the right services for the right people at the right time.”  I think that perhaps for Project Gutenberg, accessibility’s right time is the future and its right people are computers.  I mean, it’s cool that they started this in the early 70’s with hand-transcribing texts(!) on mainframes, but the average person isn’t going to really enjoy these materials - they’ll check out Google Books and book scans, which I beef about further down…

That being said, as this is for humans to read, why doesn’t Project Gutenberg also create a nicely-formatted PDF version for download?  They already support a more readable format for PocketPC-like devices, and they’ve sort of started this by having HTML versions, but if you want to view facing pages like a real book, only a PDF in a PDF reader will really do.  It could be a final step in the review process they do with Distributed Proofreaders.  Plus, it’s a great opportunity to overshadow the book-scanning projects of Google Books and what-not.  The book scans aren’t “clean” for individual reading (both in font crispness and general page quality), though I think they have their place in a very purist preservation sense.  These newly digitized and proofed copies give you an electronic basis for producing a PDF, and are much easier on the eyes - that’s why, I suppose, when I get an e-book copy from a publisher it’s not a scanned copy of the printed book!  Plus, adding in the original (if publicly available) illustrations would give some new life to these older books and increase readership.  And using GutenMark is truly painless - it took about 10 seconds to do the first volume of 1001 Nights, which is about 600 pages in A5.
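To give a rough idea of what the plaintext-to-LaTeX step involves, here’s a minimal Python sketch.  To be clear, this is not how GutenMark works internally (it’s far smarter about headings, italics, and so on); it’s just my guess at the bare minimum.  It assumes the file has the “*** START OF” and “*** END OF” marker lines that many Project Gutenberg texts carry, and the function names (strip_gutenberg_boilerplate, to_paragraphs, plaintext_to_latex) are made up for this example.

```python
import re

# Characters LaTeX treats specially, mapped to safe replacements.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def strip_gutenberg_boilerplate(text):
    """Keep only the lines between the START and END markers.

    Assumption: the markers begin with '*** START OF' and '*** END OF'.
    If they are missing, the whole text is kept.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1
        elif line.startswith("*** END OF"):
            end = i
            break
    return "\n".join(lines[start:end])


def to_paragraphs(text):
    """Split on blank lines and re-join the hard-wrapped lines of each paragraph."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


def plaintext_to_latex(text, title="Untitled"):
    """Wrap the cleaned-up paragraphs in a bare-bones A5 book document."""
    paragraphs = to_paragraphs(strip_gutenberg_boilerplate(text))
    body = "\n\n".join(escape_latex(p) for p in paragraphs)
    return "\n".join([
        r"\documentclass[a5paper,11pt]{book}",
        r"\usepackage[utf8]{inputenc}",
        r"\title{" + escape_latex(title) + "}",
        r"\author{}",
        r"\date{}",
        r"\begin{document}",
        r"\maketitle",
        body,
        r"\end{document}",
        "",
    ])
```

Write the result to something like book.tex and run pdflatex on it to get the PDF.  Chapter headings, italics, and illustrations would all still need hand work (or a real tool), which is exactly the part that takes the time.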

The goal to me is this intermediate point on the continuum between fully digital (plaintext) and fully analog (book scans): human readability and appeal.  Throw in “now” and you have my take on what the accessibility should be.  We want these books read, right?



Aug 11 2008

e-book love

Charlene @ 9:10 pm

While I love Project Gutenberg, I find its search methods and output (and download) incredibly frustrating.  It’s a great resource for me here - free books! small size! But the absolute lack of a rational browsing method or collection-style download kills me.  For example, I wrote a quick email to Melissa with a Cthulhu reference, which led me to read about Cthulhu and Lovecraft in Wikipedia, then magically the Arabian Nights.  The Arabian Nights is indeed in Project Gutenberg, but as individual 800k plain-text volumes - and when you search for “Arabian Nights” on their site you get something like this (a screenshot cuz you can’t even link to search results…):

…so apparently it’s multiple volumes…? Assuming “Arabian Nights Entertainments” is what you’re looking for? (Which, btw, it isn’t - half a screen down is apparently the more definitive 10-volume translation, per Wikipedia.)  Then you decide to download one of those volumes, and you get something like this:

…so, what’s the difference between all the versions? Why should I care?  I just want to read the damn book…and then I definitely would of course prefer a courier serifed font.  Bleh.

And I know keeping it completely plaintext makes lots of sense from a forward-compatibility standpoint, or file size, or something…but…having just read one of those free PDF-form Tor e-books - with full-color maps, page layout, and everything - I am hopelessly spoiled and saddened.

Perhaps, if I ever get a free moment, it would be fun to make a “real” e-book for the Arabian Nights - layout, illustrations/photos-of-dramatic-re-enactments, etc.  I suspect from a copyright standpoint it’s ok - esp. if for only personal use - as the source material is freely available.  After the traumatic Peace Corps Mongolia Cookbook 2008 layout experience, I’m ready for another masochistic go-round.  And you can get one of those long-tail publishers like Blurb to print it all nice-like, I bet, too.

Sigh. When I’m free. Unless other people would like to assist in personally re-publishing an out-of-copyright book? Of any kind?
