i have finally saved it as .txt and have finished about 100 pages ;-) but since
the thread was taking on some interesting proportions,
here's my $0.02 worth:
a) i agree with bob on one count: it is easier to paste from pdf for tables and lists.
b) i used staroffice (windows), which had three options: save it as .txt,
ms-dos txt, or unix text. i chose the safest route of .txt since i work both in
windows and linux. also my java editor (wonder of wonders, it works better on
linux) has a nifty tree view and automatic refresh as i type in my
c) sebastian though had suggested as alternative to majjix, but that damn thing
always gave a java out-of-memory error, and i didn't want to go back and disturb
that mucho busy man, our own sebastian.
d) with the word document saved as text i have only one problem: all the
" ' " characters, i.e. single quotes or rather apostrophes, get saved in probably
a binary format, since each one shows up as a tiny square (outline). i used the
find and replace function of my editor to get rid of them.
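
for anyone who would rather script the find and replace in d): my guess (not confirmed anywhere in this thread) is that the tiny square is word's curly apostrophe, which plain-text editors often cannot display. below is a minimal sketch, assuming the text file was saved in the Windows-1252 encoding. the name `clean_word_text` and the replacement table are made up for illustration. it also normalizes ms-dos line endings to unix ones, which touches on the choice in b).

```python
# Hypothetical helper, not from any tool mentioned in the thread.
# Assumption: the saved text file is encoded as Windows-1252 (cp1252).
SMART_TO_ASCII = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}


def clean_word_text(raw, encoding="cp1252"):
    """Decode raw bytes, swap curly punctuation for ASCII, use LF line ends."""
    # errors="replace" keeps going if a byte has no meaning in the encoding
    text = raw.decode(encoding, errors="replace")
    for smart, plain in SMART_TO_ASCII.items():
        text = text.replace(smart, plain)
    # CRLF (ms-dos) first, then any leftover bare CR, both become LF (unix)
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    sample = b"don\x92t panic\r\n"   # 0x92 is the curly apostrophe in cp1252
    print(repr(clean_word_text(sample)))
```

if the file turns out to be in some other encoding, the `encoding` argument would need to change; the table only covers the handful of characters word is known to substitute automatically.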
thank you all very much for this thread
On Wed, 02 Aug 2000, RoSmith wrote:
From: Bob McIlvride [mailto:rob...@cogent.ca]
Sebastian Rahtz wrote:
A better route is to print the documents out and have them
A _slightly_ better route is to convert them to PDF and select and paste
the text a paragraph at a time into your text editor, creating and/or
adding DocBook markup as you go.
I can't imagine that would be any easier than simply saving the file as
"Text Only" or "MS-DOS Text" and starting from there.
Another thought would be to Export the .doc file as HTML, then use HTML Tidy
to clean up the MSHTML and convert it to XHTML. I have never done this, but
it may work better than the "Text Only" solution.
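
The HTML route above could be scripted roughly as follows. This is an untested sketch under assumptions: it presumes a reasonably current HTML Tidy build is installed as `tidy` on the PATH, and the helper names `tidy_command` and `run_tidy` and the file names are hypothetical. It only wraps the command line; the resulting XHTML would still need DocBook markup added by hand.

```python
import shutil
import subprocess


def tidy_command(src, dst):
    """Build the argument list for one HTML Tidy run (hypothetical helper)."""
    return [
        "tidy",
        "-q",                  # quiet: suppress the summary chatter
        "-asxhtml",            # write the output as XHTML
        "-clean",              # replace presentational tags with style rules
        "--word-2000", "yes",  # strip Word 2000 specific markup
        "-o", dst,             # output file
        src,                   # the HTML file exported from Word
    ]


def run_tidy(src, dst):
    """Run Tidy and return its exit status (0 ok, 1 warnings, 2 errors)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy was not found on PATH")
    # Exit status 1 only means warnings, so it is returned rather than raised
    return subprocess.run(tidy_command(src, dst)).returncode


if __name__ == "__main__":
    # File names are placeholders for illustration only
    print(tidy_command("exported.html", "cleaned.xhtml"))
```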