Hi,
Can anybody please help me?
I am trying to make some modules for TW. I have the documents (which are in the Public Domain), and I have the blessing of the owner of the web site that I got them from.
The problem I have is that the files are in PDF format. When I convert them to text there is a paragraph marker at the end of each line. So far the best solution I have come up with is to use find and replace to get rid of the paragraph markers. Then I end up with a solid block of text. I then have to work through this, comparing with the original PDF and putting the necessary paragraph markers in.
If this is the only way, then so be it. However, I am hoping that somebody out there may have come up with a solution.
Many thanks,
Tony
HELP FOR MAKING MODULES
-
- Posts: 118
- Joined: Fri Jun 15, 2007 7:15 pm
- Location: London, England
Words are the clothes our thoughts wear
Re: HELP FOR MAKING MODULES
Hi Tony,
The simplest way is to replace the strange sign with a paragraph marker, using the extended search and replace function in Word.
This is how I did it.
Richard
Re: HELP FOR MAKING MODULES
Unfortunately, what you describe is the only way...
There are some more 'clever' converters that try to link lines together in paragraphs, but in the end there are always mistakes, simply because the PDF format has no information on paragraphs.
What I suggest is this: ask the website owner for the .doc files that the PDFs were made from, explaining that the PDF format is not appropriate for this. Most people don't realise that PDF has this issue (or, rather, design feature).
Costas
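To illustrate why the 'clever' converters still make mistakes, here is a minimal Python sketch of one possible heuristic. It is an assumption made for illustration, not the algorithm of any particular converter: a line is taken as the end of a paragraph when it ends in sentence punctuation and is clearly shorter than the widest line. The name `guess_paragraphs` and the `short_ratio` threshold are hypothetical.

```python
def guess_paragraphs(text: str, short_ratio: float = 0.75) -> list[str]:
    """Guess paragraph breaks in hard-wrapped text (a heuristic; it will make mistakes)."""
    lines = [ln.strip() for ln in text.splitlines()]
    width = max((len(ln) for ln in lines), default=0)
    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # A blank line is treated as a definite paragraph break.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        ends_sentence = ln.endswith((".", "!", "?"))
        is_short = len(ln) < short_ratio * width
        if ends_sentence and is_short:
            # Short line ending a sentence: probably the last line of a paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

A paragraph whose last line happens to fill the full width is merged with the next one, which is exactly the kind of error that cannot be avoided when the file itself carries no paragraph information.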
Re: HELP FOR MAKING MODULES
Hi Costas,
Thanks for that, I will contact the owner and see if he can come up with the document files.
Tony
Re: HELP FOR MAKING MODULES
Hi Tony,
I have had the same difficulties with cutting and pasting the contents of PDF files. So, some months ago, I wrote a Word macro for my own personal use that attempts to make proper paragraphs as you described. As Costas pointed out, it is impossible to achieve perfect results, but I have managed to get some pretty good results with it.
The coding is pretty rough and unpolished, but if you (or anyone else) are interested, I can post the file on the forum.
Paul.
Re: HELP FOR MAKING MODULES
Hi Tony,
This is what I do with these PDF files, and I've worked with them a lot in the past 12 months, reformatting over 100 books. I use a three-step method.
1] Clean up the basic text column in Open Office Writer, principally punctuation and spacing. My typical text starts out as a column of words, 30-45 characters wide, with many errors in spelling and punctuation. For example, an 'h' will often appear in the raw document as 'li', so 'the' frequently appears as 'tlie'.
If a lot of your words are hyphenated at the end of a line, as older book publishers tended to do when printing, you will need to remove these hyphens first. I use Open Office for this, before working with the text in Notepad++. With the 'Find & Replace' function of Open Office I search for anything with these characteristics [ - space] and leave the replacement slot empty, and the hyphens are gone. I use 'Find & Replace' for 10-12 different characters or punctuation marks. It's fast and 90% accurate.
2] I copy and paste my files into Notepad++ (a free text editor program).
After breaking up the long chapter text columns into the appropriate chapters/paragraphs, as shown in the PDF file, I highlight each paragraph with my cursor. Then the "Ctrl + J" function removes the line breaks and places all of the lines into ONE line per paragraph. You will notice that the paragraph now looks reasonably normal, if you have 'word wrap' turned on.
3] After getting the chapters/paragraphs grouped properly in Notepad++, I copy and paste back into Open Office Writer, giving each chapter its own document file.
I then proofread my chapter documents, correcting spelling and any faulty punctuation I find, and applying the proper fonts, headings, etc. I also run the spell check. I make errors, but not as many as when I first started. Sincerely, I desire greater simplicity in turning raw PDF files into correctly proofread, module-ready documents, but I've not been able to figure that out. Viable instruction is always accepted....
I have found that I can take a 15-chapter, 350-page book from a raw file to ready-to-proofread documents in about 2 hours. Some books take a little extra effort, but perseverance seems to work.
The Lord bless you in your efforts - don't become discouraged. You will be encouraged with the final results of your new modules.
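For anyone who would rather script the clean-up steps above than do them by hand, here is a rough Python sketch of the same ideas. It is not the Open Office or Notepad++ procedure itself, only an approximation of it. The function names and the `OCR_FIXES` table are hypothetical, and the table would need to be extended for each book.

```python
import re

# Hypothetical table of recurring OCR misreadings; extend it per book.
OCR_FIXES = {"tlie": "the", "Tlie": "The"}

def dehyphenate(text: str) -> str:
    """Rejoin words that were split by a hyphen at the end of a line."""
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)

def fix_ocr_words(text: str) -> str:
    """Replace whole words that appear in OCR_FIXES."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, OCR_FIXES)) + r")\b")
    return pattern.sub(lambda m: OCR_FIXES[m.group(1)], text)

def join_lines(paragraph: str) -> str:
    """Put all lines of one paragraph on a single line, roughly like Ctrl + J.

    Runs of spaces are collapsed to a single space as a side effect.
    """
    return " ".join(paragraph.split())
```

Note that `dehyphenate` also removes hyphens that belong in compound words when they fall at a line end ("well-" / "known" becomes "wellknown"), so the result still needs proofreading.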
Re: HELP FOR MAKING MODULES
Hi,
Thank you all for your helpful advice and hints.
Paul, I would be very interested to give your macro a try if you could post it here.
I need to make this as easy as possible because I have a '50 Volume' set that I am working on. A couple of years ago I produced the first 10 volumes. These were easy as they were text files and it was just a case of spell-checking, correcting some of the formatting and then copying into TheWord. However, the next 40 volumes are only available in PDF. They were scanned from the old original books and just saved in that format. Unfortunately, there are also a load of errors from the scanning, and spell-checking alone is taking forever and a day, so anything that can help me speed up the process is very much appreciated.
Many thanks to all again.
Tony
Re: HELP FOR MAKING MODULES
Tony, considering your last post, I think you should first pass all your files through OCR software, because when you scan a book the result is like a picture instead of text.
If this is the case, I recommend using the free OCR service in Google Docs: simply create an account (if you don't have one already) and upload your files. The results are quite good.
Regards.
Manuel.
Awaiting the return of the Lord (The Glorious rapture of the Church)...
http://jesus-christ-is-coming.blogspot.com/
http://www.cristo-viene.cl
Re: HELP FOR MAKING MODULES
Hi Manuel,
I do not have the original scans. The owner of the files scanned the books in and, using OCR software, converted them to PDF files. I have now converted the PDF files to text files, and it is the proofing of the text files which is taking an awfully long time.
In effect I have something like 20 LARGE books which I need to read through and correct. The first chapter of the first book has over 100 errors in it, so as you can imagine it is very time consuming.
Tony
Re: HELP FOR MAKING MODULES
Hi again,
I have the macro, but I am unable to post it to the forum as a .dot file or .doc file. Costas, what would be the best way for me to make this available?
Paul.
Re: HELP FOR MAKING MODULES
Hi Paul,
pjc wrote: I have the macro, but I am unable to post it to the forum as a .dot file or .doc file. Costas, what would be the best way for me to make this available?
Make a zip of it and add it as an attachment to a post!
Costas
Re: HELP FOR MAKING MODULES
Hi Paul,
After your posting I got to thinking a lot about macros. I was sure there must be some way of tidying the original files up.
After playing around for some time I realised that it was a lot easier than I thought as the paragraphs themselves were indented and the indent was three spaces.
I made a macro to replace three spaces + paragraph marker with 2 paragraph markers. Then I replaced one space + paragraph marker with one space.
This, believe it or not, actually worked and I ended up with the document showing paragraphs correctly.
The rest was a doddle, replacing double spaces with a single space to get rid of the hundreds of scanning errors where there were two or three spaces between words.
So, I have now started in earnest and hope to get on with the job.
Thank you for your kind offer which I will still accept if you think it would do any more than I have already done.
Yours in Christ,
Tony
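The same idea can be sketched outside Word. Below is a minimal Python version, which is not Tony's macro: it assumes that each new paragraph starts with a three-space indent straight after a line break, and that every other line break is only a wrapped line. The name `rebuild_paragraphs` and its `indent` parameter are hypothetical.

```python
import re

def rebuild_paragraphs(text: str, indent: str = "   ") -> str:
    """Rejoin hard-wrapped lines, using a leading indent as the paragraph cue."""
    # A line break followed by the indent marks a real paragraph break.
    marked = text.replace("\n" + indent, "\n\n")
    # Split on the paragraph breaks that were just marked.
    paragraphs = re.split(r"\n{2,}", marked)
    # Inside a paragraph, every remaining line break becomes a single space.
    joined = [
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
    ]
    # Collapse double or triple spaces left behind by the scanning.
    cleaned = [re.sub(r" {2,}", " ", p) for p in joined if p]
    return "\n\n".join(cleaned)
```

If a book uses a different indent, pass it in, for example `rebuild_paragraphs(text, indent="     ")`. Books with no indent at all need a different cue.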
Re: HELP FOR MAKING MODULES
Hi again,
Here is the Microsoft Word macro in .zip format (thanks Costas!) for making proper paragraphs from imported text. As I mentioned previously, it was originally written for my own personal use, so it's pretty rough code, and it hasn't been fully tested. I have since added a Help tab to the dialog box to provide some basic directions, and to explain how the macro works.
The macro is also able to do some other basic reformatting – such as removing lines containing only page numbers, putting two spaces after sentences and removing white space from the start of a line. There is also the option of saving and loading your settings, and an undo feature if you are not happy with the results.
To use:
1. Load the document into Microsoft Word. (You may have to adjust your settings to allow for macros)
2. Paste the text you want to reformat into the document.
3. Run the macro "ShowFormatImportedTextDialog".
4. A dialog box will appear.
5. Click on the "Help" tab for further directions on how to use it.
I hope this helps to speed up the reformatting of your documents (although Tony, your method was probably better for the documents which you described!)
Paul.
- Attachments
-
- AutoFormatImportedText.zip
- Microsoft Word macro for making proper paragraphs.
- (3.9 KiB) Downloaded 390 times
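For readers without Microsoft Word, the extra reformatting described in the post above (dropping lines that contain only a page number, stripping white space from the start of a line, and putting two spaces after sentences) might look something like this in Python. This is only a sketch of those three features and is not taken from the attached macro. The function name `tidy_imported_text` is hypothetical.

```python
import re

def tidy_imported_text(text: str) -> str:
    """Drop page-number-only lines, strip leading white space, double-space sentences."""
    kept = []
    for line in text.splitlines():
        if re.fullmatch(r"\s*\d+\s*", line):
            continue  # the line holds nothing but a page number
        kept.append(line.lstrip())
    result = "\n".join(kept)
    # Two spaces after sentence punctuation that is followed by a capital letter.
    return re.sub(r"([.!?]) +(?=[A-Z])", r"\1  ", result)
```

Abbreviations such as "Mr. Smith" are double-spaced as well, so this rule is approximate.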
Re: HELP FOR MAKING MODULES
"If at first you don't succeed...."
My apologies to all who downloaded the previous zip file and found... nothing! Here, finally, is the file you will need.
Eccl 7:8a
Re: HELP FOR MAKING MODULES
PJC,
I am still not seeing the MACRO as described. Is anyone else having this problem?
Dave
God's gifts come wrapped many different ways.