HELP FOR MAKING MODULES

Discussion on theWord modules and other resources
tonydowden
Posts: 118
Joined: Fri Jun 15, 2007 7:15 pm
Location: London, England

Post by tonydowden »

Hi,

Can anybody please help me?

I am trying to make some modules for TW. I have the documents (which are in the Public Domain), and I have the blessing of the owner of the web site that I got them from.

The problem I have is that the files are in PDF format. When I convert them to text there is a paragraph marker at the end of each line. So far the best solution I have come up with is to use find and replace to get rid of the paragraph markers. Then I end up with a solid block of text. I then have to work through this, comparing with the original PDF and putting the necessary paragraph markers in.

If this is the only way, then so be it. However, I am hoping that somebody out there may have come up with a solution.

Many thanks,

Tony :mrgreen:
Words are the clothes our thoughts wear
gennesse
Posts: 103
Joined: Tue Jan 22, 2008 4:10 pm
Location: Nederland

Re: HELP FOR MAKING MODULES

Post by gennesse »

Hi Tony,

The simplest way is to replace the strange sign with a paragraph marker, using the extended search and replace function in Word.

This is how I did it.

Richard
csterg
Site Admin
Posts: 8627
Joined: Tue Aug 29, 2006 3:09 pm
Location: Corfu, Greece

Re: HELP FOR MAKING MODULES

Post by csterg »

Unfortunately, what you describe is the only way...
There are some more 'clever' converters that try to link lines together in paragraphs, but in the end there are always mistakes, simply because the PDF format has no information on paragraphs.
What I suggest is this: ask the website owner to give you the doc file that the PDFs came from, explaining that the PDF format is not appropriate for this. People don't really know that PDF has this issue (or, anyway, design feature).
Costas
tonydowden
Posts: 118
Joined: Fri Jun 15, 2007 7:15 pm
Location: London, England

Re: HELP FOR MAKING MODULES

Post by tonydowden »

Hi Costas,

Thanks for that, I will contact the owner and see if he can come up with the document files.

Tony
Words are the clothes our thoughts wear
pjc
Posts: 20
Joined: Fri Jun 11, 2010 3:10 am

Re: HELP FOR MAKING MODULES

Post by pjc »

Hi Tony,

I have had the same difficulties with cutting and pasting the contents of PDF files. So, some months ago, I wrote a Word macro for my own personal use that attempts to make proper paragraphs as you described. As Costas pointed out, it is impossible to achieve perfect results, but I have managed to get some pretty good results with it.

The coding is pretty rough and unpolished, but if you (or anyone else) are interested, I can post the file on the forum.

Paul.
LarryG
Posts: 8
Joined: Thu Dec 24, 2009 9:03 am

Re: HELP FOR MAKING MODULES

Post by LarryG »

Hi Tony,

This is what I do with these PDF files, and I've worked with them a lot in the past 12 months, reformatting over 100 books. I use a 3 step method.

1] Clean up the basic text column in Open Office Writer - principally punctuation and spacing. My typical text starts out as a column of words, 30-45 characters wide, with many errors in spelling and punctuation - an 'h' will appear in the raw document as 'li'. Hence, 'the' frequently appears in the text as 'tlie'.

If a lot of your words are hyphenated at the end of a line, as older book publishers tended to do when printing, you will need to remove these hyphens first. I use Open Office for this, prior to working with the text in Notepad++. I do a search with the 'Find & Replace' function of Open Office for anything with these characteristics [ - space], i.e. a hyphen followed by a space. I leave the replace character slot empty, and the hyphens are gone. I use the 'Find & Replace' function for 10-12 different characters or punctuation marks. It's fast and 90% accurate.

2] I copy paste my files into Notepad++ (a free text editor program).

After breaking up the long chapter text columns into the appropriate chapters/paragraphs, as shown in the PDF file, I take my cursor and highlight each paragraph. Then the "Ctrl + J" function removes the line breaks and places all of the lines into ONE line per paragraph. You will notice that the paragraph now looks reasonably normal, if you have 'word wrap' turned on.

3] After getting the chapters/paragraphs grouped properly in Notepad++, I then copy/paste back into Open Office Writer - giving each chapter its own document file.

I then proofread my chapter documents, correcting spelling and any faulty punctuation I find, and applying the proper fonts, headings, etc. I also run the spell check. I make errors, but not as many as when I first started. Sincerely, I desire a simpler way of turning raw PDF files into correctly proofread, module-ready documents, but I've not been able to figure that out. Viable instruction is always accepted....
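
For anyone who prefers a script to the editor steps, the hyphen removal and the line joining described above could be sketched in Python roughly as follows. This is only an illustration, not the Open Office or Notepad++ procedure itself: the function names are made up for this example, and it assumes the text has already been split into paragraphs separated by blank lines. Like the manual search, it will also strip the hyphen from a genuine compound word that happens to break at the end of a line, so proofreading is still needed.

```python
import re


def remove_line_end_hyphens(text: str) -> str:
    """Rejoin words that were split by a hyphen at the end of a line."""
    # A letter, a hyphen, optional trailing spaces, the line break,
    # optional leading spaces, then a lowercase letter on the next line.
    return re.sub(r"(?<=[A-Za-z])-[ \t]*\n[ \t]*(?=[a-z])", "", text)


def join_paragraph_lines(text: str) -> str:
    """Put each paragraph on ONE line; blank lines separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        " ".join(line.strip() for line in paragraph.splitlines() if line.strip())
        for paragraph in paragraphs
    )


raw = (
    "The apostle was con-\n"
    "vinced of these\n"
    "things.\n"
    "\n"
    "A second para-\n"
    "graph follows."
)
print(join_paragraph_lines(remove_line_end_hyphens(raw)))
```

The paragraph breaks themselves still have to be put in by hand against the PDF, as described in step 2; the script only saves the highlighting and joining.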

I have found that I can take a 15-chapter, 350-page book from a raw file to ready-to-proofread documents in about 2 hours. Some books take a little extra effort, but perseverance seems to work.

The Lord bless you in your efforts - don't become discouraged. You will be encouraged with the final results of your new modules.
tonydowden
Posts: 118
Joined: Fri Jun 15, 2007 7:15 pm
Location: London, England

Re: HELP FOR MAKING MODULES

Post by tonydowden »

Hi,

Thank you all for your helpful advice and hints.

Paul, I would be very interested to give your macro a try if you could post it here.

I need to make this as easy as possible because I have a '50 Volume' set that I am working on. A couple of years ago I produced the first 10 volumes. These were easy as they were text files and it was just a case of spell-checking, correcting some of the formatting and then copying into TheWord. However, the next 40 volumes are only available in PDF. They were scanned from the old original books and just saved in that format. Unfortunately, there are also a load of errors from the scanning, and spell-checking alone is taking forever and a day, so anything that can help me speed up the process is very much appreciated.

Many thanks to all again.

Tony :D
Words are the clothes our thoughts wear
Manuel
Posts: 174
Joined: Wed May 05, 2010 12:45 am
Location: Santiago, Chile

Re: HELP FOR MAKING MODULES

Post by Manuel »

Tony, considering your last post, I think you should first run all your files through OCR software, because when a book is scanned the result is a picture of each page rather than text.

If this is the case, I recommend the free OCR service in Google Docs: simply create an account (if you don't have one already) and upload your files. The results are quite good.


Regards.
Manuel.
Awaiting the return of the Lord (The Glorious rapture of the Church)...

http://jesus-christ-is-coming.blogspot.com/
http://www.cristo-viene.cl
tonydowden
Posts: 118
Joined: Fri Jun 15, 2007 7:15 pm
Location: London, England

Re: HELP FOR MAKING MODULES

Post by tonydowden »

Hi Manuel,

I do not have the original scans. The owner of the files scanned the books and, using OCR software, converted them to PDF files. I have now converted the PDF files to text files, and it is the proofing of the text files that is taking an awfully long time.

In effect I have something like 20 LARGE books which I need to read through and correct. The first chapter of the first book has over 100 errors in it, so as you can imagine it is very time consuming.

Tony :D
Words are the clothes our thoughts wear
pjc
Posts: 20
Joined: Fri Jun 11, 2010 3:10 am

Re: HELP FOR MAKING MODULES

Post by pjc »

Hi again,

I have the macro, but I am unable to post it to the forum as a .dot file or .doc file. Costas, what would be the best way for me to make this available?

Paul.
csterg
Site Admin
Posts: 8627
Joined: Tue Aug 29, 2006 3:09 pm
Location: Corfu, Greece

Re: HELP FOR MAKING MODULES

Post by csterg »

pjc wrote:Hi again,

I have the macro, but I am unable to post it to the forum as a .dot file or .doc file. Costas, what would be the best way for me to make this available?

Paul.
Hi Paul,
Make a zip of it and add it as an attachment to a post!
Costas
tonydowden
Posts: 118
Joined: Fri Jun 15, 2007 7:15 pm
Location: London, England

Re: HELP FOR MAKING MODULES

Post by tonydowden »

Hi Paul,

After your posting I got to thinking a lot about macros. I was sure there must be some way of tidying the original files up.

After playing around for some time I realised that it was a lot easier than I thought as the paragraphs themselves were indented and the indent was three spaces.

I made a macro to replace three spaces + paragraph marker with 2 paragraph markers. Then I replaced one space + paragraph marker with one space.

This, believe it or not, actually worked and I ended up with the document showing paragraphs correctly.

The rest was a doddle, replacing double spaces with a single space to get rid of the hundreds of scanning errors where there were two or three spaces between words.
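
For readers working outside Word, the three replacements described above might look something like this in Python. It is a sketch under stated assumptions, not the actual macro: the function name is invented for this example, a Word paragraph marker is modelled as a newline, and it assumes (as in these particular files) that a real paragraph end shows up as three spaces before the marker while an ordinary wrapped line ends in a single space. The order matters: the three-space rule has to run before the one-space rule.

```python
import re


def rebuild_paragraphs(text: str) -> str:
    """Apply the three find-and-replace passes in order."""
    # 1. Three spaces + paragraph marker -> two paragraph markers (a real break).
    text = text.replace("   \n", "\n\n")
    # 2. One space + paragraph marker -> one space (a wrapped line, so join it).
    text = text.replace(" \n", " ")
    # 3. Runs of two or more spaces between words -> a single space.
    return re.sub(r" {2,}", " ", text)


sample = (
    "In the beginning \n"
    "was the  Word.   \n"
    "And the Word \n"
    "was with God.   \n"
)
print(rebuild_paragraphs(sample))
```

Files from a different source will very likely use a different indent or spacing pattern, so the two search strings would need adjusting after a look at the raw text.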

So, I have now started in earnest and hope to get on with the job.

Thank you for your kind offer which I will still accept if you think it would do any more than I have already done.

Yours in Christ,

Tony :D :D
Words are the clothes our thoughts wear
pjc
Posts: 20
Joined: Fri Jun 11, 2010 3:10 am

Re: HELP FOR MAKING MODULES

Post by pjc »

Hi again,

Here is the Microsoft Word macro in .zip format (thanks Costas! :oops: ) for making proper paragraphs from imported text. As I mentioned previously, it was originally written for my own personal use, so it's pretty rough code, and it hasn't been fully tested. I have since added a Help tab to the dialog box to provide some basic directions, and to explain how the macro works.

The macro is also able to do some other basic reformatting – such as removing lines containing only page numbers, putting two spaces after sentences and removing white space from the start of a line. There is also the option of saving and loading your settings, and an undo feature if you are not happy with the results.
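
To give an idea of what these extra clean-up options involve, here is a rough Python sketch of the same three operations. It is not the code of the attached Word macro; the function name and the exact rules (for example, treating any line made up only of digits as a page number) are assumptions made for this illustration.

```python
import re


def tidy_imported_text(text: str) -> str:
    """Drop page-number lines, strip leading white space, space out sentences."""
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()  # remove white space from the start of the line
        if re.fullmatch(r"\d+\s*", stripped):
            continue  # a line containing only digits is assumed to be a page number
        kept.append(stripped)
    result = "\n".join(kept)
    # Put two spaces after sentence-ending punctuation that is followed by a capital.
    return re.sub(r"([.!?]) +(?=[A-Z])", r"\1  ", result)


sample = "   Grace and peace. Amen.\n 12 \nNext page."
print(tidy_imported_text(sample))
```

Both rules are blunt: a line holding only a year or a verse number would also be dropped, and an abbreviation such as "Dr." followed by a name would get two spaces, so the result still wants checking.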

To use:
1. Load the document into Microsoft Word. (You may have to adjust your settings to allow for macros.)
2. Paste the text you want to reformat into the document.
3. Run the macro "ShowFormatImportedTextDialog".
4. A dialog box will appear.
5. Click on the "Help" tab for further directions on how to use.

I hope this helps to speed up the reformatting of your documents (although, Tony, your method was probably better for the documents which you described!)

Paul.
Attachments
AutoFormatImportedText.zip
Microsoft Word macro for making proper paragraphs.
pjc
Posts: 20
Joined: Fri Jun 11, 2010 3:10 am

Re: HELP FOR MAKING MODULES

Post by pjc »

"If at first you don't succeed...." :roll: :oops:

My apologies to all who downloaded the previous zip file and found......nothing! Here finally, is the file you will need.
AutoFormatImportedText.zip
Macro for making proper paragraphs.
Eccl 7:8a :D
UBC4ME
Posts: 178
Joined: Mon Nov 02, 2009 2:43 am
Location: Greenville, South Carolina

Re: HELP FOR MAKING MODULES

Post by UBC4ME »

PJC,

I am still not seeing the MACRO as described. Is anyone else having this problem?
Dave
Gods gifts comes wrapped many different ways.