[www.theword.net]




Reply to topic  [ 19 posts ]  Go to page 1, 2  Next
 HELP FOR MAKING MODULES 

Joined: Fri Jun 15, 2007 6:15 pm
Posts: 118
Location: London, England
Post HELP FOR MAKING MODULES
Hi,

Can anybody please help me.

I am trying to make some modules for TW. I have the documents (which are in the Public Domain), and I have the blessing of the owner of the web site that I got them from.

The problem I have is that the files are in PDF format. When I convert them to text there is a paragraph marker at the end of each line. So far the best solution I have come up with is to use find and replace to get rid of the paragraph markers. Then I end up with a solid block of text. I then have to work through this, comparing with the original PDF and putting the necessary paragraph markers in.

If this is the only way, then so be it. However, I am hoping that somebody out there may have come up with a solution.

Many thanks,

Tony :mrgreen:

_________________
Words are the clothes our thoughts wear


Mon Aug 16, 2010 7:53 pm
Profile

Joined: Tue Jan 22, 2008 3:10 pm
Posts: 103
Location: Nederland
Post Re: HELP FOR MAKING MODULES
Hi Tony,

Simplest way is to replace the strange sign with a paragraph marker in the extended search and replace function in Word?

This is how I did it..

Richard


Mon Aug 16, 2010 8:05 pm
Profile
Site Admin

Joined: Tue Aug 29, 2006 2:09 pm
Posts: 8611
Location: Corfu, Greece
Post Re: HELP FOR MAKING MODULES
Unfortunately, what you describe is the only way...
There are some more 'clever' converters that try to link lines together in paragraphs, but in the end there are always mistakes, simply because the PDF format has no information on paragraphs.
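To illustrate the kind of guess such a converter has to make, here is a minimal Python sketch. It is not how any particular converter works; it simply assumes a line ends a paragraph when it finishes with sentence punctuation and is clearly shorter than the widest line. The function name and the `short_ratio` threshold are invented for the example.

```python
def join_wrapped_lines(text, short_ratio=0.8):
    """Guess paragraph breaks in hard-wrapped text (heuristic, will make mistakes)."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    widths = [len(ln) for ln in lines if ln]
    if not widths:
        return ""
    full_width = max(widths)  # assumed width of a "full" wrapped line

    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # A blank line always closes the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln.strip())
        ends_sentence = ln.endswith((".", "!", "?", '"'))
        is_short = len(ln) < short_ratio * full_width
        if ends_sentence and is_short:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


sample = (
    "This is the first line of a paragraph that\n"
    "wraps onto a second line.\n"
    "A new paragraph starts here and also runs\n"
    "to the edge. Then it ends.\n"
)
result = join_wrapped_lines(sample)
print(result)
```

A paragraph whose last line happens to fill the whole width is missed by this rule, and a short line ending in an abbreviation is split wrongly, so the output still needs checking against the PDF.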
What I suggest is this: ask the website owner to give you the doc file that the PDFs came from, explaining that the PDF format is not appropriate for this. People don't really know that PDF has this issue (or, anyway, design feature).
Costas


Mon Aug 16, 2010 8:13 pm
Profile WWW

Joined: Fri Jun 15, 2007 6:15 pm
Posts: 118
Location: London, England
Post Re: HELP FOR MAKING MODULES
Hi Costas,

Thanks for that, I will contact the owner and see if he can come up with the document files.

Tony

_________________
Words are the clothes our thoughts wear


Mon Aug 16, 2010 8:30 pm
Profile

Joined: Fri Jun 11, 2010 2:10 am
Posts: 20
Post Re: HELP FOR MAKING MODULES
Hi Tony,

I have had the same difficulties with cutting and pasting the contents of PDF files. So, some months ago, I wrote a Word macro for my own personal use that attempts to make proper paragraphs as you described. As Costas pointed out, it is impossible to achieve perfect results, but I have managed to get some pretty good results with it.

The coding is pretty rough and unpolished, but if you (or anyone else) are interested, I can post the file on the forum.

Paul.


Wed Aug 18, 2010 5:19 am
Profile

Joined: Thu Dec 24, 2009 8:03 am
Posts: 8
Post Re: HELP FOR MAKING MODULES
Hi Tony,

This is what I do with these PDF files, and I've worked with them a lot in the past 12 months, reformatting over 100 books. I use a 3 step method.

1] Clean up basic text column in Open Office Writer - principally punctuation and spacing. My typical text starts out as a column of words, 30-45 characters wide, with many text errors in spelling and punctuation - an 'h' will appear in the raw document as 'li'. Hence, 'the' frequently appears in the text as 'tlie'.

If a lot of your words are hyphenated at the end of a line, as older book publishers tended to do in print, you will need to remove these hyphens first. I use Open Office for this, prior to working with the text in Notepad++. I do a search with the 'Find & Replace' function of Open Office and replace anything with these characteristics [ - space]. I leave the replace character slot empty, and the hyphens are gone. I use the 'Find & Replace' function for 10-12 different characters or punctuation marks. It's fast and 90% accurate.

2] I copy paste my files into Notepad++ (a free text editor program).

After breaking up the long chapter text columns into the appropriate chapters/paragraphs, as shown in the PDF file, I take my cursor and highlight each paragraph. Then, using the "Ctrl + J" function, I remove the line breaks and place all of the lines into ONE line per paragraph. You will notice that the paragraph now looks reasonably normal, if you have 'word wrap' turned on.

3] After getting the chapters/paragraphs grouped properly in Notepad++, I then copy/paste back into Open Office Writer - giving each chapter its own document file.

I then proofread my chapter documents, correcting spelling and any faulty punctuation I find, and applying the proper fonts, headings, etc. I also run the spell check. I make errors, but not as many as when I first started. Sincerely, I desire greater simplicity in turning raw PDF files into correctly proofread, module-ready documents, but I've not been able to figure that out. Viable instruction is always accepted....
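The hyphen removal and the line joining described in the steps above could also be roughed out in a script. This is only a minimal Python sketch, not the Open Office / Notepad++ workflow itself. It assumes the paragraphs have already been separated by blank lines, that a hyphen at the very end of a line marks a split word, and the function names are invented for illustration.

```python
import re


def remove_line_end_hyphens(text):
    """Rejoin words split across lines: 'con-' + newline + 'gregation' -> 'congregation'."""
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)


def join_paragraph_lines(text):
    """Put each blank-line-separated paragraph on ONE line (similar in effect to Ctrl + J)."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # split()/join also collapses runs of spaces left over from the scan.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


raw = (
    "The con-\n"
    "gregation met\n"
    "in the hall.\n"
    "\n"
    "A second para-\n"
    "graph follows."
)
cleaned = join_paragraph_lines(remove_line_end_hyphens(raw))
print(cleaned)
```

Like the Find & Replace approach, this is not fully accurate: a genuinely hyphenated word that happens to break at the end of a line (for example "well-" followed by "known") loses its hyphen and has to be caught while proofreading.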

I have found that I can take a 15-chapter, 350-page book from a raw file to ready-to-proofread documents in about 2 hours. Some books take a little extra effort, but perseverance seems to work.

The Lord bless you in your efforts - don't become discouraged. You will be encouraged with the final results of your new modules.


Wed Aug 18, 2010 9:16 am
Profile

Joined: Fri Jun 15, 2007 6:15 pm
Posts: 118
Location: London, England
Post Re: HELP FOR MAKING MODULES
Hi,

Thank you all for your helpful advice and hints.

Paul, I would be very interested to give your macro a try if you could post it here.

I need to make this as easy as possible because I have a '50 Volume' set that I am working on. A couple of years ago I produced the first 10 volumes. These were easy as they were text files and it was just a case of spell-checking, correcting some of the formatting and then copying into TheWord. However, the next 40 volumes are only available in PDF. They were scanned from the old original books and just saved in that format. Unfortunately, there are also a load of errors from the scanning and spell-checking alone is taking forever and a day, so anything that can help me speed up the process is very much appreciated.

Many thanks to all again.

Tony :D

_________________
Words are the clothes our thoughts wear


Wed Aug 18, 2010 10:07 am
Profile

Joined: Tue May 04, 2010 11:45 pm
Posts: 174
Location: Santiago, Chile
Post Re: HELP FOR MAKING MODULES
Tony, considering your last post, I think you should first pass all your files through OCR software, because when you scan a book the result is like a picture instead of a book.

If this is the case, I recommend using the free OCR service in Google Docs. Simply create an account (if you don't have one already) and upload your files; the results are quite good.


Regards.
Manuel.

_________________
Awaiting the return of the Lord (The Glorious rapture of the Church)...

http://jesus-christ-is-coming.blogspot.com/
http://www.cristo-viene.cl


Wed Aug 18, 2010 7:21 pm
Profile

Joined: Fri Jun 15, 2007 6:15 pm
Posts: 118
Location: London, England
Post Re: HELP FOR MAKING MODULES
Hi Manuel,

I do not have the original scans. The owner of the files scanned the books in and, using OCR software, converted them to PDF files. I have now converted the PDF files to text files, and it is the proofing of the text files which is taking an awfully long time.

In effect I have something like 20 LARGE books which I need to read through and correct. The first chapter of the first book has over 100 errors in it, so as you can imagine it is very time consuming.

Tony :D

_________________
Words are the clothes our thoughts wear


Wed Aug 18, 2010 7:42 pm
Profile

Joined: Fri Jun 11, 2010 2:10 am
Posts: 20
Post Re: HELP FOR MAKING MODULES
Hi again,

I have the macro, but I am unable to post it to the forum as a .dot file or .doc file. Costas, what would be the best way for me to make this available?

Paul.


Thu Aug 19, 2010 3:05 am
Profile
Site Admin

Joined: Tue Aug 29, 2006 2:09 pm
Posts: 8611
Location: Corfu, Greece
Post Re: HELP FOR MAKING MODULES
pjc wrote:
Hi again,

I have the macro, but I am unable to post it to the forum as a .dot file or .doc file. Costas, what would be the best way for me to make this available?

Paul.

Hi Paul,
make a zip of it and add it as attachment to a post!
Costas


Thu Aug 19, 2010 4:23 pm
Profile WWW

Joined: Fri Jun 15, 2007 6:15 pm
Posts: 118
Location: London, England
Post Re: HELP FOR MAKING MODULES
Hi Paul,

After your posting I got to thinking a lot about macros. I was sure there must be some way of tidying the original files up.

After playing around for some time I realised that it was a lot easier than I thought as the paragraphs themselves were indented and the indent was three spaces.

I made a macro to replace three spaces + paragraph marker with 2 paragraph markers. Then I replaced one space + paragraph marker with one space.

This, believe it or not, actually worked and I ended up with the document showing paragraphs correctly.

The rest was a doddle, replacing double spaces with a single space to get rid of the hundreds of scanning errors where there were two or three spaces between words.
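For anyone who prefers a script to a Word macro, the same three replacements can be sketched in Python. This assumes the paragraph marker comes through as a plain newline in the text file and that the patterns are exactly as described above; the function name is hypothetical.

```python
import re


def rebuild_paragraphs(text):
    """Apply the three find-and-replace passes in order."""
    # 1. Three spaces + paragraph marker marks a real paragraph end:
    #    turn it into two paragraph markers.
    text = text.replace("   \n", "\n\n")
    # 2. One space + paragraph marker is just a wrapped line:
    #    keep the space, drop the marker.
    text = text.replace(" \n", " ")
    # 3. Collapse runs of two or more spaces left by the scan.
    return re.sub(r" {2,}", " ", text)


scanned = (
    "The quick  brown \n"
    "fox jumps.   \n"
    "Next paragraph \n"
    "goes here.   \n"
)
fixed = rebuild_paragraphs(scanned)
print(fixed)
```

The order matters: the three-space pattern has to be handled first, because it also contains the one-space pattern and would otherwise be swallowed by the second pass.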

So, I have now started in earnest and hope to get on with the job.

Thank you for your kind offer which I will still accept if you think it would do any more than I have already done.

Yours in Christ,

Tony :D :D

_________________
Words are the clothes our thoughts wear


Thu Aug 19, 2010 7:54 pm
Profile

Joined: Fri Jun 11, 2010 2:10 am
Posts: 20
Post Re: HELP FOR MAKING MODULES
Hi again,

Here is the Microsoft Word macro in .zip format (thanks Costas! :oops: ) for making proper paragraphs from imported text. As I mentioned previously, it was originally written for my own personal use, so it's pretty rough code, and it hasn't been fully tested. I have since added a Help tab to the dialog box to provide some basic directions, and to explain how the macro works.

The macro is also able to do some other basic reformatting – such as removing lines containing only page numbers, putting two spaces after sentences and removing white space from the start of a line. There is also the option of saving and loading your settings, and an undo feature if you are not happy with the results.
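For readers without Word, the three clean-ups just mentioned can be approximated in a few lines of Python. This is a rough sketch of the idea only, not the code of the attached macro; the function and pattern names are invented, and it assumes a page number is a line holding nothing but digits.

```python
import re

# A line that contains only digits (plus optional surrounding white space).
PAGE_NUMBER_LINE = re.compile(r"^\s*\d+\s*$")
# Sentence punctuation, one or more spaces, then a capital letter.
SENTENCE_GAP = re.compile(r"([.!?]) +(?=[A-Z])")


def tidy_imported_text(text):
    kept = []
    for line in text.splitlines():
        if PAGE_NUMBER_LINE.match(line):
            continue  # drop page-number-only lines
        kept.append(line.lstrip())  # remove white space at the start of the line
    cleaned = "\n".join(kept)
    # Put exactly two spaces after each sentence.
    return SENTENCE_GAP.sub(r"\1  ", cleaned)


page = "   It was late. He slept.\n  42\nThe end."
tidied = tidy_imported_text(page)
print(tidied)
```

The sentence rule is naive: an abbreviation such as "Mr. Smith" also gets two spaces, so the result is a starting point for proofreading rather than a finished text.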

To use:
1. Load the document into Microsoft Word. (You may have to adjust your settings to allow for macros.)
2. Paste the text you want to reformat into the document.
3. Run the macro "ShowFormatImportedTextDialog".
4. A dialog box will appear.
5. Click on the "Help" tab for further directions on how to use.

I hope this helps to speed up the reformatting of your documents (although Tony, your method was probably better for the documents which you described!)

Paul.


Attachments:
File comment: Microsoft Word macro for making proper paragraphs.
AutoFormatImportedText.zip [3.9 KiB]
Downloaded 298 times
Fri Aug 20, 2010 3:04 am
Profile

Joined: Fri Jun 11, 2010 2:10 am
Posts: 20
Post Re: HELP FOR MAKING MODULES
"If at first you don't succeed...." :roll: :oops:

My apologies to all who downloaded the previous zip file and found......nothing! Here, finally, is the file you will need.

Attachment:
File comment: Macro for making proper paragraphs.
AutoFormatImportedText.zip [3.9 KiB]
Downloaded 372 times


Eccl 7:8a :D


Mon Aug 23, 2010 12:18 am
Profile

Joined: Mon Nov 02, 2009 1:43 am
Posts: 147
Location: Greenville, South Carolina
Post Re: HELP FOR MAKING MODULES
PJC,

I am still not seeing the MACRO as described. Is anyone else having this problem?

_________________
Dave
God's gifts come wrapped many different ways.


Mon Aug 23, 2010 1:19 pm
Profile