Tips on how to delete hidden duplicate modules

ErikJon

Believe it or not, there are many duplicates out there, and perhaps you have already installed many without even realizing it. They are hard to catch, because they do not usually have exactly the same title. You may have one commentary entitled "biblecommentary-matthewhenry.cmt" while another is called "henry's-bible-commentary.cmt". Inside, they may be otherwise identical.

Tools for duplicate removal.

There are file-duplicate programs available, which detect duplicates based on varying criteria. Some look only for exact matches in the file name, while others examine the content of the two files and may ignore the file name entirely. Some of them, like the one from Auslogics, will find the duplicates, but will then automatically remove the ones that they deem unworthy, without asking the user for any additional input.
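
For anyone comfortable running a small script, here is a minimal sketch of the "examine the content and ignore the file name" approach. It only catches files that are byte-for-byte identical, and it only reports them; nothing is removed. The function names are my own, and the folder you pass in is whatever folder holds your module files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder):
    """Group files under `folder` whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only groups with more than one file; this function deletes nothing.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each group returned is a list of paths with identical contents, whatever their names. Two modules that differ by even one character will not be grouped, so this is a first pass at best.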

There is a nice program called "Fast Duplicate File Finder," which, unlike the fifty other duplicate finders available online, allows you to adjust the required degree of similarity between potential matches according to file content, not only by name. In other words, even if the module names are entirely different, the formatting inside is different, and one includes a "table of contents" while the other does not, this program can detect that 99% of the overall content is identical, and will suggest by default that you delete the older version of the two. I call that useful.

In fact, you can reduce the similarity threshold to 92%, or to any other figure you like, if you think that there may be a handful of other insignificant differences: an introduction from the module creator, an embedded image of the original book cover, or an obsolete index at the end, which the creator of the duplicate module may have removed because its page numbers were no longer accurate. Someone may have removed the dedication or the pages of advertising that appeared at the back of the original printed work; someone may have added or removed entire Bible passages, substituting just the references or vice versa, all with the good intention of making the module easier to use within TheWord. In these cases the similarity between two otherwise identical modules will not be 99% but perhaps 97% or less, depending on the size of the overall module. With this program you can adjust the percentage to suit your needs.
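
To illustrate the idea of a similarity threshold, here is a rough sketch using Python's standard difflib module. This is not how Fast Duplicate File Finder works internally (I do not know its method); it simply shows what "at least 92% similar" can mean for two pieces of text. It assumes you have somehow obtained the text of each module as a plain string, which is an assumption on my part, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a, text_b):
    """Return a rough 0-100 similarity score between two strings."""
    # autojunk=False stops difflib from discarding very common characters
    # as "junk" in long texts, which would distort the score.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    return 100.0 * matcher.ratio()


def looks_like_duplicate(text_a, text_b, threshold=92.0):
    """True when the two texts are at least `threshold` percent similar."""
    return similarity_percent(text_a, text_b) >= threshold
```

A text compared with a copy of itself that has a short introduction added scores just under 100, so it passes a 92% threshold, while two unrelated texts score far lower. Be warned that this comparison is slow on very large texts, so it suits spot checks of a suspected pair more than a sweep of a whole library.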

Unfortunately, the free version of this program allows only a handful of duplicate detections at a time when using this handy percentage-of-similarity feature. If you only have 200 modules installed anyway, it may not be an issue. By contrast, I have 1,900 modules installed, myself, and at least 100 verified duplicates somewhere among them which I have already noticed in passing, so I wish that the program would reveal all of the duplicates to me at once. The full version of that program costs a whopping $40, but the limited version is free. Get it here http://www.mindgems.com/products/Fast-D ... -About.htm

Precautions to take into consideration before removing duplicates.

Some of you may be asking why I have not bothered to delete my own 100 duplicates if I have already noticed them in passing. The answer is complex, and may help you to take certain factors into consideration before removing any duplicates yourself.

I have noticed my own duplicates while using the "define module set" browser window, which (to my knowledge) allows me to sort modules, but does not allow me to delete them. (How I wish it would, as it would make a great "dashboard" for managing the whole library at once). Moreover, to my knowledge, it does not allow me to copy any file names, generate a list of modules--duplicates or otherwise--nor to do anything else, other than simply create sets, modify them, or remove the sets. Consequently, in order to delete a duplicate, I must first remember which ones were duplicates, or else write them down in a word processor. Then I must close the "define module set" window, and then locate the duplicate files, one by one within the main viewing window. However, after I find the duplicates, before deleting either, I must first compare them, to determine which of the two is less desirable. (All of this takes time, which is one reason why I have put it off.)
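
Since I know of no way to generate a module list from within the program, one workaround is to list the files in the module folder from outside it. The following is only a sketch: ".cmt" comes from the example file names above, ".twm" is my assumption, and you should adjust the list to whatever extensions you actually see in your own folder. It lists file names on disk, which may differ from the titles shown inside TheWord.

```python
from pathlib import Path

# Assumed extensions; adjust to match the files in your own module folder.
MODULE_SUFFIXES = (".cmt", ".twm")


def list_modules(folder, suffixes=MODULE_SUFFIXES):
    """Return sorted file names under `folder` ending in one of `suffixes`."""
    wanted = tuple(s.lower() for s in suffixes)
    names = [
        path.name
        for path in Path(folder).rglob("*")
        if path.is_file() and path.name.lower().endswith(wanted)
    ]
    return sorted(names, key=str.lower)


def write_module_list(folder, output_file):
    """Write one module file name per line to a text file; return the count."""
    names = list_modules(folder)
    Path(output_file).write_text("\n".join(names) + "\n", encoding="utf-8")
    return len(names)
```

The resulting text file can be opened in a word processor, where near-identical names tend to sit next to each other once sorted.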

I recommend that you, yourself, take the following factors into consideration before deleting any file duplicate:

1. Which of the two files in question is better formatted and more legible for long-term use? If both modules contain 300 topics and 200 subtopics, I will not want to take the time to reformat the inferior duplicate, but will probably choose to keep the better one.

However, some modules have text and headings in so many different colors that they look more like Christmas trees than reference works, and for the sake of legibility I would rather have the entire text in basic black. Some modules use strange fonts, or combinations thereof, at large type size, and have extremely wide margins all around, embedded into each page, requiring more scrolling just to read each chapter. Some modules have excessive spacing between paragraphs, with dingbats and other ornate symbols scattered throughout. While I certainly appreciate the module creator's desire to make his work attractive for everyone, and the time that he took to make the module, in the long run I would rather have the text simple, compact, legible and altogether more useful. I may choose to keep the simpler module, and to discard the other one.

(In fact, we should not make fun, but should remember that some module creators started off unaware of all these practical issues, embellishing their modules with all the unnecessary adornments, and only in later years realized that doing so was impractical; their more recent modules are much better, by comparison, but their old modules have long since been spread around the world, with no chance of correcting those issues.)

2. Which module has formatting issues that can be more easily corrected? If both modules are relatively the same, but one of them, for example, has all the text nicely "wrapped" while the other one has every line of text "hanging" to the right (i.e., with hard returns at the end of every line), it would be much easier to polish the first one than to correct the issues in the second one. (Although Josh Bond surely has some secret tricks for correcting that kind of thing that would make either case equally correctable, from his point of view.)

3. Which of the two files in question is unlocked, so that I can make minor corrections as I find them, and reformat the text when necessary? By that I mean that, if the module has only ten topics, I might be perfectly willing to take the time to adjust the type size, if it is truly an issue to me. Nevertheless, if one version of the file is well formatted and locked, while another version is poorly formatted but unlocked, I will likely discard the locked duplicate and reformat the other one for long-term use.

4. Which of the two files in question is well documented with publication information, and biographical information about the author, so that I can easily identify its doctrinal point of view and properly cite it in my bibliographies and footnotes? Some module creators whip out the module as fast as they know how to, and do not bother to put any information into the "properties" window, not to mention background information about the author. We end up using a "mystery module" that cannot be properly cited to add any credibility at all to our writings.

5. Is one of the duplicates an updated or improved version of the other? If all other factors are equal, perhaps one of them is simply a corrected version of the other.

6. Are the files themselves duplicates, or are only the titles near duplicates? When deleting duplicates, be careful to take into account the fact that many files represent different volumes of a larger work. These volume numbers usually appear at the very end of the title, to the right, so you may easily miss them. Even when using an automatic file-duplicate remover, you may be presented with two files that appear to be duplicates, only because the titles are nearly identical. Read all the way to the right before making the decision.

You may have any of these installed, for example:

"Simpson, A.B. - Holy Spirit or Power from on High, in two volumes, v.1"
"Simpson, A.B. - Holy Spirit or Power from on High, in two volumes, v.2"
"Simpson, A.B. - Holy Spirit or Power from on High, in two volumes, v.1+v.2"
"Simpson, A.B. - Holy Spirit or Power from on High"

In this case, you would certainly want to keep the first two, if they were the only ones that you had. Then again, the third one represents a combined edition, which may or may not be better, depending on how it is formatted and whether it is permanently locked, etc. On the other hand, the last one would seem to contain the combined edition, but as it does not specifically claim to have both volumes in one, you cannot be sure until you open it and check.
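
If you have your titles in a list, a small script can at least make the volume markers harder to miss. This sketch assumes the markers look like the ones in my example above (", v.1", ", v.2", ", v.1+v.2" at the very end of the title); other naming habits would need a different pattern, and the function names are my own invention.

```python
import re
from collections import defaultdict

# Matches a trailing volume marker such as ", v.1" or ", v.1+v.2".
# This naming pattern is an assumption based on the example titles.
VOLUME_SUFFIX = re.compile(r",?\s*v\.\d+(?:\s*\+\s*v\.\d+)*\s*$", re.IGNORECASE)


def split_title(title):
    """Return (base_title, volume_marker); the marker is '' when absent."""
    match = VOLUME_SUFFIX.search(title)
    if not match:
        return title.strip(), ""
    return title[: match.start()].strip(), match.group().strip(" ,")


def group_by_base_title(titles):
    """Map each base title to the list of volume markers found for it."""
    groups = defaultdict(list)
    for title in titles:
        base, volume = split_title(title)
        groups[base].append(volume or "(no volume marker)")
    return dict(groups)
```

With the four Simpson titles above, the first three end up under one base title with the markers "v.1", "v.2" and "v.1+v.2", while the last one stands alone with no marker, which is exactly the one you would have to open and check by hand.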

Deleting the duplicate.

By the way, when you find duplicates using "Fast Duplicate File Finder," or when using any other similar program, you may be presented with the default option of automatically deleting the older of each pair of duplicates. Unfortunately, this method of detection assumes that the new version is an updated version, but we know that that is not always the case. Sometimes the older version was made by an expert and was well formatted, while the newer version was created very recently by a novice who did not take the time to format it well. Always check the internal issues mentioned above, before deciding which to discard, rather than assuming by the date that one is "new and improved."

Of course, after the decision is made regarding which module to keep and which to discard, we can easily delete the less desirable duplicate from within TheWord by clicking on the corresponding Bookview tab with the right mouse button, as long as the cursor is still within the Bookview window. Nevertheless, deletion from within the program can take up to 60 seconds, depending on the size of the module. If you do not believe me, try deleting one of your larger Webster's Dictionary modules, or your largest commentary. This delay is one of the reasons that I have not bothered to delete many duplicates at all, because I like to keep working rather than to "stop and take out the trash." The other reasons were mentioned above (i.e., not being able to first generate a list of duplicates from the "define module sets" window, not being able to delete one without first comparing it to the other, etc.). While I am still looking for a faster way to delete duplicates, another option is to remove the module externally, from the TheWord folder.

With this approach one has to be careful to select the right module, meaning that, once again, one must write down or remember all the file names, as there seems to be no means of generating a list automatically. From within the program, one can rename the duplicate module temporarily, such as by inserting a special symbol into the title that is easy to identify and hard to confuse, and then, after closing the program, go to the module source folder, search for every file containing that symbol, and delete those modules all at once. Note that this cannot be done by simply renaming the "abbreviation" for the module, as that does not change the module name itself; I assume that you have to rename the file itself within the properties window and save the change.
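
The "search for the symbol and delete" step can also be scripted. The sketch below assumes that the marker really does end up in the file name on disk (which, as I said, I have not confirmed), and the marker "~DUP~" is just a hypothetical example. By default it only reports what it would delete; close TheWord and back up the folder before running it with dry_run=False.

```python
from pathlib import Path


def find_marked_files(folder, marker="~DUP~"):
    """Return files under `folder` whose names contain `marker`."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and marker in path.name
    )


def delete_marked_files(folder, marker="~DUP~", dry_run=True):
    """List marked files; delete them only when dry_run is False."""
    marked = find_marked_files(folder, marker)
    if not dry_run:
        for path in marked:
            path.unlink()  # permanent: the file does not go to the recycle bin
    return marked
```

Running it once with the default dry_run=True and reading the returned list is the scripted equivalent of double-checking the search results before pressing "delete."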

Well, as usual, this has been "much ado about nothing," but I hope that some part of it has been helpful, at least. Please feel free to post any corrections below to my instructions above, or else your observations, or additional suggestions.

I'm an Independent Baptist running TheWord portable v 5.0.0.1481 from an external 500GB hard drive with over 1,900 modules installed and loaded in my current module set. I'm using 32-bit Vista Ultimate SP1 with a 2.7 GHz processor and 4GB RAM.