From: Michel Metford-Platt
Subject: Re: [EdBall] GEDCOM cleanup
Date: Fri, 24 Jun 2005 16:11:23 +0200
Hi David, and Bev and Joanne,
Thank you for your email, David. I was wondering, but have been so busy that
unfortunately it's not on my priority list.
1. Have changed your email in my address book.
2. Great! Was there a retirement party? Hope all is well in it...
Report Card comments.
A. I am doing a rather standard "search and replace". E.g., Ref#56 gets
replaced by "Bailey, Rosalie Fellows, Pre Revolutionary Dutch Houses and
Families in Northern New Jersey and Southern New York, p. 538; Pat Wardell /
". But it will appear as
"Bailey, Rosalie Fellows. (nd:538) Pre Revolutionary Dutch Houses and
Families in Northern New Jersey and Southern New York. trans. by Pat
Wardell." which is the form I use(d) at university. "nd" is obviously no
date because none was given, which is terribly unfortunate (in that
particular source). I am going to contact each person by email before I do a
search and replace of each particular source to see if the account is still
active, and try to nail down any particulars. I will post them on the
EDBALL-L as they pop up.
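
For anyone who would rather script the replacement than do it by hand, here is a minimal sketch in Python. The lookup table, the function name and the regular expression are all illustrative assumptions; only the Ref#56 citation text is taken from the example above. Matching on the whole number means that Ref#56 is not accidentally replaced inside a longer marker such as Ref#567.

```python
import re

# Hypothetical lookup table: reference number -> full citation.
# Only the Ref#56 entry comes from this message; the table itself is assumed.
CITATIONS = {
    56: ("Bailey, Rosalie Fellows. (nd:538) Pre Revolutionary Dutch Houses and "
         "Families in Northern New Jersey and Southern New York. trans. by Pat "
         "Wardell."),
}

# Capture all the digits after "Ref#" so a longer number is treated as a whole.
REF_PATTERN = re.compile(r"Ref#(\d+)")


def expand_refs(text, citations=CITATIONS):
    """Replace each Ref#N marker with its citation; leave unknown numbers alone."""
    def _substitute(match):
        number = int(match.group(1))
        # Fall back to the original marker when no citation is on file yet.
        return citations.get(number, match.group(0))
    return REF_PATTERN.sub(_substitute, text)


if __name__ == "__main__":
    print(expand_refs("1 NOTE Ref#56"))
    print(expand_refs("1 NOTE Ref#567"))  # unchanged: no entry for 567
```

Markers with no entry in the table are left untouched, so the script can be rerun as each contributor confirms the particulars of a source.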
I am not actually moving them from notes because that is WAY too time
consuming, and like you said, it would not be of any help.
B. Sounds fantastic. As you know, what was put together was the merging of
many files, and people have different ways of doing their own thing. It's
good it's being standardized. Heck, the spelling of colour and color isn't
standardized, so good luck! But I think you may have a problem with some NJ
records as counties changed and were carved out of old ones. The only thing
I think that one must be careful of is to use the original name of the
county when the record was done. I can only show an example from PA where
Montgomery Co. was carved out of Philadelphia Co. in 1784. All events (so to
speak) that took place in Montgomery Co. before 1784 I keep as happening in
Phila Co. as that's where the records are kept (wills, deeds, etc). I don't
know when Union Co. was carved out of Essex, but you may wish to use the
original county of existence. Just an idea, but we bounce that back and
C. Unique reference numbers are great ideas. What is your pattern? These
things are usually in some mathematical pattern.
D. I do remember the discussions (about 10 years ago) taking care of a lot
of possible time-line problems. Yes, there were lots of women who were under
12. And some people lived forever. Probably 90% were cleaned up. As you come
across them, would you post them to the list? I know that adds an extra
step, so decline if it's too cumbersome.
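
If it would help to find the remaining cases automatically, a check along these lines might do it. This is only a sketch under stated assumptions: the Person record, the function name and the exact thresholds are hypothetical (the 12 and 130 echo the examples mentioned in this thread), and it compares years only, ignoring months and days.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed thresholds, taken from the examples discussed in this thread.
MIN_MOTHER_AGE = 12   # flag mothers aged 12 or under at a child's birth
MAX_LIFESPAN = 130    # flag anyone recorded as living to 130 or more


@dataclass
class Person:
    """Hypothetical minimal record; real files carry far more detail."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def timeline_problems(person, mother=None):
    """Return a list of human-readable timeline problems for one person."""
    problems = []
    if person.birth_year is not None and person.death_year is not None:
        age = person.death_year - person.birth_year
        if age < 0:
            problems.append("died before birth")
        elif age >= MAX_LIFESPAN:
            problems.append(f"lived to {age}")
    if (mother is not None and mother.birth_year is not None
            and person.birth_year is not None):
        mother_age = person.birth_year - mother.birth_year
        if mother_age <= MIN_MOTHER_AGE:
            problems.append(f"mother aged {mother_age} at birth")
    return problems


if __name__ == "__main__":
    mother = Person("Mother", 1700, 1760)
    child = Person("Child", 1710, 1845)
    for problem in timeline_problems(child, mother):
        print(f"{child.name}: {problem}")
```

People with missing dates are simply skipped rather than flagged, which keeps the list of hits short enough to post.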
... as for the help you've asked for. I will go through my Ball files and
scan any page from a book which I have on file. But Joanne is the one to
contact about all those painstaking transcriptions. I will not attempt to
duplicate her glorious work!
And as I said, I will contact people who are listed in the source list to
see if their emails are still working. Then proceed (nicely I hope) to ask
(if the email is still working) if the pages he/she/they got the information
from can be scanned and have him/her/them send it to you. But I will do it
one person at a time as it is time consuming.
I think that's all. I usually follow-up emails with something I forgot, or
will forget. So I'll stop now and send something later.
On 24/06/05 5:03, David G. Ball wrote:
> Bev, Michel, and Joanne,
> Time for an update and a couple of key bits of info:
> 1. My email has changed (finally upgraded to cable high-speed internet) to
> 2. I retired two weeks ago and am finally able to spend some time on
> As for the GEDCOM, I have only managed a little time on it, so far, but have
> made some useful progress (mostly knowing the size of the job better).
> I apologize if my review is a bit direct, but the bottom line is that the GEDCOM is a huge
> benefit to my bigger project and I really appreciate having it to play with.
> To have entered everything direct would have taken a minimum of two years as
> the focus of my project; thanks to you I can look at cleaning it up over a
> year in my spare time. Now the report card:
> A. The sources, with very few exceptions, are very incomplete and will
> generally need to be revisited to provide "book quality" bibliography and
> citations. Don't waste your time playing with moving them from notes to
> something else; nothing you can do would benefit me in any way.
> B. The place locations were a very mixed bag of styles, spelling, and
> format (e.g., NY, N.Y., New York). TMG has six columns that I use for
> "site" (e.g., cemeteries, hospitals, etc.), "detail" (generally street
> addresses), town, county, state, and country. I have cleaned all of the
> country and state entries, as well as site locations and have made it
> through Indiana entering in county names (thank goodness for the county
> finder on RootsWeb). A bunch of spelling typos got fixed and by working
> directly in the Master Place List I get all the individual data tags that
> used any of the changed formatting.
> C. Have begun to enter a unique reference number to all of the Balls and
> Ball wives, so that I can use those numbers to annotate the "raw data" files
> (which I will talk a bit about below). Have done 400 of the 1500 or so
> D. Have just scratched the surface on cleaning each person's file.
> Basically I have added a flag to everyone that says "Clean: Yes or No?".
> The "No" people are highlighted in red. The "Yes" people turn the highlight
> to pale blue. Any new people will automatically be "Yes". Actually, other
> than deleting blank tags (e.g., a birth tag with only a "?" and nothing
> else) and dealing with the source info, there is very little cleaning
> needed. Have found some timeline problems (12 year old mothers, people
> living to 130, etc.), but very few.
> E. A brief search through my paper and book files says that I have easily a
> thousand or more people to add to the file (mostly from the Joe Scukanec
> notebooks that I have). I also want to start to add census data.
> So, what can anyone do to help? There are two key areas right now:
> First: The secondary sources....I have a few of the books from the source
> list (NEHGS Register, item by Mary Ball Coultrap, item by Claudette Maerz,
> item by James Savage, item by Howard Lee Ball, item by Joseph H. Vance, and
> the Frank C. Ball collection at Muncie). I would doubt that very much of
> the information (other than names, dates and places) was extracted from any
> of those sources in your list. It would be really useful to find out which
> of those sources may be on-line (I have memberships to Ancestry.com, NEHGS,
> and Heritage Quest through NEHGS). Items that really only have a few pages
> that are in your possession or can be acquired without much cost or effort
> can be emailed to me as JPEG or PDF files (please include full bibliography
> info and the BigEd GEDCOM source number). That way I can begin to rebuild
> the source documentation within the TMG file, as well as to expand the
> amount of data that is recorded from any source.
> Second: The primary sources....one of the key parts of the Ball Project is
> the building of computer files of raw data (birth, baptism, marriage, death,
> burial, probate, Bible, land, military, civil, census, passenger lists,
> etc., etc.) for all people with the name of Ball. I already have pulled
> down the Ball census index info from Ancestry.com for all years. Now I am
> working on Mass. vital records to 1850. This is why I have a unique 5-digit
> reference number for every Ball in my trees. When I use a primary record as
> the source, I annotate the record in the raw data database with the
> appropriate reference number. That way I can both scan the raw data for
> potential matches AND see which bits of raw data have not yet been assigned
> to someone in the trees. The point is that I can never have too many raw
> data bits. I have zero relating to New Jersey or any of the locations that
> have concentrations of Ball descendants for Edward and Abigail Ball of
> Newark NJ. Anyone who really wants to help the next generation of development
> of the BigEd file and is willing to extract raw data, primarily for the
> northern half of the US, will be doing a great service to the Ball Project.
> Again, I need a proper citation for each data bit (volume and page number
> and title of the record source, etc.), so I can properly document each item
> as it is entered into the trees.
> Those are for now. As time goes on I will have to find a way to get both
> the trees and the raw data files on-line to encourage a wider circle to
> contribute to both. I am going to have to have a computer wizard to take on
> that task, since it is beyond my ken. I do know that "Second Site" software
> is designed to produce websites from TMG files with little or no expertise,
> but getting the raw data visible will take some work. One step at a time.
> Getting a good base of names, dates and places into TMG for twenty or thirty
> Ball clans is my first goal and I am willing to do that mostly from
> secondary sources. Later I can improve the quality of the trees by using
> primary data as I get enough into the databases.
> Anyway, this is my update. At any time I can shoot back the revised GEDCOM
> (or TMG file to those of you that use The Master Genealogist software), but
> I think it will get more attention here for the next year or so.