From: OrinWells <>
Subject: [Zachariah Wells] Re: Cluster analysis of Wells DNA data
Date: Sun, 19 Jan 2003 10:49:04 -0800
At 11:57 AM 1/19/2003 -0500, wrote:
>I never got any responses from my question a few weeks ago on the Wells
>DNA project, so I assumed that no one had done any further analysis.
Sorry, I missed the post. I think I was out of town at the time you posted
it and simply didn't pick it up in the email when I got back. I am going
to reply and copy this to the list as some there may be interested in
this. If they aren't or get bored they can use the delete button.
> I entered the DNA data from your table "Family BaseLine DNA Patterns"
> into a file and then tried running several cluster analysis algorithms on
> it just for fun. I don't really know anything about the DNA markers, so
> I assumed that they were all of equal "weight" statistically speaking.
This is correct. EVERY male has them all. In some rare cases the lab is
unable to get a measurement on a marker and sometimes they will get two
peaks for reasons they do not understand.
>I have attached one of the cluster trees generated (as a JPEG) for your
>possible interest. It was run using the "flexible" method (please don't
>ask me for details!). Another algorithm, Ward's Method, gave identical
I have examined this, and I need some guidance on the "Flexible Beta
Distance": what does it mean? This seems to be the key to conveying
a relationship. Thanks for taking the time to do this. One thing that has
to be understood is that DNA mutations seem to be pretty random and there
is no nice mathematical formula that can be applied even though many have
claimed they have one. They just fall apart when you apply real test
results. Mother nature plays by her own rules in this game. In looking at
the distributions I am afraid this could give a viewer a false impression
regarding the relationship between families. I would be very
hesitant to post this in its current form.
In your earlier message you asked the following questions (I went back and
found it). Let me respond to them as well as I can.
1. Is the analysis posted the final quantitative analysis of the samples?
Yes. What you see is what we are going to get unless the lab expands its
set of markers, and even then there is no guarantee that our existing
samples would be retested for the new markers.
2. I frankly do not find the current analysis very satisfying. It is
apparently simply a pairwise comparison of differences in markers of the
individual families, with no indication as to why the families were listed
in the order they were (or did I miss something?)
You are correct. There is no order of importance between markers. They
all have pretty much equal weight. The markers have been selected because
they tend to show a higher rate of mutations than the other markers on the
Y-Chromosome and are in a collection of markers identified by the industry
as most suitable. Clearly there are other markers and each testing lab
differs slightly from the other. As to the order, it is an arbitrary
sequencing based on the numeric progression of the marker values, nothing
more. That is, a DYS385a value of 11 will come before a DYS385a value of
12. Where the values of a marker match between two subjects, the next
marker is used (DYS385b, etc.). In a very few cases they may appear to be
out of "order" because a close match between participants has been
identified even though there are variances in the markers early in the order.
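
For anyone who wants to see the ordering rule spelled out, here is a
minimal sketch in Python. The family labels and marker values are made up
for illustration and are not taken from the project table, and the third
marker in the list is only an assumption about what follows DYS385b.

```python
# Hypothetical marker order: compare DYS385a first, then DYS385b, and so on.
MARKER_ORDER = ["DYS385a", "DYS385b", "DYS388"]

# Invented values for illustration only; not real project data.
families = {
    "F_A": {"DYS385a": 12, "DYS385b": 14, "DYS388": 12},
    "F_B": {"DYS385a": 11, "DYS385b": 15, "DYS388": 12},
    "F_C": {"DYS385a": 11, "DYS385b": 14, "DYS388": 13},
}


def sort_key(family_id):
    """Return the marker values as a tuple, in the fixed marker order."""
    return tuple(families[family_id][marker] for marker in MARKER_ORDER)


# Tuples compare element by element, so a tie on the first marker is
# broken by the next marker, exactly as described above.
ordered = sorted(families, key=sort_key)
print(ordered)
```

The order produced this way says nothing about how closely the families
are related; it is only a convenient way to lay out the chart.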
Now, to solve this lack of meaning I plan to write an analysis paper to
provide more information on the ancestral families and what the DNA
patterns are showing about them. I will try to write it in a way everyone
can understand. It will take a bit but I am going to do it in conjunction
with some information I am going to give the fellow who plans to write a
piece in the Wall Street Journal on the project.
3. It seems to me that something like a cluster analysis would be more
useful. It would use all the marker data together to show how closely the
various families are related. This analysis could be used to develop a tree
diagram to show the relative degree of relatedness of the families. (Or is
this sort of analysis not appropriate because of the nature of the marker
Of course one of the premises has to be that there is a relationship
between the families. While we all eventually trace back to a common
ancestor in pre-history, for all practical purposes this is generally a
false assumption. Remember that surnames were selected by an individual
nearly 1,000 years ago and many unrelated folks may have selected the same
surname including Wells. What we are seeing makes it very clear that we
are not all very closely related.
If I had read this I would have said "have a go at it". BUT understand
that based on the average mutation rates this is not likely to produce much
of interest in most cases. The reason is that when any two subjects are 4
or more variations from each other it is unlikely they have a common
ancestor much more recently than a couple of thousand years ago. So even
if there is a similarity it probably is not something that can be proven
for genealogy purposes which is the whole objective here. The fact we are
dealing with a surname in common may muck this up a bit and we really
don't understand that influence yet.
4. I wonder if a more thorough analysis planned in the future. Or are we
still waiting for additional samples?
Yes to both. In the first instance I hope to get Dr. R. Spencer Wells to
spend a bit of time with me (remotely) to see if we can identify anything
in the analysis that is of more interest than just subject A matches
Subject B and is only 2 markers different from Subject C. But we are
indeed waiting for more samples. My goal is to collect up to 600 samples
worldwide to expand our base and, hopefully, include as many of the
different origin Wells families as humanly possible. The cost is probably
the largest obstacle to this. But I don't give up easily.
I am sorry if I sound critical. I definitely do not mean to be! You are
doing great work in shepherding the process along and keeping all of us on
the list informed. I thank you for all your time and effort. I am just
concerned that it seems like a lot more could be done with the data set as
it exists currently.
No need to apologize. I don't mind comments especially if they are
inquisitive and constructive.
>As you can see, the cluster tree gives putative relationships among the
>family groups. The heights of the connections on the y-axis are strictly
>based on values of the test statistic and have no further interpretation
>(for example, as to time since divergence of the families). I don't think
>that there are any surprises here--on the Zach end of things
>anyway. Families W016, W020, and W028 are obviously very closely related,
>as one could tell just by scanning the data.
As you said, no surprise there. The trick remaining is to find the
documentation that links the common ancestor to them.
>The next nearest kin would seem to be families W005 and W023, followed by
>families W008, W011, and W027. Hopefully you know better than I whether
>these relationships make any sense or were simply obvious from the raw data.
The difference between family W005 (Richard/Tunis Wells of Virginia and
Fayette Co., PA) and your family is 18 values if we ignore the double peak
at DYS394. That is a VERY large difference, with no possibility of them
having a Most Recent Common Ancestor (MRCA) any more recently than the last
ice age, if not further
back. In fact there is another marker behind the scenes from BYU that
reveals another two-step difference making them 20 variances out of 27
markers. As I said before, 4 is considered to be unrelated and can occur
randomly with almost any two surnames that started out several markers
off. Of course if we go back far enough in time (60,000 years) we are all
related to a common ancestor.
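
The counting above can be sketched in a few lines of Python. This assumes
that a "variance" is one step of difference on a marker (so a two-step
difference counts as 2), which is how the totals above add up, and that a
marker with no usable reading, such as a double peak, is simply skipped.
The two profiles are invented for illustration and are not real
participants.

```python
def step_distance(profile_a, profile_b):
    """Sum the step differences over markers measured in both profiles."""
    total = 0
    for marker, value_a in profile_a.items():
        value_b = profile_b.get(marker)
        # Skip markers with no usable reading (for example a double peak).
        if value_a is None or value_b is None:
            continue
        total += abs(value_a - value_b)
    return total


# Invented profiles for illustration only; None marks an unusable reading.
subject_x = {"DYS385a": 11, "DYS394": None, "DYS455": 10, "YCAIIa": 19}
subject_y = {"DYS385a": 11, "DYS394": 14, "DYS455": 7, "YCAIIa": 21}

distance = step_distance(subject_x, subject_y)

# Rule of thumb from this message: 4 or more variances is treated as
# unrelated for genealogy purposes.
treated_as_unrelated = distance >= 4
print(distance, treated_as_unrelated)
```

A single number like this hides which markers differ, so it is a starting
point for comparison rather than proof of anything.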
Family 23 (David and Susannah Wells of Mecklenburg, Virginia) is equally
distant with 19 differences and one behind the scenes. Family 8 (the
"Little Wells" family of James Wells of Baltimore Co., Maryland) has 21
variances (again ignoring the double peak) and one behind the
scenes. Family 11 (Samuel Wells of Stafford County, Virginia) has 22
differences. Family 27 (John Wells of York County, Virginia) is 20
variations. Now some of these are closer to each other than they are to the
Zachariah/Aaron/Robert/etc. clan. But they are still way off from each
other genetically. There are some markers that cause one to wonder if
there is not a relationship. For example on marker DYS455 where most of
the families carry a value of 10 there is a group of families that carry a
7. Also on YCAIIa most of the subjects carry a 19. But some of them,
apparently unrelated, carry a 21 including your extended family. Does that
have some significance? Is it typical of some group such as the Vikings or
Celts? I don't know at this point. But it is one of the topics I want to
discuss with Dr. Wells.
>I know nothing of the Wells families on the other main branch--that is,
>the left hand side. It is interesting that family W006 stands out all by
>itself in this analysis. Have the relationships of these families on the
>left branch been worked out historically?
You are referring to the Baseline families chart.
Yes, they have 32 variations compared to your family and are clearly not
close to any of the other families. There is a thought in the community of
geneticists that each mutation happens on average once every 20
transmissions (father-son event). If you peg this at a generation of 25
years (I favor 35) this works out to a mutation once every 500 years. Thus
4 mutations would be 2,000 years. 32 mutations would require 16,000 years
to occur. The REAL numbers may be much shorter. For example, if you were
to use this criterion on two brothers in our study, you would conclude their
common ancestor lived 1500 years ago when in fact they are brothers. So
while the rule of thumb might be 20 generations for a mutation the reality
is it might be anywhere from 1 to 40 and I am pulling 40 out of the
hat. Some families seem to be more prone to mutation than others. A lot
of work remains to be done in this field. It is far from an exact science.
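
The rule-of-thumb arithmetic above is easy to check. Here is a small
Python sketch of it; the function name and parameter names are mine, and
the only inputs are the figures quoted in this message (one mutation per
20 transmissions, 25 or 35 years per generation, and the 1 to 40 range I
pulled out of the hat).

```python
def years_for_mutations(n_mutations,
                        transmissions_per_mutation=20,
                        years_per_generation=25):
    """Rough elapsed years for a given number of mutations to accumulate."""
    return n_mutations * transmissions_per_mutation * years_per_generation


# One mutation per 20 transmissions at 25 years per generation.
per_mutation = years_for_mutations(1)        # 500 years
four_mutations = years_for_mutations(4)      # 2,000 years
thirty_two = years_for_mutations(32)         # 16,000 years

# The same 4 mutations with a 35-year generation.
four_at_35 = years_for_mutations(4, years_per_generation=35)

# The 1-to-40 transmission range shows how wide the uncertainty is.
fastest = years_for_mutations(4, transmissions_per_mutation=1)
slowest = years_for_mutations(4, transmissions_per_mutation=40)

print(per_mutation, four_mutations, thirty_two, four_at_35, fastest, slowest)
```

The spread between the fastest and slowest cases is the point: the same 4
mutations could mean a century or several thousand years.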
What is important to us is who matches our DNA pattern and then usually
ONLY if their Surname matches Wells OR if we have reason to suspect they
might be the product of a non-paternal event. We have some of these in the
Wells study. Some have matched as expected, some have come as surprises to
Orin R. Wells
Wells Family Research Association
P. O. Box 5427
Kent, Washington 98064-5427
Subscribe to the "Wells-L" list on RootsWeb