DNA-R1B1C7-L Archives


From: "Richard B. Hare" <>
Subject: Re: [DNA-R1B1C7] Network.exe
Date: Sat, 11 Aug 2007 13:12:52 -0400
In-Reply-To: <e0d2d2870708110935j342c06e8u90553e65c2c3dfe9@mail.gmail.com>


Thank you for your last memo to Paul! It really explained the assumptions
behind the algorithms. Now I understand.
I'll try to add something simple about variations attributable to sample
size. Most of my career was in Market Research, so I know just how screwy
data can become when we don't understand the concepts.
Cheers
Dick


-----Original Message-----
From:
[mailto:] On Behalf Of David Ewing
Sent: Saturday, August 11, 2007 12:35 PM
To:
Subject: Re: [DNA-R1B1C7] Network.exe

Paul, responding to your message beginning, "Great links:"

To your first question, yes, but everything turns on the word "probably."
Consider a lineage in which there is no mutation for 10 generations, then
one finally occurs. These calculations will yield the result that the man
whose son first showed the mutation is "probably more closely related" to
his 8th great grandfather than he is to his own son. I am beginning to think
that average mutation rates and calculations based on them should only be
used when talking about large enough data sets. I am no statistician, but as
I recall, "large enough" is something that can be calculated, too, and it
seems that the numbers we are considering in our family projects are not
large enough.

To your second question, almost. I say "almost" because calculating observed
mutation rates doesn't depend on comparing anything to probabilities--it is
just a matter of seeing what in fact has happened. I did the calculation as
a part of adding family group data to Charles Kerchner's website, where he
has been collecting data on the mutation rates in family groups with well
worked out conventional genealogies. To calculate an observed mutation rate,
one simply divides the genetic distance between two individuals by the
number of transmission events separating them, and then divides that by the
number of markers that are being considered. Again, think about the case
where a mutation first shows up in the son of a man who doesn't have it. The
genetic distance is 1, there is 1 transmission event, and if we consider
only the one marker the observed rate is 1, or 100% per generation. If we
consider 37 markers, the observed "average" mutation rate over 37 markers is
1/37 = 0.027, or 2.7% per generation. But it is just plain silly to
calculate averages based on a single case.
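
The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the division described there; the function name and its arguments are made up for the example and do not come from Kerchner's site or any other tool.

```python
def observed_mutation_rate(genetic_distance, transmissions, markers):
    """Observed mutations per marker per transmission event.

    Hypothetical helper: genetic distance divided by the number of
    transmission events, then divided by the number of markers considered.
    """
    if transmissions <= 0 or markers <= 0:
        raise ValueError("transmissions and markers must be positive")
    return genetic_distance / transmissions / markers


# Father/son pair with one new mutation: genetic distance 1, 1 transmission.
one_marker = observed_mutation_rate(1, 1, 1)
panel_37 = observed_mutation_rate(1, 1, 37)

print(f"{one_marker:.3f}")  # 1.000, i.e. 100% per generation
print(f"{panel_37:.3f}")    # 0.027, i.e. 2.7% per generation
```

Both figures come from a single father/son pair, which is exactly why neither should be read as a meaningful average.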

John, responding to your message that immediately followed Paul's: You ask,
"My question would be how do you go from Anne's .002 mutation rate in her
calculator to the mutation rate used by Trinity College in Network.exe?" The
mutations of interest occur only at the time of procreation (or rather, only
at the time that a sperm is manufactured that is destined to be successful
in giving rise to a male child), so any statement of mutation rates in terms
of years depends on an estimate of average age of the father at the time of
conception. Otherwise, the 0.002 number and the 0.00069 number are exactly
comparable, and you are correct to multiply these by the number of markers
considered.

I should say that the 12 descendants of John Ewing of Carnashannagh in our
project are on average 7.4 generations from him, and he was born c1648, 359
years ago. He didn't father any children on the day he was born, obviously,
and we didn't test newborn babies, but 359/7.4 = 48.5 years. Just picking
numbers from the sky, let's suppose he was 40 on average when his children
were born and our project participants were 60 when they were tested. That
still leaves (359 - 40 - 60)/7.4 = 35 years for the average generation time.
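
Here is the same generation-time arithmetic as a small Python sketch. The ages 40 and 60 are the guesses from the paragraph above, not measured values, and the function name is hypothetical.

```python
def years_per_generation(span_years, generations, founder_age=0.0, tested_age=0.0):
    """Average years per generation over a lineage.

    span_years  : years from the founder's birth to the test date
    generations : average number of generations to the tested men
    founder_age : assumed average age of the founder when his children were born
    tested_age  : assumed average age of the participants when tested
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return (span_years - founder_age - tested_age) / generations


# John Ewing of Carnashannagh: born c1648, 359 years ago, 7.4 generations.
raw = years_per_generation(359, 7.4)
adjusted = years_per_generation(359, 7.4, founder_age=40, tested_age=60)

print(f"{raw:.1f}")       # 48.5
print(f"{adjusted:.1f}")  # 35.0
```

Changing the two assumed ages shows how sensitive the result is to them, though it takes fairly extreme guesses to get down to 25 years.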

So which mutation rate should we use? Because different markers have
different mutation rates, this depends on which markers we are using. I
don't have at my fingertips a list of the markers Zhivotovsky was using,
but my recollection is that it was a relatively small number of markers,
and not the same ones we are now using. You must have seen John Chandler's
paper in JOGG about locus-specific mutation rates for the markers in the
FtDNA standard 37-marker panel (www.jogg.info/22/Chandler.pdf). I spent
quite a bit of time looking at these and determined the following:

There are 15 markers with rates <0.002; the slowest (DYS426) has a rate of
0.00009 and the average of these 15 is 0.00088.

There are 23 markers with rates <0.004; the average of these 23 is 0.00148.
(This includes the slowest 15 mentioned above. I didn't calculate the
average for just the 8 not included in the 15, but they range from GATA-H4
at 0.0028 to DYS442 at 0.00324.)

There are 32 markers with rates <0.008; averaging 0.00265. (Again, this
includes all the slower markers already mentioned. The 9 additional ones
range from DYS460 at 0.00402 to DYS570 at 0.00790.)

The entire 37-marker panel has an average mutation rate of 0.00492 according
to Chandler's calculations. By far the fastest rates are CDYa/b at 0.0351,
more than triple the next fastest, which is DYS576 at 0.01022.

Bottom line, I think if we are comparing full 37-marker panels, we should
not be using average per locus rates as low as 0.002 and certainly not
0.00069. And we should probably not be using 25-year generation times, at
least not in the last 400 years among Scottish Presbyterians. I am by no
means an expert, but I think if we want to do calculations about the MRCA of
the R1b1c7 McLaughlins and Ewings, we should use fewer markers and
Chandler's rates.

Try this:
393, 19, 426, 388, 389-1, 392, 459a, 459b, 455, 454, 437, 448, YCA-IIa/b,
and 438 with an average mutation rate of 0.00088 per locus, or 0.0132 for
the 15-marker panel. I think I'll go to work on this now, but I'm not sure
I'll have time to finish before other responsibilities shoulder this project
aside.
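
As a rough check on the panel figures, the per-locus averages quoted above can be multiplied out in Python. The marker names and rates are the ones given in this message; counting YCA-IIa/b as two markers is my reading of the list, and the "expected mutations" helper is a simple rate-times-transmissions assumption, not anything taken from Network.exe or Chandler's paper.

```python
# Slow markers from the list above; YCA-IIa/b is counted as two loci (assumption).
SLOW_MARKERS = [
    "393", "19", "426", "388", "389-1", "392", "459a", "459b",
    "455", "454", "437", "448", "YCA-IIa", "YCA-IIb", "438",
]

AVG_RATE_SLOW_15 = 0.00088  # average per-locus rate quoted for the 15 slow markers
AVG_RATE_FULL_37 = 0.00492  # average per-locus rate quoted for the full panel


def panel_rate(per_locus_rate, n_markers):
    """Mutation rate for a whole panel, per transmission event."""
    return per_locus_rate * n_markers


def expected_mutations(per_locus_rate, n_markers, transmissions):
    """Expected number of mutations over a line of descent (simple product)."""
    return panel_rate(per_locus_rate, n_markers) * transmissions


panel_15 = panel_rate(AVG_RATE_SLOW_15, len(SLOW_MARKERS))
panel_37 = panel_rate(AVG_RATE_FULL_37, 37)

print(len(SLOW_MARKERS))   # 15
print(f"{panel_15:.4f}")   # 0.0132
print(f"{panel_37:.5f}")   # 0.18204

# Illustration only: 7.4 transmissions on the 15-marker panel.
print(f"{expected_mutations(AVG_RATE_SLOW_15, 15, 7.4):.3f}")  # 0.098
```

The point of the comparison is that the full 37-marker panel mutates more than ten times as fast as the 15 slow markers taken together, so the choice of panel and rate has to go hand in hand.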

David Ewing
