DNA-R1B1C7-L Archives



From: "Ken Nordtvedt" <>
Subject: Re: [DNA-R1B1C7] Network.exe
Date: Mon, 13 Aug 2007 09:41:45 -0600
References: <e0d2d2870708130732i11b8a06oa7855c9ac81b1920@mail.gmail.com>


----- Original Message -----
From: "David Ewing" <>


> Ken Nordtvedt said, "Fast markers actually should have better statistics
> connected with their variance than slow markers. This is so because there
> are more mutational events. The superiority of fast markers in this
> respect
> is reflected in the recommended way to combine the variances or mutational
> counts of the whole set of markers employed."
>
> Interesting. I'm not clear about what "better statistics connected with
> their variance" means, though.

Let's say two markers have a ratio of 4 to 1 in their mutation rates. Then
in some tree of descendancies the fast marker will on average have four
times as many mutations occurring (at random locations) as the slow marker.
The statistical fluctuations of the fast marker will be only twice as large
in number of mutations, so in fractional terms the fast marker's
fluctuations are half the size of the slow marker's: for example, 25 plus or
minus 5 mutations for the slow marker versus 100 plus or minus 10 mutations
for the fast marker. In the first case the standard deviation is plus or
minus 20 percent, while in the second case it is plus or minus 10 percent.
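
The arithmetic above can be checked with a minimal sketch. It assumes the mutation counts behave like Poisson counts, so the standard deviation is the square root of the expected count; the function name `fractional_sd` is made up for illustration.

```python
from math import sqrt

def fractional_sd(expected_mutations):
    """Relative standard deviation of a Poisson count: sqrt(N) / N."""
    return sqrt(expected_mutations) / expected_mutations

slow = 25          # expected mutations on the slow marker (example from the text)
fast = 4 * slow    # the fast marker mutates four times as often

for label, n in (("slow", slow), ("fast", fast)):
    print(f"{label}: {n} +/- {sqrt(n):.0f} mutations, {fractional_sd(n):.0%}")
# slow: 25 +/- 5 mutations, 20%
# fast: 100 +/- 10 mutations, 10%
```

Quadrupling the expected count doubles the absolute fluctuation but halves the fractional one.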

So consider a TMRCA estimate. If you did it using one batch of markers of
mutation rate "m", and then did it with another batch of markers with
mutation rate "4m", the confidence interval for the latter estimate should
be half as wide as in the former estimate, because TMRCA is to first
approximation simply proportional to the number of mutations.
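
A small simulation can illustrate the halving. This is only a sketch under stated assumptions: mutation counts between two men are drawn as Poisson with mean 2 x (total rate) x (generations), back mutations are ignored, and the true TMRCA, the total rate, and the trial count are hypothetical values chosen for the example.

```python
import random
from math import exp
from statistics import mean, stdev

def poisson(lam, rng):
    """Draw one Poisson variate by Knuth's method (fine for small lam)."""
    threshold = exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_tmrca(total_rate, true_generations, trials, rng):
    """Simulated first-approximation TMRCA estimates: count / (2 * total rate)."""
    expected = 2 * total_rate * true_generations
    return [poisson(expected, rng) / (2 * total_rate) for _ in range(trials)]

rng = random.Random(2007)      # fixed seed so the run is repeatable
TRUE_GENERATIONS = 40          # hypothetical true TMRCA
BASE_TOTAL_RATE = 0.04         # hypothetical total rate of the "m" batch

slow = simulate_tmrca(BASE_TOTAL_RATE, TRUE_GENERATIONS, 20000, rng)
fast = simulate_tmrca(4 * BASE_TOTAL_RATE, TRUE_GENERATIONS, 20000, rng)

ratio = stdev(fast) / stdev(slow)
print(f"mean estimate: slow {mean(slow):.1f}, fast {mean(fast):.1f} generations")
print(f"spread ratio (fast / slow): {ratio:.2f}")
```

Both batches center on the same TMRCA, and the spread of the "4m" batch should come out near half that of the "m" batch.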

Ken



> When I did the little exercise of estimating
> TMRCA for McLaughlin v. Ewing with progressively larger sets of markers
> having progressively higher average mutation rates (per Chandler), the
> most
> likely TMRCA in generations declined at each step: 85, 48, 40, 34 (for 15,
> 23, 32 and 37 markers respectively). Why should this be so? Is it because
> of
> the number of markers, the average mutation rate, or just coincidental?
> And
> does this mean that we might get more fruitful (not "prefered," but rather
> more believable) results by considering only the 15 fastest mutating
> markers
> rather than the 15 slowest?

I don't know what is going on in your above case. Did you multiply average
mutation rate by number of markers to get total mutation rate? It is the
latter that goes into the equation. You should have better confidence
intervals using fast markers than slow markers (assuming the same number of
markers). Total mutation rate is the quality factor. Of course, some fast
markers are flaky, so that's a problem. And fast markers can lead to unseen
back mutations more often. I prefer using a large number of slow mutators.
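
To show why the total rate matters, here is a hedged sketch. It assumes one common first-approximation form for two men, generations = mutations / (2 x total rate), which is not necessarily the exact equation either correspondent used; the per-marker rate and the observed mutation count are hypothetical.

```python
def tmrca_generations(mutation_count, total_rate):
    """First-approximation TMRCA for two men: mutations / (2 * total rate)."""
    return mutation_count / (2 * total_rate)

rates = [0.002] * 15     # hypothetical per-marker rates for a 15-marker set
observed = 3             # hypothetical mutation count between the two men

average_rate = sum(rates) / len(rates)
total_rate = average_rate * len(rates)   # same as sum(rates)

correct = tmrca_generations(observed, total_rate)
wrong = tmrca_generations(observed, average_rate)   # forgot to multiply by marker count

print(f"using total rate:   {correct:.0f} generations")
print(f"using average rate: {wrong:.0f} generations")
```

Plugging in the average rate instead of the total inflates the estimate by a factor equal to the number of markers.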



>
> Here is a another pair of questions for you, Ken. I commented before that
> it
> would be silly to calculate an "average" mutation rate based on a single
> observation of a mutation distinguishing father and son. It wouldn't be
> much
> less silly if you had two such. How many observations would you need in
> order for it not to be silly?

Sometimes you have to estimate (not calculate) a mutation rate based on a
single observation. Your confidence interval is going to be broad, however.
In a wartime situation or other existential situation one does the best one
can with the information available, and that could mean concluding what you
can from a single observation. So what is "silly" depends on the context
and purpose and what conclusions you draw from the very limited information.
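
How broad such an interval gets can be sketched with an exact (Clopper-Pearson) binomial interval, solved by bisection. The sample of 500 father-son transmissions is a hypothetical figure, and treating each transmission as an independent trial is an assumption of the sketch.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_decreasing(f, target, iters=200):
    """Bisection on [0, 1] for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

transmissions = 500    # hypothetical number of father-son pairs examined
mutations = 1          # the single observed mutation

lower, upper = clopper_pearson(mutations, transmissions)
print(f"point estimate: {mutations / transmissions:.4f}")
print(f"95% interval:   {lower:.6f} to {upper:.4f}")
```

With one observation the upper bound is a couple of hundred times the lower bound, so an estimate is possible but any conclusion drawn from it has to allow for that width.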

And mutation rates should probably not be thought of as "average". These
little molecular machines probably just have a certain probability of
mis-copying each time they carry out the task. Would you say a die has an
"average" rate of coming up "5"? I would not; I'd say it has a rate or
probability of doing so, roll by roll. If someone gave me odds more
favorable than 6 to 1 for getting a "5" on a single roll, I'd take it --- no
averaging in sight.








> And I think maybe my next question is just the
> same question turned on its head. How many surname project participants
> does
> one need to begin meaningful calculations of TMRCA?

If there are a bunch of surname project participants, there is generally a
wide variety of supplemental information about relationships between the
lines. This severely complicates doing a proper TMRCA estimate for any of
the many questions one could ask, assuming you want to feed all the
supplementary information into the estimate.

Remember: the classic TMRCA for two present-day people, given their two
haplotypes and no other information, is a very special situation. That is
what renders the estimate so simple. As soon as you just go to three,
there are many possible variations --- meaning complexities --- of the
calculations you can have concerning the tree connecting those three.
There's lots of information you might have which could be tossed into the
calculation and which would affect the results.

>
> Take the Ewing project as an example. We have 42 R1b1c7 Ewings that are
> within genetic distance 4 of their own 37-marker modal. Is that a large
> enough sample to expect meaningful results? Twelve of these men are known
> on
> the basis of conventional genealogy to be in one kindred, having a common
> ancestor 7.4 generations ago on average; another kindred has five men who
> have common ancestor 8 generations ago on average, and another kindred has
> four men who have a common ancestor 8.25 generations ago on average. Does
> this change anything about how many men we need in the project to do
> meaningful TMRCA calculations?

You have enough men to do a TMRCA. After all, you can do it with two. But
your above example shows the kind of information you know about the tree to
start with, and that information will change the most likely situation for
any question you might ask.

Ironically, the simplest situations are those with an assumption of maximum
ignorance of additional factors, and those seem the cases for which the
simple mathematical formulas are thrown out to the customers for their use.
>
> David Ewing
>