# Interpreting DNA Mixtures in Structured Populations

James M. Curran,1 Ph.D.; Christopher M. Triggs,2 Ph.D.; John Buckleton,3 Ph.D.; and B. S. Weir,1 Ph.D.

REFERENCE: Curran JM, Triggs CM, Buckleton J, Weir BS. Interpreting DNA mixtures in structured populations. J Forensic Sci 1999;44(5):987–995.

ABSTRACT: DNA profiles from multiple-contributor samples are interpreted by comparing the probabilities of the profiles under alternative propositions. The propositions may specify some known contributors to the sample and may also specify a number of unknown contributors. The probability of the alleles carried by the set of people, known or unknown, depends on the allelic frequencies and also upon any relationships among the people. Membership of the same subpopulation implies a relationship from a shared evolutionary history, and this effect has been incorporated into the probabilities. This acknowledgment of the effects of population structure requires account to be taken of all people in a subpopulation who are typed, whether or not they contributed to the sample.

KEYWORDS: forensic science, DNA typing, interpretation, mixed DNA profiles, population structure, likelihood ratios

The interpretation of DNA profiles from more than one contributor is one of the most challenging tasks facing forensic scientists. Part of the complexity is due to the very large number of combinations of genotypes that must be considered in some situations, although a body of theory for a coherent treatment of mixed stains is now available (1–3). For a defendant who is not excluded from a mixed stain this theory avoids the potential prejudice that can follow from simplistic “random man not excluded” arguments.

In some cases, the typing technology may allow complexity to be avoided. When fragments are detected in ways that allow semiquantitation of the amount of DNA for each allele it may be possible to determine which alleles are from the same contributor. Examples include fluorescently-labeled length variants detected by lasers, or silver staining to detect band intensity on a gel. There can still be doubt, however, especially when different people contribute more or less equally to the mixture and such problems increase with the number of contributors. As long as a quantitative assessment of the evidentiary strength of DNA mixtures is required, we believe that there will be a need for analyses that consider all possible sets of genotypes that would lead to the mixture profile.

Our previous treatment (3) assumed independence of all the alleles in the mixed profile. This means independence within individuals, implying Hardy-Weinberg and linkage equilibrium, as well as independence between individuals, meaning that the contributors are unrelated. Although these assumptions may be adequate in many situations, they ignore the low-level dependence among alleles within the same population due to evolutionary forces. Two people within the same population must have common ancestors at some point in the past, the point being closer for smaller populations, and this imposes a dependence between their alleles. A necessary corollary to this evolutionary relationship is the low degree of inbreeding among offspring of two parents from the same population. It is this logic that leads to the necessity of working with conditional profile probabilities rather than the profile probabilities themselves, and it is what led to Recommendation 4.2 of the second NRC report (2). Instead of determining the probability of finding a profile in a random member of a population, it is necessary to determine the probability of finding the profile given that the profile has been seen once already. Conditional probabilities take explicit account of allelic dependencies.

1 Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh, NC.

2 Department of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand.

3 ESR, Private Bag 92021, Auckland, New Zealand.

Received 3 Feb. 1998; and in revised form 24 Aug. 1998; accepted 15 Dec. 1998.

In this paper we extend our previous treatment to allow for the dependencies among all the alleles carried by the contributors to the mixture. Initially we will assume that all contributors belong to the same population, as this is likely to maximize the effects we are considering. We will also adopt the relatively simple formulation for the probabilities of sets of alleles advocated by Balding and Nichols (4). Less restrictive treatments (5) would be unwieldy. Although we do not expect the population structure effects that we are considering will be substantial, we believe that they should be considered for mixed DNA stains to the same extent that they are considered for single stains.

Likelihood Ratios

Likelihood ratios have been recognized by authors of several recent books as the appropriate way of interpreting evidence (6–11). At a trial there will be alternative hypotheses or propositions about who contributed to this evidence: the prosecution will have proposition Hp, and here we will suppose there is a single alternative proposition Hd. The likelihood ratio LR is

$$\mathrm{LR} = \frac{\Pr(\text{Evidence} \mid H_p)}{\Pr(\text{Evidence} \mid H_d)}$$

The DNA evidence E for mixed-stain cases is the set of alleles found among all the people who have either been typed directly or whose type is inferred because they are considered to have contributed to the stain. Previously (3) we took E to mean only the alleles in the stain, but the addition now of the alleles from people who may have been typed even though they are hypothesized not to have contributed to the stain is necessary to allow for the effects of population structure.

We will make a distinction between the genetic profile, which is simply a listing of the distinct alleles in the mixture, and the statistical profile which is a list of all 2n alleles when there are n contributors. These two profiles will be different whenever some contributors are homozygous, or when some contributors share alleles. We will ignore the possibility of null alleles so that only homozygous individuals contribute a single allele to a genetic profile.

We will use much of our previous notation (3), and repeat our observation that the interpretation of a mixed stain genetic profile requires a specification of the known contributors to the profile and of the number of unknown contributors. We will derive results for single loci and then multiply likelihood ratios over loci.

As an example, suppose the evidentiary sample in a single-perpetrator rape case shows three alleles a, b, c at some locus. The sample was recovered from the victim’s person, she was found to be of type ab and a suspect was found to be of type cc. The prosecution proposition is likely to be Hp: “The victim and the suspect were the only contributors to the sample,” and a likely alternative proposition is Hd: “The victim and some unknown man were the only contributors to the sample.” The usual solution (2,3) for this situation is

$$\mathrm{LR} = \frac{1}{p_c(2p_a + 2p_b + p_c)} \tag{1}$$

where the p’s are the allele frequencies. We now derive this result from the perspective of this paper, first with population structure ignored.

Under proposition Hp only the victim and suspect are involved and they have both been typed. The DNA evidence is therefore the genotype pair (ab, cc). We write the probability of this pair as Pr(ab, cc) = 2 Pr(abcc). The approach we are taking assigns probabilities to sets of alleles without regard to the arrangement of alleles among individuals, but we do need a factor of “2” for the heterozygous victim. Had the victim been ab and the suspect bc we would have required the probability 4 Pr(abbc) since there are then two heterozygotes. When population structure is ignored, as it was previously (3), the probability of a set of alleles is just the product of frequencies of the separate alleles, so Pr(abcc) = $p_a p_b p_c^2$. The numerator of LR is, therefore,

$$\Pr(E \mid H_p) = 2p_a p_b p_c^2 \tag{2}$$

Note that, because the victim and suspect are both known individuals, there is no need to consider the 2! orders of these two people as was erroneously done in the first printing of (7).

Under proposition Hd there are three people to consider: the suspect of genotype cc who did not contribute to the sample, and the victim of type ab plus the perpetrator of unknown genotype who both did contribute to the sample. Examination of the profiles of the sample and the victim shows that the unknown man must have allele c and may also have alleles a, b or c. There are a total of six alleles in E, and the probability is Pr(ab, cc, ac) + Pr(ab, cc, bc) + Pr(ab, cc, cc) or 4 Pr(aabccc) + 4 Pr(abbccc) + 2 Pr(abcccc). The denominator of LR is

$$\Pr(E \mid H_d) = 4p_a^2 p_b p_c^3 + 4p_a p_b^2 p_c^3 + 2p_a p_b p_c^4 \tag{3}$$

The factors of 2 or 4 arise from the one or two heterozygotes. Dividing Eq 2 by Eq 3 leads to the previously known result given in Eq 1.
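The step from Eqs 2 and 3 to Eq 1 can be checked with exact arithmetic. The following is a minimal sketch: the function names and the allele frequencies are illustrative assumptions, not values taken from any case.

```python
from fractions import Fraction


def lr_eq1(pa, pb, pc):
    """Closed form of Eq 1: LR = 1 / [pc (2pa + 2pb + pc)]."""
    return 1 / (pc * (2 * pa + 2 * pb + pc))


def lr_from_eq2_eq3(pa, pb, pc):
    """Ratio of Eq 2 (numerator) to Eq 3 (denominator), population structure ignored."""
    numerator = 2 * pa * pb * pc**2
    denominator = (4 * pa**2 * pb * pc**3
                   + 4 * pa * pb**2 * pc**3
                   + 2 * pa * pb * pc**4)
    return numerator / denominator


# Assumed frequencies for alleles a, b, c (hypothetical values for the check).
pa, pb, pc = Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)
print(lr_from_eq2_eq3(pa, pb, pc) == lr_eq1(pa, pb, pc))
```

Using `Fraction` rather than floating point makes the equality exact, so the comparison tests the algebra and not rounding behaviour.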

It will be helpful to modify this example before proceeding further. Suppose the profiles are the same as just discussed, except that now the sample is not from the victim’s person (e.g., it may be from discarded clothing) and the alternative to Hp is specified as Hd: “Two unknown people were the contributors to the sample.” Under this proposition, there are four people involved: the victim and suspect, neither of whom contributed to the sample, and two unknown people who were the contributors. These last two people must have alleles abc between them but cannot have any other alleles. The possible combinations of genotypes for the unknown people are (aa, bc), (ab, ac), (ab, bc), (ab, cc), (bb, ac), (ac, ab), (ac, bb), (ac, bc), (bc, aa), (bc, ab), (bc, ac), and (cc, ab). These 12 combinations represent three distinct sets of alleles: aabc, abbc, abcc, and each set has a coefficient of 12 which is the number of ways of arranging the four alleles into two different genotypes. The coefficient includes the effects of the two orders of alleles within heterozygotes as well as the two orders of different genotypes such as aa, bc and bc, aa. The probabilities of all eight alleles among the four people involved are obtained by multiplying the probabilities 12 Pr(aabc), 12 Pr(abbc), 12 Pr(abcc) by the probability Pr(ab, cc) = 2 Pr(abcc) of the victim and suspect, and can be written as 24 Pr(aaabbccc) + 24 Pr(aabbbccc) + 24 Pr(aabbcccc) so that

$$\Pr(E \mid H_d) = 24p_a^3 p_b^2 p_c^3 + 24p_a^2 p_b^3 p_c^3 + 24p_a^2 p_b^2 p_c^4 \tag{4}$$

Dividing Eq 2 by Eq 4 gives the LR for this situation as

$$\mathrm{LR} = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} = \frac{1}{12p_a p_b p_c(p_a + p_b + p_c)} \tag{5}$$

as has been given before (3).

We now modify the solutions in Eqs 1 and 5 to accommodate the situation where all people, the victim, the suspect and (under Hd) the unknown person(s), belong to the same subpopulation. Probabilities for the genotype(s) of the unknown person(s) must take into account the knowledge that two people in this subpopulation have been found to have genotypes ab and cc.

For both scenarios, Hp is that only the victim and suspect were the contributors to the sample. We will show that the required term Pr(abcc) is given by

$$\Pr(abcc) = \frac{[(1-\theta)p_a][(1-\theta)p_b][(1-\theta)p_c][(1-\theta)p_c+\theta]}{(1-\theta)(1)(1+\theta)(1+2\theta)}$$

where θ is the coancestry coefficient in the subpopulation to which the victim and suspect both belong.

For the denominator in the first scenario, which is that the victim and an unknown person contributed to the sample but the suspect did not, there are three people and six alleles to consider. We will show, for example, that

$$\Pr(aabccc) = \frac{[(1-\theta)p_a][(1-\theta)p_a+\theta][(1-\theta)p_b][(1-\theta)p_c][(1-\theta)p_c+\theta][(1-\theta)p_c+2\theta]}{(1-\theta)(1)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

where θ is the coancestry coefficient for the subpopulation to which all three people belong. These expressions lead to

$$\mathrm{LR} = \frac{(1+3\theta)(1+4\theta)}{[(1-\theta)p_c+2\theta][(1-\theta)(2p_a+2p_b+p_c)+7\theta]} \tag{6}$$

which reduces correctly to Eq 1 when θ = 0.

For the second scenario, where both contributors to the sample are unknown under Hd, we need terms such as Pr(aaabbccc), and we will show that

$$\Pr(aaabbccc) = \frac{X_a X_b X_c}{Y}$$

where

$$\begin{aligned}
X_a &= [(1-\theta)p_a][(1-\theta)p_a+\theta][(1-\theta)p_a+2\theta] \\
X_b &= [(1-\theta)p_b][(1-\theta)p_b+\theta] \\
X_c &= [(1-\theta)p_c][(1-\theta)p_c+\theta][(1-\theta)p_c+2\theta] \\
Y &= (1-\theta)(1)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)
\end{aligned}$$

so that LR becomes

$$\mathrm{LR} = \frac{(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}{12[(1-\theta)p_a+\theta][(1-\theta)p_b+\theta][(1-\theta)p_c+2\theta][(1-\theta)(p_a+p_b+p_c)+7\theta]} \tag{7}$$

and this reduces to Eq 5 when θ = 0.

What is the numerical effect of using Eq 6 instead of Eq 1? When allele frequencies are all relatively small at 0.1 and θ has the relatively high value of 0.03, the LR drops from 20 to 12.33. Multiplying values from Eq 6 over several loci can give quite large LR values, but they will be less than those from Eq 1 in which population structure is ignored.
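Eqs 6 and 7 are simple enough to evaluate directly. The sketch below is one possible transcription; the function names and the sample frequencies are assumptions made for illustration, and the code is not the software referred to later in the paper.

```python
def lr_eq6(pa, pb, pc, theta):
    """Eq 6: Hd is 'victim plus one unknown person', all in one subpopulation."""
    q = 1.0 - theta
    numerator = (1 + 3 * theta) * (1 + 4 * theta)
    denominator = (q * pc + 2 * theta) * (q * (2 * pa + 2 * pb + pc) + 7 * theta)
    return numerator / denominator


def lr_eq7(pa, pb, pc, theta):
    """Eq 7: Hd is 'two unknown people', all in one subpopulation."""
    q = 1.0 - theta
    numerator = ((1 + 3 * theta) * (1 + 4 * theta)
                 * (1 + 5 * theta) * (1 + 6 * theta))
    denominator = (12 * (q * pa + theta) * (q * pb + theta)
                   * (q * pc + 2 * theta)
                   * (q * (pa + pb + pc) + 7 * theta))
    return numerator / denominator


# Hypothetical frequencies for alleles a, b, c; theta = 0 recovers Eqs 1 and 5.
for theta in (0.0, 0.01, 0.02):
    print(theta, lr_eq6(0.05, 0.1, 0.2, theta), lr_eq7(0.05, 0.1, 0.2, theta))
```

Setting `theta=0.0` gives the structure-free values, so the same two functions cover both the earlier treatment and the present one.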

The approach we have just illustrated is as follows. Alternative propositions are needed that specify the numbers of contributors to the evidentiary sample. Some of these contributors will be known and typed people, and some will be unknown people. Those contributors, together with any typed people who are known (under the proposition) not to be contributors, contain among them a set of alleles whose probability can be written down as the product of the separate allele proportions or as a more complicated function that incorporates the population structure parameter θ. There is also a factor of 2 for each known heterozygote, and a term for the number of ways of arranging all 2x alleles from x unknown people into pairs. There may be different sets of alleles from unknown people under some propositions, and the probabilities for these sets must be added together. The likelihood ratio is the ratio of probabilities under alternative propositions. As additional examples, we list the results for each of the common cases described in (7) in the Appendix.

Although it is possible to follow the above line of argument for any situation, we prefer to work with a general approach amenable to automatic (computer-based) calculation as we did previously (3). This will relieve the forensic scientist of the need for lengthy calculations in the same way that computer programs such as POPSTATS can be used for other DNA calculations. We will lay out the logic behind this general approach even though we anticipate the routine use of computer packages.

In order to do this we need to break the problem into two parts; we list the alleles, with their multiplicities, carried by the unknown contributors under Hp or Hd, and then we determine the probabilities of the allele sets. The two probabilities lead to the likelihood ratio. It is our use of the theory in (4) that allows us to concentrate on alleles rather than genotypes.

Notation

Much of the complexity in dealing with mixtures can be removed by a mnemonic notation, as laid out in Table 1. We find it very helpful to label the alleles at a locus A by the letters Ai. There are sets of alleles (not necessarily distinct; these are the statistical profiles) that occur in the crime sample (C). For a particular proposition there are alleles (T) carried by typed people declared to be contributors and alleles (U) carried by unknown contributors to the sample, and there are alleles (V) carried by any people declared not to have contributed to the sample. There are corresponding sets of distinct alleles (the genetic profiles) and these sets are indicated by a g subscript. Note that the same person may be declared to be a contributor to the sample under one proposition, and declared not to be a contributor under another proposition.

TABLE 1. Notation for mixture calculations.

| Symbol | Definition |
|---|---|
| | **Alleles in the profile of the evidence sample** |
| $C$ | The set of alleles in the evidence profile. |
| $C_g$ | The set of distinct alleles in the evidence profile. |
| $n_C$ | The known number of contributors to $C$. |
| $h_C$ | The unknown number of heterozygous contributors. |
| $c$ | The known number of distinct alleles in $C_g$. |
| $c_i$ | The unknown number of copies of allele $A_i$ in $C$: $1 \le c_i \le 2n_C$, $\sum_{i=1}^{c} c_i = 2n_C$. |
| | **Alleles from typed people that H declares to be contributors** |
| $T$ | The set of alleles carried by the declared contributors to $C$. |
| $T_g$ | The set of distinct alleles carried by the declared contributors. |
| $n_T$ | The known number of declared contributors to $C$. |
| $h_T$ | The known number of heterozygous declared contributors. |
| $t$ | The known number of distinct alleles in $T_g$ carried by the $n_T$ declared contributors. |
| $t_i$ | The known number of copies of allele $A_i$ in $T$: $0 \le t_i \le 2n_T$, $\sum_{i=1}^{c} t_i = 2n_T$. |
| | **Alleles from unknown people that H declares to be contributors** |
| $U$ | The sets of alleles carried by the unknown contributors to $C$. |
| $x$ | The specified number of unknown contributors to $C$: $n_C = n_T + x$. |
| $c - t$ | The known number of alleles that are required to be in $U$. |
| $r$ | The known number of alleles in $U$ that can be any allele in $C_g$: $r = 2x - (c - t)$. |
| $n_x$ | The number of different sets of alleles $U$: $n_x = (c + r - 1)!/[(c - 1)!\,r!]$. |
| $r_i$ | The unknown number of copies of $A_i$ among the $r$ unconstrained alleles in $U$: $0 \le r_i \le r$, $\sum_{i=1}^{c} r_i = r$. |
| $u_i$ | The unknown number of copies of $A_i$ in $U$: $c_i = t_i + u_i$, $\sum_{i=1}^{c} u_i = 2x$. If $A_i$ is in $C_g$ but not in $T_g$, $u_i = r_i + 1$. If $A_i$ is in $C_g$ and also in $T_g$, $u_i = r_i$. |
| | **Alleles from typed people that H declares to be non-contributors** |
| $V$ | The set of alleles carried by typed people declared not to be contributors to $C$. |
| $n_V$ | The known number of people declared not to be contributors to $C$. |
| $h_V$ | The known number of heterozygous declared non-contributors. |
| $v_i$ | The known number of copies of $A_i$ in $V$: $\sum_i v_i = 2n_V$. |

Allele Sets

The alleles in the evidence profile are carried by typed people declared to be contributors or unknown people, so that C is the combination (union) of sets T and U. For a given proposition, the probability of the evidence profile depends also on the alleles carried by people who have been typed but are declared by that proposition not to have contributed to the profile. For a proposition in which there are x unknown contributors, we write the probability as $P_x(T, U, V)$ in an extension of our previous notation (3). Note, however, that the present probability is for all the alleles in the sets T, U, V whereas the probability in (3) was for only the alleles in U conditional on those in T. In the total set of 2nC + 2nV = 2nT + 2x + 2nV alleles, we see from Table 1 that allele Ai occurs ci + vi = ti + ui + vi times. We add the probabilities over all possible nx = (c + r − 1)!/[(c − 1)!r!] distinct sets of ui. As listed in Table 1, c is the number of distinct alleles in Cg and r is the number of alleles carried by unknown people that can be any one of these c alleles.

Generating the nx sets U is a two-stage process. Some of the alleles in each set must be present: these are the alleles in the set Cg that are not in set Tg. Other alleles are not under this constraint because they already occur in Tg, and there are ri copies of Ai alleles in this unconstrained set. It is a straightforward computing task to let r1 range over the integers 0, 1, . . ., r, then let r2 range over the integers 0, 1, . . ., r − r1, then let r3 range over the integers 0, 1, . . ., r − r1 − r2, and so on. The final count rc is obtained by subtracting the sum of r1, r2, . . ., rc−1 from r. The total number of alleles in set U is $\sum_{i=1}^{c} u_i = 2x$, where ui = ri for those alleles in both Cg and Tg, and ui = ri + 1 for alleles in Cg but not in Tg.
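The nested ranges described above translate naturally into a recursive generator. This is a sketch under stated assumptions: the names `unconstrained_counts` and `allele_sets`, and the 0/1 vector `required` marking alleles that are in Cg but not in Tg, are introduced here for illustration only.

```python
def unconstrained_counts(c, r):
    """Yield every tuple (r_1, ..., r_c) of non-negative integers summing to r."""
    if c == 1:
        yield (r,)          # the final count is whatever remains
        return
    for first in range(r + 1):
        for rest in unconstrained_counts(c - 1, r - first):
            yield (first,) + rest


def allele_sets(required, r):
    """Yield the count vectors (u_1, ..., u_c) for every possible set U.

    required[i] is 1 if allele A_i is in Cg but not in Tg, so that an
    unknown contributor must carry it, and 0 otherwise.
    """
    for counts in unconstrained_counts(len(required), r):
        yield tuple(ri + req for ri, req in zip(counts, required))


# Single unknown contributor who must carry A3 and has one unconstrained allele.
print(list(allele_sets((0, 0, 1), 1)))
```

The number of tuples produced equals nx = (c + r − 1)!/[(c − 1)!r!], which gives a convenient sanity check on any implementation.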

For any ordering of the 2x = Σi ui alleles in U, successive pairs of alleles can be taken to represent genotypes and there are $(2x)!/\prod_{i=1}^{c} u_i!$ possible orderings. This is the number of possible sets of unknown genotypes that have each allelic set U. Although it is the genotypes that correspond to the x unknown people, it is the set of 2x alleles that we use to determine the probability, in combination with the 2nT + 2nV alleles among the known people. Because the nT typed people all have specified genotypes, we consider not all possible orderings of the 2nT alleles but just a factor of 2 for each heterozygote. Similarly, we need a factor of 2 for each heterozygote among the set of nV non-contributors (this corrects erroneous statements in (7)).

For the single-perpetrator rape example above, now writing alleles a, b, c as A1, A2, A3, the evidence sample set is Cg = (A1, A2, A3) and c = 3. Under Hd (the victim and one unknown person contributed to the mixed stain) the set from known people is T = (A1, A2) and nT = 1, t = 2. The set from the unknown person must contain A3 since c − t = 1, x = 1, r = 1, but can also contain any of the three alleles in set Cg: i.e., there are nx = 3 different sets of alleles from the unknown person. We also considered the situation where Hd is that the evidence stain was from two unknown people, x = 2 and no known contributors, nT = t = 0. Now U must contain all three alleles A1, A2, A3, c − t = 3, and the r = 1 other allele can be any of these three. There are nx = 3 different sets U. The counts of alleles A1, A2, A3 in these sets are, therefore, (2,1,1), (1,2,1), (1,1,2) and each of these can be ordered in 4!/(2!1!1!) = 12 ways.
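The ordering counts in this example are multinomial coefficients and can be reproduced in a few lines. The helper name `orderings` is an assumption made for this sketch.

```python
from math import factorial


def orderings(u):
    """Number of orderings (2x)! / prod(u_i!) of the alleles counted in u."""
    total = factorial(sum(u))
    for ui in u:
        total //= factorial(ui)   # each partial quotient is an exact integer
    return total


# Two unknown contributors: each of the three sets U has 12 orderings.
print([orderings(u) for u in [(2, 1, 1), (1, 2, 1), (1, 1, 2)]])

# One unknown contributor: multiplying by the factor 2 for the heterozygous
# victim gives the coefficients that appear in Eq 3.
print([2 * orderings(u) for u in [(1, 0, 1), (0, 1, 1), (0, 0, 2)]])
```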

Allele Dependencies

We now consider how to attach probabilities to the sets of alleles discussed in the last section. We suppose that a state of evolutionary equilibrium has been established, so that the probabilities of sets of alleles can be found from the Dirichlet distribution (13). This distribution depends on allele proportions and the coancestry coefficient θ. The statement that the relationship between pairs of alleles in a subpopulation can be quantified by the coancestry coefficient θ has several interpretations (12). Here we will take it to mean that the probability that two alleles taken at random from the subpopulation are both of type Ai is $p_i^2 + \theta p_i(1 - p_i)$, where pi is the allele frequency of Ai averaged over subpopulations. When allele frequencies over populations follow the Dirichlet distribution, the probability of a set of frequencies {pi} for alleles Ai is given by

$$\Pr(\{p_i\}) = \frac{\Gamma(\gamma_{\cdot})}{\prod_i \Gamma(\gamma_i)} \prod_i (p_i)^{\gamma_i - 1}$$

where

$$\gamma_i = (1-\theta)p_i/\theta, \qquad \gamma_{\cdot} = \sum_i \gamma_i = (1-\theta)/\theta$$

and Γ is the gamma function with the property Γ(x + 1) = xΓ(x). The great advantage of this Dirichlet distribution is that it allows the probability of any set of alleles to be found very simply. If the set has mi copies of Ai, then the probability is

$$\Pr\Big(\prod_i A_i^{m_i}\Big) = \frac{\Gamma(\gamma_{\cdot})}{\Gamma(m_{\cdot} + \gamma_{\cdot})} \prod_i \frac{\Gamma(m_i + \gamma_i)}{\Gamma(\gamma_i)} \tag{8}$$

where $m_{\cdot} = \sum_i m_i$. This is the result upon which Eqs 4.10 in the 1996 NRC report (2) are based (4).
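Because Γ(x + 1) = xΓ(x), each ratio of gamma functions in Eq 8 is a finite product, and the powers of θ cancel between numerator and denominator. A minimal sketch of that product form follows; the function name `allele_set_probability` is an assumption, and the code presumes 0 ≤ θ < 1.

```python
def allele_set_probability(counts, freqs, theta):
    """Probability of a set with counts[i] copies of allele A_i (Eq 8).

    Uses Gamma(g + m) / Gamma(g) = g (g + 1) ... (g + m - 1) with
    g_i = (1 - theta) p_i / theta; the theta powers cancel, so the
    expression is also valid at theta = 0.
    """
    numerator = 1.0
    for m, p in zip(counts, freqs):
        for j in range(m):
            numerator *= (1 - theta) * p + j * theta
    denominator = 1.0
    for j in range(sum(counts)):
        denominator *= (1 - theta) + j * theta
    return numerator / denominator


# Two copies of one allele with assumed frequency 0.2 and theta = 0.05:
# this should match p^2 + theta * p * (1 - p).
print(allele_set_probability((2,), (0.2,), 0.05))
```

Working with the product form avoids evaluating gamma functions at the very large arguments that arise when θ is small.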

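In our mixed-stain situation, there are ti + ui + vi copies of allele Ai, and the required probability is

$$P_x(T, U, V) = \sum_{\substack{0 \le r_i \le r \\ \sum_{i=1}^{c} r_i = r}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \times \frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 2x + 2n_T + 2n_V)} \prod_{i=1}^{c} \frac{\Gamma(\gamma_i + t_i + u_i + v_i)}{\Gamma(\gamma_i)} \tag{9}$$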

Summing over the {ri} values accounts for all nx sets U.

Although this is a very compact expression, implementing it in a computer program is easier after some expansion. From the properties of the gamma function Γ(·) and the definition of γi,

$$\frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 2x + 2n_T + 2n_V)} = \frac{\theta^{2x + 2n_T + 2n_V}}{\prod_{j=0}^{2x + 2n_T + 2n_V - 1} [(1-\theta) + j\theta]}$$

$$\frac{\Gamma(\gamma_i + t_i + u_i + v_i)}{\Gamma(\gamma_i)} = \frac{\prod_{j=0}^{t_i + u_i + v_i - 1} [(1-\theta)p_i + j\theta]}{\theta^{t_i + u_i + v_i}}$$

We can also make the summation over {ri} values more explicit by showing the range of values of each ri. Equation 9 becomes

$$P_x(T, U, V) = \sum_{r_1=0}^{r} \sum_{r_2=0}^{r - r_1} \cdots \sum_{r_{c-1}=0}^{r - r_1 - \cdots - r_{c-2}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \times \frac{\prod_{i=1}^{c} \prod_{j=0}^{t_i + u_i + v_i - 1} [(1-\theta)p_i + j\theta]}{\prod_{j=0}^{2x + 2n_T + 2n_V - 1} [(1-\theta) + j\theta]} \tag{10}$$
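Eq 10 can be transcribed almost line for line. The sketch below is not the authors' software: the function names, the argument layout and the allele frequencies are assumptions, and it presumes that every allele carried by a typed person is one of the c alleles in Cg.

```python
from math import factorial


def compositions(c, r):
    """Yield all tuples of c non-negative integers that sum to r."""
    if c == 1:
        yield (r,)
        return
    for first in range(r + 1):
        for rest in compositions(c - 1, r - first):
            yield (first,) + rest


def px(t, v, required, x, h_known, freqs, theta):
    """Sketch of Eq 10 for a single locus.

    t[i], v[i]  : copies of A_i among typed contributors / typed non-contributors
    required[i] : 1 if A_i is in Cg but not in Tg, else 0
    x           : number of unknown contributors
    h_known     : h_T + h_V, the heterozygotes among all typed people
    """
    r = 2 * x - sum(required)            # unconstrained alleles among unknowns
    if r < 0:
        return 0.0                       # too few unknowns to explain the profile
    denominator = 1.0
    for j in range(2 * x + sum(t) + sum(v)):
        denominator *= (1 - theta) + j * theta
    total = 0.0
    for rs in compositions(len(freqs), r):
        u = [ri + req for ri, req in zip(rs, required)]
        coefficient = factorial(2 * x) * 2 ** h_known
        for ui in u:
            coefficient //= factorial(ui)
        numerator = 1.0
        for ti, ui, vi, p in zip(t, u, v, freqs):
            for j in range(ti + ui + vi):
                numerator *= (1 - theta) * p + j * theta
        total += coefficient * numerator
    return total / denominator


# Single-perpetrator example: victim ab, suspect cc, sample abc.
freqs, theta = (0.1, 0.2, 0.3), 0.03     # assumed values for illustration
p_hp = px(t=(1, 1, 2), v=(0, 0, 0), required=(0, 0, 0), x=0,
          h_known=1, freqs=freqs, theta=theta)
p_hd = px(t=(1, 1, 0), v=(0, 0, 2), required=(0, 0, 1), x=1,
          h_known=1, freqs=freqs, theta=theta)
lr = p_hp / p_hd
print(lr)
```

Under Hp the suspect's alleles are counted in `t`, and under Hd they move to `v`; the total `t[i] + v[i]` for typed people is the same in both calls, which is why the factor of 2 per typed heterozygote cancels in the ratio.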

Likelihood ratios are formed as the ratios of two such probabilities, and we note that people declared to be contributors under one proposition may be declared to be non-contributors under the other. In other words, every person typed is declared to be either a contributor or a non-contributor. The number of people typed, and the alleles they carry among them, are the same for every proposition. For this reason, nT + nV, hT + hV and ti + vi will be the same in the probabilities for each proposition. The term $2^{h_T + h_V}$ will cancel out of the likelihood ratio, as will some of the terms in the products in the numerator and denominator of the right hand side of Eq 10.

If population structure is ignored, and θ is set to zero, Eq 10 reduces to

$$P_x(T, U, V) = \sum_{r_1=0}^{r} \sum_{r_2=0}^{r - r_1} \cdots \sum_{r_{c-1}=0}^{r - r_1 - \cdots - r_{c-2}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \prod_{i=1}^{c} p_i^{t_i + u_i + v_i}$$

This is equivalent to Eq 3 in our earlier treatment (3) and may be in a form more convenient for computation. Because of cancelation of terms in the likelihood ratio, it can be seen that nT, nV, hT, hV, ti, vi are not used when θ = 0. In this case the value of LR depends only on the numbers and frequencies of the alleles carried by unknown contributors. There is no need to consider the genotypes of typed people, whether or not they contribute to the evidence sample. This is different from the situation where population structure is taken into account, where the genotypes of all typed people are needed.

In the degenerate case where there are no typed people, contributors or non-contributors, ti = vi = 0, then ui = ri and the sum for θ = 0 is just a multinomial expansion:

$$P_x(U) = \Big(\sum_{i=1}^{c} p_i\Big)^{2x}$$

Examples

We now consider an example where the evidence sample Cg = (A1, A2, A3, A4) (c = 4) is known to be from two perpetrators but only one suspect, of type A1A2, has been apprehended. Proposition Hp is that this suspect and one unknown person were the contributors, so T = (A1, A2) (nT = 1, t = 2) and U has only one possibility (nx = 1): the two alleles A3A4. There are no known non-contributors, so V = ∅, nV = 0, where ∅ denotes the empty set. The probability under Hp is

$$P_1(\{A_1A_2\}, \{A_3A_4\}, \{\emptyset\}) = \frac{2!\,2^1}{1!\,1!} \frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 4)} \prod_{i=1}^{4} \frac{\Gamma(\gamma_i + 1)}{\Gamma(\gamma_i)} = \frac{4(1-\theta)^3 p_1 p_2 p_3 p_4}{(1+\theta)(1+2\theta)}$$

Proposition Hd is that there are no known contributors, T = ∅, nT = 0, there is one person known not to be a contributor, V = (A1, A2), nV = 1, and there are two unknown contributors who must carry all four alleles between them. Once again, there is only one possible set U = (A1, A2, A3, A4), nx = 1 and the probability is

$$P_2(\{\emptyset\}, \{A_1A_2A_3A_4\}, \{A_1A_2\}) = \frac{4!\,2^1}{1!\,1!\,1!\,1!} \frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 6)} \prod_{i=1}^{2} \frac{\Gamma(\gamma_i + 2)}{\Gamma(\gamma_i)} \prod_{i=3}^{4} \frac{\Gamma(\gamma_i + 1)}{\Gamma(\gamma_i)} = \frac{48(1-\theta)^3 p_1 p_2 p_3 p_4 [(1-\theta)p_1 + \theta][(1-\theta)p_2 + \theta]}{(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

The likelihood ratio for this example is, therefore,

$$\mathrm{LR} = \frac{(1+3\theta)(1+4\theta)}{12[(1-\theta)p_1 + \theta][(1-\theta)p_2 + \theta]}$$

which reduces to 1/(12p1p2) when θ = 0 as has been given previously (1,3).
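The cancellation behind this ratio is easy to lose in the notation. Written out as a worked step (a restatement of the two probabilities above, with nothing new assumed), the ratio of coefficients is 4/48 and every factor involving p3 and p4 drops out:

$$\mathrm{LR} = \frac{P_1}{P_2} = \frac{4(1-\theta)^3 p_1 p_2 p_3 p_4}{(1+\theta)(1+2\theta)} \times \frac{(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}{48(1-\theta)^3 p_1 p_2 p_3 p_4 [(1-\theta)p_1 + \theta][(1-\theta)p_2 + \theta]} = \frac{(1+3\theta)(1+4\theta)}{12[(1-\theta)p_1 + \theta][(1-\theta)p_2 + \theta]}$$

The frequencies of A3 and A4 do not appear in the result because each of those alleles occurs exactly once under both propositions.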

A more complicated example is for a rape committed by three men. Suppose that the evidence sample has alleles (A1, A2, A3, A4), the victim is of type A1A2 and a single suspect has type A3A3. Then two alternative propositions are Hp: “The victim, the suspect and two unknown men contributed to the sample,” and Hd: “The victim and three unknown men contributed to the sample.”

The evidence genetic profile has c = 4 alleles Cg = (A1, A2, A3, A4). Under proposition Hp there are t = 3 distinct alleles Tg = (A1, A2, A3) from two known contributors and no alleles from people known not to be contributors, V = ∅. For x = 2 unknown contributors, the number of sets of r = 3 alleles these people can carry in addition to the A4 allele they must have among them is n2 = 6!/(3!3!) = 20. The counts u1, u2, u3, u4 for all four alleles A1, A2, A3, A4 among the two unknown men, together with the multiplicities $[4!\,2^1]/[u_1!\,u_2!\,u_3!\,u_4!]$, are

0,0,0,4:2 0,0,1,3:8 0,0,2,2:12 0,0,3,1:8 0,1,0,3:8

0,1,1,2:24 0,1,2,1:24 0,2,0,2:12 0,2,1,1:24 0,3,0,1:8

1,0,0,3:8 1,0,1,2:24 1,0,2,1:24 1,1,0,2:24 1,1,1,1:48

1,2,0,1:24 2,0,0,2:12 2,0,1,1:24 2,1,0,1:24 3,0,0,1:8

Under proposition Hd there are t = 2 alleles, T = (A1, A2), from a known contributor (the victim) and two alleles V = (A3, A3) from a person (the suspect) known not to be a contributor. For x = 3 unknown contributors, the number of sets of r = 4 alleles these people can carry in addition to the A3, A4 alleles they must have among them is n3 = 7!/(4!3!) = 35. The counts u1, u2, u3, u4 for A1, A2, A3, A4, with coefficients $[6!\,2^1]/(u_1!\,u_2!\,u_3!\,u_4!)$, for the 35 possible sets are:

0,0,1,5:12 0,0,2,4:30 0,0,3,3:40 0,0,4,2:30 0,0,5,1:12

0,1,1,4:60 0,1,2,3:120 0,1,3,2:120 0,1,4,1:60 0,2,1,3:120

0,2,2,2:180 0,2,3,1:120 0,3,1,2:120 0,3,2,1:120 0,4,1,1:60

1,0,1,4:60 1,0,2,3:120 1,0,3,2:120 1,0,4,1:60 1,1,1,3:240

1,1,2,2:360 1,1,3,1:240 1,2,1,2:360 1,2,2,1:360 1,3,1,1:240

2,0,1,3:120 2,0,2,2:180 2,0,3,1:120 2,1,1,2:360 2,1,2,1:360

2,2,1,1:360 3,0,1,2:120 3,0,2,1:120 3,1,1,1:240 4,0,1,1:60

For each proposition, the multiplicities are multiplied by the appropriate Dirichlet probabilities and the 20 or 35 terms added together. Obviously this is a task better suited for a computer.
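The two listings of counts and multiplicities can be regenerated mechanically. The following sketch uses hypothetical helper names (`count_vectors`, `multiplicity`) and encodes the example's constraints: under Hp the unknowns must supply A4, and under Hd they must supply A3 and A4.

```python
from math import factorial


def count_vectors(required, r):
    """Yield (u_1, ..., u_c): required copies plus r freely assigned alleles."""
    def compositions(c, remaining):
        if c == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(c - 1, remaining - first):
                yield (first,) + rest

    for rs in compositions(len(required), r):
        yield tuple(ri + req for ri, req in zip(rs, required))


def multiplicity(u, h_known):
    """(2x)! 2^(hT + hV) / prod(u_i!) for one allele set."""
    coefficient = factorial(sum(u)) * 2 ** h_known
    for ui in u:
        coefficient //= factorial(ui)
    return coefficient


# The victim A1A2 is the only typed heterozygote, so hT + hV = 1 in both cases.
hp = {u: multiplicity(u, 1) for u in count_vectors((0, 0, 0, 1), 3)}
hd = {u: multiplicity(u, 1) for u in count_vectors((0, 0, 1, 1), 4)}
print(len(hp), len(hd))
for u, m in sorted(hd.items()):
    print(",".join(map(str, u)) + ":" + str(m))
```

Each multiplicity is then multiplied by the corresponding Dirichlet probability and the terms are summed, as described above.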

Multiple Subpopulations

So far we have considered the situation where all people involved in the evidence interpretation have been in the same subpopulation. Other situations are likely, especially when victim and suspect belong to different racial groups. The same sets of alleles are involved as before, but now the probabilities need to be calculated separately for the alleles within each subpopulation.

We begin by returning to our first example of a single-perpetrator rape where the victim was of type A1A2, the suspect was of type A3A3 and the evidence sample was A1A2A3. If there was reason to believe that the perpetrator was of the same racial type as the suspect, but of a different type from the victim, then the victim’s alleles need to be separated from those of the suspect and, under Hd, from the unknown perpetrator. Suppose that the victim belonged to racial group 1, with coancestry coefficient θ1 for her subpopulation and allele frequencies p1, p2 for A1, A2. Suppose also that the suspect and perpetrator belong to racial group 2, with coancestry coefficient θ2 for their subpopulation and allele frequencies q1, q2, q3 for alleles A1, A2, A3. Suppose, further, that there is zero coancestry between alleles in different racial groups so that alleles in groups 1 and 2 can be treated independently.

Under Hp, the probability is

$$P_0(\{A_1A_2A_3A_3\}, \{\emptyset\}, \{\emptyset\}) = 2(1-\theta_1)p_1 p_2 \times q_3[(1-\theta_2)q_3 + \theta_2]$$

since the pair A1A2 from group 1 and the pair A3A3 from group 2 are treated separately. Under Hd, one of the three components of $P_1(\{A_1A_2\}, U, \{A_3A_3\})$ is

$$P_1(\{A_1A_2\}, \{A_1A_3\}, \{A_3A_3\}) = 2(1-\theta_1)p_1 p_2 \times \frac{2(1-\theta_2)q_1 q_3 [(1-\theta_2)q_3 + \theta_2][(1-\theta_2)q_3 + 2\theta_2]}{(1+\theta_2)(1+2\theta_2)}$$

since the pair A1A2 from group 1 and the two pairs A1A3, A3A3 from group 2 are treated separately. Equation 6 is replaced by

$$\mathrm{LR} = \frac{(1+\theta_2)(1+2\theta_2)}{[(1-\theta_2)q_3 + 2\theta_2][(1-\theta_2)(2q_1 + 2q_2 + q_3) + 3\theta_2]}$$
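The two-group likelihood ratio can be cross-checked by building it from group 2 allele-set probabilities, since the victim's group 1 factor is common to both propositions and cancels. This is a sketch under assumptions: the function names and the frequencies q1, q2, q3 are illustrative only.

```python
def set_prob(counts, freqs, theta):
    """Dirichlet probability of a set with counts[i] copies of allele A_i."""
    numerator = 1.0
    for m, p in zip(counts, freqs):
        for j in range(m):
            numerator *= (1 - theta) * p + j * theta
    denominator = 1.0
    for j in range(sum(counts)):
        denominator *= (1 - theta) + j * theta
    return numerator / denominator


def lr_closed_form(q, theta2):
    """The two-subpopulation replacement for Eq 6."""
    q1, q2, q3 = q
    k = 1 - theta2
    return ((1 + theta2) * (1 + 2 * theta2)
            / ((k * q3 + 2 * theta2) * (k * (2 * q1 + 2 * q2 + q3) + 3 * theta2)))


def lr_from_components(q, theta2):
    """Same ratio from group 2 allele sets: suspect A3A3 versus suspect plus unknown."""
    numerator = set_prob((0, 0, 2), q, theta2)
    denominator = (2 * set_prob((1, 0, 3), q, theta2)    # unknown is A1A3
                   + 2 * set_prob((0, 1, 3), q, theta2)  # unknown is A2A3
                   + set_prob((0, 0, 4), q, theta2))     # unknown is A3A3
    return numerator / denominator


q = (0.1, 0.2, 0.3)   # assumed group 2 frequencies for A1, A2, A3
print(lr_closed_form(q, 0.02), lr_from_components(q, 0.02))
```

Neither function takes θ1 or the group 1 frequencies as arguments, which reflects the fact that the victim's alleles no longer inform the probability of the perpetrator's alleles.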

The general Eq 10 can be modified to allow for different subpopulations. However, when any of the three sets T, U, V contains alleles from different subpopulations, as was the case in the example just considered, it will be necessary to introduce further notation. Each of the counts ti, ui, vi would need to be split into a component for each subpopulation, and the multiplicity coefficients would also need to be derived separately for each subpopulation.

Discussion

We offer this treatment of the effects of population structure on DNA mixture calculations to complement two previous treatments—the effects of population structure on single stains (2,4) and the interpretation of mixed stains without population structure (1,3). Our study therefore closes a gap in current DNA forensic interpretation.

Our treatment is based firmly on the use of likelihood ratios and the accompanying need for conditional probabilities. There is no alternative when the evidence is less than certain under the proposition Hp. Conditional probabilities are necessary to incorporate the known genetic nature of DNA profiles. The full meaning of profiles cannot be found without accounting for the role of evolution in shaping the probabilities of sets of profiles. The novel feature of this study lies in accounting for the information contained in the profiles of people who are declared not to have contributed to the evidence profile. This has arisen for the situation of a suspect, who is not excluded from the evidence profile, being declared not to be a contributor under proposition Hd.

The arguments made for incorporating non-contributors can be extended. Several people may be typed during the course of an investigation. Even if they are excluded as being contributors, they provide information for the probability calculations when they can be considered to belong to the same subpopulation as (some of) people not excluded. They make their contribution to the calculation via allelic set V.

Our treatment has assumed a specific number of unknown contributors, but we realize that this number is very likely not to be known. Although some general statements about conservative assumptions can be made (3), such as assuming large numbers of unknown people for loci with few alleles and small numbers of unknown people for loci with many alleles, we prefer not to formulate rules. Instead we recommend the calculation of likelihood ratios under plausible ranges of numbers, and the reporting of the more conservative results.

We have not allowed for unseen, or “null,” alleles as has been done previously (2,3) because the move away from RFLP technology in forensic science has diminished the need for such a treatment. We have not considered other typing-system features such as intensity or peak height differences as these have been discussed elsewhere. However, we do consider that the approach described here is sufficiently flexible to allow the interpretation of many different mixed-stain DNA profiles.

Software for conducting the calculations described in this paper can be obtained directly from the World Wide Web page www.stat.ncsu.edu (click on “Statistical Genetics”) or by sending email to [email protected]

Acknowledgments

This work was supported in part by a postdoctoral fellowship from the New Zealand Foundation for Research in Science and Technology to JMC, and by NIH grant GM 45344 to North Carolina State University. John Storey wrote computer software to implement the calculations in the paper.

References

1. Evett IW, Buffery C, Wilcott G, Stoney D. A guide to interpreting single locus profiles of DNA mixtures in forensic cases. J Forensic Sci Soc 1991; 31:41–7.

2. National Research Council. DNA technology in forensic science. Washington, DC: National Academy Press 1992.

3. Weir BS, Triggs CM, Starling LI, Walsh KAJ, Buckleton J. Interpreting DNA mixtures. J Forensic Sci 1997;42:213–22.

4. Balding DJ, Nichols RA. DNA profile match probability calculations: How to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 1994;64:125–40.

5. Weir BS. The effects of inbreeding on forensic calculations. Ann Rev Genet 1994;28:597–621.

6. Aitken CGG. Statistics and the evaluation of evidence for forensic scientists. New York: Wiley 1995.

7. Evett IW, Weir BS. Interpreting DNA evidence: Statistical genetics for forensic science. Sunderland, MA; Sinauer 1998.

8. Faigman DL, Kaye DH, Saks MJ, Sanders J. Modern scientific evidence: The law and science of expert testimony. St. Paul, MN; West 1997.

9. Robertson B, Vignaux GA. Interpreting evidence: evaluating forensic science in the courtroom. Chichester, UK; Wiley 1995.

10. Royall R. Statistical evidence: A likelihood paradigm. London; Chapman and Hall 1997.

11. Schum DA. Evidential foundations of probabilistic reasoning. New York; Wiley 1994.

12. Weir BS. The coancestry coefficient in forensic science. Proc 8th Int Symp Human Identification. Madison, WI; Promega 1998.

13. Wright S. The genetical structure of populations. Ann Eugen 15:323–54.

Additional information and reprint requests: Bruce S. Weir, Ph.D. North Carolina State University Dept of Statistics PO Box 8203 Raleigh, NC 27695-8203

APPENDIX

In this Appendix we show the effects of population structure for each of the six common situations described in Chapter 7 of (7). A diagram for the profiles in each case is shown in Fig. 1, and in each case setting θ = 0 reduces the result to the one given in (7).

Case 1: Four-Allele Mixture, Heterozygous Victim, and Heterozygous Suspect

The victim is of type A3A4, the suspect is of type A1A2, and the crime sample of type A1A2A3A4. The two propositions are

Hp: The victim and the suspect contributed to the stain.

Hd: The victim and an unknown person contributed to the stain.

The evidence sample is C = Cg = (A1A2A3A4) and c = 4.

Under Hp, the alleles from known contributors are T = Tg = A1A2A3A4 and nT = 2, hT = 2, t = 4. There are no alleles from unknown contributors or from people declared not to be contributors, so nV = hV = 0.

Under Hd, the alleles from known contributors are T = Tg = A3A4 and nT = 1, hT = 1, t = 2. The alleles from unknown contributors are constrained to be U = A1A2 and x = 1, r = 0. The alleles from people declared not to be contributors are V = A1A2 and nV = 1, hV = 1.

The required probabilities are

$$H_p:\quad P_0(\{A_1A_2A_3A_4\},\emptyset,\emptyset) = \frac{2^2(1-\theta)^4p_1p_2p_3p_4}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\quad P_1(\{A_3A_4\},\{A_1A_2\},\{A_1A_2\}) = \frac{2^2\,2!\,(1-\theta)^4p_1p_2p_3p_4[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{2[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}$$

FIG. 1

Case 2: Three-Allele Mixture, Homozygous Victim, and Heterozygous Suspect

The victim is of type A3, the suspect is of type A1A2, and the crime sample of type A1A2A3. The two propositions are

Hp: The victim and the suspect contributed to the stain.

Hd: The victim and an unknown person contributed to the stain.

The evidence sample is C = (A1A2A3A3), so Cg = (A1A2A3) and c = 3.

Under Hp, the alleles from known contributors are Tg = A1A2A3 and nT = 2, hT = 1, t = 3. There are no alleles from unknown contributors or from people declared not to be contributors, so nV = hV = 0.

Under Hd, the allele from known contributors is Tg = A3 and nT = 1, hT = 0, t = 1. The alleles from the unknown contributor are constrained to include A1A2, and x = 1, r = 0. The alleles from the person declared not to be a contributor are V = A1A2 and nV = 1, hV = 1.

The required probabilities are

$$H_p:\quad P_0(\{A_1A_2A_3A_3\},\emptyset,\emptyset) = \frac{2^1(1-\theta)^3p_1p_2p_3[(1-\theta)p_3+\theta]}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\quad P_1(\{A_3A_3\},\{A_1A_2\},\{A_1A_2\}) = \frac{2^1\,2!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_3+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{2[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}$$

as it was for Case 1.

Case 3: Three-Allele Mixture, Heterozygous Victim, and Homozygous Suspect

The victim is of type A2A3, the suspect is of type A1, and the crime sample of type A1A2A3. The two propositions are

Hp: The victim and the suspect contributed to the stain.

Hd: The victim and an unknown person contributed to the stain.

The evidence sample is C = (A1A1A2A3), so Cg = (A1A2A3) and c = 3.

Under Hp, the alleles from known contributors are Tg = A1A2A3 and nT = 2, hT = 1, t = 3. There are no alleles from unknown contributors or from people declared not to be contributors, so nV = hV = 0.

Under Hd, the alleles from known contributors are Tg = A2A3 and nT = 1, hT = 1, t = 2. The alleles from the unknown contributor are constrained to include A1, and x = 1, r = 1. The unknown contributor may also carry alleles A1, A2 or A3. The alleles from the person declared not to be a contributor are V = A1A1, so nV = 1, hV = 0.

The required probabilities are

$$H_p:\quad P_0(\{A_1A_1A_2A_3\},\emptyset,\emptyset) = \frac{2^1(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta]}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$\begin{aligned}
H_d:\quad P_1(\{A_2A_3\},\{A_1A_?\},\{A_1A_1\})
&= \frac{2^1\,2!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_1+3\theta]}{2!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_2+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_3+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}
\end{aligned}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{[(1-\theta)p_1+2\theta][(1-\theta)(p_1+2p_2+2p_3)+7\theta]}$$

Case 4: Four-Allele Mixture, Heterozygous Suspect, and One Unknown

The suspect is of type A1A2, and the crime sample of type A1A2A3A4. The two propositions are

Hp: The suspect and an unknown person contributed to the stain.

Hd: Two unknown people contributed to the stain.

The evidence sample is C = Cg = (A1A2A3A4) and c = 4.

Under Hp, the alleles from known contributors are T = Tg = A1A2 and nT = 1, hT = 1, t = 2. There are two alleles A3A4 from unknown contributors, but no alleles from people declared not to be contributors, so nV = hV = 0.

Under Hd, there are no alleles from known contributors, so T = Tg = ∅ and nT = 0, hT = 0, t = 0. The alleles from unknown contributors are constrained to be U = A1A2A3A4 and x = 2, r = 0. The alleles from people declared not to be contributors are V = A1A2 and nV = 1, hV = 1.

The required probabilities are

$$H_p:\quad P_1(\{A_1A_2\},\{A_3A_4\},\emptyset) = \frac{2^1\,2!\,(1-\theta)^4p_1p_2p_3p_4}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\quad P_2(\emptyset,\{A_1A_2A_3A_4\},\{A_1A_2\}) = \frac{2^1\,4!\,(1-\theta)^4p_1p_2p_3p_4[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}{1!\,1!\,1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{12[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}$$

Case 5: Three-Allele Mixture, Heterozygous Suspect, and One Unknown

The suspect is of type A1A2, and the crime sample of type A1A2A3. The two propositions are

Hp: The suspect and one unknown person contributed to the stain.

Hd: Two unknown people contributed to the stain.

The evidence sample is C = (A1A2A3A?), where A? is a further copy of A1, A2 or A3, so Cg = (A1A2A3) and c = 3.

Under Hp, the alleles from known contributors are Tg = A1A2 and nT = 1, hT = 1, t = 2. The alleles from the unknown contributor are constrained to include A3 and may also include A1, A2 or A3, so x = 1, r = 1. There are no alleles from people declared not to be contributors, so nV = hV = 0.

Under Hd, there are no alleles from known contributors, so nT = 0, hT = 0, t = 0. The alleles from the two unknown contributors are constrained to include A1, A2 and A3, and x = 2, r = 1. The remaining allele may be A1, A2 or A3. The alleles from the person declared not to be a contributor are V = A1A2 and nV = 1, hV = 1.

The required probabilities are

$$\begin{aligned}
H_p:\quad P_1(\{A_1A_2\},\{A_3A_?\},\emptyset)
&= \frac{2^1\,2!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_2+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_3+\theta]}{2!\,(1-\theta)(1+\theta)(1+2\theta)}
\end{aligned}$$

$$\begin{aligned}
H_d:\quad P_2(\emptyset,\{A_1A_2A_3A_?\},\{A_1A_2\})
&= \frac{2^1\,4!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_2+\theta]}{2!\,1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,4!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_2+2\theta]}{1!\,2!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,4!\,(1-\theta)^3p_1p_2p_3[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_3+\theta]}{1!\,1!\,2!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}
\end{aligned}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)[(1-\theta)(2p_1+2p_2+p_3)+5\theta]}{12[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)(p_1+p_2+p_3)+5\theta]}$$

Case 6: Four-Allele Mixture, Two Heterozygous Suspects

The suspects are of type A1A2 and A3A4, and the crime sample is of type A1A2A3A4. The two propositions may be

Hp: The two suspects contributed to the stain.

Hd: Two unknown people contributed to the stain.

The evidence sample is C = Cg = (A1A2A3A4) and c = 4.

Under Hp, the alleles from known contributors are T = Tg = A1A2A3A4 and nT = 2, hT = 2, t = 4. There are no alleles from unknown contributors or from people declared not to be contributors, so nV = hV = 0.

Under Hd, there are no alleles from known contributors, so T = Tg = ∅ and nT = hT = t = 0. The alleles from unknown contributors are constrained to be U = A1A2A3A4 and x = 2, r = 0. The alleles from people declared not to be contributors are V = A1A2A3A4 and nV = 2, hV = 2.

The required probabilities are

$$H_p:\quad P_0(\{A_1A_2A_3A_4\},\emptyset,\emptyset) = \frac{2^2(1-\theta)^4p_1p_2p_3p_4}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\quad P_2(\emptyset,\{A_1A_2A_3A_4\},\{A_1A_2A_3A_4\}) = Q$$

where

$$Q = \frac{2^2\,4!\,(1-\theta)^4p_1p_2p_3p_4[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_3+\theta][(1-\theta)p_4+\theta]}{1!\,1!\,1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}{24[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_3+\theta][(1-\theta)p_4+\theta]}$$
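The closed-form likelihood ratios for the four-allele cases are simple enough to evaluate directly. The sketch below is illustrative only: the function names `lr_case1`, `lr_case4` and `lr_case6` are hypothetical, and the code simply transcribes the Appendix expressions for Cases 1 (and 2), 4 and 6 so that the effect of θ can be examined for chosen allele frequencies.

```python
from math import prod


def lr_case1(p1, p2, theta):
    """Cases 1 and 2: victim plus one unknown under Hd, heterozygous suspect A1A2."""
    return ((1 + 3 * theta) * (1 + 4 * theta)
            / (2 * ((1 - theta) * p1 + theta) * ((1 - theta) * p2 + theta)))


def lr_case4(p1, p2, theta):
    """Case 4: same form as Case 1 with 12 in place of 2 in the denominator."""
    return ((1 + 3 * theta) * (1 + 4 * theta)
            / (12 * ((1 - theta) * p1 + theta) * ((1 - theta) * p2 + theta)))


def lr_case6(p, theta):
    """Case 6: two heterozygous suspects; p is the sequence (p1, p2, p3, p4)."""
    numerator = prod(1 + k * theta for k in (3, 4, 5, 6))
    denominator = 24 * prod((1 - theta) * pi + theta for pi in p)
    return numerator / denominator
```

Setting `theta` to 0 in each function gives the product-rule results 1/(2p1p2), 1/(12p1p2) and 1/(24p1p2p3p4), in line with the statement that θ = 0 recovers the expressions in (7).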


Our previous treatment (3) assumed independence of all the alleles in the mixed profile. This means independence within individuals, implying Hardy-Weinberg and linkage equilibrium, as well as independence between individuals, meaning that the contributors are unrelated. Although these assumptions may be adequate in many situations, they ignore the low-level dependence among alleles within the same population due to evolutionary forces. Two people within the same population must have common ancestors at some point in the past, the point being closer for smaller populations, and this imposes a dependence between their alleles. A necessary corollary to this evolutionary relationship is the low degree of inbreeding among offspring of two parents from the same population. It is this logic that leads to the necessity of working with conditional profile probabilities rather than the profile probabilities themselves, and it is what led to Recommendation 4.2 of the second NRC report (2). Instead of determining the probability of finding a profile in a random member of a population, it is necessary to determine the probability of finding the profile given that the profile has been seen once already. Conditional probabilities take explicit account of allelic dependencies.

1 Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh, NC.

2 Department of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand.

3 ESR, Private Bag 92021, Auckland, New Zealand.

Received 3 Feb. 1998; and in revised form 24 Aug. 1998; accepted 15 Dec. 1998.

In this paper we extend our previous treatment to allow for the dependencies among all the alleles carried by the contributors to the mixture. Initially we will assume that all contributors belong to the same population, as this is likely to maximize the effects we are considering. We will also adopt the relatively simple formulation for the probabilities of sets of alleles advocated by Balding and Nichols (4). Less restrictive treatments (5) would be unwieldy. Although we do not expect the population structure effects that we are considering will be substantial, we believe that they should be considered for mixed DNA stains to the same extent that they are considered for single stains.

Likelihood Ratios

Likelihood ratios have been recognized by authors of several recent books as the appropriate way of interpreting evidence (6–11). At a trial there will be alternative hypotheses or propositions about who contributed to this evidence: the prosecution will have proposition Hp, and here we will suppose there is a single alternative proposition Hd. The likelihood ratio LR is

$$LR = \frac{\Pr(\text{Evidence} \mid H_p)}{\Pr(\text{Evidence} \mid H_d)}$$

The DNA evidence E for mixed-stain cases is the set of alleles found among all the people who have either been typed directly or whose type is inferred because they are considered to have contributed to the stain. Previously (3) we took E to mean only the alleles in the stain, but the addition now of the alleles from people who may have been typed even though they are hypothesized not to have contributed to the stain is necessary to allow for the effects of population structure.

We will make a distinction between the genetic profile, which is simply a listing of the distinct alleles in the mixture, and the statistical profile which is a list of all 2n alleles when there are n contributors. These two profiles will be different whenever some contributors are homozygous, or when some contributors share alleles. We will ignore the possibility of null alleles so that only homozygous individuals contribute a single allele to a genetic profile.

We will use much of our previous notation (3), and repeat our observation that the interpretation of a mixed stain genetic profile requires a specification of the known contributors to the profile and of the number of unknown contributors. We will derive results for single loci and then multiply likelihood ratios over loci.

As an example, suppose the evidentiary sample in a single-perpetrator rape case shows three alleles a, b, c at some locus. The sample was recovered from the victim’s person, she was found to be of type ab and a suspect was found to be of type c. The prosecution proposition is likely to be Hp: “The victim and the suspect were the only contributors to the sample,” and a likely alternative proposition is Hd: “The victim and some unknown man were the only contributors to the sample.” The usual solution (2,3) for this situation is

$$LR = \frac{1}{p_c(2p_a+2p_b+p_c)} \tag{1}$$

where the p’s are the allele frequencies. We now derive this result from the perspective of this paper, first with population structure ignored.

Under proposition Hp only the victim and suspect are involved and they have both been typed. The DNA evidence is therefore the genotype pair (ab, cc). We write the probability of this pair as Pr(ab, cc) = 2 Pr(abcc). The approach we are taking assigns probabilities to sets of alleles without regard to the arrangement of alleles among individuals, but we do need a factor of “2” for the heterozygous victim. Had the victim been ab and the suspect bc we would have required the probability 4 Pr(abbc) since there are then two heterozygotes. When population structure is ignored, as it was previously (3), the probability of a set of alleles is just the product of frequencies of the separate alleles, so Pr(abcc) = pa pb pc². The numerator of LR is, therefore,

$$\Pr(E \mid H_p) = 2p_ap_bp_c^2 \tag{2}$$

Note that, because the victim and suspect are both known individuals, there is no need to consider the 2! orders of these two people as was erroneously done in the first printing of (7).

Under proposition Hd there are three people to consider: the suspect of genotype cc who did not contribute to the sample, and the victim of type ab plus the perpetrator of unknown genotype who both did contribute to the sample. Examination of the profiles of the sample and the victim shows that the unknown man must have allele c and may also have alleles a, b or c. There are a total of six alleles in E, and the probability is Pr(ab, cc, ac) + Pr(ab, cc, bc) + Pr(ab, cc, cc) or 4 Pr(aabccc) + 4 Pr(abbccc) + 2 Pr(abcccc). The denominator of LR is

$$\Pr(E \mid H_d) = 4p_a^2p_bp_c^3 + 4p_ap_b^2p_c^3 + 2p_ap_bp_c^4 \tag{3}$$

The factors of 2 or 4 are because of the one or two heterozygotes. Dividing Eq 2 by Eq 3 leads to the previously known result given in Eq 1.
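The enumeration behind Eqs 2 and 3 can be checked numerically. The sketch below is illustrative only: the function name `lr_victim_plus_unknown` and the tuple encoding of genotypes are assumptions introduced here, not part of the authors' software. It sums over the genotypes the unknown man could have, with population structure ignored.

```python
from itertools import combinations_with_replacement
from math import prod


def lr_victim_plus_unknown(pa, pb, pc):
    """LR for victim ab, suspect cc and stain abc, ignoring population structure."""
    freq = {"a": pa, "b": pb, "c": pc}

    def pr_set(alleles):
        # With independent alleles, the probability of a set is the product of frequencies.
        return prod(freq[allele] for allele in alleles)

    def het_factor(genotype):
        # A factor of 2 for a heterozygote, 1 for a homozygote.
        return 2 if genotype[0] != genotype[1] else 1

    victim, suspect = ("a", "b"), ("c", "c")
    known = het_factor(victim) * het_factor(suspect)

    # Eq 2: only the two typed people are involved under Hp.
    numerator = known * pr_set(victim + suspect)

    # Eq 3: under Hd the unknown man must carry c; his other allele is a, b or c.
    denominator = 0.0
    for unknown in combinations_with_replacement("abc", 2):
        if "c" in unknown:
            denominator += known * het_factor(unknown) * pr_set(victim + suspect + unknown)
    return numerator / denominator
```

For any allele frequencies the returned value should agree with Eq 1, 1/[pc(2pa + 2pb + pc)], up to floating-point rounding.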

It will be helpful to modify this example before proceeding further. Suppose the profiles are the same as just discussed, except that now the sample is not from the victim’s person (e.g., it may be from discarded clothing) and the alternative to Hp is specified as Hd: “Two unknown people were the contributors to the sample.” Under this proposition, there are four people involved: the victim and suspect, neither of whom contributed to the sample, and two unknown people who were the contributors. These last two people must have alleles abc between them but cannot have any other alleles. The possible combinations of genotypes for the unknown people are (aa, bc), (ab, ac), (ab, bc), (ab, cc), (bb, ac), (ac, ab), (ac, bb), (ac, bc), (bc, aa), (bc, ab), (bc, ac), and (cc, ab). These 12 combinations represent three distinct sets of alleles: aabc, abbc, abcc, and each set has a coefficient of 12 which is the number of ways of arranging the four alleles into two different genotypes. The coefficient includes the effects of the two orders of alleles within heterozygotes as well as the two orders of different genotypes such as aa, bc and bc, aa. The probabilities of all eight alleles among the four people involved are obtained by multiplying the probabilities 12 Pr(aabc), 12 Pr(abbc), 12 Pr(abcc) by the probability Pr(ab, cc) = 2 Pr(abcc) of the victim and suspect, and can be written as 24 Pr(aaabbccc) + 24 Pr(aabbbccc) + 24 Pr(aabbcccc) so that

$$\Pr(E \mid H_d) = 24p_a^3p_b^2p_c^3 + 24p_a^2p_b^3p_c^3 + 24p_a^2p_b^2p_c^4 \tag{4}$$

Dividing Eq 2 by Eq 4 gives the LR for this situation as

$$LR = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} = \frac{1}{12p_ap_bp_c(p_a+p_b+p_c)} \tag{5}$$

as has been given before (3).

We now modify the solutions in Eqs 1 and 5 to accommodate the situation where all people, the victim, the suspect and (under Hd) the unknown person(s), belong to the same subpopulation. Probabilities for the genotype(s) of the unknown person(s) must take into account the knowledge that two people in this subpopulation have been found to have genotypes ab and cc.

For both scenarios, Hp is that only the victim and suspect were the contributors to the sample. We will show that the required term Pr(abcc) is given by

$$\Pr(abcc) = \frac{[(1-\theta)p_a][(1-\theta)p_b][(1-\theta)p_c][(1-\theta)p_c+\theta]}{(1-\theta)(1)(1+\theta)(1+2\theta)}$$

where θ is the coancestry coefficient in the subpopulation to which the victim and suspect both belong.

For the denominator in the first scenario, which is that the victim and an unknown person contributed to the sample but the suspect did not, there are three people and six alleles to consider. We will show, for example, that

$$\Pr(aabccc) = \frac{[(1-\theta)p_a][(1-\theta)p_a+\theta][(1-\theta)p_b][(1-\theta)p_c][(1-\theta)p_c+\theta][(1-\theta)p_c+2\theta]}{(1-\theta)(1)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

where θ is the coancestry coefficient for the subpopulation to which all three people belong. These expressions lead to

$$LR = \frac{(1+3\theta)(1+4\theta)}{[(1-\theta)p_c+2\theta][(1-\theta)(2p_a+2p_b+p_c)+7\theta]} \tag{6}$$

which reduces correctly to Eq 1 when θ = 0.

For the second scenario, where both contributors to the sample are unknown under Hd, we need terms such as Pr(aaabbccc), and we will show that

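$$\Pr(aaabbccc) = \frac{X_aX_bX_c}{Y}$$

where

$$\begin{aligned}
X_a &= [(1-\theta)p_a][(1-\theta)p_a+\theta][(1-\theta)p_a+2\theta] \\
X_b &= [(1-\theta)p_b][(1-\theta)p_b+\theta] \\
X_c &= [(1-\theta)p_c][(1-\theta)p_c+\theta][(1-\theta)p_c+2\theta] \\
Y &= (1-\theta)(1)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)
\end{aligned}$$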

so that LR becomes

$$LR = \frac{(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}{12[(1-\theta)p_a+\theta][(1-\theta)p_b+\theta][(1-\theta)p_c+2\theta][(1-\theta)(p_a+p_b+p_c)+7\theta]} \tag{7}$$

and this reduces to Eq 5 when θ = 0.

What is the numerical effect of using Eq 6 instead of Eq 1? When allele frequencies are all relatively small at 0.1 and θ has the relatively high value of 0.03, the LR drops from 20 to 12.33. Multiplying values from Eq 6 over several loci can give quite large LR values, but they will be less than those from Eq 1 in which population structure is ignored.
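The allele-set probabilities used above all follow one pattern, which a short function can capture. The sketch below is a minimal illustration, assuming the set-probability form shown for Pr(abcc) and Pr(aabccc); the names `pr_allele_set`, `lr_first_scenario` and `lr_eq6` are hypothetical. It builds the first-scenario LR from the allele sets and also transcribes the closed form of Eq 6 so that the two can be compared for any chosen frequencies and θ.

```python
from collections import Counter


def pr_allele_set(alleles, freq, theta):
    """Probability of a set of alleles such as "aabccc", allowing for coancestry theta."""
    numerator = 1.0
    for allele, copies in Counter(alleles).items():
        # Copy number j (counting from 0) of an allele contributes (1 - theta) * p + j * theta.
        for j in range(copies):
            numerator *= (1 - theta) * freq[allele] + j * theta
    denominator = 1.0
    # The denominator terms run (1 - theta), 1, (1 + theta), (1 + 2 theta), ...
    for j in range(len(alleles)):
        denominator *= (1 - theta) + j * theta
    return numerator / denominator


def lr_first_scenario(pa, pb, pc, theta):
    """Victim ab plus one unknown under Hd; suspect cc is then a typed non-contributor."""
    freq = {"a": pa, "b": pb, "c": pc}
    numerator = 2 * pr_allele_set("abcc", freq, theta)
    denominator = (4 * pr_allele_set("aabccc", freq, theta)
                   + 4 * pr_allele_set("abbccc", freq, theta)
                   + 2 * pr_allele_set("abcccc", freq, theta))
    return numerator / denominator


def lr_eq6(pa, pb, pc, theta):
    """Closed form of Eq 6."""
    return ((1 + 3 * theta) * (1 + 4 * theta)
            / (((1 - theta) * pc + 2 * theta)
               * ((1 - theta) * (2 * pa + 2 * pb + pc) + 7 * theta)))
```

With `theta` set to 0 the function `pr_allele_set` returns the plain product of allele frequencies, so `lr_first_scenario` then reproduces Eq 1.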

The approach we have just illustrated is as follows. Alternative propositions are needed that specify the numbers of contributors to the evidentiary sample. Some of these contributors will be known and typed people, and some will be unknown people. Those contributors, together with any typed people who are known (under the proposition) not to be contributors, contain among them a set of alleles whose probability can be written down as the product of the separate allele proportions or as a more complicated function that incorporates the population structure parameter θ. There is also a factor of 2 for each known heterozygote, and a term for the number of ways of arranging all 2x alleles from x unknown people into pairs. There may be different sets of alleles from unknown people under some propositions, and the probabilities for these sets must be added together. The likelihood ratio is the ratio of probabilities under alternative propositions. As additional examples, we list the results for each of the common cases described in (7) in the Appendix.

Although it is possible to follow the above line of argument for any situation, we prefer to work with a general approach amenable to automatic (computer-based) calculation as we did previously (3). This will relieve the forensic scientist of the need for lengthy calculations in the same way that computer programs such as POPSTATS can be used for other DNA calculations. We will lay out the logic behind this general approach even though we anticipate the routine use of computer packages.

In order to do this we need to break the problem into two parts; we list the alleles, with their multiplicities, carried by the unknown contributors under Hp or Hd, and then we determine the probabilities of the allele sets. The two probabilities lead to the likelihood ratio. It is our use of the theory in (4) that allows us to concentrate on alleles rather than genotypes.

Notation

Much of the complexity in dealing with mixtures can be removed by a mnemonic notation, as laid out in Table 1. We find it very helpful to label the alleles at a locus A by the letters Ai. There are sets of alleles (not necessarily distinct: the statistical profiles) that occur in the crime sample (C). For a particular proposition there are alleles (T) carried by typed people declared to be contributors and alleles (U) carried by unknown contributors to the sample, and there are alleles (V) carried by any people declared not to have contributed to the sample. There are corresponding sets of distinct alleles (the genetic profiles), and these sets are indicated by a g subscript. Note that the same person may be declared to be a contributor to the sample under one proposition, and declared not to be a contributor under another proposition.

TABLE 1. Notation for mixture calculations.

| Symbol | Definition |
|---|---|
| | *Alleles in the profile of the evidence sample.* |
| C | The set of alleles in the evidence profile. |
| Cg | The set of distinct alleles in the evidence profile. |
| nC | The known number of contributors to C. |
| hC | The unknown number of heterozygous contributors. |
| c | The known number of distinct alleles in Cg. |
| ci | The unknown number of copies of allele Ai in C. 1 ≤ ci ≤ 2nC, Σ(i=1..c) ci = 2nC. |
| | *Alleles from typed people that H declares to be contributors.* |
| T | The set of alleles carried by the declared contributors to C. |
| Tg | The set of distinct alleles carried by the declared contributors. |
| nT | The known number of declared contributors to C. |
| hT | The known number of heterozygous declared contributors. |
| t | The known number of distinct alleles in Tg carried by nT declared contributors. |
| ti | The known number of copies of allele Ai in T. 0 ≤ ti ≤ 2nT, Σ(i=1..c) ti = 2nT. |
| | *Alleles from unknown people that H declares to be contributors.* |
| U | The sets of alleles carried by the unknown contributors to C. |
| x | The specified number of unknown contributors to C: nC = nT + x. |
| c − t | The known number of alleles that are required to be in U. |
| r | The known number of alleles in U that can be any allele in Cg, r = 2x − (c − t). |
| nx | The number of different sets of alleles U, nx = (c + r − 1)!/[(c − 1)!r!]. |
| ri | The unknown number of copies of Ai among the r unconstrained alleles in U. 0 ≤ ri ≤ r, Σ(i=1..c) ri = r. |
| ui | The unknown number of copies of Ai in U: ci = ti + ui, Σ(i=1..c) ui = 2x. If Ai is in Cg but not in Tg: ui = ri + 1. If Ai is in Cg and also in Tg: ui = ri. |
| | *Alleles from typed people that H declares to be non-contributors.* |
| V | The set of alleles carried by typed people declared not to be contributors to C. |
| nV | The known number of people declared not to be contributors to C. |
| hV | The known number of heterozygous declared non-contributors. |
| vi | The known number of copies of Ai in V: Σi vi = 2nV. |

Allele Sets

The alleles in the evidence profile are carried by typed people declared to be contributors or unknown people, so that C is the combination (union) of sets T and U. For a given proposition, the probability of the evidence profile depends also on the alleles carried by people who have been typed but are declared by that proposition not to have contributed to the profile. For a proposition in which there are x unknown contributors, we write the probability as Px(T, U, V) in an extension of our previous notation (3). Note, however, that the present probability is for all the alleles in the sets T, U, V whereas the probability in (3) was for only the alleles in U conditional on those in T. In the total set of 2nC + 2nV = 2nT + 2x + 2nV alleles, we see from Table 1 that allele Ai occurs ci + vi = ti + ui + vi times. We add the probabilities over all possible nx = (c + r − 1)!/[(c − 1)!r!] distinct sets of ui. As listed in Table 1, c is the number of distinct alleles in Cg and r is the number of alleles carried by unknown people that can be any one of these c alleles.

Generating the nx sets U is a two-stage process. Some of the alleles in each set must be present: these are the alleles in the set Cg that are not in set Tg. Other alleles are not under this constraint because they already occur in Tg, and there are ri copies of Ai alleles in this unconstrained set. It is a straightforward computing task to let r1 range over the integers 0, 1, . . ., r, then let r2 range over the integers 0, 1, . . ., r − r1, then let r3 range over the integers 0, 1, . . ., r − r1 − r2, and so on. The final count rc is obtained by subtracting the sum of r1, r2, . . ., rc−1 from r. The total number of alleles in set U is Σ(i=1..c) ui = 2x where ui = ri for those alleles in both Cg and Tg, and ui = ri + 1 for alleles in Cg but not in Tg.
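The two-stage generation of the sets U can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' program: the function name `unknown_allele_counts` is hypothetical, and `itertools.combinations_with_replacement` is used in place of the nested loops over r1, r2, . . . because it visits the same assignments of the ri.

```python
from itertools import combinations_with_replacement


def unknown_allele_counts(c_g, t_g, x):
    """Return every allelic set U as a dict {allele: u_i} for x unknown contributors."""
    # Stage 1: the c - t alleles in Cg but not in Tg must be present in U.
    required = [allele for allele in c_g if allele not in t_g]
    # Stage 2: the remaining r alleles can be any allele in Cg.
    r = 2 * x - len(required)
    if r < 0:
        return []  # too few unknown people to explain the evidence profile
    sets_u = []
    for free in combinations_with_replacement(c_g, r):
        u = {allele: (1 if allele in required else 0) for allele in c_g}
        for allele in free:
            u[allele] += 1  # this allele's r_i goes up by one
        sets_u.append(u)
    return sets_u
```

For example, `unknown_allele_counts(["A1", "A2", "A3"], ["A1", "A2"], 1)` lists the three sets available to a single unknown person who must carry A3, and the number of sets returned equals nx = (c + r − 1)!/[(c − 1)!r!].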

For any ordering of the 2x = Σi ui alleles in U, successive pairs of alleles can be taken to represent genotypes and there are (2x)!/(∏i ui!) possible orderings. This is the number of possible sets of unknown genotypes that have each allelic set U. Although it is the genotypes that correspond to the x unknown people, it is the set of 2x alleles that we use to determine the probability, in combination with the 2nT + 2nV alleles among the known people. Because the nT typed people all have specified genotypes, we consider not all possible orderings of the 2nT alleles but just a factor of 2 for each heterozygote. Similarly, we need a factor of 2 for each heterozygote among the set of nV non-contributors (this corrects erroneous statements in (7)).

For the single-perpetrator rape example above, now writing alleles a, b, c as $A_1, A_2, A_3$, the evidence sample set is $C_g = (A_1, A_2, A_3)$ and $c = 3$. Under $H_d$ (the victim and one unknown person contributed to the mixed stain) the set from known people is $T = (A_1, A_2)$ and $n_T = 1$, $t = 2$. The set from the unknown person must contain $A_3$ since $c - t = 1$, $x = 1$, $r = 1$, but can also contain any of the three alleles in set $C_g$: i.e., there are $n_x = 3$ different sets of alleles from the unknown person. We also considered the situation where $H_d$ is that the evidence stain was from two unknown people, $x = 2$, and no known contributors, $n_T = t = 0$. Now $U$ must contain all three alleles $A_1, A_2, A_3$, $c - t = 3$, and the $r = 1$ other allele can be any of these three. There are $n_x = 3$ different sets $U$. The counts of alleles $A_1, A_2, A_3$ in these sets are, therefore, (2,1,1), (1,2,1), (1,1,2) and each of these can be ordered in $4!/(2!\,1!\,1!) = 12$ ways.
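The nested ranges described above can be sketched in a few lines of Python. This is only an illustration of the enumeration, not the software used for the paper; the function names `free_counts`, `unknown_allele_sets` and `orderings`, and the boolean flag list `in_tg`, are hypothetical names introduced here.

```python
from math import comb, factorial, prod


def free_counts(r, c):
    """Yield (r_1, ..., r_c), non-negative integers summing to r.

    r_1 runs over 0..r, r_2 over 0..r - r_1, and so on; the last
    count is whatever remains, as in the text.
    """
    if c == 1:
        yield (r,)
        return
    for first in range(r + 1):
        for rest in free_counts(r - first, c - 1):
            yield (first,) + rest


def unknown_allele_sets(in_tg, x):
    """Yield the count vectors u for x unknown people.

    in_tg[i] is True when allele i of C_g already occurs in T_g
    (hypothetical input format); other alleles are forced into U.
    """
    forced = [0 if known else 1 for known in in_tg]
    r = 2 * x - sum(forced)
    if r < 0:  # too few unknown people to explain the profile
        return
    for free in free_counts(r, len(in_tg)):
        yield tuple(f + m for f, m in zip(free, forced))


def orderings(u):
    """Number of orderings (2x)! / prod(u_i!) of the alleles in U."""
    return factorial(sum(u)) // prod(factorial(k) for k in u)


# Victim A1A2 typed, one unknown person: A3 is forced, r = 1.
one_unknown = list(unknown_allele_sets([True, True, False], x=1))
# No typed contributors, two unknown people: all three alleles forced, r = 1.
two_unknown = list(unknown_allele_sets([False, False, False], x=2))
```

Both calls return three sets, in agreement with $n_x = (c + r - 1)!/[(c - 1)!\,r!]$ for $c = 3$, $r = 1$, and each set in `two_unknown` has 12 orderings.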

Allele Dependencies

We now consider how to attach probabilities to the sets of alleles discussed in the last section. We suppose that a state of evolutionary equilibrium has been established, so that the probabilities of sets of alleles can be found from the Dirichlet distribution (13). This distribution depends on allele proportions and the coancestry coefficient $\theta$. The statement that the relationship between pairs of alleles in a subpopulation can be quantified by the coancestry coefficient has several interpretations (12). Here we will take it to mean that the probability that two alleles taken at random from the subpopulation are both of type $A_i$ is $p_i^2 + \theta p_i(1 - p_i)$, where $p_i$ is the allele frequency of $A_i$ averaged over subpopulations. When allele frequencies over populations follow the Dirichlet distribution, the probability of a set of frequencies $\{p_i\}$ for alleles $A_i$ is given by

$$\Pr(\{p_i\}) = \frac{\Gamma(\gamma_{.})}{\prod_i \Gamma(\gamma_i)} \prod_i p_i^{\gamma_i - 1}$$

where

$$\gamma_i = (1 - \theta)p_i/\theta, \qquad \gamma_{.} = \sum_i \gamma_i = (1 - \theta)/\theta$$

and $\Gamma$ is the gamma function with the property $\Gamma(x + 1) = x\Gamma(x)$. The great advantage of this Dirichlet distribution is that it allows the probability of any set of alleles to be found very simply. If the set has $m_i$ copies of $A_i$, then the probability is

$$\Pr\Bigl(\prod_i A_i^{m_i}\Bigr) = \frac{\Gamma(\gamma_{.})}{\Gamma(m_{.} + \gamma_{.})} \prod_i \frac{\Gamma(m_i + \gamma_i)}{\Gamma(\gamma_i)} \qquad (8)$$

where $m_{.} = \sum_i m_i$. This is the result upon which Eqs 4.10 in the 1996 NRC report (2) are based (4).
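As a minimal sketch of Eq 8, the probability of a set of alleles can be evaluated with log-gamma functions from the Python standard library. The function name `allele_set_probability` and the numerical values of $p_i$ and $\theta$ are illustrative assumptions, not taken from the paper. The sketch requires $0 < \theta < 1$ because $\gamma_i$ is undefined at $\theta = 0$.

```python
from math import exp, lgamma


def allele_set_probability(counts, freqs, theta):
    """Eq 8 for an ordered set with counts[i] copies of allele i.

    freqs[i] is the frequency p_i of allele i. Alleles that do not
    appear in the set can be omitted, since they contribute a factor 1.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    gamma_dot = (1.0 - theta) / theta            # sum of all gamma_i
    log_p = lgamma(gamma_dot) - lgamma(sum(counts) + gamma_dot)
    for m, p in zip(counts, freqs):
        gamma_i = (1.0 - theta) * p / theta
        log_p += lgamma(m + gamma_i) - lgamma(gamma_i)
    return exp(log_p)


# Two copies of one allele: should equal p^2 + theta * p * (1 - p).
p, theta = 0.1, 0.03                              # illustrative values only
pair_same = allele_set_probability([2], [p], theta)
pair_check = p ** 2 + theta * p * (1 - p)
```

Working on the log scale avoids overflow in $\Gamma(\gamma_{.})$ when $\theta$ is small, since $\gamma_{.} = (1 - \theta)/\theta$ then becomes large.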

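In our mixed-stain situation, there are $t_i + u_i + v_i$ copies of allele $A_i$, and the required probability is

$$P_x(T, U, V) = \sum_{\substack{0 \le r_i \le r \\ \sum_{i=1}^{c} r_i = r}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \times \frac{\Gamma(\gamma_{.})}{\Gamma(\gamma_{.} + 2x + 2n_T + 2n_V)} \prod_{i=1}^{c} \frac{\Gamma(\gamma_i + t_i + u_i + v_i)}{\Gamma(\gamma_i)} \qquad (9)$$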

Summing over the $\{r_i\}$ values accounts for all $n_x$ sets $U$. Although this is a very compact expression, implementing it in a computer program is easier after some expansion. From the properties of the gamma function $\Gamma(\cdot)$ and the definition of $\gamma_i$

$$\frac{\Gamma(\gamma_{.})}{\Gamma(\gamma_{.} + 2x + 2n_T + 2n_V)} = \frac{\theta^{2x + 2n_T + 2n_V}}{\prod_{j=0}^{2x + 2n_T + 2n_V - 1} [(1 - \theta) + j\theta]}$$

$$\frac{\Gamma(\gamma_i + t_i + u_i + v_i)}{\Gamma(\gamma_i)} = \frac{\prod_{j=0}^{t_i + u_i + v_i - 1} [(1 - \theta)p_i + j\theta]}{\theta^{t_i + u_i + v_i}}$$

We can also make the summation over $\{r_i\}$ values more explicit by showing the range of values of each $r_i$. Equation 9 becomes

$$P_x(T, U, V) = \sum_{r_1=0}^{r} \sum_{r_2=0}^{r-r_1} \cdots \sum_{r_{c-1}=0}^{r-r_1-\cdots-r_{c-2}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \times \frac{\prod_{i=1}^{c} \prod_{j=0}^{t_i + u_i + v_i - 1} [(1 - \theta)p_i + j\theta]}{\prod_{j=0}^{2x + 2n_T + 2n_V - 1} [(1 - \theta) + j\theta]} \qquad (10)$$
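A hedged sketch of how Eq 10 might be coded for a single locus is given below. It is not the software referred to at the end of the paper. The function names and argument layout are assumptions made for illustration, and the sketch assumes that the list `p` contains exactly the alleles of $C_g$ and that every allele carried by a typed person is among them. The usage lines evaluate the four-allele, two-perpetrator example from the Examples section with illustrative frequencies.

```python
from itertools import product
from math import factorial, prod


def rising(base, n, theta):
    """prod_{j=0}^{n-1} (base + j * theta); equals 1 when n == 0."""
    return prod(base + j * theta for j in range(n))


def mixture_probability(p, t, v, x, h_t, h_v, theta):
    """Evaluate Eq 10 for one locus.

    p[i]: frequency of allele i of C_g (assumed to list all of C_g);
    t[i], v[i]: copies of allele i among typed contributors and typed
    non-contributors; x: number of unknown contributors;
    h_t, h_v: numbers of heterozygotes in T and V.
    """
    forced = [1 if ti == 0 else 0 for ti in t]    # alleles only unknowns can supply
    r = 2 * x - sum(forced)
    if r < 0:                                     # profile cannot be explained
        return 0.0
    n_alleles = 2 * x + sum(t) + sum(v)           # 2x + 2n_T + 2n_V
    denom = rising(1.0 - theta, n_alleles, theta)
    total = 0.0
    for free in product(range(r + 1), repeat=len(p)):
        if sum(free) != r:
            continue
        u = [f + m for f, m in zip(free, forced)]
        coeff = factorial(2 * x) * 2 ** (h_t + h_v) / prod(factorial(k) for k in u)
        numer = prod(rising((1.0 - theta) * pi, ti + ui + vi, theta)
                     for pi, ti, ui, vi in zip(p, t, u, v))
        total += coeff * numer / denom
    return total


# Illustrative values only: evidence A1A2A3A4, suspect A1A2, theta = 0.03.
freqs, theta = [0.1, 0.2, 0.3, 0.15], 0.03
hp = mixture_probability(freqs, t=[1, 1, 0, 0], v=[0, 0, 0, 0],
                         x=1, h_t=1, h_v=0, theta=theta)
hd = mixture_probability(freqs, t=[0, 0, 0, 0], v=[1, 1, 0, 0],
                         x=2, h_t=0, h_v=1, theta=theta)
lr = hp / hd
```

The product form is used rather than gamma functions so that the same code also covers $\theta = 0$. The brute-force loop over `product(range(r + 1), repeat=c)` is adequate for the small $c$ and $r$ met at a single locus, though a direct enumeration of compositions would scale better.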

Likelihood ratios are formed as the ratios of two such probabilities, and we note that people declared to be contributors under one proposition may be declared to be non-contributors under the other. In other words, every person typed is declared to be either a contributor or a non-contributor. The number of people typed, and the alleles they carry among them, are the same for every proposition. For this reason, $n_T + n_V$, $h_T + h_V$ and $t_i + v_i$ will be the same in the probabilities for each proposition. The term $2^{h_T + h_V}$ will cancel out of the likelihood ratio, as will some of the terms in the products in the numerator and denominator of the right hand side of Eq 10.

If population structure is ignored, and $\theta$ is set to zero, Eq 10 reduces to

$$P_x(T, U, V) = \sum_{r_1=0}^{r} \sum_{r_2=0}^{r-r_1} \cdots \sum_{r_{c-1}=0}^{r-r_1-\cdots-r_{c-2}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \prod_{i=1}^{c} p_i^{t_i + u_i + v_i}$$

This is equivalent to Eq 3 in our earlier treatment (3) and may be in a form more convenient for computation. Because of cancelation of terms in the likelihood ratio, it can be seen that $n_T, n_V, h_T, h_V, t_i, v_i$ are not used when $\theta = 0$. In this case the value of LR depends only on the numbers and frequencies of the alleles carried by unknown contributors. There is no need to consider the genotypes of typed people, whether or not they contribute to the evidence sample. This is different to the situation where population structure is taken into account; then the genotypes of all typed people are needed.

In the degenerate case where there are no typed people, contributors or non-contributors, $t_i = v_i = 0$, then $u_i = r_i$ and the sum for $\theta = 0$ is just a multinomial expansion:

$$P_x(U) = \Bigl(\sum_{i=1}^{c} p_i\Bigr)^{2x}$$

Examples

We now consider an example where the evidence sample $C_g = (A_1A_2A_3A_4)$ ($c = 4$) is known to be from two perpetrators but only one suspect, of type $A_1A_2$, has been apprehended. Proposition $H_p$ is that this suspect and one unknown person were the contributors, so $T = (A_1A_2)$ ($n_T = 1$, $t = 2$) and $U$ has only one possibility ($n_x = 1$): the two alleles $A_3A_4$. There are no known non-contributors, so $V = \emptyset$, $n_V = 0$, where $\emptyset$ denotes the empty set. The probability under $H_p$ is

$$P_1(\{A_1A_2\}, \{A_3A_4\}, \{\}) = \frac{2!\,2^1}{1!\,1!} \frac{\Gamma(\gamma_{.})}{\Gamma(\gamma_{.} + 4)} \prod_{i=1}^{4} \frac{\Gamma(\gamma_i + 1)}{\Gamma(\gamma_i)} = \frac{4(1 - \theta)^3 p_1 p_2 p_3 p_4}{(1 + \theta)(1 + 2\theta)}$$

Proposition $H_d$ is that there are no known contributors, $T = \emptyset$, $n_T = 0$, there is one person known not to be a contributor, $V = (A_1A_2)$, $n_V = 1$, and there are two unknown contributors who must carry all four alleles between them. Once again, there is only one possible set $U = (A_1A_2A_3A_4)$, $n_x = 1$, and the probability is

$$P_2(\{\}, \{A_1A_2A_3A_4\}, \{A_1A_2\}) = \frac{4!\,2^1}{1!\,1!\,1!\,1!} \frac{\Gamma(\gamma_{.})}{\Gamma(\gamma_{.} + 6)} \prod_{i=1}^{2} \frac{\Gamma(\gamma_i + 2)}{\Gamma(\gamma_i)} \prod_{i=3}^{4} \frac{\Gamma(\gamma_i + 1)}{\Gamma(\gamma_i)} = \frac{48(1 - \theta)^3 p_1 p_2 p_3 p_4 [(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}{(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)}$$

The likelihood ratio for this example is, therefore,

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)}{12[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}$$

which reduces to $1/(12p_1p_2)$ when $\theta = 0$ as has been given previously (1,3).
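To see how this likelihood ratio responds to the coancestry coefficient, the closed form can be evaluated for a few values of $\theta$. The function name and the allele frequencies below are illustrative assumptions only.

```python
def lr_four_allele(p1, p2, theta):
    """Closed-form LR for the four-allele, one-suspect example."""
    numerator = (1 + 3 * theta) * (1 + 4 * theta)
    denominator = 12 * ((1 - theta) * p1 + theta) * ((1 - theta) * p2 + theta)
    return numerator / denominator


# Illustrative frequencies p1 = 0.1, p2 = 0.2; theta = 0 gives 1/(12 p1 p2).
for theta in (0.0, 0.01, 0.03):
    print(f"theta={theta:.2f}  LR={lr_four_allele(0.1, 0.2, theta):.3f}")
```

For rare alleles such as these the ratio falls as $\theta$ increases, because the suspect's typed alleles make further copies of $A_1$ and $A_2$ more probable in the same subpopulation.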

A more complicated example is for a rape committed by three men. Suppose that the evidence sample has alleles $(A_1, A_2, A_3, A_4)$, the victim is of type $A_1A_2$ and a single suspect has type $A_3A_3$. Then two alternative propositions are $H_p$: “The victim, the suspect and two unknown men contributed to the sample,” and $H_d$: “The victim and three unknown men contributed to the sample.”

The evidence genetic profile has $c = 4$ alleles $C_g = (A_1, A_2, A_3, A_4)$. Under proposition $H_p$ there are $t = 3$ distinct alleles $T_g = (A_1, A_2, A_3)$ from two known contributors and no alleles from people known not to be contributors, $V = \emptyset$. For $x = 2$ unknown contributors, the number of sets of $r = 3$ alleles these people can carry in addition to the $A_4$ allele they must have among them is $n_2 = 6!/(3!\,3!) = 20$. The counts $u_1, u_2, u_3, u_4$ for all four alleles $A_1, A_2, A_3, A_4$ among the two unknown men, together with the multiplicities $[4!\,2^1]/[u_1!\,u_2!\,u_3!\,u_4!]$, are

0,0,0,4:2 0,0,1,3:8 0,0,2,2:12 0,0,3,1:8 0,1,0,3:8

0,1,1,2:24 0,1,2,1:24 0,2,0,2:12 0,2,1,1:24 0,3,0,1:8

1,0,0,3:8 1,0,1,2:24 1,0,2,1:24 1,1,0,2:24 1,1,1,1:48

1,2,0,1:24 2,0,0,2:12 2,0,1,1:24 2,1,0,1:24 3,0,0,1:8

Under proposition $H_d$ there are $t = 2$ alleles, $T = (A_1, A_2)$, from a known contributor (the victim) and two alleles $V = (A_3, A_3)$ from a person (the suspect) known not to be a contributor. For $x = 3$ unknown contributors, the number of sets of $r = 4$ alleles these people can carry in addition to the $A_3, A_4$ alleles they must have among them is $n_3 = 7!/(4!\,3!) = 35$. The counts $u_1, u_2, u_3, u_4$ for $A_1, A_2, A_3, A_4$, with coefficients $[6!\,2^1]/(u_1!\,u_2!\,u_3!\,u_4!)$, for the 35 possible sets are:

0,0,1,5:12 0,0,2,4:30 0,0,3,3:40 0,0,4,2:30 0,0,5,1:12

0,1,1,4:60 0,1,2,3:120 0,1,3,2:120 0,1,4,1:60 0,2,1,3:120

0,2,2,2:180 0,2,3,1:120 0,3,1,2:120 0,3,2,1:120 0,4,1,1:60

1,0,1,4:60 1,0,2,3:120 1,0,3,2:120 1,0,4,1:60 1,1,1,3:240

1,1,2,2:360 1,1,3,1:240 1,2,1,2:360 1,2,2,1:360 1,3,1,1:240

2,0,1,3:120 2,0,2,2:180 2,0,3,1:120 2,1,1,2:360 2,1,2,1:360

2,2,1,1:360 3,0,1,2:120 3,0,2,1:120 3,1,1,1:240 4,0,1,1:60

For each proposition, the multiplicities are multiplied by the appropriate Dirichlet probabilities and the 20 or 35 terms added together. Obviously this is a task better suited for a computer.
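The lists of count vectors and coefficients for this example can be regenerated mechanically. The sketch below picks the $r$ free alleles as multisets with `itertools.combinations_with_replacement`; the function name `multiplicity_table` and its 0-based allele indexing are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def multiplicity_table(c, forced, x, n_het):
    """Return {u: (2x)! * 2**n_het / prod(u_i!)} over all allelic sets U.

    c: number of distinct alleles in C_g; forced: 0-based indices of the
    alleles the unknown people must carry; x: number of unknown people;
    n_het: h_T + h_V, the number of typed heterozygotes.
    """
    r = 2 * x - len(forced)                       # freely chosen alleles
    table = {}
    for free in combinations_with_replacement(range(c), r):
        tally = Counter(forced) + Counter(free)
        u = tuple(tally[i] for i in range(c))
        table[u] = factorial(2 * x) * 2 ** n_het // prod(factorial(k) for k in u)
    return table


# Hp: two unknown men must supply A4; the victim is the only heterozygote.
hp_table = multiplicity_table(c=4, forced=[3], x=2, n_het=1)
# Hd: three unknown men must supply A3 and A4.
hd_table = multiplicity_table(c=4, forced=[2, 3], x=3, n_het=1)
```

Here `len(hp_table)` and `len(hd_table)` give the 20 and 35 sets, and an entry such as `hp_table[(1, 1, 1, 1)]` gives the coefficient 48. Each coefficient would then be multiplied by the Dirichlet probability of the corresponding allele counts.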

Multiple Subpopulations

So far we have considered the situation where all people involved in the evidence interpretation have been in the same subpopulation. Other situations are likely, especially when victim and suspect belong to different racial groups. The same sets of alleles are involved as before, but now the probabilities need to be calculated separately for the alleles within each subpopulation.

We begin by returning to our first example of a single-perpetrator rape where the victim was of type $A_1A_2$, the suspect was of type $A_3A_3$ and the evidence sample was $A_1A_2A_3$. If there was reason to believe that the perpetrator was of the same racial type as the suspect, but of a different type from the victim, then the victim’s alleles need to be separated from those of the suspect and, under $H_d$, from the unknown perpetrator. Suppose that the victim belonged to racial group 1, with coancestry coefficient $\theta_1$ for her subpopulation and allele frequencies $p_1, p_2$ for $A_1, A_2$. Suppose also that the suspect and perpetrator belong to racial group 2, with coancestry coefficient $\theta_2$ for their subpopulation and allele frequencies $q_1, q_2, q_3$ for alleles $A_1, A_2, A_3$. Suppose, further, that there is zero coancestry between alleles in different racial groups so that alleles in groups 1 and 2 can be treated independently.

Under $H_p$, the probability is

$$P_0(\{A_1A_2A_3A_3\}, \{\}, \{\}) = 2(1 - \theta_1)p_1p_2 \times q_3[(1 - \theta_2)q_3 + \theta_2]$$

since the pair $A_1A_2$ from group 1 and the pair $A_3A_3$ from group 2 are treated separately. Under $H_d$, one of the three components of $P_1(\{A_1A_2\}, U, \{A_3A_3\})$ is

$$P_1(\{A_1A_2\}, \{A_1A_3\}, \{A_3A_3\}) = 2(1 - \theta_1)p_1p_2 \times \frac{2(1 - \theta_2)q_1q_3[(1 - \theta_2)q_3 + \theta_2][(1 - \theta_2)q_3 + 2\theta_2]}{(1 + \theta_2)(1 + 2\theta_2)}$$

since the pair $A_1A_2$ from group 1 and the two pairs $A_1A_3$, $A_3A_3$ from group 2 are treated separately. Equation 6 is replaced by

$$LR = \frac{(1 + \theta_2)(1 + 2\theta_2)}{[(1 - \theta_2)q_3 + 2\theta_2][(1 - \theta_2)(2q_1 + 2q_2 + q_3) + 3\theta_2]}$$
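The two-group likelihood ratio can be checked numerically by summing the three $H_d$ components directly. In this sketch the victim's group-1 factor $2(1 - \theta_1)p_1p_2$ is omitted because it appears in both numerator and denominator and cancels. The function names and the numerical values of $q_i$ and $\theta_2$ are illustrative assumptions.

```python
from math import prod


def rising(base, n, theta):
    """prod_{j=0}^{n-1} (base + j * theta); equals 1 when n == 0."""
    return prod(base + j * theta for j in range(n))


def group2_probability(counts, q, theta2):
    """Probability of an ordered set of group-2 alleles (product form of Eq 8)."""
    numer = prod(rising((1 - theta2) * qi, n, theta2) for qi, n in zip(q, counts))
    return numer / rising(1 - theta2, sum(counts), theta2)


def lr_two_groups(q, theta2):
    """LR for the single-perpetrator example with victim in a separate group."""
    # Under Hp the only group-2 alleles are the suspect's A3A3.
    hp = group2_probability((0, 0, 2), q, theta2)
    # Under Hd the unknown man carries A3 plus one of A1, A2, A3;
    # the multiplicities 2!/(u_1! u_2! u_3!) are 2, 2 and 1.
    hd = (2 * group2_probability((1, 0, 3), q, theta2)
          + 2 * group2_probability((0, 1, 3), q, theta2)
          + group2_probability((0, 0, 4), q, theta2))
    return hp / hd


q, theta2 = (0.1, 0.2, 0.3), 0.03                 # illustrative values only
lr = lr_two_groups(q, theta2)
closed_form = ((1 + theta2) * (1 + 2 * theta2)
               / (((1 - theta2) * q[2] + 2 * theta2)
                  * ((1 - theta2) * (2 * q[0] + 2 * q[1] + q[2]) + 3 * theta2)))
```

The value of $\theta_1$ does not enter the result, which is why the closed form contains only $\theta_2$ and the group-2 frequencies.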

The general Eq 10 can be modified to allow for different subpopulations. However, when any of the three sets T, U, V contains alleles from different subpopulations, as was the case in the example just considered, it will be necessary to introduce further notation. Each of the counts ti, ui, vi would need to be split into a component for each subpopulation, and the multiplicity coefficients would also need to be derived separately for each subpopulation.

Discussion

We offer this treatment of the effects of population structure on DNA mixture calculations to complement two previous treatments—the effects of population structure on single stains (2,4) and the interpretation of mixed stains without population structure (1,3). Our study therefore closes a gap in current DNA forensic interpretation.

Our treatment is based firmly on the use of likelihood ratios and the accompanying need for conditional probabilities. There is no alternative when the evidence is less than certain under the proposition Hp. Conditional probabilities are necessary to incorporate the known genetic nature of DNA profiles. The full meaning of profiles cannot be found without accounting for the role of evolution in shaping the probabilities of sets of profiles. The novel feature of this study lies in accounting for the information contained in the profiles of people who are declared not to have contributed to the evidence profile. This has arisen for the situation of a suspect, who is not excluded from the evidence profile, being declared not to be a contributor under proposition Hd.

The arguments made for incorporating non-contributors can be extended. Several people may be typed during the course of an investigation. Even if they are excluded as being contributors, they provide information for the probability calculations when they can be considered to belong to the same subpopulation as (some of) the people not excluded. They make their contribution to the calculation via allelic set $V$.

Our treatment has assumed a specific number of unknown contributors, but we realize that this number is very likely not to be known. Although some general statements about conservative assumptions can be made (3), such as assuming large numbers of unknown people for loci with few alleles and small numbers of unknown people for loci with many alleles, we prefer not to formulate rules. Instead we recommend the calculation of likelihood ratios under plausible ranges of numbers, and the reporting of the more conservative results.

We have not allowed for unseen, or “null,” alleles as has been done previously (2,3) because the move away from RFLP technology in forensic science has diminished the need for such a treatment. We have not considered other typing-system features such as intensity or peak height differences as these have been discussed elsewhere. However, we do consider that the approach described here is sufficiently flexible to allow the interpretation of many different mixed-stain DNA profiles.

Software for conducting the calculations described in this paper can be obtained directly from the World Wide Web page www.stat.ncsu.edu (click on “Statistical Genetics”) or by sending email to [email protected]

Acknowledgments

This work was supported in part by a postdoctoral fellowship from the New Zealand Foundation for Research in Science and Technology to JMC, and by NIH grant GM 45344 to North Carolina State University. John Storey wrote computer software to implement the calculations in the paper.

References

1. Evett IW, Buffery C, Wilcott G, Stoney D. A guide to interpreting single locus profiles of DNA mixtures in forensic cases. J Forensic Sci Soc 1991; 31:41–7.

2. National Research Council. DNA technology in forensic science. Washington, DC: National Academy Press 1992.

3. Weir BS, Triggs CM, Starling LI, Walsh KAJ, Buckleton J. Interpreting DNA mixtures. J Forensic Sci 1997;42:213–22.

4. Balding DJ, Nichols RA. DNA profile match probability calculations: How to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int 1994;64:125–40.

5. Weir BS. The effects of inbreeding on forensic calculations. Ann Rev Genet 1994;28:597–621.

6. Aitken CGG. Statistics and the evaluation of evidence for forensic scientists. New York: Wiley 1995.

7. Evett IW, Weir BS. Interpreting DNA evidence: Statistical genetics for forensic science. Sunderland, MA: Sinauer 1998.

8. Faigman DL, Kaye DH, Saks MJ, Sanders J. Modern scientific evidence: The law and science of expert testimony. St. Paul, MN: West 1997.

9. Robertson B, Vignaux GA. Interpreting evidence: Evaluating forensic science in the courtroom. Chichester, UK: Wiley 1995.

10. Royall R. Statistical evidence: A likelihood paradigm. London: Chapman and Hall 1997.

11. Schum DA. Evidential foundations of probabilistic reasoning. New York: Wiley 1994.

12. Weir BS. The coancestry coefficient in forensic science. Proc 8th Int Symp Human Identification. Madison, WI: Promega 1998.

13. Wright S. The genetical structure of populations. Ann Eugen 1951;15:323–54.

Additional information and reprint requests: Bruce S. Weir, Ph.D. North Carolina State University Dept of Statistics PO Box 8203 Raleigh, NC 27695-8203

APPENDIX

In this Appendix we show the effects of population structure for each of the six common situations described in Chapter 7 of (7). A diagram for the profiles in each case is shown in Fig. 1, and in each case setting $\theta = 0$ reduces the result to the one given in (7).

Case 1: Four-Allele Mixture, Heterozygous Victim, and Heterozygous Suspect

The victim is of type $A_3A_4$, the suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3A_4$. The two propositions are

$H_p$: The victim and the suspect contributed to the stain.

$H_d$: The victim and an unknown person contributed to the stain.

The evidence sample is $C = C_g = (A_1A_2A_3A_4)$ and $c = 4$.

Under $H_p$, the alleles from known contributors are $T = T_g = A_1A_2A_3A_4$ and $n_T = 2$, $h_T = 2$, $t = 4$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, the alleles from known contributors are $T = T_g = A_3A_4$ and $n_T = 1$, $h_T = 1$, $t = 2$. The alleles from unknown contributors are constrained to be $U = A_1A_2$ and $x = 1$, $r = 0$. The alleles from people declared not to be contributors are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.

The required probabilities are

$$H_p: \quad P_0(\{A_1A_2A_3A_4\}, \emptyset, \emptyset) = \frac{2^2(1 - \theta)^4 p_1p_2p_3p_4}{(1 - \theta)(1 + \theta)(1 + 2\theta)}$$

$$H_d: \quad P_1(\{A_3A_4\}, \{A_1A_2\}, \{A_1A_2\}) = \frac{2^2\,2!\,(1 - \theta)^4 p_1p_2p_3p_4[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}{1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)}{2[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}$$

FIG. 1

Case 2: Three-Allele Mixture, Homozygous Victim, and Heterozygous Suspect

The victim is of type $A_3$, the suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3$. The two propositions are

$H_p$: The victim and the suspect contributed to the stain.

$H_d$: The victim and an unknown person contributed to the stain.

The evidence sample is $C = (A_1A_2A_3A_3)$, so $C_g = (A_1A_2A_3)$ and $c = 3$.

Under $H_p$, the alleles from known contributors are $T_g = A_1A_2A_3$ and $n_T = 2$, $h_T = 1$, $t = 3$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, the allele from known contributors is $T_g = A_3$ and $n_T = 1$, $h_T = 0$, $t = 1$. The alleles from the unknown contributor are constrained to include $A_1A_2$, and $x = 1$, $r = 0$. The alleles from the person declared not to be a contributor are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.

The required probabilities are

$$H_p: \quad P_0(\{A_1A_2A_3A_3\}, \emptyset, \emptyset) = \frac{2^1(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_3 + \theta]}{(1 - \theta)(1 + \theta)(1 + 2\theta)}$$

$$H_d: \quad P_1(\{A_3A_3\}, \{A_1A_2\}, \{A_1A_2\}) = \frac{2^1\,2!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta][(1 - \theta)p_3 + \theta]}{1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)}{2[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}$$

as it was for Case 1.

Case 3: Three-Allele Mixture, Heterozygous Victim, and Homozygous Suspect

The victim is of type $A_2A_3$, the suspect is of type $A_1$, and the crime sample of type $A_1A_2A_3$. The two propositions are

$H_p$: The victim and the suspect contributed to the stain.

$H_d$: The victim and an unknown person contributed to the stain.

The evidence sample is $C = (A_1A_1A_2A_3)$, so $C_g = (A_1A_2A_3)$ and $c = 3$.

Under $H_p$, the alleles from known contributors are $T_g = A_1A_2A_3$ and $n_T = 2$, $h_T = 1$, $t = 3$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, the alleles from known contributors are $T_g = A_2A_3$ and $n_T = 1$, $h_T = 1$, $t = 2$. The alleles from the unknown contributor are constrained to include $A_1$, and $x = 1$, $r = 1$. The unknown contributor may also carry alleles $A_1$, $A_2$ or $A_3$. The alleles from the person declared not to be a contributor are $V = A_1$, so $n_V = 1$, $h_V = 0$.

The required probabilities are

$$H_p: \quad P_0(\{A_1A_1A_2A_3\}, \emptyset, \emptyset) = \frac{2^1(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta]}{(1 - \theta)(1 + \theta)(1 + 2\theta)}$$

$$\begin{aligned}
H_d: \quad P_1(\{A_2A_3\}, \{A_1A_?\}, \{A_1A_1\})
&= \frac{2^1\,2!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta][(1 - \theta)p_1 + 2\theta][(1 - \theta)p_1 + 3\theta]}{2!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)} \\
&\quad + \frac{2^1\,2!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta][(1 - \theta)p_1 + 2\theta][(1 - \theta)p_2 + \theta]}{1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)} \\
&\quad + \frac{2^1\,2!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta][(1 - \theta)p_1 + 2\theta][(1 - \theta)p_3 + \theta]}{1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)}
\end{aligned}$$

and the likelihood ratio is

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)}{[(1 - \theta)p_1 + 2\theta][(1 - \theta)(p_1 + 2p_2 + 2p_3) + 7\theta]}$$
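The step from the three $H_d$ terms to this likelihood ratio is short but easy to lose. One way to write it out, with $K$ introduced here only as shorthand for the factor common to all three terms, is:

```latex
\begin{align*}
K &= \frac{2^1\,2!\,(1-\theta)^3 p_1 p_2 p_3\,[(1-\theta)p_1+\theta]\,[(1-\theta)p_1+2\theta]}
          {(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\[4pt]
P_1 &= K\left\{\tfrac{1}{2}\bigl[(1-\theta)p_1+3\theta\bigr]
        + \bigl[(1-\theta)p_2+\theta\bigr]
        + \bigl[(1-\theta)p_3+\theta\bigr]\right\} \\
    &= \tfrac{K}{2}\bigl[(1-\theta)(p_1+2p_2+2p_3)+7\theta\bigr] \\[4pt]
LR &= \frac{P_0}{P_1}
    = \frac{(1+3\theta)(1+4\theta)}
           {[(1-\theta)p_1+2\theta]\,[(1-\theta)(p_1+2p_2+2p_3)+7\theta]}
\end{align*}
```

The factor $\tfrac{1}{2}$ on the first term comes from the $2!$ in its denominator, and the $7\theta$ collects $3\theta + 2\theta + 2\theta$ after the bracket is doubled.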

Case 4: Four-Allele Mixture, Heterozygous Suspect, and One Unknown

The suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3A_4$. The two propositions are

$H_p$: The suspect and an unknown person contributed to the stain.

$H_d$: Two unknown people contributed to the stain.

The evidence sample is $C = C_g = (A_1A_2A_3A_4)$ and $c = 4$.

Under $H_p$, the alleles from known contributors are $T = T_g = A_1A_2$ and $n_T = 1$, $h_T = 1$, $t = 2$. There are two alleles $A_3A_4$ from unknown contributors, but no alleles from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, there are no alleles from known contributors, so $T = T_g = \emptyset$ and $n_T = 0$, $h_T = 0$, $t = 0$. The alleles from unknown contributors are constrained to be $U = A_1A_2A_3A_4$ and $x = 2$, $r = 0$. The alleles from people declared not to be contributors are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.

The required probabilities are

$$H_p: \quad P_1(\{A_1A_2\}, \{A_3A_4\}, \emptyset) = \frac{2^1\,2!\,(1 - \theta)^4 p_1p_2p_3p_4}{1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)}$$

$$H_d: \quad P_2(\emptyset, \{A_1A_2A_3A_4\}, \{A_1A_2\}) = \frac{2^1\,4!\,(1 - \theta)^4 p_1p_2p_3p_4[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}{1!\,1!\,1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)}{12[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}$$

Case 5: Three-Allele Mixture, Heterozygous Suspect, and One Unknown

The suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3$. The two propositions are

$H_p$: The suspect and one unknown person contributed to the stain.

$H_d$: Two unknown people contributed to the stain.

The evidence sample is $C = (A_1A_2A_2A_3A_3)$, so $C_g = (A_1A_2A_3)$ and $c = 3$.

Under $H_p$, the alleles from known contributors are $T_g = A_1A_2$ and $n_T = 1$, $h_T = 1$, $t = 2$. The alleles from unknown contributors are constrained to include $A_3$ and may also include $A_1$, $A_2$ or $A_3$. There are no alleles from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, there are no alleles from known contributors, so $n_T = 0$, $h_T = 0$, $t = 0$. The alleles from the two unknown contributors are constrained to include $A_1$, $A_2$ and $A_3$, and $x = 2$, $r = 1$. The remaining allele may be $A_1$, $A_2$ or $A_3$. The alleles from the person declared not to be a contributor are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.

The required probabilities are

$$\begin{aligned}
H_p: \quad P_1(\{A_1A_2\}, \{A_3A_?\}, \emptyset)
&= \frac{2^1\,2!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta]}{1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)} \\
&\quad + \frac{2^1\,2!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_2 + \theta]}{1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)} \\
&\quad + \frac{2^1\,2!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_3 + \theta]}{2!\,(1 - \theta)(1 + \theta)(1 + 2\theta)}
\end{aligned}$$

$$\begin{aligned}
H_d: \quad P_2(\emptyset, \{A_1A_2A_3A_?\}, \{A_1A_2\})
&= \frac{2^1\,4!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta][(1 - \theta)p_1 + 2\theta][(1 - \theta)p_2 + \theta]}{2!\,1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)} \\
&\quad + \frac{2^1\,4!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta][(1 - \theta)p_2 + 2\theta]}{1!\,2!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)} \\
&\quad + \frac{2^1\,4!\,(1 - \theta)^3 p_1p_2p_3[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta][(1 - \theta)p_3 + \theta]}{1!\,1!\,2!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)}
\end{aligned}$$

and the likelihood ratio is

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)[(1 - \theta)(2p_1 + 2p_2 + p_3) + 5\theta]}{12[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta][(1 - \theta)(p_1 + p_2 + p_3) + 5\theta]}$$

Case 6: Four-Allele Mixture, Two Heterozygous Suspects

The suspects are of type $A_1A_2$ and $A_3A_4$, and the crime sample is of type $A_1A_2A_3A_4$. The two propositions may be

$H_p$: The two suspects contributed to the stain.

$H_d$: Two unknown people contributed to the stain.

The evidence sample is $C = C_g = (A_1A_2A_3A_4)$ and $c = 4$.

Under $H_p$, the alleles from known contributors are $T = T_g = A_1A_2A_3A_4$ and $n_T = 2$, $h_T = 2$, $t = 4$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, there are no alleles from known contributors, $T = T_g = \emptyset$ and $n_T = h_T = t = 0$. The alleles from unknown contributors are constrained to be $U = A_1A_2A_3A_4$ and $x = 2$, $r = 0$. The alleles from people declared not to be contributors are $V = A_1A_2A_3A_4$ and $n_V = 2$, $h_V = 2$.

The required probabilities are

$$H_p: \quad P_0(\{A_1A_2A_3A_4\}, \emptyset, \emptyset) = \frac{2^2(1 - \theta)^4 p_1p_2p_3p_4}{(1 - \theta)(1 + \theta)(1 + 2\theta)}$$

$$H_d: \quad P_2(\emptyset, \{A_1A_2A_3A_4\}, \{A_1A_2A_3A_4\}) = Q$$

where

$$Q = \frac{2^2\,4!\,(1 - \theta)^4 p_1p_2p_3p_4[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta][(1 - \theta)p_3 + \theta][(1 - \theta)p_4 + \theta]}{1!\,1!\,1!\,1!\,(1 - \theta)(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)(1 + 5\theta)(1 + 6\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)(1 + 5\theta)(1 + 6\theta)}{24[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta][(1 - \theta)p_3 + \theta][(1 - \theta)p_4 + \theta]}$$
