The Psychology of Numbers in the Courtroom: How to Make DNA-Match Statistics Seem Impressive or Insufficient
Jonathan J. Koehler*
By now, everyone knows that forensic DNA analysis represents a stunning theoretical advance for the criminal justice system. The value of DNA analysis lies in its theoretical power to exclude large proportions of the population as potential contributors of genetic samples (e.g., blood, semen, hair). Thus, when a suspect’s DNA matches a DNA sample that is recovered from the scene of a violent crime, a prosecutor may suggest that the suspect is a likely source of that sample (and was therefore present at the crime scene) because the suspect is among the few who were not excluded as potential contributors. In most cases, the probative value of a DNA match is conveyed to triers of fact via frequency statistics that describe how common the DNA profile is in a given population. A DNA match statistic of, say, one in one million means that approximately one person out of every one million in a population will match that DNA profile. An equivalent way to say this is that the chance that a randomly selected person will match this DNA profile is one in one million. To the extent that the DNA match statistic is low (e.g., one in many thousands, millions, or billions), it is unlikely that the match between a suspect and a recovered crime scene sample is purely coincidental.
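The equivalence between the two readings of a match statistic (a share of the population, and the chance that one randomly selected person matches) can be checked with a short simulation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each unrelated person matches independently with the stated probability, ignoring relatives and population substructure.

```python
import random


def simulate_match_fraction(match_prob: float, population: int, seed: int = 0) -> float:
    """Return the fraction of a simulated population that matches by chance.

    Each of `population` hypothetical unrelated people is assumed to match
    the profile independently with probability `match_prob`.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    matches = sum(1 for _ in range(population) if rng.random() < match_prob)
    return matches / population


# A 1-in-1,000 profile keeps the run fast; the logic is the same for 1 in 1,000,000.
fraction = simulate_match_fraction(1 / 1000, population=1_000_000)
print(f"simulated share of coincidental matches: {fraction:.5f} (stated rate: 0.00100)")
```

With a fixed seed the simulated share lands very close to the stated rate; a one-in-one-million rate behaves the same way but needs a far larger simulated population before any coincidental matches appear at all.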
The early forensic DNA years were marked by controversy over the proper computation of DNA match statistics. Although disagreement has abated, little is known about how jurors think about and use DNA match statistics. It is widely assumed that DNA statistics are persuasive. That is, people assume that, after hearing that (a) a suspect matches traces of DNA evidence from a violent crime scene, and (b) the chance that a randomly selected person from the population would match is one in one million or one billion, jurors will be convinced that the matching suspect is indeed the source of the DNA. However, my own research with mock jurors indicates that the story is more complicated. Specifically, the way in which DNA match statistics are framed and presented to legal fact finders may affect how they think about and use the DNA evidence.
The possibility that the form in which jurors hear about the incidence of a matching DNA profile might affect how jurors perceive the probative value of DNA evidence captured the attention of the 1996 National Research Council committee [NRC II]. After noting that "scientifically valid testimony about matching DNA can take many forms," NRC II called for behavioral research on "the ability of jurors to understand the significance of a match as a function of the method of presentation." A member of the earlier committee [NRC I] had made a similar plea years before.
Consider the following four formulations of a one in one thousand DNA match statistic that might appear in a criminal case:
1. The probability that the suspect would match the blood specimen if he were not the source is 0.1%.
2. The frequency with which the suspect would match the blood specimen if he were not the source is one in one thousand.
3. One tenth of one percent of the people in Houston who are not the source would also match the blood specimen.
4. One in one thousand people in Houston who are not the source would also match the blood specimen.
These four formulations are legitimate, mathematically equivalent ways to describe a one in one thousand DNA match statistic. However, they are psychologically different, and their effects on jurors differ markedly. The bottom line appears to be this: When the statistic is framed in the language of probability (e.g., 0.1%) in a way that highlights a particular suspect’s chance of matching by coincidence, it tends to be persuasive. But when the statistic is framed in the language of frequencies (e.g., one in one thousand) in a way that highlights the chance that others will match by coincidence, it is much less persuasive. Similarly, DNA match statistics that target an individual suspect are more persuasive than equivalent statistics that target a broader population. Of the four formulations above, the first provides the most convincing evidence and the fourth the least. Indeed, as documented in the studies below, whereas a majority of jurors who are provided with the first formulation are at least 99% certain that the suspect is the source of the evidence sample, about one-third of those who are provided with the fourth formulation are equally convinced that the suspect is not the source! In short, DNA match statistics that seem impressive when presented one way seem insufficient when presented another way.
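Because the four formulations differ only in wording, they can be generated mechanically from a single number. The sketch below (a hypothetical helper, with wording adapted from the list above) renders a one in one thousand rate in all four target/frame combinations. The probability frame states the rate as a percentage, while the frequency frame states it as "one in N" and so carries an explicit reference class.

```python
from fractions import Fraction


def four_formulations(rate: Fraction, place: str = "Houston") -> dict:
    """Render one match statistic in the four target/frame wordings.

    `rate` must have numerator 1 (e.g. Fraction(1, 1000)) so that the
    frequency frame can be stated as "one in N".
    """
    if rate.numerator != 1:
        raise ValueError("expected a rate of the form 1/N")
    pct = f"{float(rate * 100):g}%"  # probability frame, e.g. "0.1%"
    n = f"{rate.denominator:,}"      # frequency frame, e.g. "1,000"
    return {
        ("single", "probability"): f"The probability that the suspect would match the blood specimen if he were not the source is {pct}.",
        ("single", "frequency"): f"The frequency with which the suspect would match the blood specimen if he were not the source is one in {n}.",
        ("multi", "probability"): f"{pct} of the people in {place} who are not the source would also match the blood specimen.",
        ("multi", "frequency"): f"One in {n} people in {place} who are not the source would also match the blood specimen.",
    }


for (target, frame), sentence in four_formulations(Fraction(1, 1000)).items():
    print(f"{target}/{frame}: {sentence}")
```

Using an exact Fraction guarantees that every wording is derived from the same number, which is the sense in which the formulations are mathematically equivalent.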
This article describes a series of studies that I conducted with hundreds of jury-eligible subjects in Austin, Texas. The studies examine how the presentation of DNA statistics affects whether a juror will be impressed with and persuaded by the DNA evidence. The data are described in the context of a novel psychological theory for predicting when legal decisionmakers will and will not be impressed and persuaded by statistical evidence. Section I lays out the theory of interest, "exemplar cueing theory." Sections II, III, and IV describe three controlled behavioral experiments that develop and test key components of the theory. These experiments show how subtly different presentations of identical DNA match statistics in a murder case affect (a) estimates of the chance that the suspect is the source of the matching DNA evidence, (b) estimates of the chance that the suspect is guilty, and (c) verdicts. One conclusion that emerges is that DNA match statistics that target the individual suspect and that are described as probabilities (e.g., "the probability that the suspect would match the blood drops if he were not their source is 0.1%") are more persuasive than statistical presentations that target a broad population (e.g., all people in a large city) and that are framed as frequencies (e.g., "one in one thousand people in Houston would also match the blood drops"). Section V contains a fourth controlled experiment that identifies some limiting conditions on target and frame effects that may have practical importance in the courtroom. This section also discusses why mock jurors may not appear to be persuaded by statistics as small as one in one billion. Section VI is a brief summary of the experimental findings and a discussion of experimental limitations. Section VII is a broader discussion of the psychology of DNA statistics. After examining the difference between legitimate and illegitimate forms of DNA statistics, Section VII discusses how prosecutors and defense attorneys can use exemplar cueing to their advantage in the courtroom. Section VIII is a brief closing comment.
I. THE THEORY
A large body of research on statistical reasoning suggests that people have poor intuitions when it comes to reasoning with statistics in general and forensic science statistics in particular. Recent research with mock jurors indicates that the impressions left by DNA statistics vary as a function of perceived and actual error rates, expectations, and the mathematical form of the DNA statistic. Laura Macchi and I have offered a theory that may explain why some descriptions of DNA match statistics have a larger impact on jurors than others. Stated succinctly, the perceived probative value of a statistical DNA match (and, by extension, other forensic match evidence) depends on the ease with which triers of fact can imagine examples of others who would also match the DNA profile. When triers of fact find it hard to imagine examples of others who might match by chance, they will treat the evidence as compelling proof that the matching suspect is the source of the recovered DNA evidence. But when such examples are easier to imagine (for whatever reason), the evidence will seem less compelling or, perhaps, insufficient. The theory is called "exemplar cueing theory" because it assumes that decisionmakers judge the probative value of a DNA match, in part, by the ease with which exemplars (examples) of others who might also match are cued (triggered) in their minds. In the experiments below, I show that attorneys and experts may be able to influence whether jurors think about others who might match simply by wording the DNA statistic in particular ways.
Exemplar cueing theory has its roots in the "availability" heuristic from the judgment and decisionmaking literature. Availability holds that people judge the frequency or probability of an event by the ease with which exemplars come to mind. The problem with judging frequency in this way is that exemplars may be readily available in mind for reasons that have nothing to do with their prevalence in the environment. For example, events may be available because they are widely publicized, interesting, unusual, vivid, or personally relevant. To give an illustration, people overestimate the number of people who die from lightning, fire, and firearms because these "flashy" causes of death are over-represented in the news. Unlike availability, exemplar cueing is not a heuristic that people use to estimate a frequency or a probability. Instead, it is a theory about how people determine the subjective weight to assign to the hypothesis that the matching suspect is the source of the recovered evidence.
According to exemplar cueing theory, triers of fact do not base their judgments about the probative value of a DNA match simply on the magnitude of the DNA match probability, as a normative analysis would (on that analysis, probative value is inversely proportional to the size of the match probability). Instead, the perceived probative value of a DNA match varies inversely with the ease with which coincidental match "exemplars" are cued or present themselves in mind. Thus, when triers of fact can easily think of instances in which such coincidences have occurred or can occur, they will find the evidence relatively less impressive. If true, this could lead to the unusual situation in which objectively strong DNA evidence (as measured by its match probability) is accorded less value than objectively weaker DNA evidence, namely when exemplars are cued for the strong evidence but not for the weak evidence. As I discuss later, this is precisely what occurred in at least one study.
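For comparison, the normative benchmark that the theory departs from can be written as a likelihood ratio: the chance of a match if the suspect is the source, divided by the chance of a match if he is not. The sketch below is an illustration under stated assumptions, not the article's own model: it takes the first probability to be 1 and ignores laboratory error, so the likelihood ratio is simply the reciprocal of the match probability. The function name is hypothetical.

```python
def likelihood_ratio(match_prob: float) -> float:
    """Normative probative value of a DNA match, expressed as a likelihood ratio.

    Assumes the true source always matches (probability 1) and that a
    non-source matches only by coincidence, with probability `match_prob`
    (laboratory error is ignored here).
    """
    if not 0 < match_prob <= 1:
        raise ValueError("match_prob must be in (0, 1]")
    return 1.0 / match_prob


for p in (1e-3, 1e-6, 1e-9):
    print(f"match probability {p:g} -> likelihood ratio {likelihood_ratio(p):,.0f}")
```

On this normative account a one in one billion match is a million times more probative than a one in one thousand match, which is exactly the kind of ordering that exemplar cueing predicts jurors may not respect.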
One way to make people think about coincidental matches is to shift the target of the DNA statistic away from the focal suspect and onto a larger reference population. This makes it easier for people to see that coincidental matches can occur, and that most of those who do match are not the source. Once the possibility of coincidental matches has been raised in this way, jurors may be less impressed with the DNA evidence. In contrast, when the target of the DNA match remains on the focal suspect, examples of coincidental matches are less likely to come to mind, and jurors will be more likely to treat very small DNA match statistics as conclusive proof of identity.
To illustrate, suppose a juror hears that there is a one in one hundred thousand chance that a defendant who is not the source of the genetic evidence would match by coincidence. The juror will probably not give much thought to the possibility that the match was a coincidence because one in one hundred thousand is very close to zero. This might constitute overwhelming proof of identity in the juror’s mind. Now consider a juror who hears that one in every one hundred thousand people in, say, Houston who are not the source will match coincidentally. This juror may reason as follows: "If one in every one hundred thousand people in Houston matches, then dozens of people from Houston would match, as would thousands of others throughout the U.S. For this reason, I do not find the evidence convincing as proof of identity."
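The second juror's reasoning is simple arithmetic: multiply the match rate by the size of the reference population. The sketch below makes that step explicit. The population figures are rough assumptions chosen for illustration, not numbers taken from the studies, and the function name is hypothetical; each person is treated as an independent chance of matching, which ignores relatives.

```python
# Populations are rough, assumed figures used only for illustration.
ASSUMED_POPULATIONS = {
    "Houston": 2_000_000,
    "United States": 270_000_000,
}


def expected_coincidental_matches(match_prob: float, population: int) -> float:
    """Expected number of people in `population` who match by coincidence,
    treating each person as an independent chance of `match_prob`."""
    return match_prob * population


rate = 1 / 100_000
for place, size in ASSUMED_POPULATIONS.items():
    print(f"{place}: about {expected_coincidental_matches(rate, size):,.0f} coincidental matches")
```

Under these assumed figures, the rate implies tens of coincidental matches in Houston and thousands nationwide, the kind of concrete count that a multi-target presentation invites jurors to imagine.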
I will refer to the difference between these two approaches as a difference in "target." DNA statistics may target a single defendant ("the frequency with which the suspect would match the blood sample if he were not the source is one in one hundred thousand") or they may target a broader class of potential matches ("one in one hundred thousand people in Houston would also match the blood drops"). I will refer to targets as "single" (suspect only) or "multi" (people in Houston).
Another factor that may influence whether jurors think about examples of coincidental matches is whether the DNA statistic is framed as a frequency (e.g., one in one hundred thousand) or as a probability (e.g., 0.001%). The two frames are mathematically, but not psychologically, identical. Research on probabilistic reasoning shows that people reason differently with frequencies and probabilities. Specifically, frequency frames encourage people to adopt a broad, "outside" view in which the instant case is viewed as one member of a larger class of events. This occurs because frequencies include a numerical reference class (e.g., one hundred thousand) within which to consider the instant case. In contrast, probability frames do not include a reference class or broader context within which to think about the instant case. Therefore, probability frames induce people to adopt a narrower, "inside" view in which the instant case is considered in isolation. Applied to the DNA context, frequency framings of DNA statistics may induce jurors to think about the one person in one hundred thousand who would match by coincidence. In contrast, probability framings focus jurors’ attention more narrowly on the suspect and discourage exemplar thoughts. If this is true, then we would expect jurors to be more impressed with and persuaded by DNA statistics that appear as probabilities (e.g., 0.001%) than by those that appear as the equivalent frequencies (e.g., one in one hundred thousand).
In short, exemplar cueing theory predicts how the presentation of DNA statistics influences whether jurors think about others who would also match by coincidence. If the DNA statistic focuses on the suspect (single target) or is framed as a probability (probability frame), then the evidence will be viewed as strong proof of identity. If the statistic promotes thoughts about others who might also match, either through the introduction of a broad target class (multi target) or through the use of statistical frequencies (frequency frame), the DNA statistic will be less persuasive. Tests of this theory are presented below.
II. EXPERIMENT 1: CLINTON-LEWINSKY
I conducted an experiment involving 72 jury-eligible students ("mock jurors") in January 1998 at the University of Texas to test whether there might be a relationship between the ease of exemplar generation (as influenced by the presentation of statistical DNA evidence) and the persuasiveness of the evidence. At the time of the experiment, the United States was abuzz with an as yet unsubstantiated rumor that President Clinton had had an affair with former White House intern Monica Lewinsky. Subjects were provided with the following background statement:
President Clinton has been accused of having a sexual relationship with former White House intern Monica Lewinsky. Mr. Clinton has denied the accusation. As part of his investigation of these and other activities by Mr. Clinton, Special Prosecutor Kenneth Starr has collected clothing from Ms. Lewinsky, including a black cocktail dress.
Next, jurors were asked to assume that the following series of events takes place:
(1) Ms. Lewinsky continues to maintain her silence, (2) some genetic material (i.e., semen) is recovered from Ms. Lewinsky’s cocktail dress, (3) a DNA expert reports that his tests could not rule out Mr. Clinton as a possible source of the recovered genetic material, (4) the expert also provides a statistic that describes the strength of the DNA match, and (5) Mr. Clinton’s attorney responds by arguing that the DNA evidence is irrelevant because there are no corroborating eyewitnesses.
The case materials were identical for all subjects with the exception of the wording of the fourth event. Half of the subjects (selected at random) received the following "single target / probability frame" (s/p) wording: "The probability that Mr. Clinton would match the genetic material if he were not the source is 0.1%." The other half received the "multi target / frequency frame" (m/f) wording: "One in one thousand people in Washington who were not the source would also match the semen stain." Next, all subjects estimated the probability that President Clinton was the source of the DNA on the dress. I predicted that s/p subjects would be less sensitive to the possibility of coincidental matches than m/f subjects and would therefore assign higher estimates for the probability that President Clinton was the source of the DNA.
Results and Discussion
Jurors’ responses were highly variable, ranging from 0% to 100%. This may have been the result of using a sex scandal vignette in a politically charged environment. Nevertheless, a nonparametric analysis revealed that s/p subjects thought it was significantly more likely that President Clinton was the source of the DNA than m/f subjects (82% vs. 60%). If the probability estimates for both sets of subjects seem low in the face of a presumed DNA match, it may help to recall that this experiment was conducted: (a) when President Clinton’s approval rating was at an all-time high (73%), (b) when 40% of Americans did not believe there was an affair, and (c) after several major news organizations (erroneously) reported that the FBI did not find evidence of semen on clothing taken from Ms. Lewinsky.
Another way of examining the results is in terms of the proportion of subjects who were certain or nearly certain that President Clinton was the source of the stain on the dress under the circumstances provided. The data showed that the proportion of subjects who were at least 99% certain that President Clinton was the source of the genetic material dropped from 28% among s/p jurors to 8% among m/f jurors.
Obviously, there are many variables at work in a loaded context such as this: politics, the influence of the news media, prior beliefs about the credibility of Clinton and Lewinsky, and prior beliefs about the nature of the relationship between Clinton and Lewinsky. Moreover, this experiment does not indicate whether the difference in target (single or multi), frame (probability or frequency) or both caused the observed differences between groups. In order to disentangle the central effects and put exemplar cueing theory to a more critical test, Experiment 2 was conducted using DNA matches in a hypothetical criminal context.
III. EXPERIMENT 2: TARGET AND FRAME EFFECTS
Imagine the following scenario: A masked man bursts into a Houston hardware store and announces his intent to rob it. Shortly thereafter, the owner of the store tries to overpower the would-be robber and strikes him on the head with a hammer. However, the robber shoots and kills the owner before fleeing from the store. Blood that does not belong to the store owner is recovered at the crime scene and provides a DNA match with a suspect. At trial, a DNA expert testifies about the match and offers some probabilities that are intended to help jurors understand the probative value of the DNA evidence. Other than the DNA evidence, the case against the suspect is weak.
Ninety mock jurors read a sample experimental case. As in the Clinton-Lewinsky study, the information provided was identical for all jurors with the exception of a single sentence that described the DNA match statistic. The DNA match statistic, which was fixed at 0.001 (one in one thousand), was described in one of four mathematically equivalent ways: (1) single target / probability frame (s/p): "the probability that the suspect would match the blood drops if he were not their source is 0.1%"; (2) single target / frequency frame (s/f): "the frequency with which the suspect would match the blood drops if he were not their source is one in one thousand"; (3) multi target / probability frame (m/p): "0.1% of the people in Houston would also match the blood drops"; (4) multi target / frequency frame (m/f): "one in one thousand people in Houston would also match the blood drops." The mock jurors were assigned at random to one of these four groups (22–24 jurors per group), which differed only in the presentation of the DNA match statistic. After reading the case, subjects (a) estimated the probability that the suspect was the source of the recovered DNA trace, "p(source)", (b) estimated the probability that the suspect was guilty of the murder, "p(guilt)", and (c) provided a verdict (guilty, not guilty).
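The article does not describe the randomization procedure, and the reported group sizes (22 to 24 jurors) suggest that it was not a strict round-robin. The sketch below shows one common way to randomize subjects across the four target/frame cells; the function name and condition labels are hypothetical.

```python
import random
from collections import Counter
from itertools import product

# The four target/frame cells: s/p, s/f, m/p, m/f.
CONDITIONS = [f"{t}/{f}" for t, f in product("sm", "pf")]


def assign_conditions(subject_ids, seed=None):
    """Randomly assign subjects to the four cells.

    Subjects are shuffled and then dealt round-robin, so cell sizes
    differ by at most one.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    return {sid: CONDITIONS[i % len(CONDITIONS)] for i, sid in enumerate(ids)}


assignment = assign_conditions(range(90), seed=1)
print(Counter(assignment.values()))
```

Shuffling before dealing keeps cell sizes within one of each other (23, 23, 22, and 22 for 90 subjects) while leaving each subject's particular condition to chance.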
Exemplar cueing theory predicts that jurors exposed to the frequency frame and multi target presentations of the DNA statistic will be more likely to think about examples of coincidental matches than jurors exposed to the probability frame and single target presentations. Consequently, I predicted that the frequency frame and multi target groups would not be as impressed by the DNA evidence and would assign lower values to p(source) and p(guilt), and return fewer guilty verdicts than the probability frame and single target groups.
Results and Discussion
As predicted, the results showed that jurors’ p(source) and p(guilt) estimates varied depending on how the DNA match statistic was targeted and framed. Mock jurors in the single target groups concluded that the suspect was significantly more likely to be the source and more likely to be guilty than did jurors in the multi target groups (p(source) averages: 79% vs. 61%; p(guilt) averages: 78% vs. 62%). Likewise, mock jurors in the probability frame groups concluded that the suspect was significantly more likely to be the source and more likely to be guilty than did jurors in the frequency frame groups (p(source) averages: 80% vs. 60%; p(guilt) averages: 74% vs. 66%). Jurors in the single target groups tended to return more guilty verdicts than jurors in the multi target groups (40% vs. 29%), although this difference fell short of statistical significance. Similarly, probability frame jurors tended to convict more often than frequency frame jurors (36% vs. 32%), but here too, the difference did not reach statistical significance. 
The proportion of "extreme responses" (1% or less, or 99% or more) in the s/p and m/f groups provides insight into the influence these subtle presentation manipulations have on people’s beliefs about the value of the statistical evidence. Figure 1 shows the proportion of extreme p(source) responses made by jurors in the s/p and m/f groups. Most s/p jurors (63%) concluded that the suspect was at least 99% likely to be the source of the blood. Only 8% of these jurors concluded there was a 1% chance or less that the suspect was the source of the blood. In contrast, only 14% of the m/f jurors concluded that the suspect was at least 99% likely to be the source of the DNA. Incredibly, 32% of m/f jurors concluded that there was a 1% chance or less that the suspect was the source of the blood. In short, whereas more than one-half of the jurors in the s/p group were quite certain that the suspect was the source of the DNA evidence, about one-third of the jurors in the m/f group were quite certain that the suspect was not the source. This disparity emerged even though the cases presented were identical across groups except for the wording of a single sentence that described identical DNA match probabilities.
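Scoring the extreme responses is a matter of counting estimates at or beyond each cutoff. The sketch below assumes the cutoffs are 99% or more and 1% or less, as in the text; the function name and the sample responses are invented for illustration and are not data from the study.

```python
def extreme_response_shares(estimates):
    """Return (share >= 99, share <= 1) for p(source) estimates given in percent."""
    if not estimates:
        raise ValueError("no estimates supplied")
    n = len(estimates)
    high = sum(1 for e in estimates if e >= 99) / n  # "at least 99% likely"
    low = sum(1 for e in estimates if e <= 1) / n    # "1% chance or less"
    return high, low


# Made-up responses, for illustration only (not data from the study).
print(extreme_response_shares([100, 99, 75, 50, 1, 0, 90, 99.5]))
```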
Figure 1. Proportion of extreme (1% and 99%) p(source) estimates by mock jurors (DNA incidence rate = 0.001). S/P = single target, probability frame; M/F = multi target, frequency frame.
The major finding here is that jurors were substantially less impressed with the DNA evidence when it was presented in m/f form than in s/p form. Presumably, m/f jurors, unlike s/p jurors, were cued to imagine others who might match by coincidence, and assigned relatively lower p(source) and p(guilt) probabilities in response. Verdict, a binary variable, was a less sensitive measure and therefore did not differ significantly across most groups. Nevertheless, if future studies confirm the trends detected here, the legal system must take seriously the idea that the way in which a match statistic is worded by an expert or attorney can affect the way a juror thinks about the value of that evidence.
IV. EXPERIMENT 3: DUAL PERSPECTIVE
In an actual trial, the defense and prosecution may try to describe DNA evidence in ways that are most favorable to their position. In cases that include a DNA match, the prosecution may wish to present the evidence in terms of probability frames and single targets while the defense recharacterizes that evidence in frequency frames and multi targets. This prompts the empirical question of how persuasive the DNA evidence is for mock jurors who are presented with both target and frame perspectives.
A new set of 227 jury-eligible University of Texas subjects participated in this study for credit in an introductory business law class. Subjects were assigned at random to one of six experimental groups and provided with a hypothetical case and set of questions that were largely the same as those given to mock jurors in Experiment 2. Indeed, four of the six groups were identical to those used in Experiment 2 (twenty-five to thirty jurors per group). The two remaining groups were dual perspective groups (fifty-five to fifty-eight jurors per group). These included both an s/p and an m/f format for the DNA statistic. One of the dual perspective groups used an s/p–m/f order and the other used an m/f–s/p order.
How will dual presentation jurors respond? One possibility is that dual presentation jurors will respond similarly to m/f jurors. This is because both types of jurors will be exposed to a presentation form that induces them to think about others who might match. By this reasoning, the dual presentation jurors and m/f jurors will be less impressed with the evidence than s/p jurors. A second possibility is that dual presentation jurors will "anchor" on their initial impressions of evidence strength and pay less attention to the recharacterized statistical evidence. This is consistent with the anchoring and adjustment theory of decisionmaking, and the primacy effects predicted by the belief adjustment model for single-judgment decisionmaking tasks. A third possibility is that dual presentation jurors will adopt a little of each perspective and end up with views on the critical questions between those of the s/p and m/f groups.
Results and Discussion
The data from the two dual presentation conditions did not differ as a function of the order in which the s/p and m/f perspectives appeared. Therefore, the data from these groups (Groups 5 and 6) were collapsed into a single group for all other analyses.
The data revealed that mock jurors’ p(source) and p(guilt) estimates varied across the five groups. S/p jurors were most impressed with the evidence (p(source) average = 77%, p(guilt) average = 68%) and m/f jurors were least impressed (p(source) average = 38%, p(guilt) average = 38%). Dual presentation jurors gave responses that fell in between (p(source) average = 50%, p(guilt) average = 46%). On the verdict measure, dual presentation jurors were about as willing to convict as s/p jurors (22% vs. 26%). However, m/f jurors rarely convicted (3%).
These data are important for two reasons. First, they replicate the presentation effects in Experiment 2, which showed that the way statistical DNA evidence is targeted and framed affects the persuasive value of that evidence. Second, the data suggest that adding a second perspective on the DNA statistic, particularly an s/p presentation, changes the way jurors think about DNA evidence: some of the judgments and decisions made by jurors who received both s/p and m/f statistical presentations differed from those of jurors who received the s/p or m/f presentation alone.
Those who received the m/f perspective in addition to the s/p perspective gave lower p(source) and p(guilt) estimates than those who received only the s/p perspective. This may have occurred because the m/f perspective reminded the jurors that others would also match. Nevertheless, these dual presentation jurors convicted about as often as the s/p jurors. Thus, although the m/f perspective weakened their belief about the defendant’s guilt, it apparently was not enough (in this scenario) to change their minds about the verdict.
Those who received the m/f presentation alone were notably less persuaded by the evidence than were those who received the s/p perspective in addition to the m/f perspective. Apparently, jurors in this group focused on the possibility that others would also match, and few were impressed with the evidence under these circumstances. Importantly, the conviction rate increased notably (from 3% to 22%) once the s/p perspective was introduced. Apparently, introduction of the s/p perspective can refocus jurors’ attention on the unlikelihood that this particular defendant would match if he were not the source even among jurors who previously were not persuaded by the evidence.
V. EXPERIMENT 4: SMALLER MATCH PROBABILITIES
The first three experiments used DNA matches that had an incidence rate of one in one thousand. But DNA match probabilities that find their way into the courtroom are often as small as one in millions or even one in billions. Will the way in which the statistics are presented really matter for such small match probabilities? The answer provided by exemplar cueing theory is that it depends on whether jurors are able to generate coincidental match exemplars in some presentations but not others. If match exemplars cannot be generated in any presentation group, then we would expect the target and frame effects observed in previous experiments to disappear.
With this in mind, Experiment 4 examined the effects of target and frame at very low DNA incidence rates (one in one million, one in one billion) on 282 jury-eligible students. Apart from the incidence rate, the case materials were identical to those used in Experiments 2 and 3. Subjects were assigned at random to one of eight groups, formed by crossing target (single, multi), frame (probability, frequency), and incidence rate (one in one million, one in one billion) in a 2 × 2 × 2 design.
I predicted that the target and frame effects would be smaller or even nonexistent at these very low incidence rates. This prediction followed from the idea that exemplars may be hard to generate in any of the target and frame groups (including the m/f group) for very small incidence rates.
Results and Discussion
As expected, the data in Experiment 4 showed that the target and frame effects observed earlier depend on the incidence rate. At the one in one million incidence rate, there were small differences between the single target and multi target groups on p(source) estimates (81% vs. 76%). There were also small differences between the probability frame and frequency frame groups on p(source) estimates (82% vs. 74%). Similarly, there were small differences in p(guilt) estimates and verdicts across the different target and frame groups in the expected direction. Apparently, then, the form of the DNA statistic still mattered a bit even at an incidence rate as small as one in one million. This suggests that some people may think about match exemplars even when the incidence rate is very small.
At the one in one billion incidence rate, all differences between the target and frame groups disappeared. For example, there were essentially no differences between the single target and the multi target on p(source) estimates (83% vs. 83%) or between the probability frame and the frequency frame (84% vs. 82%). As exemplar cueing theory predicts, the notion that a one in one billion match could be coincidental is, for most people, farfetched, because it is hard to imagine anyone in any reference population (except, perhaps, the entire world) who would match by coincidence. It appears, then, that exemplar reasoning requires a reference population large enough, relative to the incidence rate, for exemplars to be identified.
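The idea that exemplars require a large enough reference population can be made concrete with a back-of-the-envelope count: if each of the N − 1 people other than the true source matches independently with probability f, the expected number of coincidental matches is (N − 1)f. The sketch below assumes that independence and uses round population sizes chosen purely for illustration; it is not a computation performed in the experiments.

```python
def expected_coincidental_matches(population_size, incidence_rate):
    """Expected number of people, other than the true source, who share the profile.

    Simplifying assumption: each of the other population_size - 1 people
    matches independently with probability incidence_rate.
    """
    return (population_size - 1) * incidence_rate


# Round, illustrative reference populations (not figures from the study).
POPULATIONS = {
    "town": 40_000,
    "large city": 1_000_000,
    "large nation": 100_000_000,
    "world-scale": 6_000_000_000,
}
RATES = {"1 in 1,000": 1e-3, "1 in 1,000,000": 1e-6, "1 in 1,000,000,000": 1e-9}

for rate_label, f in RATES.items():
    for pop_label, n in POPULATIONS.items():
        expected = expected_coincidental_matches(n, f)
        print(f"{rate_label:>20} | {pop_label:>12}: {expected:,.2f}")
```

Under these assumptions, a one in one billion profile yields an expected count well below one in every population listed except the world-scale one, which fits the finding that exemplar effects vanished at that rate.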
One in One Billion: Not Persuasive?
At first blush, one surprising finding from Experiment 4 is that jurors were not utterly convinced by the DNA statistics, even when the match statistic was one in one billion. These jurors assigned average source and guilt probabilities of 83% and 77%, respectively. Why weren’t they absolutely convinced by the DNA evidence?
First, the hypothetical case used in the experiment was weaker than the typical DNA case. Aside from the DNA match, the prosecution offered no other evidence against the defendant. I intentionally used such a case to avoid "ceiling effects," in which jurors in all experimental groups would become firmly convinced of the defendant’s guilt. Ceiling effects caused by overwhelming evidence of guilt could mask the subtle effects predicted by exemplar cueing theory.
Second, the reluctance of some jurors to assign extremely high p(source) and p(guilt) values in mock DNA cases is now a fairly consistent finding in the mock juror literature. This suggests that there is a limit to how impressed mock jurors will be by DNA evidence. As the DNA match probability gets smaller, the probative value of the DNA evidence (setting aside error rate considerations) gets larger. However, the weight that mock jurors assign to such evidence may be limited for several reasons. One possibility is that jurors understand that the value of DNA evidence is limited by the chance that the reported match is erroneous. Support for this is found in a recent study reporting that the average juror assumes the chance of an erroneous DNA match call is on the order of one in fifteen. Another possibility is that jurors regard a DNA match as a form of "naked statistical evidence" and are unwilling to assign extreme weight to such evidence without corroboration.
Finally, the summary data showing jurors’ reluctance to assign extremely high source and guilt estimates when presented with one in one billion DNA match statistics may be misleading. The 83% source probability and the 77% guilt probability are averages (means). As such, they include responses from all jurors, including a few who gave very low source and guilt estimates. A few low estimates can pull an average (i.e., an arithmetic mean) well below the median, giving a misleading impression of the "typical" juror. In fact, the median p(source) and p(guilt) estimates for the jurors in the one in one billion group were 98% and 90%, respectively.
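To see how a few low answers can separate the mean from the median, consider this short sketch using Python’s standard `statistics` module. The ten estimates are invented for illustration and are not data from Experiment 4.

```python
import statistics

# Hypothetical p(source) estimates (in %) from ten imaginary jurors:
# eight high responses and two very low ones. Not data from the experiments.
estimates = [98, 99, 98, 97, 99, 98, 100, 95, 1, 5]

mean = statistics.mean(estimates)
median = statistics.median(estimates)

# The two low answers drag the mean well below the median.
print(f"mean = {mean}, median = {median}")
```

With these made-up numbers the mean is 79 while the median is 98.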
Nevertheless, the minority jurors should not be ignored. Six percent of the jurors who were given the one in one billion DNA match statistic (9 of 147) assigned source probabilities of 10% or less, and 3% (5 of 147) assigned source probabilities of 1% or less. Eight percent of these jurors (12 of 147) assigned guilt probabilities of 10% or less, and 4% (6 of 147) assigned guilt probabilities of 1% or less. These jurors were almost certainly confused. Apparently they thought that the extremely small DNA match probability made it less (rather than more) likely that the suspect was the source. If such confused jurors appear at rates of roughly 6% to 8% on actual juries, it is more likely than not that a jury of twelve will include at least one of them. This could be the difference between a conviction and a hung jury.
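The claim about a jury of twelve follows from the complement rule, 1 − (1 − p)^12, if one assumes (as a simplification) that each juror is an independent draw from a population in which a fraction p is confused. Real jury selection is not independent random sampling, so the figures below are only rough.

```python
def prob_at_least_one(rate, jury_size=12):
    """Chance that at least one of jury_size jurors has the trait.

    Simplifying assumption: jurors are independent draws, each having the
    trait with probability `rate`.
    """
    return 1 - (1 - rate) ** jury_size


for rate in (0.06, 0.08):
    print(f"{rate:.0%} confused: P(at least one on a jury of 12) = {prob_at_least_one(rate):.2f}")
```

For p = 0.06 and p = 0.08 this gives roughly 0.52 and 0.63, so in both cases at least one confused juror is more likely than not.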
VI. The Experiments: Summary and Limitations
This paper represents the first attempt to offer and test a theory for predicting when legal decisionmakers will and will not be impressed with low probability DNA match statistics. The results dramatically illustrate that the persuasive power of DNA evidence may depend, in part, on how the match statistics are presented and framed. Experiments 1, 2, and 3 showed that when evidence is presented in ways that prompt triers of fact to think about examples in which coincidental DNA matches might occur, the evidence loses some of its persuasive force. But when such examples are not readily called to mind, the evidence has more persuasive force. Experiment 4 provided an important limiting condition for target and frame effects. When DNA incidence rates are so small that exemplar generation is difficult under any target and frame formulation, the way in which the statistics are presented is less significant.
All experiments have limitations, and those presented here are no exception. Below, I briefly address two limitations, one imagined and one real. I also review data from one small but telling study.
1. "Students Aren’t Jurors": Not a Fatal Flaw
The subjects in these experiments were jury-eligible university students rather than a broader cross-section of people or a sample of empanelled jurors. Although university students are younger and more intelligent than the average juror, a large empirical literature provides little reason to believe that patterns of data obtained from student subjects fail to generalize to the jury population. Moreover, in a previous DNA study, patterns of data obtained from student subjects did not differ meaningfully from patterns obtained from empanelled jurors. Thus, the statistical confusion that some of the subjects exhibited in Experiment 4 probably exists among actual jurors. An unpublished study by Stuart O’Brien supports this proposition.
Four Confused Jurors: The O’Brien Study
Stuart O’Brien gained access to four actual jurors in a Texas capital murder case several years after the jury convicted the defendant. The conviction was based in part on a PCR-based DNA test showing that blood on clothing owned by the defendant matched the victim’s blood. The reported DNA match statistic was one in twenty (5%).
O’Brien posed a hypothetical to the four former jurors that was similar to the hypothetical case used in the experiments reported here. He asked them to imagine a murder case in which an expert testified that (a) DNA evidence recovered from clothing worn by a suspect matched the victim, and (b) the frequency of this DNA profile in the general population is one in one hundred. O’Brien then presented the former jurors with a series of statements related to the meaning of the one in one hundred statistic and asked them to indicate whether each was true or false.
In general, their performance was abysmal. One of the jurors accepted as true the misstatement that "A one in one hundred frequency indicates that there is a 99% chance that the victim is the source of the evidence." This is the source probability error. The error consists of equating the frequency of a profile (e.g., a DNA profile) with the probability that a person who matches the profile is not the source of that profile. Thus, when judges, experts, and attorneys claim that a DNA match probability of, say, one in one million means that there is only one chance in a million that the suspect is not the source of the recovered sample, they have committed the source probability error.
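Bayes’ rule shows why a profile frequency cannot be read directly as a source probability: the answer also depends on how likely the matching person was to be the source before the DNA evidence. The sketch below assumes no laboratory error (the true source always matches) and, for illustration only, treats n candidates as equally likely sources, so the prior is 1/n. The function name is hypothetical.

```python
def source_probability(frequency, prior):
    """Posterior probability that a matching person is the source (Bayes' rule).

    Assumptions: the true source matches with certainty (no lab error), and a
    non-source matches with probability `frequency`. `prior` is the chance,
    before the DNA evidence, that this person is the source.
    """
    numerator = prior * 1.0
    return numerator / (numerator + (1 - prior) * frequency)


f = 1 / 100  # the one in one hundred profile frequency from O'Brien's hypothetical
for n in (2, 10, 100, 1_000):
    # Illustrative assumption: n equally likely candidate sources, so prior = 1/n.
    print(n, round(source_probability(f, 1 / n), 3))
```

With a one in one hundred frequency, the source probability comes out near 0.99 only when there are two candidates; with 100 candidates it is about 0.50, and with 1,000 it is about 0.09. The profile frequency alone fixes none of these numbers.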
The statistical error committed by the other three jurors was worse. They concluded that the one in one hundred statistic indicated that there was only a 1% chance that the blood belonged to the victim. This is false. By equating the profile frequency with the source probability, these jurors turned the notion of probative value on its head in much the same way as did some of the mock jurors in Experiment 4. If the profile frequency actually did equal the source probability, then an extremely rare blood match (e.g., one in one million) would be less probative than an extremely common blood match (e.g., four in five) because there would be an 80% chance that the blood belonged to the victim in the latter case but less than a 1% chance in the former case. This is obviously wrong. Apparently, then, we have little reason to believe that people who reason badly about DNA statistics in a laboratory setting will improve in a courtroom setting.
2. "Real Cases Are More Complex": A Better Criticism
A more useful criticism of the present studies is that the case materials were not as rich as those that jurors have in an actual trial. This was intentional. In a controlled scientific study, it is important to minimize interference from other variables to determine whether there is a causal relationship among the variables of interest. However, such causal control comes at a price, namely, a reduction in realism. In actual DNA cases, jurors are probably influenced by numerous factors other than the way in which the DNA evidence is presented. For example, real jurors hear extended opening and closing arguments, observe direct and cross-examinations of witnesses, deliberate as a group, and consider the consequences of their verdicts.
Future researchers may wish to trade off some of the rigor and experimental control of the present studies in favor of increased realism. For example, they might compare outcomes in actual cases in which the DNA statistics were presented in s/p form with outcomes in cases in which the statistics were presented in m/f form. Although causal relationships may not be inferred from such a correlational study, its data could bolster or challenge the exemplar thesis.
VII. General Discussion: The Psychology of DNA Numbers
Some may find it alarming that the wording of a statistic can affect a juror’s assessment of the probability that a defendant is guilty. Yet this observation is consistent with a large body of research on the psychology of numbers. Much of this research indicates that people think heuristically rather than probabilistically. That is, when presented with quantitative information, we do not perform algebraic computations and arrive at solutions by using tenets of logic and probability theory. Instead, we evaluate quantitative evidence via mental shortcuts and other rules of thumb. In the case of DNA evidence, the heuristic of choice may be the ease with which we can imagine people other than the suspect who would also match.
A. Statistical Form vs. Statistical Fallacy: An Important Distinction
As noted earlier, all of the target and frame presentations used in the experiments reported here are legitimate, mathematically comparable ways of presenting a DNA match statistic. In contrast, some reformulations of statistical evidence are fallacious. For example, it is not legitimate to present the DNA match statistic as a statement about (a) the probability of guilt or innocence, (b) the source probability, or (c) the probability that another person would match. These fallacious reformulations implicitly assume knowledge about the strength of the non-genetic evidence against the matching suspect, or the size of the reference population (i.e., the number of others who could be the source).
Admittedly, the multi-target presentation (which identifies the proportion of others in a population who would match) bears a superficial resemblance to the "defense attorney’s fallacy" in which statistical evidence is undervalued or dismissed on the grounds that innocent individuals would also be falsely implicated by the match. The difference between the multi-target presentation technique and the defense attorney’s fallacy, however, is that the latter includes a faulty interpretation of the fact that the statistical profile is not unique. Other things being equal, nonunique evidence is less probative than unique evidence; however, nonunique evidence may still be extremely probative. The multi-target presentation encourages an awareness of the nonuniqueness of the evidence, but it does not provide an interpretation (faulty or otherwise) of that evidence.
B. Which Statistical Form is "Best?"
Although the different ways of presenting the DNA match statistic offered here are legitimate, they may not be equally likely to produce a proper weighting of the DNA evidence. As noted earlier, previous research suggests that jurors underweight statistical evidence, including DNA evidence. This would seem to bolster the argument that evidence should be presented in s/p rather than m/f form because the former is associated with greater perceived weight for the DNA evidence.
On the other hand, studies also show that people reason better with statistical data that are presented as frequencies rather than as probabilities. Not only are people more likely to reason in accordance with probability theory when given frequencies rather than probabilities, but frequency presentations also reduce the risk that people will commit various statistical fallacies.
One reasonable solution is to employ frequency presentations in which efforts are made to discourage people from devaluing the DNA evidence on grounds of nonuniqueness. Alternatively, decisionmakers could be informed that there are different ways of presenting the same statistical information. These proposed solutions deserve empirical study.
C. Strategic Considerations for Attorneys
On a broad level, the message of these experiments for trial attorneys is that they should think not only about the objective strength of their statistical evidence but also about the imagery that their evidence evokes. For more than twenty years, social scientists have known that laypeople underweight statistical information because it is dull and lacking in vividness. The present studies suggest that statistical information can overcome its natural propensity to be dull and unremarkable when it is presented in a way that promotes more concrete imagery about other exemplars.
At a slightly narrower level, the strategic implications of these data for trial attorneys and experts seem clear. Prosecutors, prosecution experts, and others who wish to persuade people using DNA match evidence should favor an s/p (single target/probability frame) perspective for the DNA statistics: "The probability that the suspect would match the blood drops if he were not their source is 0.01%." In contrast, defense attorneys and defense experts who wish to cast doubt on the DNA match evidence should favor an m/f (multi-target/frequency frame) perspective for the DNA statistics: "One in ten thousand people in Houston who are not the source would also match the blood drops." This is particularly true in DNA cases involving relatively large match probabilities (e.g., one in hundreds or thousands) as opposed to very small match probabilities (e.g., one in millions or billions).
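The two phrasings in the preceding paragraph carry the same number. A small sketch can render both from a single exact rate using the standard `fractions.Fraction` type; the sentence templates are adapted from the examples in the text, and the function names are hypothetical.

```python
from fractions import Fraction


def single_target_probability(rate):
    """s/p phrasing, following the prosecution example in the text."""
    percent = float(rate * 100)  # exact Fraction arithmetic, converted once for display
    return (
        "The probability that the suspect would match the blood drops "
        f"if he were not their source is {percent:g}%."
    )


def multi_target_frequency(rate, place):
    """m/f phrasing, following the defense example in the text."""
    if rate.numerator != 1:
        raise ValueError("this sketch only handles rates of the form 1/n")
    return (
        f"One in {rate.denominator:,} people in {place} who are not the source "
        "would also match the blood drops."
    )


rate = Fraction(1, 10_000)
print(single_target_probability(rate))
print(multi_target_frequency(rate, "Houston"))
```

Both sentences are generated from the same `Fraction(1, 10_000)`, which makes plain that the choice between them is one of presentation, not of substance.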
1. Prosecution: Beyond Single Targets and Probability Frames
The theoretical point of this paper is that presentations of statistical evidence that call match exemplars to mind will cause the evidence to seem weaker than presentations that do not call such exemplars to mind. By logical extension, presentations that suggest exactly zero match exemplars should induce perceptions of great evidence strength. In a recent study, Laura Macchi and I showed that the ordinarily defense-friendly m/f method of presenting DNA statistics could be turned into a prosecution-friendly method by blocking people’s cognitive access to exemplars. We accomplished this by offering mathematically equivalent variations of the DNA match frequency statistic to hundreds of mock jurors. We used three versions of each of two DNA match incidence rates. For a one in one thousand incidence rate, the statistics were presented as "one out of one thousand," "0.1 out of one hundred," or "two out of two thousand." For a one in one hundred thousand incidence rate, the statistics were presented as "one out of one hundred thousand," "0.1 out of ten thousand," or "two out of two hundred thousand." The three formulations within each incidence rate were mathematically, but not psychologically, identical. We predicted that the fractional numerators (0.1) would make it harder for people to think about others who might also match. Indeed, the near-zero numerators might even suggest that no one else is likely to match ("0.1 out of ten thousand sounds a lot like no matches to me"). In contrast, we predicted that numerators larger than one would invite thoughts of others who might match ("Two out of two hundred thousand? I wonder which of the two is the source.").
As expected, we found that jurors were more impressed by the DNA evidence when it was presented with a fractional numerator, and less impressed when it was presented with a numerator larger than one. This effect was so pronounced that jurors were more persuaded by "0.1 out of one hundred" than they were by "one out of one hundred thousand" or "two out of two hundred thousand" (p(source) = 63%, 53%, and 46%, respectively). In other words, jurors regarded objectively weaker evidence (0.1 out of one hundred) as more persuasive than objectively stronger evidence (a one in one hundred thousand incidence rate).
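Exact rational arithmetic confirms that the three formulations within each incidence rate are identical, and that "0.1 out of one hundred" is objectively the weaker statistic. The sketch below uses the standard `fractions.Fraction` type; the dictionary layout and helper name are illustrative.

```python
from fractions import Fraction

# The three presentations used for each incidence rate, as (numerator, denominator).
# Numerators are strings so Fraction parses "0.1" exactly rather than as a binary float.
FORMULATIONS = {
    "1 in 1,000": [("1", 1_000), ("0.1", 100), ("2", 2_000)],
    "1 in 100,000": [("1", 100_000), ("0.1", 10_000), ("2", 200_000)],
}


def rate(numerator, denominator):
    """Exact match rate numerator/denominator."""
    return Fraction(numerator) / denominator


for label, versions in FORMULATIONS.items():
    distinct = {rate(n, d) for n, d in versions}
    # One distinct value means the three formulations are mathematically identical.
    print(label, distinct)

# How many times more common is "0.1 out of one hundred" than "one out of one hundred thousand"?
print(rate("0.1", 100) / rate("1", 100_000))
```

Each set contains a single value, and the final ratio is 100: a one in one thousand profile is one hundred times more common than a one in one hundred thousand profile, yet jurors found it more persuasive when it was written with a fractional numerator.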
2. Defense: Beyond Multi Targets and Frequency Frames
a. Tiny DNA match statistics do not account for laboratory error
When DNA match probabilities are extremely small (e.g., one in millions or billions), the prosecution obviously has the upper hand. However, defense attorneys may point out that those incidence rates do not account for the possibility that the reported match is not, in fact, a true match. When the possibility of a false positive laboratory error, as estimated by laboratory proficiency tests, is taken into account, the underlying probative value of the reported DNA match may be more like one in several hundred or several thousand. If defense attorneys can persuade jurors to think about DNA evidence not merely in terms of what it indicates about the possibility of a coincidental match, but also in terms of the more inclusive possibility of a coincidental match or a false positive error, they may be able to invoke exemplar imagery.
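One way to express the combined possibility is the chance that a person who is not the source is nonetheless reported as a match: either the person truly shares the profile (probability f) or does not and the laboratory errs (probability e). The sketch below assumes the two events are independent and uses a hypothetical error rate of one in one thousand, chosen for illustration rather than taken from any proficiency test.

```python
def reported_match_probability(random_match_prob, false_positive_rate):
    """Chance that a non-source is *reported* as matching.

    Either the person truly shares the profile (random_match_prob), or the
    person does not and the lab reports a match anyway (false_positive_rate).
    Assumes the two events are independent.
    """
    f, e = random_match_prob, false_positive_rate
    return f + (1 - f) * e


# Hypothetical inputs: a one in one billion profile and an assumed 1-in-1,000 error rate.
combined = reported_match_probability(1e-9, 1e-3)
print(f"about 1 in {1 / combined:,.0f}")
```

With these inputs the combined figure is about one in one thousand: the one in one billion profile frequency contributes almost nothing, and the assumed error rate dominates.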
b. Use the largest plausible reference population
The evidence presented here shows that attorneys are not at the mercy of the DNA statistic. For one thing, attorneys can frame the statistics. They can also invoke reference populations of different sizes to encourage or to discourage the production of match exemplars. For a small match statistic (e.g., one in one million), a prosecuting attorney who invokes a relatively small reference population (e.g., the city of forty thousand people in which the crime occurred) makes it difficult for jurors to think about others who might match. This may be true even if the defense attorney invokes an m/f frame. Jurors may realize that the chance of locating even a single coincidental match in the city is so small that the DNA match statistic constitutes overwhelming proof of identity.
However, the defense attorney may wish to invoke a much larger reference population (e.g., the county or state). This can be justified by arguing that smaller reference classes presume that there is no possibility whatsoever that the donor of the genetic evidence came from outside the local area. Defense attorneys might do well to strike a balance here between choosing a reference class that is large enough to evoke match exemplars yet small enough so as not to seem absurd. In most cases, the largest reference class, all the people in the world, would probably fail this intuitive test of absurdity. After all, it would strain credulity to suggest that a rape that occurred in a small town in Indiana might have been committed by any of the 500,000,000 males in China.
VIII. Final Comment
This paper began with an assertion about the well-known power of DNA analysis in the courtroom. I caution that this power is a theoretical one. In practice, DNA analyses produce statistics that may or may not persuade jurors about whether a particular person is the source of a particular sample. The studies offered here suggest that the persuasive value of DNA evidence may depend as much on an understanding of the psychology of numbers as it does on the underlying science and the statistical expression of that science.