02 August 2022

Genomics

'Four misconceptions about investigative genetic genealogy' by Christi J Guerrini, Ray A Wickenheiser, Blaine Bettinger, Amy L McGuire and Stephanie M Fullerton in (2021) Journal of Law and the Biosciences comments

Investigative genetic genealogy (IGG) is a new technique for identifying criminal suspects that has sparked controversy. The technique involves uploading a crime scene DNA profile to one or more genetic genealogy databases with the intention of identifying a criminal offender’s genetic relatives and, eventually, locating the offender within the family tree. IGG was used to identify the Golden State Killer in 2018 and it is now being used in connection with hundreds of cases in the USA. Yet, as more law enforcement agencies conduct IGG, the privacy implications of the technique have come under scrutiny. While these issues deserve careful attention, we are concerned that their discussion is, at times, based on misunderstandings related to how IGG is used in criminal investigations and how IGG departs from traditional investigative techniques. Here, we aim to clarify and sharpen the public debate by addressing four misconceptions about IGG. We begin with a detailed description of IGG as it is currently practiced: what it is and—just as important—what it is not. We then examine misunderstood or not widely known aspects of IGG that are potentially confusing efforts to have constructive discussions about its future. We conclude with recommendations intended to support the productivity of those discussions.

 The authors argue 

 Investigative genetic genealogy (IGG) is a new technique for identifying criminal suspects that has sparked controversy. The process of IGG involves uploading a crime scene DNA profile to one or more genetic genealogy databases with the intention of partially matching it to a criminal offender’s genetic relatives and, eventually, locating the offender within their family tree. In April 2018, investigators announced the successful use of IGG to identify Joseph James DeAngelo as the Golden State Killer (GSK) responsible for at least 13 murders and 45 rapes throughout California in the 1970s and 1980s. According to one expert, by September 2020, IGG had led to the successful identification of over 150 suspects. 

Most of the major firms that maintain genetic genealogy databases—including Ancestry, 23andMe, and MyHeritage—have adopted policies that forbid law enforcement from participating in their databases for investigative purposes, either through requirements that users provide only their own DNA for analysis or explicit bans on the conduct of IGG in their databases. However, in December 2018, FamilyTreeDNA (FTDNA), which maintains a genetic genealogy database consisting of 1.15 million autosomal DNA profiles, adopted a policy permitting law enforcement to participate in its database to identify violent criminals and human remains. FTDNA database participants can choose whether to make their information available for law enforcement searches in a process known as ‘law enforcement matching.’ Specifically, registered participants located in the USA are automatically opted in to law enforcement matching, but they can choose to opt out of law enforcement matching at any time by selecting this option in their user profile. By contrast, on May 18, 2019, all registered participants of the genetic genealogy database known as GEDmatch were automatically opted out of law enforcement matching. At that time, the GEDmatch database consisted of approximately 750,000 unique single nucleotide polymorphism (SNP) profiles designated as public (in April 2020 that number was approximately 900,000). However, GEDmatch participants can choose to opt in to law enforcement matching at any time by selecting this option in their user profile. Individuals joining GEDmatch after May 18, 2019 are required to decide at the time of registration whether they will opt in to or out of law enforcement matching, where the opt-in choice is now selected by default. In December 2019, GEDmatch was acquired by Verogen, a forensic genomics company, and in January 2021, FTDNA’s parent company announced its merger with myDNA, an Australian personalized genomics company.

As more law enforcement agencies use IGG in investigations, the privacy implications of the technique have come under scrutiny. Privacy concerns associated with IGG stem from the sources and types of genetic information maintained in genetic genealogy databases, which differ in important respects from the composition of law enforcement databases, such as the National DNA Index System (NDIS) in the USA. US law enforcement databases—which are generally referred to by the acronym Combined DNA Index System (CODIS) for the software that supports them—are comprised of DNA profiles of persons who have been convicted of, and in some cases arrested for, crimes. The profiles consist of 20 short tandem repeats (STRs) generated by accredited forensic laboratories that must comply with a host of quality assurance standards and requirements. 

By contrast, genetic genealogy databases are populated voluntarily by individuals interested in exploring their ancestry and family lineage. The genetic data that they contribute are autosomal DNA profiles consisting of 600,000–700,000 SNPs generated by commercial test providers. Unlike STRs, SNPs are more evenly (and densely) distributed throughout a person’s genome and hence can carry information about a person’s medical history and appearance. If analyzed with regard to patterns of linked variation along sections of chromosomes, SNPs can also be used to identify more distant genetic relatives than STRs. For these reasons, IGG represents an expansion over standard CODIS searching in terms of the population of persons whose genetic information might be searched in an investigation, even if the search objectives are different, and the kinds of information that are the basis for identification. Although this is presumably known and accepted by genetic genealogy database participants who opt in to law enforcement matching, the same cannot be said of all of their non-participant relatives whose names might become part of an investigation by virtue of the fact that they are members of a suspect’s family tree. Because most genetic genealogy database participants are persons of European ancestry, the privacy of their relatives is especially at risk. Moreover, IGG has reportedly been conducted in at least one database without the company’s—or their participants’—knowledge and consent. That practice would violate the database’s current terms of service, which explicitly prohibit IGG.

While these issues deserve careful attention, we are concerned that their discussion is, at times, based on misunderstandings related to how IGG is conducted and used in criminal investigations and how IGG departs from traditional investigative techniques. Here, we aim to clarify and sharpen the public debate by addressing four misconceptions about IGG. We begin with a detailed description of IGG as it is currently practiced: what it is and—just as important—what it is not. We then examine misunderstood or not widely known aspects of IGG that are potentially confusing efforts to have constructive discussions about its risks and benefits. Along the way, we identify persistent concerns and controversies related to each misconception that might benefit from policy intervention. We conclude with broad recommendations intended to support the productivity of discussions about the future of IGG.