This overview describes the technique known as investigative genetic genealogy[1] and distinguishes it from the investigative use of law enforcement DNA databases, such as those in the Combined DNA Index System (CODIS). The overview also identifies issues for agencies considering the use of investigative genetic genealogy in their investigations.
Investigative genetic genealogy refers to the DNA analysis of a crime scene sample or an unidentified human remains (UHR) sample (both of which are referred to as a sample from a "person of interest") using a high-density single nucleotide polymorphism (SNP) array[2] to generate genetic data, which are then used to search a third-party[3] SNP database for genetic relatives of the person of interest. Genealogists and law enforcement may then research the potential genetic relatives identified by that search, using genealogical and other available records, to assess their possible genealogical relationships to the person of interest. The goal of the SNP database search and the genealogical research is to provide leads that help identify the person of interest. Law enforcement intersects with vendor laboratories and third-party SNP databases at several points in this process. The steps of this process,[4] some of which are described in more detail below, typically include the following:
Most third-party SNP databases are maintained by direct-to-consumer (DTC) genetic testing providers. Currently, the providers with the largest databases are Ancestry, FamilyTreeDNA, MyHeritage, and 23andMe.[6] These providers generate genetic data from customers' DNA samples using SNP microarrays that genotype between 600,000 and 700,000 SNPs.[7] They then analyze subsets of these SNPs for a range of purposes. Depending on the provider, these purposes might include reporting biogeographical ancestry to customers and identifying potential genetic relatives for customers who elect to participate in relative matching services. They might also include reporting information about customers' health, wellness, and traits, or their predispositions to certain conditions.[8]
Many DTC genetic testing providers maintain their customers' SNP data in a database; however, the providers with the largest databases also allow a customer's data to be removed at any time upon request. In addition, many DTC genetic testing providers permit customers to download their SNP data file so that they can maintain, control, and share it themselves. Individuals can share their SNP file with researchers and with third-party services, such as GEDmatch, that offer to interpret SNP data.[9] GEDmatch does not provide genetic testing services; instead, it provides a central location where users can upload and share their SNP files, along with tools to help them identify possible genetic relatives among other users in the database.[10] GEDmatch allows users to participate in its database and use many of its tools free of charge.
Relative matching services offered by DTC genetic testing providers and GEDmatch rank and report a user's relative matches according to how many centimorgans (cM) of DNA the user shares with each potential genetic relative. A cM is a unit of genetic linkage distance (one cM corresponds to a 1% chance of recombination between two loci per generation), and the amount of DNA two people share is expressed as the total length, in cM, of their shared segments. In general, the more cM two people share, the closer their familial relationship is likely to be.[11] With these results, the user (or one or more genetic genealogists working on the user's behalf) can evaluate the genetic matches and review or construct family trees to determine genealogical relationships and identify other relatives. In some instances, relatives identified in a family tree may volunteer DNA samples for analysis to support or eliminate proposed genetic relationships.
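The ranking step described above can be sketched in a few lines. This is a minimal illustration only, assuming each match has been reduced to a single total-shared-cM figure; the `RelativeMatch` type, its field names, and the example values are hypothetical and do not reflect any provider's actual data model or matching algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RelativeMatch:
    # Hypothetical record for one potential genetic relative.
    match_id: str
    shared_cm: float  # total length of shared DNA segments, in centimorgans


def rank_matches(matches: List[RelativeMatch]) -> List[RelativeMatch]:
    """Order matches from most to least shared DNA (closest likely relative first)."""
    return sorted(matches, key=lambda m: m.shared_cm, reverse=True)


if __name__ == "__main__":
    # Illustrative values only, not real match data.
    example = [
        RelativeMatch("match_A", 95.0),
        RelativeMatch("match_B", 1750.0),
        RelativeMatch("match_C", 410.5),
    ]
    for m in rank_matches(example):
        print(f"{m.match_id}: {m.shared_cm} cM")
```

Sorting is only the first step. Similar amounts of shared DNA can correspond to several different relationships, which is why a genealogist still has to interpret the values and build family trees.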
Participation in genetic databases maintained by DTC genetic testing providers and third-party services such as GEDmatch is typically governed by some combination of terms of service, privacy policies, and informed consent agreements. These policies vary by provider,[12] but they generally inform users how their genetic information will, or might, be used, shared, and searched. Providers often post their policies online, and users are required to accept (or are deemed to have accepted) them upon purchasing services, registering with the provider, or using the services. Providers expect users to comply with their policies, and users expect other users to do the same.[13]
Currently, two providers, FamilyTreeDNA and GEDmatch, allow law enforcement to search their genetic databases for investigative purposes under the same terms as other users,[14][15] with the important caveat that their policies are dynamic and subject to change. At this time, both providers permit law enforcement to use their databases to help identify UHR. In addition, FamilyTreeDNA allows law enforcement to use its database to help identify perpetrators of homicide and sexual assault, whereas GEDmatch permits law enforcement to use its database to help identify perpetrators of violent crime, which GEDmatch defines as murder, non-negligent manslaughter, aggravated rape, robbery, and aggravated assault.
Both providers have addressed individual genetic privacy by letting users choose whether their SNP data are included in law enforcement searches: FamilyTreeDNA users may opt out of such searches, whereas GEDmatch users are opted out by default but may choose to opt in. However, individual privacy rights may be implicated in other ways, such as when law enforcement approaches a person to obtain that person's DNA for testing, knowing that the person is not the suspected perpetrator but appears, based on a search of a SNP database, to be a relative of a person of interest.[16] The SNP analysis of possible relatives of the person of interest is sometimes referred to as "targeted collection" or "targeted analysis." Other privacy issues may include the terms of consent under which a person identified on a family tree relating to a person of interest knowingly provides a DNA sample for further testing, and the surreptitious collection of abandoned DNA samples from a relative of a person of interest so as not to disclose law enforcement's interest in that individual.
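The difference between an opt-out default (FamilyTreeDNA) and an opt-in default (GEDmatch), as described above, can be expressed as a small sketch. It assumes that each user has either made an explicit choice or left the provider's default unchanged. The provider table reflects only the policies summarized in this overview, which are dynamic, and the names `UserProfile`, `visible_to_law_enforcement`, and `searchable_profiles` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default law enforcement visibility per provider, as summarized in the text.
# Provider policies are dynamic; these values are illustrative only.
LE_VISIBLE_BY_DEFAULT = {
    "FamilyTreeDNA": True,  # users are included unless they opt out
    "GEDmatch": False,      # users are excluded unless they opt in
}


@dataclass
class UserProfile:
    # Hypothetical user record.
    user_id: str
    provider: str
    le_choice: Optional[bool] = None  # None means the user never changed the default


def visible_to_law_enforcement(user: UserProfile) -> bool:
    """True if this user's SNP data may be returned in a law enforcement search."""
    if user.le_choice is not None:
        return user.le_choice  # an explicit opt-in (True) or opt-out (False) wins
    return LE_VISIBLE_BY_DEFAULT[user.provider]


def searchable_profiles(users: List[UserProfile]) -> List[str]:
    """Return IDs of users whose data a law enforcement search may include."""
    return [u.user_id for u in users if visible_to_law_enforcement(u)]


if __name__ == "__main__":
    users = [
        UserProfile("u1", "FamilyTreeDNA"),                   # default: included
        UserProfile("u2", "FamilyTreeDNA", le_choice=False),  # opted out
        UserProfile("u3", "GEDmatch"),                        # default: excluded
        UserProfile("u4", "GEDmatch", le_choice=True),        # opted in
    ]
    print(searchable_profiles(users))  # ['u1', 'u4']
```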
Law enforcement agencies that currently employ investigative genetic genealogy generally use this technique on cold cases or on active, unsolved violent crimes.[17] When working with a forensic laboratory, the law enforcement agency first sends the sample from the person of interest to the laboratory, which analyzes it and searches any resulting short tandem repeat (STR) DNA profile in the state and national law enforcement DNA databases. If that search does not produce any investigative leads, the law enforcement agency or forensic laboratory may decide to employ investigative genetic genealogy in accordance with applicable agency protocols. This may include a review to ensure that sufficient crime scene sample and/or DNA extract remains to proceed, as well as consultation with the prosecutor's office to obtain approval (and to ensure that any reasonable leads will be pursued). Once approval is obtained, the law enforcement agency and/or laboratory sends the crime scene sample and/or DNA extract to a commercial genetic laboratory for SNP analysis.
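The decision sequence described above (an STR database search first, then a sufficiency review and prosecutor approval before SNP testing) can be sketched as a simple gate. The ordering here is an illustrative assumption; actual agency protocols vary, and the `CaseStatus` fields and step descriptions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseStatus:
    # Hypothetical flags summarizing where a case stands.
    str_search_done: bool           # STR profile searched in state and national databases
    str_search_produced_lead: bool  # the STR search generated an investigative lead
    sufficient_sample: bool         # enough sample and/or DNA extract remains
    prosecutor_approved: bool       # prosecutor's office has approved proceeding


def next_step(case: CaseStatus) -> str:
    """Return the next action, checking each gate in order."""
    if not case.str_search_done:
        return "search STR profile in law enforcement DNA databases"
    if case.str_search_produced_lead:
        return "pursue the STR database lead"
    if not case.sufficient_sample:
        return "do not proceed: insufficient sample or DNA extract"
    if not case.prosecutor_approved:
        return "consult the prosecutor's office for approval"
    return "send sample or DNA extract to a commercial laboratory for SNP analysis"


if __name__ == "__main__":
    case = CaseStatus(str_search_done=True, str_search_produced_lead=False,
                      sufficient_sample=True, prosecutor_approved=False)
    print(next_step(case))  # consult the prosecutor's office for approval
```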
The resulting SNP data are either provided to the law enforcement agency for searching in third-party SNP databases that permit it, or the agency may contract out the genealogy services and the searching of the SNP data to a genetic genealogist or commercial laboratory. The search of the third-party SNP database returns a list of potential genetic relative matches ranked by the amount of shared DNA. The genetic genealogist then assesses the genetic distances or relationships among the matches and, based on the amount of DNA they share, constructs and reviews family trees. Working with the law enforcement agency, the genealogist can use other non-genetic investigative databases and research to eliminate individuals based on sex, age, location, and other information, producing the best investigative leads. As an additional step, when a likely person of interest is identified, the law enforcement agency obtains a DNA sample from that individual for forensic STR comparison with the crime scene profile, for exclusionary or inclusionary purposes. In UHR investigations, the agency instead obtains DNA samples from the relative(s) of the person of interest for forensic DNA analysis, for exclusionary or inclusionary purposes.
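The elimination step, in which non-genetic information narrows the people on a family tree down to the best investigative leads, can be sketched as a filter. This assumes each candidate has already been placed on a tree and researched. The `Candidate` and `CaseConstraints` types, their fields, and the example people are hypothetical, and real casework weighs far more information than these three attributes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    # Hypothetical person identified on a family tree built from the genetic matches.
    name: str
    sex: str         # "M" or "F"
    birth_year: int
    in_area: bool    # documented in the relevant location at the relevant time


@dataclass
class CaseConstraints:
    # Hypothetical facts from the investigation used to eliminate candidates.
    sex: Optional[str]  # None if the sex of the person of interest is unknown
    earliest_birth_year: int
    latest_birth_year: int


def remaining_leads(tree: List[Candidate], c: CaseConstraints) -> List[str]:
    """Return names of candidates not eliminated by sex, age, or location."""
    leads = []
    for p in tree:
        if c.sex is not None and p.sex != c.sex:
            continue  # eliminated on sex
        if not c.earliest_birth_year <= p.birth_year <= c.latest_birth_year:
            continue  # eliminated on age
        if not p.in_area:
            continue  # eliminated on location
        leads.append(p.name)
    return leads


if __name__ == "__main__":
    tree = [
        Candidate("Person 1", "M", 1961, True),
        Candidate("Person 2", "F", 1963, True),
        Candidate("Person 3", "M", 1990, True),
        Candidate("Person 4", "M", 1958, False),
    ]
    print(remaining_leads(tree, CaseConstraints("M", 1950, 1970)))  # ['Person 1']
```

The filter only prioritizes candidates; as the text notes, any remaining lead is still confirmed or excluded by direct forensic DNA comparison.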
In contrast to these genealogy services and databases, law enforcement DNA databases are accessible only to criminal justice agencies, through their forensic DNA laboratories. DNA samples are analyzed at designated law enforcement identification markers (20 STR loci known as the CODIS Core Loci) that were specifically selected because they are not predictive of medical conditions or disease status.[18] DNA records in law enforcement DNA databases are collected, analyzed, and databased in accordance with applicable state and federal laws that specify which offenders and, in some cases, which arrestees are required to provide DNA samples. For UHR investigations, family members of missing persons can voluntarily submit DNA samples, and the resulting profiles are entered into a separate index of the law enforcement databases.[19]
In a law enforcement DNA database search, crime scene or UHR DNA profiles are searched against offender and arrestee DNA profiles to generate candidate matches, which are reviewed and, as appropriate, confirmed. Confirmation requires verification of the offender's biographical information and, in many laboratories, reanalysis of the offender's or arrestee's DNA sample and direct comparison with the crime scene or UHR profile as an additional quality assurance measure. Only after confirmation does the forensic DNA laboratory release the name of the person of interest to the law enforcement agency that submitted the crime scene evidence or UHR sample.[20] At present, most forensic DNA laboratories cannot generate the SNP data needed for genealogy database searches. Accordingly, law enforcement agencies typically outsource the analysis of the crime scene or UHR sample to a laboratory that can produce SNP data for searching in third-party SNP databases.
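The rule that a name is released only after confirmation can be sketched as follows. This is a simplified, hypothetical model: it assumes confirmation reduces to biographical verification plus, where a laboratory's policy requires it, a concordant reanalysis of the offender's or arrestee's sample. The field and function names are illustrative and are not CODIS terminology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateMatch:
    # Hypothetical record of a database candidate match awaiting confirmation.
    offender_id: str
    biographical_info_verified: bool
    reanalysis_required: bool    # set by laboratory policy
    reanalysis_concordant: bool  # reanalyzed sample matches the evidence or UHR profile


def is_confirmed(m: CandidateMatch) -> bool:
    """A candidate match is confirmed only when every required check passes."""
    if not m.biographical_info_verified:
        return False
    if m.reanalysis_required and not m.reanalysis_concordant:
        return False
    return True


def release_to_agency(m: CandidateMatch) -> Optional[str]:
    """Return the identifier to release, or None if the match is unconfirmed."""
    return m.offender_id if is_confirmed(m) else None


if __name__ == "__main__":
    pending = CandidateMatch("offender-123", True, True, False)
    print(release_to_agency(pending))  # None: reanalysis not yet concordant
```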
Although some commentators have compared investigative genetic genealogy to forensic familial DNA searching,[21] familial searching is an additional search of a law enforcement DNA database, conducted after a routine search of the crime scene DNA profile has produced no matches. Both approaches involve working with non-genetic investigative databases, but investigative genetic genealogy begins with a search of a third-party SNP database, whereas a forensic familial search uses the STR DNA profile and a different search algorithm to identify close biological relatives of the unknown individual whose DNA profile was developed from the crime scene sample. Familial searching can raise privacy concerns similar to those associated with investigative genetic genealogy. In light of these concerns, jurisdictions that employ familial searching have generally established the following threshold criteria for its use: (1) the search is used only for single-source crime scene DNA profiles; (2) the search is limited to serious violent felony cases such as homicide, sexual assault, and kidnapping; (3) all other investigative leads have been exhausted; and (4) the prosecutor agrees to pursue any resulting investigative lead information in the case.[22]
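The four threshold criteria listed above work as a checklist that must be fully satisfied before a familial search is run. The sketch below encodes them under stated assumptions: the offense set contains only the examples named in the text (which says "such as", so jurisdictions may define it more broadly), and the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Example offenses named in the text; jurisdictions may define this list differently.
QUALIFYING_OFFENSES = {"homicide", "sexual assault", "kidnapping"}


@dataclass
class FamilialSearchRequest:
    # Hypothetical request record; one field per threshold criterion.
    single_source_profile: bool         # criterion 1
    offense: str                        # criterion 2
    other_leads_exhausted: bool         # criterion 3
    prosecutor_will_pursue_leads: bool  # criterion 4


def unmet_criteria(r: FamilialSearchRequest) -> List[str]:
    """Return the criteria that are not met; an empty list means the request qualifies."""
    unmet = []
    if not r.single_source_profile:
        unmet.append("crime scene DNA profile is not single source")
    if r.offense.lower() not in QUALIFYING_OFFENSES:
        unmet.append("offense is not a qualifying serious violent felony")
    if not r.other_leads_exhausted:
        unmet.append("other investigative leads have not been exhausted")
    if not r.prosecutor_will_pursue_leads:
        unmet.append("prosecutor has not agreed to pursue resulting leads")
    return unmet


if __name__ == "__main__":
    request = FamilialSearchRequest(True, "Homicide", True, False)
    print(unmet_criteria(request))
```

Returning the list of unmet criteria, rather than a single yes/no, mirrors how such a checklist would typically be documented in a case file.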
Law enforcement agencies exploring the use of investigative genetic genealogy should include a representative of the relevant legal authority in their discussions and should consider the following:
Approved by the Scientific Working Group on DNA Analysis Methods February 18, 2020
The Scientific Working Group on DNA Analysis Methods, known as SWGDAM, serves as a forum to discuss, share, and evaluate forensic biology methods, protocols, training, and research to enhance forensic biology services, and provides recommendations to the FBI Director on quality assurance standards for forensic DNA analysis.
More information on the Scientific Working Group on DNA Analysis Methods is available on its website: https://www.swgdam.org/
Article posted March 10, 2023