03 August 2020

Genealogy Data Breaches

In March this year Julia Creet commented:

Surprising news recently emerged from the personal genetics business. The two leading direct-to-consumer companies in North America, 23andMe and Ancestry.com, announced within a week of each other that they were laying off a significant proportion of their workforce as a result of a steep drop in sales. This past Christmas, the sales of testing kits were expected to take a sharp hike — nothing says family like a gift that says prove it. But sales plummeted instead.
 
According to Second Measure, a company that analyzes website sales, 23andMe's business plummeted 54 per cent and Ancestry kit sales declined 38 per cent.
 
Industry executives, market watchers and genealogists have all speculated about the causes of the drop in consumer interest. Market saturation? Early adopters tapped out? Limited usefulness? Recession fears? Whatever the theory, everyone seems to agree on one factor: privacy concerns.
 
For observers like me, who have been watching the trends in the industry of family history for years and have repeatedly raised concerns about genetic and family privacy, there's a certain relief that consumers have taken notice.
 
Two third-party uses of genetic genealogy have given consumers pause for thought.
 
One: Almost every database shares information with the pharmaceutical industry. 23andMe was clear from the beginning that its health information would be used by its research partners and asked consumers to consent. But when it started to sign major deals with drug developers in 2015, consumers began to realize that, once again, similar to social sharing platforms, they were the product. A fact not so surprising from a company whose initial investors were from Google and Facebook.
 
Still, as long as testing prices were low and continued to fall, consumers bought the sell. Companies promised consumers they were contributing to a greater good. Medical science could use their genetic information to develop treatments, even if they might never need the drugs (or indeed if any drugs would ever be developed).
 
So even though the companies were profiting from their information, the number of people sending in their spit grew exponentially. Business was going well. Then a second third-party use was revealed and sales started tumbling. ...
 
Shortly after California detectives announced they had used GEDmatch, a public genetic genealogy database, to solve the cold case of a sadistic rapist and killer known as the Golden State Killer, the exponential rate of growth in the industry began to decline. That 2018 case set off a wave of privacy concerns about genetic genealogy and divided people who had already submitted their samples.
 
Almost overnight, a new industry was hatched using genetic genealogy databases to solve cold cases. GEDmatch, the company at the centre of the debate, was caught in the middle.
 
The GEDmatch founders, a couple of genealogists who just wanted to provide a place for genealogists to share DNA results without the privacy restrictions of the testing companies, eventually sold the company after attempting and failing to align its privacy policy with something viable for consumers and the company.
 
Sealing the marriage of genetic genealogy with policing, GEDmatch sold its database to Verogen, a forensics equipment company that services law enforcement. Ironically, Verogen promised it would offer better privacy protections and resist police incursions.

I waited for the excitement and for more fodder for my book on genomic privacy. It wasn't a long wait. 

The NY Times now reports that there have been two substantial data breaches at Verogen's GEDmatch. The Times piece by Heather Murphy notes that nearly two-thirds of GEDmatch’s users opt out of helping law enforcement. The breaches left users gifted with numerous 'relatives', and a million or so users who had opted not to help law enforcement were forced to opt in. The piece states:

GEDmatch, a longstanding family history site containing around 1.4 million people’s genetic information, had experienced a data breach. The peculiar matches were not new uploads but rather the result of two back-to-back hacks, which overrode existing user settings, according to Brett Williams, the chief executive of Verogen, a forensic company that has owned GEDmatch since December. 
 
Though the growth of genealogy sites has slowed slightly in recent years, their use by the police has increased. After the authorities in California used GEDmatch in 2018 to identify a suspect in the decades-long Golden State Killer case, police departments across the country began to dig through their cold case files in the hopes that this new technique could solve old crimes.
 
And GEDmatch was often their preferred site. Unlike the genealogy services Ancestry and 23andMe, which are marketed to people who are new to using DNA to learn about themselves, GEDmatch caters to more advanced researchers. The site appeals to the police because it allows DNA that has been processed elsewhere to be uploaded. Verogen has a long history of working with law enforcement, and the acquisition of GEDmatch further solidified this collaboration.
 
Scientists and genealogists say the GEDmatch breach — which exposed more than a million additional profiles to law enforcement officials — offers an important window into what can go wrong when those responsible for storing genetic information fail to take necessary precautions.
 
In an interview, Mr. Williams said that the first breach occurred early on July 19. After shutting down the site, his team “covered up the vulnerability,” he said, and brought it back online, but only briefly. “On Monday we took the site down again because it was clear the hackers were trying again,” he said.  This time the site remained down for nearly a week. ...
 
Mr. Williams said he had hired an outside security team and contacted the F.B.I. to see if the agency would investigate. The F.B.I. did not respond to a request for comment.
 
All was far from resolved when the site’s settings were restored, said Debbie Kennett, a genealogist in England, who wrote about the breach on her blog. We’re stuck with our DNA for life, she said. “Once it’s out there it’s not like an email address you can change,” she said in an interview. Because of its interconnected nature, she added, when any one person’s genetic information is exposed, the exposed DNA can potentially affect their family members too.
 
That's a point I've made in several publications with Dr Wendy Bonython.

The Times states:

 In a paper published last year, Michael Edge, a professor of biological sciences at the University of Southern California, and fellow researchers warned several genealogy websites that they were vulnerable to data breaches.
 
“Of course, hacks happen to lots of companies, even entities that take security very seriously,” he said. “At the same time, GEDmatch’s, and eventually Verogen’s, response to our paper didn’t inspire much confidence that they were taking it seriously.” Other genealogy websites, he added, seemed more open to the researchers’ recommendations for improving security.
 
For many, the presence of fake users in GEDmatch was as alarming as the breach itself. Genealogists know that they cannot trust names or emails. They also know that a user can easily upload someone else’s genetic profile. But the breach exposed that behind the scenes, hidden by privacy settings, were all kinds of profiles of people who were not even real.
 
 
The giveaway that the matches were not actual relatives was that their DNA was too good to be true, said Leah Larkin, a biologist who runs DNA Geek, a genealogical research company. People who managed profiles for many clients and relatives repeatedly found that these fake users somehow were displayed as close relatives across the unrelated profiles. Their visible ancestry information reinforced the matches were impossible and suggested the fake profiles had been designed to trick the site’s search algorithm for some reason.
 
In Dr. Edge’s paper, he warned that it was possible to create fake profiles to identify people with genetic variants associated with Alzheimer’s and other diseases.
 
“If something is just a geeky genealogist messing around, there is no concern,” Dr. Larkin said. But it becomes a problem, she said, if users are trying to find people who all share a particular genetic mutation or trait, as Dr. Edge cautioned. Such information could be abused by insurance companies, pharmaceutical companies or others, she said.
 
The breach also reinforced something that genealogists have been saying for years: Mixing genealogy and law enforcement is messy, even when you try to draw clear lines. Until two years ago, the primary DNA databases that law enforcement used for investigations were maintained by the F.B.I. and the police. That changed with the Golden State Killer case in 2018.
 
As police departments rushed to reinvestigate cold cases, GEDmatch, which at the time was run by two family history hobbyists as a sort of passion project, tried to serve two audiences: genealogists who simply wanted to trace their family tree and law enforcement officials who wanted to know if a murderer or a rapist was hiding in one of its branches. Amid a backlash, GEDmatch changed its policy in May 2019 so that only users who explicitly opted to help law enforcement would show up in police searches. Still, there is little regulation around how the authorities can use GEDmatch and other genealogy databases, so it’s largely up to the companies and their users to police themselves.
 
And as the breach demonstrated, users’ wishes could be quickly overridden.