Excellent post. As someone who has worked in cybersecurity (a cringe term we would never ourselves use) I would often get people asking me if “Siri was listening to me” because they had been talking with a friend, then later on started seeing ads on their iPhone relevant to that discussion despite never proactively searching for related terms.
At which point I’d have to explain that everything they did online was being bought and sold in auctions at millisecond latency. Everything their friends did too. Essentially the social graph of the entire planet and its online activities is traded as a commodity in real time.
In practical terms this means, yes, Google knows your friend came to your house. Google does not ask permission to track your location in the background for *your* benefit. They know what your friend has been searching, purchasing and discussing online.
They know you too: your demographics, your income, your relationship status. They know if you’re healthy, sick, sexually active, menstruating, depressed. They might not bundle your data with that level of specificity but they will slot you into a demographic.
Worrying about your DNA in that context feels a bit like worrying about catching a cold while you’re treading water in the ocean.
Hi Steven. First of all, I'm a long-time reader and admirer of your work. Thank you for sharing everything you do.
Your argument that "25 years after the human genome was sequenced...there are almost no SNPs that tell you anything consequential about your health" is shortsighted. Given the rapidity of scientific understanding, those SNPs could become far more predictive as machine learning and population genetics advance. Data collected now could be reanalyzed with future tools to reveal health risks, behavioral tendencies, or other sensitive information not apparent today.
Second, genetic data doesn't just reveal information about the individual; it exposes relatives who never consented to data collection. It can identify family members, reveal paternity, and expose genetic conditions in relatives. This creates privacy issues and, in some cases, violations that extend beyond the original customer to their entire family tree, including future generations.
And then, of course, there's the thing we don't want to be thinking about, but are being forced to because of how the Trump administration is using data to locate immigrants in the U.S. Genetic databases are increasingly being used by law enforcement through techniques like genetic genealogy. While this can solve crimes, it also means genetic data can be accessed by authorities in ways 23andMe customers probably never anticipated when they spit into a tube.
Thanks for reading.
-Ed
Ed, thanks for the thoughtful comments. I understand those points, of course - but the thousands of association studies from the past 25 years have tried to link those SNPs to genetic traits, and most of them turned up nothing. It appears most of those locations in the genome are not particularly critical to our biology (with a few exceptions). Sure, it's possible more will be learned, but likely not from those SNPs - that's what I'd bank on. We'll have whole-genome data from millions of people in the years to come, and that will be far more valuable and informative, but 23andMe doesn't have anything like that.
Good to hear this from you, dear Steve. You know your onions.
Hm. "Onion" because I recall a book on DNA which described how to extract onion DNA with a blender and ice-cold vodka.
IMO, you can do a lot of probabilistic prediction of metabolic health and more from the SNPs in 23andMe, Ancestry, etc. And that is enough for actuarial calculations on large populations to be a competitive advantage. I've seen it myself with my proof-of-concept consulting service using such DNA data; you can see prominent trends in customers' genetic data, and when you talk to them, you can see how it has played out in their lives, sometimes strikingly well. It's a research paper waiting to be done, IMO. We are just getting started with genetics because it only got cheap relatively recently, and there is no biotech/pharma grant money to fund studies on general health stuff.
In the USA, GINA (the Genetic Information Nondiscrimination Act), passed in 2008, prevents this from happening. It's what I refer to when potential customers ask about that issue. It is illegal for health insurance companies to use that information for setting health insurance premiums and other purposes, and thank goodness for that.
With AI image & video recognition, there is probably a similar thing waiting to happen health-wise just from pictures of people as they age, or with LLMs if they ask you a bunch of questions.
Thank you for posting.
So few grounded, common-sense publications online these days.
👍🤯
I'm not convinced by your argument to keep DNA with 23andMe. It’s misleading to suggest that because they only have a fraction of your genome, the risk is trivial or “nothing to worry about.”
Science is moving fast, and SNP data already contains a wealth of personal and familial information with the potential to reveal much more in the future—especially as research links more variants to health, family relationships, or genetic traits. Unlike browsing or social media data, your DNA can’t be changed; it uniquely identifies you and your relatives, meaning breaches or misuse have permanent consequences. Companies can go bankrupt, get sold, or change their data policies, and what’s “safe” today could be up for grabs tomorrow. The wise move is to delete your DNA data now—before you lose control for good.
Sure, I get it - and others have made those points too. It's just that I have a different view of the likely potential risk from this (I consider it very low), and I like the services provided by 23andMe, so on balance I'll keep my account. Also another point I didn't make in the main article: if you have any close relatives who are sharing their DNA, then yours is mostly (depending on how closely related they are) out there too. You can't really control that.
Thanks for responding and I appreciate your perspective. I do want to point out that shifting to the idea that "your relatives have uploaded DNA, so yours is out there anyway" seems like moving the goalposts.
The original question was whether an individual should take steps to protect their own genetic privacy, not whether it's possible to control every potential exposure. While it’s true we can’t fully control what others do, that doesn’t mean our own choices are meaningless; every additional dataset amplifies the risk of identification or misuse, especially as the science and industry around genetics evolve.
Even if the current risk seems low, future applications or policy changes could turn today’s decision into tomorrow’s vulnerability. Deleting your data isn’t about achieving perfect privacy, but about minimizing your footprint and future-proofing your privacy as much as possible.