Data from 500,000 UK Biobank volunteers is being offered for sale on Alibaba after Chinese research institutions violated access agreements.

      Summary: Genetic, medical, and lifestyle information from all 500,000 UK Biobank participants was offered for sale on Alibaba following a breach of data-sharing agreements by three Chinese research institutions that had authorized access. Although the data was de-identified, it encompasses genome sequences, medical diagnoses, and biological metrics that experts believe could potentially be re-identified. Alibaba removed the listings before any transactions occurred, UK Biobank has halted all external data access, and the Information Commissioner's Office (ICO) is conducting an investigation. A prior investigation in March had already revealed multiple leaks of the data via GitHub.

      This week, the UK government confirmed that genetic, medical, and lifestyle data from 500,000 UK volunteers was listed for sale on Alibaba. The breach was not the result of hacking but of trusted researchers breaking their contracts: three research institutions in China, which had legitimate access to UK Biobank, downloaded the data and made it available for sale. The Minister of State, Ian Murray, informed the House of Commons that UK Biobank had alerted the government on 20 April about three listings on Alibaba, at least one of which appeared to contain data from all 500,000 participants. The data was de-identified, omitting names, addresses, contact information, and NHS numbers, but it did include details like gender, age, birth month and year, socio-economic factors, lifestyle habits, and biological sample measures. With the cooperation of both the UK and Chinese governments, Alibaba removed the listings before any sales took place, and the three institutions lost their access privileges. UK Biobank has paused all external data access while it seeks a technical solution to prevent bulk downloads and has reported the incident to the ICO.

      Overview of UK Biobank

      UK Biobank is one of the most significant biomedical research resources in the world. Between 2006 and 2010, it enlisted 500,000 volunteers aged 40 to 69 from across Great Britain, who agreed to share their health data and to be followed for at least 30 years. The database presently contains over 10,000 variables for each participant, including whole genome sequences (fully released in 2023), blood and urine biomarkers, brain and body imaging scans, hospital diagnosis records, GP data, and comprehensive lifestyle questionnaires. Around 22,000 researchers globally are permitted to access this data for approved studies related to cancer, heart disease, diabetes, Alzheimer’s, and other conditions. The resource has contributed to thousands of peer-reviewed publications and is considered foundational to contemporary genomic medicine.

      Data is shared on the premise of de-identification, and researchers must sign material transfer agreements that prohibit redistribution. In this incident, three institutions breached those agreements, and the breach came to light only because the data was brazenly listed for sale in public.

      The issue of re-identification

      Although the government stated that the data did not contain identifying names or addresses, this assertion was only partly accurate. An investigation by the Guardian in March uncovered that de-identified UK Biobank data had been leaked online on numerous occasions, mainly due to researchers accidentally uploading partial or complete datasets to GitHub, the code-sharing platform. From July to December 2025, UK Biobank sent 80 legal requests to GitHub for the removal of such data. In one instance, a dataset that included millions of medical diagnoses and their associated dates for over 400,000 participants was published publicly.

      The Guardian illustrated that the data could be less anonymous than it appears; a reporter could identify a volunteer's extensive medical records using just their birth month and year and details of a significant surgery, which are commonly shared in casual conversation. Dr. Luc Rocher, an associate professor at the Oxford Internet Institute, explained to the publication that removing identifiers "often does not guarantee anonymity" and that knowing an individual's birthday and a specific medical event might be enough to reliably identify their record. If a record is identified, it could disclose sensitive information such as psychiatric diagnoses, HIV test results, or histories of substance abuse.
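      The reasoning behind this kind of re-identification can be sketched in a few lines of Python. The example below is illustrative only: the records are invented, the field names (birth, procedure, proc_year) are hypothetical and do not reflect the actual UK Biobank schema, and it is not the method the Guardian used. It simply counts how many records share the same combination of attributes; a combination that occurs exactly once points to a single record.

```python
from collections import Counter

# Synthetic, made-up records. Field names are hypothetical and are not
# taken from the real UK Biobank data dictionary.
records = [
    {"id": "r1", "birth": "1955-03", "procedure": "hip replacement", "proc_year": 2014},
    {"id": "r2", "birth": "1955-03", "procedure": "hip replacement", "proc_year": 2014},
    {"id": "r3", "birth": "1955-03", "procedure": "heart bypass", "proc_year": 2016},
    {"id": "r4", "birth": "1961-11", "procedure": "hip replacement", "proc_year": 2014},
    {"id": "r5", "birth": "1961-11", "procedure": "appendectomy", "proc_year": 2009},
]


def unique_records(rows, quasi_identifiers):
    """Return ids of rows whose quasi-identifier combination occurs exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row["id"] for row in rows if counts[key(row)] == 1]


# Birth month and year alone: every value is shared by at least two records.
by_birth = unique_records(records, ["birth"])

# Adding one medical event narrows most records down to a single match.
by_birth_and_surgery = unique_records(records, ["birth", "procedure", "proc_year"])

print(by_birth)              # []
print(by_birth_and_surgery)  # ['r3', 'r4', 'r5']
```

      In this toy dataset no record is unique on birth month and year alone, but three of the five become unique once a single procedure and its year are added. Real datasets with thousands of variables per participant offer far more such combinations.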

      According to UK GDPR, data is only considered truly anonymized if individuals cannot be identified "by any reasonably likely means." Given the size and richness of such datasets, particularly those with complete genome sequences, the concern is not whether re-identification could theoretically happen but whether it is challenging enough in practice to offer real protection. As datasets grow and AI tools enhance cross-referencing capabilities, the governance gap regarding data security is widening. Privacy experts argue that UK Biobank’s reliance on de-identification as a safeguard contradicts the reality that many individuals share parts of their health information online, and in this age of advanced language models, that information can be pieced together.

      A recurring issue, not an isolated event

      The Alibaba listings represent the most noticeable indication of a deep-rooted problem that UK Biobank has been trying to address with limited success for months. The investigation from March revealed the occurrence of data leaks
