Data security in medical studies: IT researchers break anonymity of gene databases
"Ever since American scientists successfully attacked a study in 2008, showing that just parts of the DNA were enough to identify participants, researchers have been debating whether, and in what detail, gene data should be published," says Michael Backes, professor of Cryptography and IT Security at Saarland University and scientific director of the Competence Center for IT Security, CISPA. "Fortunately, in Germany we don't have the problem that health insurance companies can charge someone more because they are sick," says Pascal Berrang, who is researching data privacy in genetic data as part of his doctoral studies at CISPA. The situation is different in the United States, for instance, where there is already a flourishing trade in health data. Not even medical studies are safe, says Berrang.
The researchers from Saarbrücken, together with their colleagues Mathias Humbert and Praveen Manoharan, analyzed data security issues for a specific kind of gene information that is now commonly used in medical research: microRNAs. These short molecules of ribonucleic acid have recently gained importance as a new form of biomarker, that is, a biological identifier that clearly indicates a patient's general health condition, or the presence of certain diseases, to physicians and researchers. MicroRNAs can therefore divulge even more details about a patient's condition than conventional DNA analysis, since the latter only yields the probability of the patient developing the disease in question. This aspect of microRNA analysis makes the Saarbrücken computer scientists' findings even more significant. Using two different attack techniques, they were able to break the anonymity of the test subjects in a microRNA study. "If the results were published, and a health insurance fund knew the microRNA profile of one of its members, it could deduce whether that patient was part of the study, and pinpoint individual diseases," says Pascal Berrang. For an insurer, that would be more than enough information.
The CISPA researchers have also been working on countermeasures. The main challenge was to maintain the anonymity of the data without making it unusable for medical research and diagnostics. They therefore examined two different strategies. The first was to omit any telltale microRNA molecules that were not relevant to the diagnosis; the second was to add random noise to the data, which helps to protect the identity of individual participants without distorting the overall tendency of the results. The second technique is widely used when publishing statistical information because it helps prevent the disclosure of identifying information; it follows a principle experts call "differential privacy".
"Leaving telltale molecules out of the publication doesn't really help. Even if you published only ten molecules instead of a hundred, the attack would still be feasible," says Pascal Berrang. The second intervention, the addition of random noise, did not prevent the attack either, and it made work more difficult for the medical staff. For this reason, the CISPA researchers recommend that the data be randomized as little as possible, and that a sufficiently large number of participants take part in the trial. "This has several advantages: It increases the statistical relevance of the study, it requires less random noise, and the study is not as susceptible to these forms of attack, because the more people take part, the more the individual blends into the crowd," says Berrang. Asked for a specific number, he says: "Two hundred. At least 200 people in the study, and a bit of random noise in the data, that should be enough."
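The link between study size and the amount of noise can be illustrated with a minimal sketch. It is not the CISPA researchers' method; it uses the textbook Laplace mechanism from differential privacy and rests on several assumptions: the published statistic is the mean expression level of one microRNA, expression values are normalized to the range [0, 1], and `epsilon` is a privacy parameter chosen by the publisher. The function names `laplace_scale` and `noisy_mean` are hypothetical.

```python
import random


def laplace_scale(n_participants, value_range, epsilon):
    """Noise scale for releasing the mean of n bounded values (assumed setup)."""
    # Replacing one participant's value shifts the mean by at most
    # value_range / n, so the required noise shrinks as the study grows.
    sensitivity = value_range / n_participants
    return sensitivity / epsilon


def noisy_mean(values, lower, upper, epsilon, rng):
    """Return the mean of `values` plus Laplace noise.

    Values are clipped to [lower, upper] so the sensitivity bound holds.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    scale = laplace_scale(len(clipped), upper - lower, epsilon)
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_mean + noise


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    for n in (20, 200, 2000):
        # Hypothetical expression levels, already normalized to [0, 1].
        levels = [rng.random() for _ in range(n)]
        print(n, laplace_scale(n, 1.0, 1.0), noisy_mean(levels, 0.0, 1.0, 1.0, rng))
```

Under these assumptions, the noise scale for a study with 200 participants is one tenth of that for a study with 20, which matches the point that a larger study needs less random noise for the same level of protection.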
The CISPA researchers will be discussing further details of their gene data analysis at the Cebit computer fair in Hannover (Stand C47 in Hall 6).
Background: Research Center for IT Security CISPA
CISPA was founded at Saarland University in October 2011 as a competence center for IT security, with the support of the German Federal Ministry of Education and Research. It combines the IT security research of Saarland University's Computer Science department with that of its on-campus partners: the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, and the German Research Center for Artificial Intelligence (DFKI). CISPA has since developed into an established research center for IT security with international appeal. Owing to the excellent quality of its scientific publications and projects, it is one of the leading research centers for IT security in the world today.
Center for IT Security, Privacy and Accountability (CISPA)
Saarland Informatics Campus E9.1
Phone: +49 681 / 302-57376
Competence Center Computer Science Saarland
Phone: +49 681 302-70741