Current approaches for protecting identities of people using activity trackers may not be effective, researchers suggested from a modeling exercise using National Health and Nutrition Examination Survey (NHANES) data.
After “partially aggregating” data from two NHANES iterations that included accelerometer-measured activity over 1 week, two machine-learning algorithms were nevertheless able to match records for about 95% of the adults and 80% of the children in those NHANES cohorts, reported Anil Aswani, PhD, of the University of California, Berkeley, and colleagues in JAMA Network Open.
The partial aggregation procedure involved summing the original activity data, collected at 1-minute resolution, into 20-minute trends. The researchers also separated data collected Monday-Wednesday from those collected Thursday-Friday. They used the Monday-Wednesday trend data to “train” the machine-learning algorithms and then applied the models to the Thursday-Friday data, along with the participants’ demographics (age, sex, education, household income, race/ethnicity, and nation of origin), in an attempt to match each participant’s deliberately altered data to that person’s original NHANES record.
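The aggregate-then-match idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than drawn from NHANES, each hypothetical person gets one "early-week" and one "late-week" day instead of several, demographics are left out, and a simple nearest-neighbor comparison stands in for the study's machine-learning algorithms, which are not reproduced here.

```python
import random


def aggregate(minute_counts, window=20):
    """Sum 1-minute activity counts into consecutive `window`-minute blocks."""
    return [sum(minute_counts[i:i + window])
            for i in range(0, len(minute_counts), window)]


def distance(a, b):
    """Squared Euclidean distance between two aggregated profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match(reference, query):
    """Map each query id to the id of the closest reference profile.

    Nearest-neighbor matching is a stand-in chosen for brevity,
    not the method used in the study.
    """
    return {qid: min(reference, key=lambda rid: distance(reference[rid], q))
            for qid, q in query.items()}


def synthetic_person(rng, minutes=24 * 60):
    """Hypothetical person: a fixed daily pattern plus day-to-day noise."""
    pattern = [rng.uniform(0, 100) for _ in range(minutes)]

    def one_day():
        # Activity counts cannot be negative, so clip at zero.
        return [max(0.0, p + rng.gauss(0, 10)) for p in pattern]

    return one_day


rng = random.Random(0)  # fixed seed so the sketch is repeatable
people = {pid: synthetic_person(rng) for pid in range(50)}

# One aggregated day stands in for each half of the week.
early_week = {pid: aggregate(day()) for pid, day in people.items()}
late_week = {pid: aggregate(day()) for pid, day in people.items()}

matches = match(early_week, late_week)
accuracy = sum(matches[pid] == pid for pid in people) / len(people)
print(f"Matched {accuracy:.0%} of synthetic records")
```

Because these synthetic patterns are far more regular than real behavior, the match rate printed here says nothing about real-world reidentification rates. The sketch only shows why summing data into 20-minute blocks does not, by itself, erase a person's characteristic daily pattern.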
Notably, Aswani and colleagues did not have enough information to identify actual NHANES participants by name, but that wasn’t the goal. The study was intended as a proof of concept that data-aggregation methods touted as protecting individuals’ identities are not foolproof.
Although this method did not perfectly replicate the aggregation of actual activity-tracker data now collected with consumer devices, the researchers said the study’s findings add to the growing literature suggesting that reidentification of aggregated data is possible. When trackers are linked to smartphones, the data are typically uploaded to the device maker, at which point the information is beyond the wearer’s control.
Companies claim that aggregated wearable device information can’t be matched, in part because individuals’ activity varies considerably over time, Aswani and colleagues noted. But they pointed to prior research showing that high temporal-resolution information from wearables can convert intra-individual variability into trends that make matching possible. Location data collected by trackers are another piece of information that can allow individuals to be identified.
Aswani and colleagues urged policymakers to regulate the sharing of physical activity information by device developers. “Although these organizations are collecting and sharing sensitive health data, they are likely not bound by existing regulations in most circumstances,” the study authors wrote.
Moreover, privacy risks may be alleviated by aggregating data over time and across individuals of varying demographics, the study authors emphasized. “This consideration is particularly important for governmental organizations making public releases of large national health data sets, such as NHANES.”
Previous reports have highlighted a number of data sets that, like activity tracker information, can be used to identify an individual. Given that list, the results of the present investigation are not surprising, noted Thomas H. McCoy Jr., MD, of Massachusetts General Hospital in Boston, and Michael C. Hughes, PhD, of Tufts University in Medford, Massachusetts, in an accompanying editorial.
“However, these findings are important because they speak to a core value of medicine — confidentiality — in a context of growing relevance: waveform data of the sort used by Na and colleagues are becoming more common with the widespread availability of sensors to generate these data and the potential for remote monitoring reimbursement to speed their clinical adoption,” the editorialists wrote.
“The prior literature on reidentification [serves] as a reminder to researchers and physicians that the nature of confidentiality we provide to patients evolves with technology, which frequently changes faster than patient expectations,” the editorialists continued.
Aswani and colleagues had similar words of warning. “These technologies will come with new risks too — risks that may never be wholly removed. Physicians have balanced real risks and benefits for millennia by acknowledging and quantifying both; now is not the time to stop.”
The study was supported by the University of California Berkeley Center for Long-Term Cybersecurity and by the National Institute of Nursing Research.
Aswani did not disclose any relevant conflicts of interest.
McCoy disclosed relationships with The Stanley Center at the Broad Institute, the Brain & Behavior Research Foundation, the National Institute on Aging, and Telefonica Alpha.