Google research scientists have published a post on GoogleBlog giving an update on several years of work on privacy-safe approaches to handling sensitive user data.
The challenge, as stated by the researchers, has been:
"Given a database containing several attributes about users, how can one create meaningful user groups and understand their characteristics? Importantly, if the database at hand contains sensitive user attributes, how can one reveal these group characteristics without compromising the privacy of individual users?"
In developing a solution, the researchers have created a new "differentially private clustering algorithm" which can privately generate representative data points from a dataset, so as to reveal group characteristics without revealing the private data of the individuals in the dataset.
To test the new algorithm, the researchers ran it on four large, publicly available benchmark datasets and compared its performance to that of several publicly available algorithms.
In the researchers' words:
"We analyze the normalized k-means loss (mean squared distance from data points to the nearest center) while varying the number of target centers (k) for these benchmark datasets. The described algorithm achieves a lower loss than the other private algorithms in three out of the four datasets we consider."
In plain English, this means that the Google clustering algorithm produced more accurate representations of the characteristics of three of the four datasets on which it was run than the algorithms used for comparison did.
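To make the quoted metric concrete, here is a minimal sketch of the normalized k-means loss as the researchers define it: the mean squared distance from each data point to its nearest center. This is only an illustration of the metric, not the researchers' code and not their private clustering algorithm. The function name `normalized_kmeans_loss` and the toy points and centers are my own assumptions, chosen so the arithmetic is easy to check by hand.

```python
def normalized_kmeans_loss(points, centers):
    """Mean squared Euclidean distance from each point to its nearest center.

    Hypothetical helper for illustration only; it is not taken from
    the Google post or from any library.
    """
    if not points or not centers:
        raise ValueError("points and centers must both be non-empty")
    total = 0.0
    for p in points:
        # Squared distance from this point to the closest of the k centers.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    # "Normalized" here means averaged over the number of data points.
    return total / len(points)


# Toy data (made up): two tight groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Centers placed in the middle of each group represent the data well.
good_centers = [(0.0, 1.0), (10.0, 1.0)]
# Centers placed far from both groups represent the data poorly.
poor_centers = [(5.0, 1.0), (5.0, 5.0)]

print(normalized_kmeans_loss(points, good_centers))  # 1.0
print(normalized_kmeans_loss(points, poor_centers))  # 26.0
```

The lower number belongs to the centers that sit inside the groups, which is why a lower loss is read as a more faithful summary of the data.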
The conclusion reached from the research results, in the words of the researchers, is:
"This work proposes a new algorithm for computing representative points (cluster centers) within the framework of differential privacy. With the rise in the amount of datasets collected around the world, we hope that our open source tool will help organizations obtain and share meaningful insights about their datasets, with the mathematical assurance of differential privacy".
This work looks promising, and I'm glad to see that phrase "open source" in there!
Stay tuned for further updates.