26 May 2005
The New York Times, May 24, 2005
By STEVE LOHR
International Business Machines is introducing software today that is intended to let companies share and compare information with other companies or government agencies without identifying the people connected to it.
Security specialists familiar with the technology say that, if truly effective, it could help tackle many security and privacy problems in handling personal information in fields like health care, financial services and national security.
"There is real promise here," said Fred H. Cate, director of the Center for Applied Cybersecurity Research at Indiana University. "But we'll have to see how well it works in all kinds of settings."
The technology for anonymous data-matching has been under development by S.R.D. (Systems Research and Development), a start-up company that I.B.M. acquired this year.
Much of the company's early financial backing came from In-Q-Tel, a venture capital firm financed by the Central Intelligence Agency that invests in companies whose technologies have government security uses.
S.R.D., now I.B.M.'s Entity Analytics unit, has worked for years on specialized software for quickly detecting relationships within vast storehouses of data. Its early market was in Las Vegas, where casinos used the company's technology to help prevent fraud or employee theft. The matching software might sift through databases of known felons, for example, to find any links to casino employees.
By the late 1990's, United States intelligence agencies had discovered S.R.D. and the potential to use its technology for winnowing leads in pursuing terrorists or spies. After 9/11, the government's interest increased, and today most of the company's business comes from government contracts.
The new product goes beyond finding relationships in different sets of data. The software, which I.B.M. calls DB2 Anonymous Resolution, enables companies or government agencies to share personal information on customers or citizens without identifying them.
For example, suppose the government is looking for suspected terrorists on cruise ships. It has a "watch list," but it does not want to give that list to a cruise line, fearing it might leak out. Similarly, the cruise lines do not want to hand over their entire customer lists to the government, out of privacy concerns.
The I.B.M. software would convert data on a person into a string of seemingly random characters, using a technique known as a one-way hash function. No names, addresses or Social Security numbers, for example, would be embedded within the character string.
The strings would be fed through a program to detect a matching pattern of characters. In the case of the cruise line and the government, an alert would be sent to both sides that a match had been detected.
"But what you get is a message that there is a match on record Number 678 or whatever, and then the government can ask the cruise line for that specific record, not a whole passenger list," explained Jeff Jonas, the founder of S.R.D. and now chief scientist of I.B.M.'s Entity Analytics unit. "What you get is discovery without disclosure."
To date, the software for anonymously sharing and matching data has been tested in a few projects, but I.B.M. is aiming for day-to-day use in several industries.
In health care, for example, more secure and anonymous handling of patient information could alleviate privacy concerns in the shift to electronic health records, potentially increasing efficiency and reducing costs, analysts said.
The technology, specialists noted, could also reduce the risk of identity theft, especially if personal data held by companies were made anonymous.
From: Linda Casals <lindac@dimacs.rutgers.edu>
Subject: [Sy-nextgen-global] DIMACS Short Course: Statistical De-identification of Confidential Health Data with Application to the HIPAA Privacy Regulations
Date: Wed, 25 May 2005 09:47:43 -0400 (EDT)

**************************************************************

DIMACS Short Course: Statistical De-identification of Confidential
Health Data with Application to the HIPAA Privacy Regulations

October 18 - 20, 2005
DIMACS Center, CoRE Building, Rutgers University

Organizers:
  Larry Cox, ljtcox at aol.com
  Daniel Barth-Jones, Wayne State University, dbjones@med.wayne.edu

Presented under the auspices of the Special Focus on Communication Security and Information Privacy and the Special Focus on Computational and Mathematical Epidemiology.

*********************************************************************

Workshop Announcement:

This DIMACS short course will provide researchers, analysts, and managers with an overview of the federal HIPAA privacy regulations and an introduction to the principles and methods of statistical disclosure limitation that can be used to statistically de-identify healthcare data to meet privacy regulations.

Background

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) established the Standards for Privacy of Individually Identifiable Health Information (the HIPAA Privacy Rule), which provides privacy protections for individuals' protected health information (PHI). These federal regulations became effective April 14, 2003, and have wide-reaching implications for many important uses of healthcare information. Before the privacy rule was implemented, epidemiologic, health-systems, and other types of biomedical research had routinely been conducted with administrative healthcare data, and such analyses demonstrated considerable utility and value.
The recent implementation of the HIPAA privacy standards, however, has necessitated dramatic changes in how many analyses of administrative data are conducted. The privacy rule's "safe-harbor" provision requires the removal of 18 types of identifying information before the resulting "de-identified" data can be used without restriction. The safe-harbor approach necessitates the removal of specific dates of patient care and lower-level geographic information (such as 5-digit ZIP codes), which can greatly diminish the utility of such data for many analytic purposes.

An alternative approach permitted under the privacy rule is the "statistical de-identification" of PHI, certified by an expert statistician. Conducting analyses with statistically de-identified healthcare data is an attractive option because such data can be used without privacy rule restrictions. For data to be considered statistically de-identified, "statistical disclosure" analyses must be conducted and documented which determine that the re-identification risks for the data are "very small." The principles and methods of statistical disclosure analysis and disclosure limitation address the risk that persons might be identifiable from information about them in data sets, and they provide a variety of methods by which risks of disclosure can be measured and reduced to acceptably low levels.

Course Objectives

This two-and-a-half-day short course will provide participants with a detailed overview of the HIPAA privacy regulations, theory and methods for statistical disclosure limitation, and applied experience with disclosure limitation methods.
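The safe-harbor removals and the "very small" re-identification-risk test described above can both be sketched briefly. This is a toy illustration, not certified methodology: the field names, the low-population ZIP prefixes, and the risk measure (the average of one over the size of each record's equivalence class on its quasi-identifiers) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical low-population 3-digit ZIP prefixes; safe harbor requires
# zeroing prefixes whose geographic area holds fewer than 20,000 people.
RESTRICTED_ZIP3 = {"036", "059", "102"}

def safe_harbor(row):
    """Apply two of the 18 safe-harbor removals discussed above:
    coarsen dates to the year and ZIP codes to their first 3 digits."""
    out = {k: v for k, v in row.items() if k not in ("name", "ssn")}
    out["admit_year"] = out.pop("admit_date")[:4]
    zip3 = out.pop("zip")[:3]
    out["zip3"] = zip3 if zip3 not in RESTRICTED_ZIP3 else "000"
    return out

rows = [
    {"name": "A", "ssn": "1", "admit_date": "2003-04-14", "zip": "08901", "dx": "J45"},
    {"name": "B", "ssn": "2", "admit_date": "2003-07-02", "zip": "08901", "dx": "E11"},
    {"name": "C", "ssn": "3", "admit_date": "2002-01-30", "zip": "10012", "dx": "I10"},
]
deidentified = [safe_harbor(r) for r in rows]

# A simple disclosure analysis: group records by their quasi-identifiers
# and measure how small the resulting equivalence classes are. A class of
# size 1 is a sample unique, the main driver of re-identification risk.
quasi = [(r["admit_year"], r["zip3"]) for r in deidentified]
sizes = Counter(quasi)
k = min(sizes.values())                               # smallest class size
avg_risk = sum(1 / sizes[q] for q in quasi) / len(quasi)
print(k, round(avg_risk, 3))
```

In practice a certifying statistician would use far richer quasi-identifiers and population data, but the shape of the analysis, transform, group, then measure residual risk, is the same.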
Participants completing the course should be able to:

  1) understand the permissible uses of healthcare data for various purposes under the HIPAA regulations;
  2) conceptualize and document data intrusion scenarios;
  3) conduct and document statistical disclosure analyses measuring disclosure risks;
  4) select and use appropriate disclosure limitation methods;
  5) evaluate the associated trade-offs between disclosure risks and statistical information quality.

Development of these skills should enable participants to supervise and work successfully with an expert certifying statistician.

Participants will learn about statistical disclosure for both tabular data sets and microdata files, but the primary focus will be on statistical disclosure for microdata in healthcare databases. While statistical disclosure theory will be covered in some detail, the course orientation will be practical and applied, focusing primarily on providing participants with the knowledge and experience needed to statistically de-identify healthcare datasets in accordance with the HIPAA privacy rule and to identify confidentiality problems of potential concern. Upon completion of the course, participants are expected to be able to implement or supervise the implementation of basic disclosure limitation analyses and methods on their own, and to be prepared to undertake further learning in statistical disclosure. Participants will be provided with lecture slides, classroom notes, and simulated example datasets. The course will include hands-on, computer-based instruction in conducting disclosure analyses and implementing disclosure control methods.

Who Should Attend

Researchers (epidemiologists, biostatisticians, medical informatics and health systems scientists, etc.), analytic professionals (from business, marketing, the pharmaceutical industry, etc.), and the managers who supervise staff in these fields will benefit from this short course.
Technical and management personnel in the pharmaceutical and healthcare information industries will find the course particularly useful. Participants should have some prior background in mathematics, statistics, and data/information management. Knowledge of SAS statistical software will be desirable for the in-class computer instruction, but participants with experience in other statistical packages (SPSS, etc.) should also be able to complete the computer instruction portions of the class.

****************************************************************

Registration:

Seating is limited to the first 40 participants. This course must be prepaid in advance by check or credit card in order to hold your place. We cannot guarantee a place for you unless we have received payment. Note: our usual policies for fee waivers and reductions do not apply to this course; however, limited financial support might be available.

Please see the website for additional registration information:
http://dimacs.rutgers.edu/Workshops/Hipaa/

*********************************************************

_______________________________________________
Sy-nextgen-global mailing list
Sy-nextgen-global@dimax.rutgers.edu
http://dimax.rutgers.edu/mailman/listinfo/sy-nextgen-global