Data De-identification Methods
De-identification is a process that organizations can use to remove personal information from data that they collect, use, archive, and share with other organizations. This page gives guidance on de-identification methods under two frameworks: the HIPAA Privacy Rule and the GDPR.
Section 164.514(a) of the HIPAA Privacy Rule provides the standard for de-identification of protected health information. Under this standard, health information is not individually identifiable if it does not identify an individual and if the covered entity has no reasonable basis to believe it can be used to identify an individual. The rule provides two methods for meeting this standard: Safe Harbor and Expert Determination.
- The Safe Harbor method requires removing all 18 of the following types of identifiers, so that the remaining information cannot be used to identify an individual:
  - Names
  - All elements of dates directly related to an individual (such as birth, admission, and discharge dates), except the year
  - Telephone numbers
  - Geographic subdivisions smaller than a state, such as street address, city, county, and ZIP code (with a limited exception for the first three digits of a ZIP code)
  - Fax numbers
  - Social Security numbers
  - Email addresses
  - Medical record numbers
  - Account numbers
  - Health plan beneficiary numbers
  - Certificate/license numbers
  - Vehicle identifiers and serial numbers, including license plates
  - Web URLs
  - Device identifiers and serial numbers
  - Internet protocol (IP) addresses
  - Full-face photographs and comparable images
  - Biometric identifiers, such as fingerprints and voiceprints
  - Any other unique identifying number, characteristic, or code
- The Expert Determination method involves applying statistical and scientific principles to the data so that the risk of re-identification is very small. Because the process can be tailored to the use case at hand, it is more flexible than a fixed checklist and can retain more of the data's utility.
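As a rough illustration of the Safe Harbor approach, the sketch below drops a handful of direct identifiers from a record and reduces dates to the year. The field names and the `safe_harbor_scrub` function are hypothetical, chosen only for this example. It covers a subset of the 18 identifier types and does not inspect free-text fields, so it should not be read as a complete or compliant implementation.

```python
from datetime import date

# Hypothetical field names; a real dataset will have its own schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "ssn", "email", "street_address", "zip_code",
    "medical_record_number", "account_number", "ip_address",
}


def safe_harbor_scrub(record: dict) -> dict:
    """Return a copy of `record` without direct identifiers and with dates reduced to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if isinstance(value, date):
            # Keep only the year; datetime values are also matched here.
            cleaned[field + "_year"] = value.year
        else:
            cleaned[field] = value
    return cleaned


record = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "admitted": date(2021, 3, 14),
    "diagnosis": "J45",
}
scrubbed = safe_harbor_scrub(record)
```

In this example, `scrubbed` keeps only the diagnosis code and an `admitted_year` field. A checklist like this is easy to audit, but it removes fields wholesale, which is part of why it can cost more utility than a tailored approach.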
For additional information on de-identification, see How To: De-identify Data.
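The Expert Determination method does not name a single required metric. One measure that is often used when estimating re-identification risk is the size of the smallest group of records that share the same combination of indirect (quasi-) identifiers, in the style of k-anonymity. The sketch below assumes that measure. The function name and the sample fields are hypothetical, and a real determination would involve much more than this one number.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same quasi-identifier values.

    A small result means some individuals are nearly unique in the data.
    Returns 0 for an empty dataset.
    """
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values(), default=0)


records = [
    {"birth_year": 1980, "state": "OH", "diagnosis": "J45"},
    {"birth_year": 1980, "state": "OH", "diagnosis": "E11"},
    {"birth_year": 1975, "state": "TX", "diagnosis": "I10"},
]
k = smallest_group_size(records, ["birth_year", "state"])
```

Here `k` is 1, because the 1975/TX record is unique on those two fields. An expert might respond by generalizing the fields further (for example, to decade of birth) and measuring again.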
Under Article 4(5) of the General Data Protection Regulation (GDPR), pseudonymization is the processing of personal data so that it can no longer be attributed to a specific data subject without the use of additional information. In practice, this usually means removing personal identifiers from the data and replacing them with placeholder values. The additional information needed to reverse the replacement must be kept separately and protected by technical and organizational measures.
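To make the idea of "additional information" concrete, the sketch below replaces identifiers with random placeholder tokens and keeps the mapping in a separate object. The `TokenVault` class is hypothetical and holds the mapping in memory only. In practice the mapping would live in a separately secured store with its own access controls.

```python
import secrets


class TokenVault:
    """Maps identifiers to random tokens; the mapping is the 'additional information'."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return the token for `value`, creating a new random one on first use."""
        if value not in self._token_by_value:
            token = secrets.token_hex(8)  # 16 hex characters, unrelated to the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        """Recover the original value; only possible with access to the vault."""
        return self._value_by_token[token]


vault = TokenVault()
row = {"email": "jane@example.com", "diagnosis": "J45"}
pseudonymized = {**row, "email": vault.tokenize(row["email"])}
```

The `pseudonymized` row can be shared with analysts without the vault. Anyone who holds only that row cannot attribute it to a person, while the holder of the vault can still reverse the replacement.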
Pseudonymization can be achieved using methods such as data masking, encryption, or tokenization. Pseudonymized data is not exempt from the obligations imposed by the GDPR. Even so, pseudonymization has an important role to play in data protection: it can help increase data security and help fulfill data minimization requirements.
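Two of these techniques can be sketched with the Python standard library. `mask_email` is a simple form of data masking. `keyed_token` uses keyed hashing (HMAC-SHA256), which is a substitute named here for illustration: it produces stable tokens without a lookup table but, unlike encryption, cannot be reversed. Both function names and the sample key are assumptions made for this example.

```python
import hashlib
import hmac


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part.

    Assumes a well-formed address containing "@".
    """
    local, _, domain = email.partition("@")
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain


def keyed_token(value: str, secret_key: bytes) -> str:
    """Return a deterministic token for `value` using HMAC-SHA256.

    The key plays the role of the separately kept information: without it,
    tokens cannot be recomputed from guessed inputs.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


masked = mask_email("jane@example.com")  # "j***@example.com"
token = keyed_token("jane@example.com", b"demo-key-store-this-separately")
```

Masking keeps the value recognizable to a human reviewer, while a keyed token lets the same person be linked across datasets without exposing the identifier. Which one fits depends on whether the data needs to be joined, displayed, or restored later.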