ICPSR's Approaches to Confidentiality
The success of social science research relies on participants’ willingness to engage in the research process. People often participate in research projects under an assumption that their responses will be kept confidential and will not be linked back to them. Thus, it is critically important to protect the identities of research participants. One way to protect participants’ identities is by assessing each study’s disclosure risk: the degree of risk that a data record from a study could be linked to a specific person or organization, thereby revealing information that otherwise would not be known, or would not be known with as much certainty. Concerns about disclosure risk have grown as more datasets have become available online and it has become increasingly easy to link data. ICPSR is committed to preserving the confidentiality of respondents and works to ensure that the appropriate level of confidentiality remains intact for all of its data holdings.
Preserving Respondent Confidentiality
Confidentiality, Informed Consent, and Data Sharing
Protection of respondent confidentiality is a core tenet of responsible research practice that begins with obtaining informed consent. Informed consent is a process of communication between a participant and researcher which enables the participant to decide voluntarily whether or not to participate in a study. Human subjects involved in a project must participate willingly and be adequately informed about the research. The informed consent must include a statement describing how the confidentiality of subject records will be maintained. However, it is also important that informed consent be written in a way that does not unduly limit an investigator’s discretion to share data with the research community. View recommended informed consent language for data sharing.
Confidentiality and IRBs
Institutional Review Boards (IRBs) take different approaches to secondary analysis of research datasets such as those distributed on ICPSR’s website. Some institutions require IRB review of proposals to analyze secondary data. Other institutions provide IRB exemption for projects involving secondary data if the data were acquired from preapproved sources such as ICPSR. Other institutions are establishing unique policies to address these issues. Learn more about Institutional Review Boards.
Identifiers
Two kinds of variables often found in social science data present problems that could endanger research subjects’ confidentiality: direct identifiers and indirect identifiers. Data depositors are asked to review their data and documentation for information that could identify respondents.
Direct identifiers are variables that point explicitly to particular individuals or units. Examples include:
- Names
- Addresses, including ZIP and other postal codes
- Telephone numbers, including area codes
- Social Security numbers
- Other linkable numbers such as driver’s license numbers, certification numbers, etc.
Indirect identifiers are variables that can be problematic as they may be used together or in conjunction with other information to identify individual respondents. Examples include:
- Detailed geographic information (e.g., state, county, province, or census tract of residence)
- Organizations to which the respondent belongs
- Educational institutions from which the respondent graduated and year of graduation
- Detailed occupational titles
- Place where respondent grew up
- Exact dates of events (e.g., birth, death, marriage, divorce)
- Detailed income
- Offices or posts held by respondent
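As a rough illustration of how a depositor might screen a variable list for the two kinds of identifiers described above, the following sketch flags variable names by keyword. The keyword lists and the function name `flag_identifiers` are hypothetical, chosen only to mirror the examples in the lists above; they are not ICPSR's screening rules, and a name-based check cannot replace a manual review of the data and documentation.

```python
# Hypothetical keyword lists based on the examples listed above.
# These are assumptions for illustration, not ICPSR's actual criteria.
DIRECT_KEYWORDS = ("name", "address", "zip", "postal", "phone", "ssn", "license")
INDIRECT_KEYWORDS = ("state", "county", "tract", "occupation", "birth",
                     "income", "school", "grad")


def flag_identifiers(variable_names):
    """Map each variable name to 'direct', 'indirect', or None.

    Matching is a case-insensitive substring test, so it will produce
    both false positives and false negatives; treat the result as a
    starting point for review, not a verdict.
    """
    flags = {}
    for var in variable_names:
        lowered = var.lower()
        if any(key in lowered for key in DIRECT_KEYWORDS):
            flags[var] = "direct"
        elif any(key in lowered for key in INDIRECT_KEYWORDS):
            flags[var] = "indirect"
        else:
            flags[var] = None
    return flags


if __name__ == "__main__":
    for var, kind in flag_identifiers(
        ["RESP_NAME", "birth_date", "Q12_satisfaction"]
    ).items():
        print(var, kind)
```

Direct matches are checked first so that a variable matching both lists is reported under the more serious category.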
Deposit Options for Confidential Data
The vast majority of ICPSR data holdings are public-use files with no restrictions on their access; however, ICPSR does accept data with identifying information under conditions consistent with the informed consent of the study participants and the relevant Institutional Review Board (IRB) approval. Sometimes the protective measures taken to reduce disclosure risk would significantly reduce the research potential of the data. In these cases, ICPSR works with data depositors to address disclosure risks and provides access to restricted-use versions that protect confidentiality by imposing stringent requirements for accessing them.
On the data deposit form, depositors inform ICPSR if the data contain confidential information. If so, depositors can use the “Additional Information” box on the form to request one or more of ICPSR’s restricted-use data dissemination options described below. Contact ICPSR staff at deposit@icpsr.umich.edu with questions about any of these options.
- Secure Download. For most restricted-use data, ICPSR offers users the ability to request the data via an online application through the ICPSR Data Access Request System (IDARS). Users must sign in to the application system with a Researcher Passport account or with their Facebook or Google credentials. To access ICPSR member-only data, users must be affiliated with a member institution. The restricted data application requires:
- Names, titles, and institutional affiliation of investigators
- Description of the proposed research
- Information on data formats needed, data storage technology, and data security
- Approval for the research project from the Institutional Review Board of the applicant’s institution
- A signed data use agreement
Upon completion, requests are reviewed by ICPSR staff. When approved, the encrypted data are sent to researchers via a secure link. Please note that ICPSR does not evaluate the scientific merit of the proposed research questions; we merely evaluate the security measures undertaken by the researcher and verify that all the necessary paperwork has been submitted.
- Virtual Data Enclave. The virtual data enclave (VDE) provides access to restricted-use data via a virtual machine launched from the researcher’s own computer but operating on a remote server. The virtual machine is isolated from the user’s physical computer, restricting the user from downloading files or parts of files to their physical computer. The virtual machine is also restricted in its external access, preventing users from emailing, copying, or otherwise moving files outside of the secure environment, either accidentally or intentionally. To receive output or other files from the VDE, users must request a disclosure review from ICPSR staff.
- Physical Data Enclave. Approximately 50 studies are only accessible for analysis on-site in the physical data enclave at the Perry Building in Ann Arbor, MI. The data in the physical enclave contain highly sensitive personal information collected from, for example, prison inmates, victims of violence, or serious criminal offenders. When using the physical enclave, several guidelines are in effect:
- Investigators cannot bring laptops or other electronic equipment into the enclave.
- The enclave is equipped with a Windows computer with the Microsoft Office Suite and the SPSS, SAS, and Stata statistical packages. Arrangements must be made in advance for other software.
- The computer is not connected to the Internet, and the removable media ports are disabled.
- An ICPSR staff member is present at all times when a researcher is using the enclave. The staff member inspects and approves all material brought into the enclave.
- All output, notes and other material must be submitted for disclosure review before the investigator leaves the enclave.
- ICPSR staff will conduct a disclosure review of all files that the investigator wants to use after leaving the enclave.
- Approved analysis output will be sent to the researcher electronically.
- Delayed Dissemination. In some cases, ICPSR can preserve data under a delayed dissemination model, in which the depositor and ICPSR establish a release date. ICPSR preserves the data until that date and distributes them according to the dissemination plan afterwards.
Processing & Access for Confidential Data
Once data are deposited with ICPSR, staff employ stringent procedures to protect the confidentiality of individuals and organizations whose personal information may be part of the archived data collection. Steps ICPSR staff take to maintain data confidentiality include:
- Completing a detailed review of all datasets to assess disclosure risk
- If necessary, modifying data to reduce disclosure risk
- Limiting access to datasets for which modifying the data would substantially limit their utility or for which the risk of disclosure remains high
- Training staff and consulting with data producers in methods of disclosure risk assessment and mitigation
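One common heuristic for the first step in the list above is to count how many records share each combination of indirect-identifier values; combinations held by very few records are the ones most easily linked to a person. The sketch below is an assumption for illustration only: the function name `small_cells`, the threshold `k`, and the sample records are hypothetical, and the source does not state that ICPSR uses this particular check.

```python
from collections import Counter


def small_cells(records, quasi_identifiers, k=3):
    """Return value combinations shared by fewer than k records.

    records: list of dicts, one per respondent.
    quasi_identifiers: keys to combine (indirect identifiers).
    k: hypothetical minimum cell size; choose it to suit the study.
    """
    counts = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Made-up example records, not real data.
    sample = [
        {"state": "MI", "age_group": "30-39", "occupation": "teacher"},
        {"state": "MI", "age_group": "30-39", "occupation": "teacher"},
        {"state": "MI", "age_group": "30-39", "occupation": "teacher"},
        {"state": "OH", "age_group": "60-69", "occupation": "judge"},
    ]
    print(small_cells(sample, ["state", "age_group", "occupation"]))
```

A non-empty result suggests that the listed combinations may need recoding or that the file may be better suited to restricted-use distribution.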
Disclosure Risk
With the exception of self-published deposits, ICPSR reviews all datasets to assess disclosure risk. ICPSR trains data curators to apply specified procedures to protect respondent confidentiality in all of the data ICPSR curates, archives, and distributes, including, for example, checking each study for identifiers present in the data (see above).
ICPSR may also recode data to reduce disclosure risk. Recoding can include converting dates to time intervals, exact dates of birth to age groups, detailed geographic codes to broader levels of geography, and detailed income to income ranges or categories.
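The recodes named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the ten-year age bands, and the income cutpoints are all hypothetical and are not ICPSR's actual recoding rules. The geographic example relies on the fact that the first two digits of a five-digit U.S. county FIPS code are the state FIPS code.

```python
from datetime import date


def days_between(start, end):
    """Replace two exact dates with the interval between them, in days."""
    return (end - start).days


def age_group(birth_date, reference_date, width=10):
    """Recode an exact date of birth into an age band such as '30-39'."""
    # Subtract one year if the birthday has not yet occurred this year.
    had_birthday = (reference_date.month, reference_date.day) >= (
        birth_date.month, birth_date.day)
    age = reference_date.year - birth_date.year - (0 if had_birthday else 1)
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def county_to_state_fips(county_fips):
    """Coarsen a 5-digit county FIPS code to its 2-digit state code."""
    return county_fips[:2]


def income_range(income, cutpoints=(25000, 50000, 100000)):
    """Recode detailed income into a range label (cutpoints are assumed)."""
    lower = 0
    for upper in cutpoints:
        if income < upper:
            return f"{lower}-{upper - 1}"
        lower = upper
    return f"{lower}+"


if __name__ == "__main__":
    print(days_between(date(2020, 1, 1), date(2020, 3, 1)))
    print(age_group(date(1985, 6, 15), date(2020, 6, 14)))
    print(county_to_state_fips("26161"))
    print(income_range(42000))
```

Each function discards detail deliberately, so the band widths and cutpoints should be chosen with the intended analyses in mind.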
If the modifications needed to address identifiers and create a public-use dataset would seriously reduce the analytic utility of the data, ICPSR may release a restricted-use dataset or both public- and restricted-use datasets. Restricted-use datasets retain confidential, identifying information and are accessible under controlled conditions.
Levels of Restricted Data Access
Depending on the outcome of the disclosure risk review, ICPSR may suggest modifying the data and/or distributing the data at a higher level of restriction. Sometimes data cannot be modified to protect confidentiality without significantly compromising the research potential of the data. In these cases, access to the data is restricted in order to impose further confidentiality safeguards.
ICPSR has established several mechanisms through which restricted data can be distributed:
- Secure Download: With this option, users submit an application to access the data, and after approval, download the data using a single-use password. At the end of the approved access period, users must destroy the data.
- Virtual Data Enclave (VDE): The VDE is a secure, online environment in which approved users analyze restricted data via a remote desktop using several available software options, including SAS, Stata, and SPSS. Researchers do not receive a copy of the data, but rather analyze the data stored on ICPSR’s servers. Final analysis output is vetted and, if approved, released to the researcher.
- Physical Data Enclave: For highly restricted data, ICPSR has a physical enclave which requires that approved users be on site at ICPSR to use the data. Data use in the physical data enclave is monitored by ICPSR staff. Final analysis output is vetted and, if approved, released to the researcher.
- Secure online analysis: This option provides analysis of restricted-use data behind an interface with programmable disclosure protection for selected users. With this option, users submit an application to access the data.
For more information on the management of restricted-use data, including Restricted-data Use Agreements, please refer to ICPSR Restricted-use Data Deposit Dissemination Procedures (pdf).
Consulting
In addition to the steps ICPSR takes to ensure the confidentiality of data that have already been deposited, we also offer the following services related to disclosure risk assessment and mitigation to researchers who have not yet deposited their data or who are in the earlier stages of the data collection process:
- Informed consent review
- Consultation regarding issues of disclosure risk (no charge)
- Basic disclosure risk assessment
- Full disclosure analysis: risk assessment and options for mitigation and data distribution
- Training
For further information on ICPSR services, contact us at ICPSR-help@umich.edu or 734-647-2200.
Additional Resources
- American Statistical Association, Data Access and Personal Privacy: Appropriate Methods of Disclosure Control (pdf)
- American Statistical Association, Committee on Privacy and Confidentiality – Methods for Reducing Disclosure Risks When Sharing Data
- Statistical Policy Working Paper 22 – Report on Statistical Disclosure Limitation Methodology (pdf)