Data Selection & Appraisal Criteria
ICPSR has specific criteria for selecting datasets for archival purposes: the data must have substantial value for research or instruction, enduring archival significance, and uniqueness. After selecting a dataset, ICPSR staff evaluate its acquisition priority by applying the appraisal criteria simultaneously. Data are approved immediately when no concerns exist. If concerns lower the priority, ICPSR weighs the benefits against the costs and acquires only what it can handle in the short term, deferring lower-priority data for future consideration or referring them to other archives.
Important questions arise as soon as data are created in the course of social science research.
- Which datasets are worthy of long-term retention? Who decides? What criteria are used?
- How can repositories work effectively with data producers to get all of the information they need for archiving?
ICPSR’s approach, as stated in our Collection Development Policy, prioritizes the following types of data:
- ICPSR seeks data that have demonstrated importance to the social science community, as determined by substantive value for research and/or instruction, enduring archival value for research and/or instruction, and uniqueness.
- ICPSR seeks data that support its mission.
- ICPSR seeks to acquire data in core social science substantive areas.
- ICPSR seeks data that are useful in applying current and emerging research and statistical techniques.
- ICPSR seeks data that permit the use of quantitative and/or qualitative social science research techniques.
Within those criteria, ICPSR is especially interested in data in five areas:
- Diversity Data. Data that foster understanding of the experiences of racial and ethnic minorities and other marginalized peoples living in the United States.
- Complex Data. Data arising from longitudinal research, survey research, and non-standard types: biological data, administrative records, video data, spatial data, remotely sensed data, and relational databases.
- Mixed Method Data. Data that can support both qualitative and quantitative analyses; data resulting from concurrent (both at the same time), sequential (one following the other), or conversion (one method to the other) mixed method study designs.
- Interdisciplinary Data. Data from interdisciplinary studies, and data resulting from studies using the research methods of multiple disciplines.
- International Data. Data originating outside the United States and data that support cross-national, comparative research. We are especially interested in data from countries and regions of the world that do not have a national structure for archiving, disseminating, and preserving research data.
Datasets that meet these criteria are further reviewed by ICPSR staff. Datasets are accorded a high priority for inclusion in the archive when:
- The data are not available anywhere else, or are not likely to be available elsewhere in the future.
- The data are in the public domain.
- Copyright is clear.
- Copyright owners agree to ICPSR’s dissemination policies.
- The dataset adheres to standards for privacy and confidentiality.
- The technical documentation is complete.
- The data are in a format that facilitates ease of use.
Details on Appraisal Criteria
After identifying a dataset using the considerations listed above, ICPSR staff apply the following criteria to assess the dataset’s priority for acquisition.
The appraisal criteria below are applied simultaneously. Data are immediately approved for possible acquisition when there are no concerns that lower the priority of the acquisition. If one or more concerns reduce the priority level of a data resource, ICPSR considers the potential benefits and costs of acquiring the data and acquires, in the short term, only what it has the capacity to accept. Lower-priority collections not acquired in the short term are either deferred for possible acquisition by ICPSR at a later date or referred to another archive whenever possible.
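The triage described above can be pictured as a small decision routine. The sketch below is only an illustration under stated assumptions, not ICPSR's actual procedure: the `Candidate` fields, the numeric `benefit` and `cost` scores, the `capacity` count, and the rule of ranking flagged datasets by net benefit are all hypothetical, since the policy does not say how benefits and costs are measured or compared.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"  # approved for possible acquisition
    DEFER = "defer"      # reconsidered by ICPSR at a later date
    REFER = "refer"      # passed to another archive


@dataclass
class Candidate:
    name: str
    concerns: list = field(default_factory=list)  # e.g. "available elsewhere"
    benefit: float = 0.0         # hypothetical score for expected value
    cost: float = 0.0            # hypothetical score for acquisition effort
    other_archive: bool = False  # assumed flag: another archive could take it


def triage(candidates, capacity):
    """Return {name: Decision}.

    capacity is a hypothetical count of flagged datasets that can be
    accepted in the short term.
    """
    decisions = {}
    flagged = []
    for c in candidates:
        if not c.concerns:
            # No concerns lower the priority: approve immediately.
            decisions[c.name] = Decision.APPROVE
        else:
            flagged.append(c)

    # Assumed rule: consider flagged datasets in order of net benefit.
    flagged.sort(key=lambda c: c.benefit - c.cost, reverse=True)
    for c in flagged:
        if capacity > 0 and c.benefit > c.cost:
            decisions[c.name] = Decision.APPROVE
            capacity -= 1
        elif c.other_archive:
            decisions[c.name] = Decision.REFER
        else:
            decisions[c.name] = Decision.DEFER
    return decisions


if __name__ == "__main__":
    sample = [
        Candidate("survey_a"),
        Candidate("survey_b", ["available elsewhere"], benefit=5, cost=2),
        Candidate("survey_c", ["incomplete documentation"], benefit=4, cost=2,
                  other_archive=True),
    ]
    for name, decision in triage(sample, capacity=1).items():
        print(name, decision.value)
```

In practice the weighing of benefits and costs is a staff judgment rather than a numeric comparison; the scores here only stand in for that judgment so the flow of approve, defer, and refer outcomes is visible.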
Data Availability
- If a dataset is available at an alternative site at a reasonable cost and if there is confidence that availability will continue over time, ICPSR may lower the priority for acquiring a dataset.
- ICPSR may provide links to data available on the Internet as an alternative to physical possession of files, when long-term archival conservation will not be compromised.
Security, Privacy, and Confidentiality Considerations
- ICPSR requires that studies deposited in the archive meet recognized standards for privacy and confidentiality of subjects studied. (For information on these standards, see the University of Michigan’s Human Subjects Protection Web page, specifically the section titled “Use of Human Subjects in Research.”)
- ICPSR prefers to acquire data that can reside in the public domain.
- ICPSR requires that data intended for public use be formatted so that identifiers inadvertently included in the data can be removed using standard practices without reducing the research value of the original data.
- Any access limitations that ICPSR might apply to specific data collections (e.g., a requirement that restricted-use agreements must be signed) should be legally justified and manageable given ICPSR’s resources, goals, and mission.
Copyright and Other Legal Issues
- ICPSR prefers to acquire data for which it can be discerned who has explicit or implicit intellectual property rights to make a copy of the data available for public use through ICPSR.
- ICPSR requires that the person, or institution, that has explicit or implicit intellectual property rights to data being submitted to ICPSR agree to ICPSR’s deposit terms.
- ICPSR requires the “owner” to grant permission for the data collection to be used by ICPSR for the following purposes:
  - To redisseminate copies of the data collection in a variety of media formats
  - To promote and advertise the data collection in any publicity (in any form) for ICPSR
  - To describe, catalog, validate and document the data collection
  - To store, translate, copy or re-format the data collection in any way to ensure its future preservation and accessibility
  - To incorporate metadata or documentation in the data collection into public access catalogues
  - To enhance, transform and/or rearrange the data collection, including the data and metadata, in order to protect respondent confidentiality and/or improve usability
Data Quality
- ICPSR strongly prefers data collections that have comprehensive technical documentation providing ample information on sampling procedures, weighting, recoding rules, skip patterns, constructed variables, and data collection procedures to allow users to assess the quality and analytical reliability of the data.
- ICPSR considers the acquisition of lower quality data if the data have unique historical value.
- ICPSR prefers data in the most complete and original form, with the exception of data extracts specifically intended for instructional purposes.
Data Format
- ICPSR prefers data in a readily useable format, accessible to members in a variety of computing and technological settings.
- ICPSR prefers data formats that promote easy access and use without compromising research value.
- ICPSR requires that data files deposited in a raw format be transformable or convertible into formats useable by a variety of statistical or analytical software.
- ICPSR prefers data files unaccompanied by value-added software.
Financial Considerations
- ICPSR prefers to obtain data at low or no cost; however, the value of the data to the membership can outweigh acquisition and other expected costs that might be incurred in processing and preservation.
- ICPSR may acquire commercially produced data when they are available at reasonable cost to the membership.