Preparing for Deposit with ICPSR

Preparing your data for deposit can be smooth and straightforward if you follow a few best practices. Think of it like packing a suitcase for a trip—being organized and including all the essentials will make the journey easier for everyone who uses your data.

Deposits should include all data and documentation necessary for others to independently read and interpret the data. At minimum, ICPSR requires that you submit data files, documentation files (such as codebooks, user guides, or questionnaires), and descriptive information about your study and methodology. Some important considerations and guidelines are below. For a quick checklist of everything you’ll need to start your deposit, see the Depositor Checklist. As you begin preparing your deposit, double-check your informed consent or Institutional Review Board (IRB) documentation (or the Terms of Use if you gathered your data from existing sources) to ensure your data can be shared.

This page walks you through preparing your data for archiving with ICPSR. For additional information on data management, see ICPSR’s Guide to Social Science Data Preparation and Archiving.



Data & Files: Organization & Preparation

Plan for data input and format (numeric or character). Determine how you will check for errors and inconsistencies, and how you will manage versions. For example, archives increasingly use checksums and other techniques to ensure integrity.
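As an illustration of the checksum idea, the sketch below records a SHA-256 digest for every file in a deposit folder and compares two such manifests to spot files that changed between versions. This is only one possible approach, not an ICPSR tool; the function names and the folder layout are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files whose checksum differs, or that were added or removed."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

Saving a manifest alongside each version of the files gives a simple record of exactly which files differ from one version to the next.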

ICPSR can accept data organized as:

“Flat” or Rectangular Files:

  • Data organized in long records, often starting with an ID followed by variables.
  • Suitable for most datasets and easy to read by analytic programs.

Hierarchical Files:

  • Efficient for large datasets with many empty fields, like detailed surveys with varying numbers of respondents’ children.
  • Stores data in a header record and multiple secondary records, saving space but needing more complex programming.
  • Alternatively, separate files can be used for different record types (e.g., respondents and children), providing flexibility and easier analysis.
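To make the header-and-secondary-record idea concrete, here is a minimal sketch that splits a hierarchical file into two linked tables, one for respondents and one for children. The record layout (a leading "H" or "C" type code and comma-separated fields) is entirely hypothetical; real hierarchical files vary and should be read according to their own documentation.

```python
def split_hierarchical(lines):
    """Split header ("H") and child ("C") records into two linked tables.

    Assumed layout (hypothetical): "H,<resp_id>,<resp_age>" starts a
    respondent; each following "C,<child_age>" belongs to that respondent.
    """
    respondents, children = [], []
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        record_type = fields[0]
        if record_type == "H":
            current_id = fields[1]
            respondents.append({"resp_id": current_id, "resp_age": int(fields[2])})
        elif record_type == "C":
            if current_id is None:
                raise ValueError("child record before any header record")
            # Carry the respondent ID onto the child row so the files can be linked.
            children.append({"resp_id": current_id, "child_age": int(fields[1])})
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return respondents, children
```

Each child row carries the respondent ID, so the two resulting tables can be saved as separate rectangular files and rejoined when needed.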

Relational Databases:

  • Collections of linked data tables using key variables (e.g., “Family ID”).
  • Allow for specific queries and data combinations from multiple tables.
  • ICPSR recommends exporting the tables as flat files and using SQL to preserve table relationships.
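One way to follow the relational-database guidance above is sketched here with Python's built-in sqlite3 and csv modules: each table is exported as a flat CSV file, and the CREATE TABLE statements are saved so the key relationships stay documented. The tables, columns, and values are invented for illustration, and other database systems will need their own export tools.

```python
import csv
import io
import sqlite3


def export_table_csv(conn, table):
    """Return one table as CSV text, with a header row."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


def export_schema_sql(conn):
    """Return the CREATE TABLE statements, which record keys and links."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return ";\n".join(row[0] for row in rows) + ";\n"


# Hypothetical two-table database linked by family_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE family (family_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        family_id INTEGER REFERENCES family(family_id),
        age INTEGER
    );
    INSERT INTO family VALUES (1, 'Midwest'), (2, 'South');
    INSERT INTO member VALUES (10, 1, 34), (11, 1, 6), (12, 2, 51);
    """
)

family_csv = export_table_csv(conn, "family")
member_csv = export_table_csv(conn, "member")
schema_sql = export_schema_sql(conn)
```

Here the strings are kept in memory; in practice each would be written to its own file (for example, one CSV per table plus a schema file).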

Longitudinal/Multi-Wave Study Files:

  • Data collected from the same participants at multiple time points or waves, usually organized as hierarchical files.
  • Must maintain consistent file information, use linking identifiers, and align variable labels and values across waves for ease of data comparison.
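A quick consistency check across waves might look like the sketch below. It assumes each wave's variable labels are available as a simple mapping from variable name to label; the wave names, variable names, and labels in the usage example are made up.

```python
def compare_waves(wave_labels, id_variable):
    """Find waves missing the linking ID and variables labeled inconsistently.

    wave_labels maps a wave name to a {variable: label} dictionary.
    Returns (waves_missing_id, inconsistent_variables).
    """
    missing_id = sorted(
        wave for wave, labels in wave_labels.items() if id_variable not in labels
    )
    # Collect, for every variable, the label it carries in each wave.
    seen = {}
    for wave, labels in wave_labels.items():
        for variable, label in labels.items():
            seen.setdefault(variable, {})[wave] = label
    inconsistent = {
        variable: by_wave
        for variable, by_wave in seen.items()
        if len(set(by_wave.values())) > 1
    }
    return missing_id, inconsistent


# Hypothetical labels for three waves.
waves = {
    "wave1": {"resp_id": "Respondent ID", "q1": "Self-rated health"},
    "wave2": {"resp_id": "Respondent ID", "q1": "Health, self-rated"},
    "wave3": {"q1": "Self-rated health"},
}
missing_id, inconsistent = compare_waves(waves, "resp_id")
```

In this example the check would flag wave3 for lacking the linking identifier and q1 for carrying two different labels.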

For quantitative data, submit files in SAS, SPSS, Stata, or ASCII (with setup syntax files). For qualitative data, submit files in plain text (*.txt), rich text (*.rtf), scanned images of text with OCR (*.pdf), or Microsoft Word (*.doc, *.docx). Other formats are also accepted. Ensure each variable has clear, exclusive codes and labels. Define any missing data codes. Follow the Depositor Checklist or contact ICPSR staff at ICPSR-help@umich.edu for guidance in preparing your data.
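The advice about codes, labels, and missing data codes can be checked mechanically before deposit. The following is a rough sketch, assuming the codebook is held as a Python dictionary that lists valid codes and missing-data codes for each variable; this structure and the variable names are hypothetical, not an ICPSR format.

```python
def check_codes(rows, codebook):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for variable, spec in codebook.items():
        valid, missing = set(spec["codes"]), set(spec["missing"])
        # A code should not be both a substantive value and a missing-data code.
        for code in sorted(valid & missing):
            problems.append(f"{variable}: code {code} is both valid and missing")
        # Every value found in the data should be documented in the codebook.
        for number, row in enumerate(rows, start=1):
            value = row.get(variable)
            if value not in valid | missing:
                problems.append(
                    f"{variable}: undocumented value {value!r} in row {number}"
                )
    return problems


# Hypothetical codebook and data rows.
codebook = {
    "employed": {"codes": {1: "Yes", 2: "No"}, "missing": {-9: "Refused"}},
}
rows = [{"employed": 1}, {"employed": -9}, {"employed": 3}]
problems = check_codes(rows, codebook)
```

Here the value 3 in the third row is not documented in the codebook, so it would be reported for follow-up.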

Provide full documentation such as codebooks, data collection instruments, summary statistics, and project summaries. Documentation should integrate question text with variable information where possible. Common documentation formats are PDF, .doc, .xls(x), etc.

Deidentifying Sensitive Data

It is crucial to handle research data with care to protect participant confidentiality. During the planning phase, ensure that data sharing complies with participants’ consent and IRB requirements. Keep data secure. When preparing for data sharing, deidentify any variables that might compromise confidentiality. The good news is: ICPSR reviews all data it receives for disclosure risk. Read more about data confidentiality at ICPSR.

ICPSR’s Disclosure Risk Guide for Data Depositors includes remediation suggestions to handle both indirect and direct identifiers. For an overview of the management of restricted-use data, please refer to ICPSR’s Restricted-use Data Deposit and Dissemination Procedures (pdf).

If your data include sensitive questions or contextual details that are analytically important but might increase the chance that a participant could be reidentified, ICPSR will recommend releasing a restricted-use version of the data.
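As a simple illustration of handling direct and indirect identifiers, the sketch below drops columns that identify a person outright and coarsens exact age into bands. It is not ICPSR's procedure and does not replace a disclosure risk review; the column names, the 10-year bands, and the top-code of 80 are all assumptions chosen for the example.

```python
# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, top=80):
    """Collapse exact age into a 10-year band, top-coded at `top`."""
    if age >= top:
        return f"{top}+"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def deidentify(record):
    """Return a copy of the record without direct identifiers or exact age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        # Replace the indirect identifier (exact age) with a coarser group.
        cleaned["age_group"] = age_band(cleaned.pop("age"))
    return cleaned


# Hypothetical record.
record = {"name": "A. Person", "email": "a@example.org", "age": 34, "score": 7}
shared = deidentify(record)
```

Which variables count as identifiers, and how coarse the groupings need to be, depends on the study and its population.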

Metadata: Maximizing Data Usefulness

Metadata, or detailed information about data collections, are crucial for maximizing their usefulness. They allow users to understand and use the data without needing to contact the data producers. Good metadata standardize data descriptions, improve understanding, facilitate searches, and enhance web display.

At ICPSR, metadata are created primarily from information provided by data producers and metadata specialists. Data producers should submit the following at minimum:

  • Clear and consistent titles (e.g., Title, Location, and Years would appear on the ICPSR site as “Aging in Women [United States], 2005-2006”).
  • Project description, including goals, main topics, and methodology
  • Principal Investigator (PI) names and organizational affiliations (ICPSR uses the Virtual International Authority File to match names)
  • Dates of data collection
  • Intended unit of analysis (who or what is being studied)
  • Sample description
  • Universe description
  • Project/study website, if available
  • Funding source(s) and grant number(s)
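Before starting a deposit, it can help to gather these items in one place and confirm nothing is missing. The sketch below assumes a plain Python dictionary with hypothetical field names (this is not an ICPSR schema); the title format follows the “Aging in Women [United States], 2005-2006” example above, and the website is treated as optional because it is requested only if available.

```python
# Hypothetical field names for the minimum items listed above.
REQUIRED_FIELDS = [
    "title",
    "description",
    "investigators",
    "collection_dates",
    "unit_of_analysis",
    "sample",
    "universe",
    "funding",
]


def format_title(title, location, years):
    """Combine title, location, and years in the style of the example above."""
    return f"{title} [{location}], {years}"


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Partially completed record for illustration.
record = {
    "title": format_title("Aging in Women", "United States", "2005-2006"),
    "description": "Goals, main topics, and methodology go here.",
    "investigators": [],
    "collection_dates": "2005-2006",
}
still_needed = missing_fields(record)
```

An empty list from missing_fields would suggest the minimum descriptive information is in hand.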

Please review the ICPSR Metadata Documentation Portal to learn more about the study-level metadata to include with your data deposit.

Maintain a List of Publications

Keep a list of any publications related to your data – these references can be included with your deposit. Any publications included in your deposit will be added to the ICPSR Bibliography of Data-related Literature, helping others find your data.

Additional Information

Looking for more information about data sharing? Check out the resources below.

  • FAIR Principles – internationally accepted guidelines for managing and sharing scientific data.
  • Data Documentation Initiative (DDI) – an international standard for describing data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. DDI can document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving.

Contact us if you have questions about preparing your deposit.