Dissemination - Technical options
An access policy that takes into account the confidentiality level of the data must be determined for each dataset. Options include public use files, licensed files, files accessible through remote facilities, and files accessible in a data enclave only.
Public use files require very little administration, so this option can be advantageous for the data provider. However, it requires staff trained in anonymization techniques.
The advantage for users is that data are freely accessible, either immediately or within a very short period of time. There are disadvantages, however. The anonymization process adds noise to the data and reduces information, which, in turn, can affect the validity of social science analysis.
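The effect of noise on analysis can be illustrated with a small simulation. The sketch below is not a real anonymization procedure: the data are synthetic, and the noise level and the `add_noise` helper are assumptions chosen only to show how perturbing one variable weakens its measured relationship with another.

```python
import math
import random
import statistics

def add_noise(values, noise_sd, rng):
    """Return a copy of values with independent Gaussian noise added to each entry."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(12345)  # fixed seed so the sketch is reproducible

# Synthetic "original" microdata: y depends on x plus some residual variation.
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [xi + rng.gauss(0.0, 0.5) for xi in x]

# Hypothetical anonymization step: perturb x with noise as large as its own spread.
x_released = add_noise(x, noise_sd=1.0, rng=rng)

r_original = pearson(x, y)
r_released = pearson(x_released, y)

print(f"correlation in original data: {r_original:.2f}")
print(f"correlation in released data: {r_released:.2f}")
```

The correlation estimated from the released variable is weaker than the one in the original data, which is the kind of loss of analytical validity the paragraph above describes.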
Some conditions can be associated with the dissemination of public use files. These conditions could be formulated as follows:
1. The data and other materials provided by the National Statistics Office (NSO) will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of the NSO.
2. The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information or the development of statistical models, and not for investigation of specific individuals or organizations.
3. No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery would immediately be reported to the NSO.
4. No attempt, without prior approval, will be made to produce links among datasets provided by the NSO, or among data from the NSO and other datasets that could identify individuals or organizations.
5. Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from the National Data Archive will cite the source of data in accordance with the Citation Requirement provided with the dataset.
6. An electronic copy of all reports and publications based on the requested data will be sent to the NSO.
7. The original collector of the data, the NSO, and the relevant funding agencies bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
Licensed files are made available only to researchers who:
- Demonstrate a need to access confidential data in order to fulfill a stated statistical or research purpose;
- Comply with the conditions set forth in a formal (legal) data access agreement; and
- Guarantee that any resulting outputs are lawful, and that the outputs will satisfy any disclosure control policies applying to the data.
This approach makes it possible for the data depositor to release higher quality files to trusted researchers. There are, however, increased monitoring and supervision costs.
Users interested in accessing the data should submit their request using a data request form designed and provided by the data depositor. A transparent procedure for authorizing the release of data must be put in place. When access to data is granted, a data access agreement provides written evidence of the instructions determining how, and for what purposes, the data are to be processed.
The conditions associated with the dissemination of licensed data files could be formulated as follows:
1. The data and other materials provided by the National Statistics Office (NSO) will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of the NSO.
2. The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information or the development of statistical models, and not for investigation of specific individuals or organizations.
3. No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery would immediately be reported to the NSO.
4. No attempt, without prior approval, will be made to produce links among datasets provided by the NSO, or among data from the NSO and other datasets that could identify individuals or organizations.
5. Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from the National Data Archive will cite the source of data in accordance with the Citation Requirement provided with the dataset.
6. An electronic copy of all reports and publications based on the requested data will be sent to the NSO.
7. The original collector of the data, the NSO, and the relevant funding agencies bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
8. The primary and other researchers who will be involved in using the data must be identified.
9. The researchers’ organization must be identified as must a suitable representative of the organization who must be a signatory to the license.
10. The intended use of the data including a list of expected outputs and the organization’s dissemination policy must be identified.
11. A formal agreement must be signed stating that the files will not be shared beyond the boundaries of the organization. In the case of a blanket agreement, where it is agreed that the data can be used broadly within the receiving organization in a secure manner, the receiving organization must demonstrate a capacity to manage the data files securely (with an identified individual assigned formal responsibility for doing so), and each additional new user must be made aware of the terms and conditions that apply to the data files. This must be achieved by having the users sign an affidavit. Where a blanket agreement exists and data security procedures are in place, it will not be necessary for the users to destroy the data after use is complete.
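A data depositor might screen incoming applications against conditions 8 to 11 before passing them to a review committee. The following is only a sketch of such a completeness check: the `LicenseRequest` record, its field names, and the `missing_items` function are hypothetical and do not come from any existing system.

```python
from dataclasses import dataclass, field

@dataclass
class LicenseRequest:
    """Hypothetical application for a licensed file (field names are illustrative)."""
    researchers: list = field(default_factory=list)       # condition 8
    organization: str = ""                                 # condition 9
    signatory: str = ""                                    # condition 9
    intended_use: str = ""                                 # condition 10
    expected_outputs: list = field(default_factory=list)   # condition 10
    dissemination_policy: str = ""                         # condition 10
    agreement_signed: bool = False                         # condition 11

def missing_items(request):
    """Return the conditions (8 to 11) that the application does not yet satisfy."""
    problems = []
    if not request.researchers:
        problems.append("researchers not identified")
    if not request.organization or not request.signatory:
        problems.append("organization or signatory not identified")
    if not (request.intended_use and request.expected_outputs
            and request.dissemination_policy):
        problems.append("intended use, outputs or dissemination policy not stated")
    if not request.agreement_signed:
        problems.append("agreement not signed")
    return problems

# Invented example: everything is filled in except the signed agreement.
request = LicenseRequest(
    researchers=["A. Researcher"],
    organization="Example University",
    signatory="B. Representative",
    intended_use="Poverty analysis",
    expected_outputs=["working paper"],
    dissemination_policy="open access",
)
print(missing_items(request))
```

A check of this kind only confirms that the application is complete; the decision to grant access still rests with the transparent authorization procedure described earlier.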
Some examples:
- UK Office for National Statistics, Data Access Agreement (see Appendix B)
- Procedures for Authorising the Release of Microdata - A Case Study for the UK, by Marta Haworth, Office for National Statistics (in MS-Word)
- UK Data Archive, End User License
- ICPSR, Responsible Use Statement
- Australia Social Science Data Archive (ASSDA)
With remote access facilities, users of the data do not have access to the microdata files. There are two types of remote access facilities:
Off-line remote analysis
With off-line remote analysis, the researcher writes analysis programs (in Stata, SAS, SPSS or any other supported software) and submits them to the depositor, who runs them off-line. The results are checked for confidentiality and then sent back to the researcher. To enable researchers to write and test their programs, data depositors provide them with dummy microdata files. Such arrangements can be referred to as "remote job execution systems." This method is rarely used because it has major disadvantages for researchers. First, results may be obtained only after a long delay. Second, it is often difficult for researchers to decide in advance which analytical method will be most appropriate for their purpose. In many cases, researchers will want to test various options after seeing the results of their regressions or other output. This trial-and-error process is difficult to carry out when programs are run with delays and at a cost.
Example: Luxembourg Income Study (LIS) - See the section Database Access
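The depositor's side of a remote job execution system can be sketched as two steps: run the submitted program on the confidential file, then vet the output before release. The example below is a simplified illustration, not the procedure used by any particular archive. The minimum cell count of 5, the function names, and the records are all assumptions, and real output checking typically involves further rules beyond this single test.

```python
from collections import Counter

MIN_CELL_COUNT = 5  # assumed disclosure threshold; actual rules are set by the depositor

def run_tabulation(microdata, variable):
    """Stand-in for the researcher's submitted program: a one-way frequency table."""
    return Counter(record[variable] for record in microdata)

def confidentiality_check(table, min_count=MIN_CELL_COUNT):
    """Suppress (set to None) every cell whose count falls below the threshold."""
    return {cell: (count if count >= min_count else None)
            for cell, count in table.items()}

def process_job(microdata, variable):
    """Depositor side: run the job on the real file, then vet the output."""
    return confidentiality_check(run_tabulation(microdata, variable))

# Invented records standing in for the confidential file held by the depositor.
confidential = (
    [{"region": "North"}] * 12
    + [{"region": "South"}] * 7
    + [{"region": "Island"}] * 2
)

released = process_job(confidential, "region")
print(released)
```

In this sketch the "Island" cell is withheld because only two records fall into it, while the larger cells are returned unchanged.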
On-line remote analysis
With on-line remote facilities, the researcher is granted access to software that allows them to perform the analysis on anonymized microdata files through the internet. Conditions are such that the microdata itself cannot be downloaded. Due to software limitations, this solution offers tabulation and simple analyses only, so it serves more as a tool for preliminary study. Implementing this option is costly (even when the software application itself is free) and technically very complex.
Available software includes:
- Nesstar Publisher is a DDI-compliant advanced data management program. It consists of data and metadata conversion and editing tools, enabling the user to prepare these materials for publication and make such resources available to a wider audience.
- SDA (Survey Documentation and Analysis) is a DDI-compliant set of programs for the documentation and Web-based analysis of survey data. The programs also include procedures for creating customized subsets of datasets. SDA is developed and maintained by the Computer-assisted Survey Methods Program (CSM) at the University of California, Berkeley.
- The Virtual Data Center (VDC) is a DDI-compliant web application dedicated to maintaining and disseminating collections of research studies. The VDC includes facilities for the storage, archiving, cataloging, translation and dissemination of each collection. It also includes on-line analysis, powered by the R Statistical environment. Additionally, it provides extensive support for distributed and federated collections such as location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed virtual collections of remote objects.
- REDATAM is a tool dedicated to the analysis and mapping of census, survey and sectoral data at the local and regional levels.
- SPSS Server
- SAS Server
In a data enclave, authorized researchers physically access data at a site controlled by the agency owning the data, and are monitored by its employees. The computers within the enclave are not linked to the outside world; researchers do not have email or internet access, and all analysis must be done within the enclave. Furthermore, there is an extensive review process to ensure that their work fits within the mandate of the agency owning the data. As a result, confidential data may only be used for the purpose for which the data are supplied, i.e., the approved research projects. A full disclosure review of the output is also conducted.
Although data enclaves have been effective in controlling identification risk, particularly for data sets where a confidentialized microdata file is not possible, as is the case with business data, they still require access conditions to provide an adequate level of protection. Their main weakness has been the lack of convenience for researchers who are sometimes forced to use unfamiliar data analysis software. Data enclaves are also expensive to manage compared with other options.
Example: the Inter-university Consortium for Political and Social Research (ICPSR)
Summary of key features of different microdata dissemination options.
| | Number of users | Disclosure risk | Data utility | Cost |
| --- | --- | --- | --- | --- |
| Public use files | High | Low | Low / Medium | Low |
| Licensing | Medium | Low / Medium | Medium / High | Medium |
| Remote access | Low | Low | Low / Medium | High |
| Data enclave | Very Low | Very Low | High | Very high |






