The use of administrative data for research and service planning has increased over recent decades because of the value of the large, rich datasets available. However, concerns about releasing sensitive or personal data, and the associated disclosure risk, can lead to lengthy approval processes and restricted data access, which can delay or prevent the production of timely evidence. A promising way to make data access more efficient is to create synthetic versions of the original datasets, which are less likely to hold confidential information and can minimise disclosure risk. Such data may serve as an interim solution, allowing researchers to develop their analysis plans on non-disclosive data whilst waiting for access to the real data. We aim to provide an overview of the background and uses of synthetic data and to describe common methods used to generate synthetic data in the context of UK administrative research. We propose a simplified terminology for categories of synthetic data (univariate, multivariate, and complex modality synthetic data), provide a more comprehensive description of the terminology used in the existing literature, and illustrate challenges and future directions for research.

DOI

10.23889/ijpds.v7i1.1727

Type

Journal article

Publication Date

2022-01-01

Volume

7

Keywords

administrative datasets, data confidentiality, data linkage, data utility, statistical disclosure control, synthetic data, Humans, Disclosure, Research Personnel