Measuring the information loss

Most anonymization techniques work by reducing the level of detail in the data or by suppressing values outright, so they typically cause some loss of information. The challenge for the statistician is to strike a proper balance between two conflicting objectives: reducing the disclosure risk and minimizing this loss.

Various methods are available to assess the information loss, all of which compare the original data with the anonymized data. For categorical data, these methods include direct comparison of values, comparison of contingency tables, and entropy-based measures. For continuous data, they include the mean square error, the mean absolute error, and the mean variation between the original and anonymized values.
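As an illustration, the sketch below computes the three continuous measures and a simple direct comparison for a categorical variable. It is a minimal example under stated assumptions, not a reference implementation: the function names and the sample values are hypothetical, and mean variation is taken here to be the average absolute change relative to the original value, which is one common formulation.

```python
from typing import Hashable, Sequence


def _check(original: Sequence, masked: Sequence) -> None:
    # Both versions of the variable must describe the same records.
    if len(original) != len(masked) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_square_error(original: Sequence[float], masked: Sequence[float]) -> float:
    """Average squared difference between original and anonymized values."""
    _check(original, masked)
    return sum((x - y) ** 2 for x, y in zip(original, masked)) / len(original)


def mean_absolute_error(original: Sequence[float], masked: Sequence[float]) -> float:
    """Average absolute difference between original and anonymized values."""
    _check(original, masked)
    return sum(abs(x - y) for x, y in zip(original, masked)) / len(original)


def mean_variation(original: Sequence[float], masked: Sequence[float]) -> float:
    """Average absolute change relative to the original value.

    Assumption: records whose original value is zero are skipped,
    because the relative change is undefined for them.
    """
    _check(original, masked)
    pairs = [(x, y) for x, y in zip(original, masked) if x != 0]
    if not pairs:
        raise ValueError("all original values are zero")
    return sum(abs(x - y) / abs(x) for x, y in pairs) / len(pairs)


def share_changed(original: Sequence[Hashable], masked: Sequence[Hashable]) -> float:
    """Direct comparison for categorical data: share of values that differ."""
    _check(original, masked)
    return sum(x != y for x, y in zip(original, masked)) / len(original)


# Hypothetical sample data: an income variable before and after perturbation,
# and a region variable before and after recoding.
income_original = [100.0, 200.0, 400.0]
income_masked = [110.0, 190.0, 400.0]
region_original = ["A", "B", "C", "D"]
region_masked = ["A", "B", "B", "D"]

mse = mean_square_error(income_original, income_masked)
mae = mean_absolute_error(income_original, income_masked)
mv = mean_variation(income_original, income_masked)
changed = share_changed(region_original, region_masked)
```

Lower values indicate less information loss on all four measures. Mean variation is scale-free, which makes it easier to compare across variables measured in different units, while the mean square and mean absolute errors are expressed in the (squared) units of the variable itself.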

Information on the techniques is available in the following documents: