ANON-KI

We build methods to anonymize longitudinal and event-history personal data. Focus: Statistical Disclosure Control (SDC), Synthetic Data Generation (SDG), robust risk measurement, and conceptual research on identity and anonymity.

Factsheet

  • Schools involved: School of Engineering and Computer Science
  • Institute(s): Institut für Optimierung und Datenanalyse IODA
  • Funding organisation: SNSF
  • Duration (planned): 01.11.2023 - 31.10.2026
  • Head of project: Prof. Dr. Murat Sariyar
  • Project staff: Marko Miletic
  • Partner: Fachhochschule Nordwestschweiz FHNW (Leading House)

Situation

Modern applications rely on data analysis and ML/DL, often using personal data with repeated measurements over time (longitudinal or event-history data). Such data is highly valuable for research and industry but strictly regulated (e.g., by the Swiss FADP). Past re-identification cases show that removing direct identifiers (name, address, AHV number) is not enough to prevent re-identification. While SDC methods and software exist for cross-sectional data, major gaps remain for longitudinal settings such as mobility data, health trajectories, and COVID-19 tracking. What is missing is an integrated computational and methodological framework that unites anonymization, risk measurement (re-identification risk vs. utility), and SDG for complex time-dependent data. This project addresses that gap to enable safe, useful data sharing for research, industry, and public bodies.
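To make the re-identification point concrete, the Python sketch below uses a small, entirely hypothetical pseudonymized release (pseudonyms, dates, and postcodes are invented) and counts how many people an attacker can single out from a few known (date, place) points. The `unicity` measure follows the idea of trajectory unicity from mobility research but, as an assumption made for determinism, uses each person's first p points instead of randomly sampled ones. `candidates` and `unicity` are illustrative names, not part of any project software.

```python
from datetime import date

# Hypothetical pseudonymized release: names and AHV numbers are gone, but each
# trajectory still lists its (event_date, postcode) points in time order.
RELEASE = {
    "p01": [(date(2024, 1, 3), "8001"), (date(2024, 2, 14), "8004"), (date(2024, 3, 1), "8001")],
    "p02": [(date(2024, 1, 3), "8001"), (date(2024, 2, 20), "3011"), (date(2024, 3, 9), "3011")],
    "p03": [(date(2024, 1, 5), "4051"), (date(2024, 2, 14), "8004"), (date(2024, 3, 1), "8001")],
    "p04": [(date(2024, 1, 5), "4051"), (date(2024, 2, 20), "3011"), (date(2024, 3, 9), "4051")],
}


def candidates(release, known_points):
    """Pseudonyms whose trajectory contains every (date, postcode) point the attacker knows."""
    known = set(known_points)
    return sorted(pid for pid, traj in release.items() if known <= set(traj))


def unicity(release, p):
    """Share of individuals singled out by the first p points of their own trajectory.

    Unicity studies usually sample p points at random; using the first p points
    keeps this sketch deterministic.
    """
    singled_out = sum(
        1 for traj in release.values() if len(candidates(release, traj[:p])) == 1
    )
    return singled_out / len(release)


print(unicity(RELEASE, 1))  # 0.0
print(unicity(RELEASE, 2))  # 1.0
print(candidates(RELEASE, [(date(2024, 1, 3), "8001"), (date(2024, 2, 14), "8004")]))  # ['p01']
```

In this toy release no single point identifies anyone, yet two points single out every individual, although no names or AHV numbers are present.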

Course of action

We develop a modular framework for longitudinal and event-history data:

  • Modeling & preprocessing: harmonization, timeline mapping, episode building, feature engineering.
  • Anonymization methods: time-aware generalization, microaggregation, differential-privacy-inspired mechanisms, sequential perturbation, and SDG (generative models) for realistic yet protective synthetic trajectories.
  • Risk measurement: metrics for linkage, inference, and trajectory re-identification risks; systematic utility–risk trade-off analyses.
  • Evaluation: extensive simulations and case studies (health, mobility); benchmark suites and reproducibility.
  • Conceptual research: formal notions of identity, pseudonymity, and de-facto anonymity in longitudinal structures; governance guidance for releasing non-aggregated datasets.
  • Transfer: open-source components, documentation, best-practice guides, and stakeholder workshops.
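As a rough illustration of time-aware generalization combined with a simple risk measure, the Python sketch below coarsens each event date to a quarter and each postcode to its first digit, then scores every person with the k-anonymity-style risk 1/k, where k is the number of people sharing the same trajectory. The data, the bin width `months_per_bin`, the prefix length `postcode_digits`, and the function names are hypothetical choices for this sketch, not the project's actual method.

```python
from collections import Counter
from datetime import date

# Hypothetical toy event histories with direct identifiers already removed:
# pseudonym -> time-ordered list of (event_date, postcode) pairs.
HISTORIES = {
    "p01": [(date(2024, 1, 3), "8001"), (date(2024, 5, 14), "8004")],
    "p02": [(date(2024, 2, 11), "8003"), (date(2024, 6, 2), "8008")],
    "p03": [(date(2024, 1, 5), "4051"), (date(2024, 4, 20), "4056")],
    "p04": [(date(2024, 3, 28), "4052"), (date(2024, 6, 9), "4057")],
}


def generalize(history, months_per_bin=3, postcode_digits=1):
    """Time-aware generalization: keep the order of events, but coarsen each
    date to a (year, bin) period and each postcode to a prefix."""
    return tuple(
        (d.year, (d.month - 1) // months_per_bin, code[:postcode_digits])
        for d, code in history
    )


def reid_risk(histories):
    """Per-person re-identification risk 1/k, where k is the number of people
    sharing exactly the same trajectory (the k-anonymity equivalence class)."""
    keys = {pid: tuple(h) for pid, h in histories.items()}
    class_size = Counter(keys.values())
    return {pid: 1 / class_size[key] for pid, key in keys.items()}


raw = reid_risk(HISTORIES)
generalized = reid_risk({pid: generalize(h) for pid, h in HISTORIES.items()})
print(max(raw.values()))          # 1.0
print(max(generalized.values()))  # 0.5
```

Coarser bins lower the maximum risk from 1.0 to 0.5 here, but they also erase within-quarter timing and fine-grained location, which is the kind of trade-off between utility and risk the framework is meant to analyze.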

Looking ahead

The outcomes enable safe individual-level analyses (e.g., regression, event analysis) with reduced disclosure risk. Next steps: integration into data centers and secure data spaces, support for approval workflows, standardized risk/utility reporting, expanded SDG for multimodal data (sensors, text, images), and tighter links to legal and organizational governance. In the long term, the project strengthens trustworthy, data-driven innovation across research and industry.
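Standardized risk/utility reporting could, for a single numeric variable, look roughly like the Python sketch below. It applies fixed-size univariate microaggregation (one of the anonymization methods listed under the course of action) to hypothetical days-to-event values and reports the smallest group size achieved, the shift in the mean, and the share of variance lost. The report fields and function names are assumptions for illustration; a report for longitudinal data would also need trajectory-level risk measures.

```python
from collections import Counter
from statistics import mean


def microaggregate(values, k=3):
    """Fixed-size univariate microaggregation.

    Sorts the values, cuts them into consecutive groups of k (the last group
    absorbs any remainder, so every group has k to 2k-1 members) and replaces
    each value by its group mean. Returns masked values in the original order.
    """
    if len(values) < k:
        raise ValueError("microaggregation needs at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        stop = len(values) if g == n_groups - 1 else start + k
        members = order[start:stop]
        group_mean = mean(values[i] for i in members)
        for i in members:
            masked[i] = group_mean
    return masked


def risk_utility_report(original, masked):
    """Minimal risk/utility summary for one numeric variable (hypothetical format)."""
    mu = mean(original)
    sst = sum((x - mu) ** 2 for x in original)
    sse = sum((x - m) ** 2 for x, m in zip(original, masked))
    return {
        # Smallest number of records sharing one masked value (k actually achieved).
        "min_group_size": min(Counter(masked).values()),
        # Replacing values by group means preserves the overall mean up to rounding.
        "mean_shift": mean(masked) - mu,
        # Share of total variance removed by masking (0 = none, 1 = all).
        "information_loss": sse / sst if sst else 0.0,
    }


days_to_event = [3, 5, 8, 20, 22, 25, 60]  # hypothetical values
masked = microaggregate(days_to_event, k=3)
print(masked[3:])  # [31.75, 31.75, 31.75, 31.75]
print(risk_utility_report(days_to_event, masked)["min_group_size"])  # 3
```

Fixed-size grouping keeps the sketch short; it guarantees every masked value is shared by at least k records, at the cost of a larger information loss when the data contain outliers such as the value 60 here.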

This project contributes to the following Sustainable Development Goals (SDGs)

  • 3: Good health and well-being
  • 9: Industry, innovation and infrastructure
  • 10: Reduced inequalities