POLITECNICO DI MILANO – Department of Electronics, Information and Bio-engineering – Computer Engineering Area

DEIB – Computer Engineering – Data Management

Relational and NoSQL DBMS

Data Cleaning, Integration and Annotation for Data Analysis Purposes

Data Integration, Data Lakes, Data Warehousing, Data Quality and Ethical Issues

The main phases of Data Analysis are:

  • Data extraction, transformation, loading
  • Data cleaning and annotation
  • Data integration, pruning and, where needed, a further round of cleaning
  • Parameter tuning, model training and deployment
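
The phases above can be sketched as a small pipeline. This is only an illustrative sketch, not the group's actual tooling: the record layout, the function names and the "model" (a plain mean) are all assumptions made for the example, and it uses only the Python standard library.

```python
from statistics import mean

# Hypothetical raw records from two sources (illustrative data only).
source_a = [
    {"id": 1, "city": " Milano ", "temp": "21.5"},
    {"id": 2, "city": "Roma", "temp": None},
]
source_b = [
    {"id": 2, "city": "roma", "temp": "24.0"},
    {"id": 3, "city": "Torino", "temp": "19.0"},
]

def extract_transform_load(*sources):
    """Phase 1: gather the records of every source into one list."""
    return [dict(rec) for src in sources for rec in src]

def clean_and_annotate(records):
    """Phase 2: normalize values and annotate each record with a completeness flag."""
    cleaned = []
    for rec in records:
        temp = float(rec["temp"]) if rec["temp"] is not None else None
        cleaned.append({
            "id": rec["id"],
            "city": rec["city"].strip().title(),
            "temp": temp,
            "complete": temp is not None,  # annotation added by this phase
        })
    return cleaned

def integrate_and_prune(records):
    """Phase 3: merge records sharing an id (preferring complete ones),
    then prune whatever is still incomplete."""
    merged = {}
    for rec in records:
        current = merged.get(rec["id"])
        if current is None or (rec["complete"] and not current["complete"]):
            merged[rec["id"]] = rec
    return [rec for rec in merged.values() if rec["complete"]]

def train(records):
    """Phase 4: stand-in for model training; here just the mean temperature."""
    return mean(rec["temp"] for rec in records)

records = extract_transform_load(source_a, source_b)
records = clean_and_annotate(records)
records = integrate_and_prune(records)
model = train(records)
```

In this toy run the incomplete record for id 2 from the first source is replaced by the complete one from the second source during integration, so three records reach the training step.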

Technological breakthroughs in machine learning (ML) and artificial intelligence (AI) require modern data management and query engines, together with data analysis, processing and integration at massive scale. The data underlying these techniques must therefore be reliable, in terms of both quality and ethics.

Fabio Azzalini, Cinzia Cappiello, Chiara Criscuolo, Camilla Sancricca, Letizia Tanca: Data Quality and Data Ethics: Towards a Trade-off Evaluation. VLDB Workshops 2023
Fabio Azzalini, Davide Piantella, Emanuele Rabosio, Letizia Tanca: Enhancing domain-aware multi-truth data fusion using copy-based source authority and value similarity. VLDB J. 32(3): 475-500 (2023)
Donatella Firmani, Letizia Tanca, Riccardo Torlone: Editorial: Special Issue on Data Quality and Ethics. ACM J. Data Inf. Qual. 14(4): 24:1-24:3 (2022)
Fabio Azzalini, Chiara Criscuolo, Letizia Tanca: FAIR-DB: A system to discover unfairness in datasets. ICDE 2022: 3494-3497
Fabio Azzalini, Songle Jin, Marco Renzi, Letizia Tanca: Blocking Techniques for Entity Linkage: A Semantics-Based Approach. Data Sci. Eng. 6(1): 20-38 (2021)
Donatella Firmani, Letizia Tanca, Riccardo Torlone: Ethical Dimensions for Data Quality. ACM J. Data Inf. Qual. 12(1): 2:1-2:5 (2020)