
Contact: Letizia Tanca
Mail: letizia.tanca@polimi.it
Application field: DEIB – Computer Engineering – Data Management
Technology: Relational and NoSQL DBMS
Activity: Data cleaning, integration and annotation for data analysis purposes
Keywords
Data Integration, Data Lakes, Data Warehousing, Data Quality and Ethical Issues
Operation of the proposed solution
The main phases of Data Analysis are:
- Data extraction, transformation, loading
- Data cleaning and annotation
- Data integration, pruning and, where needed, further cleaning
- Parameter tuning, model training and deployment
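
The phases listed above can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not the group's actual tooling: the two CSV sources, the field names, the "first source wins" conflict rule and the mean-age baseline standing in for model training are all hypothetical and chosen only to keep the example self-contained.

```python
import csv
import io
from statistics import mean

# Hypothetical raw inputs from two sources; names and values are illustrative only.
SOURCE_A = "id,name,age\n1, Alice ,34\n2,Bob,\n3,Carla,29\n"
SOURCE_B = "id,name,age\n3,carla,29\n4,Dario,41\n"


def extract(raw, source):
    """Extraction and loading: parse CSV text and annotate each row with its source."""
    return [dict(row, source=source) for row in csv.DictReader(io.StringIO(raw))]


def clean(records):
    """Cleaning: normalize names and types, drop rows with a missing age."""
    out = []
    for r in records:
        if not r["age"].strip():
            continue  # discard incomplete record
        out.append({
            "id": int(r["id"]),
            "name": r["name"].strip().title(),
            "age": int(r["age"]),
            "source": r["source"],
        })
    return out


def integrate(*datasets):
    """Integration and pruning: merge sources, keeping one record per id."""
    merged = {}
    for ds in datasets:
        for r in ds:
            merged.setdefault(r["id"], r)  # assumed rule: first source wins
    return sorted(merged.values(), key=lambda r: r["id"])


def train(records):
    """Stand-in for model training: a mean-age baseline."""
    return mean(r["age"] for r in records)


records = integrate(clean(extract(SOURCE_A, "A")), clean(extract(SOURCE_B, "B")))
model = train(records)
```

In a realistic setting each function would be replaced by a dedicated component (an ETL tool, a data-quality step, an entity-resolution or data-fusion method, and an actual learning algorithm); the sketch only shows how the phases feed into one another.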
Challenges
Technological breakthroughs in machine learning (ML) and artificial intelligence (AI) require modern data management and query engines, large-scale data analysis and processing, and large-scale data integration. The data underlying these techniques must therefore be reliable in terms of both quality and ethics.
Bibliography
Fabio Azzalini, Cinzia Cappiello, Chiara Criscuolo, Camilla Sancricca, Letizia Tanca: Data Quality and Data Ethics: Towards a Trade-off Evaluation. VLDB Workshops 2023
Fabio Azzalini, Davide Piantella, Emanuele Rabosio, Letizia Tanca: Enhancing domain-aware multi-truth data fusion using copy-based source authority and value similarity. VLDB J. 32(3): 475-500 (2023)
Donatella Firmani, Letizia Tanca, Riccardo Torlone: Editorial: Special Issue on Data Quality and Ethics. ACM J. Data Inf. Qual. 14(4): 24:1-24:3 (2022)
Fabio Azzalini, Chiara Criscuolo, Letizia Tanca: FAIR-DB: A system to discover unfairness in datasets. ICDE 2022: 3494-3497
Fabio Azzalini, Songle Jin, Marco Renzi, Letizia Tanca: Blocking Techniques for Entity Linkage: A Semantics-Based Approach. Data Sci. Eng. 6(1): 20-38 (2021)
Donatella Firmani, Letizia Tanca, Riccardo Torlone: Ethical Dimensions for Data Quality. ACM J. Data Inf. Qual. 12(1): 2:1-2:5 (2020)