TIGP (SNHCC) -- Generative Data Science towards Trustworthy Data Collaboration
- Lecturer: Dr. Chi-Hua Wang (Department of Statistics and Data Science, University of California, Los Angeles (UCLA))
- Host: TIGP (SNHCC)
- Time: 2025-09-15 (Mon.) 09:00 ~ 11:00
- Location: Google Meet
Live Stream
Abstract
The modern digital economy increasingly depends on data collaboration, yet concerns about privacy, fairness, and regulatory compliance hinder the direct sharing of sensitive data. This talk revisits the role of generative data science in enabling trustworthy collaboration by focusing on synthetic tabular data generation and its integration with differential privacy. We begin by motivating why data collaboration has become essential for branding, marketing, and digital platforms under concurrent regulations such as the EU Digital Markets Act and GDPR. We then survey the progress of generative modeling for tabular data, highlighting representative approaches including GAN-based (CTGAN), diffusion-based (TabDDPM), and language-based (GReaT) methods, and discuss how fidelity, utility, and privacy jointly determine the quality of synthetic data. Finally, we examine differential privacy as a legal and technical standard for protecting personally identifiable information, and explore its application to statistics, machine learning models, and generative frameworks. Together, these perspectives outline a pathway towards building a principled framework of generative data science that supports trustworthy data collaboration in both academic and industrial contexts.