[95th Anniversary Lecture Series] Testing Equivalence of Clustering

Date: 2020-05-27

Guanghua Forum: Forum of Distinguished Figures and Entrepreneurs, Session 5737


Topic: Testing Equivalence of Clustering

Speaker: Associate Professor Zongming Ma, University of Pennsylvania

Host: Professor Jinyuan Chang, School of Statistics


Livestream platform and meeting ID: Tencent Meeting, meeting ID 715 897 972



Zongming Ma is an Associate Professor of Statistics at the Wharton School of the University of Pennsylvania. His research interests include network data analysis, high-dimensional statistics, and nonparametric statistics. He is the recipient of a Sloan Fellowship and an NSF CAREER Award.



We test whether two datasets share a common clustering structure. As a leading example, we focus on comparing clustering structures in two independent random samples from two mixtures of multivariate Gaussian distributions. Mean parameters of these Gaussian distributions are treated as potentially unknown nuisance parameters and are allowed to differ. Assuming knowledge of mean parameters, we first determine the phase diagram of the testing problem over the entire range of signal-to-noise ratios by providing both lower bounds and tests that achieve them. When nuisance parameters are unknown, we propose tests that achieve the detection boundary adaptively as long as ambient dimensions of the datasets grow at a sub-linear rate with the sample size. The talk is based on joint work with Chao Gao.
