Maximum-of-Differences Test for Comparing Multivariate K-Sample Distributions
Wei Lan, Long Feng, Runze Li, Chih-Ling Tsai
Comparing $K$-sample distributions is a fundamental problem in data science that arises in a wide variety of fields and applications. In this article, we introduce a maximum-of-differences approach to make such comparisons. Specifically, we first calculate the pairwise distances among the pooled observations of the $K$ samples. We then define two observations as connected if their distance is less than a pre-specified threshold value. For each observation, we next calculate the ``within'' and the ``between'' probabilities, i.e., the probabilities that the given observation is connected with other observations in the same sample and with observations in the other samples, respectively. Subsequently, we propose a maximum-of-differences (MOD) test that finds the maximum value among the standardized squared differences between the ``within'' and the ``between'' probabilities of all observations. Accordingly, the proposed test is not only applicable to multivariate data with $K$ samples, but can also be extended to multivariate regression models. Furthermore, we obtain the covariance-adjusted (CA) version of the MOD test (CA-MOD), which converges to the Type I extreme value distribution under some conditions. Moreover, we demonstrate the asymptotic properties of the two tests under both the null and alternative hypotheses. The performance and usefulness of the tests are illustrated via simulation studies and real examples.
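The steps described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact standardization, so the code assumes a binomial-type variance built from the pooled connection proportion; it uses Euclidean distance; and it replaces the paper's asymptotic calibration with a plain permutation p-value. The names `mod_statistic` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def mod_statistic(X, labels, threshold):
    """Maximum standardized squared difference between the within and
    between connection proportions (illustrative sketch only).

    Assumes every sample has at least two observations and K >= 2.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Pairwise Euclidean distances among the pooled observations.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Two observations are connected if their distance is below the threshold.
    connected = dist < threshold
    np.fill_diagonal(connected, False)

    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    other = labels[:, None] != labels[None, :]

    n_within = same.sum(axis=1)    # other observations in the same sample
    n_between = other.sum(axis=1)  # observations in the other samples

    p_within = (connected & same).sum(axis=1) / n_within
    p_between = (connected & other).sum(axis=1) / n_between

    # ASSUMPTION: standardize with a binomial-type variance based on the
    # pooled connection proportion; the paper's standardization may differ.
    p_pooled = connected.sum(axis=1) / (n - 1)
    var = p_pooled * (1.0 - p_pooled) * (1.0 / n_within + 1.0 / n_between)

    sq_diff = (p_within - p_between) ** 2
    # When var == 0 the pooled proportion is 0 or 1, so the difference is 0.
    standardized = np.divide(sq_diff, var, out=np.zeros_like(sq_diff), where=var > 0)
    return standardized.max()


def permutation_pvalue(X, labels, threshold, n_perm=200, seed=0):
    """Permutation p-value for the sketch statistic.

    ASSUMPTION: a swapped-in calibration, not the limiting distribution
    derived in the paper.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = mod_statistic(X, labels, threshold)
    exceed = sum(
        mod_statistic(X, rng.permutation(labels), threshold) >= observed
        for _ in range(n_perm)
    )
    return (1 + exceed) / (n_perm + 1)
```

Calling `permutation_pvalue(X, labels, threshold)` with an `(n, d)` array `X` and a length-`n` array of sample labels returns a value in (0, 1]. The result depends on the choice of `threshold`, which the sketch leaves entirely to the user.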