William F. Christensen, Brinley Zabriskie

**Abstract**

A two-tailed test comparing the means of two independent populations is perhaps the most commonly used hypothesis test in quantitative research, featured centrally in medical research, A/B testing, and throughout the sciences. When data are skewed, the standard two-tailed *t* test is not appropriate and the permutation test comparing the two means (or medians) has been a widely recommended alternative, with statistical authors and statistical software packages touting the permutation test’s utility, particularly for small samples. In this presentation, we illustrate that when the two samples are skewed and the sample sizes are unequal, the two-tailed permutation test (as traditionally implemented) can in some cases have power equal to zero, even when the *k* highest values in the combined data are all found in the group with *k* observations. Further, in many cases the standard permutation test exhibits decreasing power as the total sample size increases! We illustrate the causes of these perverse properties via both simulation and real-world examples, and we recommend approaches for ameliorating or avoiding these potential problems.
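One way such a failure can arise is sketched below as a toy illustration. It is not the authors' example. It assumes that "traditionally implemented" means using the absolute difference in means as the two-tailed statistic, and the data are hypothetical: a small group holding the three highest values, and a larger group containing one extreme low value, so that the pooled data are strongly left-skewed. The helper name `permutation_p_values` is likewise made up for this sketch. All 455 possible relabelings are enumerated exactly, so no random sampling is involved.

```python
from itertools import combinations


def permutation_p_values(small, large):
    """Exact permutation p-values for the difference in means (small - large).

    Returns (two_sided, upper_tail). The two-sided value uses the absolute
    difference in means as the test statistic, which is an assumption about
    what the "traditional" implementation looks like.
    """
    pooled = list(small) + list(large)
    k, n = len(small), len(pooled)
    total = sum(pooled)

    def diff(sum_small):
        # Mean of the relabeled small group minus mean of the rest.
        return sum_small / k - (total - sum_small) / (n - k)

    observed = diff(sum(small))
    tol = 1e-12  # guard against floating-point ties
    count_abs = 0
    count_upper = 0
    n_perms = 0
    # Enumerate every way of choosing which k pooled values form the small group.
    for idx in combinations(range(n), k):
        stat = diff(sum(pooled[i] for i in idx))
        n_perms += 1
        if abs(stat) >= abs(observed) - tol:
            count_abs += 1
        if stat >= observed - tol:
            count_upper += 1
    return count_abs / n_perms, count_upper / n_perms


# Hypothetical data: group_a (k = 3) holds the three highest pooled values.
group_a = [10, 11, 12]
group_b = [-200, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

p_two_sided, p_upper = permutation_p_values(group_a, group_b)
print(f"two-sided p = {p_two_sided:.4f}")   # 92/455, about 0.2022
print(f"upper-tail p = {p_upper:.4f}")      # 1/455, about 0.0022
```

In this toy dataset the observed arrangement is the single most extreme one in the upper tail, yet the two-sided p-value is about 0.20. Each of the 91 relabelings that places the value -200 in the three-observation group yields a negative difference larger in magnitude than the observed one, so the two-sided p-value cannot fall below roughly k/N = 3/15 no matter how well separated the groups are.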