Some document collections lend themselves to clustering better than others. How well a given collection is suited can be measured with any of several tests of cluster tendency.

One such test of the cluster hypothesis is the overlap test, which measures what fraction of the RR (relevant-to-relevant) similarity distribution overlaps with the RNR (relevant-to-non-relevant) similarity distribution. A smaller overlap suggests a collection better suited to clustering.
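As a rough illustration, the overlap test can be sketched for a single query as follows. This is a minimal sketch under stated assumptions, not a reference implementation: documents are assumed to be sparse dictionaries of non-negative term weights, similarity is assumed to be cosine, and the overlap is assumed to be computed as the shared area of two normalized histograms. The function names and the default of 10 bins are hypothetical choices made for the example.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def histogram(values, bins):
    """Relative frequencies of values in [0, 1] over equal-width bins."""
    counts = [0] * bins
    for v in values:
        # Clamp so that a similarity of 1.0 falls into the last bin.
        counts[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * bins


def overlap(relevant, non_relevant, bins=10):
    """Shared area of the RR and RNR similarity histograms (0.0 to 1.0).

    Assumption: overlap is taken as the sum of bin-wise minima of the
    two normalized histograms; other definitions are possible.
    """
    rr = [cosine(a, b) for a, b in combinations(relevant, 2)]
    rnr = [cosine(a, b) for a in relevant for b in non_relevant]
    h_rr, h_rnr = histogram(rr, bins), histogram(rnr, bins)
    return sum(min(p, q) for p, q in zip(h_rr, h_rnr))
```

In practice the two distributions would normally be accumulated over many queries before being compared; the single-query form here only keeps the example short.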

Other tests are Voorhees' nearest neighbor test, which indicates how many of the n nearest neighbors of a relevant document are also relevant, and the density test, which measures

density = (total number of postings) / (number of documents * number of terms)
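Both of these tests are simple enough to sketch in a few lines. The code below is an illustrative sketch under assumptions, not a definitive implementation: the collection is assumed to be a dictionary mapping document ids to sparse term-weight dictionaries, similarity is assumed to be cosine, each (term, document) entry is assumed to count as one posting, and the density is read as the number of postings divided by the product of the number of documents and the number of distinct terms. The function names and the default n = 5 are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nn_relevant_counts(docs, relevant_ids, n=5):
    """For each relevant document, count how many of its n nearest
    neighbors (by cosine similarity) are also relevant."""
    counts = {}
    for doc_id in relevant_ids:
        # Rank every other document by similarity to the current one.
        others = [other for other in docs if other != doc_id]
        others.sort(key=lambda o: cosine(docs[doc_id], docs[o]), reverse=True)
        counts[doc_id] = sum(1 for o in others[:n] if o in relevant_ids)
    return counts


def density(docs):
    """Postings divided by (number of documents * number of distinct terms)."""
    postings = sum(len(terms) for terms in docs.values())
    vocabulary = set().union(*docs.values())
    cells = len(docs) * len(vocabulary)
    return postings / cells if cells else 0.0


collection = {
    "d1": {"a": 1.0, "b": 1.0},
    "d2": {"a": 1.0},
    "d3": {"c": 1.0},
}
print(nn_relevant_counts(collection, {"d1", "d2"}, n=1))
print(density(collection))
```

Under this reading, the density is the fraction of non-zero cells in the document-term matrix; the toy collection above has 4 postings over 3 documents and 3 terms.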