
Ranking Agreement Measures

Methods

We calculated treatment hierarchies from several ranking metrics: relative treatment effects, the probability of producing the best value, and the surface under the cumulative ranking curve (SUCRA). We estimated the level of agreement between treatment hierarchies using different measures: Kendall's τ and Spearman's ρ correlation, and the Yilmaz τ_AP and Average Overlap, which give more weight to the top of the ranking. Finally, we studied how the amount of information contained in a network affects the agreement between treatment hierarchies, using the average variance, the relative range of variance, and the total sample size over the number of interventions of a network.

To estimate the level of agreement between the treatment hierarchies obtained with the three chosen ranking metrics, we used several measures of correlation and similarity. A rank correlation coefficient measures the relationship between two rankings. For example, if one variable is the identity of a college basketball program and another is the identity of a college football program, one could test for a relationship between the rankings of the two types of program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank correlation coefficient can measure this relationship, and a test of its significance can indicate whether the measured relationship is small enough to be likely a coincidence. An increase in the rank correlation coefficient implies increasing agreement between the rankings. The coefficient lies in the range [−1, 1]: it takes the value 1 when the two rankings agree perfectly, 0 when they are completely independent, and −1 when one ranking is the reverse of the other.

To illustrate the impact of the amount of information on treatment hierarchies obtained from different ranking metrics, we used a network of nine antihypertensive treatments for the primary prevention of cardiovascular disease, which shows large differences in the precision of the overall mortality estimates [21]. The network diagram and the forest plot of the relative treatment effects of each treatment versus placebo are presented in Figure 1.
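The agreement measures named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code of the study: each hierarchy is a list of treatment labels ordered from best to worst, there are no ties, and the labels "A" to "D" and all function names are hypothetical. Kendall's τ is computed in its tie-free form, Average Overlap is taken over the full depth of the ranking by default, and τ_AP is the asymmetric version that treats the second argument as the reference ranking.

```python
from itertools import combinations


def _positions(ranking):
    """Map each item to its 1-based rank."""
    return {item: pos for pos, item in enumerate(ranking, start=1)}


def kendall_tau(a, b):
    """Kendall's tau for two tie-free rankings of the same items."""
    pa, pb = _positions(a), _positions(b)
    concordant = discordant = 0
    for x, y in combinations(a, 2):
        sign = (pa[x] - pa[y]) * (pb[x] - pb[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(a)
    return (concordant - discordant) / (n * (n - 1) / 2)


def spearman_rho(a, b):
    """Spearman's rho from squared rank differences (no ties)."""
    pa, pb = _positions(a), _positions(b)
    n = len(a)
    d2 = sum((pa[x] - pb[x]) ** 2 for x in a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def average_overlap(a, b, depth=None):
    """Mean, over depths 1..k, of the share of items common to both top-d lists."""
    k = depth or len(a)
    total = 0.0
    for d in range(1, k + 1):
        total += len(set(a[:d]) & set(b[:d])) / d
    return total / k


def tau_ap(candidate, reference):
    """Asymmetric tau_AP: errors near the top of `candidate` weigh more."""
    ref_pos = _positions(reference)
    n = len(candidate)
    total = 0.0
    for i in range(1, n):  # 0-based index i corresponds to rank i + 1
        item = candidate[i]
        # Items above `item` in the candidate that the reference also ranks above it.
        correct = sum(1 for above in candidate[:i] if ref_pos[above] < ref_pos[item])
        total += correct / i
    return 2 * total / (n - 1) - 1


# Hypothetical hierarchies: one swap at the top versus one swap at the bottom.
base = ["A", "B", "C", "D"]
top_swap = ["B", "A", "C", "D"]
bottom_swap = ["A", "B", "D", "C"]

for name, other in [("top swap", top_swap), ("bottom swap", bottom_swap)]:
    print(name,
          round(kendall_tau(base, other), 3),
          round(spearman_rho(base, other), 3),
          round(average_overlap(base, other), 3),
          round(tau_ap(base, other), 3))
```

In this toy comparison Kendall's τ and Spearman's ρ give the same value for both swaps, whereas Average Overlap and τ_AP are lower for the swap at the top, which is the top-weighting behaviour described above.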
The reported relative treatment effects are estimated using a random-effects NMA model. Our study shows that, despite the theoretical differences between the ranking metrics and some extreme examples, they produce very similar treatment hierarchies in published networks.
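As a rough illustration of how two of the ranking metrics yield a hierarchy, the sketch below computes SUCRA and the probability of being best from a table of rank probabilities. It assumes the usual definition of SUCRA as the average of the cumulative rank probabilities over the first T − 1 ranks, where T is the number of treatments. The three treatments and their probabilities are invented for the example and are not taken from the antihypertensive network, and the function names are hypothetical.

```python
from itertools import accumulate


def sucra(rank_probs):
    """SUCRA for one treatment, given P(rank = 1), ..., P(rank = T)."""
    t = len(rank_probs)
    cumulative = list(accumulate(rank_probs))
    # Average the cumulative probabilities over ranks 1..T-1 (the last is always 1).
    return sum(cumulative[:-1]) / (t - 1)


def hierarchy(scores):
    """Order treatments from best to worst by a score where higher is better."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rank probabilities: each row and each column sums to 1.
rank_probabilities = {
    "A": [0.6, 0.3, 0.1],
    "B": [0.3, 0.5, 0.2],
    "C": [0.1, 0.2, 0.7],
}

sucra_scores = {t: sucra(p) for t, p in rank_probabilities.items()}
p_best = {t: p[0] for t, p in rank_probabilities.items()}

print("SUCRA hierarchy:", hierarchy(sucra_scores))
print("P(best) hierarchy:", hierarchy(p_best))
```

Here both metrics order the treatments identically. The two hierarchies can differ when a treatment has a high probability of being best but also a high probability of ranking last, which tends to happen when its effect is imprecisely estimated.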