Research Article
ARTICLE IN PRESS
doi: 10.25259/JKSUS_1138_2025

Area under the ROC curve estimation based on ranked set sampling via genetic algorithm

Department of Statistics, Ankara University, Ankara, 06100, Turkey
Department of Statistics, Kırıkkale University, Kırıkkale, 71450, Turkey
Department of Statistics, Eskisehir Osmangazi University, Eskisehir, 26040, Turkey

* Corresponding author: E-mail address: otanju@ankara.edu.tr (Ö Gürer)

Licence
This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-Share Alike 4.0 License, which allows others to remix, transform, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.

Abstract

In this study, the problem of estimating the area under the receiver operating characteristic (ROC) curve (AUC), a widely used accuracy index in medical diagnosis, is addressed under non-normality. Instead of applying transformation methods such as Box-Cox, the original data are used under the assumption that the test scores follow the generalized logistic (GL) distribution, which can effectively handle positively skewed, negatively skewed and symmetric data. In selecting the sampling units, the ranked set sampling (RSS) method is used as an alternative to the conventional simple random sampling (SRS) method due to its known advantage in improving the efficiency of an estimator. In the estimation phase, genetic algorithm (GA) based maximum likelihood (ML) is utilized since the likelihood equations involve nonlinear functions of the distribution parameters. Unlike the classical GA, a data-driven search space is used here as an efficient alternative to the fixed search space. The performances of the proposed AUC estimators are assessed in terms of bias, efficiency and robustness criteria via an extensive Monte Carlo simulation study. The performances of RSS based AUC estimators are also evaluated under imperfect ranking conditions. Finally, the proposed methodology is applied to a diabetes data set to demonstrate its practical implementation.

Keywords

Genetic algorithm
Maximum likelihood
Monte Carlo simulation
Ranked set sampling
Receiver operating characteristic curve

1. Introduction

The receiver operating characteristic (ROC) curve is a common statistical tool used to evaluate the performance or accuracy of classification models in many fields of science, e.g., medicine, psychology, engineering and economics. It was first developed to analyze radar signals during the Second World War with the aim of identifying enemy objects in battlefields and has been widely used in signal detection theory since then (Marcum, 1960). In 1960, Lusted employed the ROC curve in roentgen diagnosis as one of the very first applications in the context of medical research. Since then, ROC analysis has gradually found its way into the field of medical diagnostic accuracy studies, see, for example, Aegerter et al. (1994), Cantor and Kattan (2000), Zhang et al. (2021) and Yang et al. (2024). It is now a popular device for measuring a diagnostic test’s ability to discriminate between diseased and non-diseased patients. Many diagnostic tests use continuous biomarker values as test scores, such as prostate-specific antigen (PSA) for cancer, bilirubin for liver disease and creatinine for kidney malfunction. Individuals whose test scores are above (or below) a predetermined threshold value are grouped as diseased (or non-diseased). A ROC curve graphically describes the relation between the true positive rate (TPR) and the false positive rate (FPR) computed over the entire range of threshold values. The diagnostic information revealed by the ROC curve is summarized by the area under the ROC curve (AUC), which is the most widely applied diagnostic accuracy index in the literature, see Bamber (1975) for the details of AUC.

AUC is defined as the probability P(X>Y) where X and Y are the test scores of randomly selected diseased and non-diseased patients, respectively. It represents the probability that a randomly chosen diseased patient has a higher test score than a randomly chosen non-diseased patient. In conventional ROC analysis, X and Y are assumed to follow a normal distribution; however, in real-life problems data often deviate from normality. Therefore, there exist many studies estimating AUC under the assumption of non-normally distributed test scores. For example, Campbell and Ratnaparkhi (1993) derived the ROC curve under the assumption of the Lomax distribution and used ML estimators of the related parameters to calculate AUC. Faraggi and Reiser (2002) compared two non-parametric and two parametric methods to estimate AUC by using Box-Cox transformation for non-normal cases in the parametric approaches. Bandos et al. (2006) discussed the resampling methods such as bootstrap, jackknife, and permutations in the context of AUC. Molodianovitch et al. (2006) extended the approach by Wieand et al. (1989) to non-normal data by applying a Box-Cox transformation to compare the areas under the two correlated ROC curves. Vardhan et al. (2012) focused on AUC estimation using normal, exponential and Weibull distributions and compared the performances of proposed methodologies via a simulation study. Ch et al. (2022) reported the results corresponding to AUC of the bi-generalized exponential ROC model.

All of the above-mentioned studies on AUC estimation use simple random sampling (SRS), a well-accepted and practical sampling method based on the assumption that each unit in the population has an equal probability of being selected. However, reducing sample size, cost, time and effort can be considerably important in clinical trials due to expensive medical equipment, expensive laboratory analyses, a lack of expert users to perform these analyses, etc. Ranked set sampling (RSS), on the other hand, is a cost-effective alternative to SRS when measurement of the variable of interest is either difficult or costly, but ranking can be performed easily, see McIntyre (1952). In this ranking process, the units are not measured exactly; instead, they are ranked by using visual assessment, subjective judgment, prior information, or an auxiliary variable that can readily be observed and is correlated with the variable of interest. In recent years, RSS has gained attention and become a desirable sampling scheme in the context of AUC estimation, especially in the field of medicine, since it yields highly efficient estimators at low costs. For example, Mahdizadeh and Zamanzade (2021) developed a non-parametric estimator of AUC using kernel density estimation based on multistage RSS. Mahdizadeh and Zamanzade (2022) developed three estimators for AUC based on RSS, one of which is obtained under the normality assumption and the others through a Box-Cox transformation of the data. Moon et al. (2022) proposed an empirical likelihood method to construct confidence intervals for AUC based on RSS. Abdallah (2023) proposed concomitant-based estimators for AUC based on paired RSS. Akbari Ghamsari et al. (2024) proposed four different approaches for AUC estimation based on nomination sampling, which is regarded as a variation of RSS.

In this study, we have three motivations for estimating AUC under non-normality. Our primary motivation is to use the original data rather than the transformed data to avoid information loss in the data set and also to avoid difficulties in interpreting the results. Therefore, AUC is estimated under the assumption that the distributions of the test scores X and Y are generalized logistic (GL). The reason for assuming the GL distribution for the test scores is that it is a flexible distribution that can be used in modeling negatively skewed, positively skewed and symmetric data sets. Also, the GL distribution reduces to the logistic distribution when its shape parameter b=1, which is a plausible alternative to the widely used normal distribution with its symmetric shape and heavier tails. Our second motivation is to use RSS as an efficient alternative to the commonly used SRS for estimating the AUC. The efficiencies of the AUC estimators are then compared with respect to these sampling methods. Our final motivation is the use of the genetic algorithm (GA), a population-based metaheuristic method, in solving likelihood equations that include nonlinear expressions of the distribution parameters, with the aim of avoiding the drawbacks of traditionally used derivative-based numeric methods such as getting stuck in a local optimum, non-convergence and mathematical difficulties. The performance of GA relative to traditional numeric methods in obtaining ML estimates of AUC is also evaluated.

The subsequent sections of this paper are arranged as follows. In Section 2, the related material and methods are briefly introduced. In Section 3, the steps of GA based ML estimation of AUC under RSS are explained in detail. Section 4 is devoted to the Monte Carlo simulation study that examines the performance of the proposed estimator. Section 5 investigates the performances of RSS based estimators under imperfect ranking conditions. Section 6 presents an application of the proposed methodology to a real data set. Finally, in Section 7, the study is summarized by highlighting its main points.

2. Materials and Methods

In this section, brief descriptions of GL distribution, ROC curve, RSS method and GA method are given.

2.1 GL distribution

Let X denote a random variable following the GL distribution with location parameter μ, scale parameter σ and shape parameter b. Its probability density function (PDF) is expressed as given in Eq. (1)

(1)
$$f_X(x)=\frac{b}{\sigma}\,\frac{e^{-(x-\mu)/\sigma}}{\left[1+e^{-(x-\mu)/\sigma}\right]^{b+1}};\quad -\infty<x<\infty;\ -\infty<\mu<\infty;\ \sigma>0$$

and the corresponding cumulative distribution function (cdf) is given in Eq. (2)

(2)
$$F_X(x)=\left[1+e^{-(x-\mu)/\sigma}\right]^{-b}.$$
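
As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates the GL density and distribution function and draws GL variates by inverse-transform sampling. The function names (`gl_pdf`, `gl_cdf`, `gl_quantile`, `gl_rvs`) are hypothetical and are not part of the original study; the quantile function is simply Eq. (2) solved for x.

```python
import math
import random


def gl_pdf(x, mu=0.0, sigma=1.0, b=1.0):
    """PDF of GL(mu, sigma, b), following Eq. (1)."""
    z = (x - mu) / sigma
    return (b / sigma) * math.exp(-z) / (1.0 + math.exp(-z)) ** (b + 1.0)


def gl_cdf(x, mu=0.0, sigma=1.0, b=1.0):
    """CDF of GL(mu, sigma, b), following Eq. (2)."""
    z = (x - mu) / sigma
    return (1.0 + math.exp(-z)) ** (-b)


def gl_quantile(p, mu=0.0, sigma=1.0, b=1.0):
    """Inverse of Eq. (2): x = mu - sigma * ln(p^(-1/b) - 1), for 0 < p < 1."""
    # expm1(-ln(p) / b) equals p^(-1/b) - 1 and stays accurate for p close to 1
    return mu - sigma * math.log(math.expm1(-math.log(p) / b))


def gl_rvs(n, mu=0.0, sigma=1.0, b=1.0, rng=random):
    """Draw n GL variates by inverse-transform sampling (illustrative helper)."""
    out = []
    while len(out) < n:
        p = rng.random()          # uniform on [0, 1)
        if p > 0.0:               # p = 0 has no finite quantile, so skip it
            out.append(gl_quantile(p, mu, sigma, b))
    return out
```

For example, `gl_rvs(5, mu=0.0, sigma=1.0, b=4.0, rng=random.Random(1))` returns five draws from a positively skewed GL distribution.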

Table 1 gives the skewness $\beta_1$ and kurtosis $\beta_2$ values of GL distribution for various representative values of the shape parameter b.

Table 1. Skewness and kurtosis values of GL distribution.
| b | 0.5 | 1 | 2 | 4 | 6 |
|---|---|---|---|---|---|
| $\beta_1$ | -0.86 | 0 | 0.33 | 0.75 | 0.92 |
| $\beta_2$ | 5.40 | 4.20 | 4.33 | 4.76 | 4.95 |

Table 1 clearly shows that GL is a highly flexible distribution representing negatively skewed, positively skewed and symmetric distributions depending on whether its shape parameter satisfies b < 1, b > 1 or b = 1, respectively. Also, its kurtosis is always greater than 3 (leptokurtic), as seen from Table 1. For b = 1, it reduces to the logistic distribution, which is widely used as a long-tailed alternative to the normal distribution, see, for example, Berkson (1951).

Fig. 1 shows the PDF plots of the standard GL distribution for the values of the shape parameter b given in Table 1.

Fig. 1. The PDF plots of the GL(μ=0, σ=1, b) distribution.

2.2 ROC curve

Let X and Y be two independent random variables representing the test scores of diseased and non-diseased patients, respectively. Suppose that a patient is classified as diseased if the corresponding test score exceeds a discriminating threshold value c and as non-diseased otherwise. Sensitivity and specificity are two basic quantities in ROC analysis, which are defined as the correct diagnosis rates of the diseased and non-diseased patients, respectively. A ROC curve plots sensitivity (or TPR) against 1-specificity (or FPR) for all possible values of c. TPR and FPR at point c are defined as given in Eq. (3)

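(3)
$$\mathrm{TPR}(c)=1-F_X(c)\quad\text{and}\quad\mathrm{FPR}(c)=1-F_Y(c)$$

where $F_X$ and $F_Y$ are the cdfs of X and Y, respectively. Then, the ROC curve is defined as a plot of $\left(1-F_Y(c),\,1-F_X(c)\right)$ or, equivalently, a plot of $\left(t,\,1-F_X\!\left(F_Y^{-1}(1-t)\right)\right)$, where $t=\mathrm{FPR}(c)$, for $c\in(-\infty,\infty)$.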

2.3 Ranked set sampling

The concept of RSS was first suggested by McIntyre (1952) to estimate mean pasture yields as a cost-effective alternative to SRS, especially for cases where the sampling units are more easily ranked than quantified. Halls and Dell (1966) were the first to coin the name "RSS" and applied this methodology in a study of forage yields. Takahasi and Wakimoto (1968) studied the mathematical theory of RSS under the perfect ranking assumption and showed that the sample mean of RSS data is an unbiased estimator of the population mean and is more efficient than the estimator obtained from SRS data of the same size. Dell and Clutter (1972) demonstrated that the sample mean of RSS data remains unbiased and more efficient than that of SRS data even under imperfect ranking. In recent years, there has been a noticeable expansion in the literature regarding RSS, see, for example, Helu et al. (2010), Ayech and Ziou (2015), Mahdizadeh and Tamandi (2016), Zamanzade and Wang (2018), Wang et al. (2020), Taconeli and Giolo (2020), Pedroso et al. (2021), Azimian et al. (2022), Ozturk et al. (2023) and Qui and Raqab (2024).

RSS procedure is described as follows:

  • Step 1: Draw m sets of size m via SRS.

  • Step 2: Rank units within each set in ascending order using an easily observed criterion, without doing any exact measurement.

  • Step 3: Select the smallest unit from set 1, the second smallest from set 2, and so on, up to the largest from set m, for exact measurement.

  • Step 4: Repeat the first three steps for r cycles to construct a sample of size n=m×r.

Here, m and r represent the set and cycle sizes, respectively. The resulting sample is independent but non-identically distributed (inid) and is denoted by $X_{(i)ic}$, $i=1,\ldots,m$; $c=1,\ldots,r$, where $X_{(i)ic}$ is the ith order statistic of the ith set in cycle c. Hereafter, $X_{ic}$ will be used as a shorthand for $X_{(i)ic}$. Table 2 summarizes the RSS procedure.

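Table 2. RSS procedure.
| | | Ranked units | Selected units |
|---|---|---|---|
| cycle 1 | set 1 | $X_{(1)11}, X_{(2)11}, \ldots, X_{(m)11}$ | $X_{(1)11}$ or $X_{11}$ |
| | set 2 | $X_{(1)21}, X_{(2)21}, \ldots, X_{(m)21}$ | $X_{(2)21}$ or $X_{21}$ |
| | ⋮ | ⋮ | ⋮ |
| | set m | $X_{(1)m1}, X_{(2)m1}, \ldots, X_{(m)m1}$ | $X_{(m)m1}$ or $X_{m1}$ |
| ⋮ | ⋮ | ⋮ | ⋮ |
| cycle r | set 1 | $X_{(1)1r}, X_{(2)1r}, \ldots, X_{(m)1r}$ | $X_{(1)1r}$ or $X_{1r}$ |
| | set 2 | $X_{(1)2r}, X_{(2)2r}, \ldots, X_{(m)2r}$ | $X_{(2)2r}$ or $X_{2r}$ |
| | ⋮ | ⋮ | ⋮ |
| | set m | $X_{(1)mr}, X_{(2)mr}, \ldots, X_{(m)mr}$ | $X_{(m)mr}$ or $X_{mr}$ |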
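
The RSS procedure in Steps 1-4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `rss_sample` and the `draw` callable are hypothetical, and ranking is done on the true values, which corresponds to the perfect ranking case rather than to a visual or judgment-based ranking.

```python
import random


def rss_sample(draw, m, r):
    """Return r cycles, each holding m measured units (perfect ranking).

    draw: zero-argument callable returning one unit from the population
          (a hypothetical stand-in for simple random sampling).
    """
    sample = []
    for _ in range(r):                                  # Step 4: repeat for r cycles
        cycle = []
        for i in range(m):
            ranked = sorted(draw() for _ in range(m))   # Steps 1-2: draw a set of size m and rank it
            cycle.append(ranked[i])                     # Step 3: keep the (i+1)th smallest of set i+1
        sample.append(cycle)
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    # 5 cycles with set size 3, i.e. n = m * r = 15 measured units
    print(rss_sample(lambda: rng.gauss(0.0, 1.0), m=3, r=5))
```

Note that each cycle inspects m × m units but measures only m of them, which is where the cost saving of RSS comes from.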

2.4 Genetic algorithm

GA, first introduced by Holland (1975), is a population-based metaheuristic method. Later, a comprehensive study on this subject was presented by Goldberg (1989). Inspired by Darwinian evolutionary principles, this method is based on the idea that individuals who are superior to others in the population are more likely to survive and transfer their genetic heritage to the next generations. Accordingly, GA works on a set of possible solutions in the search space, called the population, to optimize an objective function. In order to mimic the biological process, it applies genetic operators (such as selection, crossover and mutation) to these possible solutions while searching for the best, or approximately the best, solution for this objective function. In addition, GA is less prone to being trapped in a local optimum because its operators enhance the genetic diversity within the population. The steps of GA can be listed as follows.

  • Step 1: Prespecify the initial settings of GA parameters such as the search space, crossover probability, mutation probability, elitism, and population size.

  • Step 2: Generate a population from the prespecified search space.

  • Step 3: Calculate and evaluate the fitness (or objective) function values of each individual in the population.

  • Step 4: Perform elitism operator, which transmits a set of individuals with the highest fitness values to the next generation.

  • Step 5: Perform selection, crossover and mutation operators to generate a new population.

  • Step 6: Repeat steps 3-5 until the stopping condition, predefined as the difference between the fitness values obtained in successive generations being less than the convergence tolerance, is met.
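
Steps 1-6 can be sketched as a small real-coded GA in Python. This is a generic illustration under stated assumptions and not the implementation used in the study: the function name `genetic_maximize`, arithmetic crossover, clipped Gaussian mutation, the shift applied to fitness values before roulette-wheel selection, and the `patience` parameter (the number of consecutive stalled generations tolerated before stopping) are all choices made here for the example.

```python
import random


def genetic_maximize(fitness, bounds, pop_size=50, p_cross=0.9, p_mut=0.1,
                     n_elite=2, tol=1e-8, patience=20, max_gen=300, seed=0):
    """Maximize fitness(list_of_floats) inside the box described by bounds."""
    rng = random.Random(seed)
    # Step 2: initial population drawn uniformly from the search space
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history, stall = [], 0
    for _ in range(max_gen):
        # Step 3: evaluate every individual and sort, best first
        scored = sorted(((fitness(ind), ind) for ind in pop),
                        key=lambda t: t[0], reverse=True)
        fits = [f for f, _ in scored]
        history.append(fits[0])
        # Step 6: stop once the best fitness has stalled for `patience` generations
        if len(history) > 1 and history[-1] - history[-2] < tol:
            stall += 1
            if stall >= patience:
                break
        else:
            stall = 0
        # Step 4: elitism, the n_elite best individuals survive unchanged
        new_pop = [list(ind) for _, ind in scored[:n_elite]]
        # Step 5: roulette-wheel selection on fitness shifted to be positive
        weights = [f - fits[-1] + 1e-12 for f in fits]
        while len(new_pop) < pop_size:
            (_, p1), (_, p2) = rng.choices(scored, weights=weights, k=2)
            if rng.random() < p_cross:                  # arithmetic crossover
                a = rng.random()
                child = [a * u + (1.0 - a) * v for u, v in zip(p1, p2)]
            else:
                child = list(p1)
            for g, (lo, hi) in enumerate(bounds):       # Gaussian mutation, clipped to the box
                if rng.random() < p_mut:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[g] = min(hi, max(lo, child[g] + step))
            new_pop.append(child)
        pop = new_pop
    return scored[0][1], history    # best individual found and best fitness per generation
```

A call such as `genetic_maximize(lambda v: -(v[0] - 1.0) ** 2 - (v[1] + 2.0) ** 2, [(-5, 5), (-5, 5)])` searches a toy quadratic surface whose maximum is at (1, -2). Because of elitism, the returned fitness history never decreases.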

3. GA based ML Estimation

This section is reserved for the details of the estimation procedure of AUC. Here, test scores of diseased (X) and non-diseased (Y) individuals are assumed to follow a GL distribution with the assumption of known shape parameters. First, the unknown distribution parameters of the GL distribution are estimated via ML methodology based on RSS in subsection (3.1), then AUC is estimated based on these estimates in subsection (3.2).

3.1 Estimation of GL parameters

Let $X_{ic}$, $i=1,\ldots,m_x$; $c=1,\ldots,r_x$ and $Y_{jk}$, $j=1,\ldots,m_y$; $k=1,\ldots,r_y$ be the ranked set samples from $GL(\mu_x,\sigma_x,b_x)$ and $GL(\mu_y,\sigma_y,b_y)$, respectively. Here, $n_x=m_x r_x$ and $n_y=m_y r_y$ are the sample sizes, $m_x$ and $m_y$ are the set sizes, and $r_x$ and $r_y$ are the cycle sizes.

The PDFs of $X_{ic}$ and $Y_{jk}$ are obtained as given in Eqs. (4) and (5)

(4)
$$f_{X_{ic}}(x_{ic})=\frac{m_x!}{(i-1)!\,(m_x-i)!}\left[F_X(x_{ic})\right]^{i-1}\left[1-F_X(x_{ic})\right]^{m_x-i}f_X(x_{ic})$$

and

(5)
$$f_{Y_{jk}}(y_{jk})=\frac{m_y!}{(j-1)!\,(m_y-j)!}\left[F_Y(y_{jk})\right]^{j-1}\left[1-F_Y(y_{jk})\right]^{m_y-j}f_Y(y_{jk}),$$

respectively. Here, $f_X(\cdot)$ and $f_Y(\cdot)$ are the pdfs of the $GL(\mu_x,\sigma_x,b_x)$ and $GL(\mu_y,\sigma_y,b_y)$ distributions, respectively, and $F_X(\cdot)$ and $F_Y(\cdot)$ are the corresponding cdfs. In order to estimate the unknown distribution parameters, the likelihood (L) function is written as given in Eq. (6)

(6)
$$L=L_X L_Y$$

where $L_X=\prod_{c=1}^{r_x}\prod_{i=1}^{m_x}f_{X_{ic}}(x_{ic})$ and $L_Y=\prod_{k=1}^{r_y}\prod_{j=1}^{m_y}f_{Y_{jk}}(y_{jk})$. The corresponding log-likelihood (lnL) is then given in Eq. (7)

(7)
$$\ln L=\ln L_X+\ln L_Y.$$

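Differentiating lnL with respect to $\mu_x$ and $\sigma_x$ and equating to zero yields the following likelihood Eqs. (8) and (9)

(8)
$$\frac{\partial \ln L}{\partial \mu_x}=-\frac{1}{\sigma_x}\left[\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}(i-1)\frac{f(z_{ic})}{F(z_{ic})}-\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}(m_x-i)\frac{f(z_{ic})}{1-F(z_{ic})}+\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}\frac{f'(z_{ic})}{f(z_{ic})}\right]=0$$

and

(9)
$$\frac{\partial \ln L}{\partial \sigma_x}=-\frac{1}{\sigma_x}\left[n_x+\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}(i-1)\,z_{ic}\frac{f(z_{ic})}{F(z_{ic})}-\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}(m_x-i)\,z_{ic}\frac{f(z_{ic})}{1-F(z_{ic})}+\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}z_{ic}\frac{f'(z_{ic})}{f(z_{ic})}\right]=0,$$

respectively. Here, $z_{ic}=(x_{ic}-\mu_x)/\sigma_x$, $\mu_x$ is the location and $\sigma_x$ is the scale parameter of the $GL(\mu_x,\sigma_x,b_x)$ distribution, $f(\cdot)$ and $F(\cdot)$ denote the pdf and cdf of the standardized variable, i.e., of $GL(0,1,b_x)$, and $f'(\cdot)$ is the derivative of $f(\cdot)$. Similarly, the following likelihood Eqs. (10) and (11) are obtained for the parameters $\mu_y$ and $\sigma_y$

(10)
$$\frac{\partial \ln L}{\partial \mu_y}=-\frac{1}{\sigma_y}\left[\sum_{k=1}^{r_y}\sum_{j=1}^{m_y}(j-1)\frac{f(w_{jk})}{F(w_{jk})}-\sum_{k=1}^{r_y}\sum_{j=1}^{m_y}(m_y-j)\frac{f(w_{jk})}{1-F(w_{jk})}+\sum_{k=1}^{r_y}\sum_{j=1}^{m_y}\frac{f'(w_{jk})}{f(w_{jk})}\right]=0$$

and

(11)
$$\frac{\partial \ln L}{\partial \sigma_y}=-\frac{1}{\sigma_y}\left[n_y+\sum_{k=1}^{r_y}\sum_{j=1}^{m_y}(j-1)\,w_{jk}\frac{f(w_{jk})}{F(w_{jk})}-\sum_{k=1}^{r_y}\sum_{j=1}^{m_y}(m_y-j)\,w_{jk}\frac{f(w_{jk})}{1-F(w_{jk})}+\sum_{k=1}^{r_y}\sum_{j=1}^{m_y}w_{jk}\frac{f'(w_{jk})}{f(w_{jk})}\right]=0,$$

respectively. Here, $w_{jk}=(y_{jk}-\mu_y)/\sigma_y$, $\mu_y$ is the location and $\sigma_y$ is the scale parameter of the $GL(\mu_y,\sigma_y,b_y)$ distribution, and $f(\cdot)$ and $F(\cdot)$ are now the pdf and cdf of $GL(0,1,b_y)$.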

The likelihood equations given in Eqs. (8)-(11) do not yield closed-form solutions owing to their nonlinearity in the parameters. Therefore, numeric methods are needed to solve them. Here, we resort to GA to obtain the ML estimates of $\mu_x$, $\sigma_x$, $\mu_y$ and $\sigma_y$, in line with its wide use in the literature and also to avoid the limitations of the derivative-based numeric methods discussed in Section 1.
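
Rather than solving Eqs. (8)-(11) directly, a numeric optimizer can work with the log-likelihood itself. The sketch below evaluates $\ln L_X$ for one ranked set sample by summing the logarithm of Eq. (4) over all units, with the shape parameter treated as known; $\ln L_Y$ is obtained in the same way and the two are added as in Eq. (7). The function name `gl_rss_loglik` and the nested-list data layout are assumptions made for this example.

```python
import math


def gl_rss_loglik(mu, sigma, b, cycles):
    """Log-likelihood of one RSS sample from GL(mu, sigma, b), b known.

    cycles: list of cycles; cycles[c][i - 1] is the measured unit of rank i
            (hypothetical layout, one inner list of length m per cycle).
    """
    total = 0.0
    for cycle in cycles:
        m = len(cycle)
        for i, x in enumerate(cycle, start=1):
            z = (x - mu) / sigma
            log1p_e = math.log1p(math.exp(-z))
            log_F = -b * log1p_e                                    # ln F_X(x), Eq. (2)
            log_S = math.log(-math.expm1(log_F))                    # ln[1 - F_X(x)]
            log_f = math.log(b / sigma) - z - (b + 1.0) * log1p_e   # ln f_X(x), Eq. (1)
            # ln of m! / ((i - 1)! (m - i)!), the constant in Eq. (4)
            log_c = math.lgamma(m + 1) - math.lgamma(i) - math.lgamma(m - i + 1)
            total += log_c + (i - 1) * log_F + (m - i) * log_S + log_f
    return total
```

A function of this form, viewed as a function of (mu, sigma), is the kind of fitness function that a GA or a routine such as BFGS would maximize. The sketch does not guard against overflow of `math.exp(-z)` for extremely negative z.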

The selection of the initial settings (population size, search space, crossover and mutation operators, etc.) is a crucial step in the implementation of GA. In particular, the appropriate selection of the search space plays a critical role in enhancing the efficiency and convergence rate of GA. Yalçınkaya et al. (2018) demonstrated that using confidence intervals based on modified maximum likelihood (MML) estimators as a search space yields more efficient results than the corresponding arbitrarily chosen fixed search space, see also Acitas et al. (2019), Yalçınkaya et al. (2021), and Yalçınkaya et al. (2024) in the context of the proposed search space and see Tiku (1967) in the context of MML methodology.

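In this study, we adopt the methodology proposed by Yalçınkaya et al. (2018) and use the following confidence intervals given in Eqs. (12) and (13) for the parameters $\mu_x$ and $\sigma_x$

(12)
$$\left(\hat{\mu}_{x,MML}-z_{\alpha/2}\sqrt{\widehat{Var}\left(\hat{\mu}_{x,MML}\right)},\ \hat{\mu}_{x,MML}+z_{\alpha/2}\sqrt{\widehat{Var}\left(\hat{\mu}_{x,MML}\right)}\right)$$

and

(13)
$$\left(\hat{\sigma}_{x,MML}-z_{\alpha/2}\sqrt{\widehat{Var}\left(\hat{\sigma}_{x,MML}\right)},\ \hat{\sigma}_{x,MML}+z_{\alpha/2}\sqrt{\widehat{Var}\left(\hat{\sigma}_{x,MML}\right)}\right),$$

respectively, as the search space. Here, $\hat{\mu}_{x,MML}$ and $\hat{\sigma}_{x,MML}$ are the MML estimators of GL distribution parameters under RSS and they are given in Eq. (14)

(14)
$$\hat{\mu}_{x,MML}=K_x-D_x\,\hat{\sigma}_{x,MML}\quad\text{and}\quad\hat{\sigma}_{x,MML}=\frac{-B_x+\sqrt{B_x^2+4n_xC_x}}{2\sqrt{n_x(n_x-1)}}$$

where $K_x=\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}\delta_i x_{ic}/M_x$, $M_x=r_x\sum_{i=1}^{m_x}\delta_i$, $D_x=\sum_{i=1}^{m_x}\Delta_i/\sum_{i=1}^{m_x}\delta_i$, $B_x=\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}\Delta_i\left(x_{ic}-K_x\right)$, $C_x=\sum_{c=1}^{r_x}\sum_{i=1}^{m_x}\delta_i\left(x_{ic}-K_x\right)^2$, $\delta_i=(i-1)\beta_{1i}+(m_x-i)\beta_{2i}+\beta_{3i}$ and $\Delta_i=(i-1)\alpha_{1i}-(m_x-i)\alpha_{2i}+\alpha_{3i}$.

The details regarding the derivation of MML estimators for the distribution parameters of GL under RSS are given in the Appendix to maintain the coherence of the main text. It should also be noted that $\widehat{Var}\left(\hat{\mu}_{x,MML}\right)$ and $\widehat{Var}\left(\hat{\sigma}_{x,MML}\right)$ in Eqs. (12) and (13) are calculated by using Monte Carlo simulation with 10,000 replications, consistent with the main simulation study. Since the search space for $\mu_y$ and $\sigma_y$ is obtained in a similar fashion, it is not given here for the sake of brevity.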

3.2 Estimation of AUC

AUC is the most frequently employed index summarizing a diagnostic test’s discriminatory accuracy. It is calculated by integrating the area under the ROC curve as shown in Eq. (15)

(15)
$$AUC=P(X>Y)=\int_{0}^{1}\left[1-F_X(c)\right]\,d\left[1-F_Y(c)\right]$$

or equivalently as given in Eq. (16),

(16)
$$AUC=\int_{0}^{1}\left[1-F_X\!\left(F_Y^{-1}(1-t)\right)\right]dt,$$

see subsection (2.2) for the details of ROC curve. Notice that larger values of AUC indicate higher diagnostic accuracy. Fig. 2 illustrates three hypothetical cases: (i) An ideal test with AUC=1 (line A), (ii) A typical ROC curve with 0.50<AUC<1 (curve B) and (iii) An uninformative test with AUC=0.50 (line C).

Fig. 2. Three hypothetical cases illustrated by three ROC curves: (i) An ideal test with AUC=1 (line A), (ii) A typical ROC curve with 0.50<AUC<1 (curve B) and (iii) An uninformative test with AUC=0.50 (line C).

In our case, the cdfs $F_X(x)$ and $F_Y(y)$ corresponding to the $GL(\mu_x,\sigma_x,b_x)$ and $GL(\mu_y,\sigma_y,b_y)$ distributions, respectively, are incorporated into Eq. (16), and AUC is calculated as given in Eq. (17)

(17)
$$AUC=\int_{0}^{1}\left\{1-\left[1+\exp\left(-\frac{u-\mu_x}{\sigma_x}\right)\right]^{-b_x}\right\}dt,$$

where $u=\mu_y+\sigma_y\ln\left\{1/\left[(1-t)^{-1/b_y}-1\right]\right\}$, see Eq. (2) for the cdf of GL distribution. In computing AUC, GA based ML estimates of $\mu_x$, $\sigma_x$, $\mu_y$ and $\sigma_y$ under RSS are plugged into Eq. (17) and the resulting estimator of AUC is denoted by $\widehat{AUC}_{\text{ML-GA}}$. The SRS counterpart of this estimator is denoted by $\widehat{AUC}^{*}_{\text{ML-GA}}$ and is proposed for the first time in this study. In its computation, the MML estimators of the location and scale parameters of GL distribution under SRS are first taken from Şenoğlu and Tiku (2001), and confidence intervals based on them are then constructed to define the search space for GA. Here, we resort to numeric integration methods in order to solve the integral in Eq. (17) since it cannot be solved explicitly.
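
As a rough illustration of the numeric integration step, the sketch below approximates Eq. (17) with a composite midpoint rule. The study does not state which integration method was used, so the midpoint rule, the function name `gl_auc` and the number of subintervals `n` are assumptions made for this example.

```python
import math


def gl_auc(mu_x, sigma_x, b_x, mu_y=0.0, sigma_y=1.0, b_y=1.0, n=20000):
    """Approximate the AUC in Eq. (17) by the composite midpoint rule on (0, 1)."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n                       # midpoint of the kth subinterval
        # u = F_Y^{-1}(1 - t) = mu_y + sigma_y * ln(1 / ((1 - t)^(-1/b_y) - 1))
        u = mu_y - sigma_y * math.log(math.expm1(-math.log1p(-t) / b_y))
        # integrand 1 - F_X(u) = 1 - [1 + exp(-(u - mu_x) / sigma_x)]^(-b_x)
        z = (u - mu_x) / sigma_x
        total += -math.expm1(-b_x * math.log1p(math.exp(-z)))
    return total / n


if __name__ == "__main__":
    # identical GL distributions for X and Y: uninformative test
    print(round(gl_auc(0.0, 1.0, 4.0, b_y=4.0), 3))
    # logistic case (b = 1) with a location shift of 1.27
    print(round(gl_auc(1.27, 1.0, 1.0), 3))
```

Plugging estimates of the four location and scale parameters into `gl_auc` gives the corresponding plug-in estimate of AUC.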

4. Monte Carlo Simulation Study

In this section, the efficiencies of $\widehat{AUC}_{\text{ML-GA}}$ and $\widehat{AUC}^{*}_{\text{ML-GA}}$ are compared with the corresponding ML estimators obtained using traditional numeric methods, i.e., $\widehat{AUC}_{\text{ML}}$ and $\widehat{AUC}^{*}_{\text{ML}}$, respectively. It should be noted here that in computing the ML estimates of AUC via the traditional numeric methods, the optim() function with the BFGS algorithm in R statistical software is utilized. These estimators are compared using the criteria given in Eq. (18)

(18)
$$bias(\hat{\theta})=\frac{1}{s}\sum_{i=1}^{s}\left(\hat{\theta}_i-\theta\right)\quad\text{and}\quad MSE(\hat{\theta})=Var(\hat{\theta})+bias^2(\hat{\theta}),$$

where $\hat{\theta}$ is an estimator of the parameter $\theta$, $\hat{\theta}_i$ is its value in the ith Monte Carlo run and s is the number of Monte Carlo runs.
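
The criteria in Eq. (18) can be computed from a list of simulated estimates as in the following sketch. The function name `bias_and_mse` is hypothetical, and the variance is computed with divisor s, which is an assumption made here so that the MSE coincides with the average squared error over the runs.

```python
def bias_and_mse(estimates, theta):
    """Simulated bias and MSE of an estimator, following Eq. (18)."""
    s = len(estimates)                       # number of Monte Carlo runs
    mean_est = sum(estimates) / s
    bias = mean_est - theta
    # variance with divisor s (assumption), so that MSE = mean((est - theta)^2)
    var = sum((e - mean_est) ** 2 for e in estimates) / s
    return bias, var + bias ** 2
```

For instance, `bias_and_mse([0.5, 0.7], 0.5)` gives a bias of 0.1 and an MSE of 0.02, up to floating-point rounding.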

Table 3 provides some representative values of the location, scale and shape parameters of GL distribution corresponding to the diseased (X) population, i.e., $GL(\mu_x,\sigma_x,b_x)$, so as to produce AUC of 0.5, 0.7 and 0.9. The location and scale parameters corresponding to the non-diseased (Y) population are taken as $(\mu_y,\sigma_y)=(0,1)$ for all cases. The shape parameter values for both diseased (X) and non-diseased (Y) populations are assumed to be equal in each scenario, i.e., $b_x=b_y=b$.

Table 3. Parameter values for the $GL(\mu_x,\sigma_x,b_x)$ distribution giving specified AUC values when $(\mu_y,\sigma_y)=(0,1)$ and $b_x=b_y$.
| AUC | $(\mu_x,\sigma_x,b_x)$ |
|---|---|
| 0.5 | (0, 1, 0.5), (0, 1, 1), (0, 1, 4) |
| 0.7 | (1.7, 1, 0.5), (3.75, 2, 0.5), (1.27, 1, 1), (1.97, 2, 1), (0.95, 1, 4), (0.38, 1.5, 4) |
| 0.9 | (4.4, 1, 0.5), (8.5, 2, 0.5), (3.2, 1, 1), (5, 2, 1), (2.42, 1, 4), (1.84, 2, 4) |

Also, the following cycle, set and sample sizes are considered throughout the simulation study.

r = 5, 10
(mx,my) = (3,3), (3,5), (5,3), (5,5)
(nx,ny) = (15,15), (15,25), (25,15), (25,25)

The GA parameters are set in accordance with the values widely used in the literature as follows: population size = 250, probability of crossover = 0.9, probability of mutation = 0.1, elitism = 8 and selection is "roulette-wheel", see Gen and Cheng (1999), Haupt and Haupt (2004) and Eiben and Smith (2015). Note that the search space for the unknown parameters of GL distribution is given in Eqs. (12) and (13).

All computations in the Monte Carlo simulation study are conducted in R software. Simulated bias and MSE values for the estimators of AUC obtained from 10,000 Monte Carlo runs are given in Table S1. It is clear from the results that all AUC estimators have negligibly small bias values in each scenario. Efficiency comparisons according to sampling and estimation methods are given in detail as follows.

Table S1

According to sampling methods: RSS based AUC estimators perform better than their SRS analogs for all cases. The superiority of RSS becomes increasingly evident with larger set sizes in most of the scenarios. When the variances of the test scores of the diseased and non-diseased populations are different, the efficiencies of the AUC estimators obtained under both RSS and SRS increase when the sample sizes are allocated in direct proportion to the variances. In these cases, the efficiency gain under RSS is also greater than the efficiency gain under SRS, see Table S1 in the supplementary materials.

According to estimation methods: To provide fair and consistent comparisons for the efficiencies of the AUC estimators according to the estimation methods, line plots are used (Figs. 3 and 4).

Fig. 3. MSE values vs. set sizes $(m_x,m_y)$ for the RSS based AUC estimators $\widehat{AUC}_{\text{ML}}$ and $\widehat{AUC}_{\text{ML-GA}}$ under $(\mu_y,\sigma_y)=(0,1)$ with $b_x=b_y=b$.

Fig. 4. MSE values vs. sample sizes $(n_x,n_y)$ for the SRS based AUC estimators $\widehat{AUC}^{*}_{\text{ML}}$ and $\widehat{AUC}^{*}_{\text{ML-GA}}$ under $(\mu_y,\sigma_y)=(0,1)$ with $b_x=b_y=b$.

Efficiency comparisons according to estimation methods are done separately for RSS and SRS as follows:

(i) Under the RSS scheme, the performance of $\widehat{AUC}_{\text{ML-GA}}$ is similar to that of $\widehat{AUC}_{\text{ML}}$ for AUC=0.9; however, there are still instances where $\widehat{AUC}_{\text{ML-GA}}$ performs better than $\widehat{AUC}_{\text{ML}}$. As the distributions of the diseased and non-diseased populations move closer and the AUC becomes 0.7, $\widehat{AUC}_{\text{ML-GA}}$ is more efficient than $\widehat{AUC}_{\text{ML}}$ in many of the cases, especially when b=1. When the distributions of the diseased and non-diseased populations fully overlap, i.e., AUC=0.5, $\widehat{AUC}_{\text{ML-GA}}$ outperforms $\widehat{AUC}_{\text{ML}}$ in most of the cases (Fig. 3).

(ii) Under the SRS scheme, when the overlapping area of the distributions of the diseased and non-diseased populations is the smallest, i.e., AUC=0.9, $\widehat{AUC}^{*}_{\text{ML-GA}}$ is more efficient than its rival in many of the scenarios, especially for the symmetric case b=1. When AUC=0.7, $\widehat{AUC}^{*}_{\text{ML-GA}}$ is better than $\widehat{AUC}^{*}_{\text{ML}}$ in each scenario for the symmetric case b=1. For the asymmetric cases (b=0.5 and 4), $\widehat{AUC}^{*}_{\text{ML-GA}}$ still outperforms $\widehat{AUC}^{*}_{\text{ML}}$ in many cases; however, the performances of the two methodologies are much closer to each other than in the symmetric case. When the distributions of the diseased and non-diseased populations are exactly the same, i.e., AUC=0.5, $\widehat{AUC}^{*}_{\text{ML-GA}}$ is more efficient than its rival for almost all cases. Its superiority is more obvious for b=0.5 (Fig. 4).

Note that when the distributions of the diseased and non-diseased populations become more similar, the discrimination ability between the two groups decreases, making AUC estimation more difficult. Simulation results reveal that GA performs relatively better in such challenging scenarios, plausibly because it explores the parameter space more broadly than the conventional optimization methods and is therefore more likely to find better solutions.

In light of these results, it is recommended to prioritize RSS based GA over traditional numeric methods in obtaining the ML estimates of AUC.

4.1 Robustness

An estimator is regarded as robust if it achieves full efficiency under the assumed distribution and retains a high level of efficiency under plausible alternatives, see Tiku et al. (1986). In this part of the study, we only investigate the robustness property of the proposed estimator $\widehat{AUC}_{\text{ML-GA}}$, since it was previously shown to be the most efficient among its rivals, i.e., $\widehat{AUC}_{\text{ML}}$, $\widehat{AUC}^{*}_{\text{ML}}$ and $\widehat{AUC}^{*}_{\text{ML-GA}}$.

Here, the distributions of the diseased (X) and non-diseased (Y) populations are assumed to be identical so that the resulting AUC equals 0.50 for illustration. In this context, the true models for both X and Y are assumed to be the GL(μ=0, σ=1, b=1) distribution and the following sample models are considered as plausible alternatives:

Dixon’s outlier model

Model 1: $m-p$ observations come from GL(μ=0, σ=1, b=1) and p observations come from GL(μ=0, σ=2, b=1), where $p=[0.5+0.1m]$.

Mixture model

Model 2: 0.90 GL(μ=0, σ=1, b=1) + 0.10 GL(μ=0, σ=2, b=1)

Contamination model

Model 3: 0.90 GL(μ=0, σ=1, b=1) + 0.10 Uniform(−1, 1)

Model misspecification

Model 4: GL(μ=0, σ=1, b=2)

Model 5: Generalized Gamma(a=1, d=1.83, p=2)

Model 6: Skew Normal(μ=0, σ=1, λ=2)

where m is the set size and [·] is the integer value function. See Stacy (1962) and Azzalini (1985) for the details of the Generalized Gamma and Skew Normal distributions, respectively.

Table 4 presents the simulated bias and MSE values of the RSS based AUC estimators obtained from 10,000 Monte Carlo runs.

Table 4. Simulated bias and MSE values of $\widehat{AUC}_{\text{ML-GA}}$ for the true model and Models 1-6; $(\mu_x,\sigma_x)=(\mu_y,\sigma_y)=(0,1)$, AUC=0.5, r=5, $(m_x,m_y)=(5,5)$.
| | True model | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|---|
| bias | -0.0015 | 0.0014 | -0.0003 | -0.0018 | 0.0015 | -0.0009 | -0.0009 |
| MSE | 0.0022 | 0.0021 | 0.0020 | 0.0023 | 0.0022 | 0.0023 | 0.0023 |

Simulation results show that $\widehat{AUC}_{\text{ML-GA}}$ retains a high level of efficiency for Models 1-6; therefore, it is concluded that $\widehat{AUC}_{\text{ML-GA}}$ is robust to plausible deviations from the assumed model.

5. Imperfect Ranking

The previous parts of the study rely on the assumption of perfect ranking. Nevertheless, errors may occur in the ranking process of RSS since, as mentioned earlier, the ranking is conducted without any exact measurement. Therefore, in this part of the study, we investigate how imperfect ranking affects the efficiencies of $\widehat{AUC}_{\text{ML-GA}}$ and $\widehat{AUC}_{\text{ML}}$.

In this section, we also consider the conventional unbiased non-parametric AUC estimator, which is closely related to the Mann–Whitney U statistic, see Bamber (1975), Hanley and McNeil (1982), Faraggi and Reiser (2002), Qin and Zhou (2006). Under the RSS framework it is defined as given in Eq. (19)

(19)
AU^CN= 1 mx rx my ry c=1 rx i=1 mx k=1 ry j=1 my I(Xic >Yjk )  

where $I(\cdot)$ denotes the indicator function. Sengupta and Mukhuti (2008) demonstrated that $\widehat{AUC}_{N}$ remains unbiased and is more efficient than its SRS counterpart, even when the ranking is imperfect. To make the simulations more comprehensive, this estimator is also included in our study.
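Eq. (19) is simply the fraction of all (X, Y) pairs in the two RSS samples for which X exceeds Y. A minimal Python sketch of this computation is given below; the function name `auc_nonparametric` and the toy values are illustrative assumptions, not part of the original study.

```python
import numpy as np

def auc_nonparametric(x, y):
    """Fraction of (X, Y) pairs with X > Y, pooling all sets and cycles."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Compare every X with every Y and average the indicators I(X > Y).
    return np.mean(x[:, None] > y[None, :])

# Toy RSS-shaped inputs (rows = cycles, columns = ranks); the values are made up.
x_rss = np.array([[2.0, 3.0], [5.0, 1.5]])
y_rss = np.array([[1.0, 4.0]])
print(auc_nonparametric(x_rss, y_rss))  # 5 of the 8 pairs satisfy X > Y -> 0.625
```

Because the estimator only counts pairs, it needs no distributional assumption, which is why it serves as a natural benchmark for the parametric estimators.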

To compare the efficiencies of the estimators in the presence of ranking error, the simulation procedure introduced by Dell and Clutter (1972) is used. In this approach, random ranking errors are introduced by adding independent noise terms to the true measurements. First, random samples $X_{tic}$, $t, i = 1, \ldots, m_x$; $c = 1, \ldots, r_x$ and $Y_{sjk}$, $s, j = 1, \ldots, m_y$; $k = 1, \ldots, r_y$ are generated from the $GL(\mu_x, \sigma_x, b_x)$ and $GL(\mu_y, \sigma_y, b_y)$ distributions, respectively. Then, random errors $e_{tic}$ and $e_{sjk}$ corresponding to $X_{tic}$ and $Y_{sjk}$ are generated from $N(0, \sigma^2)$ and added as shown in Eqs. (20) and (21):

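(20)
$$X_{tic}^{*} = X_{tic} + e_{tic}, \quad t, i = 1, \ldots, m_x; \; c = 1, \ldots, r_x,$$

(21)
$$Y_{sjk}^{*} = Y_{sjk} + e_{sjk}, \quad s, j = 1, \ldots, m_y; \; k = 1, \ldots, r_y.$$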

Here, $X_{tic}^{*}$ and $Y_{sjk}^{*}$ are called the concomitants of $X_{tic}$ and $Y_{sjk}$, respectively. Within each set, they are ranked in ascending order and the corresponding $X_{tic}$ and $Y_{sjk}$ values are selected to construct the RSS sample; see Subsection 3.2. This process is repeated for $\sigma^2 = 0, 0.3, 0.6$ and $1$. The parameter $\sigma^2$ controls the magnitude of the ranking error: $\sigma^2 = 0$ corresponds to perfect ranking, and larger values of $\sigma^2$ represent higher degrees of imperfect ranking.
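The ranking-error mechanism described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes the type I generalized logistic parameterization with cdf F(x) = {1 + exp(−(x − μ)/σ)}^(−b) for GL(μ, σ, b), and the function names `rgl` and `rss_with_ranking_error` are hypothetical. The parameter values in the usage loop are taken from one configuration of Table 5.

```python
import numpy as np

def rgl(rng, size, mu, sigma, b):
    """Inverse-cdf draws from GL(mu, sigma, b), ASSUMING the cdf
    F(x) = (1 + exp(-(x - mu) / sigma)) ** (-b)."""
    u = rng.uniform(size=size)
    return mu - sigma * np.log(u ** (-1.0 / b) - 1.0)

def rss_with_ranking_error(rng, m, r, mu, sigma, b, err_var):
    """Return an (r, m) RSS sample ranked on the noisy values X* = X + e."""
    x = rgl(rng, (r, m, m), mu, sigma, b)              # x[c, i, t]: unit t of set i in cycle c
    e = rng.normal(0.0, np.sqrt(err_var), size=x.shape)
    order = np.argsort(x + e, axis=2)                  # ranking uses X*, not X
    idx = np.arange(m)
    pick = order[:, idx, idx]                          # unit judged i-th smallest in set i
    # The true X value of the selected unit is what gets measured.
    return np.take_along_axis(x, pick[:, :, None], axis=2)[:, :, 0]

rng = np.random.default_rng(1)
for err_var in (0.0, 0.3, 0.6, 1.0):
    x_rss = rss_with_ranking_error(rng, m=5, r=5, mu=1.27, sigma=1.0, b=1.0,
                                   err_var=err_var)
    print(err_var, x_rss.shape)
```

Setting `err_var=0` reproduces perfect ranking, since the noisy and true values then coincide; increasing it makes the judged ranks less informative about the true order.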

For illustration, we use the parameter settings of the simulation setup in Section 4 that yield AUC = 0.7 and set the cycle size to $r = 5$. Table 5 shows the corresponding bias and MSE values under imperfect ranking.

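Table 5. Simulated bias and MSE values for the AUC estimators under imperfect ranking; AUC = 0.7, $(\mu_y, \sigma_y) = (0, 1)$, $b_x = b_y = b$, $r = 5$. N, ML and MLGA denote $\widehat{AUC}_{N}$, $\widehat{AUC}_{ML}$ and $\widehat{AUC}_{MLGA}$, respectively.

Left six numeric columns: $b = 0.5$, $(\mu_x, \sigma_x) = (1.70, 1)$; right six numeric columns: $b = 1$, $(\mu_x, \sigma_x) = (1.27, 1)$.

| $(m_x, m_y)$ | $\sigma^2$ | N bias | N MSE | ML bias | ML MSE | MLGA bias | MLGA MSE | N bias | N MSE | ML bias | ML MSE | MLGA bias | MLGA MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (3,3) | 0 | -0.0004 | 0.0052 | 0.0037 | 0.0048 | -0.0007 | 0.0046 | -0.0011 | 0.0050 | 0.0036 | 0.0048 | 0.0037 | 0.0042 |
| | 0.3 | -0.0003 | 0.0053 | 0.0007 | 0.0049 | -0.0010 | 0.0050 | 0.0003 | 0.0057 | -0.0002 | 0.0051 | 0.0009 | 0.0049 |
| | 0.6 | 0.0010 | 0.0056 | 0.0002 | 0.0051 | 0.0030 | 0.0049 | -0.0002 | 0.0058 | -0.0036 | 0.0054 | -0.0040 | 0.0053 |
| | 1 | -0.0009 | 0.0060 | -0.0041 | 0.0053 | -0.0025 | 0.0050 | -0.0003 | 0.0062 | -0.0086 | 0.0056 | -0.0062 | 0.0050 |
| (3,5) | 0 | 0.0016 | 0.0039 | 0.0036 | 0.0036 | 0.0047 | 0.0035 | 0.0010 | 0.0036 | 0.0025 | 0.0035 | 0.0054 | 0.0034 |
| | 0.3 | -0.0008 | 0.0041 | 0.0006 | 0.0037 | 0.0022 | 0.0033 | -0.0008 | 0.0039 | -0.0028 | 0.0035 | 0.0007 | 0.0035 |
| | 0.6 | 0.0004 | 0.0043 | -0.0028 | 0.0038 | -0.0028 | 0.0037 | -0.0006 | 0.0042 | -0.0078 | 0.0038 | -0.0076 | 0.0037 |
| | 1 | -0.0005 | 0.0046 | -0.0061 | 0.0041 | -0.0038 | 0.0038 | -0.0010 | 0.0045 | -0.0123 | 0.0041 | -0.0126 | 0.0038 |
| (5,5) | 0 | -0.0001 | 0.0021 | 0.0020 | 0.0019 | 0.0005 | 0.0020 | -0.0002 | 0.0021 | 0.0018 | 0.0020 | 0.0020 | 0.0019 |
| | 0.3 | 0.0006 | 0.0022 | -0.0017 | 0.0021 | -0.0030 | 0.0020 | -0.0011 | 0.0024 | -0.0042 | 0.0022 | -0.0015 | 0.0021 |
| | 0.6 | -0.0003 | 0.0025 | -0.0059 | 0.0023 | -0.0057 | 0.0022 | 0.0002 | 0.0028 | -0.0097 | 0.0024 | -0.0105 | 0.0023 |
| | 1 | 0.0002 | 0.0027 | -0.0107 | 0.0025 | -0.0100 | 0.0023 | -0.0002 | 0.0030 | -0.0163 | 0.0028 | -0.0179 | 0.0026 |

Left six numeric columns: $b = 0.5$, $(\mu_x, \sigma_x) = (3.75, 2)$; right six numeric columns: $b = 1$, $(\mu_x, \sigma_x) = (1.97, 2)$.

| $(m_x, m_y)$ | $\sigma^2$ | N bias | N MSE | ML bias | ML MSE | MLGA bias | MLGA MSE | N bias | N MSE | ML bias | ML MSE | MLGA bias | MLGA MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (3,3) | 0 | 0.0002 | 0.0059 | 0.0047 | 0.0050 | 0.0042 | 0.0049 | -0.0010 | 0.0057 | 0.0042 | 0.0051 | 0.0029 | 0.0048 |
| | 0.3 | -0.0018 | 0.0058 | 0.0037 | 0.0051 | 0.0042 | 0.0050 | -0.0012 | 0.0058 | 0.0021 | 0.0052 | 0.0013 | 0.0048 |
| | 0.6 | 0.0004 | 0.0059 | 0.0024 | 0.0051 | 0.0003 | 0.0051 | 0.0010 | 0.0061 | 0.0008 | 0.0053 | 0.0006 | 0.0052 |
| | 1 | 0.0006 | 0.0062 | 0.0021 | 0.0052 | -0.0001 | 0.0051 | -0.0007 | 0.0063 | -0.0025 | 0.0054 | -0.0046 | 0.0051 |
| (3,5) | 0 | -0.0005 | 0.0053 | 0.0049 | 0.0047 | 0.0067 | 0.0044 | -0.0006 | 0.0050 | 0.0037 | 0.0045 | 0.0024 | 0.0044 |
| | 0.3 | -0.0016 | 0.0055 | 0.0037 | 0.0046 | 0.0052 | 0.0046 | 0.0008 | 0.0052 | 0.0021 | 0.0046 | 0.0014 | 0.0041 |
| | 0.6 | -0.0011 | 0.0055 | 0.0032 | 0.0047 | 0.0034 | 0.0046 | 0.0004 | 0.0053 | 0.0000 | 0.0046 | -0.0021 | 0.0043 |
| | 1 | -0.0006 | 0.0056 | 0.0005 | 0.0047 | -0.0008 | 0.0047 | 0.0003 | 0.0055 | -0.0024 | 0.0047 | -0.0018 | 0.0046 |
| (5,3) | 0 | 0.0007 | 0.0029 | 0.0019 | 0.0026 | 0.0018 | 0.0024 | 0.0011 | 0.0030 | 0.0029 | 0.0026 | 0.0019 | 0.0023 |
| | 0.3 | -0.0005 | 0.0031 | 0.0016 | 0.0026 | 0.0011 | 0.0024 | 0.0008 | 0.0032 | 0.0003 | 0.0027 | 0.0009 | 0.0025 |
| | 0.6 | -0.0002 | 0.0031 | 0.0003 | 0.0025 | 0.0017 | 0.0025 | -0.0002 | 0.0034 | -0.0025 | 0.0029 | -0.0034 | 0.0025 |
| | 1 | 0.0002 | 0.0032 | -0.0012 | 0.0027 | -0.0039 | 0.0026 | 0.0010 | 0.0035 | -0.0037 | 0.0029 | -0.0029 | 0.0027 |
| (5,5) | 0 | -0.0002 | 0.0025 | 0.0024 | 0.0021 | 0.0020 | 0.0020 | -0.0002 | 0.0024 | 0.0021 | 0.0021 | 0.0040 | 0.0020 |
| | 0.3 | 0.0001 | 0.0026 | 0.0008 | 0.0021 | 0.0021 | 0.0021 | -0.0008 | 0.0025 | -0.0003 | 0.0022 | 0.0007 | 0.0021 |
| | 0.6 | -0.0006 | 0.0026 | -0.0007 | 0.0022 | -0.0022 | 0.0020 | -0.0006 | 0.0027 | -0.0030 | 0.0023 | -0.0053 | 0.0023 |
| | 1 | -0.0010 | 0.0027 | -0.0026 | 0.0022 | -0.0023 | 0.0020 | -0.0003 | 0.0029 | -0.0067 | 0.0024 | -0.0056 | 0.0022 |

$b = 4$, $(\mu_x, \sigma_x) = (0.95, 1)$.

| $(m_x, m_y)$ | $\sigma^2$ | N bias | N MSE | ML bias | ML MSE | MLGA bias | MLGA MSE |
|---|---|---|---|---|---|---|---|
| (3,3) | 0 | -0.0009 | 0.0050 | 0.0042 | 0.0049 | 0.0052 | 0.0048 |
| | 0.3 | -0.0011 | 0.0057 | -0.0019 | 0.0053 | -0.0014 | 0.0050 |
| | 0.6 | -0.0007 | 0.0063 | -0.0050 | 0.0055 | -0.0071 | 0.0052 |
| | 1 | 0.0018 | 0.0065 | -0.0122 | 0.0059 | -0.0123 | 0.0057 |
| (3,5) | 0 | 0.0003 | 0.0032 | 0.0030 | 0.0030 | 0.0048 | 0.0029 |
| | 0.3 | -0.0012 | 0.0039 | -0.0057 | 0.0034 | -0.0042 | 0.0033 |
| | 0.6 | 0.0002 | 0.0043 | -0.0120 | 0.0037 | -0.0110 | 0.0039 |
| | 1 | 0.0002 | 0.0047 | -0.0190 | 0.0043 | -0.0199 | 0.0040 |
| (5,5) | 0 | 0.0006 | 0.0020 | 0.0018 | 0.0019 | 0.0032 | 0.0019 |
| | 0.3 | 0.0002 | 0.0027 | -0.0076 | 0.0023 | -0.0089 | 0.0023 |
| | 0.6 | 0.0002 | 0.0031 | -0.0149 | 0.0026 | -0.0165 | 0.0025 |
| | 1 | -0.0001 | 0.0034 | -0.0230 | 0.0032 | -0.0201 | 0.0028 |

$b = 4$, $(\mu_x, \sigma_x) = (0.38, 1.5)$.

| $(m_x, m_y)$ | $\sigma^2$ | N bias | N MSE | ML bias | ML MSE | MLGA bias | MLGA MSE |
|---|---|---|---|---|---|---|---|
| (3,3) | 0 | 0.0011 | 0.0052 | 0.0047 | 0.0049 | 0.0095 | 0.0050 |
| | 0.3 | -0.0004 | 0.0056 | -0.0016 | 0.0051 | 0.0009 | 0.0050 |
| | 0.6 | 0.0006 | 0.0060 | -0.0046 | 0.0053 | -0.0005 | 0.0052 |
| | 1 | -0.0003 | 0.0064 | -0.0078 | 0.0056 | -0.0096 | 0.0057 |
| (3,5) | 0 | 0.0005 | 0.0039 | 0.0031 | 0.0038 | 0.0010 | 0.0036 |
| | 0.3 | -0.0011 | 0.0044 | -0.0027 | 0.0039 | -0.0015 | 0.0037 |
| | 0.6 | 0.0009 | 0.0047 | -0.0072 | 0.0042 | -0.0044 | 0.0042 |
| | 1 | 0.0010 | 0.0051 | -0.0121 | 0.0045 | -0.0129 | 0.0040 |
| (5,3) | 0 | 0.0003 | 0.0033 | 0.0031 | 0.0031 | 0.0024 | 0.0032 |
| | 0.3 | -0.0012 | 0.0037 | -0.0021 | 0.0034 | -0.0019 | 0.0030 |
| | 0.6 | -0.0002 | 0.0041 | -0.0072 | 0.0036 | -0.0053 | 0.0037 |
| | 1 | 0.0002 | 0.0045 | -0.0131 | 0.0037 | -0.0096 | 0.0035 |
| (5,5) | 0 | 0.0002 | 0.0021 | 0.0029 | 0.0020 | 0.0003 | 0.0019 |
| | 0.3 | -0.0007 | 0.0025 | -0.0049 | 0.0022 | -0.0041 | 0.0021 |
| | 0.6 | -0.0004 | 0.0028 | -0.0114 | 0.0025 | -0.0113 | 0.0025 |
| | 1 | -0.0007 | 0.0032 | -0.0175 | 0.0028 | -0.0172 | 0.0026 |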

Table 5, together with the SRS results in Table S1 of the supplementary material, shows that the RSS-based estimators of AUC perform much better than their SRS-based competitors in all cases. As expected, the relative efficiencies of the RSS-based estimators with respect to the SRS-based estimators decrease slightly as the ranking error increases. However, even under severe ranking error, the RSS-based estimators still perform much better than their rivals.

In terms of MSE, $\widehat{AUC}_{MLGA}$ also outperforms $\widehat{AUC}_{N}$ in all cases. Moreover, $\widehat{AUC}_{MLGA}$ performs better than $\widehat{AUC}_{ML}$ in almost all scenarios for the symmetric case ($b = 1$). For the non-symmetric cases ($b = 0.5$ and $b = 4$), the two estimators are closer overall, but there are still many instances where $\widehat{AUC}_{MLGA}$ is better than $\widehat{AUC}_{ML}$. Particularly in the equal variance cases, the relative efficiency of $\widehat{AUC}_{MLGA}$ over $\widehat{AUC}_{ML}$ becomes even more pronounced for $\sigma^2 = 1$, where the ranking error is the highest.

6. Application

Diabetes is a chronic metabolic disorder defined by persistently elevated blood glucose levels. It is often caused by the pancreas's failure to produce enough insulin, by the resistance of body cells to insulin, or by a combination of both. It is one of the major causes of death and of disabilities such as blindness, renal failure, lower limb amputation and nerve damage. According to the International Diabetes Federation, diabetes affects approximately 532 million adults worldwide, thus imposing a heavy cost on health care systems. Therefore, there exists a vast literature on the prevention, early diagnosis and treatment of diabetes, much of which reinforces the importance of understanding the risk factors. Many studies have shown that body mass index (BMI), a measure of weight adjusted for height, can be regarded as a biomarker for diabetes.

In this part of the study, a data set taken from the National Health and Nutrition Examination Survey (NHANES) is used to illustrate the implementation of the methodology proposed in Section 3. The data for the periods 2009-2010 and 2011-2012 are provided in the NHANES package of the R statistical software. We focus on the BMI and Weight variables from the NHANES data for the period 2011-2012 and aim to estimate AUC = P(X > Y), where X and Y denote the BMI values of individuals with and without diabetes, respectively. Here, Weight is used as the concomitant of BMI since the two variables are highly correlated, which makes RSS particularly appropriate for this practical example. Mahdizadeh and Zamanzade (2022) also used the same data to estimate AUC based on RSS, applying the Box-Cox transformation to achieve normality. Unlike their study, we use the original data to avoid the disadvantages caused by the use of transformed data, and show that the GL distribution provides a good fit for both X and Y (Fig. 5).

Fig. 5.
GL Q-Q plots for the BMI values of the diseased and non-diseased patients; bx=by=10. (a) Diseased patients (X), (b) Non-diseased patients (Y).

The NHANES data described above, comprising 779 diseased and 7817 non-diseased individuals, are taken as the hypothetical population. To estimate AUC, RSS samples are drawn from this hypothetical population with set sizes $m_x = m_y = 5$ and cycle sizes $r_x = r_y = 10$. To make a fair comparison, SRS samples of sizes $n_x = n_y = 50$ are drawn from the same population. A Monte Carlo simulation study based on the NHANES data is then carried out, in which the RSS- and SRS-based AUC estimates are computed from 10,000 samples. To compare the performances of the proposed AUC estimators with the existing ones, bias and MSE values are calculated (Table 6). It should be noted that the true AUC is 0.78, which is calculated from the entire hypothetical population.
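The sampling step in this application ranks units on the concomitant (Weight) and measures only the variable of interest (BMI) for the selected unit. The Python sketch below illustrates one way this could be done for a finite population. It is an assumption-laden illustration: the population is synthetic rather than the NHANES data, the function name `rss_from_population` is hypothetical, and drawing each set without replacement (with sets drawn independently) is a choice made here that the text does not specify.

```python
import numpy as np

def rss_from_population(rng, values, concomitant, m, r):
    """Draw an (r, m) ranked set sample of `values`, ranking on `concomitant`."""
    values = np.asarray(values, dtype=float)
    concomitant = np.asarray(concomitant, dtype=float)
    out = np.empty((r, m))
    for c in range(r):
        for i in range(m):
            units = rng.choice(values.size, size=m, replace=False)  # one set of m units
            ranked = units[np.argsort(concomitant[units])]          # rank by concomitant only
            out[c, i] = values[ranked[i]]                           # measure unit judged (i+1)-th
    return out

# Synthetic stand-in population (NOT the NHANES data): two correlated variables.
rng = np.random.default_rng(7)
weight = rng.normal(80.0, 15.0, size=1000)
bmi = 0.3 * weight + rng.normal(0.0, 2.0, size=1000)

x_rss = rss_from_population(rng, bmi, weight, m=5, r=10)
print(x_rss.shape)  # (10, 5): 10 cycles, 5 ranks, i.e. 50 measured units
```

With m = 5 and r = 10 the sample contains 50 measured units, which matches the SRS sample size of 50 used for the comparison.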

Table 6. Bias and MSE values for the AUC estimators for the NHANES data.

| | RSS: $\widehat{AUC}_{ML}$ | RSS: $\widehat{AUC}_{MLGA}$ | SRS: $\widehat{AUC}_{ML}^{*}$ | SRS: $\widehat{AUC}_{MLGA}^{*}$ |
|---|---|---|---|---|
| Bias | -0.0177 (0.0314) | -0.0159 (0.0291) | 0.0011 (0.0458) | 0.0012 (0.0447) |
| MSE | 0.0013 | 0.0011 | 0.0021 | 0.0020 |

Numbers in parentheses show the standard errors.
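As a quick consistency check on Table 6, the MSE of an estimator decomposes as squared bias plus variance. The short Python sketch below applies this identity to the tabulated values, under the assumption that the standard errors in parentheses are the Monte Carlo standard deviations of the estimates; the dictionary keys are labels introduced here for illustration.

```python
# (bias, standard error, reported MSE) copied from Table 6
table6 = {
    "RSS ML":   (-0.0177, 0.0314, 0.0013),
    "RSS MLGA": (-0.0159, 0.0291, 0.0011),
    "SRS ML":   (0.0011, 0.0458, 0.0021),
    "SRS MLGA": (0.0012, 0.0447, 0.0020),
}

implied_mse = {}
for name, (bias, se, mse) in table6.items():
    # MSE = bias^2 + variance, with variance taken as SE^2 (assumption stated above)
    implied_mse[name] = bias ** 2 + se ** 2
    print(f"{name}: bias^2 + SE^2 = {implied_mse[name]:.4f}, reported MSE = {mse:.4f}")
```

Under this assumption the implied values agree with the reported MSEs to four decimal places, and they show that the gain of the RSS-based estimators comes from their smaller variance rather than from smaller bias.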

As seen from Table 6, all estimators have negligibly small bias values. In terms of efficiency, the RSS-based AUC estimators perform better than their SRS-based counterparts. Moreover, $\widehat{AUC}_{MLGA}$ and $\widehat{AUC}_{MLGA}^{*}$ are more efficient than $\widehat{AUC}_{ML}$ and $\widehat{AUC}_{ML}^{*}$, respectively. These results are consistent with the simulation outcomes.

Estimating the AUC is crucial in medical diagnostics, as it reflects the ability of a test to discriminate between diseased and non-diseased patients. The use of RSS instead of the conventional SRS in the context of AUC estimation provides higher efficiencies at lower costs especially when the test scores are more easily ranked than quantified, which is the case for the present diabetes application.

7. Conclusions

AUC estimation plays a significant role in medical diagnostic studies and is generally based on the normality assumption. However, test scores in medical diagnostic applications commonly follow non-normal distributions. Therefore, in this study, the test scores of diseased and non-diseased individuals are assumed to follow the GL distribution because of its flexible tails. Moreover, the RSS method is preferred to the traditional SRS method for selecting the sampling units, since cost-effectiveness is highly critical in clinical trials. GA-based ML estimation is employed to estimate the AUC corresponding to GL-distributed test scores under RSS. An extensive Monte Carlo simulation is conducted to assess the performance of the proposed estimator in terms of bias, efficiency and robustness. In the simulation study, comparisons are made with respect to both sampling methods and estimation methods. First, the RSS-based estimators are compared with the traditionally used SRS-based estimators. Then, the ML estimators obtained by GA are compared with the corresponding ML estimators obtained by conventional numerical methods. The simulation results show that, as expected, the use of RSS significantly improves efficiency compared to SRS. GA also performs better than traditional numerical methods in obtaining ML estimates of AUC in the majority of cases. Furthermore, the robustness part of the simulation study shows that the proposed AUC estimator retains a high level of efficiency under plausible deviations from the underlying model. The efficiencies of the RSS-based estimators are also evaluated under imperfect ranking conditions, and $\widehat{AUC}_{MLGA}$ is found to outperform its parametric and non-parametric counterparts in most cases. The application of the proposed methodology to a diabetes data set further confirms its practical utility in real-world scenarios.
Overall, GA-based ML is recommended for AUC estimation, as it outperforms its competitor in most cases.

Acknowledgement

The authors would like to thank the reviewers and the editor for their insightful comments and suggestions, which significantly improved the manuscript.

CRediT authorship contribution statement

All authors contributed equally to this manuscript.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data used in this study are publicly available and can be obtained from the NHANES package of R statistical software.

Declaration of generative AI and AI-assisted technologies in the writing process

The authors confirm that there was no use of artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript and no images were manipulated using AI.

Supplementary Data

Supplementary material to this article can be found online at https://dx.doi.org/10.25259/JKSUS_1138_2025.

Appendix

References

  1. More efficient estimators of the area under the receiver operating characteristic curve in paired ranked set sampling. Stat Methods Med Res. 2023;32:1217-1233. https://doi.org/10.1177/09622802231167434
  2. A new approach for estimating the parameters of Weibull distribution via particle swarm optimization: An application to the strengths of glass fibre data. Reliability Eng Syst Saf. 2019;183:116-127. https://doi.org/10.1016/j.ress.2018.07.024
  3. Evaluation of screening methods for Down's syndrome using bootstrap comparison of ROC curves. Comput Methods Programs Biomed. 1994;43:151-157. https://doi.org/10.1016/0169-2607(94)90065-5
  4. Using nomination sampling in estimating the area under the ROC curve. Comput Stat. 2024;39:2721-2742. https://doi.org/10.1007/s00180-023-01409-6
  5. Segmentation of Terahertz imaging using k-means clustering based on ranked set sampling. Expert Syst with Applications. 2015;42:2959-2974. https://doi.org/10.1016/j.eswa.2014.11.050
  6. Ranked set sampling in finite populations with bivariate responses: An application to an osteoporosis study. Stat Med. 2022;41:1397-1420. https://doi.org/10.1002/sim.9285
  7. A class of distributions which includes the normal ones. Scandinavian J Statistics. 1985;12:171-178. https://www.jstor.org/stable/4615982
  8. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol. 1975;12:387-415. https://doi.org/10.1016/0022-2496(75)90001-2
  9. Bandos, A. I., Rockette, H. E., Gur, D., 2006. Resampling methods for the area under the ROC curve. Proceedings of the 3rd International Workshop on ROCML (ROCML 2006), Pittsburgh, PA, USA, June 29, 2006, pp. 1-8.
  10. Why I prefer logits to probits. Biometrics. 1951;7:327. https://doi.org/10.2307/3001655
  11. An application of Lomax distributions in receiver operating characteristic (ROC) curve analysis. Commun Stat - Theory Methods. 1993;22:1681-1687. https://doi.org/10.1080/03610929308831110
  12. Determining the area under the ROC curve for a binary diagnostic test. Med Decis Making. 2000;20:468-470. https://doi.org/10.1177/0272989X0002000410
  13. Experimental results of the AUC of the bi-generalized exponential ROC model using spread sheet functions. International Journal of Information Research and Review. 2022;09:7433-7439. https://www.ijirr.com/sites/default/files/issues-pdf/3852.pdf
  14. Ranked set sampling theory with order statistics background. Biometrics. 1972;28:545. https://doi.org/10.2307/2556166
  15. Introduction to evolutionary computing. New York: Springer. https://doi.org/10.1007/978-3-662-44874-8
  16. Estimation of the area under the ROC curve. Stat Med. 2002;21:3093-3106. https://doi.org/10.1002/sim.1228
  17. Genetic algorithms and engineering optimization. New York: John Wiley & Sons, Inc.
  18. Genetic algorithms in search, optimization, and machine learning. Reading, Mass.: Addison-Wesley.
  19. Trial of ranked-set sampling for forage yields. Forest Science. 1966;12:22-26. https://doi.org/10.1093/forestscience/12.1.22
  20. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29-36. https://doi.org/10.1148/radiology.143.1.7063747
  21. Practical genetic algorithms. Hoboken, New Jersey: John Wiley & Sons, Inc.
  22. Bayes estimation of Weibull distribution parameters using ranked set sampling. Commun Stat - Theory Methods. 2010;39:2533-2551. https://doi.org/10.1080/03610920903061039
  23. An introductory analysis with applications to biology, control, and artificial intelligence. Adaptation in Natural and Artificial Systems (First Edition). USA: The University of Michigan.
  24. Logical analysis in roentgen diagnosis. Radiology. 1960;74:178-193. https://doi.org/10.1148/74.2.178
  25. A new approach to parameter estimation in ranked set sampling. International Journal of Statistics Economics. 2016;17:40-49. http://www.ceser.in/ceserp/index.php/bse/article/view/4034
  26. Smooth estimation of the area under the ROC curve in multistage ranked set sampling. Stat Papers. 2021;62:1753-1776. https://doi.org/10.1007/s00362-019-01151-6
  27. On estimating the area under the ROC curve in ranked set sampling. Stat Methods Med Res. 2022;31:1500-1514. https://doi.org/10.1177/09622802221097211
  28. A statistical theory of target detection by pulsed radar. IEEE Trans Inform Theory. 1960;6:59-267. https://doi.org/10.1109/tit.1960.1057560
  29. A method for unbiased selective sampling, using ranked sets. Aust J Agric Res. 1952;3:385-390. https://doi.org/10.1071/ar9520385
  30. Comparing the areas under two correlated ROC curves: Parametric and non-parametric approaches. Biom J. 2006;48:745-757. https://doi.org/10.1002/bimj.200610223
  31. Empirical likelihood inference for area under the receiver operating characteristic curve using ranked set samples. Pharm Stat. 2022;21:1219-1245. https://doi.org/10.1002/pst.2230
  32. Models for cluster randomized designs using ranked set sampling. Stat Med. 2023;42:2692-2710. https://doi.org/10.1002/sim.9743
  33. Estimation based on ranked set sampling for the two-parameter Birnbaum-Saunders distribution. J Stat Computation Simulation. 2021;91:316-333. https://doi.org/10.1080/00949655.2020.1814287
  34. Empirical likelihood inference for the area under the ROC curve. Biometrics. 2006;62:613-622. https://doi.org/10.1111/j.1541-0420.2005.00453.x
  35. On weighted extropy of ranked set sampling and its comparison with simple random sampling counterpart. Commun Stat - Theory Methods. 2024;53:378-395. https://doi.org/10.1080/03610926.2022.2082478
  36. Unbiased estimation of P(X > Y) using ranked set sample data. Statistics. 2008;42:223-230. https://doi.org/10.1080/02331880701823271
  37. A generalization of the gamma distribution. Ann Math Statist. 1962;33:1187-1192. https://doi.org/10.1214/aoms/1177704481
  38. Analysis of variance in experimental design with non-normal error distributions. Commun Stat - Theory Methods. 2001;30:1335-1352. https://doi.org/10.1081/STA-100104748
  39. Maximum likelihood estimation based on ranked set sampling designs for two extensions of the Lindley distribution with uncensored and right-censored data. Comput Stat. 2020;35:1827-1851. https://doi.org/10.1007/s00180-020-00984-2
  40. On unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann Inst Stat Math. 1968;20:1-31. https://doi.org/10.1007/bf02911622
  41. Estimating the mean and standard deviation from a censored normal sample. Biometrika. 1967;54:155-165. https://doi.org/10.1093/biomet/54.1-2.155
  42. Robust Inference. New York: Marcel Dekker, Inc.
  43. Estimation of area under the ROC curve using exponential and Weibull distributions. Bonfring International Journal of Data Mining. 2012;2:52. https://doi.org/10.9756/BIJDM.1362
  44. Using ranked set sampling with binary outcomes in cluster randomized designs. Can J Statistics. 2020;48:342-365. https://doi.org/10.1002/cjs.11533
  45. A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989;76:585-592. https://doi.org/10.1093/biomet/76.3.585
  46. A new approach using the genetic algorithm for parameter estimation in multiple linear regression with long-tailed symmetric distributed error terms: An application to the Covid-19 data. Chemometr Intell Lab Syst. 2021;216:104372. https://doi.org/10.1016/j.chemolab.2021.104372
  47. Estimating the parameters of generalized logistic distribution via genetic algorithm based on reduced search space. J Math Sci. 2025;289:28-44. https://doi.org/10.1007/s10958-024-07088-y
  48. Maximum likelihood estimation for the parameters of skew normal distribution using genetic algorithm. Swarm Evolutionary Computation. 2018;38:127-138. https://doi.org/10.1016/j.swevo.2017.07.007
  49. Transformed ROC curve for biomarker evaluation. Stat Med. 2024;43:5681-5697. https://doi.org/10.1002/sim.10268
  50. Proportion estimation in ranked set sampling in the presence of tie information. Comput Stat. 2018;33:1349-1366. https://doi.org/10.1007/s00180-018-0807-x
  51. ROC curve analysis of electrophysiological monitoring and early warning during intracranial aneurysm clipping. World Neurosurg. 2021;155:e49-e54. https://doi.org/10.1016/j.wneu.2021.07.131