Investigating the impact of measurement error on AUC estimation using hybrid ROC models: An application to medical and industrial domains
*Corresponding author: E-mail address: siva.g@vitap.ac.in (S G)
Abstract
This study addresses the challenge of estimating the Area under the curve (AUC) in the presence of measurement errors (MEs) within two hybrid receiver operating characteristic (ROC) frameworks that incorporate non-normal distributions. Specifically, we examine the combinations of Half-Normal & Exponential, and Half-Normal & Rayleigh distributions. For each hybrid ROC curve, a generalized bias-corrected approximation to the AUC is derived to mitigate the systematic underestimation induced by measurement errors. The performance of the proposed estimators is examined under two classification scenarios: (i) best-case settings characterized by well-separated populations and (ii) worst-case settings involving substantial distributional overlap. The practical applicability and robustness of the proposed methodology are illustrated through real datasets, demonstrating its effectiveness in recovering the true discriminative ability in hybrid and error-contaminated environments.
Keywords
AUC
Bias-corrected approximation
Hybrid ROC curve
Measurement error
Mean squared error
1. Introduction
Accurate assessment of diagnostic performance is a crucial aspect of clinical decision-making, as it directly impacts the ability to distinguish between individuals with and without disease. The receiver operating characteristic (ROC) curve serves as a fundamental tool for evaluating the performance of such diagnostic tests by plotting the true positive rate (TPR) against the false positive rate (FPR) across various threshold settings. A key summary measure derived from the ROC curve is the area under the curve (AUC), which quantifies the overall diagnostic accuracy of a classifier. According to Bamber (1975), the AUC can be interpreted as the probability that a randomly selected diseased individual will have a higher test score than a randomly selected healthy individual.
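As a minimal illustration of this probabilistic interpretation, the sketch below estimates the AUC as the proportion of (healthy, diseased) pairs in which the diseased score is larger, counting ties as one half. The function name and the toy scores are hypothetical and are not taken from the study.

```python
def empirical_auc(healthy, diseased):
    """Estimate P(diseased score > healthy score); ties count as 1/2."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(healthy) * len(diseased))


# Hypothetical toy scores, for illustration only.
healthy_scores = [0.2, 0.5, 0.9, 1.1]
diseased_scores = [0.7, 1.0, 1.4, 2.3]
auc = empirical_auc(healthy_scores, diseased_scores)
print(auc)
```

Here 13 of the 16 pairs are correctly ordered, so the estimate is 13/16.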
Traditionally, the ROC curve has been studied under both parametric and non-parametric frameworks. The “bi-normal ROC model” (Egan 1975) is one of the most widely used parametric approaches; it assumes that both the healthy and diseased populations follow normal distributions. However, in real-life scenarios, markers often deviate from normality, exhibiting characteristics such as skewness, heavy tails, or asymmetry. To overcome these limitations, several non-normal ROC models have been proposed, including those of England (1988), Campbell and Ratnaparkhi (1993), and Hussian (2012), among others. For a comprehensive overview of bi-distributional ROC models, readers may refer to Balaswamy and Vishnu Vardhan (2016). It has also been noted that, in many practical scenarios, the diseased population tends to follow a non-normal distribution because of high variability and skewness in the data. Taking this into account, Balaswamy and Vishnu Vardhan developed the hybrid ROC models, which assume that the markers from the healthy and diseased populations follow different distributions (Balaswamy et al., 2015; Balaswamy and Vardhan, 2015a, 2015b, 2015c).
Although hybrid ROC models enhance flexibility in modeling heterogeneous distributions across populations, their diagnostic performance can still be significantly affected by measurement errors (MEs), which are commonly encountered in real-world data. These errors may arise from various sources, such as equipment malfunction, human observation inconsistencies, or environmental influences, all of which contribute to deviations between observed and true values. The presence of MEs poses a significant concern in ROC analysis, as it increases the variance of test outcomes and introduces systematic bias, leading to a downward shift in the AUC of the ROC curve. Consequently, the true accuracy of a test is often underestimated. To address the distortion introduced by MEs, several bias-correction approaches have been developed over the years. Initially, Coffin and Sukhatme (1996, 1997) proposed bias-corrected estimators for the AUC under both parametric and non-parametric frameworks. Building upon this methodology, several authors have developed various approaches to handle MEs under normality assumptions (Kim and Gleser 2000; Faraggi 2000; Reiser 2000; Dunn 1989; Schisterman et al., 2001; Siva and Vishnu Vardhan 2022; Siva et al., 2022; Siva et al., 2023). These methods have provided a broader perspective on ROC analysis under measurement error conditions. Later, Wang and Feng (2023) developed a skew-normal framework that corrects diagnostic accuracy measures in the presence of ME for skewed biomarkers.
However, a key limitation is that these approaches rely on the assumption that both populations follow the same (normal) distribution. For example, Kim and Gleser (2000) applied the simulation extrapolation (SIMEX) approach to deal with MEs, but their methods were restricted to normally distributed biomarkers. More recently, Siva and Vishnu Vardhan (2022) and Siva et al. (2023) developed mixture-based bias-corrected estimators to handle MEs in both univariate and multivariate mixture ROC frameworks under the assumption of normality. This assumption does not hold in hybrid settings, where the healthy and diseased populations follow different and often asymmetric distributions. In such cases, MEs propagate differently in each population, and normal-based corrections fail to recover the true AUC, leading to biased and inefficient estimates. For instance, when one population follows a Half-Normal distribution and the other an Exponential distribution, symmetric corrections derived under normality are inadequate to capture the heterogeneous error structure. This highlights the necessity of developing hybrid-specific bias-corrected estimators.
Although hybrid ROC models offer greater flexibility and reflect real-world data characteristics, their behavior under MEs has not been widely explored, primarily because of the inherent methodological complexity. To address this gap, this paper derives a generalized bias-corrected approximation for AUC estimation under MEs in the hybrid ROC framework. The correction is developed for two hybrid ROC curves: the Half-Normal & Exponential (HE) and the Half-Normal & Rayleigh (HR) hybrid ROC curves. These curves were selected for their analytical tractability: closed-form AUC expressions are available for them in the existing literature, which facilitates the development of bias-corrected estimators under MEs. Moreover, they are well suited to modeling the asymmetric distributions commonly encountered in practical classification tasks, particularly in medical and industrial domains.
The remainder of the paper is organized as follows: Section 2 presents the formulation and derivation of the HE and HR Hybrid ROC curves and details the proposed bias correction methodology under MEs. Section 3 outlines the simulations and summarizes the results. Section 4 demonstrates the practical applicability of proposed methods through real datasets. Finally, Section 5 concludes the study with a summary.
2. Materials and Methods
2.1 HE hybrid ROC curve
Consider two markers, one from the healthy (H) population and one from the diseased (D) population. The healthy marker is assumed to follow a Half-Normal distribution and the diseased marker an Exponential distribution, each governed by its own scale parameter. Based on these distributional assumptions, the intrinsic measures of the hybrid ROC curve arising from the Half-Normal and Exponential model are defined as follows
where ‘t’ denotes the threshold value, which can be obtained by using the expression in Eq. (1), i.e., . Substituting the expression ‘t’ into Eq. (2), then the expression of the HEROC curve is defined as
where, is the inverse cumulative standard normal distribution function and , are the scale parameters of H and D populations, respectively. The AUC expression for the HEROC curve can be obtained by integrating the Eq. (3) over (0, 1).
here, is the AUC of HEROC curve. Let then
Assume
By substituting the above expressions in , the integral will become
where, . On further simplification using Mathematica, one can get the AUC expression as follows, in Eq. (4)
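The integration step described above can be checked numerically. The following sketch assumes one particular parameterization, which may differ from the one used in the paper: a Half-Normal healthy marker with scale `sigma_h` and an Exponential diseased marker with mean `scale_d`. Under that assumption the ROC curve is exp(-(sigma_h/scale_d) * Phi^{-1}(1 - x/2)), and the integral over (0, 1) reduces to 2 * exp(rho^2 / 2) * (1 - Phi(rho)) with rho = sigma_h / scale_d. All function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def he_roc(fpr, sigma_h, scale_d):
    """TPR at a given FPR for the assumed Half-Normal/Exponential model."""
    # Threshold t solving FPR = P(X > t) = 2 * (1 - Phi(t / sigma_h)).
    t = sigma_h * STD_NORMAL.inv_cdf(1.0 - fpr / 2.0)
    # TPR = P(Y > t) for an Exponential marker with mean scale_d.
    return math.exp(-t / scale_d)


def he_auc_numeric(sigma_h, scale_d, n_grid=20000):
    """Midpoint-rule integral of the ROC curve over (0, 1)."""
    total = 0.0
    for i in range(n_grid):
        total += he_roc((i + 0.5) / n_grid, sigma_h, scale_d)
    return total / n_grid


def he_auc_closed(sigma_h, scale_d):
    """Closed form of the same integral under the assumed parameterization."""
    rho = sigma_h / scale_d
    return 2.0 * math.exp(rho ** 2 / 2.0) * (1.0 - STD_NORMAL.cdf(rho))


# Hypothetical parameter values, chosen only to exercise the functions.
auc_numeric = he_auc_numeric(sigma_h=1.0, scale_d=2.0)
auc_closed = he_auc_closed(sigma_h=1.0, scale_d=2.0)
print(round(auc_numeric, 4), round(auc_closed, 4))
```

The two values should agree to several decimal places, which is a quick sanity check on any closed-form AUC before it is used in a plug-in estimator.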
2.2 HR hybrid ROC curve
In this setup, the healthy marker follows a Half-Normal distribution and the diseased marker follows a Rayleigh distribution, each with its own scale parameter. The intrinsic measures for this hybrid ROC curve are defined as
From Eq. (5), the expression for t is and after simplifying Eq. (6), the expression for the HR hybrid ROC curve is obtained as
where, . The AUC expression of this HRROC curve is obtained by integrating Eq. (7) and is defined as
here, is the AUC expression of HRROC curve, on further simplification, the closed form for is as follows
Using the maximum likelihood estimates of the parameters and , the natural estimators of and are obtained from Eqs. (4) and (8)
With the help of Taylor series expansion, the expected values of and can be easily expressed as , where, m and n denote the number of samples in H and D populations, respectively. However, if the sample observations are measured with errors, then the expected values of may not be reliable and lead to biased outcomes. In order to correct this bias in we derived bias-corrected approximations, which are clearly explained in the following sections.
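For the HR model of Section 2.2, a closed-form AUC can be sanity-checked by simulation. The sketch below assumes a Half-Normal healthy marker with scale `sigma_h` and a Rayleigh diseased marker with scale `sigma_d`; under that assumed parameterization, which may not match the paper's Eq. (8), P(Y > X) equals sigma_d / sqrt(sigma_h^2 + sigma_d^2). The names, seed, and parameter values are hypothetical.

```python
import bisect
import math
import random


def empirical_auc(healthy, diseased):
    """P(diseased > healthy) computed via sorting; ties count as 1/2."""
    ordered = sorted(healthy)
    wins = 0.0
    for d in diseased:
        below = bisect.bisect_left(ordered, d)   # healthy scores < d
        upto = bisect.bisect_right(ordered, d)   # healthy scores <= d
        wins += below + 0.5 * (upto - below)
    return wins / (len(healthy) * len(diseased))


def hr_auc_closed(sigma_h, sigma_d):
    """AUC under the assumed Half-Normal/Rayleigh parameterization."""
    return sigma_d / math.sqrt(sigma_h ** 2 + sigma_d ** 2)


rng = random.Random(2024)
sigma_h, sigma_d, n = 1.0, 2.0, 4000  # hypothetical values

# Half-Normal draw: absolute value of a zero-mean normal variate.
healthy = [abs(rng.gauss(0.0, sigma_h)) for _ in range(n)]
# Rayleigh draw by inverse-CDF sampling: sigma * sqrt(-2 * ln(U)), U in (0, 1].
diseased = [sigma_d * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            for _ in range(n)]

auc_mc = empirical_auc(healthy, diseased)
auc_cf = hr_auc_closed(sigma_h, sigma_d)
print(round(auc_mc, 3), round(auc_cf, 3))
```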
2.3 Proposed bias-corrected approximations
2.3.1 Estimation of AUC of HEROC curve under MEs
Let the actual values and are observed with additive MEs and , respectively. Thus, the error-contaminated observations are defined as
where & represents the contaminated observations of H and D populations, respectively. In such case, the natural estimator of AUC can be obtained as
here, is the estimated AUC of the HEROC curve under MEs. Now, the expected value of becomes , where denotes the constant-order of the bias term introduced by the contaminated observations, which does not vanish as the sample size increases. To correct the bias in , we consider
Where, , denotes the cumulative distribution function of Y and is the density function of . In Eq. (9), the integral part cannot be evaluated directly since it involves both s and a. Therefore, we utilize the standard Taylor series expansion to separate the term into two parts and the expansion term of around is (see, e.g., Rudin (1976))
Upon substituting Eq. (10) into Eq. (9), we obtain
Now, the bias can be estimated using provided that the remaining term in the earlier expression is sufficiently small to be disregarded.
where is the actual/true AUC and represents the bias (loss of information/accuracy). Thus, the approximate bias in using to estimate is
Therefore, the bias-corrected approximation for is
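The additive contamination model introduced at the start of this subsection can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's estimator: it uses the nonparametric empirical AUC rather than the parametric plug-in estimator, independent zero-mean normal errors with a common standard deviation `me_sd`, and hypothetical parameter values. It only shows the direction of the effect, namely that adding errors to both markers pulls the estimated AUC downward.

```python
import bisect
import random


def empirical_auc(healthy, diseased):
    """P(diseased > healthy) computed via sorting; ties count as 1/2."""
    ordered = sorted(healthy)
    wins = 0.0
    for d in diseased:
        below = bisect.bisect_left(ordered, d)
        upto = bisect.bisect_right(ordered, d)
        wins += below + 0.5 * (upto - below)
    return wins / (len(healthy) * len(diseased))


rng = random.Random(7)
n = 4000
sigma_h, scale_d, me_sd = 1.0, 3.0, 2.0  # hypothetical values

# Error-free markers: Half-Normal (healthy) and Exponential with mean scale_d (diseased).
x_true = [abs(rng.gauss(0.0, sigma_h)) for _ in range(n)]
y_true = [rng.expovariate(1.0 / scale_d) for _ in range(n)]

# Observed markers: true value plus an independent additive normal error.
x_obs = [x + rng.gauss(0.0, me_sd) for x in x_true]
y_obs = [y + rng.gauss(0.0, me_sd) for y in y_true]

auc_clean = empirical_auc(x_true, y_true)
auc_noisy = empirical_auc(x_obs, y_obs)
print(round(auc_clean, 3), round(auc_noisy, 3))
```

Because the magnitude of the attenuation depends on the estimator and the parameterization, the numbers printed here are not comparable to the values reported in the tables.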
2.3.2 Estimation of AUC of HRROC curve under MEs
Based on the methodology presented in Section (2.3.1), the bias term for the HRROC curve under MEs is derived as follows
where,
Therefore, the bias-corrected estimator for is
3. Simulation Studies
An extensive simulation study was carried out to evaluate the performance of the proposed bias-corrected AUC estimators. Both hybrid ROC curves were assessed under two classification scenarios, best-case (well-separated populations) and worst-case (substantial overlap), across varying sample sizes and levels of MEs, thereby reflecting a broad range of classification challenges commonly encountered in practice. For each scenario, 1,000 Monte Carlo replications were performed to compute the AUC values and to estimate the bias and mean squared error (MSE) of the estimators.
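The bias and MSE computations over Monte Carlo replications can be sketched as follows. The example is illustrative only: it evaluates the uncorrected empirical AUC on contaminated Half-Normal/Exponential data, uses a closed-form true AUC that holds under an assumed scale parameterization, and runs fewer replications than the study to keep the runtime short. All names and values are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist, fmean


def empirical_auc(healthy, diseased):
    """P(diseased > healthy) computed via sorting; ties count as 1/2."""
    ordered = sorted(healthy)
    wins = 0.0
    for d in diseased:
        below = bisect.bisect_left(ordered, d)
        upto = bisect.bisect_right(ordered, d)
        wins += below + 0.5 * (upto - below)
    return wins / (len(healthy) * len(diseased))


def he_auc_closed(sigma_h, scale_d):
    """True AUC under the assumed Half-Normal/Exponential parameterization."""
    rho = sigma_h / scale_d
    return 2.0 * math.exp(rho ** 2 / 2.0) * (1.0 - NormalDist().cdf(rho))


def one_replication(rng, n, sigma_h, scale_d, me_sd):
    """Simulate one contaminated sample pair and return its estimated AUC."""
    healthy = [abs(rng.gauss(0.0, sigma_h)) + rng.gauss(0.0, me_sd)
               for _ in range(n)]
    diseased = [rng.expovariate(1.0 / scale_d) + rng.gauss(0.0, me_sd)
                for _ in range(n)]
    return empirical_auc(healthy, diseased)


rng = random.Random(11)
sigma_h, scale_d, me_sd = 1.0, 3.0, 2.0  # hypothetical values
true_auc = he_auc_closed(sigma_h, scale_d)
estimates = [one_replication(rng, 50, sigma_h, scale_d, me_sd)
             for _ in range(500)]

# Bias: average estimate minus the true value.
bias = fmean(estimates) - true_auc
# MSE: average squared deviation from the true value (variance + bias^2).
mse = fmean([(a - true_auc) ** 2 for a in estimates])
print(round(bias, 4), round(mse, 5))
```

Since the MSE decomposes into variance plus squared bias, it can never be smaller than the squared bias; this is a useful consistency check on any table of simulation results.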
3.1 Best classification scenario
The best-case classification scenario represents an ideal setting in which the underlying populations are well-separated, enabling the classification model to assign observations to their respective groups with high accuracy and reliability. In this context, the markers exhibit strong discriminatory power, and the overlap between the two distributions is minimal, resulting in optimal classification conditions. For the HEROC curve, sample observations were generated from Half-Normal and Exponential distributions, whereas for the HRROC curve samples were drawn from Half-Normal and Rayleigh distributions. In both cases, observations were simulated at sample sizes m = n = {25, 50, 100, 300, 500, 1000}. To simulate the presence of MEs in the data, errors were generated from a normal distribution at three levels (0.6, 1.5, and 2.0) to reflect realistic situations where the markers are subject to small instrument variability (mild), notable device or observer inconsistencies (moderate), and substantial external noise or environmental influence (severe). The errors were assumed independent between populations and were added to the original data to assess the performance of the bias-corrected estimators. The detailed simulation results for both ROC curves are presented in Tables 1 and 2.
Table 1. Simulation results for the HEROC curve under the best classification scenario.

| True AUC | ME level | m = n | Contaminated AUC | Bias (contaminated) | MSE (contaminated) | Bias-corrected AUC | Bias (corrected) | MSE (corrected) |
|---|---|---|---|---|---|---|---|---|
| 0.85248 | 0.6 | 25 | 0.51566 | -0.33682 | 0.11345 | 0.84001 | -0.01247 | 0.000156 |
| | | 50 | 0.51999 | -0.33249 | 0.11055 | 0.84584 | -0.00664 | 0.000044 |
| | | 100 | 0.52578 | -0.32670 | 0.10673 | 0.84631 | -0.00617 | 0.000038 |
| | | 300 | 0.53301 | -0.31947 | 0.10206 | 0.85105 | -0.00143 | 0.000002 |
| | | 500 | 0.53583 | -0.31665 | 0.10027 | 0.85220 | -0.00028 | 0.000000 |
| | | 1000 | 0.54948 | -0.30300 | 0.09181 | 0.85352 | 0.00104 | 0.000001 |
| | 1.5 | 25 | 0.62599 | -0.22649 | 0.05130 | 0.84117 | -0.01131 | 0.000128 |
| | | 50 | 0.63710 | -0.21538 | 0.04639 | 0.84425 | -0.00823 | 0.000068 |
| | | 100 | 0.63903 | -0.21345 | 0.04556 | 0.84539 | -0.00709 | 0.000050 |
| | | 300 | 0.64672 | -0.20576 | 0.04234 | 0.85112 | -0.00136 | 0.000002 |
| | | 500 | 0.64865 | -0.20384 | 0.04155 | 0.85132 | -0.00116 | 0.000001 |
| | | 1000 | 0.65270 | -0.19978 | 0.03991 | 0.85733 | 0.00485 | 0.000024 |
| | 2.0 | 25 | 0.60931 | -0.24317 | 0.05913 | 0.84416 | -0.00832 | 0.000069 |
| | | 50 | 0.62547 | -0.22701 | 0.05153 | 0.85028 | -0.00221 | 0.000005 |
| | | 100 | 0.62621 | -0.22627 | 0.05120 | 0.85040 | -0.00208 | 0.000004 |
| | | 300 | 0.62843 | -0.22405 | 0.05020 | 0.85225 | -0.00024 | 0.000000 |
| | | 500 | 0.63740 | -0.21508 | 0.04626 | 0.85228 | -0.00020 | 0.000000 |
| | | 1000 | 0.64093 | -0.21155 | 0.04476 | 0.85618 | 0.00370 | 0.000014 |

True AUC: actual AUC; Contaminated AUC: estimated AUC with ME; Bias-corrected AUC: AUC after the proposed correction.
Table 2. Simulation results for the HRROC curve under the best classification scenario.

| True AUC | ME level | m = n | Contaminated AUC | Bias (contaminated) | MSE (contaminated) | Bias-corrected AUC | Bias (corrected) | MSE (corrected) |
|---|---|---|---|---|---|---|---|---|
| 0.92538 | 0.6 | 25 | 0.64691 | -0.27847 | 0.07754 | 0.91035 | -0.01503 | 0.000226 |
| | | 50 | 0.65559 | -0.26979 | 0.07279 | 0.91165 | -0.01373 | 0.000189 |
| | | 100 | 0.69730 | -0.22808 | 0.05202 | 0.91269 | -0.01269 | 0.000161 |
| | | 300 | 0.69951 | -0.22586 | 0.05101 | 0.91376 | -0.01162 | 0.000135 |
| | | 500 | 0.70052 | -0.22486 | 0.05056 | 0.92036 | -0.00502 | 0.000025 |
| | | 1000 | 0.70127 | -0.22410 | 0.05022 | 0.92144 | -0.00394 | 0.000016 |
| | 1.5 | 25 | 0.64470 | -0.28068 | 0.07878 | 0.91373 | -0.01165 | 0.000136 |
| | | 50 | 0.71361 | -0.21177 | 0.04485 | 0.91418 | -0.01120 | 0.000125 |
| | | 100 | 0.73247 | -0.19291 | 0.03721 | 0.91985 | -0.00553 | 0.000031 |
| | | 300 | 0.73901 | -0.18636 | 0.03473 | 0.92200 | -0.00338 | 0.000011 |
| | | 500 | 0.74668 | -0.17869 | 0.03193 | 0.92462 | -0.00076 | 0.000001 |
| | | 1000 | 0.77495 | -0.15043 | 0.02263 | 0.92634 | 0.00096 | 0.000001 |
| | 2.0 | 25 | 0.71808 | -0.20730 | 0.04297 | 0.91120 | -0.01418 | 0.000201 |
| | | 50 | 0.72296 | -0.20241 | 0.04097 | 0.91195 | -0.01343 | 0.000180 |
| | | 100 | 0.73599 | -0.18939 | 0.03587 | 0.91413 | -0.01125 | 0.000127 |
| | | 300 | 0.74701 | -0.17837 | 0.03182 | 0.91413 | -0.01124 | 0.000126 |
| | | 500 | 0.75032 | -0.17506 | 0.03065 | 0.92133 | -0.00404 | 0.000016 |
| | | 1000 | 0.75099 | -0.17439 | 0.03041 | 0.92441 | -0.00097 | 0.000001 |

True AUC: actual AUC; Contaminated AUC: estimated AUC with ME; Bias-corrected AUC: AUC after the proposed correction.
The results across both hybrid ROC curves show that MEs lead to a notable downward bias in the estimated AUC, reflecting the loss of diagnostic accuracy. However, the proposed bias-corrected estimators effectively counteracted this distortion, yielding AUC estimates that closely approximated the true values across all sample sizes and contamination levels. For example, for the HEROC curve at sample size 100 and ME level 1.5, the estimated AUC is 0.63903, whereas the true AUC is 0.85248, demonstrating a significant loss of accuracy due to the presence of MEs. Applying the proposed bias-corrected estimator yields an AUC of 0.84539, close to the true AUC, with a minimal MSE of 0.000050. Similar trends were observed for all sample sizes: the contaminated AUC estimates were significantly biased downward, and the proposed estimator improved accuracy substantially. A graphical comparison of true and contaminated ROC curves across different sample sizes for both hybrid ROC curves is presented in Figs. 1 and 2. The solid lines represent the actual ROC curves, while the dotted lines denote the contaminated ROC curves at various sample sizes. The plots consistently illustrate that the presence of MEs causes a noticeable downward shift in the ROC curves, indicating reduced classification accuracy. In addition, Figs. 3 and 4 provide a graphical summary of bias patterns across different sample sizes under varying ME levels. These plots show that the proposed bias-corrected estimators exhibit minimal bias across all sample sizes compared to the contaminated ones, thereby demonstrating robustness in approximating the true diagnostic performance. This visual trend highlights the detrimental effect of measurement contamination and underscores the necessity of employing correction strategies to recover the true diagnostic performance.

Fig. 1. The true and contaminated HEROC curves at different sample sizes for the best classification.

Fig. 2. The true and contaminated HRROC curves at different sample sizes for the best classification.

Fig. 3. The bias comparison of contaminated and corrected AUC at different sample sizes for the best classification of HEROC.

Fig. 4. The bias comparison of contaminated and corrected AUC at different sample sizes for the best classification of HRROC.
3.2 Worst classification scenario
This setting represents a challenging diagnostic environment where the distributions of the healthy and diseased populations exhibit substantial overlap. For the HEROC curve, both populations were simulated with the same parameter value, 2.0, while for the HRROC curve both parameters were set to 0.9, so that the two populations overlap heavily. As in the best-case scenario, simulations were performed across the same range of sample sizes and error levels. The results are summarized in Tables 3 and 4.
Table 3. Simulation results for the HEROC curve under the worst classification scenario.

| True AUC | ME level | m = n | Contaminated AUC | Bias (contaminated) | MSE (contaminated) | Bias-corrected AUC | Bias (corrected) | MSE (corrected) |
|---|---|---|---|---|---|---|---|---|
| 0.55686 | 0.6 | 25 | 0.33451 | -0.22235 | 0.04944 | 0.54013 | -0.01673 | 0.000280 |
| | | 50 | 0.33592 | -0.22094 | 0.04881 | 0.54399 | -0.01288 | 0.000166 |
| | | 100 | 0.33624 | -0.22062 | 0.04867 | 0.54977 | -0.00709 | 0.000050 |
| | | 300 | 0.34553 | -0.21134 | 0.04466 | 0.55004 | -0.00683 | 0.000047 |
| | | 500 | 0.35108 | -0.20578 | 0.04235 | 0.55035 | -0.00651 | 0.000042 |
| | | 1000 | 0.35725 | -0.19961 | 0.03985 | 0.55431 | -0.00255 | 0.000006 |
| | 1.5 | 25 | 0.31277 | -0.24409 | 0.05958 | 0.54112 | -0.01574 | 0.000248 |
| | | 50 | 0.33115 | -0.22571 | 0.05095 | 0.54281 | -0.01405 | 0.000197 |
| | | 100 | 0.33773 | -0.21913 | 0.04802 | 0.55006 | -0.00680 | 0.000046 |
| | | 300 | 0.34068 | -0.21618 | 0.04673 | 0.55020 | -0.00666 | 0.000044 |
| | | 500 | 0.34089 | -0.21597 | 0.04664 | 0.55138 | -0.00548 | 0.000030 |
| | | 1000 | 0.36281 | -0.19405 | 0.03766 | 0.55592 | -0.00094 | 0.000001 |
| | 2.0 | 25 | 0.31292 | -0.24395 | 0.05951 | 0.54263 | -0.01424 | 0.000203 |
| | | 50 | 0.33628 | -0.22058 | 0.04866 | 0.55061 | -0.00625 | 0.000039 |
| | | 100 | 0.34106 | -0.21581 | 0.04657 | 0.55373 | -0.00313 | 0.000010 |
| | | 300 | 0.34725 | -0.20961 | 0.04394 | 0.55482 | -0.00204 | 0.000004 |
| | | 500 | 0.34984 | -0.20702 | 0.04286 | 0.55509 | -0.00177 | 0.000003 |
| | | 1000 | 0.35451 | -0.20235 | 0.04095 | 0.55827 | 0.00141 | 0.000002 |

True AUC: actual AUC; Contaminated AUC: estimated AUC with ME; Bias-corrected AUC: AUC after the proposed correction.
Table 4. Simulation results for the HRROC curve under the worst classification scenario.

| True AUC | ME level | m = n | Contaminated AUC | Bias (contaminated) | MSE (contaminated) | Bias-corrected AUC | Bias (corrected) | MSE (corrected) |
|---|---|---|---|---|---|---|---|---|
| 0.59643 | 0.6 | 20 | 0.29529 | -0.30114 | 0.09069 | 0.58208 | -0.01435 | 0.000206 |
| | | 50 | 0.32647 | -0.26996 | 0.07288 | 0.58373 | -0.01271 | 0.000161 |
| | | 100 | 0.33562 | -0.26081 | 0.06802 | 0.58695 | -0.00948 | 0.000090 |
| | | 300 | 0.35293 | -0.24351 | 0.05929 | 0.58828 | -0.00815 | 0.000066 |
| | | 500 | 0.35460 | -0.24184 | 0.05849 | 0.59479 | -0.00164 | 0.000003 |
| | | 1000 | 0.36081 | -0.23562 | 0.05552 | 0.59492 | -0.00151 | 0.000002 |
| | 1.5 | 20 | 0.28958 | -0.30685 | 0.09416 | 0.58870 | -0.00773 | 0.000060 |
| | | 50 | 0.29065 | -0.30578 | 0.09350 | 0.58992 | -0.00652 | 0.000042 |
| | | 100 | 0.29478 | -0.30165 | 0.09099 | 0.59411 | -0.00233 | 0.000005 |
| | | 300 | 0.30173 | -0.29470 | 0.08685 | 0.59534 | -0.00110 | 0.000001 |
| | | 500 | 0.32274 | -0.27370 | 0.07491 | 0.59561 | -0.00083 | 0.000001 |
| | | 1000 | 0.32384 | -0.27259 | 0.07430 | 0.59642 | -0.00001 | 0.000000 |
| | 2.0 | 20 | 0.31331 | -0.28312 | 0.08016 | 0.58338 | -0.01306 | 0.000170 |
| | | 50 | 0.32280 | -0.27363 | 0.07487 | 0.58428 | -0.01215 | 0.000148 |
| | | 100 | 0.33706 | -0.25937 | 0.06727 | 0.58517 | -0.01126 | 0.000127 |
| | | 300 | 0.34114 | -0.25530 | 0.06518 | 0.58679 | -0.00965 | 0.000093 |
| | | 500 | 0.34474 | -0.25169 | 0.06335 | 0.58914 | -0.00730 | 0.000053 |
| | | 1000 | 0.39316 | -0.20327 | 0.04132 | 0.59827 | 0.00184 | 0.000003 |

True AUC: actual AUC; Contaminated AUC: estimated AUC with ME; Bias-corrected AUC: AUC after the proposed correction.
Even under severe overlap, the proposed bias-corrected estimators demonstrated strong performance. While the contaminated AUC estimates were significantly biased downward due to the effect of overlapping distributions and ME, the bias-corrected values consistently exhibited minimal deviation from the true AUC. Notably, the advantage of bias correction became more pronounced with increasing sample size, as reflected in the reduced MSE. Figs. 5 and 6 illustrate how overlapping distributions, when combined with measurement errors, further degrade diagnostic performance and underscore the difficulty of maintaining classification accuracy under such adverse conditions. In particular, the contaminated ROC curves often displayed non-regular or improper shapes, deviating from the expected monotonicity of true ROC curves. This irregularity arises due to high overlap between the H and D populations. Such improper ROC behavior further emphasizes the need for bias correction to ensure reliable diagnostic inference. Figs. 7 and 8 further illustrate the bias trends across different sample sizes, showing that while the contaminated estimators remain heavily biased, the proposed bias-corrected estimators achieve minimal bias, even in worst-case conditions.

Fig. 5. The true and contaminated HEROC curves at different sample sizes for the worst classification.

Fig. 6. The true and contaminated HRROC curves at different sample sizes for the worst classification.

Fig. 7. The bias comparison of contaminated and corrected AUC at different sample sizes for the worst classification of HEROC.

Fig. 8. The bias comparison of contaminated and corrected AUC at different sample sizes for the worst classification of HRROC.
Across both hybrid ROC frameworks, the simulation findings confirm that MEs substantially distort the estimated AUC. The proposed bias-corrected estimators not only mitigate this distortion but also maintain high accuracy and efficiency across a range of sample sizes and contamination levels.
4. Real Dataset Analysis
The methodology was applied to several real datasets under both Hybrid ROC curves. The selected datasets span across medical and industrial domains, where MEs are commonly encountered and can adversely affect accuracy. The results corresponding to each distributional setting are presented and discussed in the following subsections.
4.1 HEROC curve under MEs
Here, we considered two datasets, namely, the Gallstone Dataset (GD) and the Stroke Prediction Dataset (SPD). The GD was obtained from the University of California, Irvine (UCI) Machine Learning Repository (Esen et al., 2024), while the SPD was sourced from Kaggle (Fedesoriano 2022).
Dataset 1: GD
This dataset consists of 319 individuals, including 161 individuals diagnosed with gallstone disease and 158 healthy individuals. The study specifically focused on the feature ‘Serum total cholesterol’, which plays a critical role in detecting the presence of gallstone disease.
Dataset 2: SPD
This dataset comprises data from 5110 individuals, of which 249 had experienced a stroke and 4861 were classified as non-stroke cases. Among the available clinical variables, the feature ‘average glucose level’ was selected for analysis, as it serves as a key indicator in assessing stroke risk.
Since real datasets with explicit MEs are difficult to obtain, we simulated measurement errors at a fixed level and added them to the respective datasets. This produces a setting in which the variables in each dataset are affected by MEs. The distributional adequacy of the contaminated observations was assessed using the Kolmogorov-Smirnov test, and the resulting p-values are reported in Table 5. All reported p-values were above 0.05, indicating that the assumed distributions were not rejected for the error-contaminated observations.
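A goodness-of-fit check of this kind can be sketched with a hand-written one-sample Kolmogorov-Smirnov test. The example below is an illustration under stated assumptions rather than the procedure used in the study: it tests synthetic data against a Half-Normal distribution whose scale is estimated by maximum likelihood, and it uses the asymptotic Kolmogorov series with a common finite-sample adjustment for the p-value. The data, seed, and function names are hypothetical.

```python
import math
import random


def halfnormal_cdf(x, sigma):
    """CDF of a Half-Normal(sigma) variable; zero for non-positive x."""
    return math.erf(x / (sigma * math.sqrt(2.0))) if x > 0 else 0.0


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, value in enumerate(ordered):
        f = cdf(value)
        # Largest gap between the empirical CDF steps and the model CDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution series."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


rng = random.Random(3)
sample = [abs(rng.gauss(0.0, 1.5)) for _ in range(300)]  # hypothetical marker values

# Maximum likelihood estimate of the Half-Normal scale: sqrt(mean(x^2)).
sigma_hat = math.sqrt(sum(v * v for v in sample) / len(sample))
d_stat = ks_statistic(sample, lambda v: halfnormal_cdf(v, sigma_hat))
p_value = ks_pvalue(d_stat, len(sample))
print(round(d_stat, 4), round(p_value, 4))
```

When the scale is estimated from the same data that are tested, the standard Kolmogorov-Smirnov p-value tends to be conservative, so a value above 0.05 is best read as a lack of evidence against the model and not as proof of fit.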
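Table 5. Kolmogorov-Smirnov test results for the contaminated observations under the HE model.

| Distribution type | Dataset | p-value | Fit of the distribution |
|---|---|---|---|
| Half-normal (healthy) | GD | 0.43913 | Yes |
| Half-normal (healthy) | SPD | 0.49481 | Yes |
| Exponential (diseased) | GD | 0.47108 | Yes |
| Exponential (diseased) | SPD | 0.52819 | Yes |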
The introduction of MEs in both datasets resulted in a notable downward bias in the estimated AUC, thereby degrading classification performance. The corresponding results are summarized in Table 6. For instance, in the GD, the true AUC was 0.75031, which dropped to 0.58344 upon contamination. However, after applying the proposed approximation, the corrected AUC closely approximated the true value, with substantially reduced bias and MSE. A similar pattern was observed in the SPD, further confirming the effectiveness of the proposed methodology in mitigating the impact of MEs. To assess the statistical significance of the difference between the contaminated and corrected AUCs, paired bootstrap resampling with 1000 replications was performed, and the corresponding confidence intervals are also reported in Table 6. For the GD, the improvement in discriminatory ability (corrected minus contaminated AUC) was 0.1587 (95% CI: 0.1204, 0.1946), while for the SPD the improvement was even more pronounced at 0.3568 (95% CI: 0.3117, 0.4029). In both cases, the confidence intervals exclude zero, confirming that the improvements are statistically significant.
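A paired bootstrap of an AUC difference can be sketched as below. This is a minimal illustration, not the study's implementation: because the corrected estimator's formula is not reproduced here, the error-free synthetic markers stand in for the "corrected" version, the empirical AUC is used for both versions, and a percentile interval is reported. The pairing comes from reusing the same resampled subject indices for both versions of the data. All names, seeds, and values are hypothetical.

```python
import bisect
import random


def empirical_auc(healthy, diseased):
    """P(diseased > healthy) computed via sorting; ties count as 1/2."""
    ordered = sorted(healthy)
    wins = 0.0
    for d in diseased:
        below = bisect.bisect_left(ordered, d)
        upto = bisect.bisect_right(ordered, d)
        wins += below + 0.5 * (upto - below)
    return wins / (len(healthy) * len(diseased))


def paired_bootstrap_ci(h_a, d_a, h_b, d_b, n_boot=500, alpha=0.05, seed=0):
    """Percentile CI for AUC(b) - AUC(a), resampling the same subjects for both."""
    rng = random.Random(seed)
    m, n = len(h_a), len(d_a)
    diffs = []
    for _ in range(n_boot):
        h_idx = [rng.randrange(m) for _ in range(m)]
        d_idx = [rng.randrange(n) for _ in range(n)]
        auc_a = empirical_auc([h_a[i] for i in h_idx], [d_a[j] for j in d_idx])
        auc_b = empirical_auc([h_b[i] for i in h_idx], [d_b[j] for j in d_idx])
        diffs.append(auc_b - auc_a)
    diffs.sort()
    lower = diffs[int(round((alpha / 2) * n_boot))]
    upper = diffs[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


rng = random.Random(5)
size = 200
h_true = [abs(rng.gauss(0.0, 1.0)) for _ in range(size)]
d_true = [rng.expovariate(1.0 / 3.0) for _ in range(size)]
# Contaminated versions of the same subjects (additive normal errors).
h_obs = [v + rng.gauss(0.0, 2.0) for v in h_true]
d_obs = [v + rng.gauss(0.0, 2.0) for v in d_true]

ci_lower, ci_upper = paired_bootstrap_ci(h_obs, d_obs, h_true, d_true)
print(round(ci_lower, 4), round(ci_upper, 4))
```

An interval that lies entirely above zero would indicate that the second version has a higher AUC than the first at the chosen confidence level.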
Table 6. AUC estimates for the real datasets under the HEROC curve.

| Dataset | True AUC | Contaminated AUC | Bias (contaminated) | MSE (contaminated) | Bias-corrected AUC | Bias (corrected) | MSE (corrected) | AUC difference (95% CI) |
|---|---|---|---|---|---|---|---|---|
| GD | 0.75031 | 0.58344 | -0.16687 | 0.02785 | 0.74213 | -0.00818 | 0.000067 | 0.1587 (0.1204, 0.1946) |
| SPD | 0.67317 | 0.31210 | -0.36107 | 0.13037 | 0.66887 | -0.00430 | 0.000018 | 0.3568 (0.3117, 0.4029) |

True AUC: actual AUC; Contaminated AUC: estimated AUC with ME; Bias-corrected AUC: AUC after the proposed correction; AUC difference: bias-corrected minus contaminated AUC, with 95% CI.
Furthermore, a visual representation of the true and contaminated ROC curves for both datasets is given in Fig. 9. The contaminated ROC curves clearly deviate from the original curves, indicating the poor performance of the classifiers due to measurement distortion.

Fig. 9. True and contaminated HEROC curves for the real datasets.
4.2 HRROC curve under MEs
Two datasets were again considered to examine the proposed methodology under the Half-Normal & Rayleigh distribution framework: the Predictive Maintenance Dataset (PMD) (Agarwal 2022) and the Weather Forecast Dataset (WFD) (Ahmad 2022), both obtained from Kaggle.
Dataset 1: PMD
This dataset consists of 1000 samples, each representing an operational time interval of industrial machinery. Among these, 562 instances were labeled as failure events, while 438 represented non-failure states. The analysis focused on the sensor-based vibration reading, which plays a vital role in detecting early signs of mechanical failure in rotating equipment and is highly susceptible to measurement noise in real-world monitoring systems.
Dataset 2: WFD
This dataset consists of approximately 2,500 daily weather observations. Each record includes key meteorological variables such as temperature, humidity, pressure, wind speed, and a binary label indicating rainfall occurrence. For this study, daily average temperature was selected as the continuous variable because of its known sensitivity to sensor noise and its critical role in precipitation forecasting.
As in the case of the HEROC curve, MEs were incorporated into each dataset at the same ME level, 0.6. To check the goodness of fit of the contaminated data, we applied the Kolmogorov-Smirnov test to each dataset, and the obtained p-values are reported in Table 7. As in the previous case, all p-values were greater than 0.05, indicating that the contaminated observations remained consistent with the assumed Half-Normal and Rayleigh distributions. The corresponding AUC estimates, both contaminated and bias-corrected, are summarized in Table 8.
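Table 7. Kolmogorov-Smirnov test results for the contaminated observations under the HR model.

| Distribution type | Dataset | p-value | Fit of the distribution |
|---|---|---|---|
| Half-Normal (Healthy) | PMD | 0.36720 | Yes |
| Half-Normal (Healthy) | WFD | 0.50245 | Yes |
| Rayleigh (Diseased) | PMD | 0.44352 | Yes |
| Rayleigh (Diseased) | WFD | 0.47682 | Yes |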
Table 8. AUC estimates for the real datasets under the HRROC curve.

| Dataset | True AUC | Contaminated AUC | Bias (contaminated) | MSE (contaminated) | Bias-corrected AUC | Bias (corrected) | MSE (corrected) | AUC difference (95% CI) |
|---|---|---|---|---|---|---|---|---|
| PMD | 0.72310 | 0.41002 | -0.31308 | 0.09802 | 0.71002 | -0.01308 | 0.000171 | 0.3000 (0.2917, 0.3083) |
| WFD | 0.59042 | 0.34012 | -0.25030 | 0.06265 | 0.58121 | -0.00921 | 0.000085 | 0.2411 (0.2339, 0.2483) |

True AUC: actual AUC; Contaminated AUC: estimated AUC with ME; Bias-corrected AUC: AUC after the proposed correction; AUC difference: bias-corrected minus contaminated AUC, with 95% CI.
Here too, a similar pattern was observed. Across both datasets, the bias-corrected AUCs were notably closer to the true AUC values. The bootstrap confidence intervals indicate that the improvements in AUC were statistically significant for both datasets. Fig. 10 illustrates the HRROC curves for the two datasets, clearly showing the impact of MEs on classification performance. The contaminated ROC curves exhibit a visible deviation from the true curves, indicating a loss of discriminative ability.

Fig. 10. True and contaminated HRROC curves for the real datasets.
The findings across both the HE and HR ROC frameworks clearly indicate that the presence of MEs compromises the accuracy of diagnostic evaluation, resulting in a downward bias in the estimated AUC. The proposed bias-corrected estimators effectively address this issue, producing AUC estimates that are substantially closer to the true values, with notably reduced bias and MSE.
5. Conclusions
This study addresses a crucial yet often overlooked challenge in ROC analysis—the distortion of diagnostic accuracy due to MEs, particularly within hybrid distributional settings. While traditional AUC estimation techniques assume symmetric or identical distributions for healthy and diseased populations, such assumptions are frequently violated in real-world data, where diagnostic markers often exhibit non-normal characteristics. To overcome this limitation, we developed a generalized bias-corrected approach for AUC estimation under hybrid ROC curves involving Half-Normal & Exponential and Half-Normal & Rayleigh distributions in the presence of MEs.
Extensive simulation studies were conducted to assess the performance of the proposed estimators under both ideal (well-separated) and challenging (highly overlapping) classification scenarios. The results consistently indicate that AUC estimates derived from contaminated data are biased downward. In contrast, the bias-corrected estimators demonstrate high accuracy, exhibiting minimal bias and MSE across various levels of MEs. The estimators were also applied to multiple real datasets. In all cases, the corrected estimators outperformed their uncorrected counterparts, yielding AUC values that closely approximated the true diagnostic accuracy and thereby demonstrating the robustness of the proposed methodology.
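The downward bias described above can be reproduced with a small simulation. The sketch below is illustrative only: it assumes the healthy population follows a Half-Normal distribution and the diseased population a Rayleigh distribution, that the ME is additive, normal, and independent of the true scores, and it uses the nonparametric Mann-Whitney AUC instead of the paper's parametric estimator. The parameter values are arbitrary choices, not those of the reported study.

```python
import numpy as np

def empirical_auc(healthy, diseased):
    """Mann-Whitney estimate of P(diseased > healthy), ties counted as 1/2."""
    h = np.sort(healthy)
    less = np.searchsorted(h, diseased, side="left")
    less_eq = np.searchsorted(h, diseased, side="right")
    return (less + 0.5 * (less_eq - less)).sum() / (h.size * diseased.size)

rng = np.random.default_rng(0)
n = 20000
sigma_h, sigma_r, sigma_e = 1.0, 2.0, 1.5   # assumed scale and ME parameters

# Error-free scores
healthy = np.abs(rng.normal(0.0, sigma_h, n))   # Half-Normal
diseased = rng.rayleigh(sigma_r, n)             # Rayleigh

# Scores contaminated with additive normal measurement error
healthy_me = healthy + rng.normal(0.0, sigma_e, n)
diseased_me = diseased + rng.normal(0.0, sigma_e, n)

auc_clean = empirical_auc(healthy, diseased)
auc_me = empirical_auc(healthy_me, diseased_me)
print(f"AUC without ME: {auc_clean:.4f}")
print(f"AUC with ME:    {auc_me:.4f}")
print(f"Bias due to ME: {auc_me - auc_clean:.4f}")
```

Increasing `sigma_e` in this setup widens the overlap between the two observed score distributions, so the contaminated AUC moves further below the error-free value.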
In conclusion, the proposed framework offers a reliable and flexible solution for ROC analysis under MEs. It enhances diagnostic reliability and provides a strong foundation for future extensions in ROC methodology, especially in settings where distributional asymmetry and data contamination are prevalent.
CRediT authorship contribution statement
Danisiri Tanuja: Conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, writing – original draft preparation. Siva G: Conceptualization, supervision, validation, writing – review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Declaration of generative AI and AI-assisted technologies in the writing process
The authors confirm that there was no use of artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript and no images were manipulated using AI.
References
- Agarwal, H. 2022. Predictive maintenance dataset. Kaggle. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
- Ahmad, Z. 2022. Weather forecast dataset. Kaggle. https://www.kaggle.com/datasets/zeeshier/weather-forecast-dataset
- Interface between the ratio β with area under the ROC curve and Kullback-Leibler divergence under the combination of half normal and Rayleigh distributions. Am J Biostatistics. 2015;5:69-77. https://doi.org/10.3844/amjbsp.2015.69.77
- Estimation of confidence intervals of a GHROC curve in the presence of scale and shape parameters. Res J Math Stat Sci. 2015;3:4-11. https://www.isca.me/MATH_SCI/Archive/v3/i10/2.ISCA-RJMSS-2015-042.pdf
- Confidence interval estimation of an ROC curve: an application of generalized half normal and Weibull distributions. J Probability Statistics. 2015;1:1-8. https://doi.org/10.1155/2015/934362
- An anthology of parametric ROC models. Research and Reviews: J Statistics. 2016;5:32-46. https://sciencejournals.stmjournals.in/index.php/RRJoST/article/view/3577
- The hybrid ROC (HROC) curve and its divergence measures for binary classification. IJSMR. 2015;4:94-102. https://doi.org/10.6000/1929-6029.2015.04.01.11
- The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol. 1975;12:387-415. https://doi.org/10.1016/0022-2496(75)90001-2
- An application of lomax distributions in receiver operating characteristic (ROC) curve analysis. Commun Stat-Theory Methods. 1993;22:1681-1687. https://doi.org/10.1080/03610929308831110
- Parametric approach to measurement errors in receiver operating characteristic studies. In: Lifetime data: Models in reliability and survival analysis. Boston, MA: Springer US; p. 71-75. https://doi.org/10.1007/978-1-4757-5654-8_11
- Receiver operating characteristic studies and measurement errors. Biometrics. 1997;53:823-837. https://doi.org/10.2307/2533545
- Design and analysis of reliability studies: The statistical evaluation of measurement errors. Edward Arnold Publishers; 1989.
- Signal detection theory and ROC-analysis. Academic press; 1975.
- An exponential model used for optimal threshold selection on ROC curves. Med Decis Making. 1988;8:120-131. https://doi.org/10.1177/0272989X8800800208
- Gallstone. UCI Machine Learning Repository. 2024. https://doi.org/10.1097/md.0000000000037258
- The effect of random measurement error on receiver operating characteristic (ROC) curves. Stat Med. 2000;19:61-70. https://doi.org/10.1002/(sici)1097-0258(20000115)19:1<61::aid-sim297>3.0.co;2-a
- Fedesoriano. 2022. Stroke prediction dataset. Kaggle. https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
- Estimating the AUC of the MROC curve in the presence of measurement errors. Commun Stat-Theory Methods. 2022;29:533-545. https://doi.org/10.29220/csam.2022.29.5.533
- The Bi-Gamma ROC curve in a straightforward manner. J Basic Appl Sci. 2012;8:309-314. https://doi.org/10.6000/1927-5129.2012.08.02.09
- SIMEX approaches to measurement error in ROC studies. Commun Stat-Theory Methods. 2000;29:2473-2491. https://doi.org/10.1080/03610920008832617
- Measuring the effectiveness of diagnostic markers in the presence of measurement error through the use of ROC curves. Stat Med. 2000;19:2115-2129. https://doi.org/10.1002/1097-0258(20000830)19:16<2115::aid-sim529>3.0.co;2-m
- Principles of mathematical analysis (3rd edition). New York: McGraw-Hill; 1976.
- Statistical inference for the area under the receiver operating characteristic curve in the presence of random measurement error. Am J Epidemiol. 2001;154:174-179. https://doi.org/10.1093/aje/154.2.174
- Estimating the AUC of mixture ROC curve in the presence of measurement errors. Stat Appl. 2022;21:41-49. https://ssca.org.in/media/5_SA110122021_R1_SA_16022022_Vishnu_Vardhan_Mix_ME_Univariate_Finally_Final.pdf
- Estimating the AUC of mixture MROC curve in the presence of measurement errors. MAS. 2023;18:237-244. https://doi.org/10.3233/mas-231432
- A flexible method for diagnostic accuracy with biomarker measurement error. Math (Basel). 2023;11:549. https://doi.org/10.3390/math11030549
