7.2
CiteScore
3.7
Impact Factor
Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Search in posts
Search in pages
Filter by Categories
ABUNDANCE ESTIMATION IN AN ARID ENVIRONMENT
Case Study
Correspondence
Corrigendum
Editorial
Full Length Article
Invited review
Letter to the Editor
Original Article
Retraction notice
REVIEW
Review Article
SHORT COMMUNICATION
Short review
7.2
CiteScore
3.7
Impact Factor
Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Search in posts
Search in pages
Filter by Categories
ABUNDANCE ESTIMATION IN AN ARID ENVIRONMENT
Case Study
Correspondence
Corrigendum
Editorial
Full Length Article
Invited review
Letter to the Editor
Original Article
Retraction notice
REVIEW
Review Article
SHORT COMMUNICATION
Short review
View/Download PDF

Translate this page into:

Original article
32 (
8
); 3470-3475
doi:
10.1016/j.jksus.2020.10.009

A semi-parametric estimator of the quantile residual life for heavily censored data

Department of Statistics and Operations Research, College of Science, King Saud University, Riyadh, Saudi Arabia

⁎Corresponding author, Second address: Department of Mathematics and Computer Science, Faculty of Science, Suez University, Suez, Egypt. drkayid@ksu.edu.sa (M. Kayid),

Disclaimer:
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.

Peer review under responsibility of King Saud University.

Abstract

The p-quantile residual life function summarizes the lifetime data in a useful and simple concept and in units of time. For uncensored data or when the upper tail of the observations is not censored, this function can be estimated by applying the well-known Kaplan-Meier survival estimator. But, when research terminates in heavy right-censored lifetime data which is the case of many biomedical and survival studies, the p-quantile residual life function is not estimable in this way. In this paper, we propose a novel semi-parametric estimator of the p-quantile residual life function in such cases. It combines the nonparametric Kaplan-Meier survival estimator with an approximated tail model motivated by the extreme value theory. The proposed estimator has been examined by a simulation study and applied to a lifetime data set in the sequel.

Keywords

62N86
Right censorship
Kaplan-Meier estimator
Generalized Pareto model
Extreme value theory
PubMed
1

1 Introduction

In many fields such as epidemiology, biology, medicine, and survival analysis, the researcher's interest is on time to event data, e.g., the survival time of a creature or time of tumor recurrence. The most familiar measure for induction and analyzing such data is the survival function, which for every time t ≥ 0 computes the probability of the event to occur beyond time t. The p-quantile residual life (p-QRL) function, 0 < p < 1, is another relevant measure in this context providing an intuitive meaning. For example, in the case p = 0.5, we have a median residual life which at time t captures the remaining time that half of the survived population at t will experience the event. This fact that unlike the survival function, the p-QRL is expressed in the time units by which the observations are measured makes its interpretation easier.

Besides, in reliability analysis, the p-QRL measure is quite useful for describing the lifetime of manufactured devices. For instance, it is very likely for some devices to fail in the early stages of their work. It is the case especially when the failure rate has a bathtub shape. Then we can consider a burn-in time t₀ that every produced device should pass before releasing to field operation. We can find such t₀ that maximizes the p-QRL function. For more details, we refer to Conboy et al. (2020).

Sometimes, time events may be invisible due to some censoring mechanism which cannot be naturally avoided. During the study, some items may be lost to follow-up before experiencing the event or reaching the end-time of the study. They are said to be right-censored. However, items passing the end-time of the study are referred to be Type I censored.

Given a right-censored data set, the survival function can be estimated using Kaplan-Meier (KM) estimator proposed by Kaplan and Meier (1958). The KM survival plot, which summarizes the possibly censored data graphically, has been vastly used in the aforementioned areas. On the other hand, taking account of right censoring, many authors focused on the problem of estimating the p-QRL function. Among them, we refer to Jeong et al. (2008), Franco-Pereira and de Uña-Álvarez (2013), Jeong and Fine (2013), Lin et al. (2015), Zhang et al. (2015), and Lin et al. (2016).

One drawback in right-censored data corresponds to the case that the upper tail and especially last observations are censored. In this case, the KM survival estimator does not vanish to zero in the support tail. Thus, the inverse of the survival function at small values will not be estimable by the KM estimator. Therefore, we cannot estimate the p-QRL function especially at large values (cf. Franco-Pereira and de Uña-Álvarez (2013)). The results of many types of research consist of highly censored data sets in which major of their large observations are censored. For such data sets, we may not estimate the p-QRL function, namely, qp(t), (for example, the median residual life) even at rather small ts. Therefore, our idea is to provide an approach to estimate the p-QRL function for all t values.

In this paper, we propose a new semi-parametric method for estimating the p-QRL function that overcomes the problem of dealing with high censored lifetimes. It applies the KM estimator of the survival function in a proper threshold time u, and the generalized Pareto distribution (GPD) as an approximated tail model (Coles (2001)). The approximation of the tail model is motivated by the results of the extreme value theory, refer to Castillo et al. (2005) and Coles (2001). Then, the uncensored observations in the tail (which we suppose to be greater than the threshold u) are used to provide the maximum likelihood estimation of the model parameters.

The paper has been organized as follows. Section 2 provides preliminaries and states the problem. In Section 3, the new estimator of the p-QRL function has been proposed. The attributes of this estimator are investigated through a simulation study in Section 4. In Section 5, the results of a research investigation of the effect of some treatments on the colon recurrence time are considered. Then the first quartile residual life (0.25-QRL) and the median residual life functions are estimated. Finally, a conclusion is drawn in Section 6.

2

2 Preliminaries: Non-parametric estimation of p-QRL

Let the random variable T which represents the lifetime of an object follows the survival function S(t)=P(T>t). The p-QRL function is defined as

(1)
qp(t)=S-1(p-S(t))-t,t0,where S-1(α)=inf{x:S(x)α} is the inverse function of S and p-=1-p. Let Ti, i=1,2,,n stand for n independent realizations of T and are right-censored by random variable Ci, i.e., Ti will be observable if TiCi. Let Xi=min{Ti,Ci} and δi=I(TiCi). We count the number of items failed up to or at time t byN-(t)=i=1nNi(t),where Ni(t)=I(Xit, δi=1) and the number of items at risk at t byY-(t)=i=1nYi(t),where Yi(t)=I(Xit). Then the KM estimator of the survival function is given byS^(t)=st(1-ΔN-sY-s),t0,where ΔN-s=N-s-N-s-which represents the number of failures at time s. When P(Xit)=πi(t)=π(t) we haveE(S^(t)-S(t))(1-S(t))1-π(t)n,which shows that the KM estimator is asymptotically unbiased and has a sharper slope for earlier times and/or in the case of greater censoring variable. See Fleming and Harrington (1991) for more details. We can estimate the p-QRL function by
(2)
q^p(t)=S^-1(p-S^(t))-t,tX,
where X be the largest uncensored observation and S^ is the KM survival function estimator. It is asymptotically consistent and under proper normalization converges in distribution to a zero-mean normal process. For more information, refer to Franco-Pereira and de Uña-Álvarez (2013).

When the last observation has not been censored, the estimator (2) is well-defined for all tX. Otherwise, it is not defined for all values t>t where t stands for S^-11p-S^(X). When t>t, p-S^(t)<S^(X) that is p-S^(t) falls below the computable range of the inverse of KM survival function. Fig. 1 explains the issue graphically.

This illustrative plot shows that when large observations have been censored the QRL function qp (t) is not estimable for t>t∗ by the inverse of the KM survival function. Here, p∗ and p0 equal to 1p S^(X∗) and S^(X∗) respectively.
Fig. 1
This illustrative plot shows that when large observations have been censored the QRL function qp (t) is not estimable for t>t by the inverse of the KM survival function. Here, p and p0 equal to 1p S^(X) and S^(X) respectively.

3

3 Semi-parametric estimation of p-QRL

This section aims to propose a semi-parametric estimator of qp(t) for t>t. Note that if 0<p2<p0<p1<1, then

(3)
S-1(p2)=u+Su-1(p2/p1),where S(u)=p1 and Su(t)=P(T-u>t | T>u) shows the survival of the remaining lifetime given survival to time u. Moreover, let u be properly given and p2 be p-S(t), then we estimate p1 and p2 by the KM estimate S^(u) and p-S^(t). So, to estimate S-1(p2) it is sufficient to estimate the survival function Su-1. In the following, we will argue that when u is sufficiently large, Su can be approximated by the GPD. This approximation is motivated by an asymptotic result concerning the weak convergence of the sample maximum to the generalized extreme value distribution (GEV). Therefore, our proposed estimation of qp(t) combines the nonparametric KM estimation of the survival function u with the maximum likelihood estimation of the GPD model.

Consider an arbitrary sequence of i.i.d. random variables T1,T2,… following the same distribution of T and let Mn be maximum of the first n elements of this sequence. Let there exist sequences an>0 and bn of constants such that the normalized random sequence an(Mn-bn) converges weakly to non-degenerate distribution G. Then, G accommodates GEV

(4)
G(x)=exp-1+ξx-μσ-1ξ,with the support {x:1+ξx-μσ>0} where ξ, μR and σ>0. This result ensures that for some sufficiently large but fixed n, the approximationFTnxan+bnG(x),where G is given in (4) and FT is the distribution function of the lifetime T. So FTn shows the distribution function of the largest order statistics of i.i.d. sample of lifetimes with size n. Note that bn which centralizes growing maximum increases with n and makes the whole expression xan+bn to be larger than some threshold u. By taking t=xan+bn we haveFT(t)G1n(an(t-bn)),tu.

On the other hand, the max-stability property of the GEV model states that G1n(an(t-bn)) equals with G(an'(t-bn')). Thus, for tu, FT(t) belongs to this family too, notationally, FT(t)=G(t), tu. In the light of the preceding discussion, we can approximate Su(t) for some threshold u by:Su(t)=S(t+u)S(u)G-(t+u)G-(u)=1-exp-1+ξu+t-μσ-1ξ1-exp-1+ξx-μσ-1ξ.

Since u is sufficiently large, we have1-exp-1+ξu+t-μσ-1ξ1-exp-1+ξx-μσ-1ξ1+ξu+t-μσ-1ξ1+ξx-μσ-1ξ.

Then, by simplifying the right side of this approximation, it follows that

(5)
Su(t)1+ξtσ-1ξ,1+ξtσ0,t0,where σ=σ+ξ(u-μ). This relation shows that Su(t) approximately follows a GPD for sufficiently large u. Depending on the shape parameter ξ, three distinct cases can be described by this model.
  • ξ<0: light tail with finite upper bound -σuξ.

  • ξ=0: exponential tail.

  • ξ>0: heavy tail.

When ξ>1s, the sth moment of GPD is infinite. So, when we are confident of finite mean and/or variance it may be useful to restrict the parameter space by 0<ξ1and/or0.5. However, such restriction is not recommended in a neat semi-parametric framework. For GDP(5), the inverse of the survival function equals with

(6)
Su-1(p)=σξp-ξ-1,0<p<1.

To implement the method for censored data, let Ti, Ci' and T be true survival time, random right censoring time, and a constant time presenting end of the study, respectively. Taking Ci=min(Ci',T), we observe Xi=min(Ti,Ci) and δi=I(TiCi). To estimate qp(t) for t>t, we suggest the following steps.

  • At first we should select a proper value for u which is a trade-off between the accuracy of the tail model approximation and the share of the observations available for estimating the model parameters. Large values of u improve the approximation of the model but limit the volume of the observations for estimation of the parameters. Yet, there is not any standard approach for finding an optimum value for u. Nevertheless, it seems that the 80 percent quantile of the observed (uncensored) times be a proper value for threshold u. That is 20 percent of observed and uncensored events lie between u and T (see Alvarez-Iglesias et al. (2015) for a similar discussion).

  • Construct new sample xi=ti-u for all censored or uncensored observations ti grater than u and let k denote the count of them. Applying this data set, we obtain the maximum likelihood estimation of the parameters of GPD model (5). The likelihood function is

L(ξ,σ)=i=11σ1+ξxiσ-1ξ-1δii=1k1+ξxiσ-1ξ1-δi,where δi equals 1 for uncensored items and zero for censored ones. We can find the maximum likelihood estimation ξ^ and σ^ by maximizing this expression numerically.
  • Now, in the light of Relations (3) and (6), we propose the estimator

q^p(t)=u+S^u-1p-S^(t)S^(u)-t
(7)
=u+σ^ξ^p-S^(t)S^(u)-ξ^-1-t,ttt(n),
where S^ refers to KM estimator of S and σ^ and ξ^ show the maximum likelihood estimations.

Variance (bias) of this estimator is comprised of variation (bias) due u, and S^u-1(p-S^(t)/S^(u)) along with their covariance and seems to be more complicated than to be expressed by a closed expression. In the case of the heavy tail of the true lifetime model which causes that ξ^>0, it’s variance increases with t. Fortunately, we can use the bootstrap method to estimate its bias and variance and in turn approximate confidence intervals. However, simulation studies heuristically imply that bias and variance reduce strongly by sample size.

4

4 Simulation study

To design a simulation framework, we consider gamma, log-normal, and Weibull models along with four censoring schemas. Both right censoring and Type I censoring have been taken in account. Once the model and the censoring scheme determined, r=100 replicates of samples of sizes n=500 and 1000 have been drawn and censored. Then for two values p=0.5 (corresponding to the median residual life function) and p=0.75 the proposed estimator q^p(t) has been computed by each of r replicates. For each replicates the bias is computed by the difference q^p(t)-qp(t). We report their mean and standard deviation as bias and sd in Tables 1–3. In addition, Mq^p that shows the mean of the estimation values q^p(t) has been entered in the tables.

Table 1 Simulation results for the log-normal (1, 0.5).
n p (0.1,0.3) (0.2, 0.2) (0.2, 0.1) (0.1, 0.05)
500 Mq^p 2.5821 3.7561 2.6533 2.3134
0.75 bias 0.5342 1.7089 0.6085 0.2380
sd 1.9196 4.2724 2.4804 2.6121
Mq^p 1.2076 1.2915 1.1338 1.1159
0.50 bias 0.1893 0.2644 0.1124 0.0619
sd 0.5883 0.9855 0.5585 0.3956
1000 Mq^p 2.2879 2.2565 2.3828 2.0106
0.75 bias 0.2397 0.2049 0.3381 −0.0652
sd 1.4182 1.4428 2.0186 0.7124
Mq^p 1.0744 1.1491 1.0932 0.9911
0.50 bias 0.0562 0.0222 0.0718 −0.0337
sd 0.3949 0.4405 0.3540 0.2237
Table 2 Simulation results for the gamma (0.9, 2).
n p (0.1, 0.3) (0.2, 0.2) (0.2, 0.1) (0.1, 0.05)
500 Mq^p 0.5533 0.5014 0.4852 0.4907
0.75 bias 0.1305 0.0832 0.0631 0.0586
sd 0.3290 0.3003 0.2453 0.2078
Mq^p 0.2460 0.2494 0.2369 0.2276
0.50 bias 0.0364 0.0426 0.0276 0.0126
sd 0.1427 0.1129 0.0778 0.0809
1000 Mq^p 0.5003 0.4807 0.4664 0.4528
0.75 bias 0.0773 0.0624 0.0438 0.0204
sd 0.2793 0.1903 0.1600 0.1349
Mq^p 0.2212 0.2128 0.2197 0.2334
0.50 bias 0.0114 0.0059 0.0103 0.0183
sd 0.0657 0.0609 0.0604 0.0608
Table 3 Simulation results for the Weibull (1.2, 3).
n p (0.1,0.3) (0.2,0.2) (0.2,0.1) (0.1,0.05)
500 Mq^p 5.3993 5.9644 4.5763 3.3337
0.75 bias 2.3480 2.8360 1.5028 0.4228
sd 7.5687 8.3373 5.9047 2.0737
Mq^p 1.8733 2.0957 1.8772 1.5307
0.50 bias 0.3073 0.4824 0.2973 0.0435
sd 1.1001 1.1999 0.9687 0.5527
1000 Mq^p 3.6093 3.9250 3.8343 3.1620
0.75 bias 0.5612 0.7976 0.7619 0.2489
sd 2.4968 3.4923 2.9527 1.0536
Mq^p 1.7904 1.9712 1.6409 1.5194
0.50 bias 0.2255 0.3588 0.0618 0.0333
sd 0.6970 0.9163 0.5111 0.3830

To provide censored random samples, some proper combinations of right censoring and Type I censoring have been selected. Let α1 and α2 represent the proportions of Type I censoring and random right censoring respectively. The simulation process starts withdrawing a sample of size n from the true model that is the distribution of T following survival function S. We take the distribution of the right censoring random variable C to be uniform on the interval (0,M). Moreover, according to type I censoring, the observations will be censored if they are greater than T. For fixed values α1 and α2, we should compute M and T through the equations P(C<min{T,T})=α2 and P({T<T<C}{T<C<T})=α1. It is straightforward to show that these equations can be simplified to the system equationsS(T)(M-T)=Mα1,T0S(x)dx=Mα2,which can be solved in terms of M and T by standard numerical methods. Then, we can pursue the procedure according to the bellow steps.

  • Generate a random sample t1,t2,,tn from the true model with the survival function S.

  • Generate a random sample c1,c2,,cn of uniform (0,M) as random censoring times.

  • Compute the observable data xi=min{ti,ci,T} and δi=I(tici,tiT) for i=1,2,,n.

  • Repeat steps 1 to 3 r times.

  • Let t0j be the maximum observed (uncensored) lifetime of the sample in the jth replication, j=1,2,,r. Then, take t0 to be the mean of t0j. Of course, t0 is suitable for applying in the simulation, since it is expected that S^-1(p-S^(t0)) will not be computed by the KM estimator S^.

  • For each replication, compute q^p(t0) introduced by (7).

Results of simulations have been gathered in Tables 1–3. All tables agree on the fact that the bias and sd values show a strong reduction from p=0.75 to 0.5.

5

5 Applications

Moertel et al (1995) reported a data set related to one trial for investigation of the effectiveness of Fluorouracil (5-FU) and Levamisole (Lev) in reducing the recurrence rate of stage B/C colon cancer. The trial involves three treatments for Observation (Obs), Levamisole (Lev), and Levamisole plus 5-FU (Lev + 5-FU). Under right censoring, for every person, both events of recurrence of cancer and death have been recorded. We focus on the recurrence times which near 50 percent of them have been censored. For each of the three treatments and the overall data, the KM survival function has been drawn in Fig. 2, which reveals high censoring rates.

The KM survival plot for three treatments and the overall data. Every censored item is included by a+.
Fig. 2
The KM survival plot for three treatments and the overall data. Every censored item is included by a+.

We are interested in the estimation of the first QRL function and the median residual life function respectively corresponding to p=0.25 and 0.5. As before, let t stand for the minimum of t values where the QRL can not be computed directly by inverting the KM survival function. Values of t computed for these data sets have been gathered in Table 4.

Table 4 Values of t*.
Obs Lev Lev + 5-FU Overall
p 0.25
0.5
871
230
668
191
449
0
636
99

The first quartile residual life function and the median residual life function have been plotted in Figs. 3 and 4, respectively. For t<t these functions have been estimated by (2) and have been plotted by a thinner line.

The first quartile residual life function for three treatments and the overall data.
Fig. 3
The first quartile residual life function for three treatments and the overall data.
The median residual life function for three treatments and the overall data.
Fig. 4
The median residual life function for three treatments and the overall data.

These figures distinguish larger median residual life and first quartile residual life functions for Lev + 5-FU treatment, which are even noticeably above the QRL functions related to overall data. However, for two treatments Obs and Lev, both of these QRL functions are comparable and lie below the QRL functions of the overall group.

6

6 Conclusion

The p-quantile residual life function summarizes the lifetime data in a useful and simple concept and in units of time. For uncensored data or when the upper tails of the observations are not censored, this function can be estimated by applying the well-known Kaplan-Meier survival estimator. However, when research terminates in heavy right-censored lifetime data, which is the case of many biomedical and survival studies, the p-quantile residual life function is not estimable in this way. In the current investigation, we proposed a novel semi-parametric estimator of the p-quantile residual life function in such cases. The proposed estimator has been examined by a simulation study and applied to a real lifetime data set in the sequel.

Funding

This Project was funded by the National Plan for Science, Technology, and Innovation (MAARIFAH), King Abdulaziz City for Science and Technology, Kingdom of Saudi Arabia, Award Number (14-MAT2052-02).

Acknowledgments

The authors are grateful to anonymous three referees for their constructive comments that lead to an improvement in the quality of the paper. The authors would like to extend their sincere appreciation to the strategic technology program of the National Plan for Science, Technology and Innovation in the Kingdom of Saudi Arabia for its funding this project No. 14-MAT2052-02.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. , , , . Summarising censored survival data using the mean residual life function. Stat. Med.. 2015;34:1965-1976.
    [Google Scholar]
  2. , , , . Extreme Value and Related Models with Applications in Engineering and Science. Hoboken, New Jersey: Wiley Series in Probability and Statistics Wiley; .
  3. . An Introduction to Statistical Modeling of Extreme Values. London: Springer; .
  4. , , , . Using business analytics to enhance dynamic capabilities in operations research: a case analysis and research agenda. Eur. J. Operation. Res. Elsevier. 2020;281(3):656-672.
    [Google Scholar]
  5. , . Counting Processes and Survival Analysis. New York: Wiley; .
  6. , . Estimation of a monotone percentile residual life function under random censorship. Biom. J.. 2013;55:52-67.
    [Google Scholar]
  7. , . Nonparametric inference on cause-specific quantile residual life. Biom. J.. 2013;55:68-81.
    [Google Scholar]
  8. , , . Nonparametric inference on median residual life function. Biometrics. 2008;64:157-163.
    [Google Scholar]
  9. , . Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc.. 1958;53:457-481.
    [Google Scholar]
  10. , , . Inference on quantile residual life function under right-censored data. J. Nonparametr. Stat.. 2016;28:617-643.
    [Google Scholar]
  11. , , . Conditional quantile residual lifetime models for right-censored data. Lifetime Data Anal.. 2015;21:75-96.
    [Google Scholar]
  12. , , , , , , , , , , , . Fluorouracil plus Levamisole as an effective adjuvant therapy after resection of stage II colon carcinoma: a final report. Ann. Intern. Med.. 1995;122:321-326.
    [Google Scholar]
  13. , , . Smoothed estimator of quantile residual lifetime for right-censored data. J. Syst. Sci. Complex. 2015;28:1374-1388.
    [Google Scholar]
Show Sections