Zero-inflated and hurdle models with an application to the number of involved axillary lymph nodes in primary breast cancer
⁎Corresponding author at: Charité – Universitätsmedizin Berlin, Institute of Public Health, Charitéplatz 1, 10117 Berlin, Germany. florian.fischer1@charite.de (Florian Fischer)
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.
Peer review under responsibility of King Saud University.
Abstract
Objectives
This study aims to explore factors influencing the number of involved axillary lymph nodes in women diagnosed with primary breast cancer by choosing an efficient model that accounts for the excess zeros and overdispersion present in the study population.
Methods
The study is based on a retrospective analysis of hospital records of 5196 female breast cancer patients in Pakistan. Zero-inflated and hurdle modelling techniques were used to assess the association between the factors under study and the number of involved lymph nodes in breast cancer patients. Count data models (Poisson and negative binomial), zero-inflated models (zero-inflated Poisson and zero-inflated negative binomial), and hurdle models (hurdle Poisson and hurdle negative binomial) were applied. Model performance was compared based on AIC, BIC, and the ability to capture zero counts.
Results
The zero-inflated negative binomial model provided an acceptable fit. Findings indicate that, among high-risk patients, women with a larger tumor size had a greater number of involved axillary lymph nodes; tumor grades II and III also contributed to higher numbers of involved lymph nodes. Women’s age did not have any significant influence on nodal status.
Conclusions
Our analysis showed that the zero-inflated negative binomial is the best model for predicting and describing the number of involved nodes in primary breast cancer when overdispersion arises due to a large number of patients with no lymph node involvement. This is important for accurate prediction both for therapy and prognosis of breast cancer patients.
Keywords
Count data
Zero-inflated
Hurdle models
Information measures
Breast cancer
Nodal status
Abbreviations
- AIC: Akaike Information Criterion
- BIC: Bayesian Information Criterion
- ER: Estrogen receptor
- Her2: Human epidermal growth factor receptor 2
- HNB: Hurdle negative binomial
- HP: Hurdle Poisson
- NB: Negative binomial
- PR: Progesterone receptor
- ZINB: Zero-inflated negative binomial
- ZIP: Zero-inflated Poisson
1 Introduction
Count data occur in virtually all disciplines. One approach to modelling such data is logistic regression after converting the counts into binary values, but such dichotomization suffers from a loss of information (Suissa and Blais, 1995). As a result, Poisson regression became the most widely adopted model for analyzing count response data without dichotomization (Consul and Famoye, 1992). The major drawback of the Poisson distribution is the restriction of equal mean and variance, which is not fulfilled in many real-world scenarios. If this assumption is violated but neglected, Poisson regression produces biased estimates and misleading results (Winkelmann and Zimmermann, 1995). Researchers have therefore recommended the negative binomial distribution to relax this constraint; it accounts for over-dispersion in count data through an additional parameter (Hilbe, 2001). When overdispersion occurs due to a large number of zeros, analyzing such data using conventional count models (Poisson and negative binomial) is inappropriate. Zero-inflated models have proven useful in this regard; they model the count response variable as a mixture of a point mass at zero, representing the excess zeros, and a count component. The zero-inflated Poisson (ZIP) model (Lambert, 1992; Böhning et al., 1999; Hall, 2000; Lee et al., 2001) is the most widely applied one in the literature on count data with excess zeros. It assumes that the count component follows a Poisson distribution. If count response data exhibit high variability due to both excess zeros and overdispersion, a negative binomial distribution is assumed for the count component of the mixture, which is known as the zero-inflated negative binomial (ZINB) model (Hall, 2000; Yesilova et al., 2010; Yau et al., 2003).
One can also apply hurdle count models if excess zeros occur only due to sampling variability in the data (Mullahy, 1986); that is, hurdle count models consider excess zeros to be the only source of overdispersion. The hurdle models, originally introduced by Mullahy (1986), are two-component models: the first component models the probability of a zero count versus a positive count, and the second component models the positive counts. For the hurdle Poisson (HP) model, it is postulated that the positive count component follows a truncated Poisson distribution (Zorn, 1996; Molas and Lesaffre, 2010). In the case of overdispersion and excess zeros, the positive count component is modelled by the truncated negative binomial distribution, which gives the hurdle negative binomial (HNB) model (Rose et al., 2006).
Zero-inflated and hurdle models, along with other count models, have been successfully employed in medical and health research (Yau et al., 2003; Rose et al., 2006; Gilthorpe et al., 2009; Lee et al., 2006). The number of involved lymph nodes is a count outcome. Such data exhibit many zero observations when there is no lymph node involvement at the initial diagnosis stage of breast cancer, which strongly indicates the use of zero-inflated and hurdle models. One study described that patients may show a large number of negative nodal statuses at an early stage due to reporting error (Afifi et al., 2007). Furthermore, the chance of false-negative recorded nodes cannot be neglected because the axillary lymph nodes are not always completely dissected (Hur et al., 2002).
The main objective of the research reported in this article is to apply Poisson (P), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB) models to analyze factors that may influence the number of involved nodes, particularly when a high proportion of patients is likely to have no lymph node involvement. Poisson and negative binomial models, together with their zero-inflated and hurdle parameterizations, were fitted to breast cancer data. The complete modelling methodology is presented and the results are compared.
2 Materials and methods
2.1 Study design
This study is based on a retrospective analysis of data from hospital records. Overall, 5196 women with primary breast cancer who registered at Mayo Hospital Lahore, Pakistan, from 2013 to 2019 are included in the analysis. Information about age at diagnosis, cancer type, histological grade, estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (Her2), and tumor size is included. The number of involved lymph nodes is taken as the response variable, with negative nodal status indicated by zeros. Complete information on predictors and response was available for all selected cases. Exclusion criteria were incomplete information, a secondary tumor or metastasis from other organs to the breast at the time of registration, unknown pathological nodal status (Nx), immeasurable primary tumor (Tx), and Paget’s disease of the nipple without tumor. The association between the factors mentioned above and the number of involved nodes was assessed using zero-inflated and zero-hurdle models. All predictors (age at diagnosis, cancer type, tumor size, ER, PR, Her2, and tumor grade) and their forms were chosen with the help of clinicians and oncologists.
In these data, age at diagnosis (in years) is divided into three categories (≤35, 36–45, and ≥46). Age is commonly categorized in the breast cancer literature because breast cancer risk factors have different effects on younger and older women. Different authors have defined a variety of age cutoffs for a variety of reasons (Chollet-Hinton et al., 2016). Cancer type is represented by a binary variable (0 = lobular carcinoma and other, 1 = ductal carcinoma); estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (Her2) are also represented by binary variables (0 = negative, 1 = positive); tumor grade is represented categorically (I, II, and III); and tumor size is classified into three categories (≤1.9 cm, 2–4.9 cm, and ≥5 cm). Besides the main aim of this study, the comparison of different count models, it is also of major interest to identify predictors that have a significant impact on the number of involved nodes.
2.2 Modelling framework
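The modelling framework of count models emerges from generalized linear models (Agresti, 1996), which are an extended form of the linear regression model, written as:

$$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \qquad (1)$$

The function $\eta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}$ is a linear function of the regressors. It is related to the mean of the response, $\mu_i = E(y_i)$, by $g(\mu_i) = \eta_i$, where $g(\cdot)$ is called the link function, which transforms the expectation of the response variable. With the log link function this can be written as:

$$\log(\mu_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta}$$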
For count data, the response in equation (1) is often assumed to follow a Poisson distribution (Zar, 1999) and takes non-negative integer values; maximum likelihood techniques are applied to assess the best-fitted model. If the variance of the observed $y_i$ is greater than its expected value, $\mathrm{Var}(y_i) > E(y_i)$, over-dispersion occurs. According to Agresti (1996), over-dispersion can be accounted for by estimating an over-dispersion parameter and scaling the estimated standard errors with it. Among count models, the NB distribution has an extra parameter to account for over-dispersion (Crawley, 1997).
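A quick way to screen for over-dispersion is to compare the sample variance of the counts with their sample mean. The following minimal Python sketch does this for a small set of made-up counts; the values in `counts` are hypothetical and are not taken from the study data. A ratio well above 1 suggests that the Poisson assumption of equal mean and variance does not hold.

```python
import statistics

# Hypothetical node counts for ten patients (illustrative only, not study data).
counts = [0, 0, 0, 0, 1, 0, 2, 0, 5, 12]

sample_mean = statistics.mean(counts)
sample_var = statistics.variance(counts)  # unbiased sample variance (divides by n - 1)

# Under a Poisson model this ratio should be close to 1.
dispersion_ratio = sample_var / sample_mean

print(f"mean={sample_mean:.2f}, variance={sample_var:.2f}, ratio={dispersion_ratio:.2f}")
```

This is only a descriptive check; it ignores covariates, so it cannot replace the model-based comparison used in this study.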
The number of involved nodes is a discrete count outcome, and the Poisson regression model is the most common technique employed to model such count data. The probability mass function of the Poisson distribution with mean $\mu_i$ is given as:

$$P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$
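Overdispersion can be addressed by applying a gamma-Poisson mixture distribution, which is known as the NB distribution. Its probability mass function is given as:

$$P(Y_i = y_i) = \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)} \left(\frac{1}{1 + \alpha\mu_i}\right)^{1/\alpha} \left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i}, \qquad y_i = 0, 1, 2, \ldots$$

The mean and variance of the negative binomial distribution are $E(Y_i) = \mu_i$ and $\mathrm{Var}(Y_i) = \mu_i + \alpha\mu_i^2$, where $\alpha$ is the dispersion parameter; if $\alpha \to 0$, the negative binomial approaches the Poisson model (Hilbe, 2001).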
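To make the difference between the two distributions concrete, the sketch below evaluates both probability mass functions in plain Python and checks their first two moments numerically. It assumes the common NB2 parameterization (mean `mu`, dispersion `alpha`, variance `mu + alpha * mu**2`); the function names and the parameter values are illustrative choices, not part of the original analysis.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu (computed on the log scale)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(1.0 / (1.0 + alpha * mu))
             + y * math.log(alpha * mu / (1.0 + alpha * mu)))
    return math.exp(log_p)

def moments(pmf, upper=500):
    """Total probability, mean and variance from summing the pmf over 0..upper-1."""
    probs = [pmf(y) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

mu, alpha = 3.0, 0.5  # illustrative values only

pois_total, pois_mean, pois_var = moments(lambda y: poisson_pmf(y, mu))
nb_total, nb_mean, nb_var = moments(lambda y: nb_pmf(y, mu, alpha))

print(f"Poisson: mean={pois_mean:.3f}, variance={pois_var:.3f}")
print(f"NB:      mean={nb_mean:.3f}, variance={nb_var:.3f}")
```

With these values both distributions share the same mean, while the NB variance exceeds the Poisson variance by `alpha * mu**2`, which is the extra dispersion the additional parameter is meant to absorb.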
2.3 Zero-inflated models
For a better fit of over-dispersed data with excess zeros, zero-inflated models divide the zeros into two types, true and false zeros (Cheung, 2002). True zeros are part of the natural process under study and are classified into structural and random zeros. False zeros occur due to observers’ lack of experience, sampling errors, or errors in the experimental design (Tang et al., 2018). In a study of breast cancer patients, excess zeros arise from the group of patients who are “not-at-high risk” during the observation period as well as from those who are “at-risk”. For example, the number of involved lymph nodes is an important factor in breast cancer prognostic and prevention research; at a specific time some patients may not have any involved lymph nodes, but the chance of lymph node involvement may increase later. It is also possible that a group has zero involved lymph nodes at the diagnostic stage but is still at high risk. Negative nodal status is therefore divided into two groups of patients: one with a very low risk of involved nodes (structural zeros) and the other with a high risk of involved nodes (random zeros). Zero-inflated models are used to account for overdispersion due to excess zeros and unobserved heterogeneity among women diagnosed with breast cancer as a primary disease. Under zero-inflated modelling techniques, true zeros are described through logistic regression and false zeros via the count part of the zero-inflated model.
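Zero-inflated models add additional probability mass to the outcome zero. This yields a two-state mixture distribution; the PMF of the ZIP model is given by:

$$P(Y_i = y_i) = \begin{cases} \pi + (1 - \pi)\,e^{-\mu_i}, & y_i = 0 \\[4pt] (1 - \pi)\,\dfrac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, & y_i = 1, 2, \ldots \end{cases}$$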
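The ZINB distribution is used to account for both over-dispersion and excess zeros. It gives weight $\pi$ to the extra zeros, while weight $1 - \pi$ is assigned to the negative binomial distribution, where $\pi$ ranges from $0$ to $1$. The mixture form of the ZINB distribution can be written as:

$$P(Y_i = y_i) = \begin{cases} \pi + (1 - \pi)\,(1 + \alpha\mu_i)^{-1/\alpha}, & y_i = 0 \\[4pt] (1 - \pi)\,\dfrac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)} \left(\dfrac{1}{1 + \alpha\mu_i}\right)^{1/\alpha} \left(\dfrac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i}, & y_i = 1, 2, \ldots \end{cases}$$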
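The mixture idea can be illustrated with a few lines of Python. The sketch below builds a ZINB probability mass function from an NB component (NB2 parameterization with mean `mu` and dispersion `alpha`, an assumption of this sketch) and a zero-inflation weight `pi`; all function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(1.0 / (1.0 + alpha * mu))
             + y * math.log(alpha * mu / (1.0 + alpha * mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, alpha, pi):
    """Mixture of a point mass at zero (weight pi) and an NB count part (weight 1 - pi)."""
    count_part = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + count_part if y == 0 else count_part

mu, alpha, pi = 3.0, 0.5, 0.3  # illustrative values only

probs = [zinb_pmf(y, mu, alpha, pi) for y in range(500)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))

p_zero_nb = nb_pmf(0, mu, alpha)  # zero probability of the plain NB model
p_zero_zinb = probs[0]            # zero probability after zero inflation

print(f"P(Y=0): NB={p_zero_nb:.3f}, ZINB={p_zero_zinb:.3f}, ZINB mean={mean:.3f}")
```

The zero probability rises from the NB value to `pi + (1 - pi)` times that value, and the mean shrinks to `(1 - pi) * mu`, which is how the model separates extra zeros from the count process.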
2.4 Hurdle models
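The hurdle count models are two-part models. In the first part, zeros are modelled through logistic regression, and in the second part, the positive counts are explained through a zero-truncated Poisson or negative binomial distribution. The HP model addresses excess zeros in the first part and the truncated positive outcomes in the second part via a zero-truncated Poisson distribution. With $\pi_0$ denoting the probability of a zero count, the HP model can be written as:

$$P(Y_i = y_i) = \begin{cases} \pi_0, & y_i = 0 \\[4pt] (1 - \pi_0)\,\dfrac{e^{-\mu_i}\,\mu_i^{y_i}}{\left(1 - e^{-\mu_i}\right) y_i!}, & y_i = 1, 2, \ldots \end{cases}$$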
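The HNB model is appropriate for data which exhibit over-dispersion due only to excess zeros (Cameron and Trivedi, 1999; Chipeta et al., 2014), so it does not account for the unobserved heterogeneity which exists in our breast cancer data. With $\pi_0$ denoting the probability of a zero count, the HNB model is given as:

$$P(Y_i = y_i) = \begin{cases} \pi_0, & y_i = 0 \\[4pt] \dfrac{1 - \pi_0}{1 - (1 + \alpha\mu_i)^{-1/\alpha}}\,\dfrac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)} \left(\dfrac{1}{1 + \alpha\mu_i}\right)^{1/\alpha} \left(\dfrac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i}, & y_i = 1, 2, \ldots \end{cases}$$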
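The two-part structure of a hurdle model can be sketched in Python as follows. The code assumes an NB2 count distribution (mean `mu`, dispersion `alpha`) truncated at zero and a separate zero probability `pi0`; the names and numbers are hypothetical and serve only to show how the parts fit together.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(1.0 / (1.0 + alpha * mu))
             + y * math.log(alpha * mu / (1.0 + alpha * mu)))
    return math.exp(log_p)

def hurdle_nb_pmf(y, mu, alpha, pi0):
    """Hurdle NB: zeros come only from the binary part, positives from a zero-truncated NB."""
    if y == 0:
        return pi0
    truncation = 1.0 - nb_pmf(0, mu, alpha)  # probability that the NB part is positive
    return (1.0 - pi0) * nb_pmf(y, mu, alpha) / truncation

mu, alpha, pi0 = 3.0, 0.5, 0.46  # illustrative values only

probs = [hurdle_nb_pmf(y, mu, alpha, pi0) for y in range(500)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))

print(f"P(Y=0)={probs[0]:.2f}, total probability={total:.6f}, mean={mean:.4f}")
```

In contrast to a zero-inflated model, the probability of a zero here does not depend on `mu` or `alpha` at all: every zero is attributed to the binary part.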
2.5 Model assessment and evaluation
Fitted models are compared via measures of fit, which describe the performance of the fitted models for a given data set. The preferred model is selected based on the log-likelihood, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).
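For reference, the two information criteria are usually defined as shown below, where $\ell$ is the maximized log-likelihood, $k$ the number of estimated parameters, and $n$ the sample size. These are the standard textbook definitions and the symbols are introduced here for illustration; smaller values indicate a better trade-off between fit and complexity.

$$\mathrm{AIC} = -2\ell + 2k, \qquad \mathrm{BIC} = -2\ell + k\,\log(n)$$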
2.6 Ethical approval
The study was approved by the Advanced Studies and Review Board, University of the Punjab, Lahore (Pakistan). After that, the letter of support written by the departmental head was submitted to the selected hospital. Prior to data collection, written consent was obtained from the head of the oncology department and confidentiality was maintained by coding, from data collection to analysis. No written consent was needed because the analysis is based on routinely collected data, for which the hospital has already informed patients.
3 Results
3.1 Sample characteristics and lymph node involvement
The analysis is based on 5196 female patients with breast cancer, of whom more than half (54.5%) had invasive ductal carcinoma (IDC). The median age at diagnosis was 48 years. The patients were distributed fairly evenly among the histological grades (I, II, III). About half of the patients were ER-positive (51.2%), PR-positive (51.0%), and Her2-positive (53.0%). The majority of the patients (70.2%) had a tumor size between 2 and 4.9 cm (Table 1).
Table 1

| Characteristic | n (%) |
|---|---|
| Tumor type | |
| IDC | 2,832 (54.5%) |
| Other | 2,364 (45.5%) |
| Baseline age (in years) | |
| ≤35 | 736 (14.2%) |
| 36–45 | 1,092 (21.0%) |
| ≥46 | 3,368 (64.8%) |
| Tumor grade | |
| I | 2,021 (38.9%) |
| II | 1,618 (31.1%) |
| III | 1,557 (30.0%) |
| Estrogen receptor (ER) | |
| Positive | 2,662 (51.2%) |
| Negative | 2,534 (48.8%) |
| Progesterone receptor (PR) | |
| Positive | 2,652 (51.0%) |
| Negative | 2,544 (49.0%) |
| Human epidermal growth factor receptor 2 (Her2) | |
| Positive | 2,754 (53.0%) |
| Negative | 2,442 (47.0%) |
| Tumor size (in cm) | |
| ≤1.9 | 963 (18.5%) |
| 2–4.9 | 3,650 (70.2%) |
| ≥5 | 583 (11.2%) |
Fig. 1 shows that a large proportion of patients, 2406 overall (46.3%), had no lymph node involved at the diagnostic stage. There was strong evidence of over-dispersion, accompanied by the presence of excess zeros (Fig. 1).

Fig. 1. Frequency of number of nodes.
3.2 Model comparison
The comparison of models is presented in Table 2, using AIC and BIC as the basis for assessment. Although the hurdle negative binomial (HNB) model performed well in terms of zero-capturing (2,406 zeros; AIC = 16,587, BIC = 16,737), the zero-inflated negative binomial (ZINB) model is superior in terms of smaller AIC and BIC values (AIC = 16,559, BIC = 16,710). The differences in AIC and BIC between the ZINB and HNB models are greater than 10, so ZINB, which has the lowest AIC and BIC values, is the best model for the data under study.
Table 2

| Model selection criterion | Poisson | NB | ZIP | ZINB | Hurdle P | Hurdle NB |
|---|---|---|---|---|---|---|
| df (degrees of freedom) | 11 | 12 | 22 | 23 | 22 | 23 |
| Log-likelihood | −10378 | −9375 | −8307 | −8257 | −8318 | −8271 |
| AIC | 20,778 | 18,774 | 16,658 | 16,559 | 16,680 | 16,587 |
| BIC | 20,849 | 18,851 | 16,803 | 16,710 | 16,824 | 16,737 |
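The criteria in Table 2 can be approximately recomputed from the reported log-likelihoods and degrees of freedom. The sketch below does so in Python using the standard definitions AIC = −2·logLik + 2·df and BIC = −2·logLik + df·log(n), with n = 5196 patients; treating df as the number of estimated parameters is an assumption of this sketch. Because the log-likelihoods in the table are rounded to whole numbers, the recomputed values can differ from the published ones by a unit or two.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for log-likelihood loglik and k parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion for sample size n."""
    return -2.0 * loglik + k * math.log(n)

n_obs = 5196  # number of patients in the study

# (log-likelihood, df) as reported in Table 2
fits = {
    "Poisson": (-10378, 11),
    "NB": (-9375, 12),
    "ZIP": (-8307, 22),
    "ZINB": (-8257, 23),
    "Hurdle P": (-8318, 22),
    "Hurdle NB": (-8271, 23),
}

criteria = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in fits.items()}

best_by_aic = min(criteria, key=lambda name: criteria[name][0])
best_by_bic = min(criteria, key=lambda name: criteria[name][1])

for name, (a, b) in criteria.items():
    print(f"{name:10s} AIC={a:9.1f} BIC={b:9.1f}")
print("lowest AIC:", best_by_aic, "| lowest BIC:", best_by_bic)
```

Both criteria point to the same model here, so the choice does not depend on which of the two penalties is preferred.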
The Poisson count model was not appropriate for this data set because it captured only 1,262 zeros; the same holds for the NB model, which captured 1,335 zeros out of a total of 2,406. ZIP and ZINB were much better at capturing the zero counts, with 2,395 and 2,393 zeros, respectively. The best models for capturing zeros were HP and HNB; both captured 2,406 zeros, equal to the observed number of zeros (Table 3).
Table 3

| Observed | Poisson | NB | ZIP | ZINB | Hurdle P | Hurdle NB |
|---|---|---|---|---|---|---|
| 2406 | 1262 | 1607 | 2395 | 2393 | 2406 | 2406 |
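The number of zeros a fitted model "captures" is commonly obtained by summing each patient's predicted probability of a zero count. The Python sketch below shows this calculation for a Poisson and a ZIP model; the fitted means and zero-inflation probabilities are made-up values for four hypothetical patients, not output from the models in this study.

```python
import math

def expected_zeros_poisson(mus):
    """Expected number of zeros under a Poisson model with fitted means mus."""
    return sum(math.exp(-mu) for mu in mus)

def expected_zeros_zip(mus, pis):
    """Expected number of zeros under a ZIP model (fitted means mus, inflation weights pis)."""
    return sum(pi + (1.0 - pi) * math.exp(-mu) for mu, pi in zip(mus, pis))

# Hypothetical fitted values for four patients (illustrative only).
fitted_means = [0.5, 1.0, 2.0, 4.0]
fitted_pis = [0.4, 0.3, 0.2, 0.1]

zeros_poisson = expected_zeros_poisson(fitted_means)
zeros_zip = expected_zeros_zip(fitted_means, fitted_pis)

print(f"expected zeros: Poisson={zeros_poisson:.3f}, ZIP={zeros_zip:.3f}")
```

Comparing such expected counts with the observed number of zeros gives the kind of summary reported in Table 3.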
It is also important to consider that all patients were at risk of nodal involvement; because of these sampling zeros, zero-inflated models are technically suitable to predict the frequency of nodal involvement among women diagnosed with primary breast cancer. It should be noted that the ZIP and HP models account for overdispersion due to excess zeros, but if overdispersion exists due to unobserved heterogeneity or progressive dependency in nodal involvement data, the ZINB and HNB models give a better fit.
After the comparison of models, the ZINB model was selected as the best-fitting model for the lymph node count data in primary breast cancer patients and was used to determine the factors that contribute to lymph node involvement.
3.3 Modelling and interpreting main effects
The final model ZINB, accounts for excess zeros count response data, having a mean number of involved nodes ( ), and variance .
Table 4 provides the estimates corresponding to the various factors for the ZINB model, reported as odds ratios (ORs) with 95% confidence intervals (CIs). The NB (count) part of the ZINB model describes the risk of a greater number of involved lymph nodes, given that women are in the high-risk group. Patients with tumor grade II and III had a higher risk of having more involved lymph nodes than grade I patients. ER-negative and PR-negative patients had a lower risk of having a greater number of involved nodes than ER-positive and PR-positive patients. Results show that patients with a greater tumor size (2–4.9 cm and ≥5 cm) have a higher likelihood of having a larger number of involved axillary nodes than those with a tumor size ≤1.9 cm. Baseline age and Her2 status were not significantly associated with nodal status.
Table 4

| Parameter | Lymph nodes, OR (95% CI) | Zero-inflation portion, lymph nodes, OR (95% CI) |
|---|---|---|
| Intercept | 1.626 (1.416–1.867) | 3.538 (2.527–4.952) |
| Tumor type | | |
| IDC | 1 | 1 |
| Other | 1.010 (0.955–1.068) | 6.868 (5.632–8.376) |
| Age (in years) | | |
| ≤35 | 1 | 1 |
| 36–45 | 0.995 (0.926–1.070) | 1.089 (0.838–1.417) |
| ≥46 | 1.020 (0.958–1.085) | 1.040 (0.831–1.302) |
| Tumor grade | | |
| I | 1 | 1 |
| II | 1.002 (0.994–1.064) | 0.374 (0.309–0.454) |
| III | 1.323 (1.248–1.402) | 0.453 (0.372–0.550) |
| Estrogen receptor | | |
| Positive | 1 | 1 |
| Negative | 0.951 (0.913–0.993) | 0.544 (0.460–0.642) |
| Progesterone receptor | | |
| Positive | 1 | 1 |
| Negative | 0.897 (0.856–0.939) | 0.276 (0.229–0.334) |
| Her2/neu receptor | | |
| Positive | 1 | 1 |
| Negative | 0.969 (0.928–1.014) | 0.429 (0.363–0.508) |
| Tumor size (in cm) | | |
| ≤1.9 | 1 | 1 |
| 2–4.9 | 2.068 (1.836–2.329) | 0.496 (0.394–0.626) |
| ≥5 | 5.230 (4.625–5.913) | 0.036 (0.022–0.059) |
Table 4 also contains ORs from the logistic part of the ZINB model; this part describes the probability of negative nodal status, given that breast cancer patients are in the low-risk group. Women with a tumor type other than ductal carcinoma had a greater chance of belonging to the negative nodal status group. Patients with tumor grade I were more likely to have no lymph node involvement. ER-, PR-, and Her2-positive status significantly increased the likelihood of having no axillary lymph node involvement at the initial stage of breast cancer. Women with a tumor size ≤1.9 cm were more likely to have no positive lymph nodes than those with larger tumors (2–4.9 cm and ≥5 cm). Baseline age had no significant impact on negative nodal status.
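Ratios and confidence limits such as those in Table 4 are typically obtained by exponentiating a regression coefficient and the limits of its interval on the log scale. The Python sketch below shows this relationship by backing out an approximate coefficient and standard error from one reported entry (tumor size ≥5 cm in the count part: 5.230, 95% CI 4.625–5.913). It assumes a symmetric Wald interval on the log scale, which the article does not state, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_scale_from_ratio(ratio, ci_low, ci_high):
    """Approximate coefficient and standard error implied by a ratio and its 95% CI."""
    beta = math.log(ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return beta, se

def ratio_with_ci(beta, se):
    """Exponentiate a coefficient and its Wald limits back to the ratio scale."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))

# Reported entry for tumor size >= 5 cm (count part of the ZINB model).
beta, se = log_scale_from_ratio(5.230, 4.625, 5.913)
estimate, low, high = ratio_with_ci(beta, se)

print(f"beta={beta:.4f}, se={se:.4f}")
print(f"ratio={estimate:.3f} (95% CI {low:.3f}-{high:.3f})")
```

An interval that excludes 1 on the ratio scale corresponds to a coefficient whose interval excludes 0 on the log scale, which is how the significance statements in the text can be read off the table.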
4 Discussion
Breast cancer, a commonly diagnosed malignancy in females, represents a major public health issue worldwide (Barnard et al., 2015). Previous studies have shown a large absolute number of incident breast cancer cases in developing countries (Barnard et al., 2015). In breast cancer, abnormal growth starts in breast tissue, with the risk of spreading to other body parts. This malignancy is classified into two major types, ductal and lobular carcinoma. Ductal carcinoma, to which most breast cancers belong, starts in the ducts; lobular carcinoma starts in the milk-producing parts of the breast (lobules). Significant prognostic factors for poor survival are higher age, nodal involvement, higher tumor grade, advanced clinical stage, greater tumor size, and metastasis (Barnard et al., 2015; Gann et al., 1999; Olivotto et al., 1998; Ravdin et al., 1994).
The presence or absence of involved axillary lymph nodes has been recognized as an important predictor of breast cancer risk. Studies have shown that node-positive patients had lower survival rates than node-negative ones (Chua et al., 2001; Fisher et al., 1983). Furthermore, a higher number of positive lymph nodes has been found to contribute to an increased risk of complications (Chua et al., 2001; Fisher et al., 1983). Many studies show the association between various factors and the progression of breast cancer; all of them highlight the importance of lymph node involvement in breast carcinoma (Harden et al., 2001; Sakorafas et al., 2000). One study fitted statistical distributions to the number of involved lymph nodes in breast cancer and highlighted the problem of overdispersion due to temporal dependency in axillary nodal involvement (Guern and Vinh-Hung, 2008).
Data on the number of involved lymph nodes often contain surplus zeros, which indicates overdispersion; such data should therefore be fitted by zero-inflated and zero-hurdle models. Regarding false zeros in the axillary involved lymph node data, it should be noted that some negative nodal statuses may be observed among women in a “low risk” group and some among women in a “high risk” group of breast cancer. This is because not all women have an equal intensity of breast cancer; they differ in tumor type, tumor grade, ER, PR, and Her2 status, and tumor size. With this consideration, the ZINB model is employed as the final model. The better fit of the ZINB model over the HNB model suggests that overdispersion is due to unobserved heterogeneity among women regarding the intensity of breast cancer as well as to the large number of patients with negative nodal status.
In this study, we not only fitted and compared several count models to investigate the number of involved nodes in primary breast cancer patients, but also explained the importance of applying zero-inflated models when both true and false zeros exist. The results of the data analysis support the effectiveness of zero-augmented (zero-inflated and hurdle) models compared to generalized linear (Poisson and negative binomial) models. Due to the excess zeros and over-dispersion observed in this study, zero-augmented negative binomial models (ZINB and HNB) performed better than zero-augmented Poisson models (ZIP and HP). The ZINB and HNB models are similar in identifying factors associated with the number of involved lymph nodes. The ZINB model was found to provide the best fit for modelling the number of involved lymph nodes as the response variable, with patients’ age, tumor type, tumor grade, molecular subtypes, and tumor size as the explanatory variables in primary breast cancer patients. Our model selection logic is in line with the results presented by Rose et al. (2006) and Baughman (2007); it is also suggested that model selection should be based on study objectives.
The best model, ZINB, was used to determine significant factors that influence the number of involved lymph nodes in breast cancer patients. Higher tumor grade (II and III), positive estrogen and progesterone receptor status, and larger tumor size are factors contributing to a greater number of involved lymph nodes. Age at diagnosis does not have any significant impact on nodal involvement in primary breast cancer.
4.1 Limitations
Some limitations should be noted. First, the use of a single case study may be viewed as a limitation; a simulation study could be conducted to strengthen our conclusions. Second, we were not able to include a longitudinal assessment, which might reveal other aspects related to the “high risk” and “low risk” groups in the data under study. Apart from these future tasks, this study attempts to fill a statistical modelling gap in analyzing patterns of nodal involvement in primary breast cancer patients, using a large data set collected in Pakistan. Mayo Hospital Lahore is one of the best governmental hospitals, where patients come from all over Pakistan.
4.2 Conclusions
Zero-inflated models assume that zeros can be either true or false zeros, which are estimated by the binary and count components, while hurdle models (HP and HNB) assume that all zeros are true zeros and all patients belong to the same high-risk group. We believe that our study successfully quantified the “high and low risk” breast cancer patients by incorporating time-independent covariates which are associated with the presence of involved lymph nodes, so the zero-inflated negative binomial model was the best choice. We also applied hurdle models, which have successfully demonstrated their advantage in fitting nodal count data; they have two components: a binary logit model for zero versus positive counts, and a negative binomial model truncated below at 1 for the positive counts.
In short, this paper provides evidence that involved node count data in primary breast cancer are right-skewed with excess zeros and should therefore be modeled by zero-augmented negative binomial models. Between the ZINB and HNB models, the ZINB model is considered the best model for describing the number of involved nodes in primary breast cancer patients.
5 Ethical approval and consent to participate
According to the Ethical Guidelines for Epidemiologists (IEF-EGE) and the regulations of the ethics committee located at the Advanced Studies and Review Board, University of the Punjab Lahore (Pakistan), no ethics approval is needed, because the analysis is based on routine data. At data collection, all patients provided written informed consent.
6 Consent for publication
Not applicable.
7 Availability of data and materials
Data are available from the corresponding author upon reasonable request.
8 Authors’ contributions
ML conceived the original idea of the study, designed the study, analyzed the data and drafted the manuscript. SK supervised the whole study design. NZ has been responsible for data acquisition. SK and FF revised it critically for important intellectual content. All authors approved the final version of the manuscript.
Funding
The work was supported by the Higher Education Commission Pakistan under grant No. 46-2SS2- 123 awarded to the first author. The funder had no role in study design, in the collection, analysis and interpretation of data, in the writing of the report, and in the decision to submit the article for publication.
Acknowledgements
We thank the staff of the Oncology and Radiology Department, Mayo Hospital, Lahore, who supported the data collection. We also wish to thank Dr. Abbas Khokar (MBBS, FCPS), Head of the Oncology Department at Mayo Hospital, Lahore, Pakistan, for all the effort he put into organizing patients’ records so systematically.
We acknowledge the support from the German Research Foundation (DFG) and the Open Access Publication Fund of Charité – Universitätsmedizin Berlin.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Methods for improving regression analysis for skewed continuous or counted responses. Ann. Rev. Public Health. 2007;28:95-111.
- An Introduction to Categorical Data Analysis. New York: John Wiley and Sons; 1996.
- Established breast cancer risk factors and risk of intrinsic tumor subtypes. Biochim. Biophys. Acta. 2015;1856(1):73-85.
- Mixture model framework facilitates understanding of zero-inflated and hurdle models for count data. J. Biopharm. Stat. 2007;17(5):943-946.
- The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J. R. Statis. Soc. A. 1999;162(2):195-209.
- Essentials of Count Data Regression (Chapter 15). A Companion to Theoretical Econometrics. Malden: Blackwell Publishing Ltd.; 1999.
- Zero-inflated models for regression analysis of count data: a study of growth and development. Stat. Med. 2002;21:1461-1469.
- Zero adjusted models with applications to analysing helminths count data. BMC Res. Notes. 2014;7:856.
- Breast cancer biologic and etiologic heterogeneity by young age and menopausal status in the Carolina Breast Cancer Study: a case-control study. Breast Cancer Res. 2016;18(1).
- Frequency and predictors of axillary lymph node metastases in invasive breast cancer. A. N. Z. J. Surg. 2001;71(12):723-728.
- Generalized Poisson regression model. Commun. Statist. – Theory Methods. 1992;2(1):89-109.
- GLIM for Ecologists. Oxford: Blackwell Science; 1997.
- Relation of number of positive axillary nodes to the prognosis of patients with primary breast cancer. Cancer. 1983;52(9):1551-1557.
- Factors associated with axillary lymph node metastasis from breast carcinoma descriptive and predictive analyses. Cancer. 1999;86(8):1511-1518.
- Modelling count data with excessive zeros: the need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data. Stat. Med. 2009;28(28):3539-3553.
- Distribution statistique des ganglions lymphatiques axillaires envahis lors du cancer du sein [Statistical distribution of involved axillary lymph nodes in breast cancer]. Bull. Cancer. 2008;95(4):449-455.
- Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics. 2000;56(4):1030-1039.
- Predicting axillary lymph node metastases in patients with T1 infiltrating ductal carcinoma of the breast. Breast. 2001;10(2):155-159.
- Negative Binomial Regression. Cambridge: Cambridge University Press; 2001.
- Modeling clustered count data with excess zeros in health care outcomes research. Health Serv. Outcomes Res. Methodol. 2002;3:5-20.
- Jackman, S., 2008. Classes and methods for R developed in the political science computational laboratory, Stanford University. Stanford, California: Department of Political Science, Stanford University. R package version 0.95; http://CRAN.R-project.org/package=pscl.
- Zero-inflated Poisson regression with an application to defects in manufacturing. Technometrics. 1992;34(1):1-14.
- Analysis of zero-inflated Poisson data incorporating extent of exposure. Biometrical J. 2001;43(8):963-975.
- Multi-level zero-inflated Poisson regression modeling of correlated count data with excess zeros. Stat. Methods Med. Res. 2006;15(1):47-61.
- Hurdle models for multilevel zero-inflated data via h-likelihood. Stat Med. 2010;29(30):3294-3310.
- Specification and testing of some modified count data models. J Econometrics. 1986;33(3):341-365.
- Prediction of axillary lymph node involvement of women with invasive breast carcinoma a multivariate analysis. Cancer. 1998;83(5):948-955.
- Akaike’s information criterion in generalized estimating equations. Biometrics. 2001;57(1):120-125.
- Prediction of axillary lymph node status in breast cancer patients by use of prognostic indicators. J. Natl. Cancer Inst. 1994;86(23):1771-1775.
- On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data. J. Biopharm. Stat. 2006;16(4):463-481.
- Axillary lymph node dissection in breast cancer: current status and controversies, alternative strategies and future perspectives. Acta Oncol. 2000;39(4):455-466.
- Binary regression with continuous outcomes. Stat. Med. 1995;14(3):247-255.
- Untangle the structural and random zeros in statistical modelings. J. Appl. Stat. 2018;45(9):1714-1733.
- Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol. Methods. 2012;17(2):228-243.
- Recent developments in count data modelling: theory and application. J. Econ. Surv. 1995;9(1):1-24.
- Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros. Biometrical J. 2003;45(4):437-452.
- Modeling insect-egg data with excess zeros using zero-inflated regression models. Hacettepe J. Math. Stat. 2010;39(2):273-282.
- Biostatistical Analysis. Upper Saddle River: Prentice-Hall; 1999.
- Evaluating zero-inflated and hurdle Poisson specifications. Midw. Polit. Sci. Assoc. 1996:1-16.