Determining confidence interval and asymptotic distribution for parameters of multiresponse semiparametric regression model using smoothing spline estimator
⁎Corresponding author. nur-c@fst.unair.ac.id (Nur Chamidah)
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.
Abstract
The multiresponse semiparametric regression (MSR) model is a regression model with two or more mutually correlated response variables whose regression function is composed of parametric and nonparametric components. The objectives of this study are to propose a new method for estimating the MSR model using the smoothing spline, and to find the confidence interval (CI) and the asymptotic distribution of the estimator of the model parameters. The methods used are the reproducing kernel Hilbert space (RKHS) method and a developed penalized weighted least squares (PWLS) method, together with the pivotal quantity, the central limit theorem, and the Cramer-Wold and Slutsky theorems. The results are a 100(1–α)% CI estimate and an asymptotic normal distribution for the parameters of the MSR model. In conclusion, the estimated MSR model combines the estimates of the parametric and nonparametric components and is linear in the observations; the CIs of the parameters depend on the t distribution, and the estimator of the parameters is asymptotically normally distributed. In the future, the results of this study can be used as a theoretical basis for designing standard growth charts for toddlers, which can then be used to assess their nutritional status.
Keywords
Asymptotic distribution
Confidence interval
Nutritional status
Semiparametric regression
Smoothing spline
1 Introduction
Regression models are widely applied to analyze the functional association between response and predictor variables for prediction and interpretation purposes. Based on the shape of the regression function, regression models are classified into parametric regression (PR) and nonparametric regression (NR) models. Combining the PR and NR models gives semiparametric regression (SR) models. An SR model becomes an MSR model when it has two or more mutually correlated response variables.
In regression modeling, the main problem is determining an estimator of the regression function, such as a spline, kernel, PWLS, local linear, or local polynomial estimator. Several estimators have been used to estimate regression functions, namely splines (Eubank, 1988; Wahba, 1990; Wang et al., 2000; Gu, 2002; Wang, 2011; Chamidah et al., 2019b; 2020a; Fatmawati et al., 2019; Khan & Shahna, 2019; Shahna & Khan, 2019; Islamiyati et al., 2022), kernel (Yilmaz et al., 2021), PWLS (Lestari et al., 2020; 2022), local linear (Chamidah et al., 2018; 2019c; 2020b), and local polynomial (Chamidah et al., 2019a; Chamidah & Lestari, 2019). Both kernel and spline estimators were discussed for multiresponse NR (MNR) models by Lestari et al. (2018; 2019) and for the NR model by Osmani et al. (2019). The estimators mentioned above, except for the spline, depend heavily on the neighborhood of the target point (the bandwidth). Hence, if these estimators are applied to a model for fluctuating data, a small bandwidth is needed, which makes the estimated curve too rough. These estimators consider only goodness of fit and not smoothness. They are therefore less reliable for estimating models of data that fluctuate within sub-intervals, because they give estimates with large mean squared errors (MSE). The spline estimator, in contrast, considers both fit and smoothness. The ability of spline estimators to estimate the MNR model for prediction purposes has been discussed by Fatmawati et al. (2019) and Lestari et al. (2020). Although several previous studies have discussed these estimators for estimating the regression function, the estimators were applied to NR and MNR models only. In other words, those studies did not apply them to the uniresponse semiparametric regression (USR) model.
Furthermore, several estimators for USR models have been discussed by researchers, namely splines (Gao & Shi, 1997; Wang & Ke, 2009; Diana et al., 2013; Mohaisen & Abdulhussein, 2015; Ramadan et al., 2019; Aydin et al., 2019; Chen & Ren, 2020; Fernandes et al., 2020; Chamidah et al., 2021) and kernel (Yilmaz et al., 2021). Meanwhile, Amini & Roozbeh (2015), Roozbeh (2018), and Roozbeh et al. (2020) estimated restricted SR models using ridge estimation, and selected the optimal shrinkage parameter and kernel smoother bandwidth based on a developed generalized cross validation (GCV) criterion. However, these researchers discussed estimators for USR models only. Wibowo et al. (2012) and Chamidah et al. (2022) estimated the MSR model using the penalized spline and the truncated spline, respectively, but the smoothing spline has not yet been applied to estimate the regression function of the MSR model.
In this study we develop an estimation method for the MSR model, and determine the CI and the asymptotic distribution of the parameter estimator in the MSR model using the smoothing spline. The smoothing spline can handle data that are too smooth or too rough, and data whose behavior changes on certain sub-intervals. It considers both the goodness of fit, expressed by the WLS function, and the smoothness of the estimate, expressed by the penalty function, where the balance between them is controlled by smoothing parameters. The smoothing spline becomes less practical when the sample size is large because it uses knots. To overcome this practical problem, in this article we therefore provide the asymptotic distribution of the parameter estimator in the MSR model.
2 Materials and Methods
Suppose a paired dataset
,
;
;
where relationship between
and
meets the MSR model:
where is value of ith observation for kth response, is unknown function for kth response, is unknown smooth function for kth response contained in Sobolev space , and is random error with mean zero and variance .
The MSR model regression function in (1) is composed of parametric function component namely , and nonparametric function components namely . So, we use WLS method to estimate , and use smoothing spline to estimate by developing PWLS method proposed by Wang et al. (2000). Next, we apply pivotal quantity, central limit theorem, and theorems of Cramer-Wold and Slutsky to obtain CI and distribution asymptotically of the model parameters estimator of MSR model.
3 Results
The results of this study are presented below: estimation of the regression function, determination of the CI for the parameters, and the asymptotic distribution of the parameter estimator of the MSR model.
3.1 Regression function estimation
We may present the MSR model (1) as follows:
We can rewrite model (2) as follows:
Suppose
is the true WLS estimate of
. Hence, we can express model (3) as follows:
Next, let ; ; ; and where ; ;
; and .
Hence, we can present the MSR model (4) in the following matrix equation:
where , (namely).
The smoothing spline estimator of function
in model (6) can be determined by solving the PWLS:
where ; are weight matrices that are inverse of covariance matrix, , and are smoothing parameters that set the balance between good fit and smoothness of estimation.
Based on Eq.(6), it is easy to show that the covariance matrix of random errors in MSR model (1) is:
where , and .
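As a rough illustration of how a weight matrix can be formed as the inverse of an error covariance matrix, the sketch below assumes two responses observed at the same n points, with errors correlated across responses but independent across observations. The separable (Kronecker) structure, the function name `build_weight_matrix`, and the values in `sigma_resp` are assumptions made for illustration only; they are not the covariance structure derived in this article.

```python
import numpy as np

def build_weight_matrix(sigma_resp, n):
    """Return (cov, W) for errors stacked response by response.

    Assumption (illustrative only): cov = sigma_resp (Kronecker) I_n,
    i.e. the same between-response covariance at every observation.
    """
    cov = np.kron(sigma_resp, np.eye(n))
    W = np.linalg.inv(cov)  # weight matrix = inverse covariance
    return cov, W

# Hypothetical 2 x 2 covariance between the two responses.
sigma_resp = np.array([[1.0, 0.6],
                       [0.6, 2.0]])
cov, W = build_weight_matrix(sigma_resp, n=4)
print(W.shape)
```

Under this assumed structure the inverse can also be formed blockwise as the Kronecker product of the inverse of `sigma_resp` with the identity, which avoids inverting the full matrix when n is large.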
The solution to the PWLS optimization in (7) is obtained by using the RKHS method. Details of RKHS are given in Aronszajn (1950), Eubank (1988), Wahba (1990), Gu (2002), and Wang (2011). Firstly, we express the model (4) as a general smoothing spline regression model (Wang, 2011):
where ; ; is a function which unknown and smooth contained in Hilbert space ; and is a linear function and bounded.
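Suppose we may decompose the Hilbert space $\mathcal{H}$ into a direct sum of two subspaces $\mathcal{H}_0$ and $\mathcal{H}_1$ such that we have:
$$\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$$
where $\mathcal{H}_0$ is orthogonal to $\mathcal{H}_1$. Hence, every function $f \in \mathcal{H}$ can be expressed as follows:
$$f = g + h; \quad g \in \mathcal{H}_0; \quad h \in \mathcal{H}_1.$$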
Next, if
is basis of space
and
is basis of space
, then we can express every function
,
as follows:
where ; ; ; and
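To make the basis expansion concrete, the following sketch fits a single-response cubic smoothing spline on [0, 1] through its RKHS representation. It assumes the textbook setting in which the null space is spanned by {1, t} and the penalized subspace has reproducing kernel R(s, t) = ∫ (s − u)₊ (t − u)₊ du. The names `T`, `S`, `c`, and `d` follow a common convention in the smoothing spline literature and are not this article's notation; the multiresponse weighting is omitted.

```python
import numpy as np

def cubic_rk(s, t):
    """Reproducing kernel of the penalized subspace for cubic splines on [0, 1]:
    integral over u of (s - u)_+ (t - u)_+ = lo^2 (3 hi - lo) / 6."""
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo**2 * (3.0 * hi - lo) / 6.0

def fit_cubic_smoothing_spline(t, y, lam):
    """Solve (S + n*lam*I) c + T d = y subject to T' c = 0."""
    n = len(t)
    T = np.column_stack([np.ones(n), t])   # null-space basis {1, t} at the data
    S = cubic_rk(t[:, None], t[None, :])   # kernel matrix, n x n
    K = np.block([[S + n * lam * np.eye(n), T],
                  [T.T, np.zeros((2, 2))]])
    rhs = np.concatenate([y, np.zeros(2)])
    sol = np.linalg.solve(K, rhs)
    c, d = sol[:n], sol[n:]
    return c, d, S @ c + T @ d

# A straight line lies in the null space, so it is reproduced without penalty.
t = np.linspace(0.05, 0.95, 10)
y_line = 1.0 + 2.0 * t
c, d, fitted = fit_cubic_smoothing_spline(t, y_line, lam=1e-3)
print(d)
```

The fitted values are a linear function of `y`, which is the property the later hat-matrix formulation relies on.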
Hereinafter, since
is bounded linear function and
,
then we have:
Based on Eq. (12) and Riesz representation theorem (Wang, 2011), there is a representer of such that:
;.
where $\langle \cdot , \cdot \rangle$ denotes the inner product. By considering Eq. (11) and inner-product properties, the following equation is obtained:
Next, by using Eq. (13) for
we get:
Hence, based on Eq. (14) for we have:
where ; ;
; and .
Similarly, we get:
, … .
Therefore, generally, the following expression of
is obtained:
where
;
; A is a matrix with dimension
; c is a vector with dimension
; B is a matrix with dimension
; and d is a vector with dimension
. Generally, based on Eq.(15), the MSR model (6) can be written as follows:
Hereafter, to obtain regression function estimation of MSR model (16), we determine the solution to PWLS (7) which can be presented as follows:
with constraint
,
. Solution to the PWLS optimization is same as the solution to the following PWLS optimization:
where are smoothing parameters. These smoothing parameters set the balance between , as goodness of fit, and as the smoothness. To solve PWLS optimization (17), we decompose the penalty in (17) such that we get:
where . Also, we get the goodness of fit:
Hence, by combining penalty and goodness of fit, we obtain PWLS optimization whose solutions are:
and.
where
. Therefore, the estimated regression function in nonparametric component of MSR model (1) or (6) is:
Based on Eq. (3), we can express Eq. (18) as:
Hence, the sum of squared errors (SSE) is given by:
Next, by minimizing the SSE, we obtain the estimation of parameter
namely
as follows:
where
is a WLS estimator for parameters in parametric component of MSR model (1). Furthermore, by substituting Eq. (22) into Eq. (18), we get estimator of
as follows:
where is smoothing spline estimator for regression function in nonparametric component of MSR model (1).
Finally, by considering MSR model (1) and based on estimation results given by equations (22) and (23), we obtain MSR model estimation based on smoothing spline as follows:
where ; ; and .
Because it is based on the smoothing spline in the MSR model, the estimator of the nonparametric regression function given in (23) is called the weighted partial smoothing spline estimator of the regression function of MSR model (1).
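The two-step logic of this section (smooth the partial residuals, then minimize the SSE over the parametric coefficients) can be sketched for a single response as follows. This is a minimal illustration under stated assumptions, not the article's estimator: the smoother matrix `A` is a second-difference penalized smoother used as a discrete stand-in for a smoothing spline, the weight matrix defaults to the identity, and all function names and simulated values are hypothetical.

```python
import numpy as np

def second_difference_smoother(n, lam):
    """Linear smoother A = (I + lam * D'D)^{-1}, D = second-difference matrix.
    A discrete stand-in for a cubic smoothing spline on an equally spaced grid."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def partial_smoother_fit(y, X, A, W=None):
    """For fixed beta, f_hat = A (y - X beta); beta then minimizes the weighted SSE
    ((I - A)(y - X beta))' W ((I - A)(y - X beta))."""
    n = len(y)
    W = np.eye(n) if W is None else W
    R = np.eye(n) - A
    M = R.T @ W @ R
    beta_hat = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
    f_hat = A @ (y - X @ beta_hat)
    return beta_hat, f_hat

# Simulated example: y = 2 x + sin(2 pi t) + noise.
rng = np.random.default_rng(0)
n = 200
t = np.linspace(0.0, 1.0, n)
X = rng.normal(size=(n, 1))
f_true = np.sin(2.0 * np.pi * t)
y = 2.0 * X[:, 0] + f_true + 0.1 * rng.normal(size=n)

beta_hat, f_hat = partial_smoother_fit(y, X, second_difference_smoother(n, 100.0))
print(beta_hat)
```

Both `beta_hat` and `f_hat` are linear in `y`, so the fitted values can be written as a hat matrix applied to the observations.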
3.2 Determining confidence interval of β
To determine a CI, we use pivotal quantity (Sahoo, 2013). We assume that
in (1) follows Normal distribution that independent and identic with mean zero and variance
or we write
where
is unknown. Next, the
CI for
,
;
is designed such that we have a pivotal quantity of parameter
:
where , , is the element for response of parameters vector , and is diagonal element of . We can use GCV or CV instead of MSE to overcome over fitting (Amini & Roozbeh, 2015; Roozbeh, 2018; and Roozbeh et al., 2020).
Hereinafter, if
, then MSE(λ) in (25) is given by:
where
. Hence, the pivotal quantity (25) can be expressed as follows:
The pivotal quantity (27) follows a -student distribution with degree of freedom.
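The remark above about using GCV can be illustrated with a small grid search. The sketch assumes a single response, identity weights, and the usual GCV form: mean squared residual divided by the squared mean of the diagonal of I − H. The smoother is a second-difference penalized smoother standing in for a smoothing spline, and the grid of smoothing parameters is arbitrary.

```python
import numpy as np

def second_difference_smoother(n, lam):
    """Hat matrix H = (I + lam * D'D)^{-1} of a second-difference penalized smoother."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def gcv_score(y, H):
    """GCV(lam) = (RSS / n) / (trace(I - H) / n)^2 for a linear smoother y_hat = H y."""
    n = len(y)
    resid = y - H @ y
    return (resid @ resid / n) / (np.trace(np.eye(n) - H) / n) ** 2

rng = np.random.default_rng(1)
n = 100
t = np.linspace(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * t) + 0.3 * rng.normal(size=n)

grid = [1e-2, 1.0, 1e2, 1e4, 1e8]  # hypothetical candidate smoothing parameters
scores = [gcv_score(y, second_difference_smoother(n, lam)) for lam in grid]
best_lam = grid[int(np.argmin(scores))]
print(best_lam)
```

In practice a finer grid or a one-dimensional optimizer would be used; for a multiresponse model one smoothing parameter per response has to be searched.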
Furthermore, to determine the 100(1 –
)% CI for
,
, we must take the solution to probability equation:
where
is lower limit of CI and
is upper limit of CI, and
is level of confidence. Next, we substitute Eq.(27) into Eq.(28) so that we get:
We can write Eq.(29) as:
where ; ;
and .
If interval length of CI is shortest then the CI is good. Therefore, we find values of and that results length of CI in (30) is the shortest. If is length of CI in (30), then we have:
Hence, the shortest length of CI for
is determined by taking the solution to optimization:
that meets the condition:
or (32).
where
represents distribution of probability of
and
represents distribution of cumulative probability of
. Next, by applying Lagrange method, it results equation as follows:
where
is constant of Lagrange. Hereafter, the following equations are obtained:
From equations (34) and (35), we obtain the following relationship:
The Eq.(37) implies
or
. Since,
is not satisfied, then the shortest CI can be determined from the
and
values which fulfill:
By using level of confidence the and values which fulfill condition (38) can be obtained from the distribution table.
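The conclusion of the Lagrange argument, that the shortest interval with a given coverage under a symmetric unimodal density uses symmetric quantiles, can be checked numerically. The sketch below scans intervals of the Student's t distribution that all have coverage 1 − α but split the tail probability differently. The degrees of freedom (10) and the grid are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

def interval_for_lower_tail(p_lower, alpha, df):
    """Quantile pair (a, b) with P(T < a) = p_lower and P(a <= T <= b) = 1 - alpha."""
    a = stats.t.ppf(p_lower, df)
    b = stats.t.ppf(p_lower + 1.0 - alpha, df)
    return a, b

alpha, df = 0.05, 10
p_grid = np.linspace(0.001, 0.049, 49)  # lower-tail probabilities; includes alpha / 2

lengths = []
for p in p_grid:
    a, b = interval_for_lower_tail(p, alpha, df)
    lengths.append(b - a)

best_p = p_grid[int(np.argmin(lengths))]
a_best, b_best = interval_for_lower_tail(best_p, alpha, df)
print(best_p, a_best, b_best)
```

The minimizing split puts α/2 in each tail, so the two quantiles are equal in magnitude and opposite in sign.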
Consequently, the shortest smoothing spline CI for parameters of MSR model fulfills the following probability:
where value of can be determined from Eq.(38) which is . Hence, we have:
Finally, by using distribution of
-student, the
CIs parameters
,
;
of MSR model (1) are:
where ; ; ; ; and is given in (19). The asymptotic distribution of is Normal as presented by Theorem 2 in section 3.3.
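A t-based interval of this kind can be computed as in the sketch below. It assumes that the estimate, the unscaled variance factor of each coefficient (the relevant diagonal element), the MSE, and the degrees of freedom have already been obtained; how these are defined for the MSR model is given by the equations of this section, and the numbers used here are made up for illustration.

```python
import numpy as np
from scipy import stats

def t_confidence_intervals(beta_hat, unscaled_var, mse, df, alpha=0.05):
    """Symmetric 100(1 - alpha)% intervals: beta_hat +/- t_{alpha/2, df} * sqrt(mse * c_jj)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.sqrt(mse * np.asarray(unscaled_var, dtype=float))
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    return beta_hat - t_crit * se, beta_hat + t_crit * se

# Hypothetical inputs: two coefficients, their diagonal elements, MSE, and df.
lower, upper = t_confidence_intervals([1.5, -0.4], [0.04, 0.09], mse=0.25, df=30)
print(lower, upper)
```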
3.3 Determining asymptotic distribution
To investigate the asymptotic distribution of $\hat{\boldsymbol{\beta}}$, we consider the following lemmas and theorem.
Lemma 1.
Suppose is matrix presented in (19) and then.
Proof.
Suppose in (23) is a estimator of smoothing spline function which makes the PWLS (7) is minimum, then for and we have:
. □.
Lemma 2.
If is matrix as given in (19) and or then
Proof.
With a little algebraic explanation, we obtain:
Consequently, we have relationship:
For or , Lemma 1 gives:
or . □.
Theorem 1.
If is matrix presented in (19) and or then
as ,
Proof.
Here, we apply the Cramer-Wold theorem (Cramer & Wold, 1936; Sen & Singer, 1993). Firstly, a vector is given such that:
where is zero mean independent random variable, namely has mean 0 and variance .
Next, the following assumptions (A1, A2, A3) are given:
(A1). ; ; .
(A2). follow a distribution that independent and identic with mean zero and covariance , and the third absolute moment is finite.
(A3). .
Taking into account the assumptions (A1, A2, A3) and Lemma 1, then for or , converges to . Hence, we have:
Hence, we have the relationship:
Since Lemma 2 and the third absolute moment of is finite, the leads to zero. Hence, converges to namely Normally distributed. □.
Based on these lemmas and theorem, the estimator $\hat{\boldsymbol{\beta}}$ is asymptotically normally distributed. The details are given in the following theorem.
Theorem 2.
If is parameters estimator of smoothing spline in parametric component the MSR model (1), and or then.
as
Proof.
We can express as:
Hence, we obtain:
for ; and , as .
From Theorem 1, we have:
as .
Next, by applying Slutsky theorem (Sen & Singer, 1993), we obtain:
as . □.
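The asymptotic normality result can be explored empirically with a small Monte Carlo experiment. The sketch below is an illustration, not a proof, and makes several simplifying assumptions: a single response, one parametric covariate with a fixed design, identity weights, uniformly distributed (non-normal) errors with known standard deviation, and a second-difference penalized smoother as a stand-in for the smoothing spline. All names and constants are hypothetical.

```python
import numpy as np

def second_difference_smoother(n, lam):
    """Linear smoother A = (I + lam * D'D)^{-1}, D = second-difference matrix."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

rng = np.random.default_rng(42)
n, reps, beta_true, sigma = 100, 500, 2.0, 0.5
t = np.linspace(0.0, 1.0, n)
x = rng.normal(size=n)                 # fixed design for the parametric covariate
f_true = np.sin(2.0 * np.pi * t)

A = second_difference_smoother(n, 100.0)
R = np.eye(n) - A
M = R.T @ R
c_row = (M @ x) / (x @ M @ x)          # beta_hat = c_row @ y (linear in y)
sd_beta = sigma * np.sqrt(c_row @ c_row)

z = np.empty(reps)
for r in range(reps):
    # Uniform errors with mean 0 and standard deviation sigma (not normal).
    eps = sigma * np.sqrt(12.0) * (rng.uniform(size=n) - 0.5)
    y = beta_true * x + f_true + eps
    z[r] = (c_row @ y - beta_true) / sd_beta

print(z.mean(), z.var())
```

If the normal approximation is adequate, the standardized values `z` should have mean close to 0 and variance close to 1, and a histogram or normal Q-Q plot of `z` should look approximately standard normal.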
4 Discussion
The estimated regression function of the MSR model is a combination of the estimated parametric component and the estimated nonparametric functions. Here, the parameters in the parametric component are estimated by a WLS estimator, and the regression function in the nonparametric component is estimated by a smoothing spline estimator. Hence, the smoothing spline estimate of the MSR model is linear in the observations, and its hat matrix, given by Eq. (24), is likewise a combination of the hat matrix of the parametric component and the hat matrix of the nonparametric component.
In interval estimation, a good CI is the one with the shortest interval length. Therefore, we determine the lower and upper limits of the CI such that the length of the CI is the shortest. The shortest CIs for the parameters of the MSR model are given in Eq. (39) and depend on the Student's t distribution because the population variance is unknown. For further statistical inference, the asymptotic distribution of the parameter estimator of the MSR model was also investigated, and we obtained that the estimator in (22) is asymptotically normally distributed, as given in the proof of Theorem 2.
5 Conclusion
The estimated MSR model is composed of the estimates of the parametric and nonparametric components, and it is linear in the observations. Also, the CIs for the parameters are based on the Student's t distribution, and the parameter estimator is asymptotically normally distributed. In the future, the results of this study can be used as a theoretical basis for designing standard growth charts for toddlers to assess their nutritional status.
Acknowledgements
We thank the editors and reviewers who have given constructive corrections, criticisms, and suggestions which can be used to improve the quality of the manuscript.
Funding
We disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the DRPM, Ministry of Education and Culture, Republic of Indonesia through the PDUPT Grant No. 473/UN3.15/PT/2021.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Optimal partial ridge estimation in restricted semiparametric regression models. J. Multivariate Anal.. 2015;136:26-40.
- Estimation of semiparametric regression model with right-censored high-dimensional data. J. Stat. Comp. Simul.. 2019;89(6):985-1004.
- Chamidah, N., Lestari, B., Massaid, A., Saifudin, T., 2020a. Estimating mean arterial pressure affected by stress scores using spline nonparametric regression model approach. Commun. Math. Biol. Neurosci. 2020 (2020) 72, 1–12.
- Estimating of covariance matrix using multi-response local polynomial estimator for designing children growth charts: a theoretically discussion. J. Phys.: Conf. Ser.. 2019;1397(1):012072
- Standard growth charts for weight of children in East Java using local linear estimator. J. Phys.: Conf. Ser.. 2018;1097(1):012092
- Improving of classification accuracy of cyst and tumor using local polynomial estimator. TELKOMNIKA. 2019;17(3):1492-1500.
- Modeling of blood pressures based on stress score using least square spline estimator in bi-response nonparametric regression. Int. J. Innov., Creat. Change. 2019;5(3):1200-1216.
- Standard growth charts of children in East Java province using a local linear estimator. Int. J. Innov., Creat. Change. 2019;13(1):45-67.
- Identification the number of mycobacterium tuberculosis based on sputum image using local linear estimator. Bullet. Elect. Eng. Inform.. 2020;9(5):2109-2116.
- Z-Score standard growth chart design of toddler weight using least square spline semiparametric regression. AIP Conf. Proc.. 2021;2329:060031
- Consistency and asymptotic normality of estimator for parameters in multiresponse multipredictor semiparametric regression model. Symmetry. 2022;14(2) 336:1-18.
- Polynomial-based smoothing estimation for a semiparametric accelerated failure time partial linear model. Open Access Library J.. 2020;7:1-15.
- Smoothing spline in semiparametric additive regression model with Bayesian approach. J. Math. Stats.. 2013;9(3):161-168.
- Spline Smoothing and Nonparametric Regression. New York: Marcel Dekker; 1988.
- Comparison of smoothing and truncated spline estimators in estimating blood pressures models. Int. J. Innov., Creat. Change. 2019;5(3):1177-1199.
- Smoothing spline semiparametric regression model assumption using PWLS Approach. Int. J. Adv. Sci. Technol.. 2020;29(4):2059-2070.
- M-Type smoothing splines in nonparametric and semiparametric regression models. Statistica Sinica. 1997;7(4):1155-1169.
- Smoothing Spline ANOVA Models. New York: Springer-Verlag; 2002.
- Biresponse nonparametric regression model in principal component analysis with truncated spline estimator. J. King Saud Univ.-Sci.. 2022;34(3) 101892:1-9.
- Non-polynomial quadratic spline method for solving fourth order singularly perturbed boundary value problems. J. King Saud Univ.–Sci.. 2019;31(4):479-484.
- Estimation of regression function in multiresponse nonparametric regression model using smoothing spline and kernel estimators. J. Phys.: Conf. Ser.. 2018;1097(1):012091
- Smoothing parameter selection method for multiresponse nonparametric regression model using spline and kernel estimators approaches. J. Phys.: Conf. Ser.. 2019;1397(1):012064
- Spline estimator and its asymptotic properties in multiresponse nonparametric regression model. Songklanakarin J. Sci. Technol.. 2020;42(3):533-548.
- Reproducing kernel Hilbert space approach to multiresponse smoothing spline regression function. Symmetry. 2022;14(11) 2227:1-22.
- Kernel and regression spline smoothing techniques to estimate coefficient in rates model and its application in psoriasis. Medic. J. Islamic Rep. Iran. 2019;33(90):1-5.
- Standard growth chart of weight for height to determine wasting nutritional status in East Java based on semiparametric least square spline estimator. IOP Conf. Ser.: Mater. Sci. Eng.. 2019;546:052063
- Optimal QR-based estimation in partially linear regression models with correlated errors using GCV criterion. Comput. Stats. Data Anal.. 2018;117:45-61.
- Generalized cross-validation for simultaneous optimization of tuning parameters in ridge regression. Iranian J. Sci. Technol. Transactions A: Science. 2020;44:473-485.
- Probability and Mathematical Statistics. Louisville: University of Louisville; 2013.
- Large Sample in Statistics: An Introduction with Applications. London: Chapman & Hall; 1993.
- Approximation for higher order boundary value problems using non-polynomial quadratic spline based on off-step points. J. King Saud Univ.–Sci.. 2019;31(4):737-745.
- Spline Models for Observational Data. Philadelphia: SIAM; 1990.
- Smoothing Splines: Methods and Applications. London: Chapman and Hall; 2011.
- Spline smoothing for bivariate data with applications to association between hormones. Statistica Sinica. 2000;10(2):377-397.
- Smoothing spline semiparametric nonlinear regression models. J. Comp. Graphical Stats.. 2009;18(1):165-183.
- On multiresponse semiparametric regression model. J. Math. Stats.. 2012;8(4):489-499.
- Choice of smoothing parameter for kernel type ridge estimators in semiparametric regression models. REVSTAT-Stat. J.. 2021;19(1):47-69.
Appendix A
Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jksus.2023.102664.