7.2
CiteScore
3.7
Impact Factor

Original article
04 2022; 34: 101892
doi: 10.1016/j.jksus.2022.101892

Biresponse nonparametric regression model in principal component analysis with truncated spline estimator

Department of Statistics, Faculty of Mathematics and Natural Sciences, Hasanuddin University, Makassar 90245, Indonesia
Department of Mathematics, Faculty of Mathematics and Natural Sciences, Hasanuddin University, Makassar 90245, Indonesia

⁎Corresponding author. annaislamiyati701@gmail.com (Anna Islamiyati)

Disclaimer:
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.

Peer review under responsibility of King Saud University.

Abstract

Objectives

This study aims to model data that contain two correlated responses, multicollinearity among the predictors, and a pattern that does not follow a parametric form.

Methods

We propose the use of principal component analysis (PCA) with truncated splines in a biresponse model. Principal components are used to overcome correlations between predictors, and the biresponse model overcomes correlations between responses by involving weights estimated from the covariance matrix. The PCA spline contains optimal knot points, which control the accuracy of the regression curve. The knot point chosen is the one with the smallest generalized cross-validation (GCV) value among all candidate knot points. In addition, we consider the mean squared error (MSE) as a measure of the model's ability.

Results

We demonstrated the ability of this method through simulation studies and obtained smaller GCV and MSE values than with parametric regression and PCA. Furthermore, for the type 2 diabetes mellitus data, we obtained two principal components with different patterns of change. Based on the analysis, LDL cholesterol, total cholesterol, and triglycerides had a greater effect on changes in the pattern of fasting blood sugar and HbA1C.

Conclusions

The small errors on the simulation data indicate the accuracy of the biresponse spline PCA model. The diabetes data analysis shows that patients need to keep their cholesterol and triglyceride levels within normal limits.

Keywords

Biresponse
Diabetes
Principal component
Spline truncated

1 Introduction

We have entered the era of big data, in which the numbers of samples, responses, and predictor variables are all large. Our concern is that the larger the data, the greater the likelihood of error correlation and of multicollinearity in the predictors. One popular statistical approach to addressing this problem is principal component analysis (PCA). Jolliffe and Cadima (2016) reviewed the development of PCA, which reduces predictor variables through eigenvalues so that the components are mutually independent. The ability of PCA has been demonstrated by Bouwmans and Zahzah (2014) in image data analysis. Ghasemi et al. (2013) classified the mineral composition of water samples, Vichi and Saporta (2009) classified economic problems, and Hannachi et al. (2006) applied PCA to climate issues. All of these PCA studies used a parametric approach that was limited to constructing the principal components for a single response.

Another problem is that multicollinear data may have an irregular pattern, or may not follow a parametric pattern, so that it is difficult to model them with the parametric PCA regression approach. Researchers have therefore developed nonparametric approaches, including Durand (1993), who worked on instrumental variables with spline transformations. Wang et al. (2016) used PCA with local polynomials and Shiokawa et al. (2018) used kernel PCA. Lavado and Calapez (2011) developed PCA with M-splines. Among spline estimators, some contain a penalty function in their estimation criteria that can be used to overcome multicollinearity, namely the smoothing spline of Lestari et al. (2010) and the penalized spline of Islamiyati et al. (2020a). However, the truncated spline estimator does not contain a penalty function and therefore cannot, by itself, overcome multicollinearity in the predictors. Therefore, in this article, we develop truncated spline PCA for two responses.

For higher response dimensions in nonparametric regression, Soo and Bates (1996) developed a multi-response spline estimator using the generalized Gauss-Newton algorithm. Wang et al. (2000) analyzed bivariate data with the smoothing spline estimator. Chamidah et al. (2012) examined the use of local polynomial estimators in nonparametric regression. Zahra and Mhlawy (2013) made a numerical study of an exponential spline. Khan and Shahna (2019) used a quadratic spline. Tohari and Chamidah (2020) used bi-response negative binomial regression with a local linear estimator. Furthermore, Islamiyati et al. (2018) developed a penalized spline estimator in the longitudinal biresponse case. However, none of these studies considered the multicollinearity that can occur with a large number of predictors. They only consider the correlations between responses, which are handled by weights in the estimation criteria, such as weights from the variance–covariance matrix.

We demonstrate the capabilities of the method on simulated data and compare it with the parametric regression model, PCA, and the nonparametric spline regression model. Next, we apply it to real data on type 2 diabetes mellitus obtained from the Hasanuddin University Teaching Hospital. Islamiyati et al. (2020b) examined the effect of treatment time on blood sugar through a longitudinal penalized spline. Islamiyati et al. (2020c) examined the pattern of changes in blood sugar based on the diet of diabetic patients through a biresponse approach (see also Zahra and Mhlawy, 2013). Furthermore, Islamiyati (2022) obtained several segments of change in blood sugar based on lifestyle factors of diabetic patients. All of these studies indicate that blood sugar suits the spline approach because its pattern changes at certain intervals.

2 Spline truncated function in the PCA

Consider the observation data $(t_{i1}, t_{i2}, \ldots, t_{ip}, y_{1.i}, y_{2.i})$, with $p$ predictor variables $t$ and two response variables $y$ that follow a nonparametric pattern, for $i = 1, 2, \ldots, n$. If the predictor variables are strongly correlated, multicollinearity occurs and must be resolved first. One statistical method for handling multicollinearity is principal component analysis (PCA), which has been widely used in many applications. Jolliffe and Cadima (2016) explain that PCA transforms a group of predictor variables into a group of new variables, called principal components. Each principal component is a linear combination of the predictor variables, and the number of principal components formed equals the number of predictors. The components are assumed to be orthogonal, hence uncorrelated, so that the information they provide does not overlap.

Let $\Sigma$ be the variance–covariance matrix of the predictor variables $t_1, t_2, \ldots, t_p$, which is used as the basis for selecting the number of principal components. If $c$ denotes a principal component, then the equation for each component can be stated as follows:

(1)
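$$
\begin{aligned}
c_{1i} &= \gamma_{11} t_{1i} + \gamma_{12} t_{2i} + \cdots + \gamma_{1p} t_{pi} \\
c_{2i} &= \gamma_{21} t_{1i} + \gamma_{22} t_{2i} + \cdots + \gamma_{2p} t_{pi} \\
&\;\;\vdots \\
c_{pi} &= \gamma_{p1} t_{1i} + \gamma_{p2} t_{2i} + \cdots + \gamma_{pp} t_{pi}
\end{aligned}
$$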

Eq. (1) can also be expressed in vector form, namely $c_1 = T\gamma_1, c_2 = T\gamma_2, \ldots, c_p = T\gamma_p$, where $c_1, c_2, \ldots, c_p$ are the principal components 1, 2, …, p with variances $\lambda_1, \lambda_2, \ldots, \lambda_p$, $T$ is the predictor matrix, and $\gamma$ is the principal component coefficient vector. The principal components are ordered by decreasing variance, so that the largest variance belongs to the first component and the smallest to the $p$th component, with $\gamma_1 = (\gamma_{1.1}, \gamma_{1.2}, \ldots, \gamma_{1.p})^T$ and $\gamma_1^T \gamma_1 = 1$. Suppose $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ are the eigenvalues (characteristic roots) of the matrix $\Sigma$ corresponding to the eigenvectors $\gamma_1, \gamma_2, \ldots, \gamma_p$, with $\gamma_j^T \gamma_j = 1$ for $j = 1, 2, \ldots, p$; then $c_1 = T\gamma_1, c_2 = T\gamma_2, \ldots, c_p = T\gamma_p$ are the 1st, 2nd, …, $p$th principal components of $t$. For data applications, the number of principal components is selected based on the cumulative variance explained by the components.
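The construction in Eq. (1) can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the centring of the predictors, the synthetic data, and the 85% threshold are all assumptions made for illustration.

```python
import numpy as np

def principal_components(T):
    """Eigen-decompose the covariance of the n x p predictor matrix T."""
    T_centered = T - T.mean(axis=0)            # centring is an assumption of this sketch
    sigma = np.cov(T_centered, rowvar=False)   # p x p matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder so lambda_1 >= ... >= lambda_p
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = T_centered @ eigvecs              # column j holds c_j = T gamma_j
    return eigvals, eigvecs, scores

def n_components_for(eigvals, threshold=0.85):
    """Smallest m whose cumulative proportion of variance reaches the threshold."""
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cumulative, threshold)) + 1
    return min(m, len(eigvals))

# Hypothetical data: three predictors driven by one shared factor, hence correlated.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
T = np.hstack([z + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)])

eigvals, eigvecs, scores = principal_components(T)
m = n_components_for(eigvals, threshold=0.85)
print(eigvals, m)
```

Each column of `eigvecs` has unit length, matching the constraint $\gamma_j^T \gamma_j = 1$, and the columns of `scores` are mutually uncorrelated with variances equal to the eigenvalues.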

In many multivariate studies, the principal component analysis stops at Eq. (1), which describes the principal components formed from the total variance. However, the problem is different when the data are nonparametric. To model such data, the principal components obtained in Eq. (1) are connected to the predictors through an estimator function in nonparametric regression, namely the truncated spline.

If $m$ of the $p$ principal components are selected, denoted $c_j$, $j = 1, 2, \ldots, m, \ldots, p$ with $m \le p$, then the truncated spline function of the principal components based on the predictors can be stated as follows:

(2)
$$
\begin{aligned}
c_1 &= f_1(t_1) + f_1(t_2) + \cdots + f_1(t_p) + \xi_1 \\
c_2 &= f_2(t_1) + f_2(t_2) + \cdots + f_2(t_p) + \xi_2 \\
&\;\;\vdots \\
c_m &= f_m(t_1) + f_m(t_2) + \cdots + f_m(t_p) + \xi_m \\
&\;\;\vdots \\
c_p &= f_p(t_1) + f_p(t_2) + \cdots + f_p(t_p) + \xi_p
\end{aligned}
$$
where $c_1, c_2, \ldots, c_m, \ldots, c_p$ are the 1st, 2nd, …, $m$th, …, $p$th principal components, $f_j(t_1), f_j(t_2), \ldots, f_j(t_p)$ are the spline functions of the predictors $t_1, t_2, \ldots, t_p$, and $\xi_1, \xi_2, \ldots, \xi_m, \ldots, \xi_p$ are the errors of the truncated spline functions for the 1st, 2nd, …, $m$th, …, $p$th principal components.

Each predictor function $f_j(t_1), f_j(t_2), \ldots, f_j(t_p)$ in (2) is a spline function of unknown shape for $j = 1, 2, \ldots, m$. It is estimated with a truncated spline of order $q$ with knot points $K$. The spline function of each predictor for the $j$th component can be written as follows:

(3)
$$
\begin{aligned}
f_j(t_1) &= \sum_{u_1=0}^{q_1} \beta_{j.u_1} t_1^{u_1} + \sum_{v_1=1}^{d_1} \beta_{j.(q_1+v_1)1} (t_1 - K_{j.v_1})_+^{q_1} \\
f_j(t_2) &= \sum_{u_2=0}^{q_2} \beta_{j.u_2} t_2^{u_2} + \sum_{v_2=1}^{d_2} \beta_{j.(q_2+v_2)2} (t_2 - K_{j.v_2})_+^{q_2} \\
&\;\;\vdots \\
f_j(t_p) &= \sum_{u_p=0}^{q_p} \beta_{j.u_p} t_p^{u_p} + \sum_{v_p=1}^{d_p} \beta_{j.(q_p+v_p)p} (t_p - K_{j.v_p})_+^{q_p}
\end{aligned}
$$
where $q$ is the degree of the spline, $\beta$ is the vector of spline coefficients, $K$ is the knot point, $v$ indexes the knot points, and $d$ is the number of knot points. The truncated elements are defined as follows:
$$
(t_j - K_{j.v_j})_+^{q_j} =
\begin{cases}
(t_j - K_{j.v_j})^{q_j}, & t_j > K_{j.v_j} \\
0, & t_j \le K_{j.v_j}
\end{cases}
$$
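As an illustration of the truncated power basis in Eq. (3), the sketch below builds the design columns for a single predictor. The function name `truncated_power_basis` is hypothetical, and the knots used in the example are the LDL cholesterol knots reported later in the article, with a linear spline (degree 1) assumed.

```python
import numpy as np

def truncated_power_basis(t, knots, degree=1):
    """Columns 1, t, ..., t^q followed by (t - K)_+^q for each knot K (needs at least one knot)."""
    t = np.asarray(t, dtype=float)
    # Polynomial part: t^0, t^1, ..., t^q
    poly = np.column_stack([t ** u for u in range(degree + 1)])
    # Truncated part: (t - K)^q when t > K, and 0 otherwise
    trunc = np.column_stack([np.where(t > k, (t - k) ** degree, 0.0) for k in knots])
    return np.hstack([poly, trunc])

t = np.array([100.0, 105.5, 150.0, 200.0, 250.0])
X = truncated_power_basis(t, knots=[105.5, 173.0, 240.5], degree=1)
print(X)
```

Each truncated column stays at zero up to its knot and grows only beyond it, which is what lets the fitted curve change slope at the knot points.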

Eq. (3) can be expressed in vector form as follows: $f_j(t_1) + f_j(t_2) + \cdots + f_j(t_p) = X_j \beta_j$, where $X_j$ is the predictor matrix containing the knot points and $\beta_j = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the coefficient vector for each predictor.

Furthermore, the spline function of the first principal component can be stated as follows: $c_1 = X_1 \beta_1 + \xi_1$, where $\beta_1 = (\beta_{1.1}, \beta_{1.2}, \ldots, \beta_{1.p})^T$.

Similarly, the spline functions of the second principal component, up to the $p$th, can be stated as follows: $c_2 = X_2 \beta_2 + \xi_2, \ldots, c_p = X_p \beta_p + \xi_p$.

3 Biresponse nonparametric regression model with spline PCA

The biresponse nonparametric regression model on PCA is a nonparametric regression model that contains two response variables ($y_r$) with $r = 1, 2$ and several principal component variables ($c_j$). Suppose that the number of principal components selected is $m$; then the observation data $(c_{i1}, c_{i2}, \ldots, c_{im}, y_{1.i}, y_{2.i})$, with $i = 1, 2, \ldots, n$, satisfy the biresponse nonparametric regression model as follows:

(4)
$$
y_i = f(c_{i1}, c_{i2}, \ldots, c_{im}) + \varepsilon_i, \quad i = 1, 2, \ldots, n
$$

The model in (4) can be stated as:

(5)
$$
y = f(c_1) + f(c_2) + \cdots + f(c_m) + \varepsilon
$$
where $y$ is the response vector containing the first and second response vectors, namely $y = (y_1, y_2)^T$. The vector $\varepsilon$ is the random error vector, namely $\varepsilon = (\varepsilon_1, \varepsilon_2)^T$, with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = V$. For the vector $\varepsilon_i = (\varepsilon_{1.i}, \varepsilon_{2.i})^T$ it is assumed that:
(6)
$$
E(\varepsilon_{1,i}) = E(\varepsilon_{2,i}) = 0, \quad E(\varepsilon_{1,i}^2) = \sigma_{1,i}^2, \quad E(\varepsilon_{2,i}^2) = \sigma_{2,i}^2
$$
with $\sigma_{12.i} = \sigma_{21.i}$. The assumption in (6) states that the errors of the first and second responses are correlated for the same $i$, but errors are uncorrelated across different $i$. Therefore, the model involves weights obtained from the estimated covariance matrix, namely $\hat{\theta}^{-1}$, where
$$
\hat{\theta} = \begin{pmatrix} \hat{\Sigma}_1 & \hat{\Sigma}_{12} \\ \hat{\Sigma}_{21} & \hat{\Sigma}_2 \end{pmatrix}
$$
Here $\hat{\Sigma}_1$ is the estimate of the variance matrix of the first response, $\hat{\Sigma}_{12} = \hat{\Sigma}_{21}$ is the estimate of the covariance matrix of the first and second responses, and $\hat{\Sigma}_2$ is the estimate of the variance matrix of the second response.

Eq. (5) can also be written in matrix form, namely:

(7)
$$
y = X\alpha + \varepsilon
$$

Eq. (7), as the biresponse nonparametric regression model in PCA spline, is estimated using weighted least squares (WLS). The WLS criterion, denoted by $P$, is as follows: $P = \varepsilon^T \hat{\theta}^{-1} \varepsilon$

Minimizing $P$ with respect to $\alpha$ gives:

(8)
$$
\hat{\alpha} = (X^T \hat{\theta}^{-1} X)^{-1} X^T \hat{\theta}^{-1} y
$$

Based on the estimation results of the regression parameters in (8), we get an estimate of the biresponse nonparametric regression model on PCA through a truncated spline estimator as in Eq. (9).

(9)
$$
\hat{y} = X\hat{\alpha} = X (X^T \hat{\theta}^{-1} X)^{-1} X^T \hat{\theta}^{-1} y
$$
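The estimator in Eqs. (8) and (9) can be sketched numerically as below. This is an illustration under stated assumptions, not the authors' procedure: the article does not say how $\hat{\theta}$ is obtained, so the sketch assumes a two-step approach (unweighted fit, then a 2 x 2 residual covariance expanded over the $n$ observations), and the data, coefficients, and function name `wls_estimate` are hypothetical.

```python
import numpy as np

def wls_estimate(X, y, theta_hat):
    """alpha_hat = (X' W X)^{-1} X' W y with W = theta_hat^{-1}; also returns y_hat = X alpha_hat."""
    W = np.linalg.inv(theta_hat)
    XtW = X.T @ W
    alpha_hat = np.linalg.solve(XtW @ X, XtW @ y)
    return alpha_hat, X @ alpha_hat

# Hypothetical data: n observations, two responses, an intercept and two components.
rng = np.random.default_rng(1)
n = 50
C = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
X = np.kron(np.eye(2), C)                      # block-diagonal design, one block per response
alpha_true = np.array([1.0, 0.5, -0.2, 2.0, 0.1, 0.3])
S = 0.01 * np.array([[1.0, 0.78], [0.78, 1.0]])           # correlated errors between responses
errors = rng.multivariate_normal(np.zeros(2), S, size=n)  # shape (n, 2)
y = X @ alpha_true + errors.T.reshape(-1)      # stacked as (y_1, y_2)

# Step 1 (assumption): unweighted fit to obtain residuals.
_, fitted = wls_estimate(X, y, np.eye(2 * n))
resid = (y - fitted).reshape(2, n).T           # n x 2 residual matrix
sigma_hat = resid.T @ resid / n                # 2 x 2 covariance between responses
theta_hat = np.kron(sigma_hat, np.eye(n))      # blocks Sigma_1, Sigma_12, Sigma_21, Sigma_2

# Step 2: weighted estimate as in Eq. (8) and fitted values as in Eq. (9).
alpha_hat, y_hat = wls_estimate(X, y, theta_hat)
print(np.round(alpha_hat, 3))
```

Using `np.linalg.solve` rather than forming the inverse of $X^T \hat{\theta}^{-1} X$ explicitly is a numerical convenience; the result is the same estimator.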

4 Simulation data

We use different test functions for the predictors: $f(t_{i1})$ is a polynomial, while $f(t_{i2})$ and $f(t_{i3})$ are trigonometric. The sample sizes tested were n = 10, 30, 50, 100, 150, with correlations between predictors of 0.7 to 0.8. We chose a positive correlation because it matches the condition of the variables in the real data. The simulations are performed on a single response to demonstrate the ability of the PCA spline to model multicollinear nonparametric data. The nonparametric regression model is $y_i = f(t_{i1}, t_{i2}, t_{i3}) + \varepsilon_i$ with $i = 1, 2, \ldots, n$. The functions of the first, second, and third predictors are $f(t_{i1}) = 0.6 t_{i1}^2 + 2 t_{i1} + 3$, $f(t_{i2}) = 3 \sin(2\pi t_{i2})$, and $f(t_{i3}) = 5 + 2 \sin(\pi t_{i3})$.
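A simulation of this kind might be set up as in the sketch below. Several details are assumptions rather than the authors' design: the first function is read as the quadratic $0.6t^2 + 2t + 3$, the three functions are combined additively, the predictors are drawn from a multivariate normal with a common correlation `rho`, and the location, scale, and noise level are hypothetical.

```python
import numpy as np

def f1(t):
    return 0.6 * t ** 2 + 2 * t + 3       # polynomial component (quadratic reading assumed)

def f2(t):
    return 3 * np.sin(2 * np.pi * t)      # trigonometric component

def f3(t):
    return 5 + 2 * np.sin(np.pi * t)      # trigonometric component

def simulate(n, rho=0.75, noise_sd=0.1, seed=0):
    """Draw three positively correlated predictors and one additive response."""
    rng = np.random.default_rng(seed)
    corr = np.full((3, 3), rho)
    np.fill_diagonal(corr, 1.0)
    z = rng.multivariate_normal(np.zeros(3), corr, size=n)
    t = 0.75 + 0.25 * z                   # hypothetical location and scale of the predictors
    y = f1(t[:, 0]) + f2(t[:, 1]) + f3(t[:, 2]) + rng.normal(scale=noise_sd, size=n)
    return t, y

t, y = simulate(150)
print(np.round(np.corrcoef(t, rowvar=False), 2))
```

The affine rescaling of `z` leaves the correlations between predictors unchanged, so `rho` directly controls the degree of multicollinearity.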

In this section, we present the data plot for a sample size of n = 150, shown in Fig. 1 for the first, second, and third predictors, respectively. The correlation test between the predictors showed multicollinearity in the data, with strong correlations of 0.86 between t1 and t2, 0.82 between t1 and t3, and 0.71 between t2 and t3. The predictors are reduced to independent components via PCA, which yields three principal components, matching the number of predictors. Based on the cumulative proportion of variance, which can also be seen in the scree plot in Fig. 2, we retain two principal components because together they explain 97% of the variance. The predictor variables that enter each component are identified through the loading factors. The first component contains all three predictors, t1, t2, and t3, while the second component contains only two, t2 and t3. This indicates that the simulation data can be summarized by two independent components, each with its own influential predictors. There are thus two different conditions in the data: one group of data is influenced by all the predictors and another group is affected by only two predictors. However, the data exhibit not only multicollinearity but also plots that do not follow a parametric pattern, so PCA alone cannot solve the problems in the data. Therefore, in this study, we estimate the principal components from the predictors through the truncated spline. The loading factors in PCA show which predictors significantly influence each principal component. The principal component values are then modeled on these significant predictors through the truncated spline PCA nonparametric regression model.

Fig. 1. Plot of data between predictors and responses.
Fig. 2. Scree plot of simulation data.

Fig. 3a shows that the first component, which contains the significant predictors t1, t2, and t3, follows an ascending linear pattern. The second component, which contains the predictors t2 and t3, is shown in Fig. 3b. The two principal components were then modeled on their significant predictors through truncated spline PCA, using 1 to 11 knots. Each choice of knot points gives a different regression curve, for both the first and the second component. Therefore, we select the optimal knot points for each principal component by the minimum GCV and MSE values in Table 1, which correspond to the knot points in Table 2. For c1, the minimum GCV and MSE values are obtained at 11, 8, and 10 knots for t1, t2, and t3, respectively. For c2, the minimum GCV and MSE values are obtained at 11 knots for both t2 and t3. These results indicate that the minimum GCV and MSE values are obtained at different knot points for each component. A knot point marks the start of a pattern change in the principal component.
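The knot selection described above can be sketched as follows. The article does not state its GCV formula or how candidate knots are placed, so this sketch assumes the usual definition $\mathrm{GCV} = \mathrm{MSE} / (1 - \mathrm{tr}(H)/n)^2$ with hat matrix $H$, equally spaced interior knots, a linear spline, and hypothetical data; all function names are illustrative.

```python
import numpy as np

def truncated_basis(t, knots, degree=1):
    """Design matrix with columns 1, t, ..., t^q and (t - K)_+^q for each knot K."""
    poly = np.column_stack([t ** u for u in range(degree + 1)])
    trunc = np.column_stack([np.where(t > k, (t - k) ** degree, 0.0) for k in knots])
    return np.hstack([poly, trunc])

def gcv_score(X, y):
    """Return (GCV, MSE) of the least-squares fit of y on X."""
    n = len(y)
    H = X @ np.linalg.pinv(X)             # hat matrix projecting onto the columns of X
    resid = y - H @ y
    mse = np.mean(resid ** 2)
    return mse / (1.0 - np.trace(H) / n) ** 2, mse

def select_knots(t, y, max_knots=11, degree=1):
    """Try 1..max_knots equally spaced knots and keep the set with the smallest GCV."""
    best = None
    for d in range(1, max_knots + 1):
        knots = np.linspace(t.min(), t.max(), d + 2)[1:-1]   # d interior knots (assumed placement)
        gcv, mse = gcv_score(truncated_basis(t, knots, degree), y)
        if best is None or gcv < best[0]:
            best = (gcv, mse, d, knots)
    return best

# Hypothetical example using the trigonometric test function of the second predictor.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.5, 150)
y = 3 * np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=t.size)
gcv, mse, d, knots = select_knots(t, y)
print(d, np.round(knots, 3))
```

Because the denominator penalizes the number of fitted parameters, GCV does not automatically favor the largest number of knots, unlike the raw MSE of nested fits.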

Fig. 3. The estimation results of the PCA spline regression curve at several knots for (a) the first component and (b) the second component.
Table 1 GCV and MSE values at each knot point.
| Number of knots | GCV, c1: t1 | GCV, c1: t2 | GCV, c1: t3 | GCV, c2: t2 | GCV, c2: t3 | MSE, c1: t1 | MSE, c1: t2 | MSE, c1: t3 | MSE, c2: t2 | MSE, c2: t3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 knot | 3.4219 | 5.0511 | 5.9059 | 4.1088 | 4.2035 | 0.0221 | 0.0331 | 0.0341 | 0.0283 | 0.0295 |
| 2 knots | 3.0961 | 4.8394 | 5.7727 | 4.0977 | 4.1028 | 0.0216 | 0.0328 | 0.0340 | 0.0279 | 0.0282 |
| 3 knots | 3.0955 | 4.8389 | 5.9038 | 4.0558 | 4.0952 | 0.0215 | 0.0327 | 0.0341 | 0.0277 | 0.0281 |
| 4 knots | 3.0963 | 4.8305 | 5.5025 | 4.0181 | 4.0051 | 0.0216 | 0.0325 | 0.0339 | 0.0275 | 0.0274 |
| 5 knots | 3.0947 | 4.8301 | 5.4450 | 3.9925 | 3.9762 | 0.0214 | 0.0325 | 0.0338 | 0.0269 | 0.0265 |
| 6 knots | 3.0910 | 4.8389 | 5.5012 | 3.9807 | 3.9321 | 0.0211 | 0.0327 | 0.0339 | 0.0268 | 0.0258 |
| 7 knots | 3.0946 | 4.8390 | 5.1106 | 3.9228 | 3.9588 | 0.0214 | 0.0328 | 0.0336 | 0.0261 | 0.0261 |
| 8 knots | 3.0921 | **4.8202** | 5.1097 | 3.9414 | 3.9579 | 0.0213 | **0.0321** | 0.0335 | 0.0263 | 0.0261 |
| 9 knots | 3.0926 | 4.8413 | 5.1022 | 3.9121 | 3.9554 | 0.0214 | 0.0329 | 0.0334 | 0.0258 | 0.0261 |
| 10 knots | 3.0911 | 4.8388 | **5.0461** | 3.9304 | 3.9021 | 0.0212 | 0.0327 | **0.0331** | 0.0263 | 0.0258 |
| 11 knots | **3.0905** | 4.8381 | 5.0837 | **3.8012** | **3.8107** | **0.0203** | 0.0327 | 0.0333 | **0.0250** | **0.0253** |

Bold numbers indicate the minimum GCV and MSE values.

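Table 2 Optimal knot points.
| Component | Predictor | K1 | K2 | K3 | K4 | K5 | K6 | K7 | K8 | K9 | K10 | K11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c1 | t1 | 0.104 | 0.185 | 0.266 | 0.347 | 0.428 | 0.509 | 0.590 | 0.670 | 0.751 | 0.832 | 0.913 |
| c1 | t2 | 0.241 | 0.403 | 0.565 | 0.728 | 0.890 | 1.052 | 1.214 | 1.377 | | | |
| c1 | t3 | 0.241 | 0.375 | 0.508 | 0.642 | 0.775 | 0.909 | 1.042 | 1.176 | 1.309 | 1.443 | |
| c2 | t2 | 0.200 | 0.322 | 0.444 | 0.565 | 0.687 | 0.809 | 0.931 | 1.052 | 1.174 | 1.295 | 1.417 |
| c2 | t3 | 0.230 | 0.352 | 0.475 | 0.597 | 0.719 | 0.842 | 0.965 | 1.087 | 1.209 | 1.332 | 1.454 |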

Furthermore, Fig. 4 shows box plots of the MSE values, comparing the estimates of the PCA spline with those of the multiple linear regression model and PCA. The MSE is used in this comparison because the comparison models are parametric. Fig. 4 shows that the PCA spline provides a much smaller MSE than the parametric linear regression and PCA models. Therefore, the spline PCA nonparametric regression model is well suited to modeling data in which the relationship between predictors and responses does not follow a parametric pattern and the variables are correlated.

Fig. 4. Boxplot MSE of linear regression, PCA and PCA spline.

5 Application on type 2 diabetes mellitus data

The accuracy of the PCA spline method on the simulation data in the previous section gives assurance that the method can be applied to diabetes data. The variables studied were fasting blood sugar and HbA1C as the first and second responses, respectively. Age, weight, height, HDL cholesterol, LDL cholesterol, total cholesterol, and triglycerides were the first to seventh predictors, respectively. Data plots of fasting blood sugar levels are shown in Fig. 5 and of HbA1C in Fig. 6. The figures show that the plots of fasting blood sugar and HbA1C against LDL cholesterol, HDL cholesterol, total cholesterol, and triglycerides do not follow a parametric pattern. Therefore, we use a truncated spline as an estimator for nonparametric patterned data. This estimator can describe the segmentation of patterns in the data through knot points. A patient's blood sugar changes rapidly, and the truncated spline can capture these changes through its knot points. The correlations are $r_{y_1.y_2} = 0.780$, $r_{t_2.t_3} = 0.856$, and $r_{t_3.t_4} = 0.586$. This shows a correlation between the responses and multicollinearity in the predictor variables. To overcome these two types of correlation, we used a PCA biresponse model with a truncated spline estimator.

Fig. 5. The plot of fasting blood sugar (y1) based on predictors.
Fig. 6. The plot of HbA1C (y2) data based on predictors.

Based on the scree plot, we retain two of the seven principal components because together they explain 85.7% of the variance. Furthermore, the significant predictor variables in the first and second components are the same, namely LDL cholesterol, total cholesterol, and triglycerides. These results indicate that the two groups of diabetic patients can be modeled by considering only three of the seven factors studied, namely LDL cholesterol, total cholesterol, and triglycerides. From the principal component values that correspond to these predictors, we model each principal component through the truncated spline function with certain knot points.

The estimated PCA spline regression curves between the first and second components and the predictors are shown in Fig. 7. In Fig. 7a and b, the estimated spline curves differ between the components. For the cholesterol factors, namely LDL and total cholesterol, there is an upward trend in both the first and second components, but the size of the increase differs. For triglycerides, there is an upward trend in the first component and a downward trend in the second component. The trends are marked by the optimal knot points, which are selected based on the GCV value. For these data, three knot points give the minimum GCV value: 105.5, 173, and 240.5 for LDL cholesterol; 164, 252, and 340 for total cholesterol; and 133, 219, and 305 for triglycerides.

Fig. 7. The estimation results of the truncated spline curve based on the factors of LDL cholesterol, total cholesterol, and triglycerides on (a) the first component and (b) the second component.

The truncated spline equations for each component, corresponding to the knot points, are as follows:

(10)
$$
\begin{aligned}
c_1 ={}& 547.147 + 247.493\, t_5 + 323.661\, (t_5 - 105.5)_+ + 463.159\, (t_5 - 173)_+ + 532.79\, (t_5 - 240.5)_+ \\
& + 229.765\, t_6 + 344.662\, (t_6 - 164)_+ + 545.372\, (t_6 - 252)_+ + 598.609\, (t_6 - 340)_+ \\
& + 236.209\, t_7 + 351.961\, (t_7 - 133)_+ + 430\, (t_7 - 219)_+ + 478.244\, (t_7 - 305)_+ \\
c_2 ={}& 59.437 + 43.86\, t_5 + 92.908\, (t_5 - 105.5)_+ + 131.477\, (t_5 - 173)_+ - 103.674\, (t_5 - 240.5)_+ \\
& + 39.642\, t_6 + 88.338\, (t_6 - 164)_+ - 77.957\, (t_6 - 252)_+ + 101.394\, (t_6 - 340)_+ \\
& - 48.583\, t_7 + 43.911\, (t_7 - 133)_+ - 2.963\, (t_7 - 219)_+ - 76.364\, (t_7 - 305)_+
\end{aligned}
$$

Eq. (10) corresponds to Fig. 8 which shows the estimation results of the spline truncated curve for each principal component.

Fig. 8. The estimation results of the spline truncated PCA curve for biresponse to (a) the fasting blood sugar factor and (b) the HbA1C factor.

Furthermore, the biresponse PCA spline regression model obtained between the responses and the principal components of the diabetes data is as follows:
$$
\begin{aligned}
y_1 &= -37.905 + 0.184\, c_1 + 0.445\, c_2 \\
y_2 &= 1.885 + 0.005\, c_1 + 0.016\, c_2
\end{aligned}
$$

Based on the equation of the principal components in (10), the PCA biresponse spline regression model can be expressed as follows:

(11)
$$
\begin{aligned}
\hat{y}_1 ={}& 89.219 + \{ 45.538\, t_5 + 59.553\, (t_5 - 105.5)_+ + 85.221\, (t_5 - 173)_+ + 90.033\, (t_5 - 240.5)_+ \\
& + 42.276\, t_6 + 63.417\, (t_6 - 164)_+ + 100.348\, (t_6 - 252)_+ + 110.144\, (t_6 - 340)_+ \\
& + 43.462\, t_7 + 64.761\, (t_7 - 133)_+ + 79.12\, (t_7 - 219)_+ + 87.996\, (t_7 - 305)_+ \} \\
& + \{ 19.517\, t_5 + 41.344\, (t_5 - 105.5)_+ + 58.507\, (t_5 - 173)_+ - 46.134\, (t_5 - 240.5)_+ \\
& + 17.641\, t_6 + 39.310\, (t_6 - 164)_+ - 34.690\, (t_6 - 252)_+ + 45.123\, (t_6 - 340)_+ \\
& - 21.619\, t_7 + 19.540\, (t_7 - 133)_+ - 1.318\, (t_7 - 219)_+ - 33.982\, (t_7 - 305)_+ \} \\
\hat{y}_2 ={}& 5.571 + \{ 1.237\, t_5 + 1.618\, (t_5 - 105.5)_+ + 2.315\, (t_5 - 173)_+ + 2.663\, (t_5 - 240.5)_+ \\
& + 1.148\, t_6 + 1.723\, (t_6 - 164)_+ + 2.726\, (t_6 - 252)_+ + 2.993\, (t_6 - 340)_+ \\
& + 1.181\, t_7 + 1.759\, (t_7 - 133)_+ + 2.15\, (t_7 - 219)_+ + 2.391\, (t_7 - 305)_+ \} \\
& + \{ 0.701\, t_5 + 1.486\, (t_5 - 105.5)_+ + 2.103\, (t_5 - 173)_+ - 1.658\, (t_5 - 240.5)_+ \\
& + 0.634\, t_6 + 1.413\, (t_6 - 164)_+ - 1.247\, (t_6 - 252)_+ + 1.622\, (t_6 - 340)_+ \\
& - 0.777\, t_7 + 0.702\, (t_7 - 133)_+ - 0.047\, (t_7 - 219)_+ - 1.221\, (t_7 - 305)_+ \}
\end{aligned}
$$
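The substitution that leads from Eq. (10) to Eq. (11) is plain arithmetic and can be checked with a short sketch. The coefficients below are copied from Eq. (10) and from the component-level model for $y_1$ and $y_2$; the array layout and the helper name `substitute` are illustrative choices, not part of the article.

```python
import numpy as np

# Eq. (10) coefficients, in order: intercept, then for t5, t6, t7 in turn the
# linear slope followed by the three truncated terms at that predictor's knots.
c1 = np.array([547.147, 247.493, 323.661, 463.159, 532.79,
               229.765, 344.662, 545.372, 598.609,
               236.209, 351.961, 430.0, 478.244])
c2 = np.array([59.437, 43.86, 92.908, 131.477, -103.674,
               39.642, 88.338, -77.957, 101.394,
               -48.583, 43.911, -2.963, -76.364])

def substitute(intercept, w1, w2):
    """Expand y = intercept + w1*c1 + w2*c2 into the two bracketed groups of Eq. (11)."""
    total_intercept = intercept + w1 * c1[0] + w2 * c2[0]
    return total_intercept, w1 * c1[1:], w2 * c2[1:]

b1, first1, second1 = substitute(-37.905, 0.184, 0.445)   # fasting blood sugar
b2, first2, second2 = substitute(1.885, 0.005, 0.016)     # HbA1C

print(round(b1, 3), np.round(first1, 3), np.round(second1, 3))
print(round(b2, 3), np.round(first2, 3), np.round(second2, 3))
```

The products agree with the coefficients printed in Eq. (11) to about three decimals. One term differs: $0.184 \times 532.79$ evaluates to about 98.033 for the $(t_5 - 240.5)_+$ term of $\hat{y}_1$, whereas Eq. (11) prints 90.033, which may be a typographical slip in the printed equation.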

The analysis of the biresponse PCA spline model showed a pattern of changes in fasting blood sugar and HbA1C levels that was mostly influenced by LDL cholesterol, total cholesterol, and triglycerides. In the first component, fasting blood sugar and HbA1C tend to rise as cholesterol and triglycerides increase, although the size of the increase varies across value intervals. In the second component, fasting blood sugar and HbA1C increase and decrease according to the patient's cholesterol and triglyceride levels in certain intervals. This shows that, through the biresponse truncated spline PCA model, we can identify two conditions that can occur in patients with type 2 diabetes mellitus.

6 Conclusion

A bi-response truncated PCA spline model was developed for data containing multi-dimensional variables in which responses are correlated as well as predictors. The multicollinearity problem in predictors was solved by using PCA spline. The principal component that is formed is modeled with a predictor through a truncated spline estimator which considers the knot point. The ability of the method has been demonstrated through simulation data and MSE values were obtained that were smaller than the parametric regression and PCA approaches as shown in Fig. 4. This method is also applied to data on type 2 diabetes mellitus patients. Based on the results of the analysis of the biresponse spline PCA model, it was found that there were two main components which indicated that there were two different groups of type 2 diabetes mellitus patients. The two principal components are equally affected by LDL cholesterol, total cholesterol and triglycerides. What distinguishes these components is the pattern of changes in fasting blood sugar and HbA1C based on these three factors. The pattern can be seen in Fig. 8, and then modeled as in Eq. (10). The condition of the type 2 diabetes mellitus patients described in this article shows that the important factors that the patient should pay attention to are the regulation of LDL cholesterol, total cholesterol, and triglycerides. The shape of their influence on the patient is described in terms of two components. Also, the effect of these three factors shows that there are several patterns of change at certain intervals corresponding to the knot point. This result is one of the advantages of this method that cannot be explained through a parametric approach.

This research is sponsored by the Deputy of Research and Development Strengthening, Ministry of Research and Technology/National Agency for Research and Innovation, Republic of Indonesia, under the Basic Research scheme, and will continue to be developed in both theory and data applications. We are obliged to publish our research results as a review of the development of pre-existing methods. No potential conflict of interest is associated with this article, with respect to either the funding or any of the authors.

Acknowledgement

Many thanks to the Deputy of Research and Development Strengthening, Ministry of Research and Technology/National Agency for Research and Innovation, The Republic of Indonesia for the Basic Research with the research contract No: 7/AMD/E1/KP.PTNBH/2020 dated 11 May 2020.

References

  1. Bouwmans, Zahzah. Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance. Comput. Vis. Image Underst. 2014;122:22-34.
  2. Chamidah et al. Designing of child growth chart based on multi response local polynomial modeling. J. Math. Stat. 2012;8(3):342-347.
  3. Durand. Generalized principal component analysis with respect to instrumental variables via univariate spline transformation. Comput. Stat. Data An. 1993;16(4):423-440.
  4. Ghasemi et al. Linear and nonlinear multivariate classification of Iranian bottled mineral waters according to their elemental content determined by ICP-OES. J. Sci. Islam. Repub. Iran. 2013;24(1):15-22.
  5. Hannachi et al. In search of simple structures in climate: simplifying EOFs. Int. J. Climatol. 2006;26(1):7-28.
  6. Islamiyati et al. Estimation of covariance matrix on bi-response longitudinal data analysis with penalized spline regression. J. Phys.: Conf. Ser. 2018;979(012093):1-8.
  7. Islamiyati et al. Use of two smoothing parameters in penalized spline estimator for bi-variate predictor non-parametric regression model. J. Sci. Islam. Repub. Iran. 2020a;31(2):175-183.
  8. Islamiyati et al. Changes in blood glucose 2 hours after meals in Type 2 diabetes patients based on length of treatment at Hasanuddin University Hospital, Indonesia. Rawal Medical J. 2020b;45(1):31-34.
  9. Islamiyati et al. Penalized spline estimator with multi smoothing parameters in biresponse multipredictor regression model for longitudinal data. Songklanakarin J. Sci. Technol. 2020c;42(4):897-909.
  10. Islamiyati, A. Spline longitudinal multi-response model for the detection of lifestyle-based changes in blood glucose of diabetic patients. Curr. Diabetes Rev. 2022. E-pub ahead of print, published on 14 January 2022.
  11. Jolliffe, Cadima. Principal component analysis: a review and recent developments. Phil. Trans. R. Soc. A. 2016;374(2065):20150202.
  12. Khan, Shahna. Non-polynomial quadratic spline method for solving fourth order singularly perturbed boundary value problems. J. King Saud Univ. Sci. 2019;31(4):479-484.
  13. Lavado, Calapez. Principal components analysis with spline optimal transformations for continuous data. IAENG Int. J. Appl. Math. 2011;41(4):367-375.
  14. Lestari et al. Spline smoothing for multi-response nonparametric regression model in case of heteroscedasticity of variance. J. Math. Stat. 2010;8(3):377-384.
  15. Shiokawa et al. Application of kernel principal component analysis and computational machine learning to exploration of metabolites strongly associated with diet. Sci. Rep. 2018;8
  16. Soo, Bates. Multiresponse spline regression. Comput. Stat. Data An. 1996;22(6):619-631.
  17. Tohari, Chamidah. Modelling of HIV and AIDS cases in Indonesia using bi-response negative binomial regression approach based on local linear estimator. Ann. Biol. 2020;36(2):215-219.
  18. Vichi, Saporta. Clustering and disjoint principal component analysis. Comput. Stat. Data An. 2009;53(8):3194-3208.
  19. Wang et al. A robust polynomial principal component analysis for seismic noise attenuation. J. Geophys. Eng. 2016;13(6):1002-1009.
  20. Wang et al. Spline smoothing for bivariate data with application to association between hormones. Stat. Sin. 2000;10:377-397.
  21. Zahra, Mhlawy. Numerical solution of two-parameter singularly perturbed boundary value problems via exponential spline. J. King Saud Univ. Sci. 2013;25(3):201-208.