Robust estimators for circular regression models
⁎Corresponding author: a.abuzaid@alazhar.edu.ps (Ali Abuzaid)
Abstract
The problem of robust estimation in circular regression models has not been well studied. This paper considers the JS circular regression model because of its interesting properties and its sensitivity to the presence of outliers. We extend robust estimators that have been used successfully in linear regression models, namely the M-estimator, the least trimmed squares (LTS) estimator and the least median of squares (LMS) estimator, to the JS circular regression model.
The robustness of the proposed estimators is studied through their influence functions and via a simulation study. The results show that the proposed robust circular M-estimator is effective in estimating the model parameters in the presence of vertical outliers, while the circular LTS and LMS estimators are highly robust in the presence of circular leverage points. An application of the proposed robust circular estimators is illustrated using a real eye data set.
Keywords
Circular regression
Robust estimation
Influence function
Outlier
Bounded-influence estimate
1 Introduction
Applications involving circular variables have increased over the last two decades, in fields as varied as biology, meteorology and medicine. Although the first circular regression model dates back to Gould (1969) and various versions of such models have since been proposed, the study of outliers and of the robustness of circular regression models is still not well developed. Most outlier detection procedures were derived for the simple circular regression model of Hussin et al. (2004) by extending common methods from linear regression (Abuzaid et al., 2008, 2013). The model of Hussin et al. (2004) assumes a linear relationship between the two circular variables, which is a restrictive condition; moreover, it cannot be extended to multiple regression settings. Alternatively, Ibrahim (2013) investigated the robustness of the JS model, proposed by Sarma and Jammalamadaka (1993), with one independent circular variable based on least squares (LS) estimation, and some outlier detection procedures for it were proposed by Ibrahim et al. (2013). Moreover, Alkasadi et al. (2019) derived an outlier detection procedure for the multiple JS model with two independent circular variables. Recently, Jha and Biswas (2017) studied the robustness of the Kato et al. (2008) circular regression model, based on the wrapped Cauchy distribution, by proposing the maximum trimmed cosine estimator.
Robust estimation methods have received a great deal of interest as a way to improve estimator performance in linear regression models, since they limit the influence of outliers. In this regard, Huber and Lovric (2011), Hampel et al. (2011) and Birkes and Dodge (2011) showed that M-estimation is highly robust to vertical outliers, but leverage points can break it down completely. Several robust alternatives have been investigated in the literature, among them the least median of squares (LMS) estimator (Rousseeuw, 1984) and the least trimmed squares (LTS) estimator introduced in Rousseeuw and Leroy (2005), which are much less affected by leverage points.
This study proposes an M-estimator and two high-breakdown-point estimators, LTS and LMS, for the JS circular regression model, to reduce the effects of vertical outliers and leverage points.
The rest of the article is organized as follows: Section 2 reviews the formulation of the JS circular regression model and its LS parameter estimates. Section 3 formalizes the effect of outliers on the JS model. Section 4 proposes the robust M-estimator, studies the influence functions of the proposed estimators and introduces a bounded-influence estimator for the JS circular regression model. An extensive simulation study of the performance of the proposed robust estimators is presented in Section 5. Section 6 applies the robust estimators to the eye data set.
2 The JS circular regression model
2.1 Model formulation
For any two circular random variables $u$ and $v$, Sarma and Jammalamadaka (1993) proposed a regression model to predict $v$ for a given $u$ by considering the conditional expectation of the vector $e^{iv}$ given $u$, such that
$$E\left(e^{iv} \mid u\right) = \rho(u)\,e^{i\mu(u)} = g_1(u) + i\,g_2(u). \qquad (1)$$
Then, $v$ can be predicted such that
$$\hat{v} = \mu(u) = \arctan^{*}\!\left(\frac{g_2(u)}{g_1(u)}\right), \qquad (2)$$
where $\arctan^{*}$ denotes the quadrant-specific inverse tangent. Due to the fact that $g_1(u)$ and $g_2(u)$ are periodic functions, they are approximated by trigonometric polynomials of a suitable degree $m$ (Kufner and Kadlec, 1971),
$$g_1(u) \approx \sum_{k=0}^{m}\left(A_k \cos ku + B_k \sin ku\right), \qquad g_2(u) \approx \sum_{k=0}^{m}\left(C_k \cos ku + D_k \sin ku\right), \qquad (3)$$
which gives the following two observational regression-like models:
$$\cos v_j = \sum_{k=0}^{m}\left(A_k \cos ku_j + B_k \sin ku_j\right) + \varepsilon_{1j}, \qquad \sin v_j = \sum_{k=0}^{m}\left(C_k \cos ku_j + D_k \sin ku_j\right) + \varepsilon_{2j}, \qquad (4)$$
where $(\varepsilon_{1j}, \varepsilon_{2j})^{T}$ is the vector of random errors with mean $\mathbf{0}$.
2.2 Least squares estimation
Let $\{(u_j, v_j),\ j = 1, \ldots, n\}$ be a random circular sample of size $n$. The observational Eqs. (4) can then be summarized as
$$\cos v_j = x_j^{T}\lambda_1 + \varepsilon_{1j}, \qquad \sin v_j = x_j^{T}\lambda_2 + \varepsilon_{2j}, \qquad j = 1, \ldots, n, \qquad (5)$$
where $x_j = (1, \cos u_j, \ldots, \cos mu_j, \sin u_j, \ldots, \sin mu_j)^{T}$, and $\lambda_1 = (A_0, \ldots, A_m, B_1, \ldots, B_m)^{T}$ and $\lambda_2 = (C_0, \ldots, C_m, D_1, \ldots, D_m)^{T}$ are the vectors of unknown parameters. The observational Eqs. (4) can be written in matrix form as
$$Y_1 = X\lambda_1 + \varepsilon_1, \qquad Y_2 = X\lambda_2 + \varepsilon_2, \qquad (6), (7)$$
where $Y_1 = (\cos v_1, \ldots, \cos v_n)^{T}$, $Y_2 = (\sin v_1, \ldots, \sin v_n)^{T}$ and $X$ is the $n \times (2m+1)$ matrix whose $j$-th row is $x_j^{T}$. Writing the residuals as $r_{1j} = \cos v_j - x_j^{T}\lambda_1$ and $r_{2j} = \sin v_j - x_j^{T}\lambda_2$, with
$$r_j^{2} = r_{1j}^{2} + r_{2j}^{2}, \qquad (8)$$
the least squares estimates minimize the residual sum of squares
$$\sum_{j=1}^{n} r_j^{2} = \sum_{j=1}^{n}\left[\left(\cos v_j - x_j^{T}\lambda_1\right)^{2} + \left(\sin v_j - x_j^{T}\lambda_2\right)^{2}\right] \qquad (9)$$
and turn out to be
$$\hat{\lambda}_1 = (X^{T}X)^{-1}X^{T}Y_1, \qquad \hat{\lambda}_2 = (X^{T}X)^{-1}X^{T}Y_2.$$
These equations can be combined into the following single matrix expression:
$$\hat{\Lambda} = (X^{T}X)^{-1}X^{T}Y, \qquad (10)$$
where $Y = [Y_1 \;\; Y_2]$ and $\hat{\Lambda} = [\hat{\lambda}_1 \;\; \hat{\lambda}_2]$.
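A minimal sketch of the LS fit in (10), reusing the hypothetical js_design helper above:

```r
# Least squares fit of the two observational models in Eq. (4);
# the columns of the result are lambda1-hat and lambda2-hat, as in Eq. (10).
js_ls <- function(u, v, m = 1) {
  X <- js_design(u, m)
  Y <- cbind(cos(v), sin(v))              # Y = [Y1 Y2]
  solve(crossprod(X), crossprod(X, Y))    # (X'X)^{-1} X'Y
}
```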
The following section explains the effect of outliers on the JS circular regression model.
3 Outliers in the JS circular regression model
Outliers are a common problem in statistical analysis; they are observations that are markedly different from the other observations in a data set. Ibrahim (2013) investigated the robustness of the JS model by a simulation study and concluded that the JS model is sensitive to the existence of outliers, whose presence has potentially serious effects on LS estimation. Ibrahim et al. (2013) then proposed a COVRATIO statistic to identify outliers in the vertical direction. In this paper, we consider two types of outliers, namely outliers in $v$, called circular vertical outliers, and outliers with respect to $u$, called circular leverage points. The effect of outliers on LS estimation can be introduced in two ways:
1. Circular vertical outliers: if $Y_1$ and $Y_2$ are replaced by $Y_1^{*}$ and $Y_2^{*}$, respectively, where $Y_1^{*} = Y_1 + \Delta_1$ and $Y_2^{*} = Y_2 + \Delta_2$ for contamination vectors $\Delta_1$ and $\Delta_2$, which implies $\cos v_j^{*} = \cos v_j + \Delta_{1j}$ and $\sin v_j^{*} = \sin v_j + \Delta_{2j}$, then the circular regression in (4) can be rewritten as follows:
$$Y_1^{*} = X\lambda_1 + \varepsilon_1 + \Delta_1, \qquad Y_2^{*} = X\lambda_2 + \varepsilon_2 + \Delta_2.$$
Thus, $\hat{\lambda}_1^{*} = \hat{\lambda}_1 + (X^{T}X)^{-1}X^{T}\Delta_1$ and $\hat{\lambda}_2^{*} = \hat{\lambda}_2 + (X^{T}X)^{-1}X^{T}\Delta_2$, so the LS estimates are shifted by an amount that grows with the contamination.
2. Circular leverage points: if $X$ is replaced by $X^{*} = X + \Delta_X$, where $\Delta_X$ is a contamination matrix affecting some of the $u_j$, then $\hat{\lambda}_1^{*} = (X^{*T}X^{*})^{-1}X^{*T}Y_1$ and $\hat{\lambda}_2^{*} = (X^{*T}X^{*})^{-1}X^{*T}Y_2$, both of which can differ arbitrarily from the uncontaminated estimates.
The following section derives robust estimators of the JS model parameters.
4 Robust estimation of the JS circular regression parameters
In this section, robust estimation is extended to the JS circular regression model in place of the classical LS estimator defined in (10), to improve estimation precision. We use the abbreviations CM, CLTS and CLMS for circular M-estimation, circular least trimmed squares, and circular least median squares, respectively, as derived in the following subsections.
4.1 Robust CM-estimation of JS circular regression parameters
In Eq. (9), suppose the squared residual $r_j^{2}$, where $r_j^{2} = (\cos v_j - x_j^{T}\lambda_1)^{2} + (\sin v_j - x_j^{T}\lambda_2)^{2}$, is replaced by $F(r_j)$, where $F$ is a symmetric, non-decreasing function on $[0, \infty)$, continuously differentiable almost everywhere, with $F(0) = 0$. Furthermore, $F$ is chosen to be less sensitive to outliers than the square. This yields an estimating equation following the same idea as M-estimation of a linear regression model, as described by Huber and Lovric (2011).
We define the CM-estimates of the JS circular regression as the minimizers of
$$\sum_{j=1}^{n} F\!\left(\cos v_j - x_j^{T}\lambda_1\right) \qquad (11)$$
and
$$\sum_{j=1}^{n} F\!\left(\sin v_j - x_j^{T}\lambda_2\right). \qquad (12)$$
To solve these equations, let the influence curve be $\psi = F'$; if this exists, then setting the partial derivatives to zero we will have
$$\sum_{j=1}^{n}\psi\!\left(\cos v_j - x_j^{T}\hat{\lambda}_1\right)x_j = 0, \qquad \sum_{j=1}^{n}\psi\!\left(\sin v_j - x_j^{T}\hat{\lambda}_2\right)x_j = 0. \qquad (13)$$
Whereas, if $\psi(t) = t$, the solution becomes the LS estimate. Generally, the biweight or Huber functions have been widely used as $\psi$. We define the weight matrix $W = \mathrm{diag}\left(w(r_1), \ldots, w(r_n)\right)$ with $w(t) = \psi(t)/t$; then (13) can be reformulated as the weighted least squares equations
$$X^{T}WX\hat{\lambda}_1 = X^{T}WY_1, \qquad X^{T}WX\hat{\lambda}_2 = X^{T}WY_2,$$
so the solutions of (13) can be written as
$$\hat{\lambda}_1 = (X^{T}WX)^{-1}X^{T}WY_1, \qquad \hat{\lambda}_2 = (X^{T}WX)^{-1}X^{T}WY_2.$$
These equations can be combined into the following single matrix expression:
$$\hat{\Lambda} = (X^{T}WX)^{-1}X^{T}WY. \qquad (14)$$
In practice, since $W$ depends on the residuals, these equations are solved by iteratively reweighted least squares.
A popular choice of $\psi$ is Huber's function, as introduced in Huber and Lovric (2011). For a positive real $k$, Huber introduced the following objective function:
$$F(t) = \begin{cases} \frac{1}{2}t^{2}, & |t| \le k, \\ k|t| - \frac{1}{2}k^{2}, & |t| > k. \end{cases}$$
So $\psi = F'$ is given as
$$\psi(t) = \begin{cases} t, & |t| \le k, \\ k\,\mathrm{sign}(t), & |t| > k. \end{cases}$$
Then the weight function $w(t) = \psi(t)/t$ is given by
$$w(t) = \begin{cases} 1, & |t| \le k, \\ k/|t|, & |t| > k. \end{cases}$$
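A minimal iteratively reweighted least squares sketch of the CM fit under these Huber weights; the tuning constant k = 1.345 is a conventional choice and the MAD residual scale is our simplifying assumption, neither is prescribed by the paper.

```r
# Huber weight w(t) = psi(t)/t, with conventional tuning constant k = 1.345.
huber_w <- function(t, k = 1.345) ifelse(abs(t) <= k, 1, k / abs(t))

# CM estimate of one observational model of Eq. (4) (y = cos(v) or sin(v))
# by iteratively reweighted least squares, starting from the LS solution.
cm_fit <- function(X, y, k = 1.345, iters = 50, tol = 1e-8) {
  b <- solve(crossprod(X), crossprod(X, y))
  for (i in seq_len(iters)) {
    r <- drop(y - X %*% b)
    s <- mad(r)                                # robust residual scale
    if (s == 0) break                          # degenerate fit: stop
    sw <- sqrt(huber_w(r / s, k))              # row weights, square-rooted
    Xw <- X * sw; yw <- y * sw
    b_new <- solve(crossprod(Xw), crossprod(Xw, yw))  # (X'WX)^{-1} X'Wy
    if (max(abs(b_new - b)) < tol) { b <- b_new; break }
    b <- b_new
  }
  drop(b)
}
```

Calling cm_fit(js_design(u), cos(v)) and cm_fit(js_design(u), sin(v)) then gives the CM analogues of $\hat{\lambda}_1$ and $\hat{\lambda}_2$.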
4.2 Influence function of circular regression estimators
Let $T$ be an estimator of $\psi$-type; the influence function (IF) describes the effect of an infinitesimal contamination at a point $z_0 = (u_0, v_0)$ on the estimator $T$, and it is defined as
$$\mathrm{IF}(z_0; T, G) = \lim_{\epsilon \to 0^{+}} \frac{T\left((1-\epsilon)G + \epsilon\,\delta_{z_0}\right) - T(G)}{\epsilon},$$
where $G$ is the underlying distribution and $\delta_{z_0}$ is the point-mass distribution at $z_0$. It is worth remarking that the IF of the circular LS estimator is unbounded in both the circular residual and the design point $x_0$. On the other hand, the influence function of the robust CM estimator is given by
$$\mathrm{IF}(z_0; T_{CM}, G) = M^{-1}\psi(r_0)\,x_0, \qquad M = E_G\!\left[\psi'(r)\,xx^{T}\right],$$
which is bounded in the residual $r_0$ whenever $\psi$ is bounded, but remains unbounded in $x_0$.
4.3 Bounded-influence circular estimator
The CM estimators are sensitive to circular leverage observations, so we propose a bounded-influence circular estimator, the robust CLTS estimator. Order the squared residuals $r_j^{2}$ ascendingly:
$$r_{(1)}^{2} \le r_{(2)}^{2} \le \cdots \le r_{(n)}^{2}.$$
The CLTS circular estimator then chooses the circular coefficients $\hat{\lambda}_1$ and $\hat{\lambda}_2$ which minimize the sum of the smallest $h$ squared residuals, $\sum_{j=1}^{h} r_{(j)}^{2}$ with $h$ about half the sample size, which is equivalent to finding the circular estimates corresponding to the half of the circular sample having the smallest sum of squared residuals. As such, the breakdown point is 50%. Replacing the trimmed sum by the median of the squared residuals, $\mathrm{med}_j\, r_j^{2}$, we get the robust CLMS estimator.
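A possible CLTS/CLMS sketch, leaning on MASS::lqs for the per-component fits; applying a linear LTS/LMS routine to each observational model in Eq. (4) is our reading of the construction, stated here as an assumption.

```r
library(MASS)

# Fit each observational model of Eq. (4) by LTS (method = "lts")
# or LMS (method = "lms"); lqs adds the intercept internally.
js_lqs <- function(u, v, m = 1, method = c("lts", "lms")) {
  method <- match.arg(method)
  X  <- js_design(u, m)[, -1, drop = FALSE]   # hypothetical helper from Section 2
  f1 <- lqs(X, cos(v), method = method)
  f2 <- lqs(X, sin(v), method = method)
  list(lambda1 = coef(f1), lambda2 = coef(f2))
}
```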
The following section investigates the performance of the proposed robust estimators via simulation.
5 Simulation study
5.1 Settings
A simulation study was carried out to investigate the performance of the proposed robust estimators for the JS circular regression model, namely CM, CLTS and CLMS, and to compare them with the classical LS estimator. For simplicity, we consider the case when $m = 1$. Hence, we have the following set of parameters to be estimated: $\{A_0, A_1, B_1, C_0, C_1, D_1\}$.
We consider uncorrelated random errors from the bivariate normal distribution with mean vector $\mathbf{0}$ and variances $(0.03, 0.03)$. The independent circular variable $u$ is generated from a von Mises distribution with mean direction $\mu$ and concentration parameter equal to 2, i.e. $u \sim VM(\mu, 2)$.
For simplicity, we set the true values of $A_0$ and $C_0$ of the JS model to be 0, while $A_1$, $B_1$, $C_1$ and $D_1$ are obtained by using the standard additive trigonometric identities for $\cos(u + a)$ and $\sin(u + a)$ when $a = 2$. For example, $\cos(u + 2) = \cos 2\cos u - \sin 2\sin u$ and $\sin(u + 2) = \sin 2\cos u + \cos 2\sin u$. Then, by comparison with Eq. (4), the true values of $A_1$, $B_1$, $C_1$ and $D_1$ are $-0.4161$, $-0.9093$, $0.9093$ and $-0.4161$, respectively. Similarly, we can obtain different sets of true values by choosing different values of $a$.
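One replication of this data-generating step might be sketched as follows in R; rvonmises is from the circular package, and the mean direction $\mu = \pi$ and the seed are arbitrary illustrative choices.

```r
library(circular)

set.seed(1)                                      # illustrative seed
n <- 50
u <- as.numeric(rvonmises(n, mu = circular(pi), kappa = 2))  # mu = pi: our assumption
a <- 2                                           # additive shift, v = u + a
e <- matrix(rnorm(2 * n, sd = sqrt(0.03)), ncol = 2)  # uncorrelated errors, var 0.03
# Errors enter on the cosine and sine scales, as in the observational Eqs. (4):
v <- atan2(sin(u + a) + e[, 2], cos(u + a) + e[, 1])
```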
We then introduce vertical and leverage outliers into the data, with contamination percentages c% = 5%, 10%, 20%, 30%, 40% and 50% and sample sizes n = 20, 50 and 100.
To investigate the robustness of the estimators against vertical and leverage circular outliers, the following scenarios were considered:
- No contamination
- Vertical outliers (outliers in the $v$ values only)
- Leverage points (outliers in some of the $u$ values only).
For the vertical outliers scenario, the observation at position $d$, say $v_d$, is contaminated as follows: $v_d^{*} = v_d + \lambda\pi \ (\mathrm{mod}\ 2\pi)$, where $v_d^{*}$ is the value after contamination and $\lambda$ is the degree of contamination in the range $0 \le \lambda \le 1$. The generated data of $u$ and $v$ are then fitted by the JS circular regression model to give the estimates of $A_0$, $A_1$, $B_1$, $C_0$, $C_1$ and $D_1$.
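Continuing the sketch above, the vertical contamination step for a single observation might look as follows; the position d and the degree λ = 1 are illustrative.

```r
d         <- sample(n, 1)                        # outlier position (illustrative)
lambda    <- 1                                   # degree of contamination in [0, 1]
v_star    <- v
v_star[d] <- (v[d] + lambda * pi) %% (2 * pi)    # contaminated value v_d*
```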
For the leverage points scenario, different percentages of the observations at positions $d$, say $u_d$, are replaced by contaminated values instead of the original data generated from $VM(\mu, 2)$. The performance of the proposed estimators was then assessed using three summary statistics computed over the Monte Carlo trials.
The first statistic is the median of the standard errors (SE) of the six parameter estimates, where for a generic parameter $\theta$ the SE over $s$ Monte Carlo trials is obtained by $SE(\hat{\theta}) = \sqrt{\frac{1}{s-1}\sum_{i=1}^{s}(\hat{\theta}_i - \bar{\hat{\theta}})^{2}}$, and $\bar{\hat{\theta}} = \frac{1}{s}\sum_{i=1}^{s}\hat{\theta}_i$ is the mean of the estimates. The second statistic is the median of the mean squared errors of the estimators, given by $MSE(\hat{\theta}) = \frac{1}{s}\sum_{i=1}^{s}(\hat{\theta}_i - \theta)^{2}$. Finally, the third is the median of the mean of the cosines of the circular residuals, $\bar{A} = \frac{1}{n}\sum_{j=1}^{n}\cos(v_j - \hat{v}_j)$, which is close to 1 for a good fit.
The simulations were performed in the statistical software R. To run the simulation, the functions rlm, ltsreg and lmsreg from the library MASS were used for M-estimation, LTS and LMS, respectively.
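Putting the pieces together, one replication of the comparison can be sketched as follows; rlm, ltsreg and lmsreg are the MASS routines named above, and fitting them componentwise to the two models in Eq. (4) is our reading of the procedure.

```r
library(MASS)

X <- js_design(u, m = 1)[, -1]   # predictors without the intercept column
fits <- list(
  LS   = list(lm(cos(v_star) ~ X),     lm(sin(v_star) ~ X)),
  CM   = list(rlm(cos(v_star) ~ X),    rlm(sin(v_star) ~ X)),
  CLTS = list(ltsreg(cos(v_star) ~ X), ltsreg(sin(v_star) ~ X)),
  CLMS = list(lmsreg(cos(v_star) ~ X), lmsreg(sin(v_star) ~ X))
)

# Mean cosine of the circular residuals, A-bar, for each estimator.
A_bar <- sapply(fits, function(f) {
  v_hat <- atan2(fitted(f[[2]]), fitted(f[[1]]))
  mean(cos(v_star - v_hat))
})
```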
5.2 Results and discussion
Table 1 shows that the median MSE of LS is smaller than that of the other estimators when the data are uncontaminated, so LS gives the best estimates in this case.
Table 1 Median MSE, median SE and median $\bar{A}$ of the estimators for uncontaminated data; the three row blocks correspond to the sample sizes n = 20, 50 and 100.

                            LS        CM        CLTS      CLMS
n = 20   median MSE         3.0700    3.0996    3.4295    3.4159
         median SE          6.8847    6.9177    7.2380    7.2353
         median $\bar{A}$   0.9979    0.9978    0.9879    0.9890
n = 50   median MSE         3.06375   3.1055    3.3815    3.3915
         median SE          2.7494    2.7686    2.8726    2.8765
         median $\bar{A}$   0.9975    0.9976    0.9853    0.9848
n = 100  median MSE         3.0341    3.0780    3.3166    3.3301
         median SE          1.3673    1.3775    1.4220    1.4241
         median $\bar{A}$   0.9975    0.9976    0.9850    0.9837
Table 2 shows the results for data contaminated with vertical outliers, where the median MSE for CM was the smallest and the associated median $\bar{A}$ values were larger than those of the other estimators. Thus, we conclude that the robust CM estimator is better than LS. The CLTS and CLMS performances are almost the same.
Table 2 Median MSE, median SE and median $\bar{A}$ for data contaminated with vertical outliers; within each contamination level, the row blocks correspond to n = 20, 50 and 100.

                                       LS        CM        CLTS      CLMS
10% vertical  n = 20   median MSE      4.1001    3.3024    3.3155    3.3664
                       median SE       7.9386    7.1251    7.1456    7.1705
                       median $\bar{A}$ 0.9061   0.9977    0.9899    0.9888
              n = 50   median MSE      4.3319    3.3278    3.2751    3.3469
                       median SE       3.2712    2.8444    2.8511    2.8591
                       median $\bar{A}$ 0.9051   0.9975    0.9866    0.9871
              n = 100  median MSE      4.4382    3.2669    3.3229    3.2979
                       median SE       1.6572    1.4200    1.4233    1.4187
                       median $\bar{A}$ 0.9027   0.9975    0.9849    0.9866
20% vertical  n = 20   median MSE      4.4997    3.3534    3.3750    3.5734
                       median SE       8.2735    7.1689    7.2018    7.4142
                       median $\bar{A}$ 0.8198   0.9974    0.9914    0.9892
              n = 50   median MSE      5.0477    3.5431    3.3549    3.3567
                       median SE       3.5196    2.8652    2.8656    2.9596
                       median $\bar{A}$ 0.8114   0.9975    0.9878    0.9872
              n = 100  median MSE      5.3299    3.3107    3.3394    3.5284
                       median SE       1.8108    1.4230    1.4278    1.4769
                       median $\bar{A}$ 0.8109   0.9975    0.9875    0.9857
30% vertical  n = 20   median MSE      4.8460    3.3397    3.3416    4.1199
                       median SE       8.5369    7.1550    7.1677    7.9297
                       median $\bar{A}$ 0.7392   0.9950    0.9936    0.9907
              n = 50   median MSE      5.3271    3.3031    3.3244    4.2309
                       median SE       3.5931    2.8447    3.2352    2.8531
                       median $\bar{A}$ 0.7299   0.9971    0.9899    0.9900
              n = 100  median MSE      5.4952    3.2587    3.3031    4.1668
                       median SE       1.8271    1.4122    1.4211    1.6069
                       median $\bar{A}$ 0.7214   0.9974    0.9898    0.9881
According to Table 3, CLTS and CLMS perform better than all the other estimators. They estimate the model parameters with the smallest median MSE, but suffer from decreasing values of $\bar{A}$ as the leverage percentage in the data set increases. The CM estimator does almost as poorly as LS and has a higher median MSE than the other robust estimators.
Table 3 Median MSE, median SE and median $\bar{A}$ for data contaminated with leverage points; within each contamination level, the row blocks correspond to n = 20, 50 and 100.

                                       LS        CM        CLTS      CLMS
10% leverage  n = 20   median MSE      3.3900    3.3715    2.7337    2.8885
                       median SE       7.2103    7.1795    6.3544    6.6290
                       median $\bar{A}$ 0.8364   0.8401    0.9328    0.8995
              n = 50   median MSE      3.3310    3.3289    2.7125    2.8507
                       median SE       2.8518    2.8517    2.5201    2.6391
                       median $\bar{A}$ 0.8401   0.8409    0.8940    0.8704
              n = 100  median MSE      3.3235    3.3213    2.6144    2.8353
                       median SE       1.4229    1.4243    1.2460    1.3141
                       median $\bar{A}$ 0.8351   0.8370    0.9070    0.8687
20% leverage  n = 20   median MSE      3.3816    3.3181    2.7845    2.8622
                       median SE       7.2014    7.1410    6.4384    6.4432
                       median $\bar{A}$ 0.7103   0.7108    0.8748    0.8652
              n = 50   median MSE      3.3159    3.3115    2.6904    2.8177
                       median SE       2.8488    2.8500    2.5284    2.5409
                       median $\bar{A}$ 0.6989   0.6991    0.8485    0.8415
              n = 100  median MSE      3.2980    3.2661    3.1258    3.0405
                       median SE       1.4186    1.4138    1.3575    1.3629
                       median $\bar{A}$ 0.7788   0.7812    0.8688    0.8540
30% leverage  n = 20   median MSE      3.3077    3.2467    2.7804    2.9814
                       median SE       7.1256    7.0743    6.3662    6.5228
                       median $\bar{A}$ 0.5691   0.5767    0.8287    0.8298
              n = 50   median MSE      3.2844    3.2708    2.7492    2.9900
                       median SE       2.8401    2.8333    2.4930    2.6046
                       median $\bar{A}$ 0.5655   0.5671    0.8044    0.8016
              n = 100  median MSE      3.3032    3.2740    2.9956    3.0928
                       median SE       1.4228    1.4173    1.3373    1.3492
                       median $\bar{A}$ 0.7029   0.7047    0.8233    0.8346
6 Practical example (Eye Data)
As an application of the proposed robust estimators, we consider the eye data set, which consists of 23 observations. The selected measurements are the angle of the posterior corneal curvature ($u$) and the angle of the eye, between the posterior corneal curvature and the iris ($v$).
The mean circular error (MCE) statistic was applied to the data after fitting the JS model (Ibrahim, 2013), and two vertical outliers, observations 2 and 15, were identified. The interest here is to compare the models fitted using the different estimators and to check their goodness of fit.
The results based on the classical LS and the robust estimators are reported in Table 4. The SSE for LS is the largest (SSE = 5.7828) compared to the other estimators, while CLMS attains the smallest (SSE = 3.6986). Thus, CLMS is the superior estimator for these data.
Table 4 Parameter estimates, SSE and $\bar{A}$ for the eye data; the parameter rows are labelled $\hat{A}_0$ to $\hat{D}_1$ following the order of the parameter set in Section 5.

Parameters      LS        CM        CLTS      CLMS
$\hat{A}_0$     1.0821    1.0516    1.2020    1.2450
$\hat{A}_1$     −0.1497   −0.1579   −1.1748   −0.1834
$\hat{B}_1$     −0.3836   −0.3383   −0.4383   −0.4748
$\hat{C}_0$     0.0986    0.0855    −1.4844   81.3559
$\hat{C}_1$     0.2533    0.2711    −0.0555   −0.0752
$\hat{D}_1$     0.5935    0.6125    2.1307    2.0032
SSE             5.7828    4.9763    3.7041    3.6986
$\bar{A}$       0.9775    0.9863    0.9249    0.9288
7 Conclusion
This paper has revisited the JS circular regression model by deriving a set of robust estimators, including the M, LTS and LMS estimators, to improve on the robustness of the LS estimator. The simulation results and the application to real data clearly show that the robust circular estimators perform better than the classical estimator in the presence of outliers. It is therefore recommended to derive robust estimators for other circular regression models to increase the accuracy of their predictions.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Abuzaid et al. (2008). Identifying single outlier in linear circular regression model based on circular distance. J. Appl. Prob. Stat. 3(1): 107-117.
- Abuzaid et al. (2013). Detection of outliers in simple circular regression models using the mean circular error statistic. J. Stat. Comput. Simul. 83(2): 269-277.
- Alkasadi et al. (2019). Outliers detection in multiple circular regression models using DFFITc statistic. Sains Malaysiana 48(7): 1557-1563.
- Birkes and Dodge (2011). Alternative Methods of Regression. Vol. 190. John Wiley & Sons.
- Hampel et al. (2011). Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons.
- Hussin et al. (2004). Linear regression model for circular variables with application to directional data. J. Appl. Sci. Technol. 9(1): 1-6.
- Ibrahim et al. (2013). Outlier detection in a circular regression model using COVRATIO statistic. Commun. Stat. Simul. Comput. 42(10): 2272-2280.
- Sarma and Jammalamadaka (1993). Circular regression. In: Matusita, K. (Ed.), Statistical Theory and Data Analysis. VSP, Utrecht, pp. 109-128.
- Ibrahim, S. (2013). Some Outlier Problems in a Circular Regression Model (Ph.D. thesis). Institute of Mathematical Sciences, University of Malaya.
- Jha, J. and Biswas, A. (2017). Robustness issues in circular-circular regression. Technical Report No. ASU/2017/7, Indian Statistical Institute.
- Kufner and Kadlec (1971). Fourier Series. Academia.
- Rousseeuw and Leroy (2005). Robust Regression and Outlier Detection. John Wiley & Sons.