On multivariate-multiobjective stratified sampling design under probabilistic environment: A fuzzy programming technique
⁎Corresponding author. irfii.st@amu.ac.in (Irfan Ali)
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.
Peer review under responsibility of King Saud University.
Abstract
In a multivariate stratified sampling design, the individual optimum allocation for one characteristic may not be optimum for the other characteristics. Such problems call for a usable allocation that is near-optimal, in some sense, for all characteristics and still gives precise estimates of the unknown population parameters. In the sampling literature, such an allocation is obtained through a compromise criterion. In this paper, the sample allocation problem is first posed as a stochastic nonlinear programming problem and then formulated as a multiobjective programming problem to obtain a usable allocation. The formulated problem is solved using different models of stochastic optimization. The proposed allocation is then worked out and compared with other well-established allocations in sampling. A numerical study demonstrates the practical utility of the proposed technique.
Keywords
Multivariate-multiobjective stratified sampling
Stochastic programming
Fuzzy goal programming
Compromise allocation
Gamma cost function
1 Introduction
In many real-life situations, parts of a population differ in their accessibility: some units may be in remote locations, gated buildings or other inaccessible areas. In such situations, the choice of sampling design affects the results of the survey, and stratified sampling is often the most suitable technique. To obtain detailed information about several characteristics of the population, a multivariate stratified sample survey is carried out by dividing the population into strata. It is assumed that all the characteristics are defined for every unit of the population. The unknown population means of the characteristics are to be estimated, and the sample allocation that gives precise estimates may be determined by solving a Nonlinear Programming Problem (NLPP). Cochran (1977) showed that the individual optimum allocation for one characteristic may not remain optimum for the others. A compromise criterion is therefore required to obtain an allocation that gives precise information about all the population parameters; an allocation based on such a criterion is called a compromise allocation in multivariate stratified sampling design. Many authors (Neyman, 1934; Kokan and Khan, 1967; Chatterjee, 1968; Ahsan, 1975; Khan et al., 1997; Semiz, 2004; Kozak, 2006; Varshney et al., 2012, 2015; Fatima et al., 2014; Muhammad et al., 2015; Muhammad and Husain, 2017; Varshney and Mradula, 2019) have discussed allocation problems and worked out compromise allocations in multivariate stratified sample surveys. A compromise allocation is obtained either by proposing a new compromise criterion or by applying existing criteria under different conditions, e.g. when auxiliary information is available or when nonresponse is present. In many sampling designs, the stratum variances are not known in advance but may be estimated. From a deterministic point of view, such problems may be formulated as NLPPs.
However, if the random nature of the estimated variances is also taken into account, it imposes an additional restriction on the problem, and the compromise allocation may not be obtained easily. In such situations, a Stochastic Nonlinear Programming Problem (SNLPP) may be used to work out the required compromise allocation and obtain sufficient information about the population parameters (see Charnes and Cooper, 1963; Prékopa, 1978; Díaz-García and Garay Tapia, 2007; Kozak and Wang, 2010; Haseen et al., 2016). Díaz-García and Ramos-Quiroga (2014) solved SNLPPs with a fixed linear cost function. Fuzzy set theory was introduced by Zadeh (1965), and Bellman and Zadeh (1970) applied the fuzzy approach to decision-making, including dynamic (multistage) problems. Zimmermann (1978) used fuzzy sets to convert a multiobjective linear programming problem into a single-objective linear programming problem. Many authors have solved sample allocation problems using fuzzy programming techniques (see Gupta et al., 2013; Ali and Hasan, 2013; Varshney et al., 2017; Haq et al., 2020). Fuzzy programming is one of several optimization techniques for problems under uncertainty. It is flexible, helps decision-makers understand their problems better, and may be applied when situations are not clearly defined or involve uncertainty; it is therefore well suited to problems whose data are uncertain. Fuzzy programming has recently been studied and applied by several authors in different areas (Elsisi, 2019a, 2019b, 2020; Fakhrzad and Goodarzian, 2019; Elsisi and Soliman, 2020; Fathollahi-Fard et al., 2020; Goodarzian and Hosseini-Nasab, 2021; Lu et al., 2020).
In this paper, a nonlinear cost function that includes labour cost as part of the survey's total cost is proposed for the probabilistic environment. A compromise criterion is suggested for determining the compromise allocation in a multiobjective-multivariate stochastic nonlinear programming problem with a fixed cost in this probabilistic setting, and a solution procedure based on an appropriate nonlinear programming technique is given.
Section 2 introduces the notation and formulates the problem. In Section 3, solutions to the formulated problems are proposed using two deterministic approaches: the modified E-model and chance constraints. For the modified E-model formulation, a solution strategy based on fuzzy goal programming is recommended; for the chance-constraint model, solutions are obtained using the Lagrange multiplier and integer nonlinear programming techniques. Section 4 gives a numerical illustration using data obtained by simulation from the Iris data set in the software R. The formulated problem is solved through the modified E-model and chance-constraint approaches, using fuzzy goal programming, the Lagrange multiplier method and integer nonlinear programming, and MATLAB is used to solve the formulated NLPPs. Section 4.1.1 compares the proposed allocation with some other allocations. Finally, Section 5 gives conclusions on the use of the proposed technique.
2 Framework of the problem
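Consider a population of $N$ units partitioned into $L$ strata of sizes $N_1, N_2, \dots, N_L$ units such that $\sum_{h=1}^{L} N_h = N$.
For the $h$th stratum ($h = 1, 2, \dots, L$), the following notation is introduced:
$N_h$: stratum size
$W_h = N_h/N$: stratum weight
$n_h$: sample size
$y_{hi}$: observed value of the $i$th unit of the $h$th stratum (population or sample)
$\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$: stratum mean
$\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$: sample mean
$S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} (y_{hi} - \bar{Y}_h)^2$: stratum mean square
$s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h} (y_{hi} - \bar{y}_h)^2$: sample mean square.
Furthermore,
$$\bar{Y} = \sum_{h=1}^{L} W_h \bar{Y}_h$$
is the overall population mean.
If an estimate of $\bar{Y}$ is needed, the stratified sample mean
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$$
is an unbiased estimator of $\bar{Y}$, with sampling variance
$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_h^2.$$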
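As a quick numerical check of the estimator and its variance, the following minimal Python sketch computes $\bar{y}_{st}$ and $V(\bar{y}_{st})$ from stratum sizes, sample sizes, stratum sample means and stratum mean squares. The function name and the numbers in the example are hypothetical, and $S_h^2$ is treated as known.

```python
def stratified_mean_and_variance(N_h, n_h, ybar_h, S2_h):
    """Return (ybar_st, V(ybar_st)) under stratified random sampling.

    N_h: stratum sizes; n_h: stratum sample sizes;
    ybar_h: stratum sample means; S2_h: stratum mean squares S_h^2.
    """
    N = sum(N_h)
    W_h = [Nh / N for Nh in N_h]  # stratum weights W_h = N_h / N
    ybar_st = sum(W * yb for W, yb in zip(W_h, ybar_h))
    # V(ybar_st) = sum_h W_h^2 (1/n_h - 1/N_h) S_h^2
    var_st = sum(W ** 2 * (1 / nh - 1 / Nh) * S2
                 for W, nh, Nh, S2 in zip(W_h, n_h, N_h, S2_h))
    return ybar_st, var_st


# Hypothetical two-stratum example
est, var = stratified_mean_and_variance([100, 300], [10, 30], [5.0, 7.0], [4.0, 9.0])
print(est, var)
```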
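In a multivariate stratified population where $p$ characteristics are defined on each population element, the $p$ population means are to be estimated, and the individual optimum allocation for one characteristic may not be optimum for the others. Let $y_{hij}$ denote the value of the $i$th element in the $h$th stratum for the $j$th characteristic, and let $\bar{Y}_{hj}$ be the stratum mean of the $j$th characteristic. The sample means of all characteristics in the $h$th stratum are
$$\bar{y}_{hj} = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hij}, \quad j = 1, 2, \dots, p.$$
For the $j$th characteristic, an unbiased estimate of the overall population mean $\bar{Y}_j = \sum_{h=1}^{L} W_h \bar{Y}_{hj}$ is
$$\bar{y}_{jst} = \sum_{h=1}^{L} W_h \bar{y}_{hj},$$
with sampling variance
$$V(\bar{y}_{jst}) = \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_{hj}^2,$$
where $S_{hj}^2$ is the stratum variance of the $j$th characteristic in the $h$th stratum, $h = 1, \dots, L$, $j = 1, \dots, p$, given by
$$S_{hj}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} (y_{hij} - \bar{Y}_{hj})^2.$$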
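The multivariate case adds one step: each $S_{hj}^2$ is computed from the stratum values of characteristic $j$ and plugged into its own variance. The minimal sketch below does this for a tiny hypothetical population and also sums the variances (the trace used later for comparing allocations). It relies on `statistics.variance`, whose divisor $N_h - 1$ matches the definition of $S_{hj}^2$; the function name and data are illustrative only.

```python
from statistics import variance

def variances_by_characteristic(strata, n_h):
    """V(ybar_jst) for every characteristic j.

    strata[h][j] holds the population values y_hij of characteristic j in
    stratum h; n_h holds the stratum sample sizes.
    """
    N_h = [len(stratum[0]) for stratum in strata]
    N = sum(N_h)
    p = len(strata[0])
    result = []
    for j in range(p):
        v = 0.0
        for h, stratum in enumerate(strata):
            S2_hj = variance(stratum[j])  # divisor N_h - 1, as in S_hj^2
            W_h = N_h[h] / N
            v += W_h ** 2 * (1 / n_h[h] - 1 / N_h[h]) * S2_hj
        result.append(v)
    return result


# Hypothetical population: two strata of four units, two characteristics
strata = [
    [[1, 2, 3, 4], [2, 2, 2, 2]],
    [[10, 12, 14, 16], [1, 3, 5, 7]],
]
V = variances_by_characteristic(strata, n_h=[2, 2])
trace = sum(V)  # sum of the variances over characteristics
print(V, trace)
```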
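For a multivariate stratified sample survey, a linear cost function may be adopted for the overall budget of the survey (Cochran, 1977):
$$C = c_0 + \sum_{h=1}^{L}\sum_{j=1}^{p} c_{hj} n_h \quad\text{or}\quad C = c_0 + \sum_{h=1}^{L} c_h n_h, \qquad c_h = \sum_{j=1}^{p} c_{hj},$$
where $C$ denotes the total cost of the survey, $c_{hj}$ is the per-unit cost of measuring the $j$th characteristic on a selected unit in the $h$th stratum, $c_h$ is the per-unit cost of measuring all $p$ characteristics in the $h$th stratum, $n_h$ is the stratum sample size and $c_0$ is the overhead cost of the survey. If travel costs within the strata are taken into account, the cost function is no longer linear. A nonlinear cost function for this case follows from Beardwood et al. (1959), who showed that the length of the shortest route through $n_h$ randomly dispersed points is proportional to $\sqrt{n_h}$. The nonlinear cost function that includes travel costs may therefore be expressed as
$$C = c_0 + \sum_{h=1}^{L}\sum_{j=1}^{p} c_{hj} n_h + \sum_{h=1}^{L} t_h \sqrt{n_h} \quad\text{or}\quad C = c_0 + \sum_{h=1}^{L} c_h n_h + \sum_{h=1}^{L} t_h \sqrt{n_h},$$
where $t_h$ is the travel-cost coefficient of the $h$th stratum.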
In practice, other cost factors may also be considered, such as rewards to respondents and labour costs. Whenever interviewers need to collect detailed information from the selected respondents, more human resources must be available for a specified time. To account for this, labour costs may be included in the cost of conducting the survey, and the cost function may then be expressed as a Gamma cost function, which takes different forms for different values of its parameter (Muhammad and Husain, 2017).
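The linear and travel-cost functions are straightforward to evaluate, which makes it easy to see how much the $\sqrt{n_h}$ term adds to a given allocation. The minimal sketch below does so; all coefficient values are hypothetical, and the labour-cost (Gamma) form is not reproduced here.

```python
import math

def linear_cost(n_h, c_h, c0):
    """C = c0 + sum_h c_h * n_h (c_h: per-unit cost of all characteristics in stratum h)."""
    return c0 + sum(c * n for c, n in zip(c_h, n_h))

def travel_cost(n_h, c_h, t_h, c0):
    """C = c0 + sum_h c_h * n_h + sum_h t_h * sqrt(n_h) (travel term after Beardwood et al.)."""
    return linear_cost(n_h, c_h, c0) + sum(t * math.sqrt(n) for t, n in zip(t_h, n_h))


n_h = [4, 9, 16]          # hypothetical allocation
c_h = [1.5, 2.0, 2.5]     # hypothetical per-unit measurement costs
t_h = [0.5, 1.0, 1.5]     # hypothetical travel-cost coefficients
print(linear_cost(n_h, c_h, c0=10.0), travel_cost(n_h, c_h, t_h, c0=10.0))
```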
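Using a deterministic approach, the problem may be formulated in two ways: the solution is obtained either by minimizing the variances $V(\bar{y}_{jst})$, $j = 1, \dots, p$, for a fixed cost, or by minimizing the cost of the survey subject to specified limits on the variances. The two optimization problems may therefore be stated as
$$\begin{aligned} &\text{Minimize } \left(V(\bar{y}_{1st}), \dots, V(\bar{y}_{pst})\right)\\ &\text{subject to } C(n_1,\dots,n_L) \le C_0,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer},\ h = 1,\dots,L \end{aligned} \tag{3}$$
and
$$\begin{aligned} &\text{Minimize } C(n_1,\dots,n_L)\\ &\text{subject to } V(\bar{y}_{jst}) \le v_j,\ j = 1,\dots,p,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer},\ h = 1,\dots,L, \end{aligned} \tag{4}$$
respectively, where $C(n_1,\dots,n_L)$ is the chosen cost function, $C_0$ is the available budget and $v_j$ is the specified upper limit on the variance of the $j$th estimate.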
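For a single characteristic with a linear cost function, and with the integer and upper-bound restrictions relaxed, the fixed-cost problem has the classical closed-form solution $n_h \propto W_h S_h / \sqrt{c_h}$ (Cochran, 1977). The minimal sketch below computes it; it is only a continuous relaxation of the single-characteristic version of problem (3), and the inputs are hypothetical.

```python
import math

def optimum_allocation_fixed_cost(W_h, S_h, c_h, budget):
    """Continuous optimum allocation minimizing sum_h W_h^2 S_h^2 / n_h
    subject to sum_h c_h n_h = budget (budget = C - c0, linear cost)."""
    denom = sum(W * S * math.sqrt(c) for W, S, c in zip(W_h, S_h, c_h))
    return [budget * W * S / math.sqrt(c) / denom for W, S, c in zip(W_h, S_h, c_h)]


n = optimum_allocation_fixed_cost([0.5, 0.5], [1.0, 2.0], [1.0, 4.0], budget=100.0)
print(n)  # the budget constraint sum_h c_h n_h = 100 is met exactly
```

The term $-\sum_h W_h^2 S_h^2 / N_h$ of the variance does not depend on $n_h$, so dropping it does not change the optimum.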
If the true values of $S_{hj}^2$ are unknown, they may be estimated from a pilot survey or from past data (Kozak, 2006).
3 Determination of equivalent deterministic forms of the probabilistic sampling variances and cost
If the $S_{hj}^2$ are treated as random variables, the problems defined in (3) and (4) become SNLPPs, which may be converted into equivalent deterministic problems. Several techniques, such as the modified E-model, E-model, V-model and chance constraints, are available for this conversion (Charnes and Cooper, 1963). In this manuscript, the modified E-model and chance-constraint methods are used to convert the problems into deterministic form.
3.1 Determination of probabilistic sampling variance through modified E-model
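Consider the following SNLPP for the $j$th characteristic:
$$\begin{aligned} &\text{Minimize } \hat{V}_j = \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) s_{hj}^2\\ &\text{subject to } C(n_1,\dots,n_L) \le C_0,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer}, \end{aligned} \tag{5}$$
where the estimated stratum variances $s_{hj}^2$ are random variables.
To handle this randomness, the limiting distribution of the sample variance is used (Melaku, 1968; Díaz-García and Garay Tapia, 2007). For the $j$th characteristic, let $y_{hij}$ be the value of the $i$th unit in the $h$th stratum and $\bar{Y}_{hj}$ the mean of the $h$th stratum, and let
$$\mu_{4hj} = \frac{1}{N_h}\sum_{i=1}^{N_h} (y_{hij} - \bar{Y}_{hj})^4$$
be the fourth central (mean) moment. The squared deviations $(y_{hij} - \bar{Y}_{hj})^2$ have mean approximately $S_{hj}^2$ and variance approximately $\mu_{4hj} - S_{hj}^4$, so that for large samples the sample variance has the asymptotic Normal distribution
$$s_{hj}^2 \sim AN\!\left(S_{hj}^2,\ \frac{\mu_{4hj} - S_{hj}^4}{n_h}\right).$$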
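The large-sample variance of $s_{hj}^2$ depends only on $S_{hj}^2$, the fourth central moment $\mu_{4hj}$ and $n_h$. The minimal sketch below evaluates $(\mu_{4hj} - S_{hj}^4)/n_h$ for a hypothetical stratum, using divisor $N_h - 1$ for $S_{hj}^2$ and $N_h$ for $\mu_{4hj}$; the function name and data are illustrative assumptions.

```python
def asymptotic_var_of_s2(values, n):
    """Approximate Var(s^2) = (mu4 - S^4) / n for a sample of size n drawn from `values`."""
    N = len(values)
    mean = sum(values) / N
    S2 = sum((y - mean) ** 2 for y in values) / (N - 1)   # stratum mean square S^2
    mu4 = sum((y - mean) ** 4 for y in values) / N        # fourth central moment
    return S2, mu4, (mu4 - S2 ** 2) / n


S2, mu4, v = asymptotic_var_of_s2(list(range(1, 11)), n=5)  # hypothetical stratum 1..10
print(S2, mu4, v)
```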
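The objective function $\hat{V}_j = \sum_{h=1}^{L} W_h^2 (1/n_h - 1/N_h) s_{hj}^2$ in (5) is a linear function of the $s_{hj}^2$. Therefore, assuming the $s_{hj}^2$ are independent across strata, the objective function is also asymptotically Normally distributed, with mean and variance
$$E(\hat{V}_j) = \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_{hj}^2 \tag{6}$$
$$\mathrm{Var}(\hat{V}_j) = \sum_{h=1}^{L} W_h^4 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)^2 \frac{\mu_{4hj} - S_{hj}^4}{n_h}. \tag{7}$$
Using the modified E-model technique, the objective function may then be redefined as
$$k_1 E(\hat{V}_j) + k_2 \sqrt{\mathrm{Var}(\hat{V}_j)},$$
where $k_1$ and $k_2$ are non-negative constants such that $k_1 + k_2 = 1$. The values of $k_1$ and $k_2$ reflect the relative weights given to the expectation and the variance of $\hat{V}_j$. Hence, the deterministic NLPP equivalent to the SNLPP (5) for the $j$th characteristic may be formulated as
$$\begin{aligned} &\text{Minimize } Z_j = k_1 \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_{hj}^2 + k_2 \left[\sum_{h=1}^{L} W_h^4 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)^2 \frac{\mu_{4hj} - S_{hj}^4}{n_h}\right]^{1/2}\\ &\text{subject to } C(n_1,\dots,n_L) \le C_0,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer}. \end{aligned} \tag{8}$$
Since the objective function involves the population values $S_{hj}^2$ (and $\mu_{4hj}$), which are generally unknown, the sample estimates $s_{hj}^2$ (and the sample fourth central moments $m_{4hj}$) may be used instead. The equivalent deterministic NLPP (8) then becomes
$$\begin{aligned} &\text{Minimize } Z_j = k_1 \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) s_{hj}^2 + k_2 \left[\sum_{h=1}^{L} W_h^4 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)^2 \frac{m_{4hj} - s_{hj}^4}{n_h}\right]^{1/2}\\ &\text{subject to } C(n_1,\dots,n_L) \le C_0,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer}. \end{aligned} \tag{9}$$
For multivariate stratified sampling designs, the NLPP (9) may be extended to the multiobjective INLPP (MINLPP)
$$\begin{aligned} &\text{Minimize } (Z_1, Z_2, \dots, Z_p)\\ &\text{subject to } C(n_1,\dots,n_L) \le C_0,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer},\ h = 1,\dots,L. \end{aligned} \tag{10}$$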
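The modified E-model objective for one characteristic can be evaluated directly for a candidate allocation. The minimal sketch below assumes, as a large-sample approximation with independent strata, that $E(\hat{V}_j) = \sum_h W_h^2(1/n_h - 1/N_h)S_{hj}^2$ and $\mathrm{Var}(\hat{V}_j) = \sum_h W_h^4(1/n_h - 1/N_h)^2(\mu_{4hj} - S_{hj}^4)/n_h$; the function name and inputs are hypothetical.

```python
import math

def modified_e_objective(n_h, N_h, S2_h, mu4_h, k1, k2):
    """k1 * E(Vhat_j) + k2 * sqrt(Var(Vhat_j)) for one characteristic j.

    Assumes s_hj^2 ~ AN(S_hj^2, (mu4_hj - S_hj^4) / n_h), independent across strata.
    """
    if k1 < 0 or k2 < 0 or not math.isclose(k1 + k2, 1.0):
        raise ValueError("k1 and k2 must be non-negative and sum to 1")
    N = sum(N_h)
    mean = 0.0
    var = 0.0
    for n, Nh, S2, mu4 in zip(n_h, N_h, S2_h, mu4_h):
        a = (Nh / N) ** 2 * (1 / n - 1 / Nh)  # coefficient of s_hj^2 in Vhat_j
        mean += a * S2
        var += a ** 2 * (mu4 - S2 ** 2) / n    # a^2 * asymptotic Var(s_hj^2)
    return k1 * mean + k2 * math.sqrt(var)


# Hypothetical two-stratum example with equal weights k1 = k2 = 0.5
print(modified_e_objective([10, 30], [100, 300], [4.0, 9.0], [40.0, 250.0], 0.5, 0.5))
```

With $k_2 = 0$ the objective reduces to the ordinary deterministic variance; increasing $k_2$ penalizes allocations whose variance estimate is itself unstable.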
3.1.1 Solution procedure by using the fuzzy goal programming technique
To solve the MINLPP (10) for a multivariate sampling design, the fuzzy goal programming technique may be applied. Since no technique is available to solve the multiobjective INLPP directly, the problem is converted into a single-objective problem using a suitable criterion. The fuzzy goal programming technique does this through the following stages.
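Stage 1: Solve the single-objective problem for one characteristic, ignoring the objective functions of the other characteristics, to obtain its optimum (ideal) solution.
Stage 2: Repeat Stage 1 for all $p$ characteristics, which gives $p$ optimum allocations $\mathbf{n}^{(1)*}, \dots, \mathbf{n}^{(p)*}$ and the optimum values $Z_1^*, \dots, Z_p^*$ of the objective functions.
Stage 3: Compute the payoff matrix by evaluating every objective function at every ideal solution, and take the lower and upper values $L_j$ and $U_j$ of the $j$th objective function, $j = 1, \dots, p$, as
$$L_j = \min_{1 \le k \le p} Z_j\big(\mathbf{n}^{(k)*}\big) = Z_j^*, \qquad U_j = \max_{1 \le k \le p} Z_j\big(\mathbf{n}^{(k)*}\big),$$
where $Z_j(\mathbf{n}^{(k)*})$ is the value of the $j$th objective function at the optimum allocation $\mathbf{n}^{(k)*}$ of the $k$th characteristic.
Stage 4: Define the membership function of the $j$th objective as
$$\mu_j(Z_j) = \begin{cases} 1, & Z_j \le L_j,\\ \dfrac{U_j - Z_j}{U_j - L_j}, & L_j < Z_j < U_j,\\ 0, & Z_j \ge U_j, \end{cases}$$
which is strictly monotonically decreasing in $Z_j$ on $(L_j, U_j)$, and define $\lambda = \min_{1 \le j \le p} \mu_j(Z_j)$.
Stage 5: By the max–min method, maximize $\lambda$ subject to $\mu_j(Z_j) \ge \lambda$, $j = 1, \dots, p$; on $(L_j, U_j)$ this is equivalent to $Z_j + \lambda (U_j - L_j) \le U_j$.
Finally, problem (10) is solved by fuzzy goal programming through the formulation
$$\begin{aligned} &\text{Maximize } \lambda\\ &\text{subject to } Z_j(\mathbf{n}) + \lambda (U_j - L_j) \le U_j,\quad j = 1,\dots,p,\\ &\phantom{\text{subject to }} C(n_1,\dots,n_L) \le C_0,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer},\quad 0 \le \lambda \le 1. \end{aligned}$$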
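Stages 1 to 5 can be sketched in a few lines when the feasible allocations can be enumerated. The minimal Python example below assumes a small hypothetical problem (three strata, two characteristics, unit cost 1 per sampled unit, a budget of 30 units) and uses the plain variances as objectives. It replaces the MATLAB solver used in the paper with exhaustive search, so it only illustrates the max–min logic, not the authors' implementation.

```python
from itertools import product

def fuzzy_max_min(objectives, candidates):
    """Max-min compromise over a finite set of feasible allocations.

    objectives: list of functions Z_j(n) to be minimised;
    candidates: list of feasible allocations (tuples).
    """
    p = len(objectives)
    # Stages 1-2: ideal (individually optimal) allocation for each objective
    ideals = [min(candidates, key=f) for f in objectives]
    # Stage 3: payoff matrix (row k = ideal allocation k, column j = Z_j) and bounds
    payoff = [[f(x) for f in objectives] for x in ideals]
    L = [min(row[j] for row in payoff) for j in range(p)]
    U = [max(row[j] for row in payoff) for j in range(p)]

    def mu(j, z):
        # Stage 4: linear membership, decreasing from 1 at L_j to 0 at U_j
        if z <= L[j]:
            return 1.0
        if z >= U[j]:
            return 0.0
        return (U[j] - z) / (U[j] - L[j])

    def lam(x):
        return min(mu(j, f(x)) for j, f in enumerate(objectives))

    # Stage 5: choose the allocation that maximises lambda = min_j mu_j
    best = max(candidates, key=lam)
    return best, lam(best), payoff


# Hypothetical data: 3 strata, 2 characteristics, unit cost 1 per sampled unit
N_h = [300, 300, 400]
N = sum(N_h)
S2 = [[4.0, 1.0, 9.0],   # S_h1^2
      [1.0, 9.0, 1.0]]   # S_h2^2

def make_variance(S2_j):
    return lambda n: sum((Nh / N) ** 2 * (1 / nh - 1 / Nh) * s
                         for Nh, nh, s in zip(N_h, n, S2_j))

objectives = [make_variance(s) for s in S2]
budget = 30
candidates = [n for n in product(range(2, budget + 1), repeat=3) if sum(n) <= budget]
best, best_lambda, payoff = fuzzy_max_min(objectives, candidates)
print(best, best_lambda)
```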
3.2 Determination of probabilistic sampling cost through chance constraints
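In this section, the SNLPP is considered for minimizing the total survey cost subject to given bounds on the estimated variances of the means. These bounds may be specified as tolerance limits on the estimated variances. The SNLPP may be formulated as
$$\begin{aligned} &\text{Minimize } C(n_1,\dots,n_L)\\ &\text{subject to } P\!\left(\hat{V}_j \le v_j\right) \ge \alpha_j,\quad j = 1,\dots,p,\\ &\phantom{\text{subject to }} 2 \le n_h \le N_h,\quad n_h \text{ integer},\ h = 1,\dots,L, \end{aligned} \tag{12}$$
where $\hat{V}_j = \sum_{h=1}^{L} W_h^2 (1/n_h - 1/N_h) s_{hj}^2$ is the estimated variance of the $j$th estimate, $v_j > 0$ is its specified upper limit and $\alpha_j$ is a predetermined probability such that $0 < \alpha_j < 1$.
Since $s_{hj}^2$ follows an asymptotic Normal distribution, $\hat{V}_j$ in (12) is also asymptotically Normally distributed, with mean and variance given in (6) and (7), respectively. Standardizing $\hat{V}_j$, the chance constraint in (12) may be re-expressed as
$$P\!\left(\frac{\hat{V}_j - E(\hat{V}_j)}{\sqrt{\mathrm{Var}(\hat{V}_j)}} \le \frac{v_j - E(\hat{V}_j)}{\sqrt{\mathrm{Var}(\hat{V}_j)}}\right) = \Phi\!\left(\frac{v_j - E(\hat{V}_j)}{\sqrt{\mathrm{Var}(\hat{V}_j)}}\right) \ge \alpha_j,$$
where $\Phi$ is the cumulative distribution function of the standard Normal distribution. Let $K_{\alpha_j}$ denote the value of a standard Normal random variable such that $\Phi(K_{\alpha_j}) = \alpha_j$. Since $\Phi$ is increasing, the inequality may be expressed as
$$\frac{v_j - E(\hat{V}_j)}{\sqrt{\mathrm{Var}(\hat{V}_j)}} \ge K_{\alpha_j},$$
and therefore
$$E(\hat{V}_j) + K_{\alpha_j}\sqrt{\mathrm{Var}(\hat{V}_j)} \le v_j. \tag{13}$$
The equivalent deterministic NLPP for the SNLPP (12) may then be given as
$$\begin{aligned} &\text{Minimize } C(n_1,\dots,n_L)\\ &\text{subject to } \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_{hj}^2 + K_{\alpha_j}\left[\sum_{h=1}^{L} W_h^4 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)^2 \frac{\mu_{4hj} - S_{hj}^4}{n_h}\right]^{1/2} \le v_j,\quad j = 1,\dots,p,\\ &\phantom{\text{subject to }} 2 \le n_h \le N_h,\quad n_h \text{ integer}. \end{aligned} \tag{14}$$
The expressions in (14) involve the population variances $S_{hj}^2$ (and fourth moments $\mu_{4hj}$), whose values are not known in advance. The sample estimates $s_{hj}^2$ (and $m_{4hj}$) may then be substituted for them, and the equivalent deterministic NLPP for (12) becomes
$$\begin{aligned} &\text{Minimize } C(n_1,\dots,n_L)\\ &\text{subject to } \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) s_{hj}^2 + K_{\alpha_j}\left[\sum_{h=1}^{L} W_h^4 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)^2 \frac{m_{4hj} - s_{hj}^4}{n_h}\right]^{1/2} \le v_j,\quad j = 1,\dots,p,\\ &\phantom{\text{subject to }} 2 \le n_h \le N_h,\quad n_h \text{ integer}. \end{aligned} \tag{15}$$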
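The step from the probability statement to the deterministic inequality can be checked numerically: for a Normal $\hat{V}_j$, the constraint $P(\hat{V}_j \le v_j) \ge \alpha_j$ holds exactly when $E(\hat{V}_j) + K_{\alpha_j}\sqrt{\mathrm{Var}(\hat{V}_j)} \le v_j$. The minimal sketch below uses Python's `statistics.NormalDist` for the quantile $K_{\alpha_j}$; the moment values and limits are hypothetical.

```python
from statistics import NormalDist

def chance_constraint_holds(mean, var, v_limit, alpha):
    """Deterministic equivalent of P(Vhat <= v_limit) >= alpha when Vhat ~ N(mean, var).

    Returns (holds, lhs) with lhs = mean + K_alpha * sqrt(var) and Phi(K_alpha) = alpha.
    """
    K_alpha = NormalDist().inv_cdf(alpha)  # standard normal quantile
    lhs = mean + K_alpha * var ** 0.5
    return lhs <= v_limit, lhs


# Hypothetical moments of the estimated variance for one characteristic
mean, var, alpha = 0.001, 0.0002 ** 2, 0.95
for v_limit in (0.0012, 0.0014):
    holds, lhs = chance_constraint_holds(mean, var, v_limit, alpha)
    # direct probability check under the same Normal approximation
    prob = NormalDist(mean, var ** 0.5).cdf(v_limit)
    print(v_limit, holds, round(lhs, 6), round(prob, 4))
```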
4 Application
A population of size $N = 9500$ with three strata and two characteristics is obtained by simulation from the 150 observations of the Iris data set, which is available in the software R (R Development Core Team, 2018). The observations are divided into three strata, and two characteristics (the length and width of a leaf of a particular species of flower) are measured on each population unit. Population units for the three strata, of sizes 3000, 3000 and 3500, are generated by simulation from the Iris data in R; the stratum values computed from the simulated population are reported in Table 1, together with the values assumed for the numerical illustration. The total cost of conducting the survey is fixed in advance.
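Table 1.
Stratum $h$ | $N_h$ | Values computed from the simulated population | Values assumed for the illustration
--- | --- | --- | ---
1 | 3000 | 0.01523817, 0.02037975, 0.00068061, 0.001217395 | 2, 1, 100, 20
2 | 3000 | 0.06898021, 0.00957083, 0.01434982, 0.000268493 | 3, 2, 100, 20
3 | 3500 | 0.16608490, 0.01080517, 0.08426042, 0.000349415 | 3, 3, 100, 20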
4.1 Solution for modified E-model by fuzzy goal programming technique
Without loss of generality, fixed values of the modified E-model weights are taken. Using the numerical values in Table 1, the NLPP for each characteristic is formulated, and the ideal solutions for both characteristics are worked out.
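After obtaining the ideal solutions, the payoff matrix is computed and given in Table 2.
Table 2. Payoff matrix
Ideal solution of | Objective 1 | Objective 2
--- | --- | ---
Characteristic 1 | 0.00034712 | 0.00010331
Characteristic 2 | 0.00047683 | 0.00005878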
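From the payoff matrix, the lower and upper bounds of the two objective functions are
$$L_1 = 0.00034712,\ U_1 = 0.00047683, \qquad L_2 = 0.00005878,\ U_2 = 0.00010331,$$
so that $U_1 - L_1 = 0.00012971$ and $U_2 - L_2 = 0.00004453$.
Let $\mu_1$ and $\mu_2$ be the fuzzy membership functions of the objective functions $Z_1$ and $Z_2$; for both characteristics they are
$$\mu_j(Z_j) = \begin{cases} 1, & Z_j \le L_j,\\ \dfrac{U_j - Z_j}{U_j - L_j}, & L_j < Z_j < U_j,\\ 0, & Z_j \ge U_j, \end{cases} \qquad j = 1, 2.$$
Using the max–min operator, the objective is revised to maximizing $\lambda = \min(\mu_1, \mu_2)$, and the problem to be solved becomes
$$\begin{aligned} &\text{Maximize } \lambda\\ &\text{subject to } Z_1 + 0.00012971\,\lambda \le 0.00047683,\quad Z_2 + 0.00004453\,\lambda \le 0.00010331,\\ &\phantom{\text{subject to }} 0 \le \lambda \le 1, \end{aligned}$$
together with the cost and sample-size constraints.
Using MATLAB, the optimal solution of this problem gives the proposed allocation $(n_1, n_2, n_3) = (33, 35, 56)$, with a total sample size of 124 (Table 3). The trace, i.e. the sum of the variances of the two characteristics under the proposed allocation, is 0.0007527.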
4.1.1 Comparison with other allocations
In this section, a comparative study is carried out where the proposed method is compared with some other well-defined methods of allocation. Some of these methods are as follows:
4.1.1.1 Proportional allocation
For a fixed cost of the survey, the proportional allocation may be obtained by substituting $n_h = n N_h / N$ in the cost function and solving for $n$; the stratum-wise allocations, rounded off to the nearest integers, are $(n_1, n_2, n_3) = (40, 40, 46)$. The trace value under proportional allocation is 0.0008066.
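Taking the total sample size $n = 126$ of the proportional allocation in Table 3 and the stratum sizes in Table 1, the following minimal sketch reproduces the rounded stratum allocations. Nearest-integer rounding does not preserve the total in general; in this case it does.

```python
def proportional_allocation(n, N_h):
    """Proportional allocation n_h = n * N_h / N, rounded to the nearest integer."""
    N = sum(N_h)
    return [round(n * Nh / N) for Nh in N_h]


alloc = proportional_allocation(126, [3000, 3000, 3500])
print(alloc)  # [40, 40, 46]
```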
4.1.1.2 Cochran's average allocation
Cochran (1977) suggested a compromise criterion that takes the average of the individual optimum allocations $n_{hj}^*$, $h = 1, \dots, L$, $j = 1, \dots, p$:
$$n_h = \frac{1}{p}\sum_{j=1}^{p} n_{hj}^*.$$
The individual optimum allocations are obtained by solving a separate NLPP for each characteristic, i.e., by minimizing the objective function of the $j$th characteristic alone subject to the cost and sample-size constraints. For the numerical values in Table 1, the compromise allocation suggested by Cochran (1977) is $(n_1, n_2, n_3) = (36, 35, 53)$. The variances of both characteristics are then calculated, and their sum gives a trace value of 0.0007755.
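Cochran's criterion is a stratum-by-stratum average followed by rounding. The minimal sketch below applies it to hypothetical individual optimum allocations for two characteristics; note that Python's `round` resolves exact .5 ties to the nearest even integer, which may differ from the rounding rule used in the paper.

```python
def cochran_average(individual_allocations):
    """Average the individual optimum allocations stratum by stratum, then round."""
    p = len(individual_allocations)
    L = len(individual_allocations[0])
    return [round(sum(a[h] for a in individual_allocations) / p) for h in range(L)]


# Hypothetical individual optimum allocations for two characteristics
print(cochran_average([[20, 30, 50], [30, 40, 44]]))  # [25, 35, 47]
```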
4.1.1.3 Sukhatme's compromise allocation
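This compromise allocation is obtained by minimizing the trace of the variance–covariance matrix of the estimators, i.e., the sum of the variances of all characteristics. The desired compromise allocation (Sukhatme et al., 1984) is the solution of the NLPP
$$\begin{aligned} &\text{Minimize } \sum_{j=1}^{p} V(\bar{y}_{jst}) = \sum_{j=1}^{p}\sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_{hj}^2\\ &\text{subject to } C(n_1,\dots,n_L) \le C_0,\quad 2 \le n_h \le N_h,\quad n_h \text{ integer}. \end{aligned}$$
Substituting the numerical values from Table 1, the solution is $(n_1, n_2, n_3) = (25, 36, 61)$, and the trace value is 0.0007532.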
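With a linear cost function and the integer and bound restrictions relaxed, the trace criterion has a closed form: the trace equals $\sum_h W_h^2 A_h / n_h$ plus a constant, where $A_h = \sum_j S_{hj}^2$, so $n_h \propto W_h\sqrt{A_h / c_h}$. The minimal sketch below computes this continuous solution; the linear cost and the inputs are assumptions for illustration, not the cost function used in the numerical study.

```python
import math

def trace_optimal_allocation(W_h, S2_by_char, c_h, budget):
    """Continuous allocation minimising sum_j V(ybar_jst) subject to sum_h c_h n_h = budget.

    The trace equals sum_h W_h^2 A_h / n_h plus a constant, with A_h = sum_j S_hj^2,
    so n_h is proportional to W_h sqrt(A_h) / sqrt(c_h).
    """
    A_h = [sum(S2_j[h] for S2_j in S2_by_char) for h in range(len(W_h))]
    denom = sum(W * math.sqrt(A * c) for W, A, c in zip(W_h, A_h, c_h))
    return [budget * W * math.sqrt(A / c) / denom for W, A, c in zip(W_h, A_h, c_h)]


# Hypothetical: two strata, two characteristics, unit costs 1
n = trace_optimal_allocation([0.5, 0.5], [[1.0, 3.0], [0.0, 1.0]], [1.0, 1.0], budget=60.0)
print(n)
```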
Table 3 shows that the proposed allocation gives the smallest trace value of the allocations compared (0.0007527, slightly below the 0.0007532 of Sukhatme's allocation) and the highest relative efficiency with respect to proportional allocation (1.07155). Table 4 shows the percentage increase in the variances of both characteristics when the individual optimum allocation of one characteristic is used for both characteristics, and when the proposed allocation is used; the proposed allocation gives smaller increases than using the individual optimum allocation of the other characteristic. Table 5 shows the percentage increase in the variances of both characteristics when the other allocations are used instead of the individual optimum allocations; these values are smallest for the proposed allocation. The proposed allocation may therefore be regarded as the best of the allocations considered in this example.
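Table 3.
S.N. | Allocation | $n_1$ | $n_2$ | $n_3$ | $n$ | Cost | Trace | R.E. w.r.t. proportional allocation
--- | --- | --- | --- | --- | --- | --- | --- | ---
1 | Proportional | 40 | 40 | 46 | 126 | 1007.3 | 0.0008066 | 1.00000
2 | Cochran | 36 | 35 | 53 | 124 | 995.67 | 0.0007755 | 1.04010
3 | Sukhatme | 25 | 36 | 61 | 122 | 991.43 | 0.0007532 | 1.07089
4 | Proposed | 33 | 35 | 56 | 124 | 999.03 | 0.0007527 | 1.07155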
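The relative efficiencies in Table 3 are ratios of trace values. The minimal sketch below recomputes them from the rounded traces in the table, assuming R.E. = trace(proportional) / trace(allocation); because the traces are rounded, the last digits can differ slightly from the tabulated R.E. values.

```python
traces = {
    "Proportional": 0.0008066,
    "Cochran": 0.0007755,
    "Sukhatme": 0.0007532,
    "Proposed": 0.0007527,
}
# Relative efficiency with respect to proportional allocation: trace(prop) / trace(other)
re = {name: traces["Proportional"] / t for name, t in traces.items()}
for name, value in re.items():
    print(f"{name}: {value:.5f}")
```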
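Table 4. Percentage increase in the variances
Characteristic | Individual optimum allocation of characteristic 1 | Individual optimum allocation of characteristic 2 | Proposed allocation
--- | --- | --- | ---
1 | 0 | 0.31866816 | 0.014920487
2 | 0.55578140 | 0 | 0.229092349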
Table 5. Percentage increase in the variances of the characteristics under different criteria
Characteristic | Proportional | Cochran | Sukhatme | Proposed
--- | --- | --- | --- | ---
1 | 15.36391 | 9.441428 | 2.40957 | 0.014920487
2 | 3.225985 | 7.91908 | 27.73566 | 0.229092349
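Tables 4 and 5 report increases in variance relative to the individual optimum. Assuming the usual definition $100\,(V_a - V_j^*)/V_j^*$, where $V_j^*$ is the variance of the $j$th characteristic under its own optimum allocation and $V_a$ its variance under allocation $a$, the computation is a one-liner; the variances in the example are hypothetical.

```python
def percentage_increase(v_alloc, v_opt):
    """Percentage increase in a variance relative to its individual optimum value."""
    return 100.0 * (v_alloc - v_opt) / v_opt


# Hypothetical variances for one characteristic
print(percentage_increase(0.00021, 0.00020))
```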
Based on the above discussion, the proposed allocation works well in comparison to other allocations.
4.2 Solution by chance constraints
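This method may be used when the cost of carrying out the sample survey is high and specified limits on the variances are given. With the specified values of the variance limits $v_j$ and the probabilities $\alpha_j$, the corresponding standard Normal value $K_{\alpha_j}$ is obtained such that $\Phi(K_{\alpha_j}) = \alpha_j$. Substituting these values gives the equivalent deterministic problem of the SNLPP, the NLPP (20).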
The solutions of the NLPP (20) are obtained in MATLAB using the Lagrange multiplier technique and the INLPP technique, and are reported in Table 6.
Table 6.
S.No. | Method | $n_1$ | $n_2$ | $n_3$ | $n$ | Cost
--- | --- | --- | --- | --- | --- | ---
1 | Lagrange multiplier (non-integer) | 12.97 | 52.74 | 61.33 | 127 | 1045.10
2 | Lagrange multiplier (rounded) | 13 | 53 | 61 | 127 | 1044.59
3 | Lagrange multiplier (integer) | 13 | 53 | 62 | 128 | 1052.78
4 | Stochastic (non-integer) | 18.66 | 36.95 | 66.30 | 122 | 997.630
5 | Stochastic (rounded) | 19 | 35 | 56 | 124 | 997.890
6 | Stochastic (integer) | 19 | 38 | 65 | 122 | 997.870
Table 6 compares the Lagrange multiplier technique and the INLPP technique for minimizing the total cost of the survey. For the continuous solution of the NLPP (20), the nonlinear programming technique is preferable to the Lagrange multiplier technique (cost 997.630 versus 1045.10). When the continuous solution is rounded off to the nearest integers, the nonlinear programming technique again gives the lower survey cost (997.890 versus 1044.59). Furthermore, if the integer restriction is mandatory, the nonlinear programming technique is advisable (997.870 versus 1052.78).
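Table 6 also illustrates that rounding a continuous solution is not the same as solving the integer problem. A toy example with hypothetical numbers makes the point: with a linear cost and a single variance limit (a simplified stand-in for the NLPP (20), not the paper's cost function), the rounded Lagrange solution violates the constraint, while an exhaustive integer search returns the cheapest feasible allocation.

```python
import math
from itertools import product

# Hypothetical stand-in for a cost-minimising problem with a variance limit:
# minimise sum_h c_h n_h subject to sum_h a_h / n_h <= v (a_h plays the role of W_h^2 S_h^2).
a = [1.0, 4.0]
c = [1.0, 1.0]
v = 0.36

def feasible(n):
    return sum(ah / nh for ah, nh in zip(a, n)) <= v

def cost(n):
    return sum(ch * nh for ch, nh in zip(c, n))

# Continuous (Lagrange multiplier) solution: n_h = sqrt(a_h / c_h) * sum_k sqrt(a_k c_k) / v
k = sum(math.sqrt(ah * ch) for ah, ch in zip(a, c)) / v
n_cont = [math.sqrt(ah / ch) * k for ah, ch in zip(a, c)]
n_round = [round(x) for x in n_cont]

# Integer solution by exhaustive search over a small box
n_int = min((n for n in product(range(2, 60), repeat=2) if feasible(n)), key=cost)

print(n_cont, cost(n_cont))
print(n_round, feasible(n_round), cost(n_round))
print(n_int, feasible(n_int), cost(n_int))
```

Here the continuous optimum costs 25, its rounded version (8, 17) is infeasible, and the best integer allocation costs 26.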
5 Conclusion
In general, the true values of the stratum variances are not known in advance but may be estimated. Accordingly, this paper formulates the problem as a multivariate-multiobjective SNLPP. The formulated SNLPPs are converted into deterministic form using the modified E-model and chance-constraint techniques. Their solutions are computed either by minimizing the sampling variances for a fixed cost or by minimizing the cost for a fixed precision of the estimates. For the numerical illustration, data are generated by simulation in R, and the formulated NLPPs are solved in MATLAB. For the given numerical application, the proposed compromise allocation gives better results than the other compromise criteria discussed in this paper. Furthermore, for large-scale investigations, it is vital to select an appropriate method for attaining the study's objectives.
Acknowledgement
All the authors are very thankful to the Editor in Chief and the anonymous reviewers who helped improve the paper’s quality and presentation substantially.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Ahsan, M.J., 1975–1976. A procedure for the problem of optimum allocation in multivariate stratified random sampling. Aligarh Bull. Math. 5–6, 37–42. DOI:10.1080/01621459.1968.11009271.
- Integer fuzzy programming approach in bi-objective selective maintenance allocation problem. J. Math. Model. Algor. 2013;13(2):113-124.
- Beardwood, J., Halton, J.H., Hammersley, J.M., 1959. The shortest path through many points. Math. Proc. Cambridge Philos. Soc. 55, 299–327.
- Bellman, R.E., Zadeh, L.A., 1970. Decision-making in a fuzzy environment. Manage. Sci. 17(4), 141–164.
- Charnes, A., Cooper, W.W., 1963. Deterministic equivalents for optimizing and satisficing under chance constraints. Oper. Res. 11(1), 18–39.
- Cochran, W.G., 1977. Sampling Techniques. John Wiley, New York.
- Díaz-García, J.A., Garay Tapia, M.M., 2007. Optimum allocation in stratified surveys: stochastic programming. Comput. Stat. Data Anal. 51, 3016–3026.
- Díaz-García, J.A., Ramos-Quiroga, R., 2014. Stochastic optimal design in multivariate stratified sampling. Optimization 63(11), 1665–1688.
- Design of neural network predictive controller based on imperialist competitive algorithm for automatic voltage regulator. Neural Comput. Appl. 2019;31(9):5017-5027.
- New variable structure control based on different meta-heuristics algorithms for frequency regulation considering nonlinearities effects. Int. Trans. Electr. Energy Syst. 2020;30(7):e12428.
- Optimal design of robust resilient automatic voltage regulators. ISA Trans. 2020;108:257-268.
- A fuzzy multiobjective programming approach to develop a green closed-loop supply chain network design problem under uncertainty: modifications of imperialist competitive algorithm. RAIRO Oper. Res. 2019;53(3):963-990.
- A bi-objective home healthcare routing and scheduling problem considering patients' satisfaction in a fuzzy environment. Appl. Soft Comput. 2020;93:106385.
- On compromise mixed allocation in multivariate stratified sampling with random parameters. J. Math. Model. Algor. 2014;13(4):523-536.
- Applying a fuzzy multiobjective model for a production–distribution network design problem by using a novel self-adoptive evolutionary algorithm. IJSS: O&L. 2021;8(1):1-22.
- Gupta, N., Ali, I., Bari, A., 2013. Fuzzy goal programming approach in selective maintenance reliability model. Pak. J. Stat. Oper. Res. 9(3), 321–331. DOI:10.18187/pjsor.v9i3.654.
- Compromise allocation problem in multivariate stratified sampling with flexible fuzzy goals. J. Stat. Comput. Simul. 2020;90(9):1557-1569.
- Haseen, S., Ali, I., Bari, A., 2016. Multiobjective stochastic multivariate stratified sampling in presence of nonresponse. Commun. Stat. - Simul. Comput. 45(8), 2810–2826. . 2014.926173.
- Compromise allocation in multivariate stratified sampling: an integer solution. Nav. Res. Logist. 1997;44(1):69-79.
- Kokan, A.R., Khan, S., 1967. Optimum allocation in multivariate surveys: an analytical solution. J. R. Stat. Soc. B 29(1), 115–125.
- Kozak, M., 2006. On sample allocation in multivariate surveys. Commun. Stat. - Theory Methods 35(4), 901–910.
- On stochastic optimization in sample allocation among strata. Metron. 2010;LXVIII(1):95-103.
- A fuzzy intercontinental road-rail multimodal routing model with time and train capacity uncertainty and fuzzy programming approaches. IEEE Access. 2020;8:27532-27548.
- Asymptotic normality of the optimal allocation in multivariate stratified random sampling. Sankhya. 1968;48:224-232.
- Muhammad, Y.S., Husain, I., 2017. Trade off between cost and variance for a multiobjective compromise allocation in stratified random sampling. Commun. Stat. - Theory Methods 46(6), 2655–2666. DOI:10.1080/03610926.2015.1040507.
- Multiobjective compromise allocation in multivariate stratified sampling using extended lexicographic goal programming with Gamma cost function. J. Math. Model. Algor. 2015;14(3):255-265.
- Neyman, J., 1934. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. R. Stat. Soc. 97(4), 558–625.
- Prékopa, A., 1978. The use of stochastic programming for the solution of some problems in statistics and probability. Technical Summary Report #1834, University of Wisconsin Madison, Mathematical Research Center, Madison.
- R Development Core Team, 2018. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org.
- Introduction to Probability and Statistics for Engineers and Scientists, fourth ed. Academic Press; 2009. ISBN:978-0-12-370483-2.
- Determination of compromise integer strata sample sizes using goal programming. J. Math. Stat. 2004;33:91-96.
- Sukhatme, P.V., Sukhatme, B.V., Sukhatme, S., Asok, C., 1984. Sampling Theory of Surveys with Applications. Iowa State University Press, Iowa, USA and Indian Society of Agricultural Statistics, New Delhi, India.
- Varshney, R., Najmussehar, Ahsan, M.J., 2012. Estimation of more than one parameters in stratified sampling with fixed budget. Math. Methods Oper. Res. 75(2), 185–197. DOI:10.1007/s00186-012-0380-y.
- Varshney, R., Khan, M.G.M., Fatima, U., Ahsan, M.J., 2015. Integer compromise allocation in multivariate stratified surveys. Ann. Oper. Res. 226(1), 659–668. DOI:10.1007/s10479-014-1734-z.
- An optimum multivariate-multiobjective stratified sampling design: Fuzzy programming approach. Pak. J. Stat. Oper. Res. 2017;13(4):829-855.
- Optimum allocation in multivariate stratified sampling design in the presence of nonresponse with Gamma cost function. J. Stat. Comput. Simul. 2019;89(13):2454-2467.
- Zimmermann, H.-J., 1978. Fuzzy programming and linear programming with several objective functions. Fuzzy Sets Syst. 1, 45–55.