Original article · Journal of King Saud University – Science. 2022;34:102281. doi: 10.1016/j.jksus.2022.102281

A criterion for the global convergence of conjugate gradient methods under strong Wolfe line search

Department of Mathematics, Faculty of Mathematical and Computer Sciences, University of Gezira, Sudan
Department of Mathematics, Faculty of Sciences AlZulfi, Majmaah University, Majmaah 11952, Saudi Arabia
Department of Mathematics and Applied Mathematics University of Pretoria, Pretoria 0002, South Africa
Department of Computer, College of Science and Arts, Qassim University, Ar Rass, Saudi Arabia
Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72441, Kingdom of Saudi Arabia

⁎Corresponding author: mogtaba.m@mu.edu.sa (Mogtaba A.Y. Mohammed).

Disclaimer:
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.

Peer review under responsibility of King Saud University.

Abstract

From 1952 until now, the sufficient descent property and the global convergence of conjugate gradient (CG) methods have been studied extensively. However, the sufficient descent property and the global convergence of some CG methods, such as the method of Polak, Ribière, and Polyak (PRP) and the method of Hestenes and Stiefel (HS), have not yet been established under the strong Wolfe line search. In this paper, based on Yousif (Yousif, 2020), we present a criterion that guarantees the generation of descent search directions and the global convergence of CG methods when they are applied under the strong Wolfe line search. Moreover, the PRP and HS methods are restricted so as to satisfy the presented criterion, and new modified versions of PRP and HS are thus proposed. Finally, numerical experiments are carried out to support the theoretical proofs.

Keywords

Unconstrained optimization problems
Conjugate gradient methods
Strong Wolfe line search
Sufficient descent property
Global convergence
1 Introduction

Unconstrained optimization problems arise in various fields of science, engineering, and economics. They are mathematically formulated as

(1.1)
$\min_{x \in \mathbb{R}^n} f(x),$

where $f: \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. Conjugate gradient (CG) methods are widely used to solve problem (1.1) because of their simplicity and low memory requirements. It should be mentioned that optimization problems of the form (1.1) are also solved by non-gradient methods, often successfully by various population-based heuristics. CG methods use the iterative scheme

(1.2)
$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots,$

where $\alpha_k$ is the step length that a CG method takes at each iteration along the search direction $d_k$. The strong Wolfe line search is one of the most widely used rules for computing $\alpha_k$ in practical computations; it requires $\alpha_k$ to satisfy

(1.3)
$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k,$

(1.4)
$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|,$

where $g_k$ denotes the gradient of the nonlinear function $f$ at $x_k$, $0 < \delta < \sigma < 1$, and $d_k$ is the search direction given by

(1.5)
$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \end{cases}$

where $\beta_k$ is the scalar that distinguishes one CG method from another. Some of the best-known formulas are attributed to Hestenes–Stiefel (HS) (Hestenes and Stiefel, 1952), Fletcher–Reeves (FR) (Fletcher and Reeves, 1964), and Polak–Ribière–Polyak (PRP) (Polyak, 1969; Polak and Ribière, 1969). These formulas are given by

$\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})}, \qquad \beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad \beta_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2},$

respectively. Other formulas include conjugate descent (CD) (Fletcher, 1987), Liu–Storey (LS) (Liu and Storey, 1992), and Dai–Yuan (DY) (Dai and Yuan, 2000). For further formulas for the coefficient $\beta_k$, see (Abubakar et al., 2022; Yuan and Lu, 2009; Zhang, 2009; Rivaie et al., 2012; Hager and Zhang, 2005; Dai, 2002; Yuan and Sun, 1999; Salleh et al., 2022; Dai, 2016; Wei et al., 2006, Wei et al., 2006).
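To make the notation concrete, the following is a minimal sketch (in Python, not the authors' code) of the classical coefficients above and of the direction update (1.5); the function names are illustrative assumptions, and the inputs are taken to be 1-D NumPy arrays.

def beta_hs(g, g_prev, d_prev):
    # Hestenes-Stiefel: g_k^T(g_k - g_{k-1}) / d_{k-1}^T(g_k - g_{k-1})
    y = g - g_prev
    return float(g @ y) / float(d_prev @ y)

def beta_fr(g, g_prev):
    # Fletcher-Reeves: ||g_k||^2 / ||g_{k-1}||^2
    return float(g @ g) / float(g_prev @ g_prev)

def beta_prp(g, g_prev):
    # Polak-Ribiere-Polyak: g_k^T(g_k - g_{k-1}) / ||g_{k-1}||^2
    return float(g @ (g - g_prev)) / float(g_prev @ g_prev)

def cg_direction(g, d_prev=None, beta=0.0):
    # Eq. (1.5): steepest descent at k = 0, otherwise -g_k + beta_k * d_{k-1}.
    return -g if d_prev is None else -g + beta * d_prev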

To guarantee that every search direction generated by a CG method is a descent direction, the sufficient descent property

(1.6)
$g_k^T d_k \le -C \|g_k\|^2, \quad \forall k \ge 0 \text{ and a constant } C > 0,$

is needed.

The descent direction property and the global convergence of the FR method were established on general functions under both the exact (Zoutendijk and Abadie, 1970) and the strong Wolfe (Al-Baali, 1985) line searches. The PRP and HS methods with exact line search can cycle infinitely without approaching a solution, which implies that neither is globally convergent for general functions (Powell, 1984). Nevertheless, owing to their self-restarting property, the PRP and HS methods perform well in practice and are therefore often preferred to the FR method. To establish their convergence under the strong Wolfe line search, Powell (Powell, 1986) suggested restricting their coefficients to be non-negative. Motivated by Powell's suggestion (Powell, 1986), Gilbert and Nocedal (Gilbert and Nocedal, 1992) conducted an elegant analysis and established that these methods are globally convergent if the coefficients are restricted to be non-negative and the line search guarantees the sufficient descent condition. Further studies on the global convergence properties of CG methods include those of Hu and Storey (Hu and Storey, 1991), Liu et al. (Zoutendijk and Abadie, 1970), and Touati-Ahmed and Storey (Touati-Ahmed and Storey, 1990), among others.

Recently, Yousif (Yousif, 2020) gave a detailed proof of the sufficient descent property and the global convergence of the modified method of Rivaie, Mamat, Ismail, and Leong (RMIL+) (Rivaie et al., 2012). In that work, the coefficient is given by

(1.7)
$\beta_k^{RMIL+} = \begin{cases} \dfrac{g_k^T (g_k - g_{k-1})}{\|d_{k-1}\|^2}, & \text{if } 0 \le g_k^T g_{k-1} \le \|g_k\|^2, \\[4pt] 0, & \text{otherwise}. \end{cases}$

The proof is based on the inequality

(1.8)
$\dfrac{\|g_k\|}{\|d_k\|} < 2, \quad \forall k \ge 0,$

where $g_k$ and $d_k$ are generated by the RMIL+ method under the strong Wolfe line search in the case $\sigma \in \left(0, \frac{1}{4}\right]$.
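Under the same illustrative assumptions as the sketch above (1-D NumPy arrays; the function name is ours, not from Yousif's code), the RMIL+ coefficient (1.7) can be written as follows.

def beta_rmil_plus(g, g_prev, d_prev):
    # Eq. (1.7): use g_k^T(g_k - g_{k-1}) / ||d_{k-1}||^2 only when
    # 0 <= g_k^T g_{k-1} <= ||g_k||^2; otherwise restart with 0.
    if 0.0 <= float(g @ g_prev) <= float(g @ g):
        return float(g @ (g - g_prev)) / float(d_prev @ d_prev)
    return 0.0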

In this paper, inspired by Yousif (Yousif, 2020), we present a criterion that guarantees the descent property and the global convergence of every CG method satisfying it. This is presented in Section 2. In Section 3, based on this criterion, we propose modified versions of the PRP and HS methods. Finally, in Section 4, to show the efficiency of the proposed modified methods in practical computations, they are compared with the PRP, HS, FR, and RMIL+ methods.

2 A new criterion guarantees sufficient descent and global convergence

In this section, we first show that for every CG method whose coefficient $\beta_k$ satisfies

(2.1)
$|\beta_k| \le \mu \dfrac{\|g_k\|^2}{\|d_{k-1}\|^2}, \quad \text{for } k \ge 1 \text{ and a real number } \mu \ge 1,$

the inequality (1.8) holds true. Second, we prove the sufficient descent property and the global convergence of any CG method whose coefficient $\beta_k$ satisfies (2.1) under the strong Wolfe line search with $\sigma \in \left(0, \frac{1}{4\mu}\right)$.

2.1 The sufficient descent property

Before proving the desired property, we first note that for every pair of positive real numbers $\sigma$ and $\mu \ge 1$ with $0 < \sigma < \frac{1}{4\mu}$, we have
$0 < 2\mu\sigma < \frac{1}{2} \;\Longrightarrow\; -1 < 2\mu\sigma - 1 < -\frac{1}{2} \;\Longrightarrow\; \frac{1}{2} < 1 - 2\mu\sigma < 1,$
and hence

(2.2)
$1 < \dfrac{1}{1 - 2\mu\sigma} < 2.$
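As a quick numerical check of (2.2), take the values used later in the experiments of Section 4, $\mu = 10$ and $\sigma = 10^{-2}$ (so that $\sigma < \frac{1}{4\mu} = 0.025$); then
$2\mu\sigma = 0.2, \qquad \frac{1}{2} < 1 - 2\mu\sigma = 0.8 < 1, \qquad 1 < \frac{1}{1 - 2\mu\sigma} = 1.25 < 2.$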

Theorem 2.1: Assume that $g_k$ and $d_k$ are generated by a CG method whose $\beta_k$ satisfies (2.1) under the strong Wolfe line search with $\sigma \in \left(0, \frac{1}{4\mu}\right)$. Then (1.8) holds.

Proof: We argue by induction. For $k = 0$, (1.5) shows that (1.8) is satisfied. Now suppose that (1.8) is true for some index $k$. Writing (1.5) for $k+1$ and multiplying the resulting equation by $g_{k+1}^T$, we get
$\|g_{k+1}\|^2 = -g_{k+1}^T d_{k+1} + \beta_{k+1}\, g_{k+1}^T d_k.$

Applying the triangle inequality, we get
$\|g_{k+1}\|^2 \le |g_{k+1}^T d_{k+1}| + |\beta_{k+1}|\,|g_{k+1}^T d_k|.$

Using condition (1.4), we obtain
$\|g_{k+1}\|^2 \le |g_{k+1}^T d_{k+1}| + \sigma\,|\beta_{k+1}|\,|g_k^T d_k|.$

Substituting (2.1) for $\beta_{k+1}$ and using the Cauchy–Schwarz inequality, we get

(2.3)
$\|g_{k+1}\|^2 \le \|g_{k+1}\|\,\|d_{k+1}\| + \mu\sigma\,\|g_{k+1}\|^2\,\dfrac{\|g_k\|}{\|d_k\|}.$

Dividing both sides of (2.3) by $\|g_{k+1}\|$ and then applying the induction hypothesis (1.8), we arrive at
$\|g_{k+1}\| < \|d_{k+1}\| + 2\mu\sigma\,\|g_{k+1}\|,$

which leads to
$\|g_{k+1}\|\,(1 - 2\mu\sigma) < \|d_{k+1}\|.$

Since $1 - 2\mu\sigma > 0$ and $\dfrac{1}{1 - 2\mu\sigma} < 2$ (see (2.2)), we conclude that
$\dfrac{\|g_{k+1}\|}{\|d_{k+1}\|} < \dfrac{1}{1 - 2\mu\sigma} < 2;$

thus, the proof is complete.

Now we are able to establish the sufficient descent property (1.6) under condition (2.1). This is the topic of the following theorem.

Theorem 2.2: Assume that $g_k$ and $d_k$ are generated by a CG method whose $\beta_k$ satisfies (2.1) under the strong Wolfe line search with $\sigma \in \left(0, \frac{1}{4\mu}\right)$. Then the sufficient descent property (1.6) holds true.

Proof: For k = 0 , the result is clear by using (1.5). Consider the case k > 0 .

From (1.5), we have
$g_k^T d_k = -\|g_k\|^2 + \beta_k\, g_k^T d_{k-1} \le -\|g_k\|^2 + |\beta_k|\,|g_k^T d_{k-1}|.$

Applying the strong Wolfe condition (1.4), we get
$g_k^T d_k \le -\|g_k\|^2 + \sigma\,|\beta_k|\,|g_{k-1}^T d_{k-1}|.$

Using the Cauchy–Schwarz inequality $|g_{k-1}^T d_{k-1}| \le \|g_{k-1}\|\,\|d_{k-1}\|$ and then substituting (2.1) and (1.8), we arrive at
$g_k^T d_k \le -\|g_k\|^2 + \mu\sigma\,\|g_k\|^2\,\dfrac{\|g_{k-1}\|}{\|d_{k-1}\|} < \left(-1 + 2\mu\sigma\right)\|g_k\|^2,$

which means

(2.4)
$g_k^T d_k < -C\,\|g_k\|^2, \quad \text{where } C = 1 - 2\mu\sigma > 0 \text{ (see (2.2))}.$

Thus the proof is complete.

2.2 The global convergence

Now, based on the following assumption on the objective function $f$, we establish the global convergence, under the strong Wolfe line search with $0 < \sigma < \frac{1}{4\mu}$, of every CG method whose coefficient satisfies (2.1).

Assumption 2.1

  1. Define $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ and assume that $\Omega$ is bounded for any initial point $x_0$.

  2. Let $N$ be a neighborhood of $\Omega$ and assume that $f$ is continuously differentiable on $N$ with a Lipschitz continuous gradient, that is, for some $l > 0$,
     $\|g(x) - g(y)\| \le l\,\|x - y\|, \quad \forall x, y \in N.$

Under this assumption, Zoutendijk (Zoutendijk and Abadie, 1970) proved the following result.

Lemma 2.1: Suppose that Assumption 2.1 holds. Then, for any conjugate gradient method of the form (1.2)–(1.5) in which $\alpha_k$ is computed by the strong Wolfe line search,

(2.5)
$\sum_{k=0}^{\infty} \dfrac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty.$

From (2.4), we get
$C^2\,\|g_k\|^4 < (g_k^T d_k)^2, \quad \text{for all } k \ge 0,$
which leads to

(2.6)
$\sum_{k=0}^{\infty} \dfrac{\|g_k\|^4}{\|d_k\|^2} < \dfrac{1}{C^2}\sum_{k=0}^{\infty} \dfrac{(g_k^T d_k)^2}{\|d_k\|^2}.$

From (2.5) and (2.6) together, we come to

(2.7)
$\sum_{k=0}^{\infty} \dfrac{\|g_k\|^4}{\|d_k\|^2} < \infty.$

Therefore, based on Assumption 2.1, we deduce that if the sequences $g_k$ and $d_k$ are generated by a CG method with coefficient $\beta_k$ satisfying (2.1) when it is applied under the strong Wolfe line search with $0 < \sigma < \frac{1}{4\mu}$, then (2.7) holds.

The following lemma will be used in the proof of the global convergence.

Lemma 2.2: Suppose that $g_k$ and $d_k$ are generated by a CG method whose $\beta_k$ satisfies (2.1) under the strong Wolfe line search with $0 < \sigma < \frac{1}{4\mu}$. Then there exists a constant $C_1 > 1$ such that

(2.8)
$g_k^T d_k \ge -C_1\,\|g_k\|^2.$

Proof: Multiplying (1.5) by $g_k^T$ and then applying the triangle inequality, we obtain
$|g_k^T d_k| \le \|g_k\|^2 + |\beta_k|\,|g_k^T d_{k-1}|.$

Substituting (2.1), applying the strong Wolfe condition (1.4), and using inequality (1.8), we get
$|g_k^T d_k| \le \|g_k\|^2 + 2\mu\sigma\,\|g_k\|^2,$

which means $g_k^T d_k \ge -C_1\,\|g_k\|^2$, where $C_1 = 1 + 2\mu\sigma$; this completes the proof.

Theorem 2.3: Suppose that Assumption 2.1 holds. Any CG method with a coefficient $\beta_k$ satisfying (2.1) is globally convergent when it is applied under the strong Wolfe line search with $0 < \sigma < \frac{1}{4\mu}$, that is,

(2.9)
$\liminf_{k \to \infty} \|g_k\| = 0.$

Proof: The proof is by contradiction. Suppose that the opposite of (2.9) holds, that is, there exist a constant $\varepsilon > 0$ and an integer $k_1$ such that

(2.10)
$\|g_k\| \ge \varepsilon, \quad \text{for all } k \ge k_1,$

which leads to

(2.11)
$\dfrac{1}{\|g_k\|^2} \le \dfrac{1}{\varepsilon^2}, \quad \text{for all } k \ge k_1.$

From (1.5), by squaring both sides of $d_k + g_k = \beta_k d_{k-1}$, we get

(2.12)
$\|d_k\|^2 = -\|g_k\|^2 - 2\,g_k^T d_k + (\beta_k)^2\,\|d_{k-1}\|^2.$

Using (2.8), we obtain
$\|d_k\|^2 \le -\|g_k\|^2 + 2 C_1\,\|g_k\|^2 + (\beta_k)^2\,\|d_{k-1}\|^2,$

which means
$\|d_k\|^2 \le C_3\,\|g_k\|^2 + (\beta_k)^2\,\|d_{k-1}\|^2, \quad \text{where } C_3 = 2 C_1 - 1.$

Substituting (2.1) and dividing both sides by $\|g_k\|^4$, we get
$\dfrac{\|d_k\|^2}{\|g_k\|^4} \le \dfrac{C_3}{\|g_k\|^2} + \dfrac{\mu^2}{\|d_{k-1}\|^2}.$

Since $\dfrac{1}{\|d_{k-1}\|^2} < \dfrac{4}{\|g_{k-1}\|^2}$ (see (1.8)), then

(2.13)
$\dfrac{\|d_k\|^2}{\|g_k\|^4} < \dfrac{C_3}{\|g_k\|^2} + \dfrac{4\mu^2}{\|g_{k-1}\|^2}.$

Combining (2.11) and (2.13) together, we come to
$\dfrac{\|d_k\|^2}{\|g_k\|^4} < \dfrac{C_3 + 4\mu^2}{\varepsilon^2}, \quad \text{for all } k \ge k_1 + 1.$

This means
$\dfrac{\|g_k\|^4}{\|d_k\|^2} > \gamma, \quad \text{where } \gamma = \dfrac{\varepsilon^2}{C_3 + 4\mu^2}.$

Then
$\sum_{k=k_1+1}^{n} \dfrac{\|g_k\|^4}{\|d_k\|^2} > (n - k_1)\,\gamma.$

Since
$\sum_{k=0}^{\infty} \dfrac{\|g_k\|^4}{\|d_k\|^2} \ge \sum_{k=k_1+1}^{\infty} \dfrac{\|g_k\|^4}{\|d_k\|^2} = \lim_{n \to \infty} \sum_{k=k_1+1}^{n} \dfrac{\|g_k\|^4}{\|d_k\|^2} > \lim_{n \to \infty} (n - k_1)\,\gamma = \infty,$

we come to
$\sum_{k=0}^{\infty} \dfrac{\|g_k\|^4}{\|d_k\|^2} = \infty.$

This contradicts (2.7). Therefore, the proof is completed.

3 Modified versions of the PRP and the HS methods

In this section, since the sufficient descent property and the global convergence of the well-known PRP and HS methods have not been established under the strong Wolfe line search, and motivated by the results in Section 2, we propose modified versions of the PRP and HS methods obtained by restricting the coefficients $\beta_k^{PRP}$ and $\beta_k^{HS}$ so that they satisfy (2.1), as follows:

(3.1)
$\beta_k^{OPRP} = \begin{cases} \beta_k^{PRP}, & \text{if } -\mu\dfrac{\|g_k\|^2}{\|d_{k-1}\|^2} < \beta_k^{PRP} < \mu\dfrac{\|g_k\|^2}{\|d_{k-1}\|^2}, \\[4pt] 0, & \text{otherwise}, \end{cases}$

and

(3.2)
$\beta_k^{OHS} = \begin{cases} \beta_k^{HS}, & \text{if } -\mu\dfrac{\|g_k\|^2}{\|d_{k-1}\|^2} < \beta_k^{HS} < \mu\dfrac{\|g_k\|^2}{\|d_{k-1}\|^2}, \\[4pt] 0, & \text{otherwise}. \end{cases}$

We call these modified versions OPRP and OHS respectively, where the letter “O” stands for Osman.
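A minimal sketch of these restricted coefficients (again in Python with 1-D NumPy arrays and illustrative function names; not the authors' MATLAB code) is given below; the default μ = 10 matches the value used in Section 4.

def beta_oprp(g, g_prev, d_prev, mu=10.0):
    # beta^PRP kept only if it lies inside the bound of criterion (2.1), as in (3.1).
    beta = float(g @ (g - g_prev)) / float(g_prev @ g_prev)
    bound = mu * float(g @ g) / float(d_prev @ d_prev)
    return beta if -bound < beta < bound else 0.0

def beta_ohs(g, g_prev, d_prev, mu=10.0):
    # beta^HS restricted in the same way, as in (3.2).
    y = g - g_prev
    beta = float(g @ y) / float(d_prev @ y)
    bound = mu * float(g @ g) / float(d_prev @ d_prev)
    return beta if -bound < beta < bound else 0.0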

Of course, both modified versions satisfy (2.1), so they generate descent directions at each iteration and are globally convergent when they are applied under the strong Wolfe line search with $0 < \sigma < \frac{1}{4\mu}$. Note that in (3.1) and (3.2), as $\mu \to \infty$, $\beta_k^{OPRP} \to \beta_k^{PRP}$ and $\beta_k^{OHS} \to \beta_k^{HS}$, while the admissible range of $\sigma$ shrinks toward zero. Therefore, for a sufficiently large value of $\mu$, the OPRP and OHS methods can be considered good approximations of the PRP and HS methods.

We also note that, like the PRP and HS methods, the OPRP and OHS methods perform a restart when they encounter a bad direction: when $g_k$ approaches $g_{k-1}$, both $\beta_k^{OPRP}$ and $\beta_k^{OHS}$ approach zero, so that $d_k$ approaches $-g_k$. Hence, we expect them to perform better than the FR method in practice. Moreover, the sufficient descent property and the global convergence of the OPRP and OHS methods make them theoretically preferable to the PRP and HS methods; their performance in practical computations is examined in the next section.

4 Numerical experiment

In this section, to show the efficiency and robustness of the proposed methods and to support the theoretical proofs in Section 2, numerical experiments comparing the proposed OPRP and OHS methods (with $\mu = 10$) against the PRP, HS, FR, and RMIL+ methods are carried out. To accomplish the comparison, a MATLAB program implementing all of these methods under the strong Wolfe line search with $\delta = 10^{-4}$ and $\sigma = 10^{-2}$ is run; the program is stopped when $\|g_k\| \le 10^{-6}$. The test problems are unconstrained, and most of them are taken from (Andrei, 2008). To assess robustness, the test problems are run at low, medium, and high dimensions, namely 2, 3, 4, 10, 50, 100, 500, 1000, and 10000. Furthermore, for each dimension, two different initial points are used: the initial point suggested by Andrei (Andrei, 2008) and an arbitrarily chosen point. The comparison is based on the number of iterations (NI), the number of function evaluations (NF), and the number of gradient evaluations (NG). The numerical results are reported in Table 1, where a method is considered to have failed, marked "Fail", if the number of iterations exceeds $5 \times 10^3$ or the search direction is not a descent direction. An illustrative sketch of such a driver is given below.
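The following Python/SciPy sketch of an OPRP-type driver uses the settings reported above; it is not the authors' MATLAB program, and it assumes that scipy.optimize.line_search returns a step satisfying the strong Wolfe conditions with c1 = δ and c2 = σ.

import numpy as np
from scipy.optimize import line_search

def cg_oprp(f, grad, x0, mu=10.0, delta=1e-4, sigma=1e-2, tol=1e-6, max_iter=5000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:        # stopping rule ||g_k|| <= 10^-6
            return x, k
        alpha = line_search(f, grad, x, d, gfk=g, c1=delta, c2=sigma)[0]
        if alpha is None:                   # no Wolfe step found; take a tiny step
            alpha = 1e-8
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = float(g_new @ (g_new - g)) / float(g @ g)   # beta^PRP
        bound = mu * float(g_new @ g_new) / float(d @ d)   # bound from criterion (2.1)
        beta = beta if -bound < beta < bound else 0.0      # OPRP restriction (3.1)
        d = -g_new + beta * d                              # direction update (1.5)
        x, g = x_new, g_new
    return x, max_iter

# Example use on a standard test function (assuming SciPy's rosen/rosen_der):
#   from scipy.optimize import rosen, rosen_der
#   x_star, iters = cg_oprp(rosen, rosen_der, np.full(10, -1.2))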

Table 1 A comparison between FR, HS, PRP, OHS, OPRP, and RMIL+. For each method, results are reported as NI/NF/NG (number of iterations / function evaluations / gradient evaluations), with one entry per dimension and initial point.
FR: NI/NF/NG | HS: NI/NF/NG | PRP: NI/NF/NG | OHS: NI/NF/NG | OPRP: NI/NF/NG | RMIL+: NI/NF/NG
1 GENERALIZED WHITE & HOLST (Andrei, 2008) 2 49/233/127
74/307/177
15/113/47
16/111/53
15/98/47
Fail
14/89/49
18/112/62
14/89/45
18/115/64
20/113/59
24/141/80
2 THREE-HUMP (Molga and Smutnicki, 2005) 2 14/397/85
Fail
26/750/93
26/770/122
11/293/48
14/347/143
26/750/93
19/530/138
11/293/48
15/385/86
14/387/99
20/547/175
3 SIX-HUMP (Molga and Smutnicki, 2005) 2 13/44/30
10/54/28
5/19/13
8/47/23
5/19/13
8/47/25
5/19/13
7/40/22
5/19/13
8/45/24
5/19/13
8/50/26
4 TRECANNI (Zhu, 2004) 2 10/40/25
9/32/20
5/26/16
4/17/10
5/26/16
Fail
5/26/16
5/20/12
5/26/16
5/19/11
5/26/16
Fail
13 EXTENDED WOOD (Andrei, 2008) 4 6897/33847/16150
1115/5012/2624
123/481/282
167/736/401
106/428/243
157/697/383
51/225/128
74/397/200
57/250/142
85/458/233
118/406/254
245/982/581
14 FREUDENSTEIN & ROTH (Andrei, 2008) 4 24/88/53
194/85/47
Fail
7/41/19
7/39/19
10/52/25
7/35/18
7/45/20
7/36/19
9/52/25
8/40/20
9/52/25
15 GENERALIZED TRIDIAGONAL 2 (Andrei, 2008) 4 5/16/13
Fail
4/13/11
10/51/34
4/13/11
10/49/33
4/13/11
12/61/43
4/13/11
12/61/43
4/13/11
12/61/43
16 QP1 (Andrei, 2008) 4 205/69/45
17/74/43
6/24/14
10/54/27
7/27/16
8/43/22
7/27/16
10/52/29
7/27/16
10/52/29
7/28/17
11/55/31
17 FLETCHER (Andrei, 2008) 10 1203/5826/2809
2443/1202/587
56/256/134
73/344/173
56/256/134
73/348/174
56/256/134
66/380/175
56/256/134
51/300/137
74/307/171
105/502/253
18 GENERALIZED TRIDIAGONAL 1 (Andrei, 2008) 10 27/88/58
43/164/103
23/76/50
27/112/68
23/76/50
27/112/68
23/76/50
27/112/68
23/76/50
27/112/68
22/73/48
27/109/66
19 HAGER (Andrei, 2008) 10 11/34/31
97/314/215
12/37/32
17/60/51
12/37/32
17/60/51
12/37/32
17/60/51
12/37/32
17/60/51
12/37/32
18/65/57
20 ARWHEAD (Andrei, 2008) 10 7/27/17
13/70/36
5/22/14
8/52/24
5/22/14
9/55/26
5/22/14
9/58/28
5/22/14
8/55/26
5/22/14
9/58/28
21 GENERALIZED QUARTIC (Andrei, 2008) 10 11/222/58
47/1065/868
8/89/57
17/335/102
8/93/59
16/320/131
7/69/39
14/223/115
8/93/59
15/236/147
6/48/17
12/154/76
22 POWER (Andrei, 2008) 10 10/30/20
103/30/20
10/30/20
10/30/20
10/30/20
10/30/20
10/30/20
10/30/20
10/30/20
10/30/20
104/312/208
122/366/244
23 GENERALIZED ROSENBROCK (Andrei, 2008) 10 Fail
Fail
437/1702/1000
361/1540/868
480/1847/1103
449/1840/1054
642/2271/1431
962/3409/2124
1805/5798/3804
1211/4135/2615
1173/3915/2505
1691/5709/3598
24 RAYDAN 1 (Andrei, 2008) 10

102
19/90/76
2806/9964/6278
949/484/197
880/3806/1910
17/80/67
Fail
74/287/152
170/723/373
17/80/67
Fail
75/314/157
169/694/371
17/80/67
36/199/166
74/287/152
130/559/294
17/80/67
36/196/168
75/314/157
130/565/293
20/96/81
37/200/171
86/266/179
153/587/353
25 EXTENDED DENSHENB (Andrei, 2008) 10

102
9/31/22
17/65/42
18/68/44
9/44/23
5/19/14
8/34/21
8/34/21
9/51/27
5/19/14
8/34/21
8/34/21
10/48/26
5/19/14
9/37/23
9/37/23
9/44/23
5/19/14
9/37/23
9/37/23
9/44/23
5/19/14
10/45/29
10/45/29
10/47/25
26 EXTENDED PENALTY (Andrei, 2008) 10

102
12/52/31
17/64/38
238/4732/506
2830/11510/4309
45/162/106
10/49/29
17/97/51
15/96/49
29/106/67
11/52/33
Fail
Fail
16/63/39
9/40/23
13/78/40
12/71/37
15/60/37
9/40/23
13/81/42
12/71/37
14/57/36
8/37/22
23/144/78
13/75/39
27 QP2 (Andrei, 2008) 102
283/3487/737
2482/3295/649
21/219/74
27/285/88
23/244/77
24/256/84
35/322/114
31/294/100
36/325/116
32/308/104
33/301/105
30/287/97
28 DIXON3DQ (Andrei, 2008) 50 25/79/55
25/79/55
25/79/55
25/79/55
25/79/55
25/79/55
25/79/55
25/79/55
25/79/55
25/79/55
123/376/262
127/392/275
29 QF2 (Andrei, 2008) 50 116/394/285
73/299/168
70/244/160
66/272/150
70/244/160
65/269/148
70/244/160
66/272/150
70/244/160
65/269/148
78/274/181
74/305/177
30 QF1 (Andrei, 2008) 50

500
38/114/76
40/120/80
131/393/262
137/411/274
38/114/76
40/120/80
131/393/262
137/411/274
38/114/76
40/120/80
131/393/262
137/411/274
38/114/76
40/120/80
131/393/262
137/411/274
38/114/76
40/120/80
13/393/262
137/411/274
69/207/138
78/234/156
162/486/325
198/594/397
31 HIMMELH (Andrei, 2008) 500 13/79/29
11/64/23
5/15/10
5/15/10
5/15/10
5/15/10
6/32/13
6/18/12
6/32/13
6/18/12
5/15/10
6/18/12
32 QUARTC (Andrei, 2008) 500 3/31/26
4/43/33
Fail
Fail
2/24/23
Fail
3/31/26
3/26/16
3/31/26
3/26/16
3/31/26
3/26/16
33 EXTENDED TRIDIAGONAL 1 (Andrei, 2008) 500

103
340/1190/1035
14/68/58
399/1396/1212
518/1813/1561
14/71/60
7/42/35
14/71/60
7/43/25
14/72/58
11/63/53
14/72/58
14/83/61
12/61/50
8/47/38
12/61/51
13/78/55
12/61/50
8/47/38
12/61/50
13/77/55
12/60/50
8/47/38
12/60/50
7/44/24
34 DIAGONAL 4 (Andrei, 2008) 500

103
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
2/6/5
35 EXTENDED WHITE & HOLST (Andrei, 2008) 500

103
57/257/143
282/3749/859
572/257/143
1231/1608/372
15/113/47
50/585/212
15/113/47
35/297/135
15/98/47
49/548/207
15/98/47
35/292/128
15/95/50
50/432/195
15/95/50
48/371/168
15/95/50
50/432/194
15/95/50
49/376/170
22/121/65
52/436/199
22/121/65
120/914/418
36 EXTENDED ROSENBROCK (Andrei, 2008) 103

104
68/530/182
260/2899/718
71/539/188
715/6694/1764
19/120/58
23/176/70
19/120/58
17/115/53
21/134/67
25/183/72
21/134/67
18/117/54
28/168/90
33/219/102
28/168/90
18/105/57
28/167/88
32/215/99
28/167/88
18/105/57
31/181/99
27/185/88
31/181/99
45/301/148
37 EXTENDED HIMMELBLAU (Andrei, 2008) 103

104
15/54/34
11/51/27
22/127/55
17/72/38
7/29/17
9/48/24
9/44/23
10/52/24
8/32/19
9/43/22
9/44/23
Fail
8/32/19
7/39/19
10/47/25
9/47/21
8/32/19
7/39/19
9/44/23
9/47/21
7/30/18
7/39/19
9/39/22
9/48/22
38 STRAIT (Mishra, 2005) 103

104
35/146/91
88/682/269
35/147/92
109/774/348
17/86/48
19/148/59
18/90/51
20/129/59
17/86/48
19/150/60
18/90/51
20/127/60
15/80/44
19/179/71
15/80/44
19/185/71
15/80/44
18/172/63
15/80/44
18/184/68
20/96/55
22/193/78
20/97/55
23/167/71
39 SHALLOW (Issam, 2005) 103

104
18/63/48
179/605/390
46/144/97
19/71/49
7/27/21
13/63/42
8/28/20
9/38/26
7/17/21
14/66/39
8/28/20
10/41/28
7/28/22
12/62/39
9/35/26
10/49/32
7/28/22
12/62/39
9/35/26
10/49/32
7/28/22
15/70/46
8/33/25
10/47/31
40 EXTENDED BEALE (Andrei, 2008) 103

104
75/242/159
80/254/170
88/285/187
86/272/182
10/48/31
10/42/29
9/41/26
10/42/29
13/67/41
11/45/31
9/41/24
11/45/31
11/52/33
12/52/39
10/48/31
12/52/39
14/72/46
12/52/39
9/43/26
12/52/39
14/62/43
13/53/40
6/30/18
13/53/40

Based on Table 1, Figs. 1–3 show the performance of the OPRP, OHS, HS, PRP, FR, and RMIL+ methods with respect to the number of iterations, the number of function evaluations, and the number of gradient evaluations, respectively. We use the performance profile introduced by Dolan and Moré (Dolan and More, 2002), which summarizes solver efficiency, robustness, and probability of success. In a Dolan and Moré performance profile, we plot the percentage P of the test problems on which a method's performance is within a factor t of the best performance. In such a plot, the method whose curve lies on top is the winner, and the level that a curve ultimately reaches is a measure of the robustness of the method.
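For reference, a performance profile of this kind can be computed as in the following Python sketch (an illustration, not the authors' code); `data` is assumed to be an (n_problems × n_solvers) array of a cost measure such as NI, NF, or NG, with np.inf marking failed runs.

import numpy as np
import matplotlib.pyplot as plt

def performance_profile(data, labels, t_max=10.0):
    best = np.min(data, axis=1, keepdims=True)        # best cost per problem
    ratios = data / best                              # performance ratios t
    ts = np.linspace(1.0, t_max, 400)
    for j, label in enumerate(labels):
        # P(t): fraction of problems solved within a factor t of the best solver
        p = [(ratios[:, j] <= t).mean() for t in ts]
        plt.step(ts, p, where="post", label=label)
    plt.xlabel("t"); plt.ylabel("P"); plt.legend()
    plt.show()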

Fig. 1 The performance based on NI.
Fig. 2 The performance based on NF.
Fig. 3 The performance based on NG.

Clearly, from Figs. 1–3, OPRP and OHS solve all of the test problems and therefore reach a 100 % success rate, whereas FR, HS, PRP, and RMIL+ solve about 94 %, 96 %, 90 %, and 99 %, respectively. Furthermore, the left-hand sides of all the figures show that OPRP, OHS, PRP, and HS have almost the same, highest probability of being the optimal solvers. In general, since the curves of both OPRP and OHS lie above all the other curves in most cases, their performance is better than that of the other methods.

5 Conclusions

In this paper, under the strong Wolfe line search with $0 < \sigma < \frac{1}{4\mu}$, $\mu \ge 1$, we established the sufficient descent property and the global convergence of CG methods whose coefficient $\beta_k$ satisfies $|\beta_k| \le \mu \frac{\|g_k\|^2}{\|d_{k-1}\|^2}$ for all $k \ge 1$. At the same time, we proposed new modified versions of the PRP and HS methods, called OPRP and OHS respectively. To show their efficiency and robustness and to support the theoretical proofs establishing the sufficient descent property and the global convergence, numerical experiments comparing OPRP and OHS with the HS, PRP, FR, and RMIL+ methods were carried out. Based on the Dolan and Moré performance profile, the new modified versions were found to perform better than the other methods.

Funding

This research was supported by the Deanship of Scientific Research at Majmaah University under project number R-2022-230.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Mogtaba Mohammed - Majmaah University, Osman Yousif - University of Gezira, Mohammed Saleh - Qassim University, Murtada Elbashir - Jouf University.

References

  1. Abubakar et al. A Liu-Storey-type conjugate gradient method for unconstrained minimization problem with application in motion control. J. King Saud Univ. Sci. 2022;34(4):101923.
  2. Al-Baali, M. Descent property and global convergence of the Fletcher-Reeves method with inexact line search. IMA J. Numer. Anal. 1985;5:121-124.
  3. Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008;10(1):147-161.
  4. Dai, Y.H. A nonmonotone conjugate gradient algorithm for unconstrained optimization. J. Syst. Sci. Complex. 2002;15(2):139-145.
  5. Dai, Z. Comments on a new class of nonlinear conjugate gradient coefficients with global convergence properties. Appl. Math. Comput. 2016;276:297-300.
  6. Dai, Y.H., Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 2000;10:177-182.
  7. Dolan, E.D., Moré, J.J. Benchmarking optimization software with performance profiles. Math. Prog. 2002;91:201-213.
  8. Fletcher, R. Practical Methods of Optimization, Vol. 1: Unconstrained Optimization. 1987.
  9. Fletcher, R., Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964;7:149-154.
  10. Gilbert, J.C., Nocedal, J. Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 1992;2:21-42.
  11. Hager, W.W., Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005;16(1):170-192.
  12. Hestenes, M.R., Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards. 1952;49:409-436.
  13. Hu, Y.F., Storey, C. Global convergence result for conjugate gradient methods. J. Optim. Theory Appl. 1991;71:399-405.
  14. Issam. Minimization of extended quadratic functions with inexact line searches. J. Korean Soc. Indust. Appl. Math. 2005;9(1):55-61.
  15. Liu, Y., Storey, C. Efficient generalized conjugate gradient algorithms. Part 1: Theory. J. Optim. Theory Appl. 1992;69:129-137.
  16. Mishra, S.K. Some New Test Functions for Global Optimization and Performance of Repulsive Particle Swarm Method. India: North-Eastern Hill University; 2005.
  17. Molga, M., Smutnicki, C. Test Functions for Optimization Needs. www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf, 2005.
  18. Polak, E., Ribière, G. Note sur la convergence de méthodes de directions conjuguées. Rev. Française Informat. Recherche Opérationnelle, 3e Année. 1969;16:35-43.
  19. Polyak, B.T. The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 1969;9:94-112.
  20. Powell, M.J.D. Nonconvex minimization calculations and the conjugate gradient method. In: Lecture Notes in Mathematics, Vol. 1066. Berlin: Springer-Verlag; 1984. p. 122-141.
  21. Powell, M.J.D. Convergence properties of algorithms for nonlinear optimization. SIAM Rev. 1986;28:487-500.
  22. Rivaie, M., et al. A new class of nonlinear conjugate gradient coefficients with global convergence properties. Appl. Math. Comput. 2012;218:11323-11332.
  23. Salleh et al. Two efficient modifications of AZPRP conjugate gradient method with sufficient descent property. J. Inequal. Appl. 2022;1:1-21.
  24. Touati-Ahmed, D., Storey, C. Efficient hybrid conjugate gradient techniques. J. Optim. Theory Appl. 1990;64:379-397.
  25. Wei et al. New nonlinear conjugate gradient formulas for large-scale unconstrained optimization problems. Appl. Math. Comput. 2006;179(2):407-430.
  26. Wei et al. The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006;183(2):1341-1350.
  27. Yousif, O. The convergence properties of RMIL+ conjugate gradient method under strong Wolfe line search. Appl. Math. Comput. 2020;367:124777.
  28. Yuan, G., Lu, X. A modified PRP conjugate gradient method. Ann. Oper. Res. 2009;166:73-90.
  29. Yuan, Y., Sun, W. Theory and Methods of Optimization. Beijing, China: Science Press of China; 1999.
  30. Zhang, L. An improved Wei-Yao-Liu nonlinear conjugate gradient method for optimization computation. Appl. Math. Comput. 2009;215:2269-2274.
  31. Zhu. A Class of Filled Functions irrelevant to the Number of Local Minimizers for Global Optimization. J. Syst. Sci. Math. Sci. 2004;22(4):406-413.
  32. Zoutendijk, G. Nonlinear programming, computational methods. In: Abadie, J., ed. Integer and Nonlinear Programming. Amsterdam: North-Holland; 1970. p. 37-86.