A new modified deflected subgradient method
⁎Corresponding author: Rachid Belgacem (belgacemrachid02@yahoo.fr)
Abstract
A new deflected subgradient algorithm is presented for computing a tighter lower bound of the dual problem. These bounds may be useful in node evaluation within a branch-and-bound algorithm for finding the optimal solution of large-scale integer linear programming problems. The deflected search direction used in the present paper is a convex combination of the Modified Gradient Technique and the Average Direction Strategy. We identify the optimal convex combination parameter, allowing the deflected subgradient direction to form a more acute angle with the best direction towards an optimal solution. The modified algorithm gives encouraging results for selected symmetric travelling salesman problem (TSP) instances taken from the TSPLIB library.
Keywords
90C26
90C10
90C27
90C06
Integer linear programming
Subgradient method
Nonsmooth optimization
Travelling salesman problem
Introduction
In this paper we consider the following integer linear program:

(1) $(P)\colon\quad z^{*}=\min\{c^{\top}x:\ Ax\ge b,\ x\in X\},$

where $X$ is a finite set encoding the integrality restrictions. Relaxing the constraints $Ax\ge b$ with a multiplier vector $u\ge 0$ gives the Lagrangian dual function

(2) $w(u)=\min_{x\in X}\{c^{\top}x+u^{\top}(b-Ax)\}.$

It is easy to prove that $w(u)\le z^{*}$ for all $u\ge 0$ (weak duality (Bazaraa et al., 2006)). The best choice for $u$ would be the optimal solution of the following problem, called the dual problem:

(3) $w^{*}=\max_{u\ge 0}\ w(u).$
With some suitable assumptions, the dual optimal value $w^{*}$ is equal to $z^{*}$ (strong duality (Bazaraa et al., 2006)). In general, $w^{*}$ provides a tighter lower bound on $z^{*}$. These bounds may be useful in node evaluation in exact methods, such as the branch-and-bound algorithm, to find the optimal solution of (P). The function $w$ is continuous and concave but non-smooth. The most widely adopted method for solving the dual problem is subgradient optimization; see for instance Polyak (1967), Shor (1985), Nedic and Bertsekas (2010), Nesterov (2014) and Hu et al. (2015). The pure subgradient optimization method is an iterative procedure that can be used to maximize (minimize) a non-smooth concave (convex) function $w$ on a closed convex set $U$.
This procedure is summarized in Algorithm 1, and it is used in various fields of science and engineering (Sra et al., 2012). Choose an initial point $u^{0}\in U$ and construct a sequence of points $\{u^{k}\}$, which eventually converges to an optimal solution, using the rule $u^{k+1}=P_{U}(u^{k}+s_{k}g^{k})$, where $g^{k}$ is a subgradient of $w$ at $u^{k}$, $P_{U}$ is the projection operator onto the set $U$, and $s_{k}$ is a positive scalar called the step length, such that

(4) $s_{k}=\theta_{k}\,\dfrac{w^{*}-w(u^{k})}{\|g^{k}\|^{2}},\qquad 0<\theta_{k}\le 2.$

Replace $k$ by $k+1$ and repeat the process until some stopping criterion is met.
Algorithm 1 (Basic Subgradient Algorithm).
In the context of Lagrangian relaxation, computing the subgradient direction and the projection is a relatively easy problem. Since the subgradient is not necessarily a descent direction, the step-length rule (4) differs from those given in the area of descent methods. In fact, this choice assures the decrease of the sequence of distances $\{\|u^{k}-u^{*}\|\}$ to an optimal solution $u^{*}$, as well as the convergence of $w(u^{k})$ to $w^{*}$. However, it is impossible to know the value of $w^{*}$ in advance for most problems. To this end, the most effective way is to use the variable target value methods developed in Kim et al. (1990), Fumero (2001) and Sherali et al. (2000).
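As an illustration, the following minimal Python sketch implements the basic projected subgradient scheme of Algorithm 1; the routines `dual_value`, `subgradient` and `project` are hypothetical problem-specific oracles, and `target` plays the role of the (generally unknown) optimal value $w^{*}$ in the step rule (4).

```python
import numpy as np

def subgradient_method(dual_value, subgradient, project, u0,
                       target, theta=2.0, max_iter=1000, tol=1e-6):
    """Projected subgradient ascent for a concave dual function w.

    dual_value(u)  -> w(u)        (hypothetical problem-specific oracle)
    subgradient(u) -> g in dw(u)  (any subgradient of w at u)
    project(u)     -> projection of u onto the feasible multiplier set U
    target         -> estimate of the optimal dual value w* used in rule (4)
    """
    u = project(np.asarray(u0, dtype=float))
    best_u, best_w = u.copy(), dual_value(u)
    for _ in range(max_iter):
        g = subgradient(u)
        norm2 = float(np.dot(g, g))
        if norm2 <= tol:                     # zero subgradient: u is already optimal
            break
        step = theta * (target - dual_value(u)) / norm2   # Polyak-type rule (4)
        u = project(u + step * g)            # pure (undeflected) update
        w = dual_value(u)
        if w > best_w:                       # keep the best dual bound found so far
            best_u, best_w = u.copy(), w
    return best_u, best_w
```

In practice, `target` would itself be adjusted during the run, in the spirit of the variable target value methods cited above.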
Another challenge in subgradient optimization is the choice of the search direction, which affects the computational performance of the algorithm. It is known that choosing the pure subgradient direction $d^{k}=g^{k}$ leads to a zigzagging phenomenon that might cause the procedure to crawl slowly towards optimality (Bazaraa et al., 2006). To overcome this situation, in the spirit of the conjugate gradient method (Nocedal and Wright, 2006; Fletcher and Reeves, 1964), we can adopt a search direction that deflects the pure subgradient direction. Accordingly, the search direction $d^{k}$ at $u^{k}$ is computed as

(5) $d^{k}=g^{k}+\beta_{k}d^{k-1},$

where $d^{k-1}$ is the previous search direction and $\beta_{k}\ge 0$ is a deflection parameter, and the iterate is updated as

(6) $u^{k+1}=P_{U}(u^{k}+s_{k}d^{k}).$
Some promising deflection algorithms of this type are the Modified Gradient Technique (MGT) (Camerini et al., 1975) and the Average Direction Strategy (ADS) (Sherali and Ulular, 1989). The MGT method was found to be superior to the pure subgradient method when used in concert with a specially designed step-length selection rule. Its deflection parameter is computed according to

(7) $\beta_{k}=\begin{cases}-\gamma_{k}\,\dfrac{(g^{k})^{\top}d^{k-1}}{\|d^{k-1}\|^{2}}, & \text{if }(g^{k})^{\top}d^{k-1}<0,\\[4pt] 0, & \text{otherwise,}\end{cases}\qquad 0\le\gamma_{k}\le 2,$

so that the MGT direction is

(8) $d^{k}=g^{k}-\gamma_{k}\,\dfrac{(g^{k})^{\top}d^{k-1}}{\|d^{k-1}\|^{2}}\,d^{k-1}$ whenever $(g^{k})^{\top}d^{k-1}<0$, and $d^{k}=g^{k}$ otherwise.
The ADS strategy recommends performing the deflection at each iteration by choosing the search direction that simply bisects the angle between the current subgradient $g^{k}$ and the previous search direction $d^{k-1}$. To obtain this direction, the deflection parameter is computed according to

(9) $\beta_{k}=\dfrac{\|g^{k}\|}{\|d^{k-1}\|}.$

With this choice of the deflection parameter, the direction becomes

(10) $d^{k}=g^{k}+\dfrac{\|g^{k}\|}{\|d^{k-1}\|}\,d^{k-1}.$
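For concreteness, the two deflection parameters can be computed as in the short Python sketch below; the routines follow formulas (7) and (9), and the default `gamma=1.5` is only an assumed value, mirroring the recommendation of Camerini et al. (1975).

```python
import numpy as np

def beta_mgt(g, d_prev, gamma=1.5):
    """MGT deflection parameter (7): deflect only when g and d_prev
    form an obtuse angle, i.e. when g . d_prev < 0."""
    dot = float(np.dot(g, d_prev))
    if dot >= 0.0:
        return 0.0
    return -gamma * dot / float(np.dot(d_prev, d_prev))

def beta_ads(g, d_prev):
    """ADS deflection parameter (9): the resulting direction bisects
    the angle between g and d_prev."""
    norm_d = np.linalg.norm(d_prev)
    return np.linalg.norm(g) / norm_d if norm_d > 0.0 else 0.0
```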
Nowadays, the deflected subgradient method remains an important tool for nonsmooth optimization problems, especially for integer linear programming, owing to its simple formulation and low storage requirements. In this paper, we present a new deflected search direction defined as a convex combination of the directions (8) and (10). Our main result is the identification of the convex combination parameter that forces the algorithm to follow a better deflected direction than those given by the pure subgradient, MGT and ADS methods. For a numerical comparison of our approach with the two concurrent techniques MGT and ADS, we opted for the Travelling Salesman Problem (TSP), whose importance comes from the richness of its applications and from the fact that it is typical of other combinatorial optimization problems (Diaby and Karwan, 2016; El-Sherbeny, 2010).
The remainder of the paper is organized as follows: in Section 2, we describe our deflected subgradient method together with its convergence analysis. The computational tests, conducted on the Lagrangian relaxation of TSP instances of different sizes, are described in Section 3. In Section 4 we conclude the paper.
A new modified deflected subgradient method
In this section, we present a new modified deflected subgradient method which determines the search direction as a convex combination of the MGT direction (8) and the ADS direction (10):

(11) $d^{k}=\alpha\left(g^{k}+\beta_{k}^{MGT}d^{k-1}\right)+(1-\alpha)\left(g^{k}+\beta_{k}^{ADS}d^{k-1}\right)=g^{k}+\beta_{k}d^{k-1},\qquad \alpha\in[0,1],$

where $\beta_{k}^{MGT}$ and $\beta_{k}^{ADS}$ denote the deflection parameters in (7) and (9). We then obtain the following deflection parameter:

(12) $\beta_{k}=\alpha\,\beta_{k}^{MGT}+(1-\alpha)\,\beta_{k}^{ADS}.$
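Given the two routines sketched above, the new deflection parameter (12) is obtained as a one-line convex combination; `alpha` denotes the parameter $\alpha\in[0,1]$, whose optimal value is identified later in this section.

```python
def beta_combined(g, d_prev, alpha, gamma=1.5):
    """New deflection parameter (12): convex combination of the
    MGT parameter (7) and the ADS parameter (9)."""
    return alpha * beta_mgt(g, d_prev, gamma) + (1.0 - alpha) * beta_ads(g, d_prev)
```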
Step 1 (Initialization): Choose a starting point $u^{0}\in U$, let $d^{0}=g^{0}$ and set $k=1$.

Step 2: Determine a subgradient $g^{k}$ of $w$ at $u^{k}$ and compute the direction $d^{k}=g^{k}+\beta_{k}d^{k-1}$ and the iterate $u^{k+1}=P_{U}(u^{k}+s_{k}d^{k})$, where $\beta_{k}$ is given by relation (12); the step length $s_{k}$ and the parameter $\alpha$ will be specified later.

Step 3: Replace $k$ by $k+1$ if a stopping condition is not yet met and return to Step 2.
Algorithm 2 (The Deflected Subgradient Algorithm).
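Putting the pieces together, a minimal Python sketch of Algorithm 2 is given below; it reuses the hypothetical oracles `dual_value`, `subgradient` and `project` and the routine `beta_combined` from the earlier sketches, and it assumes the Polyak-type step rule (4) applied with the deflected direction $d^{k}$.

```python
import numpy as np

def deflected_subgradient(dual_value, subgradient, project, u0,
                          target, alpha, gamma=1.5, theta=2.0,
                          max_iter=1000, tol=1e-6):
    """Deflected subgradient ascent (sketch of Algorithm 2).

    The search direction is d_k = g_k + beta_k * d_{k-1}, with beta_k the
    convex combination (12) of the MGT and ADS deflection parameters.
    """
    u = project(np.asarray(u0, dtype=float))
    g = subgradient(u)
    d = g.copy()                              # Step 1: d^0 = g^0
    best_u, best_w = u.copy(), dual_value(u)
    for _ in range(max_iter):
        norm2 = float(np.dot(d, d))
        if norm2 <= tol:
            break
        step = theta * (target - dual_value(u)) / norm2   # rule (4) with d^k
        u = project(u + step * d)             # Step 2: deflected update
        w = dual_value(u)
        if w > best_w:
            best_u, best_w = u.copy(), w
        g = subgradient(u)
        d = g + beta_combined(g, d, alpha, gamma) * d      # new direction (11)
    return best_u, best_w
```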
Consider the deflected subgradient method given in Algorithm 2. The following proposition extends important properties of the subgradient vector and of the deflected subgradient direction to the new deflected direction (11). With a suitable choice of the parameter $\alpha$, the new direction $d^{k}$ makes an acute angle with both the subgradient $g^{k}$ and the previous direction $d^{k-1}$. We also obtain the decrease of the sequence of distances to an optimal solution.
Proposition 1. Let $d^{k}$, $k\ge 1$, be the new deflected subgradient direction given by (11) and (12), and let $\{u^{k}\}$ be the sequence of iterates generated by the deflected subgradient scheme. If we take $\alpha\in[0,1]$ and the stepsize $s_{k}$ to satisfy

(13)

then the following properties hold:

- (14)

- (15)
-
-
Proof. The proof is established by induction on $k$. Since we start with $d^{0}=g^{0}$, the base case is trivial. Now, assume that the property holds at the previous iteration, that is,

(16)

and let us establish (14) at iteration $k$. Using the definition of $d^{k}$, we obtain that

Furthermore, from the concavity of the function $w$, the induction hypothesis (16) and the inequalities in (13), we get

(17)

Since the subgradient vector is perpendicular to the supporting hyperplane of $w$ at $u^{k}$, the angle at $u^{k}$ is obtuse (see Fig. 1). We deduce an inequality which is equivalent to

(18)
We have
From the concavity of $w$, the inequalities in (13) and by applying (14) of Proposition 1, we get the following relations, respectively:

It follows that
Consequently,
Finally, if the first case holds, then the claim follows immediately. Thus, consider the remaining case; we then have
This completes the proof. □
The importance of Proposition 1 lies in the fact that choosing the deflection parameter $\beta_{k}$ using rule (12) with $\alpha\in[0,1]$ forces the current deflected subgradient direction to always form an acute angle with the previous step direction; hence, this method eliminates the zigzagging of the pure subgradient procedure. Note that, with this choice, the deflected direction $d^{k}$ is always at least as good as the subgradient direction $g^{k}$, in the sense of Camerini et al. (1975).
The theorem below shows that, with a particular choice of the convex combination parameter $\alpha$ and the parameter $\gamma_{k}$, the deflected subgradient direction $d^{k}$ is always at least as good as the direction $g^{k}$, in the sense that $d^{k}$ can form a more acute angle with the best direction towards an optimal solution than $g^{k}$ does (see Fig. 2), which enhances the speed of convergence. The two lemmas below are necessary for the proof of our principal result.
Lemma 1. Let $\alpha\in[0,1]$, with the parameter $\gamma_{k}$ chosen accordingly. Under the assumption in (13), we have:

(21)

Proof. Using (8), (10) and (11), we obtain the following relation:

From (7) and (9), it follows that:

Using the last equality and applying Proposition 1, we get (21). □
Lemma 2. Under the same hypotheses as Lemma 1, we have:

(23)

Proof. If the first case of Lemma 1 holds, then (23) obviously holds. In the remaining case, we have:

One then finds:

By the choice of $\gamma_{k}$, we obtain (23). □
Theorem 1. Under the same hypotheses as Lemma 1, we have:

(i) (24)

(ii) If the vectors $d^{k}$ and $g^{k}$ form angles $\varphi_{k}$ and $\psi_{k}$, respectively, with the vector pointing from $u^{k}$ towards an optimal solution, then $\varphi_{k}\le\psi_{k}$.

Proof. Direct consequence of the previous lemmas. □
Computational results
The proposed algorithm has been applied to one of the standard integer linear programming problems in the field of operational research, namely the symmetric travelling salesman problem (TSP). The travelling salesman problem is a classical NP-hard combinatorial optimization problem (Garey and Johnson, 1990). It can be formulated as follows: given a set of cities and the distances between them, the goal is to find the shortest tour visiting every city exactly once and returning to the starting city. More details on this problem may be found in Lawler et al. (1985). The symmetric TSP can be stated as follows (where $c_{ij}$ is the cost of link $(i,j)$):
From this, one obtains the following dual function, which has to be maximized:
Given a vector $u$, if $x(u)$ optimizes the Lagrangian subproblem defining $w(u)$, then a vector $g(u)$ whose $i$-th component measures the violation at $x(u)$ of the corresponding relaxed constraint is a subgradient of $w$ at $u$.
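As an illustration, the sketch below follows the classical Held and Karp (1970) relaxation, in which the degree constraints are dualized, $w(u)$ is evaluated through a minimum 1-tree with modified costs $c_{ij}+u_{i}+u_{j}$, and the $i$-th subgradient component is the degree-constraint violation $2-\deg_{i}$; this is one common choice and is stated here only as an assumption. The helper `minimum_one_tree` is a hypothetical routine returning the edge set of a minimum 1-tree.

```python
import numpy as np

def tsp_dual_and_subgradient(cost, u, minimum_one_tree):
    """Evaluate a Held-Karp style dual function and one of its subgradients.

    cost: (n, n) symmetric matrix of link costs c_ij.
    u:    length-n vector of multipliers on the dualized degree constraints.
    minimum_one_tree(modified_cost) -> list of edges (i, j) of a minimum 1-tree.
    """
    u = np.asarray(u, dtype=float)
    cost = np.asarray(cost, dtype=float)
    modified = cost + u[:, None] + u[None, :]   # dualized costs c_ij + u_i + u_j
    edges = minimum_one_tree(modified)
    tree_cost = sum(modified[i, j] for i, j in edges)
    w_u = tree_cost - 2.0 * u.sum()             # dual value w(u)
    degree = np.zeros(len(u))
    for i, j in edges:
        degree[i] += 1.0
        degree[j] += 1.0
    g = 2.0 - degree                            # subgradient: degree-constraint violations
    return w_u, g
```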
To validate the feasibility and effectiveness of the proposed approach, we applied it to several symmetric TSP instances taken from TSPLIB. The proposed algorithm, MGT and ADS were implemented in Matlab and executed on an Intel(R) Core(TM) i5 CPU @ 1.70 GHz with 4.00 GB of RAM.
For all symmetric instances and for a fair comparison between the three algorithms, the following parameter settings were chosen:
- The same initial multiplier was used for the three algorithms.

- The stopping conditions are a maximum number of iterations or a duality gap smaller than a small tolerance $\epsilon$.

- The step size is defined according to formula (4).

- The parameter $\theta_{k}$ follows the suggestion of Held et al. (1974), beginning with $\theta_{0}=2$. If the best dual value does not increase after 20 iterations, $\theta_{k}$ is halved (see the sketch following this list).

- For the MGT algorithm, as mentioned in Camerini et al. (1975), the use of $\gamma_{k}=1.5$ is recommended; its intuitive justification, together with computational results, is given there, indicating that in practice the performance of the MGT strategy is superior to that of the pure subgradient algorithm.

- For our algorithm, the value of $\gamma_{k}$ depends on the optimal convex combination parameter $\alpha$ as indicated in Lemma 1. We used this optimal value perturbed by $\epsilon$, where $\epsilon$ is an arbitrarily small value.
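The step-length management just described can be summarized in a few lines; the sketch below assumes the halving rule of Held et al. (1974), with `alpha_opt` standing for the optimal combination parameter of Lemma 1 and `eps` the small perturbation mentioned in the last item. The concrete numerical values are assumptions, not values reported by the paper.

```python
def update_theta(theta, improved, counter, patience=20):
    """Halve theta when the best dual value has not improved for
    `patience` consecutive iterations (Held et al., 1974 style rule).
    Returns the new theta and the reset/incremented counter."""
    counter = 0 if improved else counter + 1
    if counter >= patience:
        return theta / 2.0, 0
    return theta, counter

# Example parameter choices used in the sketches above (assumed values):
theta0 = 2.0          # initial value of theta in step rule (4)
gamma_mgt = 1.5       # Camerini et al. (1975) recommendation for MGT
eps = 1e-3            # small perturbation of the optimal alpha
# alpha = alpha_opt - eps   # alpha_opt: optimal parameter from Lemma 1 (not shown here)
```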
Table 1 shows the experimental results obtained by the MGT strategy, the ADS strategy, and the NMDS algorithm proposed in this paper, on 11 symmetric benchmark instances with between n = 6 and n = 101 vertices taken from TSPLIB. For the three strategies, the duality gap for these 11 examples is zero. However, the NMDS algorithm always outperforms the others in the number of iterations and in execution time. Table 2 gives the computational results for 19 symmetric benchmark instances with between 131 and 3056 vertices. This table also shows that our algorithm gives near-optimal results for several instances. The column headers are as follows:
- Name: the instance name.

- n: the problem size (number of vertices).

- Opt: the best known optimal solution.

- LB: the best value (lower bound) obtained by each strategy.

- Iter: the number of iterations at which the best value LB is obtained (limited to 1000).

- GAP: the relative gap, computed as GAP = (Opt - LB)/Opt.

- CPU: the total computing time, in seconds, for calculating the best value LB obtained by each strategy.
Table 1. Results for 11 symmetric TSPLIB instances (6-101 vertices); NMDS denotes our method.

Name | n | Opt | NMDS LB | NMDS GAP | NMDS Iter | NMDS CPU | ADS LB | ADS GAP | ADS Iter | ADS CPU | MGT LB | MGT GAP | MGT Iter | MGT CPU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tsp6 | 6 | 207 | 207 | 0 | 2 | 0.027862 s | 207 | 0 | 2 | 0.032690 s | 207 | 0 | 2 | 0.028533 s |
tsp7 | 7 | 106.4 | 106.4 | 0 | 3 | 0.028651 s | 106.4 | 0 | 3 | 0.028819 s | 106.4 | 0 | 3 | 0.029059 s |
tsp8 | 8 | 100 | 100 | 0 | 3 | 0.032122 s | 100 | 0 | 3 | 0.032360 s | 100 | 0 | 3 | 0.033069 s |
tsp10 | 10 | 378 | 378 | 0 | 3 | 0.077130 s | 378 | 0 | 18 | 0.085784 s | 378 | 0 | 15 | 0.099755 s |
ulysses16 | 16 | 68.59 | 68.59 | 0 | 16 | 0.077255 s | 68.59 | 0 | 30 | 0.099056 s | 68.59 | 0 | 26 | 0.090043 s |
gr21 | 21 | 2707 | 2707 | 0 | 19 | 0.077098 s | 2707 | 0 | 26 | 0.086654 s | 2707 | 0 | 22 | 0.080609 s |
ulysses22 | 22 | 70.13 | 70.13 | 0 | 21 | 0.219532 s | 70.13 | 0 | 70 | 0.467253 s | 70.13 | 0 | 28 | 0.253197 s |
tsp33 | 33 | 10861 | 10861 | 0 | 18 | 0.27955 s | 10861 | 0 | 19 | 0.303524 s | 10861 | 0 | 22 | 0.304997 s |
eil76 | 76 | 538 | 538 | 0 | 12 | 0.728258 s | 538 | 0 | 23 | 1.285156 s | 538 | 0 | 18 | 1.071510 s |
rat99 | 99 | 1211 | 1211 | 0 | 19 | 1.817552 s | 1211 | 0 | 20 | 1.902072 s | 1211 | 0 | 34 | 3.111573 s |
eil101 | 101 | 629 | 629 | 0 | 13 | 1.253360 s | 629 | 0 | 16 | 1.523116 s | 629 | 0 | 15 | 1.422569 s |
Table 2. Results for 19 symmetric TSPLIB instances (131-3056 vertices); NMDS denotes our method.

Name | n | Opt | NMDS LB | NMDS GAP | NMDS Iter | NMDS CPU | ADS LB | ADS GAP | ADS Iter | ADS CPU | MGT LB | MGT GAP | MGT Iter | MGT CPU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
xq131 | 131 | 564 | 555.96 | 0.0159 | 350 | 38.134934 s | 555.52 | 0.0167 | 350 | 39.009219 s | 555.64 | 0.0165 | 350 | 42.417376 s |
ch150 | 150 | 6528 | 6498.3 | 0.0045 | 193 | 57.562539 s | 6463.2 | 0.0099 | 200 | 58.301731 s | 6490.4 | 0.0057 | 202 | 59.548258 s |
tsp237 | 237 | 1019 | 1004.8 | 0.0139 | 190 | 76.942993 s | 1002.4 | 0.0162 | 200 | 79.046667 s | 1003.9 | 0.0148 | 197 | 77.288710 s |
a280 | 280 | 2579 | 2569.8 | 0.0035 | 251 | 128.401733 s | 2568.3 | 0.0041 | 251 | 134.245723 s | 2569.8 | 0.0035 | 251 | 129.376815 s |
linhp318 | 318 | 41345 | 41345 | 0 | 188 | 135.792252 s | 41330 | 0.0003 | 200 | 148.125054 s | 41337 | 0.0001 | 195 | 140.734334 s |
pbk411 | 411 | 1343 | 1343 | 0 | 250 | 328.562376 s | 1338.3 | 0.0034 | 250 | 376.626413 s | 1338.6 | 0.0032 | 250 | 235.626147 s |
pbn423 | 423 | 1365 | 1365 | 0 | 251 | 344.519801 s | 1360.3 | 0.0034 | 251 | 349.933052 s | 1361.5 | 0.0025 | 251 | 370.504702 s |
pbm436 | 436 | 1443 | 1423.9 | 0.0132 | 207 | 241.663478 s | 1422.3 | 0.0143 | 251 | 334.741944 s | 1421.4 | 0.0149 | 207 | 255.055079 s |
rat575 | 575 | 6773 | 6721.2 | 0.0076 | 350 | 1287.476161 s | 6716 | 0.0084 | 350 | 1266.572875 s | 6718.9 | 0.0079 | 350 | 1138.613357 s |
rbx711 | 711 | 3115 | 3099.6 | 0.0049 | 500 | 1648.379114 s | 3094.3 | 0.0066 | 500 | 1668.377875 s | 3096.2 | 0.0060 | 500 | 1701.099452 s |
rat783 | 783 | 8806 | 8652.2 | 0.0197 | 121 | 496.527482 s | 8609.5 | 0.0223 | 180 | 744.089813 s | 8642.8 | 0.0185 | 131 | 592.200505 s |
dkg813 | 813 | 3199 | 3164.7 | 0.0107 | 200 | 126.290411 s | 3147.5 | 0.0203 | 200 | 129.330427 s | 3154.7 | 0.0138 | 200 | 138.698584 s |
pbd984 | 984 | 2797 | 2797 | 0 | 500 | 2216.458302 s | 2772 | 0.0089 | 500 | 4059.009584 s | 2774.3 | 0.0081 | 500 | 3394.524016 s |
xit1083 | 1083 | 3558 | 3551.1 | 0.0019 | 550 | 3394.524016 s | 3525.1 | 0.0092 | 600 | 3394.524016 s | 3549.6 | 0.0023 | 600 | 3726.7668 s |
dka1376 | 1376 | 4666 | 4666 | 0 | 500 | 7321.695698 s | 4659.6 | 0.0013 | 500 | 14632.080221 s | 4662 | 0.0008 | 500 | 14704.797585 s |
dja1436 | 1436 | 5257 | 5194.2 | 0.0437 | 500 | 10959.322745 s | 5189.4 | 0.0128 | 500 | 12014.302471 s | 5184.4 | 0.0138 | 500 | 11486.802618 s |
dcc1911 | 1911 | 6396 | 6367.9 | 0.0043 | 500 | 10602.013548 s | 6338.3 | 0.0097 | 500 | 11742.144423 s | 6357.2 | 0.0060 | 500 | 11127.0759858 s |
djb2103 | 2103 | 6197 | 6197 | 0 | 430 | 13566.845650 s | 6133.8 | 0.0101 | 500 | 14954.211430 s | 6169.3 | 0.0044 | 500 | 15955.063763 s |
pia3056 | 3056 | 8285 | 8285 | 0 | 500 | 51124.513213 s | 8192 | 0.0112 | 500 | 59123.013454 s | 8228.2 | 0.0068 | 500 | 58072.083303 s |
Conclusion
By identifying the optimal convex combination parameter, a new deflected direction is given as a convex combination of the MGT and ADS deflected directions. At each iteration, this direction reduces the zigzagging phenomenon and hence brings the iterates closer and faster to the optimal solution. The theoretical analysis is consistent with the numerical experiments. Moreover, this method can be used to improve convergence of deflected subgradient methods based on augmented Lagrangian duality (Burachik and Kaya, 2010) and of dual subgradient methods (Gustavsson et al., 2015). One can also follow Lim and Sherali (2006) and combine this method with a variable target value technique in order to obtain better performance. Finally, the subgradient method is often used as a subroutine in exact, heuristic and metaheuristic optimization, which justifies the large spectrum of applications of our approach.
Acknowledgements
The authors wish to express their sincere thanks to the editor and the reviewers for their constructive comments, which are very useful for improving this paper.
References
- Bazaraa, M.S., Sherali, H.D. On the choice of step sizes in subgradient optimization. Eur. J. Oper. Res. 1981;7:380-388.
- Bazaraa, M.S., Sherali, H.D., Shetty, C.M. Nonlinear Programming: Theory and Algorithms. Wiley-Interscience Series in Discrete Mathematics and Optimization; 2006.
- Burachik, R.S., Kaya, C.Y. A deflected subgradient method using a general augmented Lagrangian duality with implications on penalty methods. In: Burachik, R., Yao, J.C., eds. Variational Analysis and Generalized Differentiation in Optimization and Control. Springer Optimization and Its Applications, Vol. 47. New York, NY: Springer; 2010.
- Camerini, P.M., Fratta, L., Maffioli, F. On improving relaxation methods by modified gradient techniques. In: Balinski, M.L., Wolfe, P., eds. Nondifferentiable Optimization. Mathematical Programming Studies, Vol. 3. Berlin, Heidelberg: Springer; 1975.
- Diaby, M., Karwan, M.H. Advances in Combinatorial Optimization: Linear Programming Formulation of the Travelling Salesman and Other Hard Combinatorial Optimization Problems; 2016.
- El-Sherbeny, N.A. Vehicle routing with time windows: an overview of exact, heuristic and metaheuristic methods. J. King Saud Univ. Sci. 2010;22:123-131.
- Fumero, F. A modified subgradient algorithm for Lagrangian relaxation. Comput. Oper. Res. 2001;28:33-52.
- Garey, M.R., Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness. New York, NY: W. H. Freeman & Co.; 1990.
- Gustavsson, E., Patriksson, M., Strömberg, A.-B. Primal convergence from dual subgradient methods for convex optimization. Math. Program. 2015;150:365-390.
- Held, M., Karp, R.M. The travelling salesman problem and minimum spanning trees. Oper. Res. 1970;18:1138-1162.
- Hu, Y., Yang, X., Sim, C.-K. Inexact subgradient methods for quasi-convex optimization problems. Eur. J. Oper. Res. 2015;240:315-327.
- Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G., Shmoys, D.B. The Travelling Salesman Problem: A Guided Tour of Combinatorial Optimization. John Wiley & Sons; 1985.
- Lim, C., Sherali, H.D. Convergence and computational analyses for some variable target value and subgradient deflection methods. Comput. Optim. Appl. 2006;34:409-428.
- Nedić, A., Bertsekas, D.P. The effect of deterministic noise in subgradient methods. Math. Program. 2010;125:75-99.
- Nesterov, Yu. Subgradient methods for huge-scale optimization problems. Math. Program. 2014;146:275-297.
- Nocedal, J., Wright, S.J. Numerical Optimization. Springer Series in Operations Research; 2006.
- Sherali, H.D., Ulular, O. A primal-dual conjugate subgradient algorithm for specially structured linear and convex programming problems. Appl. Math. Optim. 1989;20:193-221.
- Sherali, H.D., Choi, G., Tuncbilek, C.H. A variable target value method for nondifferentiable optimization. Oper. Res. Lett. 2000;26:1-8.
- Shor, N.Z. Minimization Methods for Non-Differentiable Functions. Berlin: Springer; 1985.
- Sra, S., Nowozin, S., Wright, S.J., eds. Optimization for Machine Learning. MIT Press; 2012.