智能优化算法第一次实验

一、实验目的

(1) 掌握梯度下降法的基础知识及其应用过程；

(2) 利用梯度下降法求解简单优化问题。

二、实验原理

梯度下降法是一种最优化算法，由于函数在该点梯度的模代表着函数值在该点最大的变化率，我们可以让函数沿着梯度方向迭代移动逼近函数最值，这就是梯度下降法的基本原理。

三、实验内容

3.1 实验步骤与过程

1. 写出梯度算法求解该问题的迭代公式，详细阐述迭代公式每项的意义。

\[f(x)= x_1^2 + 2x_2^2 - 5x_1 - 2x_1x_2 + 2 \tag{1} \]

\[\frac {\partial f(x)} {\partial x_1} = 2(x_1 - x_2) - 5 , \frac {\partial f(x)} {\partial x_2} = 4x_2 - 2x_1 \tag{2} \]

\[\nabla f = (2(x_1 - x_2) - 5,4x_2 - 2x_1) \tag{3} \]

迭代公式:

\[x_{k + 1} = x_k - \lambda \nabla f(x_k) \tag{4} \]

意义：

其中，\(f(x_k)\)为当前点，\(\lambda\)为步长，\(\nabla f(x_k)\)为函数\(f(x)\)在当前点的梯度。该式子表示从当前点\(f(x_k)\)，沿着\(\nabla f(x_k)\)的方向，以步长为\(\lambda\)的速度逼近。

梯度值计算公式:

\[\|\nabla f(X)\| = \sqrt{(\frac {\partial f(X)} {\partial x_1})^2 + ({\frac {\partial f(X)} {\partial x_2}})^2} \tag {5} \]

(1) 编程实现梯度下降法，采用梯度下降法求解\(f(x)\)的最小值及对应\(x\)的解，结果保留\(6\)位小数。

import numpy as np
import matplotlib.pyplot as plt
#绘制三维图
from mpl_toolkits.mplot3d import Axes3D
import random
import math

#计算梯度
def calc_grad(x1,x2):
    return math.sqrt((2 * (x1 - x2) - 5)**2 + (4 * x2 - 2 * x1) ** 2)

#计算f(x)
def f(x1,x2):
    return x1 ** 2 + 2 * x2 ** 2 - 5 * x1 - 2 * x1 * x2 + 2

#随机取点,生成初始点位
x1 = random.randint(-100,100)
x2 = random.randint(-100,100)
#设定学习率
learning_rate = 0.1
#记录迭代过程点的(f(x},x1,x2))
path = []
grad = calc_grad(x1,x2)
#迭代更新
while grad > 1e-6:
    path.append((f(x1,x2),x1,x2))
    x1 = x1 - learning_rate * (2 * (x1 - x2) - 5)
    x2 = x2 - learning_rate * (4 * x2 - 2 * x1)
    grad = calc_grad(x1,x2)
#输出6位小数，无后导零
print(f'f(x) = {round(path[-1][0],6)},x1 = {round(x1,6)},x2 = {round(x2,6)}')

(2) 画出迭代优化过程曲线，即函数值随迭代次数变化的曲线，并查看迭代多少次停止。

#解决负号、乱码问题
plt.rcParams['axes.unicode_minus'] = False 
plt.rcParams['font.family'] = 'FangSong'
# x轴为迭代次数,y轴为f(x)
x = list(range (1,len(path) + 1))
y = [v[0] for v in path]
plt.xlabel('迭代次数')
plt.ylabel('f(x)')
plt.title('迭代优化过程曲线')
# 找到x轴的最大值  
max_value = np.max(x)  
# 添加标注  
plt.annotate('迭代次数: {}'.format(max_value), xy=(0.9, 0.1), xycoords=plt.gca(), ha='right', va='top') 
plt.plot(x,y)
plt.show()

Figure_1

3.2 实验结果及分析

根据实验结果，我们不难发现，当\((x_1,x_2)\)在\((5.000000,2.500000)\)附近时，会出现梯度小于\(10^{-6}\)且函数值近似\(-10.5000000\)的情况。此时梯度十分接近\(0\)，函数值接近函数最小值。

通过多次随机生成数据迭代可知，当学习率固定时，不同的起始点的迭代次数会有所不同。可见合理的起始点选取对优化迭代次数具有重要作用。

#随机取点,生成初始点位
#设定学习率
learning_rate = 0.1
for i in range(3):
    x1 = random.randint(-100,100)
    x2 = random.randint(-100,100)
    a = x1
    b = x2
    grad = calc_grad(x1,x2)
    #迭代更新
    while grad > 1e-6:
        path.append((f(x1,x2),x1,x2))
        x1 = x1 - learning_rate * (2 * (x1 - x2) - 5)
        x2 = x2 - learning_rate * (4 * x2 - 2 * x1)
        grad = calc_grad(x1,x2)
    #输出6位小数，无后导零
    print(f'这是第{i}次，起始点为({a},{b}),迭代次数为{len(path)}')

当固定起始点时，改变学习率，我们发现，不同的学习率会导致迭代次数有所不同。可见设定合适的学习率也会对算法有很大的优化作用。

#随机取点,生成初始点位
x1 = random.randint(-100,100)
x2 = random.randint(-100,100)
a = x1
b = x2
#设定学习率
learning_rate = 0.1
for i in range(6):
    x1 = a
    x2 = b
    #设定学习率
    learning_rate = learning_rate + 0.1
    grad = calc_grad(x1,x2)
    #迭代更新
    while grad > 1e-6:
        path.append((f(x1,x2),x1,x2))
        x1 = x1 - learning_rate * (2 * (x1 - x2) - 5)
        x2 = x2 - learning_rate * (4 * x2 - 2 * x1)
        grad = calc_grad(x1,x2)
    #输出6位小数，无后导零
    print(f'这是第{i}次，学习率为{learning_rate},迭代次数为{len(path)}')

综上所述，在设定起始点和学习率的时候我们要尽量根据问题设定合适的参数，这样才能事半功倍。

四、实验结论或体会

通过此次实验，我们可以发现梯度下降法的本质在于反向迭代更新的思想。

初值选的合理可能迭代的快且能够到达全局最优解。初值若选的不合理可能迭代的速度慢，而且只能求得局部最优解。

步长的大小影响迭代的速度。若步长太大，可能迭代结果震荡过大，无法找到最优解。若步长太小，迭代次数会很多，速度会很慢，容易落在相对极小值。因此最好随着迭代调整。

优势与不足重点在于在什么情境下和什么比较。

优势：思想较为简单，可拓展运用范围较广。

不足：梯度下降法迭代可能会收敛到局部最优解而不是全局最优解。

梯度下降算法是一种思想简单的最优化算法。其简单的反向迭代思想，使其具有极强的扩展性。所以我们时长能在其他优化算法和神经网络中窥见他的思路。