SQL优化改写案例12(DM数据库表关联多次,连接条件不一优化案例)

发布时间 2023-04-17 22:48:17作者: 小至尖尖

阅读须知:

  1、本案例是某金融机构的报表系统使用达梦数据库的报表统计SQL,不能拿出真实 SQL + 数据,笔者自己在ORACLE模拟的数据。

  2、笔者对达梦数据库的hint不熟悉,该sql案例注重于优化思路和改写思路,如果使用hint干预速度将会更快。

 

上周六和DM哥们去优化金融项目SQL,遇到个比较有意思的案例,SQL截图如下:

 A表和B表关联了8次,每次关联结果不一样,用来聚合运算,A、B表都是千万级别大表,使用了 parallel(4) 都需要跑50多秒。

模拟案例表和数据量:

 两张表都是千万级大表,比较符合生产实际情况。

-- 原始SQL:
SELECT
    /*+ gather_plan_statistics */ /*query111*/
       s.STU_CLASSID,
       count(S1.STU_NAME) S1_name   ,
       count(S2.STU_NAME) S2_name   ,
       count(S3.STU_NAME) S3_name   ,
       count(S4.STU_NAME) S4_name   ,
       count(S5.STU_NAME) S5_name   ,
       count(S6.STU_NAME) S6_name   ,
       count(S7.STU_NAME) S7_name   ,
       count(S8.STU_NAME) S8_name
       from STUDENT s
    LEFT JOIN STUDENT2 S1 on s.STU_NAME = S1.STU_NAME and S1.STU_CLASSID = 1
    LEFT JOIN STUDENT2 S2 on s.STU_NAME = S2.STU_NAME and S2.STU_CLASSID = 2
    LEFT JOIN STUDENT2 S3 on s.STU_NAME = S3.STU_NAME and S3.STU_CLASSID = 3
    LEFT JOIN STUDENT2 S4 on s.STU_NAME = S4.STU_NAME and S4.STU_CLASSID = 4
    LEFT JOIN STUDENT2 S5 on s.STU_NAME = S5.STU_NAME and S5.STU_CLASSID = 5
    LEFT JOIN STUDENT2 S6 on s.STU_NAME = S6.STU_NAME and S6.STU_CLASSID = 6
    LEFT JOIN STUDENT2 S7 on s.STU_NAME = S7.STU_NAME and S7.STU_CLASSID = 7
    LEFT JOIN STUDENT2 S8 on s.STU_NAME = S8.STU_NAME and S8.STU_CLASSID = 8
GROUP BY s.STU_CLASSID;
--执行计划:
Plan hash value: 42163929
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------
"| Id  | Operation                     | Name                          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |"
-------------------------------------------------------------------------------------------------------------------------------------------------------------
"|   0 | SELECT STATEMENT              |                               |      1 |        |    300 |00:00:57.08 |     866K|    141K|       |       |          |"
"|   1 |  HASH GROUP BY                |                               |      1 |    300 |    300 |00:00:57.08 |     866K|    141K|  2697M|    38M| 1352K (0)|"
"|*  2 |   HASH JOIN RIGHT OUTER       |                               |      1 |     26M|     26M|00:00:53.69 |     866K|    141K|  5659K|  2517K| 6411K (0)|"
"|*  3 |    INDEX FAST FULL SCAN       | IDX_STUDENT2_STU_NAME_CLASSID |      1 |   1768 |    100K|00:00:00.66 |   90599 |      0 |       |       |          |"
"|*  4 |    HASH JOIN RIGHT OUTER      |                               |      1 |     26M|     26M|00:00:47.90 |     775K|    141K|  5659K|  2517K| 5427K (0)|"
"|*  5 |     INDEX FAST FULL SCAN      | IDX_STUDENT2_STU_NAME_CLASSID |      1 |   1768 |  99373 |00:00:00.65 |   90599 |      0 |       |       |          |"
"|*  6 |     HASH JOIN RIGHT OUTER     |                               |      1 |     26M|     26M|00:00:42.14 |     685K|    141K|  5902K|  2624K| 5432K (0)|"
"|*  7 |      INDEX FAST FULL SCAN     | IDX_STUDENT2_STU_NAME_CLASSID |      1 |   1768 |  99511 |00:00:00.65 |   90599 |      0 |       |       |          |"
"|*  8 |      HASH JOIN RIGHT OUTER    |                               |      1 |     26M|     26M|00:00:36.40 |     594K|    141K|  5659K|  2517K| 5392K (0)|"
"|*  9 |       INDEX FAST FULL SCAN    | IDX_STUDENT2_STU_NAME_CLASSID |      1 |   1768 |  99782 |00:00:00.66 |   90599 |      0 |       |       |          |"
"|* 10 |       HASH JOIN RIGHT OUTER   |                               |      1 |     26M|     26M|00:00:30.36 |     504K|    141K|  5659K|  2517K|   13M (0)|"
"|* 11 |        INDEX FAST FULL SCAN   | IDX_STUDENT2_STU_NAME_CLASSID |      1 |    487K|  99809 |00:00:00.66 |   90599 |      0 |       |       |          |"
"|* 12 |        HASH JOIN RIGHT OUTER  |                               |      1 |     26M|     26M|00:00:22.13 |     413K|    141K|  5659K|  2517K|   14M (0)|"
"|* 13 |         INDEX FAST FULL SCAN  | IDX_STUDENT2_STU_NAME_CLASSID |      1 |    484K|    100K|00:00:00.69 |   90599 |      0 |       |       |          |"
"|* 14 |         HASH JOIN RIGHT OUTER |                               |      1 |     26M|     26M|00:00:13.87 |     322K|    141K|  5659K|  2517K| 5411K (0)|"
"|* 15 |          INDEX FAST FULL SCAN | IDX_STUDENT2_STU_NAME_CLASSID |      1 |   1768 |    100K|00:00:00.67 |   90599 |      0 |       |       |          |"
"|* 16 |          HASH JOIN RIGHT OUTER|                               |      1 |     26M|     26M|00:00:07.88 |     232K|    141K|  5738K|  2624K| 5422K (0)|"
"|* 17 |           INDEX FAST FULL SCAN| IDX_STUDENT2_STU_NAME_CLASSID |      1 |   1768 |  99925 |00:00:00.67 |   90599 |      0 |       |       |          |"
"|  18 |           TABLE ACCESS FULL   | STUDENT                       |      1 |     26M|     26M|00:00:01.62 |     141K|    141K|       |       |          |"
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
"   2 - access(""S"".""STU_NAME""=""S8"".""STU_NAME"")"
"   3 - filter(""S8"".""STU_CLASSID""=8)"
"   4 - access(""S"".""STU_NAME""=""S7"".""STU_NAME"")"
"   5 - filter(""S7"".""STU_CLASSID""=7)"
"   6 - access(""S"".""STU_NAME""=""S5"".""STU_NAME"")"
"   7 - filter(""S5"".""STU_CLASSID""=5)"
"   8 - access(""S"".""STU_NAME""=""S3"".""STU_NAME"")"
"   9 - filter(""S3"".""STU_CLASSID""=3)"
"  10 - access(""S"".""STU_NAME""=""S4"".""STU_NAME"")"
"  11 - filter(""S4"".""STU_CLASSID""=4)"
"  12 - access(""S"".""STU_NAME""=""S6"".""STU_NAME"")"
"  13 - filter(""S6"".""STU_CLASSID""=6)"
"  14 - access(""S"".""STU_NAME""=""S2"".""STU_NAME"")"
"  15 - filter(""S2"".""STU_CLASSID""=2)"
"  16 - access(""S"".""STU_NAME""=""S1"".""STU_NAME"")"
"  17 - filter(""S1"".""STU_CLASSID""=1)"
 
Note
-----
   - dynamic sampling used for this statement (level=2)

跑了57s,慢得一批,如果DM有并行HASH的hint ,本案例将会秒杀。

 

优化方式1,将 STUDENT2  转换成分区表,利用分区裁剪思路进行优化(需要提前和业务开发沟通,是否会影响其他SQL语句)

-- Create table
create table STUDENT_PATITION
(
  stu_id      NUMBER not null,
  stu_name    VARCHAR2(200),
  stu_sex     VARCHAR2(1),
  stu_age     NUMBER,
  stu_date    DATE,
  stu_classid NUMBER,
  stu_sal     NUMBER
)
partition by list (STU_CLASSID)
(
  partition PARTITION_STUCLASSID2_1 values (1),
  partition PARTITION_STUCLASSID_2 values (2),
  partition PARTITION_STUCLASSID_3 values (3),
  partition PARTITION_STUCLASSID_4 values (4),
  partition PARTITION_STUCLASSID_5 values (5),
  partition PARTITION_STUCLASSID_6 values (6),
  partition PARTITION_STUCLASSID_7 values (7),
  partition PARTITION_STUCLASSID_8 values (8),
.......
)

使用 STU_CLASSID 作为分区列。

create index IDX_STUDENT3_LOCAL_1_2 on STUDENT_PATITION (STU_CLASSID, STU_NAME) nologging  local;

再创建个本地索引。

-- 使用分区表
SELECT
/*+ gather_plan_statistics */ /*query222*/ s.STU_CLASSID, count(S1.STU_NAME) S1_name , count(S2.STU_NAME) S2_name , count(S3.STU_NAME) S3_name , count(S4.STU_NAME) S4_name , count(S5.STU_NAME) S5_name , count(S6.STU_NAME) S6_name , count(S7.STU_NAME) S7_name , count(S8.STU_NAME) S8_name from STUDENT s LEFT JOIN STUDENT_PATITION S1 on s.STU_NAME = S1.STU_NAME and S1.STU_CLASSID = 1 LEFT JOIN STUDENT_PATITION S2 on s.STU_NAME = S2.STU_NAME and S2.STU_CLASSID = 2 LEFT JOIN STUDENT_PATITION S3 on s.STU_NAME = S3.STU_NAME and S3.STU_CLASSID = 3 LEFT JOIN STUDENT_PATITION S4 on s.STU_NAME = S4.STU_NAME and S4.STU_CLASSID = 4 LEFT JOIN STUDENT_PATITION S5 on s.STU_NAME = S5.STU_NAME and S5.STU_CLASSID = 5 LEFT JOIN STUDENT_PATITION S6 on s.STU_NAME = S6.STU_NAME and S6.STU_CLASSID = 6 LEFT JOIN STUDENT_PATITION S7 on s.STU_NAME = S7.STU_NAME and S7.STU_CLASSID = 7 LEFT JOIN STUDENT_PATITION S8 on s.STU_NAME = S8.STU_NAME and S8.STU_CLASSID = 8 GROUP BY s.STU_CLASSID
-- 分区表执行计划
Plan hash value: 1458661267
 
-------------------------------------------------------------------------------------------------------------------------------------------------------
"| Id  | Operation                      | Name                   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |"
-------------------------------------------------------------------------------------------------------------------------------------------------------
"|   0 | SELECT STATEMENT               |                        |      1 |        |    300 |00:00:48.33 |     144K|    141K|       |       |          |"
"|   1 |  HASH GROUP BY                 |                        |      1 |    300 |    300 |00:00:48.33 |     144K|    141K|  2697M|    38M| 3855K (0)|"
"|*  2 |   HASH JOIN RIGHT OUTER        |                        |      1 |     26M|     26M|00:00:45.00 |     144K|    141K|  5659K|  2517K| 6820K (0)|"
"|   3 |    PARTITION LIST SINGLE       |                        |      1 |    100K|    100K|00:00:00.01 |     302 |      0 |       |       |          |"
"|   4 |     INDEX FAST FULL SCAN       | IDX_STUDENT3_LOCAL_1_2 |      1 |    100K|    100K|00:00:00.01 |     302 |      0 |       |       |          |"
"|*  5 |    HASH JOIN RIGHT OUTER       |                        |      1 |     26M|     26M|00:00:39.44 |     143K|    141K|  5659K|  2517K| 6830K (0)|"
"|   6 |     PARTITION LIST SINGLE      |                        |      1 |    100K|    100K|00:00:00.01 |     301 |      0 |       |       |          |"
"|   7 |      INDEX FAST FULL SCAN      | IDX_STUDENT3_LOCAL_1_2 |      1 |    100K|    100K|00:00:00.01 |     301 |      0 |       |       |          |"
"|*  8 |     HASH JOIN RIGHT OUTER      |                        |      1 |     26M|     26M|00:00:33.87 |     143K|    141K|  5659K|  2517K| 5806K (0)|"
"|   9 |      PARTITION LIST SINGLE     |                        |      1 |    100K|    100K|00:00:00.01 |     301 |      0 |       |       |          |"
"|  10 |       INDEX FAST FULL SCAN     | IDX_STUDENT3_LOCAL_1_2 |      1 |    100K|    100K|00:00:00.01 |     301 |      0 |       |       |          |"
"|* 11 |      HASH JOIN RIGHT OUTER     |                        |      1 |     26M|     26M|00:00:28.67 |     143K|    141K|  5659K|  2517K| 5809K (0)|"
"|  12 |       PARTITION LIST SINGLE    |                        |      1 |  99925 |  99925 |00:00:00.01 |     300 |      0 |       |       |          |"
"|  13 |        INDEX FAST FULL SCAN    | IDX_STUDENT3_LOCAL_1_2 |      1 |  99925 |  99925 |00:00:00.01 |     300 |      0 |       |       |          |"
"|* 14 |       HASH JOIN RIGHT OUTER    |                        |      1 |     26M|     26M|00:00:23.47 |     142K|    141K|  5659K|  2517K| 5811K (0)|"
"|  15 |        PARTITION LIST SINGLE   |                        |      1 |  99809 |  99809 |00:00:00.01 |     300 |      0 |       |       |          |"
"|  16 |         INDEX FAST FULL SCAN   | IDX_STUDENT3_LOCAL_1_2 |      1 |  99809 |  99809 |00:00:00.01 |     300 |      0 |       |       |          |"
"|* 17 |        HASH JOIN RIGHT OUTER   |                        |      1 |     26M|     26M|00:00:18.30 |     142K|    141K|  5659K|  2517K| 5808K (0)|"
"|  18 |         PARTITION LIST SINGLE  |                        |      1 |  99782 |  99782 |00:00:00.01 |     300 |      0 |       |       |          |"
"|  19 |          INDEX FAST FULL SCAN  | IDX_STUDENT3_LOCAL_1_2 |      1 |  99782 |  99782 |00:00:00.01 |     300 |      0 |       |       |          |"
"|* 20 |         HASH JOIN RIGHT OUTER  |                        |      1 |     26M|     26M|00:00:12.92 |     142K|    141K|  5902K|  2624K| 5815K (0)|"
"|  21 |          PARTITION LIST SINGLE |                        |      1 |  99511 |  99511 |00:00:00.01 |     299 |      0 |       |       |          |"
"|  22 |           INDEX FAST FULL SCAN | IDX_STUDENT3_LOCAL_1_2 |      1 |  99511 |  99511 |00:00:00.01 |     299 |      0 |       |       |          |"
"|* 23 |          HASH JOIN RIGHT OUTER |                        |      1 |     26M|     26M|00:00:07.13 |     141K|    141K|  5574K|  2624K| 5799K (0)|"
"|  24 |           PARTITION LIST SINGLE|                        |      1 |  99373 |  99373 |00:00:00.01 |     299 |      0 |       |       |          |"
"|  25 |            INDEX FAST FULL SCAN| IDX_STUDENT3_LOCAL_1_2 |      1 |  99373 |  99373 |00:00:00.01 |     299 |      0 |       |       |          |"
"|  26 |           TABLE ACCESS FULL    | STUDENT                |      1 |     26M|     26M|00:00:01.59 |     141K|    141K|       |       |          |"
-------------------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
"   2 - access(""S"".""STU_NAME""=""S8"".""STU_NAME"")"
"   5 - access(""S"".""STU_NAME""=""S6"".""STU_NAME"")"
"   8 - access(""S"".""STU_NAME""=""S2"".""STU_NAME"")"
"  11 - access(""S"".""STU_NAME""=""S1"".""STU_NAME"")"
"  14 - access(""S"".""STU_NAME""=""S4"".""STU_NAME"")"
"  17 - access(""S"".""STU_NAME""=""S3"".""STU_NAME"")"
"  20 - access(""S"".""STU_NAME""=""S5"".""STU_NAME"")"
"  23 - access(""S"".""STU_NAME""=""S7"".""STU_NAME"")"

原来57s的SQL现在可以降到48s出结果,可以看到执行计划中有 PARTITION LIST SINGLE ,正是利用了分区裁剪的思路。

如果在加上并行的hint 走分区并行,拆分成每个分区并行运行,那速度将会更快,但是这里不用。

 

优化方式2,如果业务开发怕影响太大,不允许改表结构,只能对当前SQL进行等价改写。

-- 等价改写:
SELECT
    /*+ gather_plan_statistics  */ /*query333*/
       s.STU_CLASSID,
       sum(case when S1.STU_CLASSID = 1 then 1 else 0 end ) S1_name   ,
       sum(case when S1.STU_CLASSID = 2 then 1 else 0 end ) S2_name   ,
       sum(case when S1.STU_CLASSID = 3 then 1 else 0 end ) S3_name   ,
       sum(case when S1.STU_CLASSID = 4 then 1 else 0 end ) S4_name   ,
       sum(case when S1.STU_CLASSID = 5 then 1 else 0 end ) S5_name   ,
       sum(case when S1.STU_CLASSID = 6 then 1 else 0 end ) S6_name   ,
       sum(case when S1.STU_CLASSID = 7 then 1 else 0 end ) S7_name   ,
       sum(case when S1.STU_CLASSID = 8 then 1 else 0 end ) S8_name
from STUDENT s
         LEFT JOIN STUDENT2 S1 on s.STU_NAME = S1.STU_NAME
GROUP BY s.STU_CLASSID;
-- 改写后执行计划

Plan hash value: 17088642
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
"| Id  | Operation              | Name                          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|"
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
"|   0 | SELECT STATEMENT       |                               |      1 |        |    300 |00:00:30.58 |     232K|    272K|    130K|       |       |          |         |"
"|   1 |  HASH GROUP BY         |                               |      1 |    300 |    300 |00:00:30.58 |     232K|    272K|    130K|  1800M|    32M| 1353K (0)|         |"
"|*  2 |   HASH JOIN OUTER      |                               |      1 |     30M|     29M|00:00:23.47 |     232K|    272K|    130K|  1211M|    34M|   29M (1)|    1056K|"
"|   3 |    TABLE ACCESS FULL   | STUDENT                       |      1 |     26M|     26M|00:00:01.27 |     141K|    141K|      0 |       |       |          |         |"
"|   4 |    INDEX FAST FULL SCAN| IDX_STUDENT2_STU_NAME_CLASSID |      1 |     30M|     29M|00:00:01.29 |   90599 |      0 |      0 |       |       |          |         |"
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
"   2 - access(""S"".""STU_NAME""=""S1"".""STU_NAME"")"
 
Note
-----
   - dynamic sampling used for this statement (level=2)
 

改写后执行时间直接降到30s就可以出结果,student2表只需要访问一次,对比第一条SQL语句降低了27s,极大提升。

加上相关并行hint 估计10s以内就能把结果跑出来甚至更快,本案例不加任何 hint,只是让大家了解优化思路。

看来还是得花时间好好学习下达梦数据库,毕竟现在信创国产化是一个重要的趋势,为信创行业添砖加瓦。