Oracle数据库导致系统CPU使用率100%的案例
▲点击上方名片“蜘蛛杂谈”关注
后语
问题解决,激起了些许深入学习数据库的兴趣,但是想到那些又长又复杂的sql语句又开始头疼了。。。
★ 欢迎大家阅读、关注、点赞、收藏、转发 ★★ 关注我,关注东方蜘蛛,交流更多硬件系统运维干货 ★★“蜘蛛杂谈”诚邀广大IT爱好者投稿,投稿邮箱:45778530@qq.com,或者添加作者微信:eastspider0214
前言这个案例只是模拟案例。虽然作为一名系统工程师,稍有些数据库基础,但是碰到和系统相关的问题时,我们还是需要从系统层去做分析处理,希望这个案例能给大家带来些许处理思路。
故障现象
Oracle 19c RAC 2节点CPU使用率100%。(使用top、sar等命令)
故障分析过程1、查询负载很高前35个进程2、查询前35负载高的sPID# ps aux|head -1;ps aux|grep -v PID|sort -rn -k +3|head -35USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMANDoracle 21974 13.7 0.0 40453412 33848 ? Rs 10:56 1:27 oracleorcl1 (LOCAL=NO)root 3204 13.6 0.0 114296 6480 ? Sl 11:06 0:00 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show configoracle 22790 13.6 0.0 40715556 33888 ? Rs 10:56 1:25 oracleorcl1 (LOCAL=NO)oracle 22317 13.6 0.0 40715556 32004 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)oracle 22260 13.6 0.0 40453408 33912 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)oracle 22215 13.6 0.0 40453408 33892 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)......oracle 22219 13.4 0.0 40453412 33904 ? Rs 10:56 1:25 oracleorcl1 (LOCAL=NO)oracle 22156 13.4 0.0 40453872 34464 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)oracle 22145 13.4 0.0 40453412 33848 ? Rs 10:56 1:25 oracleorcl1 (LOCAL=NO)oracle 22124 13.4 0.0 40453412 31724 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)oracle 22109 13.4 0.0 40453408 33836 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)
3、把得到的spid替换到下面sql语句中# ps aux| grep -v PID | sort -rn -k +3 | head -35 | awk '{print$2}'230332294823015229412309123061......235012344023429234042326623161
4、查询“f5kskn9df2h2p”相关的sql_id和数量select sql_id from v$sessionwhere paddr in(select addr from v$processwhere spid in('23033','22948','23015','22941',......'23429','23404','23266','23161'));SQL_ID--------------------------f5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2p......f5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2p35 rows selected.
> SELECT sid,serial#,sql_id from v$session where sql_id in ( 'f5kskn9df2h2p');SID SERIAL# SQL_ID---------- ---------- --------------------------4609 49390 f5kskn9df2h2p4722 1377 f5kskn9df2h2p4723 7168 f5kskn9df2h2p4724 51376 f5kskn9df2h2p4725 62475 f5kskn9df2h2p4726 64323 f5kskn9df2h2p4727 19318 f5kskn9df2h2p4728 23453 f5kskn9df2h2p4729 34066 f5kskn9df2h2p300 rows selected.
5、随机抽一条,查看一下具体是执行的什么内容> SELECT count(*) from v$session where sql_id in ( 'f5kskn9df2h2p' );COUNT(*)----------300
6、思路1:系统层查看相关进程启动时间,发现在同一时间有sh脚本在执行,脚本内容如下:(感觉不准确,可能会误判)> select sql_text from v$sqlarea a,v$session b where a.SQL_ID=b.PREV_SQL_ID and b.SID='4729';SQL_TEXT
杀掉所有connOracleDB.sh进程:/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh*******
# pkill -9 connOracleDB.sh
7、思路2:通过SQL语句查询sql_id对应的系统进程,再查其对应的父进程> col MACHINE for a30> select process,MACHINE,PORT,PROGRAM,PADDR from v$session where sql_id = 'f5kskn9df2h2p';PROCESS MACHINE PORT PROGRAM PADDR------------------------------------------------ ------------------------------ ---------- ----------------------------------- ----------------3480 vone1 26564 sqlplus@vone1 (TNS V1-V3) 000000013870A3183155 vone1 26280 sqlplus@vone1 (TNS V1-V3) 000000016076BF183118 vone1 25964 sqlplus@vone1 (TNS V1-V3) 00000001507198883393 vone1 26572 sqlplus@vone1 (TNS V1-V3) 00000001686EB5103244 vone1 26268 sqlplus@vone1 (TNS V1-V3) 00000001587024A83170 vone1 25970 sqlplus@vone1 (TNS V1-V3) 00000001487592A03457 vone1 26580 sqlplus@vone1 (TNS V1-V3) 0000000160773F583338 vone1 26286 sqlplus@vone1 (TNS V1-V3) 00000001507218C83218 vone1 25976 sqlplus@vone1 (TNS V1-V3) 000000014081F7883410 vone1 26592 sqlplus@vone1 (TNS V1-V3) 000000015870A4E83139 vone1 26292 sqlplus@vone1 (TNS V1-V3) 00000001487612E03148 vone1 25978 sqlplus@vone1 (TNS V1-V3) 00000001386FB7F8......
再查看connOracleDB.sh脚本,分析脚本内容是否会引起异常,杀掉该脚本。/home/oracle/test$ ps -ef| grep 3480oracle 3480 3399 0 14:10 pts/3 00:00:00 /opt/oracle/products/19.3.0/bin/sqlplusoracle 23509 21062 0 14:38 pts/4 00:00:00 grep --color=auto 3480/home/oracle/test$ ps -ef|grep 3399oracle 3399 1 0 14:10 pts/3 00:00:00 /bin/bash /home/oracle/test/connOracleDB.shoracle 3480 3399 0 14:10 pts/3 00:00:00 /opt/oracle/products/19.3.0/bin/sqlplusoracle 23686 21062 0 14:38 pts/4 00:00:00 grep --color=auto 3399
8、在sql下执行生成kill的语句/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh******# pkill -9 connOracleDB.sh
SELECT 'alter system kill session''' || SID || ',' || serial# || ''';' FROM v$session WHERE sql_id IN ('f5kskn9df2h2p');'ALTERSYSTEMKILLSESSION'''||SID||','||SERIAL#||''';'------------------------------------------------------alter system kill session'3,19584' immediate;alter system kill session'4,26016';alter system kill session'5,543';alter system kill session'6,22344';......alter system kill session'4727,35784';alter system kill session'4728,57499';alter system kill session'4729,45135';300 rows selected.
后语
问题解决,激起了些许深入学习数据库的兴趣,但是想到那些又长又复杂的sql语句又开始头疼了。。。
★ 欢迎大家阅读、关注、点赞、收藏、转发 ★★ 关注我,关注东方蜘蛛,交流更多硬件系统运维干货 ★★“蜘蛛杂谈”诚邀广大IT爱好者投稿,投稿邮箱:45778530@qq.com,或者添加作者微信:eastspider0214
点击“在看”支持一下吧
收录于合集 #Oracle
2个上一篇Oracle数据库常用运维SQL语句
蜘蛛杂谈
关注后可发消息