关于控制文件方面
Oracle:
[postgres@pgccc01 bin]$ ls -la $PGDATA/global/pg_control
-rw-------. 1 postgres postgres 8192 Jun 9 10:50 /data/pg_data/global/pg_control
存储在pg_global表空间中,我们知道PG默认有两个表空间,一个是 pg_global,另一个是pg_default表空间,我们会在initdb初始化时形成。
postgres=# select * from pg_tablespace ;
oid | spcname | spcowner | spcacl | spcoptions
-------+--------------------+----------+-------------------+------------
1663 | pg_default | 10 | |
1664 | pg_global | 10 | |
16456 | tablespace_andyxi2 | 10 | | ##后面创建的表空间
16431 | tablespace_andyxi | 16389 | {andyxi=C/andyxi} | ##后面创建的表空间
(4 rows)
(2)查看pg_controldata文件的内容
有两种方式,
一种是通过SQL查询,如下:
select * from pg_catalog.pg_control_checkpoint();
select * from pg_catalog.pg_control_system();
select * from pg_catalog.pg_control_init(); -- 获取结果和pg_controldata相同
select * from pg_catalog.pg_control_recovery();
另一种是:
[postgres@pgccc01 bin]$ pg_controldata
pg_control version number: 1300
Catalog version number: 202209061
Database system identifier: 7238474806429805345
Database cluster state: in production
pg_control last modified: Fri 09 Jun 2023 10:50:42 AM CST
Latest checkpoint location: 0/10442160
Latest checkpoint's REDO location: 0/10442128
Latest checkpoint's REDO WAL file: 000000010000000000000010
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:810
Latest checkpoint's NextOID: 16471
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 717
Latest checkpoint's oldestXID's DB: 5
Latest checkpoint's oldestActiveXID: 810
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16392
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Fri 09 Jun 2023 10:49:11 AM CST
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: 86f315bd65a4f0a2eb33d3001509754570cfcef6f0858c1bf2759b1612615131
(3)我们现在模拟pg_control这个控制文件损坏,进一步分析和恢复
我们在做实验之前,来了解一下重建控制命令(PostgreSQL10版本以后):
[postgres@pgccc01 pg_wal]$ pg_resetwal --help
pg_resetwal resets the PostgreSQL write-ahead log.
Usage:
pg_resetwal [OPTION]... DATADIR
Options:
-c, --commit-timestamp-ids=XID,XID
set oldest and newest transactions bearing
commit timestamp (zero means no change)
[-D, --pgdata=]DATADIR data directory
-e, --epoch=XIDEPOCH set next transaction ID epoch
-f, --force force update to be done
-l, --next-wal-file=WALFILE set minimum starting location for new WAL
-m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID
-n, --dry-run no update, just show what would be done
-o, --next-oid=OID set next OID
-O, --multixact-offset=OFFSET set next multitransaction offset
-u, --oldest-transaction-id=XID set oldest transaction ID
-V, --version output version information, then exit
-x, --next-transaction-id=XID set next transaction ID
--wal-segsize=SIZE size of WAL segments, in megabytes
-?, --help show this help, then exit
Report bugs to <pgsql-bugs@lists.postgresql.org>.
PostgreSQL home page: <https://www.postgresql.org/>
同时,我们先了解下数据库目录文件有哪些:
[postgres@pgccc01 pg_data]$ ls -la
total 68
drwx------. 20 postgres postgres 4096 Jun 9 10:14 .
drwxr-xr-x. 9 postgres postgres 269 Jun 7 15:36 ..
drwx------. 7 postgres postgres 63 May 30 13:39 base
drwx------. 2 postgres postgres 4096 Jun 9 10:14 global
drwx------. 2 postgres postgres 6 May 29 13:39 pg_commit_ts
drwx------. 2 postgres postgres 6 May 29 13:39 pg_dynshmem
-rw-------. 1 postgres postgres 4789 May 29 13:39 pg_hba.conf
-rw-------. 1 postgres postgres 1636 Jun 8 15:39 pg_ident.conf
drwx------. 4 postgres postgres 68 Jun 9 10:49 pg_logical
drwx------. 4 postgres postgres 36 May 29 13:39 pg_multixact
drwx------. 2 postgres postgres 6 May 29 13:39 pg_notify
drwx------. 2 postgres postgres 6 May 29 13:39 pg_replslot
drwx------. 2 postgres postgres 6 May 29 13:39 pg_serial
drwx------. 2 postgres postgres 6 May 29 13:39 pg_snapshots
drwx------. 2 postgres postgres 6 Jun 9 10:14 pg_stat
drwx------. 2 postgres postgres 6 May 29 13:39 pg_stat_tmp
drwx------. 2 postgres postgres 18 May 29 13:39 pg_subtrans
drwx------. 2 postgres postgres 32 Jun 7 15:37 pg_tblspc
drwx------. 2 postgres postgres 6 May 29 13:39 pg_twophase
-rw-------. 1 postgres postgres 3 May 29 13:39 PG_VERSION
drwx------. 3 postgres postgres 92 Jun 9 10:50 pg_wal
drwx------. 2 postgres postgres 18 May 29 13:39 pg_xact
-rw-------. 1 postgres postgres 88 Jun 7 09:25 postgresql.auto.conf
-rw-------. 1 postgres postgres 29804 Jun 2 13:26 postgresql.conf
-rw-------. 1 postgres postgres 27 Jun 9 10:14 postmaster.opts
-rw-------. 1 postgres postgres 71 Jun 9 10:14 postmaster.pid
drwxrwxr-x. 3 postgres postgres 29 May 30 13:51 tablespace_andyxi
我们恢复控制文件,需要先了解一下这几个数据库目录文件
[postgres@pgccc01 pg_data]$ ls -la pg_xact/
total 12
drwx------. 2 postgres postgres 18 May 29 13:39 .
drwx------. 20 postgres postgres 4096 Jun 9 10:14 ..
-rw-------. 1 postgres postgres 8192 Jun 9 10:49 0000
[postgres@pgccc01 pg_data]$ ls -la pg_wal/
total 32772
drwx------. 3 postgres postgres 92 Jun 9 10:50 .
drwx------. 20 postgres postgres 4096 Jun 9 10:14 ..
-rw-------. 1 postgres postgres 16777216 Jun 9 10:50 000000010000000000000010
-rw-------. 1 postgres postgres 16777216 Jun 9 10:48 000000010000000000000011
drwx------. 2 postgres postgres 6 Jun 2 13:31 archive_status
[postgres@pgccc01 pg_data]$ ls -la pg_multixact/
total 4
drwx------. 4 postgres postgres 36 May 29 13:39 .
drwx------. 20 postgres postgres 4096 Jun 9 10:14 ..
drwx------. 2 postgres postgres 18 May 29 13:39 members
drwx------. 2 postgres postgres 18 May 29 13:39 offsets
[postgres@pgccc01 pg_data]$ ls -la pg_multixact/members/
total 8
drwx------. 2 postgres postgres 18 May 29 13:39 .
drwx------. 4 postgres postgres 36 May 29 13:39 ..
-rw-------. 1 postgres postgres 8192 May 29 13:39 0000
[postgres@pgccc01 pg_data]$ ls -la pg_multixact/offsets/
total 8
drwx------. 2 postgres postgres 18 May 29 13:39 .
drwx------. 4 postgres postgres 36 May 29 13:39 ..
-rw-------. 1 postgres postgres 8192 Jun 9 10:19 0000
控制文件的恢复涉及到16进制的转换问题,如:

模拟pg_control不存在

PostgreSQL不能启动


pg_resetwal最关键的步骤:
1.参数-l
-l, --next-wal-file=WALFILE set minimum starting location for new WAL
进入pg_wal(相当于redo log)
[postgres@pgccc01 pg_data]$ cd /data/pg_data/pg_wal/
[postgres@pgccc01 pg_wal]$ ls -la
total 32772
drwx------. 3 postgres postgres 92 Jun 9 10:50 .
drwx------. 20 postgres postgres 4096 Jun 9 13:31 ..
-rw-------. 1 postgres postgres 16777216 Jun 9 13:29 000000010000000000000010
-rw-------. 1 postgres postgres 16777216 Jun 9 10:48 000000010000000000000011
drwx------. 2 postgres postgres 6 Jun 2 13:31 archive_status
此时最新文件是000000010000000000000011 需要编号+1,000000010000000000000012(16进制)
2.参数-O
-O, --multixact-offset=OFFSET set next multitransaction offset

上述pg_multixact日志0000文件最大值为0,一般这个数不变
取最大值加1,然后乘以65536,再转换成16进制为10000,然后末尾添加5个0
-O=0x1000000000
3.参数-m
-m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID

日志0000文件最大值为0,一般这个数不变,取最大值编号后加1,然后末尾添加4个0
-m=0x00010000,0x00010000
4 参数-x
-x, --next-transaction-id=XID set next transaction ID

j最大文件编码取最大值加1,然后末尾添加5个0
-x=0x000100000
在global表空间创建pg_control空文件

最后用pg_resetwal命令加上相应参数进行恢复
pg_resetwal -l 000000010000000000000012 -O 0x1000000000 -m 0x00010000,0x00010000 -x 0x000100000 -f $PGDATA
因为我是在进程开启的情况下,模拟pg_control文件丢失的情况,但postmaster.pid仍存在,需要删除

删除postmaster.pid后,显示

这时pg_control回来了

pg_controldata也可以运行了
[postgres@pgccc01 pg_data]$ pg_controldata
PostgreSQL启动正常:
