
Re: Re: transaction lost when delete clog file after normal shutdown - Mailing list pgsql-hackers

From	章晨曦@易景科技
Subject	Re: Re: transaction lost when delete clog file after normal shutdown
Date	December 23 15:12:12
Msg-id	tencent_7C07ED7046EE5D834D708F31@qq.com
In response to	transaction lost when delete clog file after normal shutdown ("章晨曦@易景科技" <zhangchenxi@halodbtech.com>)
Responses	Re: transaction lost when delete clog file after normal shutdown
List	pgsql-hackers


Thanks, Tom.

But I think we could provide a better experience here. Consider the following example:

[jet@halodev-jet-01 data]$ psql
psql (16.6)
Type "help" for help.

postgres=# CREATE TABLE a_test (n INT);
CREATE TABLE
postgres=# INSERT INTO a_test VALUES (1);
INSERT 0 1
postgres=# 2024-12-23 16:56:11.023 CST [1356476] FATAL: terminating connection due to unexpected postmaster exit
postgres=#
postgres=# \q
[jet@halodev-jet-01 data]$

### Here we simulate a crash and clog file corruption (by deleting the clog file).

[jet@halodev-jet-01 data]$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2024-12-23 16:57:24.036 CST [1356495] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2024-12-23 16:57:24.036 CST [1356495] LOG: listening on IPv6 address "::1", port 5432
2024-12-23 16:57:24.036 CST [1356495] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-12-23 16:57:24.046 CST [1356495] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-12-23 16:57:24.055 CST [1356498] LOG: database system was interrupted; last known up at 2024-12-23 16:54:56 CST
2024-12-23 16:57:24.147 CST [1356498] LOG: database system was not properly shut down; automatic recovery in progress
2024-12-23 16:57:24.151 CST [1356498] LOG: redo starts at 0/14E4D20
2024-12-23 16:57:24.152 CST [1356498] LOG: file "pg_xact/0000" doesn't exist, reading as zeroes
2024-12-23 16:57:24.152 CST [1356498] CONTEXT: WAL redo at 0/14FCAB0 for Transaction/COMMIT: 2024-12-23 16:55:13.531244+08; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 16384
2024-12-23 16:57:24.152 CST [1356498] LOG: invalid record length at 0/14FCD20: expected at least 24, got 0
2024-12-23 16:57:24.152 CST [1356498] LOG: redo done at 0/14FCCE8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-12-23 16:57:24.157 CST [1356496] LOG: checkpoint starting: end-of-recovery immediate wait
2024-12-23 16:57:24.184 CST [1356496] LOG: checkpoint complete: wrote 27 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.014 s, total=0.030 s; sync files=22, longest=0.006 s, average=0.001 s; distance=96 kB, estimate=96 kB; lsn=0/14FCD20, redo lsn=0/14FCD20
2024-12-23 16:57:24.188 CST [1356495] LOG: database system is ready to accept connections
done
server started

[jet@halodev-jet-01 data]$ psql
psql (16.6)
Type "help" for help.

postgres=# SELECT * FROM a_test;
 n
---
 1
(1 row)

postgres=# \q

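We can see that when the database restarts, it runs crash recovery: it reports that pg_xact/0000 doesn't exist, reads it as zeroes, replays the WAL records (including the COMMIT), and the inserted row is still visible afterwards. So I think we could improve the database's reliability in scenarios where only the clog file is corrupted.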
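The log line `file "pg_xact/0000" doesn't exist, reading as zeroes` is easier to follow with the clog layout in mind: pg_xact stores two status bits per transaction ID, and the value 0 means "in progress", so a zero-filled page marks every transaction it covers as not committed until redo replays their commit or abort records. Redo only starts at the last checkpoint's redo point (0/14E4D20 in the log above), so transactions that committed before that checkpoint are not re-marked. The shell functions below are a minimal, hypothetical sketch (the names `clog_location`, `clog_status_name` and `clog_status` are invented for illustration, not PostgreSQL tools) of how an xid maps to a segment file, byte and bit, assuming the PostgreSQL 16 layout with the default 8 kB block size and 32 SLRU pages per segment file. They only read files.

```shell
# Sketch only: hypothetical helpers (not PostgreSQL tools) that locate and
# decode a transaction's 2-bit status in pg_xact. Assumes the PostgreSQL 16
# on-disk layout with default BLCKSZ=8192 and 32 SLRU pages per segment file.
BLCKSZ=8192
PAGES_PER_SEGMENT=32
XACTS_PER_BYTE=4                               # 2 status bits per transaction
XACTS_PER_PAGE=$(( BLCKSZ * XACTS_PER_BYTE ))  # 32768 transactions per page

# Print "<segment file name> <byte offset in that file> <bit shift>" for an xid.
clog_location() {
    xid=$1
    page=$(( xid / XACTS_PER_PAGE ))
    byte_in_page=$(( (xid % XACTS_PER_PAGE) / XACTS_PER_BYTE ))
    segment=$(( page / PAGES_PER_SEGMENT ))
    offset=$(( (page % PAGES_PER_SEGMENT) * BLCKSZ + byte_in_page ))
    bit_shift=$(( (xid % XACTS_PER_BYTE) * 2 ))
    printf '%04X %d %d\n' "$segment" "$offset" "$bit_shift"
}

# Map a 2-bit value to the status names used in PostgreSQL's clog.h.
clog_status_name() {
    case $1 in
        0) echo IN_PROGRESS ;;
        1) echo COMMITTED ;;
        2) echo ABORTED ;;
        3) echo SUB_COMMITTED ;;
    esac
}

# Print the status of an xid, given a pg_xact directory (e.g. "$PGDATA/pg_xact").
clog_status() {
    xid=$1
    dir=$2
    set -- $(clog_location "$xid")    # $1 = file, $2 = offset, $3 = bit shift
    if [ ! -f "$dir/$1" ]; then
        # Treat a missing segment as zeroes, as crash recovery does in the log
        # above; a server in normal operation reports an error instead.
        clog_status_name 0
        return
    fi
    # Read one unsigned byte at the offset; past end of file counts as zero.
    byte=$(od -An -tu1 -j "$2" -N1 "$dir/$1" 2>/dev/null | tr -d ' ')
    clog_status_name $(( (${byte:-0} >> $3) & 3 ))
}
```

For example, `clog_location 750` prints `0000 187 4`: xid 750 lives in segment file `0000`, byte 187, bits 4 and 5. Running `clog_status 750 "$PGDATA/pg_xact"` against a stopped server's data directory would then decode that transaction's recorded status.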

Regards,

Jet

On Mon, Dec 23, 2024 at 14:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"章晨曦@易景科技" <zhangchenxi@halodbtech.com> writes:
> And after a while, a system error occurred and, unfortunately, it corrupted the clog file.
> So we need to restore the database from backup just because a tiny clog file was corrupted.

I'm not seeing a large difference between this complaint
and whining because Unix doesn't have a way to recover from
"sudo rm -rf /". clog is critical data: if you mess with
it you will destroy your database. It is not the only
critical data in the system, either.

> Is there any chance to improve this?

We're not in the business of building doubly- or triply-redundant
storage. The cost/benefit just isn't attractive for very many people.
If you don't trust your hardware, you can put your storage on RAID,
or replicate the database, etc. If you have a DBA who thinks it's
cool to remove files they don't understand the purpose of, the answer
is to fire that DBA.

regards, tom lane
