Thread: transaction lost when delete clog file after normal shutdown
Hi there,
I notice a little strange things of clog behaviours.
When I create a test table, say, a_test, the table only contains a INT type column.
postgres=# CREATE TABLE a_test (n INT);
CREATE TABLE
and then insert one tuple:
postgres=# INSERT INTO a_test VALUES (1);
INSERT 0 1
And then quit, shutdown the database.
[jet@halodev-jet-01 pg_xact]$ pg_ctl stop
waiting for server to shut down....2024-12-23 12:05:46.568 CST [1340130] LOG: received fast shutdown request
2024-12-23 12:05:46.577 CST [1340130] LOG: aborting any active transactions
2024-12-23 12:05:46.579 CST [1340130] LOG: background worker "logical replication launcher" (PID 1340136) exited with exit code 1
2024-12-23 12:05:46.579 CST [1340131] LOG: shutting down
2024-12-23 12:05:46.583 CST [1340131] LOG: checkpoint starting: shutdown immediate
2024-12-23 12:05:46.604 CST [1340131] LOG: checkpoint complete: wrote 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.011 s, total=0.025 s; sync files=17, longest=0.005 s, average=0.001 s; distance=88 kB, estimate=88 kB; lsn=0/1554658, redo lsn=0/1554658
2024-12-23 12:05:46.607 CST [1340130] LOG: database system is shut down
done
server stopped
After database stopped, change to the pg_xact directory, and delete the clog file, and then create a dummy clog file.
[jet@halodev-jet-01 pg_xact]$ ls -ltr
total 8
-rw-rw-r--. 1 jet jet 8192 Dec 23 12:05 0000
[jet@halodev-jet-01 pg_xact]$ rm -f 0000
[jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
8+0 records in
8+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 8.5894e-05 s, 95.4 MB/s
And then start the database.
[jet@halodev-jet-01 pg_xact]$ pg_ctl start
waiting for server to start....2024-12-23 12:06:01.935 CST [1340148] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv6 address "::1", port 5432
2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-12-23 12:06:01.943 CST [1340148] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-12-23 12:06:01.950 CST [1340151] LOG: database system was shut down at 2024-12-23 12:05:46 CST
2024-12-23 12:06:01.956 CST [1340148] LOG: database system is ready to accept connections
done
server started
The database started normally. But the INSERT transaction was lost:
[jet@halodev-jet-01 pg_xact]$ psql
psql (16.6)
Type "help" for help.
postgres=# SELECT * FROM a_test;
n
---
(0 rows)
postgres=#
BUT! If I do the same steps above, and the only difference is after INSERT, I just do a simple query SELECT * FROM a_test; the INSERT transaction still exists. How could this happen? What's the difference after SELECT?
[jet@halodev-jet-01 pg_xact]$ psql
psql (16.6)
Type "help" for help.
postgres=# INSERT INTO a_test VALUES (1);
INSERT 0 1
postgres=# SELECT * FROM a_test;
n
---
1
(1 row)
postgres=# \q
[jet@halodev-jet-01 pg_xact]$ pg_ctl stop
waiting for server to shut down....2024-12-23 12:13:36.577 CST [1340148] LOG: received fast shutdown request
2024-12-23 12:13:36.583 CST [1340148] LOG: aborting any active transactions
2024-12-23 12:13:36.584 CST [1340148] LOG: background worker "logical replication launcher" (PID 1340154) exited with exit code 1
2024-12-23 12:13:36.585 CST [1340149] LOG: shutting down
2024-12-23 12:13:36.587 CST [1340149] LOG: checkpoint starting: shutdown immediate
2024-12-23 12:13:36.601 CST [1340149] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.006 s, total=0.017 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB; lsn=0/15548E8, redo lsn=0/15548E8
2024-12-23 12:13:36.603 CST [1340148] LOG: database system is shut down
done
server stopped
[jet@halodev-jet-01 pg_xact]$ rm -f 0000
[jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
8+0 records in
8+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 8.9862e-05 s, 91.2 MB/s
[jet@halodev-jet-01 pg_xact]$ pg_ctl start
waiting for server to start....2024-12-23 12:13:48.430 CST [1340205] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv6 address "::1", port 5432
2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-12-23 12:13:48.437 CST [1340205] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-12-23 12:13:48.445 CST [1340208] LOG: database system was shut down at 2024-12-23 12:13:36 CST
2024-12-23 12:13:48.451 CST [1340205] LOG: database system is ready to accept connections
done
server started
[jet@halodev-jet-01 pg_xact]$ psql
psql (16.6)
Type "help" for help.
postgres=# SELECT * FROM a_test;
n
---
1
(1 row)
Jet
Yes, i think you're right. The tuple will be set to HEAP_XMIN_COMMITTED when doing the visibility checking, but don't you think it's a little weird? Or may cause some confusion?
Thanks,
Jet
Junwang Zhao<zhjwpku@gmail.com> 在 2024年12月23日 周一 12:35 写道:
On Mon, Dec 23, 2024 at 12:17 PM 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote:
>
> Hi there,
>
> I notice a little strange things of clog behaviours.
>
> When I create a test table, say, a_test, the table only contains a INT type column.
>
> postgres=# CREATE TABLE a_test (n INT);
> CREATE TABLE
>
> and then insert one tuple:
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
>
> And then quit, shutdown the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:05:46.568 CST [1340130] LOG: received fast shutdown request
> 2024-12-23 12:05:46.577 CST [1340130] LOG: aborting any active transactions
> 2024-12-23 12:05:46.579 CST [1340130] LOG: background worker "logical replication launcher" (PID 1340136) exited with exit code 1
> 2024-12-23 12:05:46.579 CST [1340131] LOG: shutting down
> 2024-12-23 12:05:46.583 CST [1340131] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:05:46.604 CST [1340131] LOG: checkpoint complete: wrote 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.011 s, total=0.025 s; sync files=17, longest=0.005 s, average=0.001 s; distance=88 kB, estimate=88 kB; lsn=0/1554658, redo lsn=0/1554658
> 2024-12-23 12:05:46.607 CST [1340130] LOG: database system is shut down
> done
> server stopped
>
> After database stopped, change to the pg_xact directory, and delete the clog file, and then create a dummy clog file.
> [jet@halodev-jet-01 pg_xact]$ ls -ltr
> total 8
> -rw-rw-r--. 1 jet jet 8192 Dec 23 12:05 0000
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.5894e-05 s, 95.4 MB/s
>
> And then start the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:06:01.935 CST [1340148] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:06:01.943 CST [1340148] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:06:01.950 CST [1340151] LOG: database system was shut down at 2024-12-23 12:05:46 CST
> 2024-12-23 12:06:01.956 CST [1340148] LOG: database system is ready to accept connections
> done
> server started
>
> The database started normally. But the INSERT transaction was lost:
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> (0 rows)
>
> postgres=#
>
> BUT! If I do the same steps above, and the only difference is after INSERT, I just do a simple query SELECT * FROM a_test; the INSERT transaction still exists. How could this happen? What's the difference after SELECT?
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> postgres=# \q
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:13:36.577 CST [1340148] LOG: received fast shutdown request
> 2024-12-23 12:13:36.583 CST [1340148] LOG: aborting any active transactions
> 2024-12-23 12:13:36.584 CST [1340148] LOG: background worker "logical replication launcher" (PID 1340154) exited with exit code 1
> 2024-12-23 12:13:36.585 CST [1340149] LOG: shutting down
> 2024-12-23 12:13:36.587 CST [1340149] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:13:36.601 CST [1340149] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.006 s, total=0.017 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB; lsn=0/15548E8, redo lsn=0/15548E8
> 2024-12-23 12:13:36.603 CST [1340148] LOG: database system is shut down
> done
> server stopped
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.9862e-05 s, 91.2 MB/s
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:13:48.430 CST [1340205] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:13:48.437 CST [1340205] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:13:48.445 CST [1340208] LOG: database system was shut down at 2024-12-23 12:13:36 CST
> 2024-12-23 12:13:48.451 CST [1340205] LOG: database system is ready to accept connections
> done
> server started
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> Jet
>
>
>
>
>
Because after you do a select, the commit info of xmin/xmax has been set
to the tuple header, you don't need to check clog after that.
You can use the *pageinspect* extension to verify that.
--
Regards
Junwang Zhao
>
> Hi there,
>
> I notice a little strange things of clog behaviours.
>
> When I create a test table, say, a_test, the table only contains a INT type column.
>
> postgres=# CREATE TABLE a_test (n INT);
> CREATE TABLE
>
> and then insert one tuple:
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
>
> And then quit, shutdown the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:05:46.568 CST [1340130] LOG: received fast shutdown request
> 2024-12-23 12:05:46.577 CST [1340130] LOG: aborting any active transactions
> 2024-12-23 12:05:46.579 CST [1340130] LOG: background worker "logical replication launcher" (PID 1340136) exited with exit code 1
> 2024-12-23 12:05:46.579 CST [1340131] LOG: shutting down
> 2024-12-23 12:05:46.583 CST [1340131] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:05:46.604 CST [1340131] LOG: checkpoint complete: wrote 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.011 s, total=0.025 s; sync files=17, longest=0.005 s, average=0.001 s; distance=88 kB, estimate=88 kB; lsn=0/1554658, redo lsn=0/1554658
> 2024-12-23 12:05:46.607 CST [1340130] LOG: database system is shut down
> done
> server stopped
>
> After database stopped, change to the pg_xact directory, and delete the clog file, and then create a dummy clog file.
> [jet@halodev-jet-01 pg_xact]$ ls -ltr
> total 8
> -rw-rw-r--. 1 jet jet 8192 Dec 23 12:05 0000
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.5894e-05 s, 95.4 MB/s
>
> And then start the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:06:01.935 CST [1340148] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:06:01.943 CST [1340148] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:06:01.950 CST [1340151] LOG: database system was shut down at 2024-12-23 12:05:46 CST
> 2024-12-23 12:06:01.956 CST [1340148] LOG: database system is ready to accept connections
> done
> server started
>
> The database started normally. But the INSERT transaction was lost:
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> (0 rows)
>
> postgres=#
>
> BUT! If I do the same steps above, and the only difference is after INSERT, I just do a simple query SELECT * FROM a_test; the INSERT transaction still exists. How could this happen? What's the difference after SELECT?
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> postgres=# \q
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:13:36.577 CST [1340148] LOG: received fast shutdown request
> 2024-12-23 12:13:36.583 CST [1340148] LOG: aborting any active transactions
> 2024-12-23 12:13:36.584 CST [1340148] LOG: background worker "logical replication launcher" (PID 1340154) exited with exit code 1
> 2024-12-23 12:13:36.585 CST [1340149] LOG: shutting down
> 2024-12-23 12:13:36.587 CST [1340149] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:13:36.601 CST [1340149] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.006 s, total=0.017 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB; lsn=0/15548E8, redo lsn=0/15548E8
> 2024-12-23 12:13:36.603 CST [1340148] LOG: database system is shut down
> done
> server stopped
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.9862e-05 s, 91.2 MB/s
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:13:48.430 CST [1340205] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:13:48.437 CST [1340205] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:13:48.445 CST [1340208] LOG: database system was shut down at 2024-12-23 12:13:36 CST
> 2024-12-23 12:13:48.451 CST [1340205] LOG: database system is ready to accept connections
> done
> server started
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> Jet
>
>
>
>
>
Because after you do a select, the commit info of xmin/xmax has been set
to the tuple header, you don't need to check clog after that.
You can use the *pageinspect* extension to verify that.
--
Regards
Junwang Zhao
But think about such a scenario, after INSERT some tuples, and COMMIT also succeed.
And after a while, a system error occurred and unfortunately, just caused clog file corrupted.
So we need to restore the database from backup just because of the tiny clog file corrupted.
Is there any chance to improve this?
Junwang Zhao<zhjwpku@gmail.com> 在 2024年12月23日 周一 13:53 写道:
On Mon, Dec 23, 2024 at 1:43 PM 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote:
>
> Yes, i think you're right. The tuple will be set to HEAP_XMIN_COMMITTED when doing the visibility checking, but don't you think it's a little weird? Or may cause some confusion?
This is kind of an optimization, when there are lots of modifications
in a single
transaction, we don't know the transaction status until the last commit/abort,
we only need to do one write to clog instead of updating the visibility info of
all tuples, so no need to bookkeeping all the modified tuples somewhere.
>
> Yes, i think you're right. The tuple will be set to HEAP_XMIN_COMMITTED when doing the visibility checking, but don't you think it's a little weird? Or may cause some confusion?
This is kind of an optimization, when there are lots of modifications
in a single
transaction, we don't know the transaction status until the last commit/abort,
we only need to do one write to clog instead of updating the visibility info of
all tuples, so no need to bookkeeping all the modified tuples somewhere.
"=?utf-8?B?56ug5pmo5pumQOaYk+aZr+enkeaKgA==?=" <zhangchenxi@halodbtech.com> writes: > And after a while, a system error occurred and unfortunately, just caused clog file corrupted. > So we need to restore the database from backup just because of the tiny clog file corrupted. I'm not seeing a large difference between this complaint and whining because Unix doesn't have a way to recover from "sudo rm -rf /". clog is critical data: if you mess with it you will destroy your database. It is not the only critical data in the system, either. > Is there any chance to improve this? We're not in the business of building doubly- or triply-redundant storage. The cost/benefit just isn't attractive for very many people. If you don't trust your hardware, you can put your storage on RAID, or replicate the database, etc. If you have a DBA who thinks it's cool to remove files they don't understand the purpose of, the answer is to fire that DBA. regards, tom lane
Thanks tom.
But what I think is we may provide a better experience. Consider the below example:
[jet@halodev-jet-01 data]$ psql
postgres=# \q
psql (16.6)
Type "help" for help.
postgres=# CREATE TABLE a_test (n INT);
CREATE TABLE
postgres=# INSERT INTO a_test VALUES (1);
INSERT 0 1
postgres=# 2024-12-23 16:56:11.023 CST [1356476] FATAL: terminating connection due to unexpected postmaster exit
postgres=#
postgres=# \q
[jet@halodev-jet-01 data]$
### Here we simulate crash and clog file corrupt (delete the clog file).
[jet@halodev-jet-01 data]$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2024-12-23 16:57:24.036 CST [1356495] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2024-12-23 16:57:24.036 CST [1356495] LOG: listening on IPv6 address "::1", port 5432
2024-12-23 16:57:24.036 CST [1356495] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-12-23 16:57:24.046 CST [1356495] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-12-23 16:57:24.055 CST [1356498] LOG: database system was interrupted; last known up at 2024-12-23 16:54:56 CST
2024-12-23 16:57:24.147 CST [1356498] LOG: database system was not properly shut down; automatic recovery in progress
2024-12-23 16:57:24.151 CST [1356498] LOG: redo starts at 0/14E4D20
2024-12-23 16:57:24.152 CST [1356498] LOG: file "pg_xact/0000" doesn't exist, reading as zeroes
2024-12-23 16:57:24.152 CST [1356498] CONTEXT: WAL redo at 0/14FCAB0 for Transaction/COMMIT: 2024-12-23 16:55:13.531244+08; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 16384
2024-12-23 16:57:24.152 CST [1356498] LOG: invalid record length at 0/14FCD20: expected at least 24, got 0
2024-12-23 16:57:24.152 CST [1356498] LOG: redo done at 0/14FCCE8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-12-23 16:57:24.157 CST [1356496] LOG: checkpoint starting: end-of-recovery immediate wait
2024-12-23 16:57:24.184 CST [1356496] LOG: checkpoint complete: wrote 27 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.014 s, total=0.030 s; sync files=22, longest=0.006 s, average=0.001 s; distance=96 kB, estimate=96 kB; lsn=0/14FCD20, redo lsn=0/14FCD20
2024-12-23 16:57:24.188 CST [1356495] LOG: database system is ready to accept connections
done
server started
[jet@halodev-jet-01 data]$ psql
psql (16.6)
Type "help" for help.
postgres=# SELECT * FROM a_test;
n
---
1
(1 row)
We can see that when database restart, it will try to recover. So I think we may improve database reliable in some scenarios if just clog file corrupted.
Regards,
Jet
Tom Lane<tgl@sss.pgh.pa.us> 在 2024年12月23日 周一 14:50 写道:
"章晨曦@易景科技" <zhangchenxi@halodbtech.com> writes:
> And after a while, a system error occurred and unfortunately, just caused clog file corrupted.
> So we need to restore the database from backup just because of the tiny clog file corrupted.
I'm not seeing a large difference between this complaint
and whining because Unix doesn't have a way to recover from
"sudo rm -rf /". clog is critical data: if you mess with
it you will destroy your database. It is not the only
critical data in the system, either.
> Is there any chance to improve this?
We're not in the business of building doubly- or triply-redundant
storage. The cost/benefit just isn't attractive for very many people.
If you don't trust your hardware, you can put your storage on RAID,
or replicate the database, etc. If you have a DBA who thinks it's
cool to remove files they don't understand the purpose of, the answer
is to fire that DBA.
regards, tom lane
> And after a while, a system error occurred and unfortunately, just caused clog file corrupted.
> So we need to restore the database from backup just because of the tiny clog file corrupted.
I'm not seeing a large difference between this complaint
and whining because Unix doesn't have a way to recover from
"sudo rm -rf /". clog is critical data: if you mess with
it you will destroy your database. It is not the only
critical data in the system, either.
> Is there any chance to improve this?
We're not in the business of building doubly- or triply-redundant
storage. The cost/benefit just isn't attractive for very many people.
If you don't trust your hardware, you can put your storage on RAID,
or replicate the database, etc. If you have a DBA who thinks it's
cool to remove files they don't understand the purpose of, the answer
is to fire that DBA.
regards, tom lane
> On 23 Dec 2024, at 14:12, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote: > > we simulate crash and clog file corrupt (delete the clog file) Clog file cannot disappear as a result of a crash. What makes you think otherwise? Best regards, Andrey Borodin.
S1:
When database normal shutdown, and clog file missing, the database cannot restart. And if make a zero clog file, database started but may cause transactions lost.
S2:
When database crashed, and clog file missing, when database restart, it will try to recover. And everything is ok
So I think we may improve the database more reliable in some scenarios, e.g. Only clog file corrupted or missing, like S1
Regards,
Jet
Andrey Borodin<x4mmm@yandex-team.ru> 在 2024年12月23日 周一 17:51 写道:
> On 23 Dec 2024, at 14:12, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote:
>
> we simulate crash and clog file corrupt (delete the clog file)
Clog file cannot disappear as a result of a crash. What makes you think otherwise?
Best regards, Andrey Borodin.
> On 23 Dec 2024, at 15:03, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote: > > S1: > When database normal shutdown, and clog file missing, the database cannot restart. And if make a zero clog file, databasestarted but may cause transactions lost. > > S2: > When database crashed, and clog file missing, when database restart, it will try to recover. And everything is ok > > So I think we may improve the database more reliable in some scenarios, e.g. Only clog file corrupted or missing, likeS1 I still do not get it. Why clog file would be missing? Best regards, Andrey Borodin.
Human errors, disk errors, or even cosmic rays ...
Regards,
Jet
Andrey Borodin<x4mmm@yandex-team.ru> 在 2024年12月23日 周一 18:15 写道:
> On 23 Dec 2024, at 15:03, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote:
>
> S1:
> When database normal shutdown, and clog file missing, the database cannot restart. And if make a zero clog file, database started but may cause transactions lost.
>
> S2:
> When database crashed, and clog file missing, when database restart, it will try to recover. And everything is ok
>
> So I think we may improve the database more reliable in some scenarios, e.g. Only clog file corrupted or missing, like S1
I still do not get it. Why clog file would be missing?
Best regards, Andrey Borodin.
> On 23 Dec 2024, at 11:36, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote: > Human errors, disk errors, or even cosmic rays ... That sounds exactly like the scenario which backups are made for. In all these error cases there is no way of being sure that no other part of the system has been compromised, so you restore from backup with WAL replay. -- Daniel Gustafsson
> On 23 Dec 2024, at 15:36, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote: > > disk errors, or even cosmic rays ... It is file system responsibility to protect directory structure. > Human errors, We cannot protect database from this with technical measures. I concur with Tom about firing DBA who removed xact log file. Best regards, Andrey Borodin.
Yes, of course we can solve this by restoring from backup.
But if the database volumn is large, say, 100TB or more, the cost
is really too expensive just because the tiny clog file corrupt.
Regards,
Jet
Daniel Gustafsson<daniel@yesql.se> 在 2024年12月23日 周一 18:43 写道:
> On 23 Dec 2024, at 11:36, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote:
> Human errors, disk errors, or even cosmic rays ...
That sounds exactly like the scenario which backups are made for. In all these
error cases there is no way of being sure that no other part of the system has
been compromised, so you restore from backup with WAL replay.
--
Daniel Gustafsson
> Human errors, disk errors, or even cosmic rays ...
That sounds exactly like the scenario which backups are made for. In all these
error cases there is no way of being sure that no other part of the system has
been compromised, so you restore from backup with WAL replay.
--
Daniel Gustafsson
> On 23 Dec 2024, at 16:01, 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote: > > Yes, of course we can solve this by restoring from backup. > But if the database volumn is large, say, 100TB or more, the cost > is really too expensive just because the tiny clog file corrupt. There are corresponding commitfest entries [0,1], where clog file corruption was discussed. See Emails section. Thanks! Best regards Andrey Borodin. [0] https://commitfest.postgresql.org/16/1462/ [1] https://commitfest.postgresql.org/51/4709/
On 12/23/24 06:01, 章晨曦@易景科技 wrote: > Yes, of course we can solve this by restoring from backup. > But if the database volumn is large, say, 100TB or more, the cost > is really too expensive just because the tiny clog file corrupt. Why would you trust the other 99.999 TB if something corrupted the clog file? Regards, Jan > > Regards, > Jet > > Daniel Gustafsson<daniel@yesql.se> 在2024年12月23日 周一 18:43 写道: > > On 23 Dec 2024, at 11:36, 章晨曦@易景科技 > <zhangchenxi@halodbtech.com> wrote: > > > Human errors, disk errors, or even cosmic rays ... > > That sounds exactly like the scenario which backups are made for. In > all these > error cases there is no way of being sure that no other part of the > system has > been compromised, so you restore from backup with WAL replay. > > -- > Daniel Gustafsson >
>Why would you trust the other 99.999 TB if >something corrupted the clog
>file?
+1 >file?
On Tue, 24 Dec 2024 at 02:06, Jan Wieck <jan@wi3ck.info> wrote:
On 12/23/24 06:01, 章晨曦@易景科技 wrote:
> Yes, of course we can solve this by restoring from backup.
> But if the database volumn is large, say, 100TB or more, the cost
> is really too expensive just because the tiny clog file corrupt.
Why would you trust the other 99.999 TB if something corrupted the clog
file?
Regards, Jan
>
> Regards,
> Jet
>
> Daniel Gustafsson<daniel@yesql.se> 在2024年12月23日 周一 18:43 写道:
> > On 23 Dec 2024, at 11:36, 章晨曦@易景科技
> <zhangchenxi@halodbtech.com> wrote:
>
> > Human errors, disk errors, or even cosmic rays ...
>
> That sounds exactly like the scenario which backups are made for. In
> all these
> error cases there is no way of being sure that no other part of the
> system has
> been compromised, so you restore from backup with WAL replay.
>
> --
> Daniel Gustafsson
>
HI Jan Wieck
> Why would you trust the other 99.999 TB if something corrupted the clog
> file?However, on the other hand, oracle has many solutions to open the database after the data files are damaged, and his intention should be to start the database even if some critical files are damaged to salvage the data inside.Because there's a lot of that going on in the real world, and you can't change that by firing the dba.
On Tue, Dec 24, 2024 at 9:45 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>Why would you trust the other 99.999 TB if >something corrupted the clog+1
>file?On Tue, 24 Dec 2024 at 02:06, Jan Wieck <jan@wi3ck.info> wrote:On 12/23/24 06:01, 章晨曦@易景科技 wrote:
> Yes, of course we can solve this by restoring from backup.
> But if the database volumn is large, say, 100TB or more, the cost
> is really too expensive just because the tiny clog file corrupt.
Why would you trust the other 99.999 TB if something corrupted the clog
file?
Regards, Jan
>
> Regards,
> Jet
>
> Daniel Gustafsson<daniel@yesql.se> 在2024年12月23日 周一 18:43 写道:
> > On 23 Dec 2024, at 11:36, 章晨曦@易景科技
> <zhangchenxi@halodbtech.com> wrote:
>
> > Human errors, disk errors, or even cosmic rays ...
>
> That sounds exactly like the scenario which backups are made for. In
> all these
> error cases there is no way of being sure that no other part of the
> system has
> been compromised, so you restore from backup with WAL replay.
>
> --
> Daniel Gustafsson
>
Hi Michael Paquier
Thank you for the information you provided,
Thanks
Thank you for the information you provided,
Thanks
On Tue, 24 Dec 2024 at 13:13, Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Dec 24, 2024 at 09:55:09AM +0800, wenhui qiu wrote:
> However, on the other hand, oracle has many solutions to open the database
> after the data files are damaged, and his intention should be to start the
> database even if some critical files are damaged to salvage the data
> inside.Because there's a lot of that going on in the real world, and you
> can't change that by firing the dba.
So does Postgres, at least partially depending on the state of the
cluster (for example, see ways to bypass catalog indexes to be able to
log in). FWIW, I can be easily convinced that more tooling in this
area to help folks do low-level chirurgy on the on-disk files of a
data folder while a server is running is cool to have, like
pg_surgery:
https://www.postgresql.org/docs/devel/pgsurgery.html
If you have suggestions about ways that would help, feel free to
propose them.
Anyway, if it comes to corruption, these tools should only be used if
you don't have a backup, and only to retrieve data from a data folder
to then do a logical copy of it to a freshly initialized data folder,
most likely on a different host or partition, if you suspect that your
disk is at fault for example.
If you see an issue in the backend code, even a tiny window where we
could lose data because we are missing a flush or manipulate files so
as consistency is not guaranteed post-crash, that would be worth
discussing on the ground of being a legit bug. Manual removal of
on-disk files is not that.
--
Michael