Re: Re: transaction lost when delete clog file after normal shutdown - Mailing list pgsql-hackers
From | 章晨曦@易景科技 |
---|---|
Subject | Re: Re: transaction lost when delete clog file after normal shutdown |
Date | |
Msg-id | tencent_4EC65210183836C77A44FCA3@qq.com Whole thread Raw |
In response to | transaction lost when delete clog file after normal shutdown ("章晨曦@易景科技" <zhangchenxi@halodbtech.com>) |
List | pgsql-hackers |
Yes, i think you're right. The tuple will be set to HEAP_XMIN_COMMITTED when doing the visibility checking, but don't you think it's a little weird? Or may cause some confusion?
Thanks,
Jet
Junwang Zhao<zhjwpku@gmail.com> 在 2024年12月23日 周一 12:35 写道:
On Mon, Dec 23, 2024 at 12:17 PM 章晨曦@易景科技 <zhangchenxi@halodbtech.com> wrote:
>
> Hi there,
>
> I notice a little strange things of clog behaviours.
>
> When I create a test table, say, a_test, the table only contains a INT type column.
>
> postgres=# CREATE TABLE a_test (n INT);
> CREATE TABLE
>
> and then insert one tuple:
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
>
> And then quit, shutdown the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:05:46.568 CST [1340130] LOG: received fast shutdown request
> 2024-12-23 12:05:46.577 CST [1340130] LOG: aborting any active transactions
> 2024-12-23 12:05:46.579 CST [1340130] LOG: background worker "logical replication launcher" (PID 1340136) exited with exit code 1
> 2024-12-23 12:05:46.579 CST [1340131] LOG: shutting down
> 2024-12-23 12:05:46.583 CST [1340131] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:05:46.604 CST [1340131] LOG: checkpoint complete: wrote 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.011 s, total=0.025 s; sync files=17, longest=0.005 s, average=0.001 s; distance=88 kB, estimate=88 kB; lsn=0/1554658, redo lsn=0/1554658
> 2024-12-23 12:05:46.607 CST [1340130] LOG: database system is shut down
> done
> server stopped
>
> After database stopped, change to the pg_xact directory, and delete the clog file, and then create a dummy clog file.
> [jet@halodev-jet-01 pg_xact]$ ls -ltr
> total 8
> -rw-rw-r--. 1 jet jet 8192 Dec 23 12:05 0000
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.5894e-05 s, 95.4 MB/s
>
> And then start the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:06:01.935 CST [1340148] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:06:01.943 CST [1340148] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:06:01.950 CST [1340151] LOG: database system was shut down at 2024-12-23 12:05:46 CST
> 2024-12-23 12:06:01.956 CST [1340148] LOG: database system is ready to accept connections
> done
> server started
>
> The database started normally. But the INSERT transaction was lost:
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> (0 rows)
>
> postgres=#
>
> BUT! If I do the same steps above, and the only difference is after INSERT, I just do a simple query SELECT * FROM a_test; the INSERT transaction still exists. How could this happen? What's the difference after SELECT?
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> postgres=# \q
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:13:36.577 CST [1340148] LOG: received fast shutdown request
> 2024-12-23 12:13:36.583 CST [1340148] LOG: aborting any active transactions
> 2024-12-23 12:13:36.584 CST [1340148] LOG: background worker "logical replication launcher" (PID 1340154) exited with exit code 1
> 2024-12-23 12:13:36.585 CST [1340149] LOG: shutting down
> 2024-12-23 12:13:36.587 CST [1340149] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:13:36.601 CST [1340149] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.006 s, total=0.017 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB; lsn=0/15548E8, redo lsn=0/15548E8
> 2024-12-23 12:13:36.603 CST [1340148] LOG: database system is shut down
> done
> server stopped
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.9862e-05 s, 91.2 MB/s
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:13:48.430 CST [1340205] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:13:48.437 CST [1340205] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:13:48.445 CST [1340208] LOG: database system was shut down at 2024-12-23 12:13:36 CST
> 2024-12-23 12:13:48.451 CST [1340205] LOG: database system is ready to accept connections
> done
> server started
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> Jet
>
>
>
>
>
Because after you do a select, the commit info of xmin/xmax has been set
to the tuple header, you don't need to check clog after that.
You can use the *pageinspect* extension to verify that.
--
Regards
Junwang Zhao
>
> Hi there,
>
> I notice a little strange things of clog behaviours.
>
> When I create a test table, say, a_test, the table only contains a INT type column.
>
> postgres=# CREATE TABLE a_test (n INT);
> CREATE TABLE
>
> and then insert one tuple:
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
>
> And then quit, shutdown the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:05:46.568 CST [1340130] LOG: received fast shutdown request
> 2024-12-23 12:05:46.577 CST [1340130] LOG: aborting any active transactions
> 2024-12-23 12:05:46.579 CST [1340130] LOG: background worker "logical replication launcher" (PID 1340136) exited with exit code 1
> 2024-12-23 12:05:46.579 CST [1340131] LOG: shutting down
> 2024-12-23 12:05:46.583 CST [1340131] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:05:46.604 CST [1340131] LOG: checkpoint complete: wrote 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.011 s, total=0.025 s; sync files=17, longest=0.005 s, average=0.001 s; distance=88 kB, estimate=88 kB; lsn=0/1554658, redo lsn=0/1554658
> 2024-12-23 12:05:46.607 CST [1340130] LOG: database system is shut down
> done
> server stopped
>
> After database stopped, change to the pg_xact directory, and delete the clog file, and then create a dummy clog file.
> [jet@halodev-jet-01 pg_xact]$ ls -ltr
> total 8
> -rw-rw-r--. 1 jet jet 8192 Dec 23 12:05 0000
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.5894e-05 s, 95.4 MB/s
>
> And then start the database.
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:06:01.935 CST [1340148] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:06:01.936 CST [1340148] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:06:01.943 CST [1340148] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:06:01.950 CST [1340151] LOG: database system was shut down at 2024-12-23 12:05:46 CST
> 2024-12-23 12:06:01.956 CST [1340148] LOG: database system is ready to accept connections
> done
> server started
>
> The database started normally. But the INSERT transaction was lost:
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> (0 rows)
>
> postgres=#
>
> BUT! If I do the same steps above, and the only difference is after INSERT, I just do a simple query SELECT * FROM a_test; the INSERT transaction still exists. How could this happen? What's the difference after SELECT?
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# INSERT INTO a_test VALUES (1);
> INSERT 0 1
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> postgres=# \q
> [jet@halodev-jet-01 pg_xact]$ pg_ctl stop
> waiting for server to shut down....2024-12-23 12:13:36.577 CST [1340148] LOG: received fast shutdown request
> 2024-12-23 12:13:36.583 CST [1340148] LOG: aborting any active transactions
> 2024-12-23 12:13:36.584 CST [1340148] LOG: background worker "logical replication launcher" (PID 1340154) exited with exit code 1
> 2024-12-23 12:13:36.585 CST [1340149] LOG: shutting down
> 2024-12-23 12:13:36.587 CST [1340149] LOG: checkpoint starting: shutdown immediate
> 2024-12-23 12:13:36.601 CST [1340149] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.006 s, total=0.017 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB; lsn=0/15548E8, redo lsn=0/15548E8
> 2024-12-23 12:13:36.603 CST [1340148] LOG: database system is shut down
> done
> server stopped
> [jet@halodev-jet-01 pg_xact]$ rm -f 0000
> [jet@halodev-jet-01 pg_xact]$ dd if=/dev/zero of=./0000 bs=1024 count=8
> 8+0 records in
> 8+0 records out
> 8192 bytes (8.2 kB, 8.0 KiB) copied, 8.9862e-05 s, 91.2 MB/s
> [jet@halodev-jet-01 pg_xact]$ pg_ctl start
> waiting for server to start....2024-12-23 12:13:48.430 CST [1340205] LOG: starting PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv6 address "::1", port 5432
> 2024-12-23 12:13:48.430 CST [1340205] LOG: listening on IPv4 address "127.0.0.1", port 5432
> 2024-12-23 12:13:48.437 CST [1340205] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2024-12-23 12:13:48.445 CST [1340208] LOG: database system was shut down at 2024-12-23 12:13:36 CST
> 2024-12-23 12:13:48.451 CST [1340205] LOG: database system is ready to accept connections
> done
> server started
> [jet@halodev-jet-01 pg_xact]$ psql
> psql (16.6)
> Type "help" for help.
>
> postgres=# SELECT * FROM a_test;
> n
> ---
> 1
> (1 row)
>
> Jet
>
>
>
>
>
Because after you do a select, the commit info of xmin/xmax has been set
to the tuple header, you don't need to check clog after that.
You can use the *pageinspect* extension to verify that.
--
Regards
Junwang Zhao
pgsql-hackers by date: