Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers
From | Alexander Lakhin
---|---
Subject | Re: refactoring relation extension and BufferAlloc(), faster COPY
Date |
Msg-id | 0b5eb82b-cb99-e0a4-b932-3dc60e2e3926@gmail.com
In response to | Re: refactoring relation extension and BufferAlloc(), faster COPY (Andres Freund <andres@anarazel.de>)
Responses | Re: refactoring relation extension and BufferAlloc(), faster COPY
List | pgsql-hackers
Hi Andres,

07.04.2023 11:39, Andres Freund wrote:
> Hi,
>
> On 2023-04-06 18:15:14 -0700, Andres Freund wrote:
>> I think it might be worth having a C test for some of the bufmgr.c API. Things
>> like testing that retrying a failed relation extension works the second time
>> round.
> A few hours after this I hit a stupid copy-pasto (21d7c05a5cf) that would
> hopefully have been uncovered by such a test...

A few days later I have found a new defect introduced with 31966b151.
The following script:

echo "
CREATE TABLE tbl(id int);
INSERT INTO tbl(id) SELECT i FROM generate_series(1, 1000) i;
DELETE FROM tbl;
CHECKPOINT;
" | psql -q
sleep 2
grep -C2 'automatic vacuum of table ".*.tbl"' server.log

tf=$(psql -Aqt -c "SELECT format('%s/%s', pg_database.oid, relfilenode) FROM pg_database, pg_class WHERE datname = current_database() AND relname = 'tbl'")
ls -l "$PGDB/base/$tf"

pg_ctl -D $PGDB stop -m immediate
pg_ctl -D $PGDB -l server.log start

with autovacuum enabled as follows:

autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_naptime = 1

gives:

2023-04-11 20:56:56.261 MSK [675708] LOG:  checkpoint starting: immediate force wait
2023-04-11 20:56:56.324 MSK [675708] LOG:  checkpoint complete: wrote 900 buffers (5.5%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.016 s, sync=0.034 s, total=0.063 s; sync files=252, longest=0.017 s, average=0.001 s; distance=4162 kB, estimate=4162 kB; lsn=0/1898588, redo lsn=0/1898550
2023-04-11 20:56:57.558 MSK [676060] LOG:  automatic vacuum of table "testdb.public.tbl": index scans: 0
        pages: 5 removed, 0 remain, 5 scanned (100.00% of total)
        tuples: 1000 removed, 0 remain, 0 are dead but not yet removable
-rw------- 1 law law 0 апр 11 20:56 .../tmpdb/base/16384/16385
waiting for server to shut down.... done
server stopped
waiting for server to start.... stopped waiting
pg_ctl: could not start server
Examine the log output.

The server stops with the following stack trace:

Core was generated by `postgres: startup recovering 000000010000000000000001 '.
Program terminated with signal SIGABRT, Aborted.
warning: Section `.reg-xstate/790626' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140209454906240) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140209454906240) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140209454906240) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140209454906240, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007f850ec53476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f850ec397f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000557950889c0b in ExceptionalCondition (
    conditionName=0x557950a67680 "mode == RBM_NORMAL || mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_ON_ERROR",
    fileName=0x557950a673e8 "bufmgr.c", lineNumber=1008) at assert.c:66
#6  0x000055795064f2d0 in ReadBuffer_common (smgr=0x557952739f38, relpersistence=112 'p', forkNum=MAIN_FORKNUM,
    blockNum=4294967295, mode=RBM_ZERO_AND_CLEANUP_LOCK, strategy=0x0, hit=0x7fff22dd648f) at bufmgr.c:1008
#7  0x000055795064ebe7 in ReadBufferWithoutRelcache (rlocator=..., forkNum=MAIN_FORKNUM, blockNum=4294967295,
    mode=RBM_ZERO_AND_CLEANUP_LOCK, strategy=0x0, permanent=true) at bufmgr.c:800
#8  0x000055795021c0fa in XLogReadBufferExtended (rlocator=..., forknum=MAIN_FORKNUM, blkno=0,
    mode=RBM_ZERO_AND_CLEANUP_LOCK, recent_buffer=0) at xlogutils.c:536
#9  0x000055795021bd92 in XLogReadBufferForRedoExtended (record=0x5579526c4998, block_id=0 '\000', mode=RBM_NORMAL,
    get_cleanup_lock=true, buf=0x7fff22dd6598) at xlogutils.c:391
#10 0x00005579501783b1 in heap_xlog_prune (record=0x5579526c4998) at heapam.c:8726
#11 0x000055795017b7db in heap2_redo (record=0x5579526c4998) at heapam.c:9960
#12 0x0000557950215b34 in ApplyWalRecord (xlogreader=0x5579526c4998, record=0x7f85053d0120, replayTLI=0x7fff22dd6720)
    at xlogrecovery.c:1915
#13 0x0000557950215611 in PerformWalRecovery () at xlogrecovery.c:1746
#14 0x0000557950201ce3 in StartupXLOG () at xlog.c:5433
#15 0x00005579505cb6d2 in StartupProcessMain () at startup.c:267
#16 0x00005579505be9f7 in AuxiliaryProcessMain (auxtype=StartupProcess) at auxprocess.c:141
#17 0x00005579505ca2b5 in StartChildProcess (type=StartupProcess) at postmaster.c:5369
#18 0x00005579505c5224 in PostmasterMain (argc=3, argv=0x5579526c3e70) at postmaster.c:1455
#19 0x000055795047a97d in main (argc=3, argv=0x5579526c3e70) at main.c:200

As far as I can see, autovacuum removes pages from the table file, and this causes the crash while replaying the record:

rmgr: Heap2  len (rec/tot): 60/988, tx: 0, lsn: 0/01898600, prev 0/01898588, desc: PRUNE snapshotConflictHorizon 732 nredirected 0 ndead 226, blkref #0: rel 1663/16384/16385 blk 0 FPW

Best regards,
Alexander
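P.S. To make the failure mode easier to see, below is a minimal standalone model of my reading of the backtrace (not the actual PostgreSQL code; only the mode names, the block number 4294967295 = P_NEW and the assertion text are taken from the trace). The PRUNE record targets block 0 of a relation that autovacuum has meanwhile truncated to zero blocks, so the redo path has to extend the relation and passes P_NEW down to ReadBuffer_common together with RBM_ZERO_AND_CLEANUP_LOCK, while the check at bufmgr.c:1008 only allows RBM_NORMAL, RBM_ZERO_AND_LOCK and RBM_ZERO_ON_ERROR on that path:

/*
 * prune_redo_model.c - simplified model of frames #6..#10 above,
 * NOT PostgreSQL source code.  Build: cc -o prune_redo_model prune_redo_model.c
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define P_NEW InvalidBlockNumber        /* "grow the relation" marker */

/* mode names as they appear in the assertion text in the core dump */
typedef enum
{
    RBM_NORMAL,
    RBM_ZERO_AND_LOCK,
    RBM_ZERO_AND_CLEANUP_LOCK,
    RBM_ZERO_ON_ERROR
} ReadBufferMode;

/* stand-in for the assertion at bufmgr.c:1008 (frame #6) */
static void
read_buffer_common_model(BlockNumber blkno, ReadBufferMode mode)
{
    if (blkno == P_NEW)
    {
        /* the extension branch only expects these three modes */
        assert(mode == RBM_NORMAL ||
               mode == RBM_ZERO_AND_LOCK ||
               mode == RBM_ZERO_ON_ERROR);
        printf("extended relation by one block\n");
    }
    else
        printf("read existing block %u\n", blkno);
}

/* stand-in for XLogReadBufferExtended (frame #8) */
static void
xlog_read_buffer_model(BlockNumber blkno, BlockNumber nblocks,
                       ReadBufferMode mode)
{
    if (blkno < nblocks)
        read_buffer_common_model(blkno, mode);
    else
    {
        /*
         * Block is past EOF: extend the relation one block at a time,
         * passing P_NEW and the caller's mode, as in frame #7 of the
         * backtrace (blockNum=4294967295, mode=RBM_ZERO_AND_CLEANUP_LOCK).
         */
        while (nblocks <= blkno)
        {
            read_buffer_common_model(P_NEW, mode);
            nblocks++;
        }
    }
}

int
main(void)
{
    /*
     * Autovacuum truncated the table file to 0 blocks, but the WAL still
     * contains a PRUNE record for block 0 that needs a cleanup lock,
     * so redo asks for RBM_ZERO_AND_CLEANUP_LOCK and the assert fires.
     */
    xlog_read_buffer_model(0 /* blkno */, 0 /* nblocks */,
                           RBM_ZERO_AND_CLEANUP_LOCK);
    return 0;
}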