Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition - Mailing list pgsql-bugs
From | tender wang |
---|---|
Subject | Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition |
Date | |
Msg-id | CAHewXNnayN3NM1HfaOCejk=sGfSva6ZDArWxKiTxL7PdDHRtMw@mail.gmail.com |
In response to | Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition (tender wang <tndrwang@gmail.com>) |
Responses |
Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
|
List | pgsql-bugs |
I tried to analyze the issue, and I found that it might be caused by this commit:
commit dad50f677c42de207168a3f08982ba23c9fc6720
bufmgr: Acquire and clean victim buffer separately
Before commit dad50f677, LocalBufferAlloc() performed the following operations:
/*
* it's all ours now.
*/
bufHdr->tag = newTag;
buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_IO_ERROR);
buf_state |= BM_TAG_VALID;
buf_state &= ~BUF_USAGECOUNT_MASK;
buf_state += BUF_USAGECOUNT_ONE;
After dad50f677, GetLocalVictimBuffer() no longer performs these operations, which appears to be why the issue I reported occurs.
In the failure I reported, frame 3 (ExtendBufferedRelLocal) shows:
(gdb) f 3
(gdb) p /x buf_state
$1 = 0x1000000
With this state, GetLocalVictimBuffer() never gets the chance to execute: buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
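To make the gdb value above easier to read, here is a minimal standalone sketch that decodes 0x1000000 and applies the reset quoted above. The bit layout is an assumption copied from src/include/storage/buf_internals.h as I understand it for the 16/17devel sources, so please verify it against your tree. The helper names reset_victim_state(), victim_is_clean() and check_reported_state() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMED bit layout, mirroring src/include/storage/buf_internals.h
 * (PostgreSQL 16 / 17devel era). Verify against your own source tree.
 */
#define BUF_USAGECOUNT_MASK 0x003C0000U
#define BUF_FLAG_MASK       0xFFC00000U
#define BM_DIRTY            (1U << 23)
#define BM_VALID            (1U << 24)
#define BM_TAG_VALID        (1U << 25)
#define BM_JUST_DIRTIED     (1U << 28)

/* Hypothetical helper: the flag and usage-count reset quoted above. */
static uint32_t
reset_victim_state(uint32_t buf_state)
{
    return buf_state & ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
}

/* Hypothetical helper: the condition asserted at localbuf.c:402. */
static int
victim_is_clean(uint32_t buf_state)
{
    return !(buf_state &
             (BM_VALID | BM_TAG_VALID | BM_DIRTY | BM_JUST_DIRTIED));
}

/* Self-check against the value gdb printed in frame 3. */
static void
check_reported_state(void)
{
    uint32_t reported = 0x1000000U;

    /* Under the assumed layout, only BM_VALID is set. */
    assert(reported == BM_VALID);
    /* A leftover BM_VALID is enough to trip the assertion. */
    assert(!victim_is_clean(reported));
    /* Once the reset is applied, the state would pass the check. */
    assert(victim_is_clean(reset_victim_state(reported)));
}
```

Calling check_reported_state() from a small main() should pass silently if the assumed layout is right. It only illustrates the bit arithmetic and does not say where in bufmgr the reset belongs.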
I tried to fix this in the attached patch, following the old LocalBufferAlloc() logic, but I don't fully understand all the details of bufmgr.
Any thoughts?
tender wang <tndrwang@gmail.com> wrote on Tue, Dec 26, 2023 at 18:51:
Thanks for the report. I can reproduce the bug on master, and I found another assertion failure when running the SQL below:
psql (17devel)
Type "help" for help.
postgres=# CREATE UNLOGGED TABLE filler(a int, b text STORAGE plain);
CREATE TABLE
postgres=# INSERT INTO filler SELECT g, repeat('x', 1000) FROM generate_series(1,
postgres(# 50000) g;
INSERT 0 50000
postgres=# CREATE TEMP TABLE tbl(a int);
CREATE TABLE
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
ERROR: could not extend file "base/5/t3_16389": No space left on device
HINT: Check free disk space.
postgres=# DROP TABLE filler;
DROP TABLE
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f9d3d8b1859 in __GI_abort () at abort.c:79
#2 0x000055f83501c868 in ExceptionalCondition (conditionName=0x55f8351fcb78 "!(buf_state & (BM_VALID | BM_TAG_VALID | BM_DIRTY | BM_JUST_DIRTIED))", fileName=0x55f8351fca4b "localbuf.c",
lineNumber=402) at assert.c:66
#3 0x000055f834df05ab in ExtendBufferedRelLocal (bmr=..., fork=MAIN_FORKNUM, flags=8, extend_by=1, extend_upto=4294967295, buffers=0x7ffff3ed1530, extended_by=0x7ffff3ed13fc)
at localbuf.c:402
#4 0x000055f834de7a0a in ExtendBufferedRelCommon (bmr=..., fork=MAIN_FORKNUM, strategy=0x0, flags=8, extend_by=1, extend_upto=4294967295, buffers=0x7ffff3ed1530, extended_by=0x7ffff3ed14dc)
at bufmgr.c:1828
#5 0x000055f834de6393 in ExtendBufferedRelBy (bmr=..., fork=MAIN_FORKNUM, strategy=0x0, flags=8, extend_by=1, buffers=0x7ffff3ed1530, extended_by=0x7ffff3ed14dc) at bufmgr.c:889
#6 0x000055f83492a240 in RelationAddBlocks (relation=0x7f9d325a7648, bistate=0x0, num_pages=1, use_fsm=true, did_unlock=0x7ffff3ed168d) at hio.c:342
#7 0x000055f83492ab67 in RelationGetBufferForTuple (relation=0x7f9d325a7648, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffff3ed1714, vmbuffer_other=0x0, num_pages=1)
at hio.c:768
#8 0x000055f834910840 in heap_insert (relation=0x7f9d325a7648, tup=0x55f83786e898, cid=0, options=0, bistate=0x0) at heapam.c:1853
#9 0x000055f834920cc0 in heapam_tuple_insert (relation=0x7f9d325a7648, slot=0x55f83786e808, cid=0, options=0, bistate=0x0) at heapam_handler.c:252
#10 0x000055f834bd582a in table_tuple_insert (rel=0x7f9d325a7648, slot=0x55f83786e808, cid=0, options=0, bistate=0x0) at ../../../src/include/access/tableam.h:1400
#11 0x000055f834bd7859 in ExecInsert (context=0x7ffff3ed1970, resultRelInfo=0x55f836fe5ed0, slot=0x55f83786e808, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0)
at nodeModifyTable.c:1133
#12 0x000055f834bdbbae in ExecModifyTable (pstate=0x55f836fe5cc0) at nodeModifyTable.c:3806
#13 0x000055f834b9a6cb in ExecProcNodeFirst (node=0x55f836fe5cc0) at execProcnode.c:464
#14 0x000055f834b8db69 in ExecProcNode (node=0x55f836fe5cc0) at ../../../src/include/executor/executor.h:273
#15 0x000055f834b9096f in ExecutePlan (estate=0x55f836fe5a30, planstate=0x55f836fe5cc0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0,
direction=ForwardScanDirection, dest=0x55f836ff4378, execute_once=true) at execMain.c:1670
#16 0x000055f834b8e20f in standard_ExecutorRun (queryDesc=0x55f836f35a20, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365
#17 0x000055f834b8e033 in ExecutorRun (queryDesc=0x55f836f35a20, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309
#18 0x000055f834e3f27a in ProcessQuery (plan=0x55f836ff4218, sourceText=0x55f836f0b4b0 "INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;", params=0x0, queryEnv=0x0,
dest=0x55f836ff4378, qc=0x7ffff3ed1dd0) at pquery.c:160
#19 0x000055f834e40d99 in PortalRunMulti (portal=0x55f836f86a00, isTopLevel=true, setHoldSnapshot=false, dest=0x55f836ff4378, altdest=0x55f836ff4378, qc=0x7ffff3ed1dd0) at pquery.c:1277
#20 0x000055f834e402bf in PortalRun (portal=0x55f836f86a00, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x55f836ff4378, altdest=0x55f836ff4378, qc=0x7ffff3ed1dd0)
at pquery.c:791
#21 0x000055f834e39478 in exec_simple_query (query_string=0x55f836f0b4b0 "INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;") at postgres.c:1273
#22 0x000055f834e3e105 in PostgresMain (dbname=0x55f836f42870 "postgres", username=0x55f836f42858 "gpadmin") at postgres.c:4653
#23 0x000055f834d63393 in BackendRun (port=0x55f836f39fd0) at postmaster.c:4422
#24 0x000055f834d62a4c in BackendStartup (port=0x55f836f39fd0) at postmaster.c:4101
#25 0x000055f834d5f358 in ServerLoop () at postmaster.c:1769
#26 0x000055f834d5ec7e in PostmasterMain (argc=3, argv=0x55f836f05b80) at postmaster.c:1468
#27 0x000055f834c1525d in main (argc=3, argv=0x55f836f05b80) at main.c:198
PG Bug reporting form <noreply@postgresql.org> wrote on Tue, Dec 26, 2023 at 17:32:
The following bug has been logged on the website:
Bug reference: 18259
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 16.1
Operating system: Ubuntu 22.04
Description:
The following script:
mkdir /tmp/100m
sudo mount -t tmpfs -o size=100M tmpfs /tmp/100m
export PGDATA=/tmp/100m/tmpdb
initdb
pg_ctl -l server.log start
cat << 'EOF' | psql
CREATE UNLOGGED TABLE filler(a int, b text STORAGE plain);
INSERT INTO filler SELECT g, repeat('x', 1000) FROM generate_series(1,
50000) g;
CREATE TEMP TABLE tbl(a int);
INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
DROP TABLE filler;
INSERT INTO tbl SELECT g from generate_series(1, 200000) g;
EOF
triggers an assertion failure following "no space left" errors:
...
CREATE TABLE
ERROR: could not extend file "base/5/t3_16391": No space left on device
HINT: Check free disk space.
ERROR: could not extend file "base/5/t3_16391": No space left on device
HINT: Check free disk space.
DROP TABLE
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
TRAP: failed Assert("buf_state & BM_TAG_VALID"), File: "localbuf.c", Line:
390, PID: 25978
The call stack of the failure is:
ExtendBufferedRelLocal at localbuf.c:391:4
ExtendBufferedRelCommon at bufmgr.c:1801:17
ExtendBufferedRelBy at bufmgr.c:862:9
RelationAddBlocks at hio.c:342:16
RelationGetBufferForTuple at hio.c:768:11
heap_insert at heapam.c:1862:11
heapam_tuple_insert at heapam_handler.c:253:2
table_tuple_insert at tableam.h:1402:1
ExecInsert at nodeModifyTable.c:1138:21
ExecModifyTable at nodeModifyTable.c:3810:12
ExecProcNodeFirst at execProcnode.c:465:1
ExecProcNode at executor.h:274:1
ExecutePlan at execMain.c:1670:10
standard_ExecutorRun at execMain.c:365:3
ExecutorRun at execMain.c:310:1
ProcessQuery at pquery.c:165:5
PortalRunMulti at pquery.c:1277:5
PortalRun at pquery.c:795:5
exec_simple_query at postgres.c:1274:10
PostgresMain at postgres.c:4641:27
ExitPostmaster at postmaster.c:5047:1
BackendStartup at postmaster.c:4196:5
ServerLoop at postmaster.c:1788:6
PostmasterMain at postmaster.c:1466:11
The first bad commit for this anomaly is 31966b15 (and exactly that commit
added the Assert).
With debug logging added in this code within ExtendBufferedRelLocal():
if (found)
{
    BufferDesc *existing_hdr = GetLocalBufferDescriptor(hresult->id);
    uint32 buf_state;

    UnpinLocalBuffer(BufferDescriptorGetBuffer(victim_buf_hdr));

    existing_hdr = GetLocalBufferDescriptor(hresult->id);
    PinLocalBuffer(existing_hdr, false);
    buffers[i] = BufferDescriptorGetBuffer(existing_hdr);

    buf_state = pg_atomic_read_u32(&existing_hdr->state);
    Assert(buf_state & BM_TAG_VALID);
    Assert(!(buf_state & BM_DIRTY));
    buf_state &= BM_VALID;
    pg_atomic_unlocked_write_u32(&existing_hdr->state, buf_state);
...
I see that it reached for the second INSERT (and NOSPC error) with
existing_hdr->state == 0x2040000, but for the third INSERT I observe
state == 0x0.
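The two observed values can be checked with a little bit arithmetic. The sketch below assumes the flag positions from src/include/storage/buf_internals.h as I understand them for PostgreSQL 16 (BM_VALID = 1 << 24, BM_TAG_VALID = 1 << 25, BUF_USAGECOUNT_ONE = 1 << 18); please verify them against your tree. The helper names apply_found_path_mask() and check_observed_states() are hypothetical. Under that assumption, applying the quoted `buf_state &= BM_VALID` to 0x2040000 yields 0x0, which is consistent with the state seen on the third INSERT, though it does not by itself prove that this statement is the cause.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMED bit positions, mirroring src/include/storage/buf_internals.h
 * (PostgreSQL 16 era). Verify against your own source tree.
 */
#define BUF_USAGECOUNT_ONE (1U << 18)
#define BM_VALID           (1U << 24)
#define BM_TAG_VALID       (1U << 25)

/* Hypothetical helper: the masking statement quoted in the snippet above. */
static uint32_t
apply_found_path_mask(uint32_t buf_state)
{
    return buf_state & BM_VALID;
}

/* Self-check against the two states logged in the report. */
static void
check_observed_states(void)
{
    uint32_t second_insert = 0x2040000U;
    uint32_t after_mask = apply_found_path_mask(second_insert);

    /* 0x2040000 decodes to BM_TAG_VALID plus a usage count of one. */
    assert(second_insert == (BM_TAG_VALID | BUF_USAGECOUNT_ONE));
    /* BM_VALID is not set, so the AND leaves nothing behind. */
    assert(after_mask == 0x0U);
    /* A later Assert(buf_state & BM_TAG_VALID) would fail on this value. */
    assert(!(after_mask & BM_TAG_VALID));
}
```

Calling check_observed_states() from a small main() should pass silently if the assumed bit positions are right.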