Panic during xlog building with big values - Mailing list pgsql-hackers
From: Maksim.Melnikov
Subject: Panic during xlog building with big values
Date:
Msg-id: 68fa5363-5f1e-4a9e-8f10-134a5b37c98b@postgrespro.ru
List: pgsql-hackers
Hello, during testing we found the following issue when updating records with big values:

2025-07-07 14:40:30.434 MSK [125435] PANIC: oversized WAL record
2025-07-07 14:40:30.434 MSK [125435] DETAIL: WAL record would be 1073742015 bytes (of maximum 1069547520 bytes); rmid 10 flags 64.

Tested commit: 62a17a92833d1eaa60d8ea372663290942a1e8eb

Test description:

Set wal_level = logical in postgresql.conf, then run:

CREATE DATABASE regression_big_values WITH TEMPLATE = template0 ENCODING = 'UTF8';
\c regression_big_values
CREATE TABLE big_text_test (i int, c1 text, c2 text);
-- Mark columns as toastable, but don't try to compress
ALTER TABLE big_text_test ALTER c1 SET STORAGE EXTERNAL;
ALTER TABLE big_text_test ALTER c2 SET STORAGE EXTERNAL;
ALTER TABLE big_text_test REPLICA IDENTITY FULL;
INSERT INTO big_text_test (i, c1, c2) VALUES (1, repeat('a', 1073741737), NULL);
UPDATE big_text_test SET c2 = repeat('b', 1073741717);

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x000073665c64527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x000073665c6288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000632bc0ff9b78 in errfinish (filename=0x632bc10b7778 "xloginsert.c", lineno=916, funcname=0x632bc10b7d50 <__func__.2> "XLogRecordAssemble") at elog.c:600
#6  0x0000632bc08e41ca in XLogRecordAssemble (rmid=10 '\n', info=64 '@', RedoRecPtr=2252407976, doPageWrites=true, fpw_lsn=0x7ffd23d44bb0, num_fpi=0x7ffd23d44ba4, topxid_included=0x7ffd23d44ba3) at xloginsert.c:916
#7  0x0000632bc08e3851 in XLogInsert (rmid=10 '\n', info=64 '@') at xloginsert.c:520
#8  0x0000632bc083f052 in log_heap_update (reln=0x73665d191c38, oldbuf=6989, newbuf=6989, oldtup=0x7ffd23d44da0, newtup=0x632bfa9d89c0, old_key_tuple=0x7364d09fe048, all_visible_cleared=false, new_all_visible_cleared=false) at heapam.c:9042
#9  0x0000632bc08372ff in heap_update (relation=0x73665d191c38, otid=0x7ffd23d45082, newtup=0x736610a13048, cid=0, crosscheck=0x0, wait=true, tmfd=0x7ffd23d45120, lockmode=0x7ffd23d45034, update_indexes=0x7ffd23d45030) at heapam.c:4132
#10 0x0000632bc0840bd4 in heapam_tuple_update (relation=0x73665d191c38, otid=0x7ffd23d45082, slot=0x632bfa9d7fb8, cid=0, snapshot=0x632bfa979400, crosscheck=0x0, wait=true, tmfd=0x7ffd23d45120, lockmode=0x7ffd23d45034, update_indexes=0x7ffd23d45030) at heapam_handler.c:330
#11 0x0000632bc0b33f21 in table_tuple_update (rel=0x73665d191c38, otid=0x7ffd23d45082, slot=0x632bfa9d7fb8, cid=0, snapshot=0x632bfa979400, crosscheck=0x0, wait=true, tmfd=0x7ffd23d45120, lockmode=0x7ffd23d45034, update_indexes=0x7ffd23d45030) at ../../../src/include/access/tableam.h:1500
#12 0x0000632bc0b37a46 in ExecUpdateAct (context=0x7ffd23d45100, resultRelInfo=0x632bfa9d50e8, tupleid=0x7ffd23d45082, oldtuple=0x0, slot=0x632bfa9d7fb8, canSetTag=true, updateCxt=0x7ffd23d4502c) at nodeModifyTable.c:2301
#13 0x0000632bc0b37fdc in ExecUpdate (context=0x7ffd23d45100, resultRelInfo=0x632bfa9d50e8, tupleid=0x7ffd23d45082, oldtuple=0x0, oldSlot=0x632bfa9d7ea8, slot=0x632bfa9d7fb8, canSetTag=true) at nodeModifyTable.c:2525
#14 0x0000632bc0b3b9bc in ExecModifyTable (pstate=0x632bfa9d4ed8) at nodeModifyTable.c:4507
#15 0x0000632bc0af5585 in ExecProcNodeFirst (node=0x632bfa9d4ed8) at execProcnode.c:469
#16 0x0000632bc0ae7c82 in ExecProcNode (node=0x632bfa9d4ed8) at ../../../src/include/executor/executor.h:313
#17 0x0000632bc0aeab37 in ExecutePlan (queryDesc=0x632bfa8b79d0, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x632bfa940888) at execMain.c:1679
#18 0x0000632bc0ae8345 in standard_ExecutorRun (queryDesc=0x632bfa8b79d0, direction=ForwardScanDirection, count=0) at execMain.c:367
#19 0x0000632bc0ae81a3 in ExecutorRun (queryDesc=0x632bfa8b79d0, direction=ForwardScanDirection, count=0) at execMain.c:304
#20 0x0000632bc0deac67 in ProcessQuery (plan=0x632bfa93f750, sourceText=0x632bfa8e31a0 "UPDATE big_text_test SET c2 = repeat('b', 1073741717) || 'бвг';", params=0x0, queryEnv=0x0, dest=0x632bfa940888, qc=0x7ffd23d45550) at pquery.c:161
#21 0x0000632bc0dec79a in PortalRunMulti (portal=0x632bfa964e30, isTopLevel=true, setHoldSnapshot=false, dest=0x632bfa940888, altdest=0x632bfa940888, qc=0x7ffd23d45550) at pquery.c:1272
#22 0x0000632bc0debca6 in PortalRun (portal=0x632bfa964e30, count=9223372036854775807, isTopLevel=true, dest=0x632bfa940888, altdest=0x632bfa940888, qc=0x7ffd23d45550) at pquery.c:788
#23 0x0000632bc0de432a in exec_simple_query (query_string=0x632bfa8e31a0 "UPDATE big_text_test SET c2 = repeat('b', 1073741717) || 'бвг';") at postgres.c:1273
#24 0x0000632bc0de9b1b in PostgresMain (dbname=0x632bfa91e510 "regression_big_values", username=0x632bfa91e4f8 "maxim") at postgres.c:4766
#25 0x0000632bc0ddf84e in BackendMain (startup_data=0x7ffd23d45800, startup_data_len=24) at backend_startup.c:124
#26 0x0000632bc0cded62 in postmaster_child_launch (child_type=B_BACKEND, child_slot=2, startup_data=0x7ffd23d45800, startup_data_len=24, client_sock=0x7ffd23d45860) at launch_backend.c:290
#27 0x0000632bc0ce5854 in BackendStartup (client_sock=0x7ffd23d45860) at postmaster.c:3580
#28 0x0000632bc0ce2d23 in ServerLoop () at postmaster.c:1702
#29 0x0000632bc0ce2612 in PostmasterMain (argc=1, argv=0x632bfa89b9c0) at postmaster.c:1400
#30 0x0000632bc0b7eeab in main (argc=1, argv=0x632bfa89b9c0) at main.c:227

The reason is the "if (total_len > XLogRecordMaxSize)" check in the XLogRecordAssemble() function in xloginsert.c: the update produces an oversized xlog record, and the error is reported inside a critical section, where it is escalated to a PANIC.
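For reference, the check that fires here looks approximately like this (reconstructed from the quoted condition and the PANIC/DETAIL text in the log above, not copied verbatim from xloginsert.c):

/* XLogRecordAssemble(), xloginsert.c -- approximate shape of the check */
if (total_len > XLogRecordMaxSize)
    ereport(ERROR,
            errmsg_internal("oversized WAL record"),
            errdetail_internal("WAL record would be %llu bytes (of maximum %llu bytes); rmid %u flags %u.",
                               (unsigned long long) total_len,
                               (unsigned long long) XLogRecordMaxSize,
                               rmid, info));

heap_update() has already entered its critical section before log_heap_update()/XLogInsert() run, so by the time XLogRecordAssemble() notices the problem CritSectionCount > 0 and elog.c escalates the ERROR to a PANIC, which is what the backtrace shows.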
I found a thread where this problem was partially discussed:
https://www.postgresql.org/message-id/flat/CAEze2WgGiw%2BLZt%2BvHf8tWqB_6VxeLsMeoAuod0N%3Dij1q17n5pw%40mail.gmail.com

Some ideas from that thread:

"I think the big issue with the patch as it stands is that it will typically cause PANICs on failure, because the record-too-large ERROR will be in a critical section. That's still better than generating a record that can't be replayed, but it's not good."

In my opinion we can avoid the PANIC in the critical section: it is better to check the xlog record size before entering it, and I have some ideas how to do that. I've attached a patch that checks for an oversized xlog record before the critical section. On the one hand it doesn't complicate the codebase, and on the other hand it helps to solve the problem above; a rough illustration of the idea follows below.

Best regards,
Maksim Melnikov
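For illustration only, a minimal sketch of the kind of pre-check described above; this is not the attached patch, and the function name and the size estimate are invented for this sketch. Such a check would run in heap_update() before START_CRIT_SECTION() is reached, so the failure surfaces as an ordinary ERROR:

#include "postgres.h"
#include "access/htup.h"
#include "access/xlogrecord.h"

/*
 * Hypothetical helper (invented for this sketch): reject an obviously
 * oversized heap-update record before the critical section is entered.
 */
static void
check_wal_record_size_for_update(HeapTuple oldtup, HeapTuple newtup,
                                 HeapTuple old_key_tuple)
{
    uint64      est;

    /* rough upper bound on the tuple data log_heap_update() would include */
    est = (uint64) oldtup->t_len + newtup->t_len;
    if (old_key_tuple != NULL)
        est += old_key_tuple->t_len;

    /* generous allowance for record headers and block references */
    est += 1024;

    if (est > XLogRecordMaxSize)
        ereport(ERROR,
                errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                errmsg("cannot WAL-log update: record would be at least %llu bytes (maximum is %llu bytes)",
                       (unsigned long long) est,
                       (unsigned long long) XLogRecordMaxSize));
}

A real check would have to account precisely for everything log_heap_update() registers with XLogRegisterData(), including the flattened replica-identity tuple; the numbers above are only a rough upper bound.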