Thread: BUG #18386: Slow connection access after 'vacuum full pg_attribute'

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      18386
Logged by:          ji xiaohang
Email address:      1165125080@qq.com
PostgreSQL version: 14.8
Operating system:   centos 7
Description:

In my database, there are a lot of tables and views, resulting in a large
pg_attribute table.
I then did a vacuum full pg_attribute and restarted the database.
Later, it was found that the first new connection took several minutes to
connect to the database.
The stack during connection access is as follows:

#0  0x00007faece149c64 in pread64 () from /usr/lib64/libpthread.so.0
#1  0x000055b919b8bc0c in FileRead (file=11, buffer=0x7fae37909580
"\035\002", amount=8192, offset=1019985920, wait_event_info=167772173) at
fd.c:1889
#2  0x000055b919bbdf39 in mdread (reln=0x55b91b6a9120, forknum=MAIN_FORKNUM,
blocknum=779870, buffer=0x7fae37909580 "\035\002") at md.c:638
#3  0x000055b919bbfefe in smgrread (reln=0x55b91b6a9120,
forknum=MAIN_FORKNUM, blocknum=779870, buffer=0x7fae37909580 "\035\002") at
smgr.c:590
#4  0x000055b919b83797 in ReadBuffer_common (smgr=0x55b91b6a9120,
relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=779870,
mode=RBM_NORMAL, strategy=0x55b91b69b490, hit=0x7ffda51d9e43) at
bufmgr.c:901
#5  0x000055b919b82f8c in ReadBufferExtended (reln=0x7fae35495758,
forkNum=MAIN_FORKNUM, blockNum=779870, mode=RBM_NORMAL,
strategy=0x55b91b69b490) at bufmgr.c:660
#6  0x000055b9197bd18e in heapgetpage (sscan=0x55b91b69a8d0, page=779870) at
heapam.c:383
#7  0x000055b9197bec8b in heapgettup_pagemode (scan=0x55b91b69a8d0,
dir=ForwardScanDirection, nkeys=2, key=0x55b91b69a450) at heapam.c:1079
#8  0x000055b9197bf1a1 in heap_getnextslot (sscan=0x55b91b69a8d0,
direction=ForwardScanDirection, slot=0x55b91b69b1f0) at heapam.c:1372
#9  0x000055b9197e2bfa in table_scan_getnextslot (sscan=0x55b91b69a8d0,
direction=ForwardScanDirection, slot=0x55b91b69b1f0) at
../../../../src/include/access/tableam.h:889
#10 0x000055b9197e34b8 in systable_getnext (sysscan=0x55b91b69a400) at
genam.c:471
#11 0x000055b919d3cb44 in RelationBuildTupleDesc (relation=0x7fae35499178)
at relcache.c:580
#12 0x000055b919d3dd09 in RelationBuildDesc (targetRelId=2662,
insertIt=true) at relcache.c:1207
#13 0x000055b919d42867 in load_critical_index (indexoid=2662, heapoid=1259)
at relcache.c:4146
#14 0x000055b919d423ee in RelationCacheInitializePhase3 () at
relcache.c:3900
#15 0x000055b919d68640 in InitPostgres (in_dbname=0x55b91b615c50 "conetdb",
dboid=0, username=0x55b91b615c38 "root", useroid=0, out_dbname=0x0,
override_allow_connections=false) at postinit.c:1035
#16 0x000055b919bc7761 in PostgresMain (argc=1, argv=0x55b91b615db0,
dbname=0x55b91b615c50 "conetdb", username=0x55b91b615c38 "root") at
postgres.c:3917
#17 0x000055b919b1f4db in BackendRun (port=0x55b91b61bab0) at
postmaster.c:4521
#18 0x000055b919b1ebdf in BackendStartup (port=0x55b91b61bab0) at
postmaster.c:4204
#19 0x000055b919b1ae50 in ServerLoop () at postmaster.c:1729
#20 0x000055b919b1a5ff in PostmasterMain (argc=1, argv=0x55b91b5e21d0) at
postmaster.c:1402
#21 0x000055b919a3be37 in main (argc=1, argv=0x55b91b5e21d0) at main.c:228

After analysis, it is confirmed that a new connection must first build its
relcache.
To load each critical index, RelationBuildDesc scans pg_attribute
sequentially (its own index is not yet usable at this point).
However, VACUUM FULL moved the pg_attribute rows describing those system
indexes to the end of the file.
So nearly the entire pg_attribute heap has to be read, whereas before only
the first few pages had to be scanned.

test=> select ctid, attrelid, attname,attnum from pg_attribute where
attrelid in ('pg_class_oid_index'::regclass,
'pg_attribute_relid_attnum_index'::regclass,
'pg_index_indexrelid_index'::regclass, 'pg_opclass_oid_index'::regclass,
'pg_amproc_fam_proc_index'::regclass,
'pg_rewrite_rel_rulename_index'::regclass,
'pg_trigger_tgrelid_tgname_index'::regclass);
     ctid     | attrelid |     attname     | attnum 
--------------+----------+-----------------+--------
 (2040908,37) |     2655 | amprocfamily    |      1
 (2040908,38) |     2655 | amproclefttype  |      2
 (2040908,39) |     2655 | amprocrighttype |      3
 (2040908,40) |     2655 | amprocnum       |      4
 (2040908,47) |     2659 | attrelid        |      1
 (2040908,48) |     2659 | attnum          |      2
 (2040909,3)  |     2662 | oid             |      1
 (2040909,46) |     2679 | indexrelid      |      1
 (2040910,8)  |     2687 | oid             |      1
 (2040910,24) |     2693 | ev_class        |      1
 (2040910,25) |     2693 | rulename        |      2
 (2040910,47) |     2701 | tgrelid         |      1
 (2040910,48) |     2701 | tgname          |      2
(13 rows)
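To see how far into the heap those rows now sit, the block numbers in the
ctid column above (around block 2,040,908) can be compared with the total
size of pg_attribute. A sketch, assuming the standard 8 kB block size:

```sql
-- Heap size of pg_attribute in 8 kB blocks; if the critical-index rows
-- shown above sit in blocks near this total, a startup sequential scan
-- must read almost the whole relation before finding them.
SELECT pg_relation_size('pg_attribute') / 8192 AS total_blocks;
```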

So I wonder whether it is worth turning VACUUM FULL on system catalogs into
a CLUSTER-like action by default, so these rows keep their original order.



The tuples describing the system tables are moved to the last part of pg_attribute by VACUUM FULL. As a result, new connections are slow after the database is restarted. Are there any workarounds or suggestions for this problem?
On Tue, Mar 26, 2024 at 10:36:31AM +0800, 1165125080 wrote:
> The tuple of the system table is moved to the last part of the pg_attribute
> table by vacuum table. As a result, the connection is slow after the
> database is restarted. Are there any workarounds or suggestions for this
> problem.

System tables have low-numbered attrelid, so I would try this as a workaround
to move them back to the front:

  cluster pg_attribute using pg_attribute_relid_attnum_index;
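After the CLUSTER, re-running the reporter's ctid query should show those
rows back in low-numbered blocks, so the startup sequential scan again stops
after a few pages. A hedged check, reusing a subset of the query above:

```sql
-- Expect small block numbers in ctid after clustering on
-- pg_attribute_relid_attnum_index (CLUSTER rewrites the heap in
-- (attrelid, attnum) order, and system catalogs have low OIDs).
SELECT ctid, attrelid, attname, attnum
  FROM pg_attribute
 WHERE attrelid IN ('pg_class_oid_index'::regclass,
                    'pg_attribute_relid_attnum_index'::regclass);
```

Note that clustering a system catalog requires superuser privileges and
takes an ACCESS EXCLUSIVE lock on pg_attribute, so it is best done during a
maintenance window.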