Thread: BUG #18386: Slow connection access after 'vacuum full pg_attribute'

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      18386
Logged by:          ji xiaohang
Email address:      1165125080@qq.com
PostgreSQL version: 14.8
Operating system:   centos 7
Description:

In my database, there are a lot of tables and views, resulting in a large
pg_attribute table.
I then did a vacuum full pg_attribute and restarted the database.
Later, it was found that the first new connection took several minutes to
connect to the database.
The stack during connection access is as follows:

#0  0x00007faece149c64 in pread64 () from /usr/lib64/libpthread.so.0
#1  0x000055b919b8bc0c in FileRead (file=11, buffer=0x7fae37909580
"\035\002", amount=8192, offset=1019985920, wait_event_info=167772173) at
fd.c:1889
#2  0x000055b919bbdf39 in mdread (reln=0x55b91b6a9120, forknum=MAIN_FORKNUM,
blocknum=779870, buffer=0x7fae37909580 "\035\002") at md.c:638
#3  0x000055b919bbfefe in smgrread (reln=0x55b91b6a9120,
forknum=MAIN_FORKNUM, blocknum=779870, buffer=0x7fae37909580 "\035\002") at
smgr.c:590
#4  0x000055b919b83797 in ReadBuffer_common (smgr=0x55b91b6a9120,
relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=779870,
mode=RBM_NORMAL, strategy=0x55b91b69b490, hit=0x7ffda51d9e43) at
bufmgr.c:901
#5  0x000055b919b82f8c in ReadBufferExtended (reln=0x7fae35495758,
forkNum=MAIN_FORKNUM, blockNum=779870, mode=RBM_NORMAL,
strategy=0x55b91b69b490) at bufmgr.c:660
#6  0x000055b9197bd18e in heapgetpage (sscan=0x55b91b69a8d0, page=779870) at
heapam.c:383
#7  0x000055b9197bec8b in heapgettup_pagemode (scan=0x55b91b69a8d0,
dir=ForwardScanDirection, nkeys=2, key=0x55b91b69a450) at heapam.c:1079
#8  0x000055b9197bf1a1 in heap_getnextslot (sscan=0x55b91b69a8d0,
direction=ForwardScanDirection, slot=0x55b91b69b1f0) at heapam.c:1372
#9  0x000055b9197e2bfa in table_scan_getnextslot (sscan=0x55b91b69a8d0,
direction=ForwardScanDirection, slot=0x55b91b69b1f0) at
../../../../src/include/access/tableam.h:889
#10 0x000055b9197e34b8 in systable_getnext (sysscan=0x55b91b69a400) at
genam.c:471
#11 0x000055b919d3cb44 in RelationBuildTupleDesc (relation=0x7fae35499178)
at relcache.c:580
#12 0x000055b919d3dd09 in RelationBuildDesc (targetRelId=2662,
insertIt=true) at relcache.c:1207
#13 0x000055b919d42867 in load_critical_index (indexoid=2662, heapoid=1259)
at relcache.c:4146
#14 0x000055b919d423ee in RelationCacheInitializePhase3 () at
relcache.c:3900
#15 0x000055b919d68640 in InitPostgres (in_dbname=0x55b91b615c50 "conetdb",
dboid=0, username=0x55b91b615c38 "root", useroid=0, out_dbname=0x0,
override_allow_connections=false) at postinit.c:1035
#16 0x000055b919bc7761 in PostgresMain (argc=1, argv=0x55b91b615db0,
dbname=0x55b91b615c50 "conetdb", username=0x55b91b615c38 "root") at
postgres.c:3917
#17 0x000055b919b1f4db in BackendRun (port=0x55b91b61bab0) at
postmaster.c:4521
#18 0x000055b919b1ebdf in BackendStartup (port=0x55b91b61bab0) at
postmaster.c:4204
#19 0x000055b919b1ae50 in ServerLoop () at postmaster.c:1729
#20 0x000055b919b1a5ff in PostmasterMain (argc=1, argv=0x55b91b5e21d0) at
postmaster.c:1402
#21 0x000055b919a3be37 in main (argc=1, argv=0x55b91b5e21d0) at main.c:228

After analysis, it is confirmed that a new connection must first build its
relcache.
To load each critical index, RelationBuildDesc scans pg_attribute
sequentially (its own index is not yet usable at this point).
However, VACUUM FULL moved the pg_attribute rows describing those system
indexes to the end of the file.
So nearly the entire pg_attribute heap has to be read, whereas before only
the first few pages had to be scanned.

test=> select ctid, attrelid, attname,attnum from pg_attribute where
attrelid in ('pg_class_oid_index'::regclass,
'pg_attribute_relid_attnum_index'::regclass,
'pg_index_indexrelid_index'::regclass, 'pg_opclass_oid_index'::regclass,
'pg_amproc_fam_proc_index'::regclass,
'pg_rewrite_rel_rulename_index'::regclass,
'pg_trigger_tgrelid_tgname_index'::regclass);
     ctid     | attrelid |     attname     | attnum 
--------------+----------+-----------------+--------
 (2040908,37) |     2655 | amprocfamily    |      1
 (2040908,38) |     2655 | amproclefttype  |      2
 (2040908,39) |     2655 | amprocrighttype |      3
 (2040908,40) |     2655 | amprocnum       |      4
 (2040908,47) |     2659 | attrelid        |      1
 (2040908,48) |     2659 | attnum          |      2
 (2040909,3)  |     2662 | oid             |      1
 (2040909,46) |     2679 | indexrelid      |      1
 (2040910,8)  |     2687 | oid             |      1
 (2040910,24) |     2693 | ev_class        |      1
 (2040910,25) |     2693 | rulename        |      2
 (2040910,47) |     2701 | tgrelid         |      1
 (2040910,48) |     2701 | tgname          |      2
(13 rows)
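To see how far into the heap those rows now sit, the block numbers in the
ctid column above (around block 2,040,908) can be compared with the total
size of pg_attribute. A sketch, assuming the standard 8 kB block size:

```sql
-- Heap size of pg_attribute in 8 kB blocks; if the critical-index rows
-- shown above sit in blocks near this total, a startup sequential scan
-- must read almost the whole relation before finding them.
SELECT pg_relation_size('pg_attribute') / 8192 AS total_blocks;
```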

So I wonder whether it is worth turning VACUUM FULL on system catalogs into
a CLUSTER-like action by default, so these rows keep their original order.



The tuples describing the system tables are moved to the last part of pg_attribute by VACUUM FULL. As a result, new connections are slow after the database is restarted. Are there any workarounds or suggestions for this problem?
On Tue, Mar 26, 2024 at 10:36:31AM +0800, 1165125080 wrote:
> The tuple of the system table is moved to the last part of the pg_attribute
> table by vacuum table. As a result, the connection is slow after the
> database is restarted. Are there any workarounds or suggestions for this
> problem.

System tables have low-numbered attrelid, so I would try this as a workaround
to move them back to the front:

  cluster pg_attribute using pg_attribute_relid_attnum_index;
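After the CLUSTER, re-running the reporter's ctid query should show those
rows back in low-numbered blocks, so the startup sequential scan again stops
after a few pages. A hedged check, reusing a subset of the query above:

```sql
-- Expect small block numbers in ctid after clustering on
-- pg_attribute_relid_attnum_index (CLUSTER rewrites the heap in
-- (attrelid, attnum) order, and system catalogs have low OIDs).
SELECT ctid, attrelid, attname, attnum
  FROM pg_attribute
 WHERE attrelid IN ('pg_class_oid_index'::regclass,
                    'pg_attribute_relid_attnum_index'::regclass);
```

Note that clustering a system catalog requires superuser privileges and
takes an ACCESS EXCLUSIVE lock on pg_attribute, so it is best done during a
maintenance window.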