Re: Creating foreign key on partitioned table is too slow - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Creating foreign key on partitioned table is too slow |
Date | |
Msg-id | 20191024221758.vfv2enubnwmy3deu@development Whole thread Raw |
In response to | Re: Creating foreign key on partitioned table is too slow (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: Creating foreign key on partitioned table is too slow
Re: Creating foreign key on partitioned table is too slow Re: Creating foreign key on partitioned table is too slow |
List | pgsql-hackers |
On Thu, Oct 24, 2019 at 03:48:57PM -0300, Alvaro Herrera wrote: >On 2019-Oct-23, kato-sho@fujitsu.com wrote: > >> Hello >> >> To benchmark with tpcb model, I tried to create a foreign key in the partitioned history table, but backend process killedby OOM. >> the number of partitions is 8192. I tried in master(commit: ad4b7aeb84). >> >> I did the same thing in another server which has 200GB memory, but creating foreign key did not end in 24 hours. > >Thanks for reporting. It sounds like there must be a memory leak here. >I am fairly pressed for time at present so I won't be able to >investigate this until, at least, mid November. > I've briefly looked into this, and I think the main memory leak is in RelationBuildPartitionDesc. It gets called with PortalContext, it allocates a lot of memory building the descriptor, copies it into CacheContext but does not even try to free anything. So we end up with something like this: TopMemoryContext: 215344 total in 11 blocks; 47720 free (12 chunks); 167624 used pgstat TabStatusArray lookup hash table: 32768 total in 3 blocks; 9160 free (4 chunks); 23608 used TopTransactionContext: 4194304 total in 10 blocks; 1992968 free (18 chunks); 2201336 used RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used MessageContext: 8192 total in 1 blocks; 3256 free (1 chunks); 4936 used Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used smgr relation table: 32768 total in 3 blocks; 16768 free (8 chunks); 16000 used TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used Portal hash: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used TopPortalContext: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used PortalContext: 1557985728 total in 177490 blocks; 9038656 free (167645 chunks); 1548947072 used: Relcache by OID: 16384 total in 2 blocks; 3424 free (3 chunks); 12960 used CacheMemoryContext: 17039424 total in 13 blocks; 7181480 free (9 chunks); 9857944 used partition key: 1024 total in 1 blocks; 168 free (0 chunks); 856 used: history index info: 2048 total in 2 blocks; 568 free (1 chunks); 1480 used: pg_class_tblspc_relfilenode_index ... index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: pg_class_oid_index WAL record construction: 49776 total in 2 blocks; 6344 free (0 chunks); 43432 used PrivateRefCount: 8192 total in 1 blocks; 2584 free (0 chunks); 5608 used MdSmgr: 8192 total in 1 blocks; 5976 free (0 chunks); 2216 used LOCALLOCK hash: 65536 total in 4 blocks; 18584 free (12 chunks); 46952 used Timezones: 104128 total in 2 blocks; 2584 free (0 chunks); 101544 used ErrorContext: 8192 total in 1 blocks; 6840 free (4 chunks); 1352 used Grand total: 1580997216 bytes in 177834 blocks; 18482808 free (167857 chunks); 1562514408 used (At which point I simply interrupted the query, it'd allocate more and more memory until an OOM). The attached patch trivially fixes that by adding a memory context tracking all the temporary data, and then just deletes it as a whole at the end of the function. This significantly reduces the memory usage for me, not sure it's 100% correct. FWIW, even with this fix it still takes an awful lot to create the foreign key, because the CPU is stuck doing this 60.78% 60.78% postgres postgres [.] bms_equal 32.58% 32.58% postgres postgres [.] get_eclass_for_sort_expr 3.83% 3.83% postgres postgres [.] add_child_rel_equivalences 0.23% 0.00% postgres [unknown] [.] 0x0000000000000005 0.22% 0.00% postgres [unknown] [.] 0000000000000000 0.18% 0.18% postgres postgres [.] AllocSetCheck ... Haven't looked into the details yet. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: