Thread: High memory usage in CachedPlan for large IN clauses in partitioned table updates
High memory usage in CachedPlan for large IN clauses in partitioned table updates
From
"赵庭海(庭章)"
Date:
Hi,
Recently, I noticed that while I used pgjdbc to batch update a partition table, Cacheplan can
sometimes use a lot of memory, this problem occurs in more Postgresql backend processes,
which amplifies the impact. I tried to repeat this question in my test environment.
The testing query (more than 1000 condition after in clause):
UPDATE test SET migrate_account_batch_id = 'newtest2' WHERE (id, store_id) IN (xxx);
I use backend function MemoryContextStats to print CachedPlan for this query to error log file,
the output content is:
CachedPlan: 33580752 total in 18 blocks; 7615592 free (1 chunks); 25965160 used.
This seems to use more memory than expected. Then I found out that the reason is that
filters like ((id, store_id) IN (id1, id2),(id3, id4)) are represented in memory as
[(id = id1 and store_id = id2) or (id = id3 and store_id = id4) or ......], and planner
generate BitmapIndexScan path for every filtering condition, so that when there are many
filtering conditions after in clause, the used memory will be more. Then, if the
update occurs in multiple child table of the partitioned table, this memory can
also be multiplied.
I would like to ask why aren't CachedPlans immediately released after execution
when using extend protocol, so as to prevent multiple such CachedPlan from
causing high memory usage. The logic now is that if an sql is executed using the
extend protocol, its CachedPlan will be resident in memory even if it is not
being used, instead, it is released when the bind message is processed again in
function PortalDrop.
Or maybe there's some more detailed code I didn't notice.
Regards,
Tinghai Zhao
Re: High memory usage in CachedPlan for large IN clauses in partitioned table updates
From
Tom Lane
Date:
"=?UTF-8?B?6LW15bqt5rW3KOW6reeroCk=?=" <zhaotinghai.zth@alibaba-inc.com> writes: > The testing query (more than 1000 condition after in clause): > UPDATE test SET migrate_account_batch_id = 'newtest2' WHERE (id, store_id) IN (xxx); As you've discovered, this is a lousy way to write such a query. You'd get a better result by putting all the constants in a VALUES clause that you join to, along the lines of WITH v(v1, v2) as (VALUES (1,2),(3,4),...) UPDATE test SET migrate_account_batch_id = 'newtest2' FROM v WHERE (id, store_id) = (v1, v2); If the value pairs aren't known unique, instead FROM (SELECT DISTINCT * FROM v) vv > I would like to ask why aren't CachedPlans immediately released after execution > when using extend protocol, so as to prevent multiple such CachedPlan from > causing high memory usage. If you use a named statement, the expectation is that the plan could be re-used. You could use an unnamed statement if you don't want that. But really the size of the plan is the least of your problems with this query --- planning time has to be pretty awful as well. regards, tom lane