Home > mailing lists

Thread: High memory usage in CachedPlan for large IN clauses in partitioned table updates

High memory usage in CachedPlan for large IN clauses in partitioned table updates

From

"赵庭海(庭章)"

Date:

24 March, 12:37:14

Hi,

Recently, I noticed that while I used pgjdbc to batch update a partition table, Cacheplan can

sometimes use a lot of memory, this problem occurs in more Postgresql backend processes,

which amplifies the impact. I tried to repeat this question in my test environment.

The testing query (more than 1000 condition after in clause):

UPDATE test SET migrate_account_batch_id = 'newtest2' WHERE (id, store_id) IN (xxx);

I use backend function MemoryContextStats to print CachedPlan for this query to error log file,

the output content is:

CachedPlan: 33580752 total in 18 blocks; 7615592 free (1 chunks); 25965160 used.

This seems to use more memory than expected. Then I found out that the reason is that

filters like ((id, store_id) IN (id1, id2),(id3, id4)) are represented in memory as

[(id = id1 and store_id = id2) or (id = id3 and store_id = id4) or ......], and planner

generate BitmapIndexScan path for every filtering condition, so that when there are many

filtering conditions after in clause, the used memory will be more. Then, if the

update occurs in multiple child table of the partitioned table, this memory can

also be multiplied.

I would like to ask why aren't CachedPlans immediately released after execution

when using extend protocol, so as to prevent multiple such CachedPlan from

causing high memory usage. The logic now is that if an sql is executed using the

extend protocol, its CachedPlan will be resident in memory even if it is not

being used, instead, it is released when the bind message is processed again in

function PortalDrop.

Or maybe there's some more detailed code I didn't notice.

Regards,

Tinghai Zhao

Re: High memory usage in CachedPlan for large IN clauses in partitioned table updates

From

Tom Lane

Date:

24 March, 18:44:13

"=?UTF-8?B?6LW15bqt5rW3KOW6reeroCk=?=" <zhaotinghai.zth@alibaba-inc.com> writes:
> The testing query (more than 1000 condition after in clause):
> UPDATE test SET migrate_account_batch_id = 'newtest2' WHERE (id, store_id) IN (xxx);

As you've discovered, this is a lousy way to write such a query.
You'd get a better result by putting all the constants in a VALUES
clause that you join to, along the lines of

WITH v(v1, v2) as (VALUES (1,2),(3,4),...)
UPDATE test SET migrate_account_batch_id = 'newtest2'
FROM v
WHERE (id, store_id) = (v1, v2);

If the value pairs aren't known unique, instead

FROM (SELECT DISTINCT * FROM v) vv

> I would like to ask why aren't CachedPlans immediately released after execution
> when using extend protocol, so as to prevent multiple such CachedPlan from
> causing high memory usage.

If you use a named statement, the expectation is that the plan could
be re-used.  You could use an unnamed statement if you don't want
that.  But really the size of the plan is the least of your problems
with this query --- planning time has to be pretty awful as well.

            regards, tom lane