Re: Limit memory usage by postgres_fdw batches - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: Limit memory usage by postgres_fdw batches |
| Date | |
| Msg-id | 80f78762-e4d7-4d57-89eb-a99202ab1491@vondra.me |
| In response to | Limit memory usage by postgres_fdw batches (Alexander Pyhalov <a.pyhalov@postgrespro.ru>) |
| List | pgsql-hackers |
Hi Alexander,
On 12/26/25 11:54, Alexander Pyhalov wrote:
> Hi.
>
> We had some real cases where a client set a rather big batch_size at the
> server level, but for some foreign table containing large documents it
> was inadequate and led to OOM killer intervention. You can argue that
> batch_size can be set at the foreign table level, but even that may not
> be flexible enough when tuple size varies.
I agree this can be an issue with large tuples. It'd be good to consider
the size, not just the number of tuples.
I think you modified the right places, but there are a few issues we
should improve:
1) It calls estimate_batch_length() for every ExecInsert() call, i.e.
for every tuple. And it walks all tuples up to that point, which makes
it O(N^2). I haven't measured how significant that is, but AFAICS we could
track the current size of the batch fairly easily, and use that.
2) work_mem is in kilobytes, while batch_len is in bytes, so the
comparison (batch_len > work_mem) is not quite right. It'll probably fire
every time, preventing any batching.
3) Shouldn't this consider the size of the new tuple in batch_len?
Imagine the tuples are 99% of the work_mem limit. We add the first one.
When adding the next one we check that the current batch is below
work_mem, and so we proceed to add the second tuple. Now the batch is at
198% of the limit.
I think it should work like this:
1) batch_len + tup_len < work_mem => add tuple to batch
2) tup_len < work_mem => flush batch, add tuple to batch
3) tup_len >= work_mem => flush batch, insert tuple directly
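In rough code terms, maybe something like this (only a sketch, not actual
postgres_fdw code -- the helper names and the batch_len field are made up,
the point being the incremental size tracking and the kB-to-bytes scaling
of work_mem):

    Size    tup_len = estimate_tuple_length(slot);  /* hypothetical helper */
    Size    limit   = (Size) work_mem * 1024L;      /* work_mem is in kB */

    if (fmstate->batch_len + tup_len < limit)
    {
        /* 1) fits: add the tuple to the current batch */
        add_tuple_to_batch(fmstate, slot);
        fmstate->batch_len += tup_len;
    }
    else if (tup_len < limit)
    {
        /* 2) the tuple fits on its own: flush the batch, start a new one */
        flush_batch(fmstate);
        add_tuple_to_batch(fmstate, slot);
        fmstate->batch_len = tup_len;
    }
    else
    {
        /* 3) the tuple alone exceeds the limit: flush, insert it directly */
        flush_batch(fmstate);
        insert_single_tuple(fmstate, slot);
        fmstate->batch_len = 0;
    }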
What bothers me a little bit is that this is per relation. AFAICS when
inserting into a partitioned table with multiple foreign partitions,
each partition will have a separate limit. I wonder if we could do
better and have some sort of "global" limit for the whole insert.
But that's not the fault of this patch, of course.
> I suppose the same applies to fetch_size. The issue here is that we
> can't limit the size of the data (versus the number of rows) while
> fetching from a cursor. But we can use a tuplestore to hold the fetched
> results, so that they spill to disk.
>
Perhaps. That seems like a separate issue. I haven't looked very closely, but
do we want to use the tuplestore always, or just when the tuples get too
large? It might even be decided batch by batch, I guess. With fetch_size=1
it's hardly useful, right? Is the tuplestore management overhead measurable?
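For what it's worth, the tuplestore machinery already gives us the spilling
behavior more or less for free. A rough sketch of the general shape (not
necessarily what the patch does):

    /* bounded by work_mem (in kB); anything beyond that spills to a temp file */
    Tuplestorestate *tstore = tuplestore_begin_heap(false, false, work_mem);

    /* while deforming rows of a fetched batch */
    tuplestore_puttupleslot(tstore, slot);

    /* when the scan asks for the next row */
    if (!tuplestore_gettupleslot(tstore, true, false, slot))
    {
        /* tuplestore drained -- fetch the next batch from the cursor */
    }

    /* at the end of the scan */
    tuplestore_end(tstore);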
> I'm attaching two patches which try to fix issues with possibly huge
> memory usage by postgres_fdw batches.
> With fetched tuples we still can't use only a tuplestore, as ctids are
> not preserved, so we have to store them separately.
>
> The reproducer for insert is simple.
>
> create extension postgres_fdw;
> create server loopback foreign data wrapper postgres_fdw
>     options (dbname 'postgres', port '5432', batch_size '100', fetch_size '100');
> create table base_table(i int, s bytea);
> create foreign table foreign_table (i int, s bytea) server loopback
>     options (table_name 'base_table');
> create user mapping for public server loopback;
>
> insert into foreign_table
>     select i, pg_read_binary_file('/some/big/file')
>     from generate_series(1, 1000) i;
>
> will easily grow backend RSS to several gigabytes.
> The first patch fixes this problem.
>
> The second patch alleviates the second issue - SELECT * queries can also
> grow backend memory to several GB. Memory usage can still peak (in my
> toy examples) at 3-4 GB, but at least that seems 1-2 GB less than the
> non-patched version.
>
How large are the tuples? How much higher was this RSS than the
theoretical minimum?
regards
--
Tomas Vondra