Re: Performance degradation on concurrent COPY into a single relation in PG16. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Performance degradation on concurrent COPY into a single relation in PG16.
Date
Msg-id 20230711185159.v2j5vnyrtodnwhgz@awork3.anarazel.de
Whole thread Raw
In response to Performance degradation on concurrent COPY into a single relation in PG16.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Performance degradation on concurrent COPY into a single relation in PG16.
List pgsql-hackers
Hi,

On 2023-07-03 11:55:13 +0900, Masahiko Sawada wrote:
> While testing PG16, I observed that in PG16 there is a big performance
> degradation in concurrent COPY into a single relation with 2 - 16
> clients in my environment. I've attached a test script that measures
> the execution time of COPYing 5GB data in total to the single relation
> while changing the number of concurrent insertions, in PG16 and PG15.
> Here are the results on my environment (EC2 instance, RHEL 8.6, 128
> vCPUs, 512GB RAM):
>
> * PG15 (4b15868b69)
> PG15: nclients = 1, execution time = 14.181
>
> * PG16 (c24e9ef330)
> PG16: nclients = 1, execution time = 17.112

> The relevant commit is 00d1e02be2 "hio: Use ExtendBufferedRelBy() to
> extend tables more efficiently". With commit 1cbbee0338 (the previous
> commit of 00d1e02be2), I got a better numbers, it didn't have a better
> scalability, though:
>
> PG16: nclients = 1, execution time = 17.444

I think the single client case is indicative of an independent regression, or
rather regressions - it can't have anything to do with the fallocate() issue
and reproduces before that too in your numbers.

1) COPY got slower, due to:
9f8377f7a27 Add a DEFAULT option to COPY  FROM

This added a new palloc()/free() to every call to NextCopyFrom(). It's not at
all clear to me why this needs to happen in NextCopyFrom(), particularly
because it's already stored in CopyFromState?


2) pg_strtoint32_safe() got substantially slower, mainly due
   to
faff8f8e47f Allow underscores in integer and numeric constants.
6fcda9aba83 Non-decimal integer literals

pinned to one cpu, turbo mode disabled, I get the following best-of-three times for
  copy test from '/tmp/tmp_4.data'
(too impatient to use the larger file every time)

15:
6281.107 ms

HEAD:
7000.469 ms

backing out 9f8377f7a27:
6433.516 ms

also backing out faff8f8e47f, 6fcda9aba83:
6235.453 ms


I suspect 1) can relatively easily be fixed properly. But 2) seems much
harder. The changes increased the number of branches substantially, that's
gonna cost in something as (previously) tight as pg_strtoint32().



For higher concurrency numbers, I now was able to reproduce the regression, to
a smaller degree. Much smaller after fixing the above. The reason we run into
the issue here is basically that the rows in the test are very narrow and reach

#define MAX_BUFFERED_TUPLES        1000

at a small number of pages, so we go back and forth between extending with
fallocate() and not.

I'm *not* saying that that is the solution, but after changing that to 5000,
the numbers look a lot better (with the other regressions "worked around"):

(this is again with turboboost disabled, to get more reproducible numbers)

clients                1       2       4       8       16      32

15,buffered=1000                25725   13211   9232    5639    4862    4700
15,buffered=5000                26107   14550   8644    6050    4943    4766
HEAD+fixes,buffered=1000        25875   14505   8200    4900    3565    3433
HEAD+fixes,buffered=5000        25830   12975   6527    3594    2739    2642

Greetings,

Andres Freund

[1] https://postgr.es/m/CAD21AoAEwHTLYhuQ6PaBRPXKWN-CgW9iw%2B4hm%3D2EOFXbJQ3tOg%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: tablecmds.c/MergeAttributes() cleanup
Next
From: Gurjeet Singh
Date:
Subject: Re: Document that server will start even if it's unable to open some TCP/IP ports