On 5/30/23 10:05 PM, David Rowley wrote:
> My understanding had been that concurrency was required, but I see the
> commit message for 00d1e02be mentions:
>
>> Even single threaded
>> COPY is measurably faster, primarily due to not dirtying pages while
>> extending, if supported by the operating system (see commit 4d330a61bb1).
>
> If that's the case then maybe the beta release notes could be edited
> slightly to reflect this. Maybe something like:
>
> "Relation extensions have been improved allowing faster bulk loading
> of data using COPY. These improvements are more significant when
> multiple processes are concurrently loading data into the same table."
>
> The current text of "PostgreSQL 16 can also improve the performance of
> concurrent bulk loading of data using COPY up to 300%." does lead me
> to believe that nothing has been done to improve things when only a
> single backend is involved.

Typically once a release announcement is out, we'll only edit it if it's
inaccurate. I don't think the statement in the release announcement is
inaccurate, as it specifies that concurrent bulk loading is faster.

I had based the description on what Andres described in the original
discussion and on reading [1], which showed a "measurable" improvement
for a single backend, as the commit message said, but not to the same
degree as with concurrent loading. It does still seem impactful -- the
results show up to 20% improvement on a single backend -- but the bigger
story was around the concurrency.

I'm -0.5 for revising the announcement, but I also don't want people to
miss out on testing this. I'd be OK with this:

"PostgreSQL 16 can also improve the performance of bulk loading of data,
with some tests showing up to a 300% improvement when concurrently
executing `COPY` commands."

Thanks,

Jonathan

[1] https://www.postgresql.org/message-id/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de