Thread: Speeding up COPY TO for uuids and arrays
While analyzing a customer's performance problem, I noticed that
the performance of pg_dump for large arrays is terrible.

As a test case, I created a table with 10000 rows, each of which
had an array of 10000 uuids. The table resided in shared buffers.

The following took 24.5 seconds:

  COPY mytab TO '/dev/null';

Most of the time was spent in array_out and uuid_out.

I tried binary copy, which took 4.4 seconds:

  COPY mytab TO '/dev/null' (FORMAT 'binary');

Here, a lot of time was spent in pq_begintypsend.

So I looked for low-hanging fruit, and the result is the attached
patch series.

- Patch 0001 speeds up pq_begintypsend with a custom macro.
  That brought the binary copy down to 3.7 seconds, which is a
  speed gain of 15%.

- Patch 0002 speeds up uuid_out by avoiding the overhead of
  a StringInfo. This brings text mode COPY to 19.4 seconds,
  which is a speed gain of 21%.

- Patch 0003 speeds up array_out a bit by avoiding some zero
  byte writes. The measured speed gain is under 2%.

Yours,
Laurenz Albe
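To picture the pq_begintypsend pattern being optimized above, here is a minimal standalone model (an editorial sketch, not the PostgreSQL source; the names Buf, buf_begintypsend, buf_append, and buf_endtypsend are made up for illustration): a send function reserves 4 bytes at the head of the buffer for the message length, appends the payload, and patches the length in at the end. Since the reserved bytes are always overwritten, zero-initializing them up front is avoidable work, which is the point the thread discusses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A toy output buffer standing in for a StringInfo. */
typedef struct
{
    uint8_t data[64];
    int     len;
} Buf;

/*
 * Reserve the 4-byte length header.  Simply advancing the length is
 * enough; the bytes are patched in by buf_endtypsend() below, so there
 * is no need to zero them first.
 */
static void
buf_begintypsend(Buf *b)
{
    b->len = 4;
}

static void
buf_append(Buf *b, const void *p, int n)
{
    memcpy(b->data + b->len, p, n);
    b->len += n;
}

/* Patch the payload length (big-endian) into the reserved header. */
static void
buf_endtypsend(Buf *b)
{
    uint32_t n = (uint32_t) (b->len - 4);

    b->data[0] = (uint8_t) (n >> 24);
    b->data[1] = (uint8_t) (n >> 16);
    b->data[2] = (uint8_t) (n >> 8);
    b->data[3] = (uint8_t) n;
}
```

The interesting design point is that the begin step is nothing but a pointer bump, which is why shaving per-call overhead there pays off when it runs once per attribute per row.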
Hi,

On 2024-02-17 17:48:23 +0100, Laurenz Albe wrote:
> - Patch 0001 speeds up pq_begintypsend with a custom macro.
>   That brought the binary copy down to 3.7 seconds, which is a
>   speed gain of 15%.

Nice win, but I think we can do a bit better than this. Certainly it
shouldn't be a macro, given the multiple evaluation risks. I don't
think we actually need to zero those bytes; in fact, that seems more
likely to hide bugs than prevent them.

I have an old patch series improving performance in this area. A big
win is to simply not allocate as much memory for a new stringinfo
when we already know the upper bound, as we do in many of the send
functions.

> - Patch 0002 speeds up uuid_out by avoiding the overhead of
>   a StringInfo. This brings text mode COPY to 19.4 seconds,
>   which is a speed gain of 21%.

Nice!

I wonder if we should move the core part for converting to hex to
numutils.c; we already have code for the inverse. There does seem to
be further optimization potential in the conversion, and that seems
better done somewhere central rather than in one type's output
function. OTOH, it might not be worth it, given the need to add the
dashes.

> - Patch 0003 speeds up array_out a bit by avoiding some zero
>   byte writes. The measured speed gain is under 2%.

Makes sense.

Greetings,

Andres Freund
On Sat, Feb 17, 2024 at 12:24:33PM -0800, Andres Freund wrote:
> I wonder if we should move the core part for converting to hex to
> numutils.c; we already have code for the inverse. There does seem to
> be further optimization potential in the conversion, and that seems
> better done somewhere central rather than in one type's output
> function. OTOH, it might not be worth it, given the need to add the
> dashes.

I'd tend to live with the current location of the code, but I don't
feel strongly about it, so I'm OK with what Laurenz is proposing.

>> - Patch 0003 speeds up array_out a bit by avoiding some zero
>>   byte writes. The measured speed gain is under 2%.
>
> Makes sense.

+1.
--
Michael
On Sat, 2024-02-17 at 12:24 -0800, Andres Freund wrote:
> On 2024-02-17 17:48:23 +0100, Laurenz Albe wrote:
> > - Patch 0001 speeds up pq_begintypsend with a custom macro.
> >   That brought the binary copy down to 3.7 seconds, which is a
> >   speed gain of 15%.
>
> Nice win, but I think we can do a bit better than this. Certainly it
> shouldn't be a macro, given the multiple evaluation risks. I don't
> think we actually need to zero those bytes; in fact, that seems more
> likely to hide bugs than prevent them.
>
> I have an old patch series improving performance in this area.

The multiple evaluation danger could be worked around with an automatic
variable, or the macro could just carry a warning (like
appendStringInfoCharMacro). But considering the thread in [1], I guess
I'll retract that patch; I'm sure the more invasive improvement you
have in mind will do better.

> > - Patch 0002 speeds up uuid_out by avoiding the overhead of
> >   a StringInfo. This brings text mode COPY to 19.4 seconds,
> >   which is a speed gain of 21%.
>
> Nice!
>
> I wonder if we should move the core part for converting to hex to
> numutils.c; we already have code for the inverse. There does seem to
> be further optimization potential in the conversion, and that seems
> better done somewhere central rather than in one type's output
> function. OTOH, it might not be worth it, given the need to add the
> dashes.

I thought about it, but then decided against it.
Calling a function that converts the bytes to hex and then adding the
hyphens would slow down processing, and I think the code savings would
be minimal at best.

> > - Patch 0003 speeds up array_out a bit by avoiding some zero
> >   byte writes. The measured speed gain is under 2%.
>
> Makes sense.

Thanks for your interest and approval. The attached patches are the
renamed, but unmodified, patches 2 and 3 from before.

Yours,
Laurenz Albe

[1]: https://postgr.es/m/CAMkU%3D1whbRDUwa4eayD9%2B59K-coxO9senDkPRbTn3cg0pUz4AQ%40mail.gmail.com
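The "need to add the dashes" point above is the crux of keeping the conversion local: the UUID text form interleaves hyphens with the hex digits, so a generic bytes-to-hex helper would have to be followed by a second pass. As an illustration only (a standalone sketch, not the actual uuid_out patch; uuid_to_text and hex_digits are made-up names), the direct approach writes digits and dashes in one pass into a caller-supplied buffer, avoiding any StringInfo overhead:

```c
#include <assert.h>
#include <string.h>

static const char hex_digits[] = "0123456789abcdef";

/*
 * Format a 16-byte UUID in the standard 8-4-4-4-12 text form.
 * out must have room for 36 characters plus a terminating NUL.
 */
static void
uuid_to_text(const unsigned char uuid[16], char *out)
{
    /* A dash follows the bytes at these indexes. */
    static const int dash_after[] = {3, 5, 7, 9};
    int         i;
    int         d = 0;

    for (i = 0; i < 16; i++)
    {
        *out++ = hex_digits[uuid[i] >> 4];
        *out++ = hex_digits[uuid[i] & 0xF];
        if (d < 4 && i == dash_after[d])
        {
            *out++ = '-';
            d++;
        }
    }
    *out = '\0';
}
```

Because the hyphen positions are fixed, the one-pass loop costs nothing extra, which is presumably why splitting the hex conversion out into a shared helper was judged not worth it here.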
On Mon, Feb 19, 2024 at 03:08:45PM +0100, Laurenz Albe wrote:
> On Sat, 2024-02-17 at 12:24 -0800, Andres Freund wrote:
>> I wonder if we should move the core part for converting to hex to
>> numutils.c; we already have code for the inverse. There does seem to
>> be further optimization potential in the conversion, and that seems
>> better done somewhere central rather than in one type's output
>> function. OTOH, it might not be worth it, given the need to add the
>> dashes.
>
> I thought about it, but then decided against it.
> Calling a function that converts the bytes to hex and then adding the
> hyphens would slow down processing, and I think the code savings would
> be minimal at best.

Yeah, I'm not sure either if that's worth doing; the current conversion
code is simple enough. I'd be curious to hear about ideas to optimize
that more locally, though.

I was curious about the UUID one, and COPYing out 10M rows with a
single UUID attribute brings down the runtime of a COPY from 3.8s to
2.3s here on a simple benchmark, with uuid_out showing up easily at the
top of profiles on HEAD.

Some references for HEAD:
    31.63%     5300  postgres  postgres  [.] uuid_out
    19.79%     3316  postgres  postgres  [.] appendStringInfoChar
    11.27%     1887  postgres  postgres  [.] CopyAttributeOutText
     6.36%     1065  postgres  postgres  [.] pgstat_progress_update_param

And with the patch for uuid_out:
    22.66%     2147  postgres  postgres  [.] CopyAttributeOutText
    12.99%     1231  postgres  postgres  [.] uuid_out
    11.41%     1081  postgres  postgres  [.] pgstat_progress_update_param
     4.79%      454  postgres  postgres  [.] CopyOneRowTo

That's very good, so I'd like to apply this part. Let me know if there
are any objections.
--
Michael
On Wed, Feb 21, 2024 at 02:29:05PM +0900, Michael Paquier wrote:
> That's very good, so I'd like to apply this part. Let me know if
> there are any objections.

This part is done as of 011d60c4352c. I did not evaluate the rest yet.
--
Michael
On Thu, 2024-02-22 at 10:34 +0900, Michael Paquier wrote:
> This part is done as of 011d60c4352c. I did not evaluate the rest
> yet.

Thanks!

I'm attaching the remaining patch for the July commitfest, if you
don't get inspired before that.

Yours,
Laurenz Albe
On Thu, Feb 22, 2024 at 08:16:09AM +0100, Laurenz Albe wrote:
> I'm attaching the remaining patch for the July commitfest, if you
> don't get inspired before that.

There are good chances that I get inspired some time next week.
We'll see ;)
--
Michael
Hi,
On 2024-02-17 17:48:23 +0100, Laurenz Albe wrote:
> As a test case, I created a table with 10000 rows, each of which
> had an array of 10000 uuids. The table resided in shared buffers.
Can you share the exact script used to create the table?
best regards,
Ranier Vilela
On Thu, Feb 22, 2024 at 04:42:37PM -0300, Ranier Vilela wrote:
> Can you share the exact script used to create the table?

Stressing the internals of array_out() for the area of the patch is
not that difficult, as we want to quote each element that's returned
in output. The trick is to have the following to stress the second
quoting loop a maximum:
- a high number of rows.
- a high number of items in the arrays.
- a *minimum* number of characters in each element of the array, with
  characters that require quoting.

The best test case I can think of to demonstrate the patch would be
something like that (adjust rows and elts as you see fit):

-- Number of rows
\set rows 6
-- Number of elements
\set elts 4
create table tab as
  with data as (
    select array_agg(a) as array
      from (
        select '{'::text
          from generate_series(1, :elts) as int(a)) as index(a))
  select data.array from data, generate_series(1, :rows);

Then I get:
       array
-------------------
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
 {"{","{","{","{"}
(6 rows)

With "\set rows 100000" and "\set elts 10000", giving 100MB of data
with 100k rows with 10k elements each, I get for HEAD when the data is
in shared buffers:
=# copy tab to '/dev/null';
COPY 100000
Time: 48620.927 ms (00:48.621)

And with v3:
=# copy tab to '/dev/null';
COPY 100000
Time: 47993.183 ms (00:47.993)

Profiles don't fundamentally change much; array_out() goes from 30.76%
to 29.72% in self runtime, which looks like a limited impact to me.
With 1k rows and 1M elements, COPY TO gets reduced from 54338.436 ms
to 54129.978 ms, a 29.51% -> 29.12% change (looks like noise).

Perhaps I've hit some noise while running this set of tests, but the
impact of the proposed patch looks very limited to me. If you have a
better set of tests and/or ideas, feel free of course.
--
Michael
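The quoting loop that the test case above hammers can be pictured with a simplified standalone model (an editorial sketch of the behavior being benchmarked, not the actual array_out() code; quote_array_element is a made-up name, and the real rules also depend on the type's delimiter and NULL handling): an element containing special characters is wrapped in double quotes, with embedded quotes and backslashes escaped. Using '{' as every element forces this slow path for the entire array.

```c
#include <assert.h>
#include <string.h>

/*
 * Write elem into out in array-literal element form, quoting it if it
 * contains characters that are special in array syntax.  Returns the
 * number of bytes written (not counting the NUL terminator).
 */
static int
quote_array_element(const char *elem, char *out)
{
    /* An empty string, or any of these characters, forces quoting. */
    int         needs_quote = (*elem == '\0' ||
                               strpbrk(elem, "{},\"\\ ") != NULL);
    char       *p = out;

    if (!needs_quote)
    {
        strcpy(p, elem);
        return (int) strlen(out);
    }

    *p++ = '"';
    for (; *elem; elem++)
    {
        if (*elem == '"' || *elem == '\\')
            *p++ = '\\';        /* backslash-escape inside quotes */
        *p++ = *elem;
    }
    *p++ = '"';
    *p = '\0';
    return (int) (p - out);
}
```

With one-character elements like "{", the quoting overhead (two quote bytes per one payload byte) dominates, which is what makes the script above a worst case for this code path.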
On Mon, Feb 26, 2024 at 02:28, Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Feb 22, 2024 at 04:42:37PM -0300, Ranier Vilela wrote:
>> Can you share the exact script used to create the table?
>
> Stressing the internals of array_out() for the area of the patch is
> not that difficult, as we want to quote each element that's returned
> in output. The trick is to have the following to stress the second
> quoting loop a maximum:
> - a high number of rows.
> - a high number of items in the arrays.
> - a *minimum* number of characters in each element of the array, with
>   characters that require quoting.
>
> The best test case I can think of to demonstrate the patch would be
> something like that (adjust rows and elts as you see fit):
>
> -- Number of rows
> \set rows 6
> -- Number of elements
> \set elts 4
> create table tab as
>   with data as (
>     select array_agg(a) as array
>       from (
>         select '{'::text
>           from generate_series(1, :elts) as int(a)) as index(a))
>   select data.array from data, generate_series(1, :rows);
>
> Then I get:
>        array
> -------------------
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
> (6 rows)
>
> With "\set rows 100000" and "\set elts 10000", giving 100MB of data
> with 100k rows with 10k elements each, I get for HEAD when the data
> is in shared buffers:
> =# copy tab to '/dev/null';
> COPY 100000
> Time: 48620.927 ms (00:48.621)
>
> And with v3:
> =# copy tab to '/dev/null';
> COPY 100000
> Time: 47993.183 ms (00:47.993)
Thanks, Michael, for the script.
It makes comparisons easier when everyone uses the exact same script.
best regards,
Ranier Vilela