Re: Optimize WindowAgg's use of tuplestores - Mailing list pgsql-hackers

From: Ashutosh Bapat
Subject: Re: Optimize WindowAgg's use of tuplestores
Date:
Msg-id: CAExHW5tf2HBXrPLAqoDNbte3h0Gz8o5HcuJ3Yq_wPUw2=w5OuQ@mail.gmail.com
In response to: Optimize WindowAgg's use of tuplestores (David Rowley <dgrowley@gmail.com>)
List: pgsql-hackers
On Sun, Jul 7, 2024 at 4:27 PM David Rowley <dgrowley@gmail.com> wrote:
>
> As part of making tuplestores faster [1], I noticed that in WindowAgg, when we end one partition we call
> tuplestore_end() and then we do tuplestore_begin_heap() again for the next partition in begin_partition() and then
> go on to set up the tuplestore read pointers according to what's required for the given frameOptions of the
> WindowAgg.  This might make sense if the frameOptions could change between partitions, but they can't, so I don't
> see any reason why we can't just do tuplestore_clear() at the end of a partition.  That resets the read pointer
> positions back to the start again ready for the next partition.
>
> I wrote the attached patch and checked how it affects performance. It helps quite a bit when there are lots of
> partitions.
>
> CREATE TABLE a (a INT NOT NULL);
> INSERT INTO a SELECT x FROM generate_series(1,1000000)x;
> VACUUM FREEZE ANALYZE a;
>
> bench.sql:
> SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;
>
> master:
> $ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
> latency average = 293.488 ms
> latency average = 295.509 ms
> latency average = 297.772 ms
>
> patched:
> $ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
> latency average = 203.234 ms
> latency average = 204.538 ms
> latency average = 203.877 ms
>
> About 45% faster.
>
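
For readers who don't have the patch in front of them, here is a minimal sketch of the idea as I read it. The tuplestore calls and the winstate->buffer field follow the existing tuplestore API and WindowAgg state, but the helper functions below are simplified stand-ins of my own, not the actual patch:

#include "postgres.h"

#include "miscadmin.h"              /* work_mem */
#include "nodes/execnodes.h"        /* WindowAggState */
#include "utils/tuplestore.h"

/*
 * Current behaviour (simplified): destroy the partition tuplestore at the
 * end of each partition and build a brand new one, plus its read pointers,
 * for the next partition.
 */
static void
end_of_partition_today(WindowAggState *winstate)
{
    tuplestore_end(winstate->buffer);   /* frees tuples, read pointers, state */
    winstate->buffer = NULL;
}

static void
start_of_partition_today(WindowAggState *winstate)
{
    winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
    /* ... re-create read pointers based on the (unchanging) frameOptions ... */
}

/*
 * Proposed behaviour (simplified): since frameOptions cannot change between
 * partitions, keep one tuplestore for the whole scan and just empty it at
 * the end of each partition; that also rewinds the existing read pointers
 * back to the start, ready for the next partition.
 */
static void
end_of_partition_proposed(WindowAggState *winstate)
{
    tuplestore_clear(winstate->buffer);
}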

I repeated your measurements, but varied the number of partitions and
ran pgbench 5 times instead of 3. The idea is to see the impact of the
change at lower numbers of partitions.

10 partitions:   SELECT a,count(*) OVER (PARTITION BY a % 10) FROM a OFFSET 1000000;
100 partitions:  SELECT a,count(*) OVER (PARTITION BY a % 100) FROM a OFFSET 1000000;
1000 partitions: SELECT a,count(*) OVER (PARTITION BY a % 1000) FROM a OFFSET 1000000;
1M partitions (original query): SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;

Notice that the OFFSET is kept the same in every case to avoid any
impact it may have on query execution.

Here are the results:

master:
no. of partitions, average latencies (5 runs)
10,      362.166 ms, 369.313 ms, 375.203 ms, 368.798 ms, 372.483 ms
100,     372.885 ms, 381.463 ms, 385.372 ms, 382.915 ms, 383.630 ms
1000,    390.834 ms, 395.653 ms, 400.339 ms, 407.777 ms, 389.906 ms
1000000, 552.848 ms, 553.943 ms, 547.806 ms, 541.871 ms, 546.741 ms

patched:
no. of partitions, average latencies (5 runs)
10,      356.980 ms, 371.223 ms, 375.550 ms, 378.011 ms, 381.119 ms
100,     392.307 ms, 385.087 ms, 380.383 ms, 390.999 ms, 388.422 ms
1000,    405.136 ms, 397.576 ms, 399.021 ms, 399.572 ms, 406.604 ms
1000000, 394.711 ms, 403.741 ms, 399.008 ms, 392.932 ms, 393.335 ms

Observations:
1. The numbers for 10 and 100 partitions are higher when patched. That
might just be noise; I don't see any reason why the patch would have a
negative impact when there is a small number of partitions. The
lower-partition-count cases also have more rows per partition, so is
the difference between MemoryContextDelete() and MemoryContextReset()
making any difference here (see the generic sketch further below)? It
may be worth verifying those cases carefully. Otherwise, up to 1000
partitions the patch doesn't show any difference.
2. For 1M partitions it does make a difference, about 35% in my case.
Moreover, the change seems to make the execution times (more or less)
independent of the number of partitions.

Combining this observation with the first one, it might be worth
looking at the execution times when there are many rows per partition
together with a high number of partitions.
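
To spell out the MemoryContextDelete() vs MemoryContextReset()
distinction I'm speculating about in observation 1, here is a generic
illustration using the stock memory context API. The wrapper function
names are hypothetical and this is not code from the patch:

#include "postgres.h"

#include "utils/memutils.h"

/*
 * Pattern A: delete the context at the end of every partition and create a
 * brand new one for the next partition.  Every block, including the context
 * header itself, is freed and re-allocated each time around.
 */
static void
per_partition_delete_and_recreate(MemoryContext parent, MemoryContext *cxt)
{
    MemoryContextDelete(*cxt);
    *cxt = AllocSetContextCreate(parent,
                                 "per-partition data",
                                 ALLOCSET_DEFAULT_SIZES);
}

/*
 * Pattern B: keep the context for the whole scan and only release its
 * contents between partitions.  The context itself (and its keeper block)
 * is reused, so the setup/teardown cost is paid once per scan rather than
 * once per partition.
 */
static void
per_partition_reset(MemoryContext cxt)
{
    MemoryContextReset(cxt);
}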

--
Best Wishes,
Ashutosh Bapat


