Optimize WindowAgg's use of tuplestores - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Optimize WindowAgg's use of tuplestores |
Date | |
Msg-id | CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com Whole thread Raw |
Responses |
Re: Optimize WindowAgg's use of tuplestores
|
List | pgsql-hackers |
As part of making tuplestores faster [1], I noticed that in WindowAgg, when we end one partition we call tuplestore_end() and then we do tuplestore_begin_heap() again for the next partition in begin_partition() and then go on to set up the tuplestore read pointers according to what's required for the given frameOptions of the WindowAgg. This might make sense if the frameOptions could change between partitions, but they can't, so I don't see any reason why we can't just do tuplestore_clear() at the end of a partition. That resets the read pointer positions back to the start again ready for the next partition.
I wrote the attached patch and checked how it affects performance. It helps quite a bit when there are lots of partitions.
CREATE TABLE a (a INT NOT NULL);
INSERT INTO a SELECT x FROM generate_series(1,1000000)x;
VACUUM FREEZE ANALYZE a;
bench.sql:
SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;
master:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 293.488 ms
latency average = 295.509 ms
latency average = 297.772 ms
latency average = 293.488 ms
latency average = 295.509 ms
latency average = 297.772 ms
patched:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 203.234 ms
latency average = 204.538 ms
latency average = 203.877 ms
latency average = 203.234 ms
latency average = 204.538 ms
latency average = 203.877 ms
About 45% faster.
Here's the top of perf top of each:
master:
8.61% libc.so.6 [.] _int_malloc
6.71% libc.so.6 [.] _int_free
6.42% postgres [.] heap_form_minimal_tuple
6.40% postgres [.] tuplestore_alloc_read_pointer
5.87% libc.so.6 [.] unlink_chunk.constprop.0
3.86% libc.so.6 [.] __memmove_avx512_unaligned_erms
3.80% postgres [.] AllocSetFree
3.51% postgres [.] spool_tuples
3.45% postgres [.] ExecWindowAgg
2.09% postgres [.] tuplesort_puttuple_common
1.92% postgres [.] comparetup_datum
1.88% postgres [.] AllocSetAlloc
1.85% postgres [.] tuplesort_heap_replace_top
1.84% postgres [.] ExecStoreBufferHeapTuple
1.69% postgres [.] window_gettupleslot
6.71% libc.so.6 [.] _int_free
6.42% postgres [.] heap_form_minimal_tuple
6.40% postgres [.] tuplestore_alloc_read_pointer
5.87% libc.so.6 [.] unlink_chunk.constprop.0
3.86% libc.so.6 [.] __memmove_avx512_unaligned_erms
3.80% postgres [.] AllocSetFree
3.51% postgres [.] spool_tuples
3.45% postgres [.] ExecWindowAgg
2.09% postgres [.] tuplesort_puttuple_common
1.92% postgres [.] comparetup_datum
1.88% postgres [.] AllocSetAlloc
1.85% postgres [.] tuplesort_heap_replace_top
1.84% postgres [.] ExecStoreBufferHeapTuple
1.69% postgres [.] window_gettupleslot
patched:
8.95% postgres [.] ExecWindowAgg
8.10% postgres [.] heap_form_minimal_tuple
5.16% postgres [.] spool_tuples
4.03% libc.so.6 [.] __memmove_avx512_unaligned_erms
4.02% postgres [.] begin_partition
3.11% postgres [.] tuplesort_puttuple_common
2.97% postgres [.] AllocSetAlloc
2.96% postgres [.] comparetup_datum
2.83% postgres [.] tuplesort_heap_replace_top
2.79% postgres [.] window_gettupleslot
2.22% postgres [.] AllocSetFree
2.11% postgres [.] MemoryContextReset
1.99% postgres [.] LogicalTapeWrite
1.98% postgres [.] advance_windowaggregate
8.10% postgres [.] heap_form_minimal_tuple
5.16% postgres [.] spool_tuples
4.03% libc.so.6 [.] __memmove_avx512_unaligned_erms
4.02% postgres [.] begin_partition
3.11% postgres [.] tuplesort_puttuple_common
2.97% postgres [.] AllocSetAlloc
2.96% postgres [.] comparetup_datum
2.83% postgres [.] tuplesort_heap_replace_top
2.79% postgres [.] window_gettupleslot
2.22% postgres [.] AllocSetFree
2.11% postgres [.] MemoryContextReset
1.99% postgres [.] LogicalTapeWrite
1.98% postgres [.] advance_windowaggregate
Lots less malloc/free work going on.
I'm also tempted to do a cleanup of the state machine inside nodeWindowAgg.c as I had to add another bool flag to WindowAggState to replace the previous test that checked if the tuplestore was empty (i.e just finished a partition) with if (winstate->buffer == NULL). I couldn't do that anymore due to no longer nullifying that field. I think such a cleanup would be a separate patch. Annoyingly the extra bool is the 9th bool flag and widens that struct by 8 bytes and leaves a 7 byte hole.
David
Attachment
pgsql-hackers by date: