Thread: range_agg extremely slow compared to naive implementation in obscure circumstances
range_agg extremely slow compared to naive implementation in obscure circumstances
From
Duncan Sands
Date:
In general range_agg is faster than the naive version

CREATE AGGREGATE public.naive_range_agg(anymultirange) (
   SFUNC = multirange_union,
   STYPE = anymultirange
);

however here is an example where using it is over 6000 times slower. I'm not
sure exactly what feature of the example triggers this - I failed to create a
synthetic testcase using generate_series, thus the attached table data.

How to reproduce (Ubuntu 22.10, x86_64, postgresql 15.1-1.pgdg22.10+1):

$ cp data.txt.gz /tmp/
$ gunzip /tmp/data.txt.gz
$ psql
Pager usage is off.
psql (15.1 (Ubuntu 15.1-1.pgdg22.10+1))
Type "help" for help.

duncan=> CREATE TEMP TABLE wacky(priority bigint, valid tstzrange);
CREATE TABLE
duncan=> \COPY wacky FROM /tmp/data.txt
COPY 98094
duncan=> CREATE AGGREGATE public.naive_range_agg(anymultirange) (
   SFUNC = multirange_union,
   STYPE = anymultirange
);
CREATE AGGREGATE
duncan=> \timing on
Timing is on.
duncan=> EXPLAIN (ANALYZE) SELECT FROM (SELECT valid,
naive_range_agg(valid::tstzmultirange) OVER (ORDER BY priority DESC GROUPS
BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS more_recent FROM wacky) foo
WHERE valid <@ more_recent;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on foo  (cost=11063.57..13879.37 rows=433 width=0) (actual time=88.086..88.087 rows=0 loops=1)
   Filter: (foo.valid <@ foo.more_recent)
   Rows Removed by Filter: 98094
   ->  WindowAgg  (cost=11063.57..12796.37 rows=86640 width=72) (actual time=16.102..84.242 rows=98094 loops=1)
         ->  Sort  (cost=11063.57..11280.17 rows=86640 width=40) (actual time=16.096..20.205 rows=98094 loops=1)
               Sort Key: wacky.priority DESC
               Sort Method: external merge  Disk: 3848kB
               ->  Seq Scan on wacky  (cost=0.00..1588.40 rows=86640 width=40) (actual time=0.021..5.479 rows=98094 loops=1)
 Planning Time: 0.277 ms
 Execution Time: 88.631 ms
(10 rows)

Time: 98.261 ms
duncan=> EXPLAIN (ANALYZE) SELECT FROM (SELECT valid,
range_agg(valid::tstzmultirange) OVER (ORDER BY priority DESC GROUPS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING) AS more_recent FROM wacky) foo WHERE valid
<@ more_recent;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on foo  (cost=11063.57..13879.37 rows=433 width=0) (actual time=566009.972..566009.973 rows=0 loops=1)
   Filter: (foo.valid <@ foo.more_recent)
   Rows Removed by Filter: 98094
   ->  WindowAgg  (cost=11063.57..12796.37 rows=86640 width=72) (actual time=21.996..565998.800 rows=98094 loops=1)
         ->  Sort  (cost=11063.57..11280.17 rows=86640 width=40) (actual time=21.988..26.154 rows=98094 loops=1)
               Sort Key: wacky.priority DESC
               Sort Method: external merge  Disk: 3848kB
               ->  Seq Scan on wacky  (cost=0.00..1588.40 rows=86640 width=40) (actual time=0.014..6.868 rows=98094 loops=1)
 Planning Time: 0.178 ms
 Execution Time: 566010.770 ms
(10 rows)

Time: 566018.613 ms (09:26.019)
Attachment: data.txt.gz
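For context, both aggregates compute the same thing: the union of a set of
ranges. A minimal standalone C sketch of that operation, with half-open
int64 intervals standing in for the tstzrange values above (illustrative
code only, not PostgreSQL source):

/*
 * Union of a set of half-open intervals [lo, hi): sort by lower bound,
 * then merge overlapping or touching neighbours in a single pass.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int64_t lo, hi; } Range;

static int cmp_range(const void *a, const void *b)
{
    const Range *ra = a, *rb = b;
    return (ra->lo > rb->lo) - (ra->lo < rb->lo);
}

/* Merges in place; returns the number of ranges in the union. */
static size_t range_union(Range *r, size_t n)
{
    if (n == 0)
        return 0;
    qsort(r, n, sizeof(Range), cmp_range);
    size_t out = 0;
    for (size_t i = 1; i < n; i++)
    {
        if (r[i].lo <= r[out].hi)       /* overlaps or touches */
        {
            if (r[i].hi > r[out].hi)
                r[out].hi = r[i].hi;    /* extend the current range */
        }
        else
            r[++out] = r[i];            /* start a new disjoint range */
    }
    return out + 1;
}

int main(void)
{
    Range r[] = { {5, 8}, {1, 3}, {2, 6} };
    size_t n = range_union(r, 3);
    for (size_t i = 0; i < n; i++)
        printf("[%lld, %lld)\n", (long long) r[i].lo, (long long) r[i].hi);
    return 0;   /* prints: [1, 8) */
}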
Re: range_agg extremely slow compared to naive implementation in obscure circumstances
From
Pavel Stehule
Date:
On Mon, 30 Jan 2023 at 11:24, Duncan Sands <duncan.sands@deepbluecap.com> wrote:
> In general range_agg is faster than the naive version
>
> CREATE AGGREGATE public.naive_range_agg(anymultirange) (
>    SFUNC = multirange_union,
>    STYPE = anymultirange
> );
>
> however here is an example where using it is over 6000 times slower. I'm not
> sure exactly what feature of the example triggers this - I failed to create a
> synthetic testcase using generate_series, thus the attached table data.
>
> How to reproduce (Ubuntu 22.10, x86_64, postgresql 15.1-1.pgdg22.10+1):
>
> $ cp data.txt.gz /tmp/
> $ gunzip /tmp/data.txt.gz
> $ psql
> Pager usage is off.
> psql (15.1 (Ubuntu 15.1-1.pgdg22.10+1))
> Type "help" for help.
>
> duncan=> CREATE TEMP TABLE wacky(priority bigint, valid tstzrange);
> CREATE TABLE
> duncan=> \COPY wacky FROM /tmp/data.txt
> COPY 98094
> duncan=> CREATE AGGREGATE public.naive_range_agg(anymultirange) (
>    SFUNC = multirange_union,
>    STYPE = anymultirange
> );
> CREATE AGGREGATE
> duncan=> \timing on
> Timing is on.
> duncan=> EXPLAIN (ANALYZE) SELECT FROM (SELECT valid,
> naive_range_agg(valid::tstzmultirange) OVER (ORDER BY priority DESC GROUPS
> BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS more_recent FROM wacky) foo
> WHERE valid <@ more_recent;
>                                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Subquery Scan on foo  (cost=11063.57..13879.37 rows=433 width=0) (actual time=88.086..88.087 rows=0 loops=1)
>    Filter: (foo.valid <@ foo.more_recent)
>    Rows Removed by Filter: 98094
>    ->  WindowAgg  (cost=11063.57..12796.37 rows=86640 width=72) (actual time=16.102..84.242 rows=98094 loops=1)
>          ->  Sort  (cost=11063.57..11280.17 rows=86640 width=40) (actual time=16.096..20.205 rows=98094 loops=1)
>                Sort Key: wacky.priority DESC
>                Sort Method: external merge  Disk: 3848kB
>                ->  Seq Scan on wacky  (cost=0.00..1588.40 rows=86640 width=40) (actual time=0.021..5.479 rows=98094 loops=1)
>  Planning Time: 0.277 ms
>  Execution Time: 88.631 ms
> (10 rows)
>
> Time: 98.261 ms
> duncan=> EXPLAIN (ANALYZE) SELECT FROM (SELECT valid,
> range_agg(valid::tstzmultirange) OVER (ORDER BY priority DESC GROUPS BETWEEN
> UNBOUNDED PRECEDING AND 1 PRECEDING) AS more_recent FROM wacky) foo WHERE valid
> <@ more_recent;
>                                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Subquery Scan on foo  (cost=11063.57..13879.37 rows=433 width=0) (actual time=566009.972..566009.973 rows=0 loops=1)
>    Filter: (foo.valid <@ foo.more_recent)
>    Rows Removed by Filter: 98094
>    ->  WindowAgg  (cost=11063.57..12796.37 rows=86640 width=72) (actual time=21.996..565998.800 rows=98094 loops=1)
>          ->  Sort  (cost=11063.57..11280.17 rows=86640 width=40) (actual time=21.988..26.154 rows=98094 loops=1)
>                Sort Key: wacky.priority DESC
>                Sort Method: external merge  Disk: 3848kB
>                ->  Seq Scan on wacky  (cost=0.00..1588.40 rows=86640 width=40) (actual time=0.014..6.868 rows=98094 loops=1)
>  Planning Time: 0.178 ms
>  Execution Time: 566010.770 ms
> (10 rows)
Maybe there is some problem in the range_deserialize function:

  40,53%  postgres  [.] range_deserialize
   8,28%  postgres  [.] FunctionCall2Coll
   7,14%  postgres  [.] range_compare
   4,95%  postgres  [.] qsort_arg
   4,51%  postgres  [.] range_cmp_bounds
   2,49%  postgres  [.] timestamp_cmp
   1,73%  postgres  [.] range_serialize
   0,91%  postgres  [.] AllocSetAlloc
Regards
Pavel
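As a simplified illustration of the pattern this profile points at (not
the actual PostgreSQL source): the multirange's ranges are kept in a
packed form, and the sort comparator unpacks both of its arguments on
every call, so for N ranges the unpacking work is repeated roughly
2 * N * log2(N) times.

#include <stdlib.h>
#include <string.h>

typedef struct { long lo, hi; } UnpackedRange;

/* stand-in for the packed, varlena-style representation */
typedef struct { char bytes[32]; } PackedRange;

/* stand-in for range_deserialize(): per-call unpacking cost */
static void unpack(const PackedRange *p, UnpackedRange *out)
{
    memcpy(&out->lo, p->bytes, sizeof(long));
    memcpy(&out->hi, p->bytes + sizeof(long), sizeof(long));
}

/* stand-in for range_compare(): unpacks BOTH arguments every time the
 * sort calls it -- which is where range_deserialize's 40% comes from */
static int compare_packed(const void *a, const void *b)
{
    UnpackedRange ra, rb;
    unpack(a, &ra);
    unpack(b, &rb);
    return (ra.lo > rb.lo) - (ra.lo < rb.lo);
}

void canonicalize(PackedRange *ranges, size_t n)
{
    qsort(ranges, n, sizeof(PackedRange), compare_packed);
    /* ...merge pass over the now-sorted ranges... */
}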
Re: range_agg extremely slow compared to naive implementation in obscure circumstances
From
David Rowley
Date:
On Mon, 30 Jan 2023 at 23:43, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Maybe there is some problem in the range_deserialize function

It seems that range_deserialize is called from within range_compare,
which is the qsort comparison function (see multirange_canonicalize).
That'll end up calling range_deserialize twice, once for each item
being compared, about O(N log N) times.

Ordinarily, this probably isn't too bad, as we only do this in the
aggregate's final function. It's likely the performance is bad here
because the aggregate is being used as a window function, and the
finalfn must be called once for each row rather than once per group,
as it would be if it were being used as a normal aggregate function.

It might be better if we had multirange_canonicalize() deserialize
these once and used some representation that could more easily be
qsorted. I'm not planning on doing any work on it though. It's
probably unlikely that we'd do anything about this as part of a bug
fix.

David
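A rough sketch, with made-up names, of the alternative David describes:
unpack each range once into a plain struct, sort the cheap-to-compare
array, and repack afterwards, so the expensive deserialization runs O(N)
rather than O(N log N) times:

#include <stdlib.h>
#include <string.h>

typedef struct { long lo, hi; } UnpackedRange;
typedef struct { char bytes[32]; } PackedRange;

static void unpack(const PackedRange *p, UnpackedRange *out)
{
    memcpy(&out->lo, p->bytes, sizeof(long));
    memcpy(&out->hi, p->bytes + sizeof(long), sizeof(long));
}

/* hypothetical inverse of unpack() */
static void repack(const UnpackedRange *r, PackedRange *out)
{
    memcpy(out->bytes, &r->lo, sizeof(long));
    memcpy(out->bytes + sizeof(long), &r->hi, sizeof(long));
}

static int compare_unpacked(const void *a, const void *b)
{
    const UnpackedRange *ra = a, *rb = b;
    return (ra->lo > rb->lo) - (ra->lo < rb->lo);   /* no unpacking here */
}

void canonicalize_once(PackedRange *ranges, size_t n)
{
    UnpackedRange *tmp = malloc(n * sizeof(UnpackedRange));
    if (tmp == NULL)
        return;
    for (size_t i = 0; i < n; i++)      /* N deserializations, total */
        unpack(&ranges[i], &tmp[i]);
    qsort(tmp, n, sizeof(UnpackedRange), compare_unpacked);
    /* ...merge pass over tmp, then write back... */
    for (size_t i = 0; i < n; i++)
        repack(&tmp[i], &ranges[i]);
    free(tmp);
}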
Re: range_agg extremely slow compared to naive implementation in obscure circumstances
From
Alvaro Herrera
Date:
On 2023-Jan-31, David Rowley wrote:

> It might be better if we had multirange_canonicalize() deserialize
> these once and used some representation that could more easily be
> qsorted. I'm not planning on doing any work on it though.

Yeah, maybe it would be possible to have an in-memory representation
that doesn't require any deparsing, and keep the compact representation
to be used only for in-data-page storage. How to do this within the
constraints of the Datum abstraction is not clear to me.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"The wise man speaks because he has something to say; the fool,
because he has to say something." (Plato)
Re: range_agg extremely slow compared to naive implementation in obscure circumstances
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Jan-31, David Rowley wrote:
>> It might be better if we had multirange_canonicalize() deserialize
>> these once and used some representation that could more easily be
>> qsorted. I'm not planning on doing any work on it though.

> Yeah, maybe it would be possible to have an in-memory representation
> that doesn't require any deparsing, and keep the compact representation
> to be used only for in-data-page storage. How to do this within the
> constraints of the Datum abstraction is not clear to me.

Perhaps the "expanded datum" mechanism would serve?

src/include/utils/expandeddatum.h

It might be too heavyweight for this application, but I'm not sure.

			regards, tom lane
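For reference, the mechanism Tom points at lets a type keep a fully
deserialized in-memory object, in its own memory context, behind a
pointer Datum, flattening back to the compact on-page form only on
demand. Below is a rough, untested sketch of how it might be applied
here, modeled loosely on the existing expanded-array code
(src/backend/utils/adt/array_expanded.c). ExpandedMultirangeHeader,
DeserializedRange and make_expanded_multirange are hypothetical names;
only ExpandedObjectHeader, ExpandedObjectMethods, EOH_init_header() and
EOHPGetRWDatum() come from expandeddatum.h.

#include "postgres.h"

#include "utils/expandeddatum.h"
#include "utils/memutils.h"

/* Hypothetical in-memory form: ranges kept deserialized and directly
 * sortable, so canonicalization never calls range_deserialize. */
typedef struct DeserializedRange
{
    Datum       lower;
    Datum       upper;
    char        flags;
} DeserializedRange;

typedef struct ExpandedMultirangeHeader     /* hypothetical */
{
    ExpandedObjectHeader hdr;   /* standard header from expandeddatum.h */
    int         nranges;
    DeserializedRange *ranges;
} ExpandedMultirangeHeader;

/* The two callbacks the expanded-datum machinery requires: how large
 * the flat (on-page) form is, and how to write it out. */
static Size
EMR_get_flat_size(ExpandedObjectHeader *eohptr)
{
    ExpandedMultirangeHeader *emrh = (ExpandedMultirangeHeader *) eohptr;

    /* placeholder: would compute the real serialized MultirangeType size */
    return sizeof(int32) + emrh->nranges * sizeof(DeserializedRange);
}

static void
EMR_flatten_into(ExpandedObjectHeader *eohptr, void *result, Size allocated_size)
{
    /* placeholder: would re-serialize into the compact on-page form */
}

static const ExpandedObjectMethods EMR_methods =
{
    EMR_get_flat_size,
    EMR_flatten_into
};

/* Build the expanded form once; the Datum passed between aggregate
 * transition calls is then just a pointer to this object. */
static Datum
make_expanded_multirange(MemoryContext parentcontext)
{
    MemoryContext objcxt;
    ExpandedMultirangeHeader *emrh;

    objcxt = AllocSetContextCreate(parentcontext,
                                   "expanded multirange",
                                   ALLOCSET_DEFAULT_SIZES);
    emrh = (ExpandedMultirangeHeader *)
        MemoryContextAlloc(objcxt, sizeof(ExpandedMultirangeHeader));
    EOH_init_header(&emrh->hdr, &EMR_methods, objcxt);
    emrh->nranges = 0;
    emrh->ranges = NULL;
    /* ...caller fills emrh->ranges from the input multirange... */
    return EOHPGetRWDatum(&emrh->hdr);
}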