Re: Trouble with hashagg spill I/O pattern and costing - Mailing list pgsql-hackers

From:           Tomas Vondra
Subject:        Re: Trouble with hashagg spill I/O pattern and costing
Date:           2020-05-21 14:30:40
Msg-id:         20200521143040.zgn7ealdasfkmfcb@development
In response to: Re: Trouble with hashagg spill I/O pattern and costing (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses:      Re: Trouble with hashagg spill I/O pattern and costing
List:           pgsql-hackers
On Thu, May 21, 2020 at 03:41:22PM +0200, Tomas Vondra wrote:
>On Tue, May 19, 2020 at 05:12:02PM +0200, Tomas Vondra wrote:
>>
>>...
>>
>>The problem is that the hashagg plan runs in ~1400 seconds, while the
>>groupagg only takes ~360. And per explain analyze, the difference really
>>is in the aggregation - if we subtract the seqscan, the sort+groupagg
>>takes about 310s:
>>
>>  ->  GroupAggregate  (cost=41772791.17..43305665.51 rows=6206695 width=36) (actual time=283378.004..335611.192 rows=6398981 loops=1)
>>        Group Key: lineitem_1.l_partkey
>>        ->  Sort  (cost=41772791.17..42252715.81 rows=191969856 width=9) (actual time=283377.977..306182.393 rows=191969841 loops=1)
>>              Sort Key: lineitem_1.l_partkey
>>              Sort Method: external merge  Disk: 3569544kB
>>              ->  Seq Scan on lineitem lineitem_1  (cost=0.00..5519079.56 rows=191969856 width=9) (actual time=0.019..28253.076 rows=192000551 loops=1)
>>
>>while the hashagg takes ~1330s:
>>
>>  ->  HashAggregate  (cost=13977751.34..15945557.39 rows=6206695 width=36) (actual time=202952.170..1354546.897 rows=6400000 loops=1)
>>        Group Key: lineitem_1.l_partkey
>>        Planned Partitions: 128
>>        Peak Memory Usage: 4249 kB
>>        Disk Usage: 26321840 kB
>>        HashAgg Batches: 16512
>>        ->  Seq Scan on lineitem lineitem_1  (cost=0.00..5519079.56 rows=191969856 width=9) (actual time=0.007..22205.617 rows=192000551 loops=1)
>>
>>And that's while only writing 26GB, compared to 35GB in the sorted plan,
>>and with cost being ~16M vs. ~43M (so roughly inverse).
>>
>
>I've noticed I've actually made a mistake here - it's not 26GB vs. 35GB
>in hash vs. sort, it's 26GB vs. 3.5GB. That is, the sort-based plan
>writes out *way less* data to the temp file.
>
>The reason is revealed by explain verbose:
>
>  ->  GroupAggregate
>        Output: lineitem_1.l_partkey, (0.2 * avg(lineitem_1.l_quantity))
>        Group Key: lineitem_1.l_partkey
>        ->  Sort
>              Output: lineitem_1.l_partkey, lineitem_1.l_quantity
>              Sort Key: lineitem_1.l_partkey
>              ->  Seq Scan on public.lineitem lineitem_1
>                    Output: lineitem_1.l_partkey, lineitem_1.l_quantity
>
>  ->  HashAggregate
>        Output: lineitem_1.l_partkey, (0.2 * avg(lineitem_1.l_quantity))
>        Group Key: lineitem_1.l_partkey
>        ->  Seq Scan on public.lineitem lineitem_1
>              Output: lineitem_1.l_orderkey, lineitem_1.l_partkey,
>                      lineitem_1.l_suppkey, lineitem_1.l_linenumber,
>                      lineitem_1.l_quantity, lineitem_1.l_extendedprice,
>                      lineitem_1.l_discount, lineitem_1.l_tax,
>                      lineitem_1.l_returnflag, lineitem_1.l_linestatus,
>                      lineitem_1.l_shipdate, lineitem_1.l_commitdate,
>                      lineitem_1.l_receiptdate, lineitem_1.l_shipinstruct,
>                      lineitem_1.l_shipmode, lineitem_1.l_comment
>
>It seems that in the hashagg case we're not applying projection in the
>seqscan, forcing us to serialize way more data (essentially the whole
>lineitem table).
>
>It's probably still worth tweaking the I/O pattern, I think.
>

OK, it seems the attached trivial fix (simply changing CP_LABEL_TLIST to
CP_SMALL_TLIST) addresses this for me. I've only tried it on the patched
version that pre-allocates 128 blocks, and the results seem pretty nice:

             sort     hash     hash+tlist
   ------------------------------------------
    4MB       331      478      188
    128MB     222      434      210

which I guess is what we wanted ...

I'll give it a try on the other machine (temp on SATA), but I don't see
why it would not behave similarly nicely.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
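[For context, the flag change described above can be sketched as a diff - a hypothetical fragment modeled on the create_plan_recurse() call in create_agg_plan() (src/backend/optimizer/plan/createplan.c); the actual change is in the attached patch, which is not reproduced here:]

```diff
-	subplan = create_plan_recurse(root, best_path->subpath,
-								  CP_LABEL_TLIST);
+	/*
+	 * Ask the child for the narrowest possible tlist, so tuples
+	 * spilled by the hash aggregate carry only the needed columns.
+	 */
+	subplan = create_plan_recurse(root, best_path->subpath,
+								  CP_SMALL_TLIST);
```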