[PATCH] Lazy hashaggregate when no aggregation is needed - Mailing list pgsql-hackers

From Ants Aasma
Subject [PATCH] Lazy hashaggregate when no aggregation is needed
Msg-id CA+CSw_uE-RCyQd_bXJNe=usrXkq+keFrQrahkc+8ou+Ws4Y=Vw@mail.gmail.com
Responses Re: [PATCH] Lazy hashaggregate when no aggregation is needed  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
A user complained on pgsql-performance that SELECT col FROM table
GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
to return tuples from hash-aggregate as they are found when no
aggregate functions are in use. Attached is a first shot at that. The
planner is modified so that, when the optimization applies, the hash
table size check uses the limit and the startup cost comes from the
input. The executor is modified so that, when the hash table is not
yet filled and the optimization applies, tuples are returned
immediately.
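
To illustrate the executor-side idea outside of PostgreSQL, here is a
minimal Python sketch. It is not the patch and does not mirror
nodeAgg.c; the names lazy_distinct and counting_scan are hypothetical,
and a Python set stands in for the hash table. With no aggregate
state to finalize, each group key can be emitted the first time it is
seen, so a LIMIT above it stops pulling from the input early.

```python
from itertools import islice


def lazy_distinct(rows, key):
    """Yield each group key the first time it is seen in the input."""
    seen = set()  # stands in for the hash table (assumption)
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            # No aggregate functions, so nothing has to be finalized:
            # the group can be returned as soon as it is found.
            yield k


def counting_scan(rows, counter):
    """Hypothetical input scan that counts how many rows were read."""
    for row in rows:
        counter[0] += 1
        yield row


rows = [("a",), ("b",), ("a",), ("c",), ("b",)]
scanned = [0]

# Rough analogue of: SELECT col FROM table GROUP BY col LIMIT 2;
groups = lazy_distinct(counting_scan(rows, scanned), key=lambda r: r[0])
first_two = list(islice(groups, 2))
```

In this sketch only the first two input rows are read before the
limit is satisfied, whereas filling the hash table first would read
all five.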

Can somebody poke holes in this? The patch definitely needs some code
cleanup in nodeAgg.c, but otherwise it passes the regression tests and
seems to work as intended. It also optimizes the SELECT DISTINCT col
FROM table LIMIT 2; case, but not SELECT DISTINCT ON (col) col FROM
table LIMIT 2; because that case is explicitly forced to use sorted
aggregation.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
