
Improve hash-agg performance - Mailing list pgsql-hackers

From	Andres Freund
Subject	Improve hash-agg performance
Date	November 3, 2016 11:07:30
Msg-id	20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de
Responses	Re: Improve hash-agg performance Re: Improve hash-agg performance
List	pgsql-hackers


Hi,

There are two things I found while working on faster expression
evaluation, slot deforming and batched execution. As those two issues
often seemed quite dominant cost-wise, it seemed worthwhile to evaluate
them independently.

1) We currently do one ExecProject() to compute each aggregate's
   arguments. It turns out to be noticeably faster to compute the
   arguments for all aggregates in one go, both because it reduces the
   number of function calls / moves more things into a relatively tight
   loop, and because it allows deforming all the required columns at
   once, rather than one-by-one.  For a single aggregate it'd be faster
   to avoid ExecProject altogether (i.e. directly evaluate the
   expression as we used to), but as soon as you have two the new
   approach is faster.

2) For hash-aggs we currently store the representative tuple using
   the input tuple's format, with unneeded columns set to NULL. That
   turns out to be expensive if the aggregated-on columns are not
   leading columns, because we have to skip over a potentially large
   number of NULLs.  The fix here is to simply use a different tuple
   format for the hashtable.  That doesn't cause overhead, because we
   already move columns in/out of the hashslot explicitly anyway.
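
To illustrate point 2, here is a minimal sketch in plain C. It is not
the attached patch and does not use PostgreSQL's executor types:
ToyTuple, toy_store_wide(), toy_store_compact() and toy_attrs_walked()
are hypothetical names invented for this example, and the "walk" count
is only a rough stand-in for the cost of stepping over leading NULLs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a deformed tuple; NOT a PostgreSQL type. */
#define TOY_MAX_ATTS 64

typedef struct ToyTuple
{
    int  natts;                 /* number of attributes in this layout */
    int  values[TOY_MAX_ATTS];  /* attribute values */
    bool isnull[TOY_MAX_ATTS];  /* per-attribute NULL flags */
} ToyTuple;

/*
 * Wide layout: keep the input's attribute positions and set every
 * attribute that is not a grouping column to NULL.
 */
static void
toy_store_wide(const ToyTuple *input, const int *group_cols, int ngroup,
               ToyTuple *out)
{
    assert(input->natts <= TOY_MAX_ATTS);
    out->natts = input->natts;
    for (int i = 0; i < out->natts; i++)
    {
        out->values[i] = 0;
        out->isnull[i] = true;
    }
    for (int g = 0; g < ngroup; g++)
    {
        int col = group_cols[g];

        assert(col >= 0 && col < input->natts);
        out->values[col] = input->values[col];
        out->isnull[col] = input->isnull[col];
    }
}

/*
 * Compact layout: the stored tuple holds only the grouping columns, at
 * positions 0 .. ngroup-1.  group_cols[] doubles as the mapping back to
 * the input positions.
 */
static void
toy_store_compact(const ToyTuple *input, const int *group_cols, int ngroup,
                  ToyTuple *out)
{
    assert(ngroup <= TOY_MAX_ATTS);
    out->natts = ngroup;
    for (int g = 0; g < ngroup; g++)
    {
        int col = group_cols[g];

        assert(col >= 0 && col < input->natts);
        out->values[g] = input->values[col];
        out->isnull[g] = input->isnull[col];
    }
}

/*
 * Number of attributes a sequential reader has to step through to reach
 * the last of the given column positions (0-based).
 */
static int
toy_attrs_walked(const int *cols, int ncols)
{
    int max = -1;

    for (int i = 0; i < ncols; i++)
    {
        if (cols[i] > max)
            max = cols[i];
    }
    return max + 1;
}
```

With a 60-column input grouped on the columns at 0-based positions 57
and 59, the wide layout stores 60 attributes and a reader steps through
all of them to reach the last key; the compact layout stores just two.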

Comments?

Regards,

Andres Freund

