Re: HashAggregate slower than sort? - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: HashAggregate slower than sort?
Date
Msg-id 4C1B6DA902000025000325B8@gw.wicourts.gov
Whole thread Raw
In response to HashAggregate slower than sort?  ("Jatinder Sangha" <js@coalition.com>)
List pgsql-performance
"Jatinder Sangha" <js@coalition.com> wrote:

> I have a simple query that when planned either uses hash-
> aggregates or a sort depending on the amount of working memory
> available. The problem is that when it uses the hash-aggregates,
> the query runs 25% slower than when using the sort method.
>
> The table in question contains about 60 columns, many of which are
> boolean, 32-bit integers and some are 64-bit integers. Many fields
> are text - and some of these can be quite long (eg 32Kb).

> Obviously, I can re-write the query to use a "distinct on (...)"
> clause

Yeah, that seems prudent, to say the least.

> Why is the hash-aggregate slower than the sort?
>
> Is it something to do with the number of columns? ie. When
> sorting, the first few columns defined on the table (id, version)
> make the row unique - but when using the hash-aggregate feature,
> presumably every column needs to be hashed which takes longer
> especially for long text fields?

Sounds like a reasonable guess to me.  But since you're apparently
retrieving about 9,000 wide rows in (worst case) 56 ms, it would
seem that your active data set may be fully cached.  If so, you
could try reducing both random_page_cost and seq_page_cost to
something in the 0.1 to 0.005 range and see if it improves the
accuracy of the cost estimates.  Not that you should go back to
using DISTINCT on all 60 column, including big text columns; but
these cost factors might help other queries pick faster plans.

-Kevin

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: B-Heaps
Next
From: Greg Smith
Date:
Subject: Re: requested shared memory size overflows size_t