
Re: Group by more efficient than distinct? - Mailing list pgsql-performance

From	Matthew Wakeling
Subject	Re: Group by more efficient than distinct?
Date	April 22, 2008 12:22:35
Msg-id	Pine.LNX.4.64.0804221318560.12158@aragorn.flymine.org
In response to	Re: Group by more efficient than distinct? (Mark Mielke <mark@mark.mielke.cc>)
Responses	Re: Group by more efficient than distinct?
List	pgsql-performance


On Tue, 22 Apr 2008, Mark Mielke wrote:
> The poster I responded to said that the memory required for a hash join was
> relative to the number of distinct values, not the number of rows. They gave
> an example of millions of rows, but only a few distinct values. Above, you
> agree with me that it would include the rows (or at least references to
> the rows) as well. If it stores rows, or references to rows, then memory *is*
> relative to the number of rows, and millions of records would require
> millions of rows (or row references).

Yeah, I think we're talking at cross-purposes, due to hash tables being
used in two completely different places in Postgres. Firstly, you have
hash joins, where Postgres loads the references to the actual rows, and
puts those in the hash table. For that situation, you want a small number
of rows. Secondly, you have hash aggregates, where Postgres stores an
entry for each "group" in the hash table, and does not store the actual
rows. For that situation, you can have a bazillion individual rows, but
only a small number of distinct groups.
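
To make the difference concrete, here is a toy model in Python. It is only
a sketch of the idea, not how Postgres implements either node: the function
names and the sample data are made up for illustration, and a real executor
tracks far more state per entry.

```python
from collections import defaultdict


def hash_aggregate_count(rows, key):
    """Toy hash aggregate: one hash-table entry per distinct group.

    Only the running count for each group is kept; the rows themselves
    are discarded as soon as they have been counted.
    """
    groups = {}
    for row in rows:
        k = row[key]
        groups[k] = groups.get(k, 0) + 1
    return groups


def hash_join_build(rows, key):
    """Toy hash join build phase: every input row is kept.

    Each row (here, a reference to the dict) is stored under its join
    key so that the probe side can later find all matching rows.
    """
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


# Hypothetical data: many rows, only three distinct values.
colours = ("red", "green", "blue")
rows = [{"id": i, "colour": colours[i % 3]} for i in range(30000)]

agg = hash_aggregate_count(rows, "colour")
build = hash_join_build(rows, "colour")

agg_entries = len(agg)                                # grows with distinct groups
join_entries = sum(len(v) for v in build.values())    # grows with input rows

print(agg_entries, join_entries)
```

With this data the aggregate holds three entries while the join's table
holds a reference to every one of the 30000 rows, which is the distinction
being drawn above.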

Matthew

--
First law of computing:  Anything can go wro
sig: Segmentation fault.  core dumped.
