Re: [PERFORM] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-hackers

From Dave Held
Subject Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date
Msg-id 49E94D0CFCD4DB43AFBA928DDD20C8F9026184D7@asg002.asg.local
Whole thread Raw
List pgsql-hackers
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, April 25, 2005 10:23 AM
> To: Simon Riggs
> Cc: josh@agliodbs.com; Greg Stark; Marko Ristola; pgsql-perform;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
> suggested?
>
> [...]
> It's not just the scan --- you also have to sort, or something like
> that, if you want to count distinct values.  I doubt anyone is really
> going to consider this a feasible answer for large tables.

How about an option to create a stat hashmap for the column
that maps distinct values to their number of occurrences?  Obviously
the map would need to be updated on INSERT/DELETE/UPDATE, but if the
table is dominated by reads, and an accurate n_distinct is very
important, there may be people willing to pay the extra time and space
cost.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: possible TODO: read-only tables, select from indexes
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCHES] Continue transactions after errors in psql