Home > mailing lists

Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan - Mailing list pgsql-hackers

From	Jules Bean
Subject	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date	August 24, 2000 08:27:32
Msg-id	20000824101113.N17510@grommit.office.vi.net Whole thread Raw
In response to	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On Wed, Aug 23, 2000 at 10:30:30AM -0400, Tom Lane wrote:
> Jules Bean <jules@jellybean.co.uk> writes:
> > I have in a table a 'category' column which takes a small number of
> > (basically fixed) values.  Here by 'small', I mean ~1000, while the
> > table itself has ~10 000 000 rows. Some categories have many, many
> > more rows than others.  In particular, there's one category which hits
> > over half the rows.  Because of this (AIUI) postgresql assumes
> > that the query
> >    select ... from thistable where category='something'
> > is best served by a seqscan, even though there is an index on
> > category.
> 
> Yes, we know about that one.  We have stats about the most common value
> in a column, but no information about how the less-common values are
> distributed.  We definitely need stats about several top values not just
> one, because this phenomenon of a badly skewed distribution is pretty
> common.

ISTM that that might be enough, in fact.

If you have stats telling you that the most popular value is 'xyz',
and that it constitutes 50% of the rows (i.e. 5 000 000) then you can
conclude that, on average, other entries constitute a mere 5 000
000/999 ~~ 5000 entries, and it would be definitely be enough.
(That's assuming you store the number of distinct values somewhere).

> BTW, if your highly-popular value is actually a dummy value ('UNKNOWN'
> or something like that), a fairly effective workaround is to replace the
> dummy entries with NULL.  The system does account for NULLs separately
> from real values, so you'd then get stats based on the most common
> non-dummy value.

I can't really do that.  Even if I could, the distribution is very
skewed -- so the next most common makes up a very high proportion of
what's left.  I forget the figures exactly.

Jules

pgsql-hackers by date:

From: Zeugswetter Andreas SB
Date: 24 August 2000, 08:22:50
Subject: AW: Re: How Do You Pronounce "PostgreSQL"?

From: Vince Vielhaber
Date: 24 August 2000, 09:27:33
Subject: Re: [7.0.2] problems with spinlock under FreeBSD?

Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan - Mailing list pgsql-hackers

Previous

Next