Home > mailing lists

Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan - Mailing list pgsql-hackers

From	Jules Bean
Subject	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date	August 23, 2000 11:42:18
Msg-id	20000823133418.F17510@grommit.office.vi.net Whole thread Raw
In response to	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan (Tiago Antão <tra@fct.unl.pt>)
Responses	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On Mon, Aug 21, 2000 at 04:48:08PM +0100, Tiago Ant?o wrote:
> On Mon, 21 Aug 2000, Tom Lane wrote:
> 
> > >   One thing it might be interesting (please tell me if you think
> > > otherwise) would be to improve pg with better statistical information, by
> > > using, for example, histograms.
> > 
> > Yes, that's been on the todo list for a while.
> 
>   If it's ok and nobody is working on that, I'll look on that subject.
>   I'll start by looking at the analize portion of vacuum. I'm thinking in
> using arrays for the histogram (I've never used the array data type of
> postgres).

Apologies if this is naive; I don't understand the details of the
optimisation you are discussing.  However, I have an optimisation of
my own in mind which might be related.

I have in a table a 'category' column which takes a small number of
(basically fixed) values.  Here by 'small', I mean ~1000, while the
table itself has ~10 000 000 rows. Some categories have many, many
more rows than others.  In particular, there's one category which hits
over half the rows.  Because of this (AIUI) postgresql assumes
that the query

select ... from thistable where category='something'

is best served by a seqscan, even though there is an index on
category.  I assume this is because it calculates the 'average' number
of rows per category, and it's too high for an index to be useful.

In fact, for lots of values of 'something' in the query above, and
index scan would be /much/ faster.  Many categories have (obviously,
since there's ~1000 of them) less that 0.1% of the rows, and an index
scan would be much faster. [I checked this with set
enable_seqscan=off, FWIW].

I don't quite know what statistics should be collected here, but
something would be useful...

Jules

pgsql-hackers by date:

From: Tiago Antão
Date: 23 August 2000, 11:20:30
Subject: analyze.c

From: Jerome Raupach
Date: 23 August 2000, 12:12:41
Subject: Problem with insert

Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan - Mailing list pgsql-hackers

Previous

Next