Re: slow seqscan - Mailing list pgsql-performance

From Stephan Szabo
Subject Re: slow seqscan
Date
Msg-id 20040421072156.K55057@megazone.bigpanda.com
In response to Re: slow seqscan  (Edoardo Ceccarelli <eddy@axa.it>)
Responses Re: slow seqscan
List pgsql-performance
On Wed, 21 Apr 2004, Edoardo Ceccarelli wrote:

>
> > What happens if you go:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> > LOWER(testo));
> >
> > or even just:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));
> >
> I wasn't able to create this two-field index with lower():
>
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

That's a 7.4 feature, I think (and I think the two-column version may
need extra parens around the lower()).  I think the only way to do
something equivalent in 7.3 is to make a function that concatenates the
two in some fashion, after having applied lower() to the one part, and
then to use that same function in the queries as well.  Plus, if you're
not in "C" locale, I'm not sure that it'd help in 7.3 anyway.
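
Roughly something like the following (untested, and the helper function
name is just an example -- the important part is that the index and the
queries use exactly the same expression):

On 7.4, with extra parens around the expression to be safe:

CREATE INDEX annuncio400_rubric_testo_idx
    ON annuncio400 (rubric, (LOWER(testo)));

A possible 7.3-style workaround with a helper function:

CREATE FUNCTION rubric_lower_testo(text, text) RETURNS text AS
    'SELECT $1 || ''|'' || lower($2)' LANGUAGE sql IMMUTABLE;

CREATE INDEX annuncio400_rlt_idx
    ON annuncio400 (rubric_lower_testo(rubric, testo));

and the queries would then have to filter on
rubric_lower_testo(rubric, testo) for the index to be considered.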

> >> But the strangest thing ever is that if I change the filter to
> >> another one that represents a smaller amount of data, it uses the
> >> index scan!!!
> >
> >
> > What's strange about that?  The less data is going to be retrieved,
> > the more likely postgres is to use the index.
> >
> can't understand this policy:
>
> dba400=# SELECT count(*) from annuncio400 where rubric='DD';
>  count
> -------
>   6753
> (1 row)
>
> dba400=# SELECT count(*) from annuncio400 where rubric='MA';
>  count
> -------
>   2165
> (1 row)
>
> so it's using the index for 2000 rows but not for 6000?  That's not
> that big a difference, is it?

It's a question of how many pages it thinks it's going to have to retrieve
in order to handle the request.  If it, say, needs (or thinks it needs) to
retrieve 50% of the pages, then given a random_page_cost of 4, it's going
to expect the index scan to cost about twice as much (0.5 * 4 = 2, versus
1 for reading the same pages sequentially).
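
If you want to see what the planner is assuming there, something like
this should work (the 2 below is just an illustration, not a
recommendation):

SHOW random_page_cost;
SET random_page_cost = 2;

and then re-run explain to see whether the plan choice changes.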

Generally speaking, one good way to compare is to run the query with
explain analyze, then change parameters like enable_seqscan, run the
query with explain analyze again, and compare the estimated rows and
costs.  That'll give an idea of how it expects the two versions of the
query to compare speed-wise.
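
For example, a rough sketch using one of your earlier queries:

EXPLAIN ANALYZE SELECT count(*) FROM annuncio400 WHERE rubric='DD';
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT count(*) FROM annuncio400 WHERE rubric='DD';
SET enable_seqscan = on;

Comparing the estimated and actual row counts, and the costs and times,
of the two plans gives an idea of how far off the planner's guess is.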
