Re: Query with order by and limit is very slow - wrong index used - Mailing list pgsql-performance

From Gregg Jaskiewicz
Subject Re: Query with order by and limit is very slow - wrong index used
Date
Msg-id CAJY59_grzRJUBurDZniuVtj2OgUNhvpKDD9HR0RGuT7D2H_mxg@mail.gmail.com
Whole thread Raw
In response to Re: Query with order by and limit is very slow - wrong index used  (Nowak Michał <michal.nowak@me.com>)
List pgsql-performance
2011/10/3 Nowak Michał <michal.nowak@me.com>:
>> How many rows do you have in that table?
>
> a9-dev=> select count(*) from records;
>  count
> ---------
> 3620311
> (1 row)


>
> a9-dev=> select source_id, count(*) from records where source_id =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'or source_id =
'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'group by source_id; 
>                      source_id                        | count
> --------------------------------------------------------+--------
> http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml    | 427254
> http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml | 989184
> (2 rows)

So the second one is roughly 27% of the table.  I don't know the
actual condition under which planner changes over the seqscan, but
that value seems quite common it seems.
The other thing planner is going to look at is the correlation, most
common values, most common frequencies.
In other words, if the value is 27% of all values, but is evenly
spread across - I think planner will go for seq scan regardless.

At the end of the day (afaik), index scan only pics pages for narrowed
down seqscan really. So imagine if your index scan returned all the
pages, you would still have to do a seqscan on all of them. Planner is
trying to avoid that by weighting the costs of both operations.
If it is too slow to run the current queries, you could try
normalizing the table by splitting source_id into separate one and
referencing it by an id. Very often what you'd find is that doing so
lowers I/O required, hence saves a lot of time in queries. Downside
is, that it is bit harder to add/update the tables. But that's where
triggers and procedures come handy.


>
>> ALTER TABLE records ALTER id SET source_id 1000; vacuum analyze verbose records;
> Did you mean ALTER TABLE records ALTER id SET STATISTICS 1000;?

Yup, that's what I meant. Sorry.


--
GJ

pgsql-performance by date:

Previous
From: Venkat Balaji
Date:
Subject: Re: : Performance Improvement Strategy
Next
From: Nowak Michał
Date:
Subject: Fwd: Query with order by and limit is very slow - wrong index used