Thread: insane index scan times

insane index scan times

From
Sergei Shelukhin
Date:
Create table entries (id bigint primary key, somedata varchar(500));
/* insert ~12mil rows of data, somedata mostly empty */

create table stuff (id bigint, bah int, primary key (id, bah));

insert into stuff (id,bah) select id, 1 from entries;

create index blah on stuff(id);

vacuum full analyze;

set enable_seqscan = on;

explain select * from entries inner join stuff on entries.id =
stuff.id;

 ->  Seq Scan on stuff  (cost=0.00..193344.00 rows=12550400 width=12)


set enable_seqscan = off;

explain select * from entries inner join stuff on entries.id =
stuff.id;

 ->  Index Scan using blah on stuff  (cost=0.00..25406232.30
rows=12550400 width=12)

Query execution resuls are consistent w/explain. wtf? no I mean,
WTF?????!!!!!!!!!!!!!!!!!!!

Sorry. But I 'm amazed.


Re: insane index scan times

From
Alvaro Herrera
Date:
Sergei Shelukhin escribió:

> explain select * from entries inner join stuff on entries.id =
> stuff.id;
>
>  ->  Seq Scan on stuff  (cost=0.00..193344.00 rows=12550400 width=12)
>
>
> set enable_seqscan = off;
>
> explain select * from entries inner join stuff on entries.id =
> stuff.id;
>
>  ->  Index Scan using blah on stuff  (cost=0.00..25406232.30
> rows=12550400 width=12)
>
> Query execution resuls are consistent w/explain. wtf? no I mean,
> WTF?????!!!!!!!!!!!!!!!!!!!
>
> Sorry. But I 'm amazed.

I am not.  You are asking it to give you 12 million rows -- so it does.
What's the surprise if it takes long?

Do you really want to have all 12 million rows as a result?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: insane index scan times

From
Sergei Shelukhin
Date:
This is just an example isolating the problem.
Actual queries contain more tables and more joins and return
reasonable amount of data.
Performance of big indices however is appalling, with planner always
reverting to seqscan with default settings.

I tried to pre-filter the data as much as possible in preceding joins
to put less strain on the offending join (less rows to join by?) but
it doesn't help.

I wonder what exactly makes index perform 100+ times slower than
seqscan - I mean even if it's perfromed on the HD which it should not
be given the index size, index and table are on the same HD and index
is smaller and also sorted, isn't it?





Re: insane index scan times

From
Martijn van Oosterhout
Date:
On Sun, Jun 03, 2007 at 11:29:07PM -0700, Sergei Shelukhin wrote:
> I wonder what exactly makes index perform 100+ times slower than
> seqscan - I mean even if it's perfromed on the HD which it should not
> be given the index size, index and table are on the same HD and index
> is smaller and also sorted, isn't it?

Um, because if you scan the whole index you also have to scan the whole
table, and you're going to scan the table and the index in random
order, which is slower again.

An index is faster for selecting a *portion* of the table, it's
useless once you get to a significant percentage.

However, recent versions have Bitmap index scans which are a middle
ground, linear index scan, linear heap scan which has a much better
worst case. So perhaps you're running a really old version of postgres,
you didn't actually say what version you were running.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: insane index scan times

From
Sergei Shelukhin
Date:
Version is 8.1
The query I originally ran returned ~4-5 rows and had a lot of other
joins and filtering conditions prior to the join with the big table.
Is there any way to instruct postgres to do joins in the specific
order or smth?


Re: insane index scan times

From
Martijn van Oosterhout
Date:
On Thu, Jun 07, 2007 at 11:20:20AM -0700, Sergei Shelukhin wrote:
> Version is 8.1
> The query I originally ran returned ~4-5 rows and had a lot of other
> joins and filtering conditions prior to the join with the big table.
> Is there any way to instruct postgres to do joins in the specific
> order or smth?

Not really, Postgres can rearrange joins into the order it works out to
be the best.

You need to check whether the statistics on your columns are good, so
postgres can make a good estimate.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: insane index scan times

From
"Alexander Staubo"
Date:
On 6/8/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Thu, Jun 07, 2007 at 11:20:20AM -0700, Sergei Shelukhin wrote:
> > Version is 8.1
> > The query I originally ran returned ~4-5 rows and had a lot of other
> > joins and filtering conditions prior to the join with the big table.
> > Is there any way to instruct postgres to do joins in the specific
> > order or smth?
>
> Not really, Postgres can rearrange joins into the order it works out to
> be the best.

Are you sure? I was under the impression that PostgreSQL could reorder
"where" clauses, but did not  yet reorder explicit joins (as opposed
to implicit ones through, say, "in"). But perhaps it only applied to
some types of joins.

Alexander.

Re: insane index scan times

From
Alvaro Herrera
Date:
Alexander Staubo escribió:
> On 6/8/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> >On Thu, Jun 07, 2007 at 11:20:20AM -0700, Sergei Shelukhin wrote:
> >> Version is 8.1
> >> The query I originally ran returned ~4-5 rows and had a lot of other
> >> joins and filtering conditions prior to the join with the big table.
> >> Is there any way to instruct postgres to do joins in the specific
> >> order or smth?
> >
> >Not really, Postgres can rearrange joins into the order it works out to
> >be the best.
>
> Are you sure? I was under the impression that PostgreSQL could reorder
> "where" clauses, but did not  yet reorder explicit joins (as opposed
> to implicit ones through, say, "in"). But perhaps it only applied to
> some types of joins.

Postgres does flatten joins, and reorders the resulting range tables as
convenient.  Only in 8.2 it learned how to to reorder outer joins, but
for inner joins this has been in place for a lot longer.

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)

Re: insane index scan times

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Alexander Staubo escribi�:
>> Are you sure? I was under the impression that PostgreSQL could reorder
>> "where" clauses, but did not  yet reorder explicit joins (as opposed
>> to implicit ones through, say, "in"). But perhaps it only applied to
>> some types of joins.

> Postgres does flatten joins, and reorders the resulting range tables as
> convenient.  Only in 8.2 it learned how to to reorder outer joins, but
> for inner joins this has been in place for a lot longer.

The rules for this have changed over time, and you can affect them through
planner parameters too.  Consult
http://www.postgresql.org/docs/8.2/static/explicit-joins.html
                               ^^^
with URL adjusted for your PG version.

            regards, tom lane

Re: insane index scan times

From
"hubert depesz lubaczewski"
Date:
On 6/7/07, Sergei Shelukhin <realgeek@gmail.com> wrote:
Version is 8.1
The query I originally ran returned ~4-5 rows and had a lot of other
joins and filtering conditions prior to the join with the big table.
Is there any way to instruct postgres to do joins in the specific
order or smth?

make explain analyze of original query (notice *explain analyze* and not *explain*), and post this to list. otherwise - you can't really hope someone will tell you what's wrong.

depesz

--
http://www.depesz.com/ - nowy, lepszy depesz

Re: insane index scan times

From
Scott Marlowe
Date:
Sergei Shelukhin wrote:
> This is just an example isolating the problem.
> Actual queries contain more tables and more joins and return
> reasonable amount of data.
> Performance of big indices however is appalling, with planner always
> reverting to seqscan with default settings.
>
> I tried to pre-filter the data as much as possible in preceding joins
> to put less strain on the offending join (less rows to join by?) but
> it doesn't help.
>
> I wonder what exactly makes index perform 100+ times slower than
> seqscan - I mean even if it's perfromed on the HD which it should not
> be given the index size, index and table are on the same HD and index
> is smaller and also sorted, isn't it?
>
> set enable_seqscan = on;
>
> explain select * from entries inner join stuff on entries.id =
> stuff.id;
>
>  ->  Seq Scan on stuff  (cost=0.00..193344.00 rows=12550400 width=12)
>
>
> set enable_seqscan = off;
>
> explain select * from entries inner join stuff on entries.id =
> stuff.id;
>
>  ->  Index Scan using blah on stuff  (cost=0.00..25406232.30
> rows=12550400 width=12)
I don't think you really understand postgresql very well.

There's no evidence that the index scan is 100 times slower.  The
planner is guesstimating that it will take much longer for the index to
do the same work.

Do some research on postgresql's MVCC model and "visibility".  The
indexes don't have it, so every access has to hit the tables anyway, so
if the query planner figures you're going to hit 10% or so of the table,
just seq scan it and go.

Run your queries with "explain analyze" and see which is faster.  If the
seq scan is faster, then pgsql made the right choice.  What you can do
to speed it up depends on your system.

Post the output of explain analyze select ... here and let us look at it.

More importantly, post your REAL Queries with explain analyze output
(not just explain) and let's see what we see.