Thread: insane index scan times
Create table entries (id bigint primary key, somedata varchar(500)); /* insert ~12mil rows of data, somedata mostly empty */ create table stuff (id bigint, bah int, primary key (id, bah)); insert into stuff (id,bah) select id, 1 from entries; create index blah on stuff(id); vacuum full analyze; set enable_seqscan = on; explain select * from entries inner join stuff on entries.id = stuff.id; -> Seq Scan on stuff (cost=0.00..193344.00 rows=12550400 width=12) set enable_seqscan = off; explain select * from entries inner join stuff on entries.id = stuff.id; -> Index Scan using blah on stuff (cost=0.00..25406232.30 rows=12550400 width=12) Query execution resuls are consistent w/explain. wtf? no I mean, WTF?????!!!!!!!!!!!!!!!!!!! Sorry. But I 'm amazed.
Sergei Shelukhin escribió: > explain select * from entries inner join stuff on entries.id = > stuff.id; > > -> Seq Scan on stuff (cost=0.00..193344.00 rows=12550400 width=12) > > > set enable_seqscan = off; > > explain select * from entries inner join stuff on entries.id = > stuff.id; > > -> Index Scan using blah on stuff (cost=0.00..25406232.30 > rows=12550400 width=12) > > Query execution resuls are consistent w/explain. wtf? no I mean, > WTF?????!!!!!!!!!!!!!!!!!!! > > Sorry. But I 'm amazed. I am not. You are asking it to give you 12 million rows -- so it does. What's the surprise if it takes long? Do you really want to have all 12 million rows as a result? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
This is just an example isolating the problem. Actual queries contain more tables and more joins and return reasonable amount of data. Performance of big indices however is appalling, with planner always reverting to seqscan with default settings. I tried to pre-filter the data as much as possible in preceding joins to put less strain on the offending join (less rows to join by?) but it doesn't help. I wonder what exactly makes index perform 100+ times slower than seqscan - I mean even if it's perfromed on the HD which it should not be given the index size, index and table are on the same HD and index is smaller and also sorted, isn't it?
On Sun, Jun 03, 2007 at 11:29:07PM -0700, Sergei Shelukhin wrote: > I wonder what exactly makes index perform 100+ times slower than > seqscan - I mean even if it's perfromed on the HD which it should not > be given the index size, index and table are on the same HD and index > is smaller and also sorted, isn't it? Um, because if you scan the whole index you also have to scan the whole table, and you're going to scan the table and the index in random order, which is slower again. An index is faster for selecting a *portion* of the table, it's useless once you get to a significant percentage. However, recent versions have Bitmap index scans which are a middle ground, linear index scan, linear heap scan which has a much better worst case. So perhaps you're running a really old version of postgres, you didn't actually say what version you were running. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Version is 8.1 The query I originally ran returned ~4-5 rows and had a lot of other joins and filtering conditions prior to the join with the big table. Is there any way to instruct postgres to do joins in the specific order or smth?
On Thu, Jun 07, 2007 at 11:20:20AM -0700, Sergei Shelukhin wrote: > Version is 8.1 > The query I originally ran returned ~4-5 rows and had a lot of other > joins and filtering conditions prior to the join with the big table. > Is there any way to instruct postgres to do joins in the specific > order or smth? Not really, Postgres can rearrange joins into the order it works out to be the best. You need to check whether the statistics on your columns are good, so postgres can make a good estimate. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
On 6/8/07, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Thu, Jun 07, 2007 at 11:20:20AM -0700, Sergei Shelukhin wrote: > > Version is 8.1 > > The query I originally ran returned ~4-5 rows and had a lot of other > > joins and filtering conditions prior to the join with the big table. > > Is there any way to instruct postgres to do joins in the specific > > order or smth? > > Not really, Postgres can rearrange joins into the order it works out to > be the best. Are you sure? I was under the impression that PostgreSQL could reorder "where" clauses, but did not yet reorder explicit joins (as opposed to implicit ones through, say, "in"). But perhaps it only applied to some types of joins. Alexander.
Alexander Staubo escribió: > On 6/8/07, Martijn van Oosterhout <kleptog@svana.org> wrote: > >On Thu, Jun 07, 2007 at 11:20:20AM -0700, Sergei Shelukhin wrote: > >> Version is 8.1 > >> The query I originally ran returned ~4-5 rows and had a lot of other > >> joins and filtering conditions prior to the join with the big table. > >> Is there any way to instruct postgres to do joins in the specific > >> order or smth? > > > >Not really, Postgres can rearrange joins into the order it works out to > >be the best. > > Are you sure? I was under the impression that PostgreSQL could reorder > "where" clauses, but did not yet reorder explicit joins (as opposed > to implicit ones through, say, "in"). But perhaps it only applied to > some types of joins. Postgres does flatten joins, and reorders the resulting range tables as convenient. Only in 8.2 it learned how to to reorder outer joins, but for inner joins this has been in place for a lot longer. -- Alvaro Herrera http://www.advogato.org/person/alvherre "No deja de ser humillante para una persona de ingenio saber que no hay tonto que no le pueda enseñar algo." (Jean B. Say)
Alvaro Herrera <alvherre@commandprompt.com> writes: > Alexander Staubo escribi�: >> Are you sure? I was under the impression that PostgreSQL could reorder >> "where" clauses, but did not yet reorder explicit joins (as opposed >> to implicit ones through, say, "in"). But perhaps it only applied to >> some types of joins. > Postgres does flatten joins, and reorders the resulting range tables as > convenient. Only in 8.2 it learned how to to reorder outer joins, but > for inner joins this has been in place for a lot longer. The rules for this have changed over time, and you can affect them through planner parameters too. Consult http://www.postgresql.org/docs/8.2/static/explicit-joins.html ^^^ with URL adjusted for your PG version. regards, tom lane
On 6/7/07, Sergei Shelukhin <realgeek@gmail.com> wrote:
make explain analyze of original query (notice *explain analyze* and not *explain*), and post this to list. otherwise - you can't really hope someone will tell you what's wrong.
depesz
--
http://www.depesz.com/ - nowy, lepszy depesz
Version is 8.1
The query I originally ran returned ~4-5 rows and had a lot of other
joins and filtering conditions prior to the join with the big table.
Is there any way to instruct postgres to do joins in the specific
order or smth?
make explain analyze of original query (notice *explain analyze* and not *explain*), and post this to list. otherwise - you can't really hope someone will tell you what's wrong.
depesz
--
http://www.depesz.com/ - nowy, lepszy depesz
Sergei Shelukhin wrote: > This is just an example isolating the problem. > Actual queries contain more tables and more joins and return > reasonable amount of data. > Performance of big indices however is appalling, with planner always > reverting to seqscan with default settings. > > I tried to pre-filter the data as much as possible in preceding joins > to put less strain on the offending join (less rows to join by?) but > it doesn't help. > > I wonder what exactly makes index perform 100+ times slower than > seqscan - I mean even if it's perfromed on the HD which it should not > be given the index size, index and table are on the same HD and index > is smaller and also sorted, isn't it? > > set enable_seqscan = on; > > explain select * from entries inner join stuff on entries.id = > stuff.id; > > -> Seq Scan on stuff (cost=0.00..193344.00 rows=12550400 width=12) > > > set enable_seqscan = off; > > explain select * from entries inner join stuff on entries.id = > stuff.id; > > -> Index Scan using blah on stuff (cost=0.00..25406232.30 > rows=12550400 width=12) I don't think you really understand postgresql very well. There's no evidence that the index scan is 100 times slower. The planner is guesstimating that it will take much longer for the index to do the same work. Do some research on postgresql's MVCC model and "visibility". The indexes don't have it, so every access has to hit the tables anyway, so if the query planner figures you're going to hit 10% or so of the table, just seq scan it and go. Run your queries with "explain analyze" and see which is faster. If the seq scan is faster, then pgsql made the right choice. What you can do to speed it up depends on your system. Post the output of explain analyze select ... here and let us look at it. More importantly, post your REAL Queries with explain analyze output (not just explain) and let's see what we see.