Thread: Why is that so slow?

Why is that so slow?

From: Tatsuo Ishii
Hi,

I have two tables, and in some cases joining them is very slow.

Here are details.

create table postal (
	oldcode varchar(5),	-- has a btree index
	newcode char(7),	-- has a btree index
	pid int2,		-- has a btree index
	kana_city text,		-- has a btree index
	kana_town text,		-- has a btree index
	city text,		-- has a btree index
	town text		-- has a btree index
);

(has 119479 records)

create table prefecture (
	pid int2,		-- has a btree index
	pref char(8),
	kana_pref char(16)
);

(has 47 records)
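
(For reference, the indexes mentioned in the comments would have been
created along these lines. Only cityindex, pidindex, and prefpidindex
are names that actually appear in the plans below; the rest is a
sketch, not the exact DDL:)

create index cityindex on postal using btree (city);
create index pidindex on postal using btree (pid);
create index prefpidindex on prefecture using btree (pid);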

My question is:

This one is fast, as I expected:

postal=> explain select * from postal,prefecture where city = 'aaa' and postal.pid = prefecture.pid;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=4.10 size=1 width=100)
  ->  Index Scan using cityindex on postal  (cost=2.05 size=1 width=74)
  ->  Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)

But:

postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=98.90 size=1 width=100)
  ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
  ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)

This is so slooow. Can anybody explain this? Am I missing something?

Note that 6.4.x and current show the same behavior.
---
Tatsuo Ishii



Re: [HACKERS] Why is that so slow?

From: Tatsuo Ishii
I got a good suggestion from Hiroshi. The conclusion is:

>    pid int2,        -- has a btree index

This is bad. I had defined a btree index on pid and it has 2000
duplicate entries on average! After I removed the index, the query
runs unbelievably fast! Now explain shows:

Nested Loop  (cost=933.82 size=1 width=100)
  ->  Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
  ->  Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)

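(The removal itself is a one-liner; a sketch, assuming the index on
postal.pid was created under the name pidindex, the name the earlier
plan shows:)

drop index pidindex;
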
--
Tatsuo Ishii


Re: [HACKERS] Why is that so slow?

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> NOTICE:  QUERY PLAN:
> 
> Nested Loop  (cost=98.90 size=1 width=100)
>   ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
>   ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)
> 
> This is so slooow. Can anybody explain this? Am I missing something?

and later:
> I had defined a btree index on pid and it has 2000
> duplicate entries on average! After I removed the index, the query
> runs unbelievably fast! Now explain shows:

> Nested Loop  (cost=933.82 size=1 width=100)
> -> Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
> -> Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)

Hmm.  Removal of the index is just a hack --- the system should have
been smart enough not to use it.  It looks like the system chose the
first plan shown above because it thought that selecting postal entries
matching a particular pid value would on average match only one postal
tuple (note the "size" fields, which are estimates of the numbers of
resulting tuples).  But in reality, each scan produced 2000 matching
entries on average, according to your second message --- and each of
those entries had to be tested to see if it had the right city name.
So, very slow.

The question I have is why didn't the system realize that there would be
lots of matches on pid?  The "dispersion" statistics it keeps ought to
have given it a clue that this approach wouldn't be very selective.

The second example is fast because the scan over postal looking for city
name matches finds only one match, so prefecture is scanned only once.
However the cost estimates there also look bogus --- the system is again
mis-guessing how many entries will be selected.  It seems to think that
all 47 prefecture entries will be matched by a scan for a specific pid.
So, bogus dispersion values again (or bad use of them).

Something is fishy here.  Have you done a "vacuum analyze" since loading
the data in these tables?
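
For example (a sketch, using your table names):

vacuum analyze postal;
vacuum analyze prefecture;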
        regards, tom lane


Re: [HACKERS] Why is that so slow?

From: Tatsuo Ishii
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> > NOTICE:  QUERY PLAN:
> > 
> > Nested Loop  (cost=98.90 size=1 width=100)
> >   ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
> >   ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)
> > 
> > This is so slooow. Can anybody explain this? Am I missing something?
> 
> and later:
> > I had defined a btree index on pid and it has 2000
> > duplicate entries on average! After I removed the index, the query
> > runs unbelievably fast! Now explain shows:
> 
> > Nested Loop  (cost=933.82 size=1 width=100)
> > -> Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
> > -> Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)
> 
> Hmm.  Removal of the index is just a hack --- the system should have
> been smart enough not to use it.  It looks like the system chose the
> first plan shown above because it thought that selecting postal entries
> matching a particular pid value would on average match only one postal
> tuple (note the "size" fields, which are estimates of the numbers of
> resulting tuples).  But in reality, each scan produced 2000 matching
> entries on average, according to your second message --- and each of
> those entries had to be tested to see if it had the right city name.
> So, very slow.
> 
> The question I have is why didn't the system realize that there would be
> lots of matches on pid?  The "dispersion" statistics it keeps ought to
> have given it a clue that this approach wouldn't be very selective.
> 
> The second example is fast because the scan over postal looking for city
> name matches finds only one match, so prefecture is scanned only once.

Actually not only one, since I use the ~ operator. Anyway, the number
of matching rows would be reasonably small.

> However the cost estimates there also look bogus --- the system is again
> mis-guessing how many entries will be selected.  It seems to think that
> all 47 prefecture entries will be matched by a scan for a specific pid.
> So, bogus dispersion values again (or bad use of them).
> 
> Something is fishy here.  Have you done a "vacuum analyze" since loading
> the data in these tables?

Oh, I never thought about that. After I re-created the index I had
removed (see my previous message) and did a vacuum analyze, I got:

Hash Join  (cost=951.50 size=19 width=100)
  ->  Index Scan using cityindex on postal  (cost=944.77 size=19 width=74)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)

This plan looks good (and is actually as fast as the previous
one). However, the cost estimate for prefecture is 47 again?
--
Tatsuo Ishii


Re: [HACKERS] Why is that so slow?

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Something is fishy here.  Have you done a "vacuum analyze" since loading
>> the data in these tables?

> Oh, I never thought about that.

Ah.  OK, that explains the system's poor choice of plan --- it was
effectively operating on the assumption that these tables were small.
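
You can see what the planner thought it knew in the statistics columns
of pg_class; for a never-vacuumed table both counts simply read zero,
which is exactly the trap here.  A sketch:

select relname, relpages, reltuples from pg_class
where relname in ('postal', 'prefecture');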

(Note to hackers: maybe a freshly created table should be given dummy
statistics, say having 1000 rows instead of 0 rows?  That would help
to prevent the optimizer from making really foolish choices when no
vacuum's been done yet for the table.  But I dunno whether we could
invent plausible default values for all the stats...)

> After I re-created the index I had removed (see my previous
> message) and did a vacuum analyze, I got:

> Hash Join  (cost=951.50 size=19 width=100)
> -> Index Scan using cityindex on postal  (cost=944.77 size=19 width=74)
> -> Hash  (cost=0.00 size=0 width=0)
>    -> Seq Scan on prefecture  (cost=2.55 size=47 width=26)

> This plan looks good (and is actually as fast as the previous
> one). However, the cost estimate for prefecture is 47 again?

No, that looks OK in this context: it's proposing to load the whole
prefecture table into an internal hashtable, so it will have to scan
all 47 prefecture rows to do it.  The only guesstimating in this plan
is the "size=19" for the index scan, ie, an estimated 19 hits from the
match on city name.  That seems fairly reasonable, although of course
it could be badly off depending on your match pattern.
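
A quick way to check that guess against reality (same predicate as in
your query):

select count(*) from postal where city ~ '^aaa';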
        regards, tom lane


Re: [HACKERS] Why is that so slow?

From: Bruce Momjian
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> Something is fishy here.  Have you done a "vacuum analyze" since loading
> >> the data in these tables?
> 
> > Oh, I never thought about that.
> 
> Ah.  OK, that explains the system's poor choice of plan --- it was
> effectively operating on the assumption that these tables were small.
> 
> (Note to hackers: maybe a freshly created table should be given dummy
> statistics, say having 1000 rows instead of 0 rows?  That would help
> to prevent the optimizer from making really foolish choices when no
> vacuum's been done yet for the table.  But I dunno whether we could
> invent plausible default values for all the stats...)

No way to really make a good default.  Zero is the correct number when
the table is created, right?  An improved optimizer may do even worse,
or better, on un-analyzed tables.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Why is that so slow?

From: Tom Lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> (Note to hackers: maybe a freshly created table should be given dummy
>> statistics, say having 1000 rows instead of 0 rows?  That would help
>> to prevent the optimizer from making really foolish choices when no
>> vacuum's been done yet for the table.  But I dunno whether we could
>> invent plausible default values for all the stats...)

> No way to really make a default.  Zero is the correct number when the
> table is created, right?

Well, it's right at the instant of creation, but I think that's much too
simplistic a way of looking at it.  Tables are generally created with
the intention of putting data into them.  It's a reasonable assumption
that the table will shortly have some rows in it.

Now, any particular estimate like 1000 is obviously going to be wrong.
The point I'm trying to make is that the optimizer is more likely to
generate a sane plan if it assumes that the table contains a moderate
number of rows.  We have seen gripes time and time again from people
who made a table, didn't bother to do a vacuum, and got horribly slow
nested-loop plans from the optimizer because it assumed their table
was empty.  With a nonzero initial estimate, the optimizer will choose
a plan that might be somewhat inefficient if the table really is small;
but it won't be seriously unusable if the table is large.

Once you've done a vacuum, of course, the whole question is moot.
But I think the system's behavior would be more robust if it assumed
that a never-yet-vacuumed table contained some rows, not no rows.
        regards, tom lane


Re: [HACKERS] Why is that so slow?

From: Bruce Momjian
> Well, it's right at the instant of creation, but I think that's much too
> simplistic a way of looking at it.  Tables are generally created with
> the intention of putting data into them.  It's a reasonable assumption
> that the table will shortly have some rows in it.
> 
> Now, any particular estimate like 1000 is obviously going to be wrong.
> The point I'm trying to make is that the optimizer is more likely to
> generate a sane plan if it assumes that the table contains a moderate
> number of rows.  We have seen gripes time and time again from people
> who made a table, didn't bother to do a vacuum, and got horribly slow
> nested-loop plans from the optimizer because it assumed their table
> was empty.  With a nonzero initial estimate, the optimizer will choose
> a plan that might be somewhat inefficient if the table really is small;
> but it won't be seriously unusable if the table is large.
> 
> Once you've done a vacuum, of course, the whole question is moot.
> But I think the system's behavior would be more robust if it assumed
> that a never-yet-vacuumed table contained some rows, not no rows.

True, but the new optimizer code favors ordered/mergejoin plans over
nested loops, because it prefers ordered results over unordered ones.
That should fix the problem.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026