Thread: Why is that so slow?
Hi, I have two tables, and in some cases joining them is very slow. Here are the details.

create table postal (
    oldcode   varchar(5),  -- has a btree index
    newcode   char(7),     -- has a btree index
    pid       int2,        -- has a btree index
    kana_city text,        -- has a btree index
    kana_town text,        -- has a btree index
    city      text,        -- has a btree index
    town      text         -- has a btree index
);
(has 119479 records)

create table prefecture (
    pid       int2,        -- has a btree index
    pref      char(8),
    kana_pref char(16)
);
(has 47 records)

My question is: this one is as fast as I expected.

postal=> explain select * from postal,prefecture where city = 'aaa' and postal.pid = prefecture.pid;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=4.10 size=1 width=100)
  ->  Index Scan using cityindex on postal  (cost=2.05 size=1 width=74)
  ->  Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)

But:

postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=98.90 size=1 width=100)
  ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
  ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)

This is so slooow. Can anybody explain this? Am I missing something? Note that 6.4.x and current show the same behavior.
---
Tatsuo Ishii
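For reference, the index definitions implied by the comments above would look roughly like this; only cityindex, pidindex, and prefpidindex appear in the plans, the other index names are assumptions:

-- sketch of the indexes described in the table comments above;
-- names other than cityindex, pidindex, and prefpidindex are made up
create index oldcodeindex   on postal (oldcode);
create index newcodeindex   on postal (newcode);
create index pidindex       on postal (pid);
create index kanacityindex  on postal (kana_city);
create index kanatownindex  on postal (kana_town);
create index cityindex      on postal (city);
create index townindex      on postal (town);
create index prefpidindex   on prefecture (pid);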
I got a good suggestion from Hiroshi. The conclusion is:

> pid int2, -- has a btree index

This is bad. I had defined a btree index on pid, and it has 2000 duplicate entries on average! After I removed the index, the query runs unbelievably fast! Now explain shows:

Nested Loop  (cost=933.82 size=1 width=100)
  ->  Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
  ->  Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)
--
Tatsuo Ishii
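Checking how duplicated pid really is, and applying the workaround described above, amounts to something like this (pidindex is the index name shown in the earlier plan):

-- roughly how many postal rows share each pid value (about 2000 on average here)
select pid, count(*) from postal group by pid;

-- the workaround: drop the highly duplicated index and look at the plan again
drop index pidindex;
explain select * from postal, prefecture
 where city ~ '^aaa' and postal.pid = prefecture.pid;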
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=98.90 size=1 width=100)
>   ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
>   ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)
>
> This is so slooow. Can anybody explain this? Am I missing something?

and later:
> I had defined a btree index on pid, and it has 2000 duplicate entries
> on average! After I removed the index, the query runs unbelievably
> fast! Now explain shows:

> Nested Loop  (cost=933.82 size=1 width=100)
>   ->  Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
>   ->  Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)

Hmm.  Removal of the index is just a hack --- the system should have
been smart enough not to use it.  It looks like the system chose the
first plan shown above because it thought that selecting postal entries
matching a particular pid value would on average match only one postal
tuple (note the "size" fields, which are estimates of the numbers of
resulting tuples).  But in reality, each scan produced 2000 matching
entries on average, according to your second message --- and each of
those entries had to be tested to see if it had the right city name.
So, very slow.

The question I have is why didn't the system realize that there would be
lots of matches on pid?  The "dispersion" statistics it keeps ought to
have given it a clue that this approach wouldn't be very selective.

The second example is fast because the scan over postal looking for city
name matches finds only one match, so prefecture is scanned only once.
However, the cost estimates there also look bogus --- the system is again
mis-guessing how many entries will be selected.  It seems to think that
all 47 prefecture entries will be matched by a scan for a specific pid.
So, bogus dispersion values again (or bad use of them).

Something is fishy here.  Have you done a "vacuum analyze" since loading
the data in these tables?

			regards, tom lane
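For the record, refreshing the statistics the planner relies on is just:

-- update planner statistics for both tables
vacuum analyze postal;
vacuum analyze prefecture;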
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> > NOTICE:  QUERY PLAN:
> >
> > Nested Loop  (cost=98.90 size=1 width=100)
> >   ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
> >   ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)
> >
> > This is so slooow. Can anybody explain this? Am I missing something?
>
> and later:
> > I had defined a btree index on pid, and it has 2000 duplicate entries
> > on average! After I removed the index, the query runs unbelievably
> > fast! Now explain shows:
>
> > Nested Loop  (cost=933.82 size=1 width=100)
> >   ->  Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
> >   ->  Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)
>
> Hmm.  Removal of the index is just a hack --- the system should have
> been smart enough not to use it.  It looks like the system chose the
> first plan shown above because it thought that selecting postal entries
> matching a particular pid value would on average match only one postal
> tuple (note the "size" fields, which are estimates of the numbers of
> resulting tuples).  But in reality, each scan produced 2000 matching
> entries on average, according to your second message --- and each of
> those entries had to be tested to see if it had the right city name.
> So, very slow.
>
> The question I have is why didn't the system realize that there would be
> lots of matches on pid?  The "dispersion" statistics it keeps ought to
> have given it a clue that this approach wouldn't be very selective.
>
> The second example is fast because the scan over postal looking for city
> name matches finds only one match, so prefecture is scanned only once.

Actually not only one, since I use the ~ operator. Anyway, the number of matching rows would be reasonably small.

> However, the cost estimates there also look bogus --- the system is again
> mis-guessing how many entries will be selected.  It seems to think that
> all 47 prefecture entries will be matched by a scan for a specific pid.
> So, bogus dispersion values again (or bad use of them).
>
> Something is fishy here.  Have you done a "vacuum analyze" since loading
> the data in these tables?

Oh, I never thought about that. After I re-created the index I had removed in my previous message and did vacuum analyze, I got:

Hash Join  (cost=951.50 size=19 width=100)
  ->  Index Scan using cityindex on postal  (cost=944.77 size=19 width=74)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)

This plan looks good (and is actually as fast as the previous one). However, the cost estimate for prefecture is again 47?
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Something is fishy here.  Have you done a "vacuum analyze" since loading
>> the data in these tables?

> Oh, I never thought about that.

Ah.  OK, that explains the system's poor choice of plan --- it was
effectively operating on the assumption that these tables were small.

(Note to hackers: maybe a freshly created table should be given dummy
statistics, say having 1000 rows instead of 0 rows?  That would help
to prevent the optimizer from making really foolish choices when no
vacuum's been done yet for the table.  But I dunno whether we could
invent plausible default values for all the stats...)

> After I re-created the index I had removed in my previous message and
> did vacuum analyze, I got:

> Hash Join  (cost=951.50 size=19 width=100)
>   ->  Index Scan using cityindex on postal  (cost=944.77 size=19 width=74)
>   ->  Hash  (cost=0.00 size=0 width=0)
>         ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)

> This plan looks good (and is actually as fast as the previous one).
> However, the cost estimate for prefecture is again 47?

No, that looks OK in this context: it's proposing to load the whole
prefecture table into an internal hashtable, so it will have to scan
all 47 prefecture rows to do it.  The only guesstimating in this plan
is the "size=19" for the index scan, ie, an estimated 19 hits from the
match on city name.  That seems fairly reasonable, although of course
it could be badly off depending on your match pattern.

			regards, tom lane
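The row counts the planner works from are kept in pg_class and are filled in by vacuum; a quick way to check what the system thinks these tables contain:

-- reltuples and relpages are updated by vacuum and drive the size estimates
select relname, reltuples, relpages
  from pg_class
 where relname in ('postal', 'prefecture');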
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> Something is fishy here.  Have you done a "vacuum analyze" since loading
> >> the data in these tables?
>
> > Oh, I never thought about that.
>
> Ah.  OK, that explains the system's poor choice of plan --- it was
> effectively operating on the assumption that these tables were small.
>
> (Note to hackers: maybe a freshly created table should be given dummy
> statistics, say having 1000 rows instead of 0 rows?  That would help
> to prevent the optimizer from making really foolish choices when no
> vacuum's been done yet for the table.  But I dunno whether we could
> invent plausible default values for all the stats...)

There is no way to really make a default. Zero is the correct number when the table is created, right? An improved optimizer may be either worse or better for un-analyzed tables.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> (Note to hackers: maybe a freshly created table should be given dummy
>> statistics, say having 1000 rows instead of 0 rows?  That would help
>> to prevent the optimizer from making really foolish choices when no
>> vacuum's been done yet for the table.  But I dunno whether we could
>> invent plausible default values for all the stats...)

> There is no way to really make a default.  Zero is the correct number
> when the table is created, right?

Well, it's right at the instant of creation, but I think that's much too
simplistic a way of looking at it.  Tables are generally created with
the intention of putting data into them.  It's a reasonable assumption
that the table will shortly have some rows in it.

Now, any particular estimate like 1000 is obviously going to be wrong.
The point I'm trying to make is that the optimizer is more likely to
generate a sane plan if it assumes that the table contains a moderate
number of rows.  We have seen gripes time and time again from people
who made a table, didn't bother to do a vacuum, and got horribly slow
nested-loop plans from the optimizer because it assumed their table
was empty.  With a nonzero initial estimate, the optimizer will choose
a plan that might be somewhat inefficient if the table really is small;
but it won't be seriously unusable if the table is large.

Once you've done a vacuum, of course, the whole question is moot.
But I think the system's behavior would be more robust if it assumed
that a never-yet-vacuumed table contained some rows, not no rows.

			regards, tom lane
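A quick way to reproduce the behavior being discussed, as a sketch (the table, index, and data here are made up for illustration):

-- hypothetical demonstration of planning against a never-vacuumed table
create table demo (id int2, name text);
create index demoidx on demo (id);
-- ... load a large number of rows into demo here ...
explain select * from demo where id = 1;  -- planner still believes the table is tiny
vacuum analyze demo;
explain select * from demo where id = 1;  -- estimates (and likely the plan) now reflect the real size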
> Well, it's right at the instant of creation, but I think that's much too
> simplistic a way of looking at it.  Tables are generally created with
> the intention of putting data into them.  It's a reasonable assumption
> that the table will shortly have some rows in it.
>
> Now, any particular estimate like 1000 is obviously going to be wrong.
> The point I'm trying to make is that the optimizer is more likely to
> generate a sane plan if it assumes that the table contains a moderate
> number of rows.  We have seen gripes time and time again from people
> who made a table, didn't bother to do a vacuum, and got horribly slow
> nested-loop plans from the optimizer because it assumed their table
> was empty.  With a nonzero initial estimate, the optimizer will choose
> a plan that might be somewhat inefficient if the table really is small;
> but it won't be seriously unusable if the table is large.
>
> Once you've done a vacuum, of course, the whole question is moot.
> But I think the system's behavior would be more robust if it assumed
> that a never-yet-vacuumed table contained some rows, not no rows.

True, but the new optimizer code favors ordered/mergejoin plans over nested loops, because it prefers ordered results over the unordered ones a nested loop produces. That should fix the problem.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026