Bad query plan when the wrong data type is used - Mailing list pgsql-performance

From Laszlo Nagy
Subject Bad query plan when the wrong data type is used
Date
Msg-id 4D514FFF.9060308@shopzeus.com
Whole thread Raw
Responses Re: Bad query plan when the wrong data type is used
List pgsql-performance
This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
                            from  variation_item_sellingsite_asin visa
                            inner join product p on p.id = visa.product_id
                            inner join variation_item vi on vi.id = visa.variation_item_id
                            where visa.id =4

runs in 43 msec. The "visa.id" column has int4 datatype. The query plan uses an index condition:

"Nested Loop  (cost=0.00..26.19 rows=1 width=28)"
"  ->  Nested Loop  (cost=0.00..17.75 rows=1 width=24)"
"        ->  Index Scan using variation_item_sellingsite_asin_pkey on variation_item_sellingsite_asin visa  (cost=0.00..8.58 rows=1 width=16)"
"              Index Cond: (id = 4)"
"        ->  Index Scan using pk_product_id on product p  (cost=0.00..9.16 rows=1 width=16)"
"              Index Cond: (p.id = visa.product_id)"
"  ->  Index Scan using pk_variation_item_id on variation_item vi  (cost=0.00..8.43 rows=1 width=12)"
"        Index Cond: (vi.id = visa.variation_item_id)"


This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
                            from  variation_item_sellingsite_asin visa
                            inner join product p on p.id = visa.product_id
                            inner join variation_item vi on vi.id = visa.variation_item_id
                            where visa.id =4.0

Runs for  1144 msec! Query plan uses seq scan + filter:

"Nested Loop  (cost=33957.27..226162.68 rows=14374 width=28)"
"  ->  Hash Join  (cost=33957.27..106190.76 rows=14374 width=20)"
"        Hash Cond: (visa.variation_item_id = vi.id)"
"        ->  Seq Scan on variation_item_sellingsite_asin visa  (cost=0.00..71928.04 rows=14374 width=16)"
"              Filter: ((id)::numeric = 4.0)"
"        ->  Hash  (cost=22026.01..22026.01 rows=954501 width=12)"
"              ->  Seq Scan on variation_item vi  (cost=0.00..22026.01 rows=954501 width=12)"
"  ->  Index Scan using pk_product_id on product p  (cost=0.00..8.33 rows=1 width=16)"
"        Index Cond: (p.id = visa.product_id)"


Which is silly. I think that PostgreSQL converts the int side to a float, and then compares them.

It would be better to do this, for each item in the loop:
  • evaluate the right side (which is float)
  • tell if it is an integer or not
  • if not an integer, then discard the row immediately
  • otherwise use its integer value for the index scan
The result is identical, but it makes possible to use the index scan. Of course, I know that the query itself is wrong, because I sould not use a float where an int is expected. But this CAN be optimized, so I think it should be! My idea for the query optimizer is not to use the "wider" data type, but use the data type that has an index on it instead.

(I spent an hour figuring out what is wrong with my program. In some cases it was slow, in other cases it was really fast, and I never got an error message.)

What do you think?

   Laszlo

pgsql-performance by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: Really really slow select count(*)
Next
From: Maciek Sakrejda
Date:
Subject: Re: Really really slow select count(*)