Home > mailing lists

Bad query plan when the wrong data type is used - Mailing list pgsql-performance

From	Laszlo Nagy
Subject	Bad query plan when the wrong data type is used
Date	February 8, 2011 10:24:42
Msg-id	4D514FFF.9060308@shopzeus.com Whole thread Raw
Responses	Re: Bad query plan when the wrong data type is used
List	pgsql-performance

Tree view

This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
                            from variation_item_sellingsite_asin visa
                            inner join product p on p.id = visa.product_id
                            inner join variation_item vi on vi.id = visa.variation_item_id
                            where visa.id =4

runs in 43 msec. The "visa.id" column has int4 datatype. The query plan uses an index condition:

"Nested Loop (cost=0.00..26.19 rows=1 width=28)"
" -> Nested Loop (cost=0.00..17.75 rows=1 width=24)"
"        -> Index Scan using variation_item_sellingsite_asin_pkey on variation_item_sellingsite_asin visa (cost=0.00..8.58 rows=1 width=16)"
"              Index Cond: (id = 4)"
"        -> Index Scan using pk_product_id on product p (cost=0.00..9.16 rows=1 width=16)"
"              Index Cond: (p.id = visa.product_id)"
" -> Index Scan using pk_variation_item_id on variation_item vi (cost=0.00..8.43 rows=1 width=12)"
"        Index Cond: (vi.id = visa.variation_item_id)"

This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
                            from variation_item_sellingsite_asin visa
                            inner join product p on p.id = visa.product_id
                            inner join variation_item vi on vi.id = visa.variation_item_id
                            where visa.id =4.0

Runs for 1144 msec! Query plan uses seq scan + filter:

"Nested Loop (cost=33957.27..226162.68 rows=14374 width=28)"
" -> Hash Join (cost=33957.27..106190.76 rows=14374 width=20)"
"        Hash Cond: (visa.variation_item_id = vi.id)"
"        -> Seq Scan on variation_item_sellingsite_asin visa (cost=0.00..71928.04 rows=14374 width=16)"
"              Filter: ((id)::numeric = 4.0)"
"        -> Hash (cost=22026.01..22026.01 rows=954501 width=12)"
"              -> Seq Scan on variation_item vi (cost=0.00..22026.01 rows=954501 width=12)"
" -> Index Scan using pk_product_id on product p (cost=0.00..8.33 rows=1 width=16)"
"        Index Cond: (p.id = visa.product_id)"

Which is silly. I think that PostgreSQL converts the int side to a float, and then compares them.

It would be better to do this, for each item in the loop:

evaluate the right side (which is float)
tell if it is an integer or not
if not an integer, then discard the row immediately
otherwise use its integer value for the index scan

The result is identical, but it makes possible to use the index scan. Of course, I know that the query itself is wrong, because I sould not use a float where an int is expected. But this CAN be optimized, so I think it should be! My idea for the query optimizer is not to use the "wider" data type, but use the data type that has an index on it instead.

(I spent an hour figuring out what is wrong with my program. In some cases it was slow, in other cases it was really fast, and I never got an error message.)

What do you think?

Laszlo

pgsql-performance by date:

From: Shaun Thomas
Date: 08 February 2011, 10:23:11
Subject: Re: Really really slow select count(*)

From: Maciek Sakrejda
Date: 08 February 2011, 12:23:34
Subject: Re: Really really slow select count(*)

Bad query plan when the wrong data type is used - Mailing list pgsql-performance

Previous

Next