query that worked in 8.1 not working in 8.4 - Mailing list pgsql-general

From Sean Foreman
Subject query that worked in 8.1 not working in 8.4
Date
Msg-id 4A96E8E6.4090401@mpaygateway.com
Whole thread Raw
Responses Re: query that worked in 8.1 not working in 8.4
Re: query that worked in 8.1 not working in 8.4
Re: query that worked in 8.1 not working in 8.4
List pgsql-general
We recently upgraded postgres from 8.1 to 8.4.

One of our queries stopped working and after some digging I've narrowed
the problem down to this:

table structure of interest:

merchant_set
 merchant_set_id

merchant
 merchant_id
 merchant_set_id

customer
 customer_id
 merchant_set_id

-- failure (count=1)
-- note: merchant_set.merchant_set_id in ...
select
    count(customer.customer_id)
from
    acquire.customer customer
    inner join entity_setup.merchant_set merchant_set on
        (customer.merchant_set_id = merchant_set.merchant_set_id
        and merchant_set.merchant_set_id in (
            select merchant_set_id
            from entity_setup.merchant merchant
            where merchant.merchant_id in (4,8,85,67)))
where
    customer.merchant_set_id = 2;

-- success (count=3562)
-- note: customer.merchant_set_id in ...
select
    count(customer.customer_id)
from
    acquire.customer customer
    inner join entity_setup.merchant_set merchant_set on
        (customer.merchant_set_id = merchant_set.merchant_set_id
        and customer.merchant_set_id in (
            select merchant_set_id
            from entity_setup.merchant merchant
            where merchant.merchant_id in (4,8,85,67)))
where
    customer.merchant_set_id = 2;

Explain for query 1 (failure):
"Aggregate  (cost=5.23..5.24 rows=1 width=4) (actual time=0.161..0.161
rows=1 loops=1)"
"  Output: count(customer.customer_id)"
"  ->  Nested Loop Semi Join  (cost=3.23..5.22 rows=1 width=4) (actual
time=0.140..0.153 rows=1 loops=1)"
"        Output: customer.customer_id"
"        ->  Seq Scan on merchant_set  (cost=0.00..1.82 rows=1 width=4)
(actual time=0.034..0.045 rows=1 loops=1)"
"              Output: merchant_set.merchant_set_id, ..."
"              Filter: (merchant_set_id = 2)"
"        ->  Nested Loop  (cost=3.23..266.07 rows=3562 width=12) (actual
time=0.101..0.101 rows=1 loops=1)"
"              Output: customer.customer_id, customer.merchant_set_id,
merchant.merchant_set_id"
"              ->  HashAggregate  (cost=3.23..3.24 rows=1 width=4)
(actual time=0.081..0.081 rows=1 loops=1)"
"                    Output: merchant.merchant_set_id"
"                    ->  Seq Scan on merchant  (cost=0.00..3.23 rows=1
width=4) (actual time=0.039..0.064 rows=2 loops=1)"
"                          Output: merchant.merchant_id, ... ,
merchant.merchant_set_id, ..."
"                          Filter: ((merchant_set_id = 2) AND
(merchant_id = ANY ('{4,8,85,67}'::integer[])))"
"              ->  Seq Scan on customer  (cost=0.00..227.21 rows=3562
width=8) (actual time=0.015..0.015 rows=1 loops=1)"
"                    Output: customer.customer_id, ... ,
customer.merchant_set_id"
"                    Filter: (customer.merchant_set_id = 2)"
"Total runtime: 0.318 ms"

Explain for query 2 (success):
"Aggregate  (cost=312.42..312.43 rows=1 width=4) (actual
time=17.442..17.442 rows=1 loops=1)"
"  Output: count(customer.customer_id)"
"  ->  Nested Loop  (cost=3.23..303.51 rows=3562 width=4) (actual
time=0.140..15.179 rows=3562 loops=1)"
"        Output: customer.customer_id"
"        ->  Seq Scan on merchant_set  (cost=0.00..1.82 rows=1 width=4)
(actual time=0.035..0.049 rows=1 loops=1)"
"              Output: merchant_set.merchant_set_id, ..."
"              Filter: (merchant_set_id = 2)"
"        ->  Nested Loop  (cost=3.23..266.07 rows=3562 width=8) (actual
time=0.101..11.144 rows=3562 loops=1)"
"              Output: customer.customer_id, customer.merchant_set_id"
"              ->  HashAggregate  (cost=3.23..3.24 rows=1 width=4)
(actual time=0.082..0.085 rows=1 loops=1)"
"                    Output: merchant.merchant_set_id"
"                    ->  Seq Scan on merchant  (cost=0.00..3.23 rows=1
width=4) (actual time=0.038..0.064 rows=2 loops=1)"
"                          Output: merchant.merchant_id, ... ,
merchant.merchant_set_id, ..."
"                          Filter: ((merchant_set_id = 2) AND
(merchant_id = ANY ('{4,8,85,67}'::integer[])))"
"              ->  Seq Scan on customer  (cost=0.00..227.21 rows=3562
width=8) (actual time=0.015..6.901 rows=3562 loops=1)"
"                    Output: customer.customer_id, ... ,
customer.merchant_set_id"
"                    Filter: (customer.merchant_set_id = 2)"
"Total runtime: 17.610 ms"

Notes:
1. The real query gets information from customer & merchant_set so both
tables are necessary. The query to merchant is a security filter.
2. I have fixed this query by dropping the subquery to merchant, and
inner joining to merchant directly. This forces me to add a group by so
customers are not duplicated which isn't as elegant as the original query.

I want to understand why the first version used to work with 8.1 and no
longer works with 8.4. Is this bad sql and I was getting lucky before or
is postgres making a bad decision in the latest release?



pgsql-general by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Getting the column to a which a sequence belongs.
Next
From: Scott Marlowe
Date:
Subject: Re: query that worked in 8.1 not working in 8.4