Thread: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them

The following bug has been logged on the website:

Bug reference:      7604
Logged by:          Bill MacArthur
Email address:      webmaster@dhs-club.com
PostgreSQL version: 9.2.1
Operating system:   CentOS 5.8
Description:        =


vip_declines_mailers_base is a VIEW that itself uses another VIEW alongside
several other joined tables. nop_seed is a 1 column table that contains 1
date as a reference. members_cancel_pending is a VIEW.

CREATE OR REPLACE VIEW vip_decline_mailers_base AS =

 SELECT m.id, m.alias, m.firstname, m.lastname, m.emailaddress, =

        CASE
            WHEN s.void =3D false THEN s.end_date
            ELSE (s.end_date - '1 mon'::interval)::date
        END AS paid_thru, =

    mop.payment_method, m.mail_option_lvl, now()::date AS "current_date"
   FROM nop_seed, =

    subscriptions s
   JOIN mop ON mop.id =3D s.member_id
   JOIN members_cancel_pending m ON m.id =3D s.member_id AND
m.membertype::text =3D 'v'::text
   JOIN subscription_types st ON s.subscription_type =3D st.subscription_ty=
pe
  WHERE (s.end_date < nop_seed.paid_thru OR s.void =3D true) AND
st.sub_class::text =3D 'VM'::text;


Then executing a query against vip_declines_mailers_base with no
constraining arguments, the complete result set counts as this:
network=3D# select count(*) from vip_decline_mailers_base;
 count
-------
   358

vip_declines_mailers_base is another VIEW that merely adds some criteria to
limit the result set of the 'base' VIEW. In versions 9.0 and back it did
just that. After upgrading to 9.2, the criteria actually expand the result
set.
CREATE OR REPLACE VIEW vip_mailer_unpaid_current AS =

 SELECT vip_decline_mailers_base.id, vip_decline_mailers_base.alias, =

    vip_decline_mailers_base.firstname, vip_decline_mailers_base.lastname, =

    vip_decline_mailers_base.emailaddress,
vip_decline_mailers_base.paid_thru, =

    vip_decline_mailers_base.payment_method, =

    vip_decline_mailers_base.mail_option_lvl, =

    vip_decline_mailers_base."current_date", =

    current_month_text(now()) AS current_month_text
   FROM vip_decline_mailers_base
  WHERE vip_decline_mailers_base.mail_option_lvl > 0 AND
vip_decline_mailers_base.paid_thru >=3D first_of_another_month((now()::date=
 -
'1 mon'::interval)::date) AND vip_decline_mailers_base.paid_thru <=3D
(first_of_month() - 1);

network=3D# select count(*) from vip_mailer_unpaid_current;
 count
-------
   391

How can this be? What's worse, is that adding the criteria somehow mangles
the inner workings of the 'base' VIEW and causes it to return results where
the membertype does not even match the join criteria which should be 'v'
only.

I could create a self contained test case, but the number of tables and
scrubbing the data could be tedious. Perhaps there is enough here to help
pinpoint a trouble spot. I should restate, also, that these VIEWS have been
working fine with 9.0 and earlier versions.

FWIW, here are the EXPLAINs on the two queries.

network=3D# explain select count(*) from vip_decline_mailers_base;
                                                                            =

                                                     QUERY PLAN

---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
-------------
--------------------------------
 Aggregate  (cost=3D82439.33..82439.34 rows=3D1 width=3D0)
   ->  Hash Right Join  (cost=3D78647.01..82439.21 rows=3D47 width=3D0)
         Hash Cond: (c.id =3D m.id)
         Filter: ((CASE WHEN (c.id IS NULL) THEN m.membertype WHEN
((c.status =3D 'tr'::text) OR (c.status =3D 'p'::text) OR (c.status =3D
'cd'::text) OR (c.status =3D 'sp'::text) OR (c.status =3D 'sa'::text)) THEN
m.membertype ELSE 'c'::character
varying END)::text =3D 'v'::text)
         ->  Seq Scan on cancellations c  (cost=3D0.00..3325.35 rows=3D1197=
35
width=3D6)
         ->  Hash  (cost=3D78529.91..78529.91 rows=3D9368 width=3D6)
               ->  Hash Join  (cost=3D6128.57..78529.91 rows=3D9368 width=
=3D6)
                     Hash Cond: (m.id =3D mop.id)
                     ->  Seq Scan on members m  (cost=3D0.00..66417.39
rows=3D1570739 width=3D6)
                     ->  Hash  (cost=3D6011.47..6011.47 rows=3D9368 width=
=3D8)
                           ->  Hash Join  (cost=3D3129.39..6011.47 rows=3D9=
368
width=3D8)
                                 Hash Cond: (mop.id =3D s.member_id)
                                 ->  Seq Scan on mop  (cost=3D0.00..2158.67
rows=3D71967 width=3D4)
                                 ->  Hash  (cost=3D3012.28..3012.28 rows=3D=
9369
width=3D4)
                                       ->  Nested Loop  (cost=3D2.67..3012.=
28
rows=3D9369 width=3D4)
                                             Join Filter: ((s.end_date <
nop_seed.paid_thru) OR s.void)
                                             ->  Seq Scan on nop_seed =

(cost=3D0.00..1.01 rows=3D1 width=3D4)
                                             ->  Hash Join =

(cost=3D2.67..2723.56 rows=3D23017 width=3D9)
                                                   Hash Cond:
(s.subscription_type =3D st.subscription_type)
                                                   ->  Seq Scan on
subscriptions s  (cost=3D0.00..2188.61 rows=3D80561 width=3D11)
                                                   ->  Hash =

(cost=3D2.52..2.52 rows=3D12 width=3D2)
                                                         ->  Seq Scan on
subscription_types st  (cost=3D0.00..2.52 rows=3D12 width=3D2)
                                                               Filter:
((sub_class)::text =3D 'VM'::text)
(23 rows)


network=3D# explain select count(*) from vip_mailer_unpaid_current;
                                                                            =

                                                                            =

             QUERY PLAN

---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
-------------
---------------------------------------------------------------------------=
-------------------------------
 Aggregate  (cost=3D25954.57..25954.58 rows=3D1 width=3D0)
   ->  Nested Loop  (cost=3D0.00..25954.57 rows=3D1 width=3D0)
         ->  Nested Loop Left Join  (cost=3D0.00..25948.57 rows=3D1 width=
=3D8)
               ->  Nested Loop  (cost=3D0.00..25741.69 rows=3D34 width=3D10)
                     ->  Nested Loop  (cost=3D0.00..25186.62 rows=3D47 widt=
h=3D4)
                           Join Filter: (s.subscription_type =3D
st.subscription_type)
                           ->  Nested Loop  (cost=3D0.00..25154.54 rows=3D1=
64
width=3D6)
                                 Join Filter: ((s.end_date <
nop_seed.paid_thru) OR s.void)
                                 ->  Seq Scan on nop_seed  (cost=3D0.00..1.=
01
rows=3D1 width=3D4)
                                 ->  Seq Scan on subscriptions s =

(cost=3D0.00..25148.50 rows=3D403 width=3D11)
                                       Filter: ((CASE WHEN (NOT void) THEN
end_date ELSE ((end_date - '1 mon'::interval))::date END <=3D
((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void)
THEN end_date ELSE ((end_date
 - '1 mon'::interval))::date END >=3D first_of_another_month((((now())::dat=
e -
'1 mon'::interval))::date)))
                           ->  Materialize  (cost=3D0.00..2.58 rows=3D12
width=3D2)
                                 ->  Seq Scan on subscription_types st =

(cost=3D0.00..2.52 rows=3D12 width=3D2)
                                       Filter: ((sub_class)::text =3D
'VM'::text)
                     ->  Index Scan using members_pkey on members m =

(cost=3D0.00..11.80 rows=3D1 width=3D6)
                           Index Cond: (id =3D s.member_id)
                           Filter: (mail_option_lvl > 0)
               ->  Index Scan using cancellations_id_key on cancellations c =

(cost=3D0.00..6.07 rows=3D1 width=3D6)
                     Index Cond: (id =3D m.id)
                     Filter: ((CASE WHEN (id IS NULL) THEN m.membertype WHEN
((status =3D 'tr'::text) OR (status =3D 'p'::text) OR (status =3D 'cd'::tex=
t) OR
(status =3D 'sp'::text) OR (status =3D 'sa'::text)) THEN m.membertype ELSE
'c'::character
varying END)::text =3D 'v'::text)
         ->  Index Only Scan using mop_pkey on mop  (cost=3D0.00..5.99 rows=
=3D1
width=3D4)
               Index Cond: (id =3D m.id)
(22 rows)
webmaster@dhs-club.com writes:
> vip_declines_mailers_base is another VIEW that merely adds some criteria to
> limit the result set of the 'base' VIEW. In versions 9.0 and back it did
> just that. After upgrading to 9.2, the criteria actually expand the result
> set.

This doesn't seem to match any of the known bugs in 9.2, so I'm afraid
there's no help for it: you need to create a self-contained test case.

            regards, tom lane
On 10/15/2012 12:18 PM, Tom Lane wrote:
> webmaster@dhs-club.com writes:
>> vip_declines_mailers_base is another VIEW that merely adds some criteria to
>> limit the result set of the 'base' VIEW. In versions 9.0 and back it did
>> just that. After upgrading to 9.2, the criteria actually expand the result
>> set.
>
> This doesn't seem to match any of the known bugs in 9.2, so I'm afraid
> there's no help for it: you need to create a self-contained test case.
>
>             regards, tom lane
>

Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only the
columnssignificant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema
(justrenamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the results are
asexpected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues?? 
Bill MacArthur <webmaster@dhs-club.com> writes:
> Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only the
columnssignificant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema
(justrenamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the results are
asexpected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues?? 

Is the query plan the same according to EXPLAIN?

If not, you may have forgotten to vacuum/analyze the new tables, or
forgotten some relevant index.  Or it might be that the total table size
is affecting the plan choice, in which case you need dummy data in the
"irrelevant" columns rather than removing them altogether.

            regards, tom lane
On 10/15/2012 2:29 PM, Tom Lane wrote:
> Bill MacArthur <webmaster@dhs-club.com> writes:
>> Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only
thecolumns significant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new
schema(just renamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the
resultsare as expected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues?? 
>
> Is the query plan the same according to EXPLAIN?
>
> If not, you may have forgotten to vacuum/analyze the new tables, or
> forgotten some relevant index.  Or it might be that the total table size
> is affecting the plan choice, in which case you need dummy data in the
> "irrelevant" columns rather than removing them altogether.
>
>             regards, tom lane
>
Update, I started placing primary keys on the testcase tables and watched the planner output. Once I put a PK on one of
thetables in particular, the planner revised the plan to use the PK. At that point, the results become erroneous as the
planneralso moves another filter evaluation to an earlier point at which time I don't think it has the data to make the
decision.

I can still finish up with the test case if you like, but here is a highlight. The first VIEW, which I did not
originallypost is this: 

CREATE OR REPLACE VIEW testcase.members_cancel_pending AS
  SELECT m.id, m.alias, m.emailaddress, m.firstname, m.lastname,
     m.mail_option_lvl,
         CASE
             WHEN c.id IS NULL THEN m.membertype
             WHEN c.status = 'tr'::text OR c.status = 'p'::text OR c.status = 'cd'::text OR c.status = 'sp'::text OR
c.status= 'sa'::text THEN m.membertype 
             ELSE 'c'::character varying
         END AS membertype
    FROM testcase.members m
    LEFT JOIN testcase.cancellations c ON c.id = m.id;

The membertype column is actually calculated.
Then I have the two VIEWs I previous posted:

CREATE OR REPLACE VIEW testcase.vip_decline_mailers_base AS
  SELECT m.id, m.alias, m.firstname, m.lastname, m.emailaddress,
         CASE
             WHEN s.void = false THEN s.end_date
             ELSE (s.end_date - '1 mon'::interval)::date
         END AS paid_thru,
     mop.payment_method, m.mail_option_lvl, now()::date AS "current_date"
    FROM testcase.nop_seed,
     testcase.subscriptions s
    JOIN testcase.mop mop ON mop.id = s.member_id
    JOIN testcase.members_cancel_pending m ON m.id = s.member_id AND m.membertype::text = 'v'::text
    JOIN testcase.subscription_types st ON s.subscription_type = st.subscription_type
   WHERE (s.end_date < nop_seed.paid_thru OR s.void = true) AND st.sub_class::text = 'VM'::text;

This should only be looking for membertype='v'

 From here another VIEW is built:

CREATE OR REPLACE VIEW testcase.vip_mailer_unpaid_current AS
  SELECT vip_decline_mailers_base.id, vip_decline_mailers_base.alias,
     vip_decline_mailers_base.firstname, vip_decline_mailers_base.lastname,
     vip_decline_mailers_base.emailaddress, vip_decline_mailers_base.paid_thru,
     vip_decline_mailers_base.payment_method,
     vip_decline_mailers_base.mail_option_lvl,
     vip_decline_mailers_base."current_date",
     current_month_text(now()) AS current_month_text
    FROM testcase.vip_decline_mailers_base vip_decline_mailers_base
   WHERE vip_decline_mailers_base.mail_option_lvl > 0 AND vip_decline_mailers_base.paid_thru >=
first_of_another_month((now()::date- '1 mon'::interval)::date) AND vip_decline_mailers_base.paid_thru <=
(first_of_month()- 1); 

It is assuming that there will only be membertype 'v' is the basic results and is only applying date filters.

Here is the planner output before I put a PK on testcase.cancellations

network=# explain select count(*) from testcase.vip_mailer_unpaid_current;

                                                  QUERY PLAN 


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=29784.07..29784.08 rows=1 width=0)
    ->  Nested Loop  (cost=25947.06..29783.08 rows=395 width=0)
          Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void)
          ->  Hash Right Join  (cost=25947.06..29720.65 rows=1 width=5)
                Hash Cond: (c.id = m.id)
                Filter: ((CASE WHEN (c.id IS NULL) THEN m.membertype WHEN ((c.status = 'tr'::text) OR (c.status =
'p'::text)OR (c.status = 'cd'::text) OR (c.status = 'sp'::text) OR (c.status = 'sa'::text)) THEN m.membertype ELSE
'c'::char
acter varying END)::text = 'v'::text)
                ->  Seq Scan on cancellations c  (cost=0.00..3324.41 rows=119741 width=6)
                ->  Hash  (cost=25946.03..25946.03 rows=83 width=11)
                      ->  Nested Loop  (cost=0.00..25946.03 rows=83 width=11)
                            ->  Nested Loop  (cost=0.00..25315.99 rows=115 width=13)
                                  ->  Nested Loop  (cost=0.00..25053.20 rows=115 width=9)
                                        Join Filter: (s.subscription_type = st.subscription_type)
                                        ->  Seq Scan on subscriptions s  (cost=0.00..24979.10 rows=403 width=11)
                                              Filter: ((CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1
mon'::interval))::dateEND <= ((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void) THEN end_date
ELSE((en 
d_date - '1 mon'::interval))::date END >= first_of_another_month((((now())::date - '1 mon'::interval))::date)))
                                        ->  Materialize  (cost=0.00..1.58 rows=12 width=2)
                                              ->  Seq Scan on subscription_types st  (cost=0.00..1.52 rows=12 width=2)
                                                    Filter: ((sub_class)::text = 'VM'::text)
                                  ->  Index Only Scan using tcmopid on mop  (cost=0.00..2.28 rows=1 width=4)
                                        Index Cond: (id = s.member_id)
                            ->  Index Scan using tcmembersid on members m  (cost=0.00..5.47 rows=1 width=6)
                                  Index Cond: (id = mop.id)
                                  Filter: (mail_option_lvl > 0)
          ->  Seq Scan on nop_seed  (cost=0.00..33.30 rows=2330 width=4)
(23 rows)

network=# select count(*) from testcase.vip_mailer_unpaid_current;
  count
-------
    331
(1 row)


Here is the output after putting a PK on testcase.cancellations:
ALTER TABLE testcase.cancellations
   ADD CONSTRAINT cancellations_pkey PRIMARY KEY(id);

network=# explain select count(*) from testcase.vip_mailer_unpaid_current;

                                               QUERY PLAN 


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------
  Aggregate  (cost=26233.61..26233.62 rows=1 width=0)
    ->  Nested Loop  (cost=0.00..26232.63 rows=395 width=0)
          Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void)
          ->  Nested Loop Left Join  (cost=0.00..26170.20 rows=1 width=5)
                ->  Nested Loop  (cost=0.00..25946.03 rows=83 width=11)
                      ->  Nested Loop  (cost=0.00..25315.99 rows=115 width=13)
                            ->  Nested Loop  (cost=0.00..25053.20 rows=115 width=9)
                                  Join Filter: (s.subscription_type = st.subscription_type)
                                  ->  Seq Scan on subscriptions s  (cost=0.00..24979.10 rows=403 width=11)
                                        Filter: ((CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1
mon'::interval))::dateEND <= ((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void) THEN end_date
ELSE((end_date 
  - '1 mon'::interval))::date END >= first_of_another_month((((now())::date - '1 mon'::interval))::date)))
                                  ->  Materialize  (cost=0.00..1.58 rows=12 width=2)
                                        ->  Seq Scan on subscription_types st  (cost=0.00..1.52 rows=12 width=2)
                                              Filter: ((sub_class)::text = 'VM'::text)
                            ->  Index Only Scan using tcmopid on mop  (cost=0.00..2.28 rows=1 width=4)
                                  Index Cond: (id = s.member_id)
                      ->  Index Scan using tcmembersid on members m  (cost=0.00..5.47 rows=1 width=6)
                            Index Cond: (id = mop.id)
                            Filter: (mail_option_lvl > 0)
                ->  Index Scan using cancellations_pkey on cancellations c  (cost=0.00..2.69 rows=1 width=6)
                      Index Cond: (id = m.id)
                      Filter: ((CASE WHEN (id IS NULL) THEN m.membertype WHEN ((status = 'tr'::text) OR (status =
'p'::text)OR (status = 'cd'::text) OR (status = 'sp'::text) OR (status = 'sa'::text)) THEN m.membertype ELSE
'c'::character
varying END)::text = 'v'::text)
          ->  Seq Scan on nop_seed  (cost=0.00..33.30 rows=2330 width=4)
(22 rows)

network=# select count(*) from testcase.vip_mailer_unpaid_current;
  count
-------
    390
(1 row)



Would you still like a test case to run?
Bill MacArthur <webmaster@dhs-club.com> writes:
> Update, I started placing primary keys on the testcase tables and watched the planner output. Once I put a PK on one
ofthe tables in particular, the planner revised the plan to use the PK. At that point, the results become erroneous as
theplanner also moves another filter evaluation to an earlier point at which time I don't think it has the data to make
thedecision. 

Curious ...

> I can still finish up with the test case if you like, but here is a highlight. The first VIEW, which I did not
originallypost is this: 

Test case, please.

            regards, tom lane