Thread: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
From
webmaster@dhs-club.com
Date:
The following bug has been logged on the website: Bug reference: 7604 Logged by: Bill MacArthur Email address: webmaster@dhs-club.com PostgreSQL version: 9.2.1 Operating system: CentOS 5.8 Description: = vip_declines_mailers_base is a VIEW that itself uses another VIEW alongside several other joined tables. nop_seed is a 1 column table that contains 1 date as a reference. members_cancel_pending is a VIEW. CREATE OR REPLACE VIEW vip_decline_mailers_base AS = SELECT m.id, m.alias, m.firstname, m.lastname, m.emailaddress, = CASE WHEN s.void =3D false THEN s.end_date ELSE (s.end_date - '1 mon'::interval)::date END AS paid_thru, = mop.payment_method, m.mail_option_lvl, now()::date AS "current_date" FROM nop_seed, = subscriptions s JOIN mop ON mop.id =3D s.member_id JOIN members_cancel_pending m ON m.id =3D s.member_id AND m.membertype::text =3D 'v'::text JOIN subscription_types st ON s.subscription_type =3D st.subscription_ty= pe WHERE (s.end_date < nop_seed.paid_thru OR s.void =3D true) AND st.sub_class::text =3D 'VM'::text; Then executing a query against vip_declines_mailers_base with no constraining arguments, the complete result set counts as this: network=3D# select count(*) from vip_decline_mailers_base; count ------- 358 vip_declines_mailers_base is another VIEW that merely adds some criteria to limit the result set of the 'base' VIEW. In versions 9.0 and back it did just that. After upgrading to 9.2, the criteria actually expand the result set. CREATE OR REPLACE VIEW vip_mailer_unpaid_current AS = SELECT vip_decline_mailers_base.id, vip_decline_mailers_base.alias, = vip_decline_mailers_base.firstname, vip_decline_mailers_base.lastname, = vip_decline_mailers_base.emailaddress, vip_decline_mailers_base.paid_thru, = vip_decline_mailers_base.payment_method, = vip_decline_mailers_base.mail_option_lvl, = vip_decline_mailers_base."current_date", = current_month_text(now()) AS current_month_text FROM vip_decline_mailers_base WHERE vip_decline_mailers_base.mail_option_lvl > 0 AND vip_decline_mailers_base.paid_thru >=3D first_of_another_month((now()::date= - '1 mon'::interval)::date) AND vip_decline_mailers_base.paid_thru <=3D (first_of_month() - 1); network=3D# select count(*) from vip_mailer_unpaid_current; count ------- 391 How can this be? What's worse, is that adding the criteria somehow mangles the inner workings of the 'base' VIEW and causes it to return results where the membertype does not even match the join criteria which should be 'v' only. I could create a self contained test case, but the number of tables and scrubbing the data could be tedious. Perhaps there is enough here to help pinpoint a trouble spot. I should restate, also, that these VIEWS have been working fine with 9.0 and earlier versions. FWIW, here are the EXPLAINs on the two queries. network=3D# explain select count(*) from vip_decline_mailers_base; = QUERY PLAN ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ------------- -------------------------------- Aggregate (cost=3D82439.33..82439.34 rows=3D1 width=3D0) -> Hash Right Join (cost=3D78647.01..82439.21 rows=3D47 width=3D0) Hash Cond: (c.id =3D m.id) Filter: ((CASE WHEN (c.id IS NULL) THEN m.membertype WHEN ((c.status =3D 'tr'::text) OR (c.status =3D 'p'::text) OR (c.status =3D 'cd'::text) OR (c.status =3D 'sp'::text) OR (c.status =3D 'sa'::text)) THEN m.membertype ELSE 'c'::character varying END)::text =3D 'v'::text) -> Seq Scan on cancellations c (cost=3D0.00..3325.35 rows=3D1197= 35 width=3D6) -> Hash (cost=3D78529.91..78529.91 rows=3D9368 width=3D6) -> Hash Join (cost=3D6128.57..78529.91 rows=3D9368 width= =3D6) Hash Cond: (m.id =3D mop.id) -> Seq Scan on members m (cost=3D0.00..66417.39 rows=3D1570739 width=3D6) -> Hash (cost=3D6011.47..6011.47 rows=3D9368 width= =3D8) -> Hash Join (cost=3D3129.39..6011.47 rows=3D9= 368 width=3D8) Hash Cond: (mop.id =3D s.member_id) -> Seq Scan on mop (cost=3D0.00..2158.67 rows=3D71967 width=3D4) -> Hash (cost=3D3012.28..3012.28 rows=3D= 9369 width=3D4) -> Nested Loop (cost=3D2.67..3012.= 28 rows=3D9369 width=3D4) Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void) -> Seq Scan on nop_seed = (cost=3D0.00..1.01 rows=3D1 width=3D4) -> Hash Join = (cost=3D2.67..2723.56 rows=3D23017 width=3D9) Hash Cond: (s.subscription_type =3D st.subscription_type) -> Seq Scan on subscriptions s (cost=3D0.00..2188.61 rows=3D80561 width=3D11) -> Hash = (cost=3D2.52..2.52 rows=3D12 width=3D2) -> Seq Scan on subscription_types st (cost=3D0.00..2.52 rows=3D12 width=3D2) Filter: ((sub_class)::text =3D 'VM'::text) (23 rows) network=3D# explain select count(*) from vip_mailer_unpaid_current; = = QUERY PLAN ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ------------- ---------------------------------------------------------------------------= ------------------------------- Aggregate (cost=3D25954.57..25954.58 rows=3D1 width=3D0) -> Nested Loop (cost=3D0.00..25954.57 rows=3D1 width=3D0) -> Nested Loop Left Join (cost=3D0.00..25948.57 rows=3D1 width= =3D8) -> Nested Loop (cost=3D0.00..25741.69 rows=3D34 width=3D10) -> Nested Loop (cost=3D0.00..25186.62 rows=3D47 widt= h=3D4) Join Filter: (s.subscription_type =3D st.subscription_type) -> Nested Loop (cost=3D0.00..25154.54 rows=3D1= 64 width=3D6) Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void) -> Seq Scan on nop_seed (cost=3D0.00..1.= 01 rows=3D1 width=3D4) -> Seq Scan on subscriptions s = (cost=3D0.00..25148.50 rows=3D403 width=3D11) Filter: ((CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1 mon'::interval))::date END <=3D ((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1 mon'::interval))::date END >=3D first_of_another_month((((now())::dat= e - '1 mon'::interval))::date))) -> Materialize (cost=3D0.00..2.58 rows=3D12 width=3D2) -> Seq Scan on subscription_types st = (cost=3D0.00..2.52 rows=3D12 width=3D2) Filter: ((sub_class)::text =3D 'VM'::text) -> Index Scan using members_pkey on members m = (cost=3D0.00..11.80 rows=3D1 width=3D6) Index Cond: (id =3D s.member_id) Filter: (mail_option_lvl > 0) -> Index Scan using cancellations_id_key on cancellations c = (cost=3D0.00..6.07 rows=3D1 width=3D6) Index Cond: (id =3D m.id) Filter: ((CASE WHEN (id IS NULL) THEN m.membertype WHEN ((status =3D 'tr'::text) OR (status =3D 'p'::text) OR (status =3D 'cd'::tex= t) OR (status =3D 'sp'::text) OR (status =3D 'sa'::text)) THEN m.membertype ELSE 'c'::character varying END)::text =3D 'v'::text) -> Index Only Scan using mop_pkey on mop (cost=3D0.00..5.99 rows= =3D1 width=3D4) Index Cond: (id =3D m.id) (22 rows)
Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
From
Tom Lane
Date:
webmaster@dhs-club.com writes: > vip_declines_mailers_base is another VIEW that merely adds some criteria to > limit the result set of the 'base' VIEW. In versions 9.0 and back it did > just that. After upgrading to 9.2, the criteria actually expand the result > set. This doesn't seem to match any of the known bugs in 9.2, so I'm afraid there's no help for it: you need to create a self-contained test case. regards, tom lane
Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
From
Bill MacArthur
Date:
On 10/15/2012 12:18 PM, Tom Lane wrote: > webmaster@dhs-club.com writes: >> vip_declines_mailers_base is another VIEW that merely adds some criteria to >> limit the result set of the 'base' VIEW. In versions 9.0 and back it did >> just that. After upgrading to 9.2, the criteria actually expand the result >> set. > > This doesn't seem to match any of the known bugs in 9.2, so I'm afraid > there's no help for it: you need to create a self-contained test case. > > regards, tom lane > Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only the columnssignificant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema (justrenamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the results are asexpected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues??
Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
From
Tom Lane
Date:
Bill MacArthur <webmaster@dhs-club.com> writes: > Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only the columnssignificant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema (justrenamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the results are asexpected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues?? Is the query plan the same according to EXPLAIN? If not, you may have forgotten to vacuum/analyze the new tables, or forgotten some relevant index. Or it might be that the total table size is affecting the plan choice, in which case you need dummy data in the "irrelevant" columns rather than removing them altogether. regards, tom lane
Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
From
Bill MacArthur
Date:
On 10/15/2012 2:29 PM, Tom Lane wrote: > Bill MacArthur <webmaster@dhs-club.com> writes: >> Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only thecolumns significant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema(just renamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the resultsare as expected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues?? > > Is the query plan the same according to EXPLAIN? > > If not, you may have forgotten to vacuum/analyze the new tables, or > forgotten some relevant index. Or it might be that the total table size > is affecting the plan choice, in which case you need dummy data in the > "irrelevant" columns rather than removing them altogether. > > regards, tom lane > Update, I started placing primary keys on the testcase tables and watched the planner output. Once I put a PK on one of thetables in particular, the planner revised the plan to use the PK. At that point, the results become erroneous as the planneralso moves another filter evaluation to an earlier point at which time I don't think it has the data to make the decision. I can still finish up with the test case if you like, but here is a highlight. The first VIEW, which I did not originallypost is this: CREATE OR REPLACE VIEW testcase.members_cancel_pending AS SELECT m.id, m.alias, m.emailaddress, m.firstname, m.lastname, m.mail_option_lvl, CASE WHEN c.id IS NULL THEN m.membertype WHEN c.status = 'tr'::text OR c.status = 'p'::text OR c.status = 'cd'::text OR c.status = 'sp'::text OR c.status= 'sa'::text THEN m.membertype ELSE 'c'::character varying END AS membertype FROM testcase.members m LEFT JOIN testcase.cancellations c ON c.id = m.id; The membertype column is actually calculated. Then I have the two VIEWs I previous posted: CREATE OR REPLACE VIEW testcase.vip_decline_mailers_base AS SELECT m.id, m.alias, m.firstname, m.lastname, m.emailaddress, CASE WHEN s.void = false THEN s.end_date ELSE (s.end_date - '1 mon'::interval)::date END AS paid_thru, mop.payment_method, m.mail_option_lvl, now()::date AS "current_date" FROM testcase.nop_seed, testcase.subscriptions s JOIN testcase.mop mop ON mop.id = s.member_id JOIN testcase.members_cancel_pending m ON m.id = s.member_id AND m.membertype::text = 'v'::text JOIN testcase.subscription_types st ON s.subscription_type = st.subscription_type WHERE (s.end_date < nop_seed.paid_thru OR s.void = true) AND st.sub_class::text = 'VM'::text; This should only be looking for membertype='v' From here another VIEW is built: CREATE OR REPLACE VIEW testcase.vip_mailer_unpaid_current AS SELECT vip_decline_mailers_base.id, vip_decline_mailers_base.alias, vip_decline_mailers_base.firstname, vip_decline_mailers_base.lastname, vip_decline_mailers_base.emailaddress, vip_decline_mailers_base.paid_thru, vip_decline_mailers_base.payment_method, vip_decline_mailers_base.mail_option_lvl, vip_decline_mailers_base."current_date", current_month_text(now()) AS current_month_text FROM testcase.vip_decline_mailers_base vip_decline_mailers_base WHERE vip_decline_mailers_base.mail_option_lvl > 0 AND vip_decline_mailers_base.paid_thru >= first_of_another_month((now()::date- '1 mon'::interval)::date) AND vip_decline_mailers_base.paid_thru <= (first_of_month()- 1); It is assuming that there will only be membertype 'v' is the basic results and is only applying date filters. Here is the planner output before I put a PK on testcase.cancellations network=# explain select count(*) from testcase.vip_mailer_unpaid_current; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------- Aggregate (cost=29784.07..29784.08 rows=1 width=0) -> Nested Loop (cost=25947.06..29783.08 rows=395 width=0) Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void) -> Hash Right Join (cost=25947.06..29720.65 rows=1 width=5) Hash Cond: (c.id = m.id) Filter: ((CASE WHEN (c.id IS NULL) THEN m.membertype WHEN ((c.status = 'tr'::text) OR (c.status = 'p'::text)OR (c.status = 'cd'::text) OR (c.status = 'sp'::text) OR (c.status = 'sa'::text)) THEN m.membertype ELSE 'c'::char acter varying END)::text = 'v'::text) -> Seq Scan on cancellations c (cost=0.00..3324.41 rows=119741 width=6) -> Hash (cost=25946.03..25946.03 rows=83 width=11) -> Nested Loop (cost=0.00..25946.03 rows=83 width=11) -> Nested Loop (cost=0.00..25315.99 rows=115 width=13) -> Nested Loop (cost=0.00..25053.20 rows=115 width=9) Join Filter: (s.subscription_type = st.subscription_type) -> Seq Scan on subscriptions s (cost=0.00..24979.10 rows=403 width=11) Filter: ((CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1 mon'::interval))::dateEND <= ((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void) THEN end_date ELSE((en d_date - '1 mon'::interval))::date END >= first_of_another_month((((now())::date - '1 mon'::interval))::date))) -> Materialize (cost=0.00..1.58 rows=12 width=2) -> Seq Scan on subscription_types st (cost=0.00..1.52 rows=12 width=2) Filter: ((sub_class)::text = 'VM'::text) -> Index Only Scan using tcmopid on mop (cost=0.00..2.28 rows=1 width=4) Index Cond: (id = s.member_id) -> Index Scan using tcmembersid on members m (cost=0.00..5.47 rows=1 width=6) Index Cond: (id = mop.id) Filter: (mail_option_lvl > 0) -> Seq Scan on nop_seed (cost=0.00..33.30 rows=2330 width=4) (23 rows) network=# select count(*) from testcase.vip_mailer_unpaid_current; count ------- 331 (1 row) Here is the output after putting a PK on testcase.cancellations: ALTER TABLE testcase.cancellations ADD CONSTRAINT cancellations_pkey PRIMARY KEY(id); network=# explain select count(*) from testcase.vip_mailer_unpaid_current; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------- Aggregate (cost=26233.61..26233.62 rows=1 width=0) -> Nested Loop (cost=0.00..26232.63 rows=395 width=0) Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void) -> Nested Loop Left Join (cost=0.00..26170.20 rows=1 width=5) -> Nested Loop (cost=0.00..25946.03 rows=83 width=11) -> Nested Loop (cost=0.00..25315.99 rows=115 width=13) -> Nested Loop (cost=0.00..25053.20 rows=115 width=9) Join Filter: (s.subscription_type = st.subscription_type) -> Seq Scan on subscriptions s (cost=0.00..24979.10 rows=403 width=11) Filter: ((CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1 mon'::interval))::dateEND <= ((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void) THEN end_date ELSE((end_date - '1 mon'::interval))::date END >= first_of_another_month((((now())::date - '1 mon'::interval))::date))) -> Materialize (cost=0.00..1.58 rows=12 width=2) -> Seq Scan on subscription_types st (cost=0.00..1.52 rows=12 width=2) Filter: ((sub_class)::text = 'VM'::text) -> Index Only Scan using tcmopid on mop (cost=0.00..2.28 rows=1 width=4) Index Cond: (id = s.member_id) -> Index Scan using tcmembersid on members m (cost=0.00..5.47 rows=1 width=6) Index Cond: (id = mop.id) Filter: (mail_option_lvl > 0) -> Index Scan using cancellations_pkey on cancellations c (cost=0.00..2.69 rows=1 width=6) Index Cond: (id = m.id) Filter: ((CASE WHEN (id IS NULL) THEN m.membertype WHEN ((status = 'tr'::text) OR (status = 'p'::text)OR (status = 'cd'::text) OR (status = 'sp'::text) OR (status = 'sa'::text)) THEN m.membertype ELSE 'c'::character varying END)::text = 'v'::text) -> Seq Scan on nop_seed (cost=0.00..33.30 rows=2330 width=4) (22 rows) network=# select count(*) from testcase.vip_mailer_unpaid_current; count ------- 390 (1 row) Would you still like a test case to run?
Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
From
Tom Lane
Date:
Bill MacArthur <webmaster@dhs-club.com> writes: > Update, I started placing primary keys on the testcase tables and watched the planner output. Once I put a PK on one ofthe tables in particular, the planner revised the plan to use the PK. At that point, the results become erroneous as theplanner also moves another filter evaluation to an earlier point at which time I don't think it has the data to make thedecision. Curious ... > I can still finish up with the test case if you like, but here is a highlight. The first VIEW, which I did not originallypost is this: Test case, please. regards, tom lane