Thread: New Instance of Variable Not Found in Subplan Bug
Tom, I think I have a new instance of the "Variable not Found in Subplan Target List" bug, or at least one that was not patched in 7.4.1. Version: 7.4.1 from source Platform: RH Linux 7.3 running on Dual Athalon Severity: Showstopper Symptoms: Converted 7.2 databse to 7.4.1 three weeks ago. This view worked normally for those 3 weeks; in fact, it worked normally until a couple of hours ago (and was in heavy use all that time, being queried about 1000 times per day) It is still in use on a mirror server, with identical schema but slightly different data, where the error does NOT occur. Starting about 2 hours ago, we began to get this: net_test=# select * from sv_cases; ERROR: variable not found in subplan target lists The database is huge, proprietary, and very complex. I can't provide you with full schema on this list, but could provide more information privately. Here is the view: CREATE VIEW "sv_cases" as SELECT cases.case_id, cases.case_name, cases.docket, status.status_label, cases.opp_counsel_name, trial_groups.tgroup_name, cases.tgroup_id, cases.status, cases.lead_case_docket, cases.lead_case_id, cases.priority, tpr.rollup1 as pr_element, tpr.rollup2 as pr_label FROM status, ( SELECT vchar_to_int2(list_value) as priority, rollup1, rollup2 from text_list_values WHERE list_name = 'Case Priority' ) tpr, cases LEFT JOIN trial_groups on cases.tgroup_id = trial_groups.tgroup_id WHERE (cases.status = status.status AND status.relation = 'cases'::"varchar") AND cases.priority = tpr.priority; I cannot run an EXPLAIN, it errors out as well. And, per one of your previous e-mails, I tried forcing a change in the plan, but to no benefit: jwnet_test=# set enable_hashjoin=false; SET jwnet_test=# select * from sv_cases; ERROR: variable not found in subplan target lists jwnet_test=# set enable_hashjoin=true; SET jwnet_test=# set enable_mergejoin=false; SET jwnet_test=# select * from sv_cases; ERROR: variable not found in subplan target lists jwnet_test=# set enable_mergejoin=true; SET jwnet_test=# set enable_nestloop=false; SET jwnet_test=# select * from sv_cases; ERROR: variable not found in subplan target lists If there is a patch for this that isn't in 7.4.1, please let me know where I can grab it other than the archives, as the HTML formatting is messing me up with the 11/2003 patch. Thanks! -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > I think I have a new instance of the "Variable not Found in Subplan Target > List" bug, or at least one that was not patched in 7.4.1. There are several (two or three, I forget) post-7.4.1 fixes that resolve bugs that all have that symptom. I can't tell with this much info whether you have a new case or one of the known ones. I'd suggest pulling the tip of REL7_4_STABLE branch to see if it's fixed. regards, tom lane
Tom, Further information: > CREATE VIEW "sv_cases" as > SELECT cases.case_id, cases.case_name, cases.docket, status.status_label, > cases.opp_counsel_name, trial_groups.tgroup_name, cases.tgroup_id, > cases.status, cases.lead_case_docket, cases.lead_case_id, > cases.priority, tpr.rollup1 as pr_element, tpr.rollup2 as pr_label > FROM status, > ( SELECT vchar_to_int2(list_value) as priority, rollup1, rollup2 > from text_list_values WHERE list_name = 'Case Priority' ) tpr, > cases LEFT JOIN trial_groups on cases.tgroup_id = trial_groups.tgroup_id > WHERE (cases.status = status.status AND status.relation = 'cases'::"varchar") > AND cases.priority = tpr.priority; In the above view, text_list_values is another, simple view. Removing that view from the equation fixed it, becuase it turns out that the issue is with the text_list_value view: CREATE VIEW text_list_values AS SELECT text_lists.list_id, text_lists.list_name, text_lists.list_group, text_lists.app_id, text_lists.status AS list_status, s1.status_label AS list_status_label, text_lists.list_format, text_lists.item_length, list_values.value_id, list_values.list_value, list_values.description, list_values.rollup1, list_values.rollup2, list_values.status AS value_status, s2.status AS value_status_label FROM text_lists JOIN list_values USING (list_id) JOIN status s1 ON text_lists.status = s1.status AND s1.relation::text = 'text_lists'::character varying::text JOIN status s2 ON list_values.status = s2.status AND s2.relation::text = 'list_values'::character varying::text; RELOADING the view fixed the error. Here's the EXPLAIN plan: QUERY PLAN ------------------------------------------------------------------------------------------------ Merge Join (cost=14.51..15.69 rows=66 width=130) Merge Cond: ("outer".status = "inner".status) -> Sort (cost=1.94..1.94 rows=3 width=2) Sort Key: s2.status -> Seq Scan on status s2 (cost=0.00..1.91 rows=3 width=2) Filter: ((relation)::text = 'list_values'::text) -> Sort (cost=12.57..12.83 rows=102 width=128) Sort Key: list_values.status -> Hash Join (cost=4.11..9.17 rows=102 width=128) Hash Cond: ("outer".list_id = "inner".list_id) -> Seq Scan on list_values (cost=0.00..3.36 rows=136 width=69) -> Hash (cost=4.06..4.06 rows=18 width=63) -> Merge Join (cost=3.74..4.06 rows=18 width=63) Merge Cond: ("outer".status = "inner".status) -> Sort (cost=1.95..1.96 rows=4 width=16) Sort Key: s1.status -> Seq Scan on status s1 (cost=0.00..1.91 rows=4 width=16) Filter: ((relation)::text = 'text_lists'::text) -> Sort (cost=1.79..1.85 rows=24 width=49) Sort Key: text_lists.status -> Seq Scan on text_lists (cost=0.00..1.24 rows=24 width=49) Suggestions on how to diagnose this, before I erase all evidence of it? -- -Josh Berkus Aglio Database Solutions San Francisco
Tom, > There are several (two or three, I forget) post-7.4.1 fixes that resolve > bugs that all have that symptom. I can't tell with this much info > whether you have a new case or one of the known ones. > > I'd suggest pulling the tip of REL7_4_STABLE branch to see if it's > fixed. Hmmm ... problem is, per my last e-mail, the bug is not reproducable off of this particular database instance -- if I copy it to my laptop, the bug goes away. And even though it's not a production database, it *is* a production *server*. Which means that I can't apply CVS code to it ... -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Hmmm ... problem is, per my last e-mail, the bug is not reproducable off of > this particular database instance -- if I copy it to my laptop, the bug goes > away. It's presumably dependent on the contents of pg_statistic and the relpages/reltuples counts in pg_class for the tables involved. You could likely reproduce it by migrating that data to your laptop. It would take a little bit of hacking to get the pg_statistic data in (adjusting starelid for instance) but I think it's doable. Note that the planner control settings (eg effective_cache_size) might also need to be copied. regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: > RELOADING the view fixed the error. What do you mean by "reloading the view", exactly? > Here's the EXPLAIN plan: The cost numbers here are very small; are the tables themselves small, or did you reload them too? regards, tom lane
Tom, > It's presumably dependent on the contents of pg_statistic and the > relpages/reltuples counts in pg_class for the tables involved. > You could likely reproduce it by migrating that data to your laptop. > It would take a little bit of hacking to get the pg_statistic data > in (adjusting starelid for instance) but I think it's doable. > > Note that the planner control settings (eg effective_cache_size) might > also need to be copied. Hmmm ... could I do it through a binary file copy? I'm on a bit of a deadline here, and need to replace the bad view in the next hour or so. -- -Josh Berkus Aglio Database Solutions San Francisco
Tom, > > RELOADING the view fixed the error. > > What do you mean by "reloading the view", exactly? I created the same view under a new name. The new view runs fine. I suspect that if I REPLACED the view, it would be fixed, but I don't want to do that if we want to analyze it further. > The cost numbers here are very small; are the tables themselves small, or > did you reload them too? The tables are quite small, the largest < 200 rows. This view just links a bunch of reference lists. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Hmmm ... could I do it through a binary file copy? I'm on a bit of a > deadline here, and need to replace the bad view in the next hour or so. For the moment I'd just suggest saving the contents of pg_class and pg_statistic (COPY TO STDOUT or some such). If we can reproduce it at all, that will be the info we need. regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: >> What do you mean by "reloading the view", exactly? > I created the same view under a new name. The new view runs fine. [scratches head...] That doesn't make any sense to me at all ... there must be some difference between the two view definitions. The planner doesn't have any statistics associated with views, only with underlying tables (in fact it never even sees the views). Please compare \d output for the two views. regards, tom lane
Tom, > [scratches head...] That doesn't make any sense to me at all ... > there must be some difference between the two view definitions. > The planner doesn't have any statistics associated with views, > only with underlying tables (in fact it never even sees the views). Unlikely, given that I created the second view by copying the \d output of the first view. However, here goes. First is \d for the bad view, and second is \d and 2nd for the good view. I can't see any difference. Can you? View "public.text_list_values" Column | Type | Modifiers --------------------+------------------------+----------- list_id | integer | list_name | character varying(30) | list_group | character varying(30) | app_id | integer | list_status | integer | list_status_label | character varying(30) | list_format | character varying(30) | item_length | smallint | value_id | integer | list_value | character varying(50) | description | character varying(100) | rollup1 | character varying(100) | rollup2 | character varying(50) | value_status | integer | value_status_label | smallint | View definition: SELECT text_lists.list_id, text_lists.list_name, text_lists.list_group, text_lists.app_id, text_lists.status AS list_status, s1.status_label AS list_status_label, text_lists.list_format, text_lists.item_length, list_values.value_id, list_values.list_value, list_values.description, list_values.rollup1, list_values.rollup2, list_values.status AS value_status, s2.status AS value_status_label FROM text_lists JOIN list_values USING (list_id) JOIN status s1 ON text_lists.status = s1.status AND s1.relation::text = 'text_lists'::character varying::text JOIN status s2 ON list_values.status = s2.status AND s2.relation::text = 'list_values'::character varying::text; View "public.text_list_values_2" Column | Type | Modifiers --------------------+------------------------+----------- list_id | integer | list_name | character varying(30) | list_group | character varying(30) | app_id | integer | list_status | integer | list_status_label | character varying(30) | list_format | character varying(30) | item_length | smallint | value_id | integer | list_value | character varying(50) | description | character varying(100) | rollup1 | character varying(100) | rollup2 | character varying(50) | value_status | integer | value_status_label | smallint | View definition: SELECT text_lists.list_id, text_lists.list_name, text_lists.list_group, text_lists.app_id, text_lists.status AS list_status, s1.status_label AS list_status_label, text_lists.list_format, text_lists.item_length, list_values.value_id, list_values.list_value, list_values.description, list_values.rollup1, list_values.rollup2, list_values.status AS value_status, s2.status AS value_status_label FROM text_lists JOIN list_values USING (list_id) JOIN status s1 ON text_lists.status = s1.status AND s1.relation::text = 'text_lists'::character varying::text JOIN status s2 ON list_values.status = s2.status AND s2.relation::text = 'list_values'::character varying::text; -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > However, here goes. First is \d for the bad view, and second is \d and 2nd > for the good view. I can't see any difference. Can you? They look the same to me too, but I still think there must be a difference. Would you look at their pg_rewrite rows to see if the ev_action text is identical? (You want the rows with ev_class matching the pg_class OIDs of the two views.) regards, tom lane