Thread: Slow query when used in a view
Hey everyone! A developer was complaining about a view he created to abstract an added column in a left join. He was contemplating denormalizing the added value into the parent table and using a trigger to maintain it instead, and I obviously looked into the problem. I noticed the view was incurring a sequence scan on an obvious index condition, but the regular join version was not. Curious, I whipped up this test case: CREATE TABLE foo (id BIGINT, small_label VARCHAR); INSERT INTO foo (id) VALUES (generate_series(1, 10000)); ALTER TABLE foo ADD CONSTRAINT pk_foo_id PRIMARY KEY (id); CREATE TABLE bar (id BIGINT, foo_id BIGINT); INSERT INTO bar (id, foo_id) SELECT a, a%10000 FROM generate_series(1, 100000) a; ALTER TABLE bar ADD CONSTRAINT pk_bar_id PRIMARY KEY (id); CREATE TABLE tiny_foo (small_label VARCHAR NOT NULL PRIMARY KEY); INSERT INTO tiny_foo (small_label) VALUES (('yes', 'we', 'have', 'no', 'bananas')); UPDATE foo SET small_label = 'bananas' WHERE id=750; ANALYZE foo; ANALYZE bar; ANALYZE tiny_foo; CREATE VIEW v_slow_view AS SELECT foo.*, tf.small_label IS NOT NULL AS has_small_label FROM foo LEFT JOIN tiny_foo tf USING (small_label); Now, this is with PostgreSQL 9.1.8, basically default everything in a base Ubuntu install. So, the good query plan using all tables directly: SELECT bar.*, foo.*, tf.small_label IS NOT NULL AS has_small_label FROM bar LEFT JOIN foo ON (foo.id = bar.foo_id) LEFT JOIN tiny_foo tf USING (small_label) WHERE bar.id IN (750, 1750, 2750) ORDER BY bar.id; does this: Index Scan using pk_foo_id on foo (cost=0.00..8.27 rows=1 width=16) Index Cond: (id = bar.foo_id) The bad one using the view: SELECT bar.*, sv.* FROM bar LEFT JOIN v_slow_view sv ON (sv.id = bar.foo_id) WHERE bar.id IN (750, 1750, 2750) ORDER BY bar.id; Mysteriously, does this: Seq Scan on foo (cost=0.00..145.00 rows=10000 width=16) I'm... perplexed. This test case is way too shallow to be affected by join_collapse_limit and its ilk, so I'm not sure what's going on here. I sense an optimization fence, but I can't see where. Thanks in advance! -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
Shaun Thomas <sthomas@optionshouse.com> writes: > A developer was complaining about a view he created to abstract an added > column in a left join. ... > Curious, I whipped up this test case: > CREATE VIEW v_slow_view AS > SELECT foo.*, tf.small_label IS NOT NULL AS has_small_label > FROM foo > LEFT JOIN tiny_foo tf USING (small_label); > SELECT bar.*, foo.*, tf.small_label IS NOT NULL AS has_small_label > FROM bar > LEFT JOIN foo ON (foo.id = bar.foo_id) > LEFT JOIN tiny_foo tf USING (small_label) > WHERE bar.id IN (750, 1750, 2750) > ORDER BY bar.id; > SELECT bar.*, sv.* > FROM bar > LEFT JOIN v_slow_view sv ON (sv.id = bar.foo_id) > WHERE bar.id IN (750, 1750, 2750) > ORDER BY bar.id; These queries are not actually equivalent. In the first one, it is impossible for "has_small_label" to read out as NULL: it will either be true or false. However, in the second one, the IS NOT NULL is evaluated below the LEFT JOIN to "sv", and therefore it is required that the query return NULL for "has_small_label" in any row where bar.foo_id lacks a join partner. To implement that behavior correctly, we're forced to form the foo-to-tiny_foo join first, then do the left join with bar (which'll replace RHS columns by nulls where necessary). And that means that you get the inefficient plan wherein the foo-to-tiny_foo join is computed in its entirety. 9.2 does this case better, by virtue of the "parameterized plan" stuff, which exists specifically to let us use nestloop-with-inner-indexscan plans even when there are some join order restrictions complicating matters. regards, tom lane
On 03/11/2013 06:56 PM, Tom Lane wrote: > And that means that you get the inefficient plan wherein the > foo-to-tiny_foo join is computed in its entirety. :( That's unfortunate, though I guess it makes sense. I moved the join in the view into the SELECT clause as an EXISTS, and that seems to work without major adverse effects. Apparently the tiny table really will be tiny in actual use, so impact should be minimal. I just really don't like using subselects that way. :) Thanks, Tom! -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email