Thread: CTE vs Subquery
Hi all, i am having any problems with performance of queries that uses CTE, can the join on a CTE use the index of the original table?, suppose two simple tables: CREATE TABLE employee ( emp_id integer NOT NULL, name character varying NOT NULL, CONSTRAINT employee_pkey PRIMARY KEY (emp_id ) ); CREATE TABLE employee_telephone ( emp_id integer NOT NULL, phone_type character varying NOT NULL, phone_number character varying NOT NULL, CONSTRAINT employee_telephone_pkey PRIMARY KEY (emp_id , phone_type ), CONSTRAINT employee_telephone_emp_id_fkey FOREIGN KEY (emp_id) REFERENCES employee (emp_id) ); and this two queries, i know this particular case don't need either a CTE or subquery it is only an example: WITH phones AS (SELECT emp_id, phone_number ORDER BY emp_id, phone_type) SELECT emp.emp_id, emp.name, array_to_string(array_agg(phones.phone_number)) AS phones FROM employee AS emp JOIN phones ON phones.emp_id = emp.emp_id VS SELECT emp.emp_id, emp.name, array_to_string(array_agg(phones.phone_number)) AS phones FROM employee AS emp JOIN (SELECT emp_id, phone_number ORDER BY emp_id, phone_type) AS phones ON phones.emp_id = emp.emp_id Why the CTE it is slower in many cases? does the CTE don't use the index for the join and the subquery do? if the CTE it is usually slower where should be used instead of a subquery other than recursive CTE's? Regards, Miguel Angel.
Linos <info@linos.es> writes: > i am having any problems with performance of queries that uses CTE, can the > join on a CTE use the index of the original table? CTEs act as optimization fences. This is a feature, not a bug. Use them when you want to isolate the evaluation of a subquery. regards, tom lane
El 25/10/11 18:43, Tom Lane escribió: > Linos <info@linos.es> writes: >> i am having any problems with performance of queries that uses CTE, can the >> join on a CTE use the index of the original table? > > CTEs act as optimization fences. This is a feature, not a bug. Use > them when you want to isolate the evaluation of a subquery. > > regards, tom lane > The truth it is that complex queries seems more readable using them (maybe a personal preference no doubt). Do have other popular databases the same behavior? SQL Server or Oracle for example? Regards, Miguel Ángel.
On Tue, Oct 25, 2011 at 11:47 AM, Linos <info@linos.es> wrote: > El 25/10/11 18:43, Tom Lane escribió: >> Linos <info@linos.es> writes: >>> i am having any problems with performance of queries that uses CTE, can the >>> join on a CTE use the index of the original table? >> >> CTEs act as optimization fences. This is a feature, not a bug. Use >> them when you want to isolate the evaluation of a subquery. >> >> regards, tom lane >> > > The truth it is that complex queries seems more readable using them (maybe a > personal preference no doubt). > > Do have other popular databases the same behavior? SQL Server or Oracle for example? In my experience, SQL server also materializes them -- basically CTE is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to foo. If you want join behavior, use a join (by the way IIRC SQL Server is a lot more restrictive about placement of ORDER BY). I like CTE current behavior -- the main place I find it awkward is in use of recursive queries because the CTE fence forces me to abstract the recursion behind a function, not a view since pushing the view qual down into the CTE is pretty horrible: postgres=# explain select foo.id, (with bar as (select id from foo f where f.id = foo.id) select * from bar) from foo where foo.id = 11; QUERY PLAN ------------------------------------------------------------------------------------- Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4) Index Cond: (id = 11) SubPlan 2 -> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4) CTE bar -> Index Scan using foo_idx on foo f (cost=0.00..8.28 rows=1 width=4) Index Cond: (id = $0) (7 rows) whereas for function you can inject your qual inside the CTE pretty easily. this is a different problem than the one you're describing though. for the most part, CTE execution fence is a very good thing, since it enforces restrictions that other features can leverage, for example 'data modifying with' queries (by far my all time favorite postgres enhancement). merlin
El 25/10/11 19:11, Merlin Moncure escribió: > On Tue, Oct 25, 2011 at 11:47 AM, Linos <info@linos.es> wrote: >> El 25/10/11 18:43, Tom Lane escribió: >>> Linos <info@linos.es> writes: >>>> i am having any problems with performance of queries that uses CTE, can the >>>> join on a CTE use the index of the original table? >>> >>> CTEs act as optimization fences. This is a feature, not a bug. Use >>> them when you want to isolate the evaluation of a subquery. >>> >>> regards, tom lane >>> >> >> The truth it is that complex queries seems more readable using them (maybe a >> personal preference no doubt). >> >> Do have other popular databases the same behavior? SQL Server or Oracle for example? > > In my experience, SQL server also materializes them -- basically CTE > is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to > foo. If you want join behavior, use a join (by the way IIRC SQL > Server is a lot more restrictive about placement of ORDER BY). > > I like CTE current behavior -- the main place I find it awkward is in > use of recursive queries because the CTE fence forces me to abstract > the recursion behind a function, not a view since pushing the view > qual down into the CTE is pretty horrible: > > postgres=# explain select foo.id, (with bar as (select id from foo f > where f.id = foo.id) select * from bar) from foo where foo.id = 11; > QUERY PLAN > ------------------------------------------------------------------------------------- > Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4) > Index Cond: (id = 11) > SubPlan 2 > -> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4) > CTE bar > -> Index Scan using foo_idx on foo f (cost=0.00..8.28 > rows=1 width=4) > Index Cond: (id = $0) > (7 rows) > > whereas for function you can inject your qual inside the CTE pretty > easily. this is a different problem than the one you're describing > though. for the most part, CTE execution fence is a very good thing, > since it enforces restrictions that other features can leverage, for > example 'data modifying with' queries (by far my all time favorite > postgres enhancement). > > merlin > ok, i get the idea, but i still don't understand what Tom says about isolate evaluation, apart from the performance and the readability, if i am not using writable CTE or recursive CTE, what it is the difference in evaluation (about being isolate) of a subquery vs CTE with the same text inside. I have been using this form lately: WITH inv (SELECT item_id, SUM(units) AS units FROM invoices), quo AS (SELECT item_id, SUM(units) AS units FROM quotes) SELECT items.item_id, CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS units_invoices, CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS units_quotes FROM items LEFT JOIN inv ON inv.item_id = items.item_id LEFT JOIN quo ON quo.item_id = items.item_id Well this is oversimplified because i use much more tables and filter based on dates, but you get the idea, it seems that this type of query should use subqueries, no? Regards, Miguel Angel.
On Wed, Oct 26, 2011 at 4:00 AM, Linos <info@linos.es> wrote: > El 25/10/11 19:11, Merlin Moncure escribió: >> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info@linos.es> wrote: >>> El 25/10/11 18:43, Tom Lane escribió: >>>> Linos <info@linos.es> writes: >>>>> i am having any problems with performance of queries that uses CTE, can the >>>>> join on a CTE use the index of the original table? >>>> >>>> CTEs act as optimization fences. This is a feature, not a bug. Use >>>> them when you want to isolate the evaluation of a subquery. >>>> >>>> regards, tom lane >>>> >>> >>> The truth it is that complex queries seems more readable using them (maybe a >>> personal preference no doubt). >>> >>> Do have other popular databases the same behavior? SQL Server or Oracle for example? >> >> In my experience, SQL server also materializes them -- basically CTE >> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to >> foo. If you want join behavior, use a join (by the way IIRC SQL >> Server is a lot more restrictive about placement of ORDER BY). >> >> I like CTE current behavior -- the main place I find it awkward is in >> use of recursive queries because the CTE fence forces me to abstract >> the recursion behind a function, not a view since pushing the view >> qual down into the CTE is pretty horrible: >> >> postgres=# explain select foo.id, (with bar as (select id from foo f >> where f.id = foo.id) select * from bar) from foo where foo.id = 11; >> QUERY PLAN >> ------------------------------------------------------------------------------------- >> Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4) >> Index Cond: (id = 11) >> SubPlan 2 >> -> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4) >> CTE bar >> -> Index Scan using foo_idx on foo f (cost=0.00..8.28 >> rows=1 width=4) >> Index Cond: (id = $0) >> (7 rows) >> >> whereas for function you can inject your qual inside the CTE pretty >> easily. this is a different problem than the one you're describing >> though. for the most part, CTE execution fence is a very good thing, >> since it enforces restrictions that other features can leverage, for >> example 'data modifying with' queries (by far my all time favorite >> postgres enhancement). >> >> merlin >> > > ok, i get the idea, but i still don't understand what Tom says about isolate > evaluation, apart from the performance and the readability, if i am not using > writable CTE or recursive CTE, what it is the difference in evaluation (about > being isolate) of a subquery vs CTE with the same text inside. > > I have been using this form lately: > > WITH inv (SELECT item_id, > SUM(units) AS units > FROM invoices), > > quo AS (SELECT item_id, > SUM(units) AS units > FROM quotes) > > SELECT items.item_id, > CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS > units_invoices, > CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS > units_quotes > > FROM items > LEFT JOIN inv ON inv.item_id = items.item_id > LEFT JOIN quo ON quo.item_id = items.item_id > > Well this is oversimplified because i use much more tables and filter based on > dates, but you get the idea, it seems that this type of query should use > subqueries, no? Think about a query like this: with foo as ( select id, volatile_func() from bar ) select * from baz join foo using (id) join bla using(id) limit 10; How many times does volatile_func() get called? How many times in the JOIN version? The answers are different... One of the key features of CTEs is controlling how/when query operations occur so you can do things like control side effects and force query plans that the server would not otherwise choose (although this is typically an unoptimization). merlin
El 26/10/11 14:23, Merlin Moncure escribió: > On Wed, Oct 26, 2011 at 4:00 AM, Linos <info@linos.es> wrote: >> El 25/10/11 19:11, Merlin Moncure escribió: >>> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info@linos.es> wrote: >>>> El 25/10/11 18:43, Tom Lane escribió: >>>>> Linos <info@linos.es> writes: >>>>>> i am having any problems with performance of queries that uses CTE, can the >>>>>> join on a CTE use the index of the original table? >>>>> >>>>> CTEs act as optimization fences. This is a feature, not a bug. Use >>>>> them when you want to isolate the evaluation of a subquery. >>>>> >>>>> regards, tom lane >>>>> >>>> >>>> The truth it is that complex queries seems more readable using them (maybe a >>>> personal preference no doubt). >>>> >>>> Do have other popular databases the same behavior? SQL Server or Oracle for example? >>> >>> In my experience, SQL server also materializes them -- basically CTE >>> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to >>> foo. If you want join behavior, use a join (by the way IIRC SQL >>> Server is a lot more restrictive about placement of ORDER BY). >>> >>> I like CTE current behavior -- the main place I find it awkward is in >>> use of recursive queries because the CTE fence forces me to abstract >>> the recursion behind a function, not a view since pushing the view >>> qual down into the CTE is pretty horrible: >>> >>> postgres=# explain select foo.id, (with bar as (select id from foo f >>> where f.id = foo.id) select * from bar) from foo where foo.id = 11; >>> QUERY PLAN >>> ------------------------------------------------------------------------------------- >>> Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4) >>> Index Cond: (id = 11) >>> SubPlan 2 >>> -> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4) >>> CTE bar >>> -> Index Scan using foo_idx on foo f (cost=0.00..8.28 >>> rows=1 width=4) >>> Index Cond: (id = $0) >>> (7 rows) >>> >>> whereas for function you can inject your qual inside the CTE pretty >>> easily. this is a different problem than the one you're describing >>> though. for the most part, CTE execution fence is a very good thing, >>> since it enforces restrictions that other features can leverage, for >>> example 'data modifying with' queries (by far my all time favorite >>> postgres enhancement). >>> >>> merlin >>> >> >> ok, i get the idea, but i still don't understand what Tom says about isolate >> evaluation, apart from the performance and the readability, if i am not using >> writable CTE or recursive CTE, what it is the difference in evaluation (about >> being isolate) of a subquery vs CTE with the same text inside. >> >> I have been using this form lately: >> >> WITH inv (SELECT item_id, >> SUM(units) AS units >> FROM invoices), >> >> quo AS (SELECT item_id, >> SUM(units) AS units >> FROM quotes) >> >> SELECT items.item_id, >> CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS >> units_invoices, >> CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS >> units_quotes >> >> FROM items >> LEFT JOIN inv ON inv.item_id = items.item_id >> LEFT JOIN quo ON quo.item_id = items.item_id >> >> Well this is oversimplified because i use much more tables and filter based on >> dates, but you get the idea, it seems that this type of query should use >> subqueries, no? > > Think about a query like this: > with foo as > ( > select id, volatile_func() from bar > ) > select * from baz join foo using (id) join bla using(id) limit 10; > > How many times does volatile_func() get called? How many times in the > JOIN version? The answers are different... > > One of the key features of CTEs is controlling how/when query > operations occur so you can do things like control side effects and > force query plans that the server would not otherwise choose (although > this is typically an unoptimization). > > merlin Ok, i think i understand now the difference, thanks Merlin. Regards, Miguel Ángel.