Re: CTE vs Subquery - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: CTE vs Subquery
Date
Msg-id CAHyXU0z7qj=6G-B6aJ+-D0+kQtV7rBg3-J5=iFuhQEg_0WmoEA@mail.gmail.com
Whole thread Raw
In response to Re: CTE vs Subquery  (Linos <info@linos.es>)
Responses Re: CTE vs Subquery  (Linos <info@linos.es>)
List pgsql-performance
On Wed, Oct 26, 2011 at 4:00 AM, Linos <info@linos.es> wrote:
> El 25/10/11 19:11, Merlin Moncure escribió:
>> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info@linos.es> wrote:
>>> El 25/10/11 18:43, Tom Lane escribió:
>>>> Linos <info@linos.es> writes:
>>>>>     i am having any problems with performance of queries that uses CTE, can the
>>>>> join on a CTE use the index of the original table?
>>>>
>>>> CTEs act as optimization fences.  This is a feature, not a bug.  Use
>>>> them when you want to isolate the evaluation of a subquery.
>>>>
>>>>                       regards, tom lane
>>>>
>>>
>>> The truth it is that complex queries seems more readable using them (maybe a
>>> personal preference no doubt).
>>>
>>> Do have other popular databases the same behavior? SQL Server or Oracle for example?
>>
>> In my experience, SQL server also materializes them -- basically CTE
>> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to
>> foo.  If you want join behavior, use a join (by the way IIRC SQL
>> Server is a lot more restrictive about placement of ORDER BY).
>>
>> I like CTE current behavior -- the main place I find it awkward is in
>> use of recursive queries because the CTE fence forces me to abstract
>> the recursion behind a function, not a view since pushing the view
>> qual down into the CTE is pretty horrible:
>>
>> postgres=# explain select foo.id, (with bar as (select id from foo f
>> where f.id = foo.id) select * from bar) from foo where foo.id = 11;
>>                                      QUERY PLAN
>> -------------------------------------------------------------------------------------
>>  Index Scan using foo_idx on foo  (cost=0.00..16.57 rows=1 width=4)
>>    Index Cond: (id = 11)
>>    SubPlan 2
>>      ->  CTE Scan on bar  (cost=8.28..8.30 rows=1 width=4)
>>            CTE bar
>>              ->  Index Scan using foo_idx on foo f  (cost=0.00..8.28
>> rows=1 width=4)
>>                    Index Cond: (id = $0)
>> (7 rows)
>>
>> whereas for function you can inject your qual inside the CTE pretty
>> easily.  this is a different problem than the one you're describing
>> though.  for the most part, CTE execution fence is a very good thing,
>> since it enforces restrictions that other features can leverage, for
>> example 'data modifying with' queries (by far my all time favorite
>> postgres enhancement).
>>
>> merlin
>>
>
> ok, i get the idea, but i still don't understand what Tom says about isolate
> evaluation, apart from the performance and the readability, if i am not using
> writable CTE or recursive CTE, what it is the difference in evaluation (about
> being isolate) of a subquery vs CTE with the same text inside.
>
> I have been using this form lately:
>
> WITH inv (SELECT item_id,
>                               SUM(units) AS units
>                 FROM invoices),
>
> quo AS (SELECT item_id,
>                             SUM(units) AS units
>              FROM quotes)
>
> SELECT items.item_id,
>              CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS
> units_invoices,
>              CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS
> units_quotes
>
> FROM items
>    LEFT JOIN inv ON inv.item_id = items.item_id
>    LEFT JOIN quo ON quo.item_id = items.item_id
>
> Well this is oversimplified because i use much more tables and filter based on
> dates, but you get the idea, it seems that this type of query should use
> subqueries, no?

Think about a query like this:
with foo as
(
  select id, volatile_func() from bar
)
select * from baz join foo using (id) join bla using(id) limit 10;

How many times does volatile_func() get called?  How many times in the
JOIN version?  The answers are different...

One of the key features of CTEs is controlling how/when query
operations occur so you can do things like control side effects and
force query plans that the server would not otherwise choose (although
this is typically an unoptimization).

merlin

pgsql-performance by date:

Previous
From: "Jens Reufsteck"
Date:
Subject: Anti join miscalculates row number?
Next
From: "Cezariusz Marek"
Date:
Subject: Slow cursor