Re: CTE vs Subquery - Mailing list pgsql-performance

From Linos
Subject Re: CTE vs Subquery
Date
Msg-id 4EA80348.3030203@linos.es
Whole thread Raw
In response to Re: CTE vs Subquery  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
El 26/10/11 14:23, Merlin Moncure escribió:
> On Wed, Oct 26, 2011 at 4:00 AM, Linos <info@linos.es> wrote:
>> El 25/10/11 19:11, Merlin Moncure escribió:
>>> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info@linos.es> wrote:
>>>> El 25/10/11 18:43, Tom Lane escribió:
>>>>> Linos <info@linos.es> writes:
>>>>>>     i am having any problems with performance of queries that uses CTE, can the
>>>>>> join on a CTE use the index of the original table?
>>>>>
>>>>> CTEs act as optimization fences.  This is a feature, not a bug.  Use
>>>>> them when you want to isolate the evaluation of a subquery.
>>>>>
>>>>>                       regards, tom lane
>>>>>
>>>>
>>>> The truth it is that complex queries seems more readable using them (maybe a
>>>> personal preference no doubt).
>>>>
>>>> Do have other popular databases the same behavior? SQL Server or Oracle for example?
>>>
>>> In my experience, SQL server also materializes them -- basically CTE
>>> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to
>>> foo.  If you want join behavior, use a join (by the way IIRC SQL
>>> Server is a lot more restrictive about placement of ORDER BY).
>>>
>>> I like CTE current behavior -- the main place I find it awkward is in
>>> use of recursive queries because the CTE fence forces me to abstract
>>> the recursion behind a function, not a view since pushing the view
>>> qual down into the CTE is pretty horrible:
>>>
>>> postgres=# explain select foo.id, (with bar as (select id from foo f
>>> where f.id = foo.id) select * from bar) from foo where foo.id = 11;
>>>                                      QUERY PLAN
>>> -------------------------------------------------------------------------------------
>>>  Index Scan using foo_idx on foo  (cost=0.00..16.57 rows=1 width=4)
>>>    Index Cond: (id = 11)
>>>    SubPlan 2
>>>      ->  CTE Scan on bar  (cost=8.28..8.30 rows=1 width=4)
>>>            CTE bar
>>>              ->  Index Scan using foo_idx on foo f  (cost=0.00..8.28
>>> rows=1 width=4)
>>>                    Index Cond: (id = $0)
>>> (7 rows)
>>>
>>> whereas for function you can inject your qual inside the CTE pretty
>>> easily.  this is a different problem than the one you're describing
>>> though.  for the most part, CTE execution fence is a very good thing,
>>> since it enforces restrictions that other features can leverage, for
>>> example 'data modifying with' queries (by far my all time favorite
>>> postgres enhancement).
>>>
>>> merlin
>>>
>>
>> ok, i get the idea, but i still don't understand what Tom says about isolate
>> evaluation, apart from the performance and the readability, if i am not using
>> writable CTE or recursive CTE, what it is the difference in evaluation (about
>> being isolate) of a subquery vs CTE with the same text inside.
>>
>> I have been using this form lately:
>>
>> WITH inv (SELECT item_id,
>>                               SUM(units) AS units
>>                 FROM invoices),
>>
>> quo AS (SELECT item_id,
>>                             SUM(units) AS units
>>              FROM quotes)
>>
>> SELECT items.item_id,
>>              CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS
>> units_invoices,
>>              CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS
>> units_quotes
>>
>> FROM items
>>    LEFT JOIN inv ON inv.item_id = items.item_id
>>    LEFT JOIN quo ON quo.item_id = items.item_id
>>
>> Well this is oversimplified because i use much more tables and filter based on
>> dates, but you get the idea, it seems that this type of query should use
>> subqueries, no?
>
> Think about a query like this:
> with foo as
> (
>   select id, volatile_func() from bar
> )
> select * from baz join foo using (id) join bla using(id) limit 10;
>
> How many times does volatile_func() get called?  How many times in the
> JOIN version?  The answers are different...
>
> One of the key features of CTEs is controlling how/when query
> operations occur so you can do things like control side effects and
> force query plans that the server would not otherwise choose (although
> this is typically an unoptimization).
>
> merlin

Ok, i think i understand now the difference, thanks Merlin.

Regards,
Miguel Ángel.

pgsql-performance by date:

Previous
From: Gregg Jaskiewicz
Date:
Subject: Re: Slow cursor
Next
From: "Cezariusz Marek"
Date:
Subject: Re: Slow cursor