Re: Performance issues - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Performance issues
Date
Msg-id 55071FD8.6050106@2ndquadrant.com
Whole thread Raw
In response to Re: Performance issues  (Marc Mamin <M.Mamin@intershop.de>)
Responses Re: Performance issues
List pgsql-performance
On 16.3.2015 18:49, Marc Mamin wrote:
>
>> Hi Team,
>>
>> This is the EXPLAIN ANALYZE for one of the view : S_V_D_CAMPAIGN_HIERARCHY:

FWIW, this is a somewhat more readable version of the plan:

    http://explain.depesz.com/s/nbB

In the future, please do two things:

(1) Attach the plan as a text file, because the mail clients tend to
    screw things up (wrapping long lines). Unless the plan is trivial,
    of course - but pgsql-performance usually deals with complex stuff.

(2) Put the plan on explain.depesz.com helps too, because it's
    considerably more readable (but always do 1, because resorces
    placed somewhere else tends to disappear, and the posts then make
    very little sense, which is bad when searching in the archives)

(3) Same for stuff pasted somewhere else - always attach it to the
    message. For example I'd like to give you more accurate advice, but
    I can't as http://pgsql.privatepaste.com/41207bea45 is unavailable.

>
>
>     >Rows Removed by Join Filter: 3577676116
>
>     That's quite a lot.
>     You're possibly missing a clause in a join, resulting in a cross join.
>     It is also helpful to put your result here:
>     http://explain.depesz.com/
>     regards,

IMHO this is merely a consequence of using the CTE, which produces 52997
rows and is scanned 67508x as the inner relation of a nested loop. That
gives you 3577721476 tuples in total, and only 45360 are kept (hence
3577676116 are removed).

This is a prime example of why CTEs are not just aliases for subqueries,
but may actually cause serious trouble.

There are other issues (e.g. the row count estimate of the CTE is
seriously off, most likely because of the HashAggregate in the outer
branch), but that's a secondary issue IMHO.

Vivekanand, try this (in the order of intrusiveness):

(1) Get rid of the CTE, and just replace it with subselect in the FROM
    part of the query, so instead of this:

    WITH valid_executions AS (...)
    SELECT ... FROM ... JOIN valid_executions ON (...)

    you'll have something like this:

    SELECT ... FROM ... JOIN (...) AS valid_executions ON (...)

    This way the subselect will optimized properly.


(2) Replace the CTE with a materialized view, or a temporary table.
    This has both advantages and disadvantages - the main advantage is
    that you can create indexes, collect statistics. Disadvantage is
    you have to refresh the MV, fill temporary table etc.

I expect (1) to improve the performance significantly, and (2) might
improve it even further by fixing the misestimates.


regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-performance by date:

Previous
From: Marc Mamin
Date:
Subject: Re: Performance issues
Next
From: Tom Lane
Date:
Subject: Re: Bad cost estimate with FALSE filter condition