Re: getting the ranks out of items with SHARED - Mailing list pgsql-general

From Tom Lane
Subject Re: getting the ranks out of items with SHARED
Date
Msg-id 20658.1121360033@sss.pgh.pa.us
Whole thread Raw
In response to getting the ranks out of items with SHARED  (Janning Vygen <vygen@gmx.de>)
List pgsql-general
Janning Vygen <vygen@planwerk6.de> writes:
> and this is the combined statement:

>    UPDATE temp_gc
>     SET gc_rank = ranking.rank
>     FROM (

>     SELECT
>       *,
>     ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
>     FROM (
>         SELECT
>           mg_name,
>           gc_gesamtpunkte,
>           gc_gesamtsiege
>         FROM temp_gc
>         ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
>     ) AS r1
>     ) AS ranking
>     WHERE temp_gc.mg_name = ranking.mg_name;


> to me it looks like the call to ranking() is moved from the subquery plan
> "upwards". but i really can't interpret this explain output regarding to
> "where" the ranking funcion is called.

Yeah, I think you are exactly right.  The EXPLAIN output doesn't show
targetlist contents (EXPLAIN VERBOSE would, but not very readably :-(),
but the small difference in the estimated costs seems to indicate that
the function evaluation is not happening at the level of the "subquery
scan" plan node in the combined plan.  This is consistent with the
planner's behavior in general.  It will honor the "order by" in the
sense that the output of the "r1" subquery is delivered to the upper
plan level in that order, but it feels no compunction about flattening
the "ranking" subquery into the outer query, and after that you cannot
be certain about the order in which the executions of the ranking()
function happen.

What you need is to prevent the flattening of the intermediate subquery.
The current favorite technique is to insert a dummy OFFSET clause:

 UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (

    SELECT
      *,
    ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
    FROM (
        SELECT
          mg_name,
          gc_gesamtpunkte,
          gc_gesamtsiege
        FROM temp_gc
        ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
    ) AS r1
    OFFSET 0
    ) AS ranking
    WHERE temp_gc.mg_name = ranking.mg_name;

although you could also do it by inserting an ORDER BY at that level.

BTW, there's been some discussion of preventing flattening of subqueries
whose SELECT lists contain volatile functions.  If we did that then
declaring ranking() as volatile would be enough to avoid the problem.
I've been hesitant to make the change because I'm concerned about the
probable loss of optimization in cases where the function is labeled
volatile by default, merely because the author didn't think about how
to mark it.  And in any case "volatile" doesn't really describe the
issue with your function...

> my problem is that getting a rank out of items is very expensive with
> aggregate functions,

I'm unconvinced --- I don't see a reason that you can't implement it
exactly the same way as an aggregate.  The only difference is that the
state value is an aggregate state value instead of a global variable.
There might be some extra palloc overhead, but nothing worse.

            regards, tom lane

pgsql-general by date:

Previous
From: "Thomas F. O'Connell"
Date:
Subject: Re: ERROR: could not open relation
Next
From: Tom Lane
Date:
Subject: Re: strange error with temp table: pg_type_typname_nsp_index