Re: [PERFORM] Big IN() clauses etc : feature proposal - Mailing list pgsql-hackers

From PFC
Subject Re: [PERFORM] Big IN() clauses etc : feature proposal
Date
Msg-id op.s9cayhotcigqcu@apollo13
Whole thread Raw
In response to Re: [PERFORM] Big IN() clauses etc : feature proposal  (Markus Schaber <schabi@logix-tt.com>)
Responses Re: [PERFORM] Big IN() clauses etc : feature proposal
Re: [PERFORM] Big IN() clauses etc : feature proposal
List pgsql-hackers
>>     The problem is that you need a set-returning function to retrieve
>> the  values. SRFs don't have rowcount estimates, so the plans suck.
>
> What about adding some way of rowcount estimation to SRFs, in the way of:
>
> CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
> $$ ... function code ... $$ LANGUAGE plpgsql
> ROWCOUNT_ESTIMATOR $$ ... estimation code ... $$ ;
>
> Internally, this could create two functions, foo (para, meters) and
> estimate_foo(para, meters) that are the same language and coupled
> together (just like a SERIAL column and its sequence). The estimator
> functions have an implicit return parameter of int8. Parameters may be
> NULL when they are not known at query planning time.
>
> What do you think about this idea?

    It would be very useful.
    A few thoughts...

    You need to do some processing to know how many rows the function would
return.
    Often, this processing will be repeated in the function itself.
    Sometimes it's very simple (ie. the function will RETURN NEXT each
element in an array, you know the array length...)
    Sometimes, for functions returning few rows, it might be faster to
compute the entire result set in the cost estimator.

    So, it might be a bit hairy to find a good compromise.

    Ideas on how to do this (clueless hand-waving mode) :

    1- Add new attributes to set-returning functions ; basically a list of
functions, each returning an estimation parameter (rowcount, cpu tuple
cost, etc).
    This is just like you said.

    2- Add an "estimator", to a function, which would just be another
function, returning one row, a record, containing the estimations in
several columns (rowcount, cpu tuple cost, etc).
    Pros : only one function call to estimate, easier and faster, the
estimator just leaves the unknown columns to NULL.
    The estimator needs not be in the same language as the function itself.
It's just another function.

    3- The estimator could be a set-returning function itself which would
return rows mimicking pg_statistics
    Pros : planner-friendly, the planner would SELECT from the SRF instead of
looking in pg_statistics, and the estimator could tell the planner that,
for instance, the function will return unique values.
    Cons : complex, maybe slow

    4- Add simple flags to a function, like :
    - returns unique values
    - returns sorted values (no need to sort my results)
    - please execute me and store my results in a temporary storage, count
the rows returned, and plan the outer query accordingly
    - etc.


pgsql-hackers by date:

Previous
From: Markus Schaber
Date:
Subject: Re: [PERFORM] Big IN() clauses etc : feature proposal
Next
From: Martijn van Oosterhout
Date:
Subject: Re: [PERFORM] Big IN() clauses etc : feature proposal