Re: Faster methods for getting SPI results (460%improvement) - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Faster methods for getting SPI results (460%improvement)
Date
Msg-id a96b9506-5587-7290-9eed-8830ce7f318e@openscg.com
Whole thread Raw
In response to Re: Faster methods for getting SPI results (460% improvement)  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Faster methods for getting SPI results (460%improvement)  (Jim Nasby <jim.nasby@openscg.com>)
List pgsql-hackers
On 4/4/17 7:44 PM, Craig Ringer wrote:
> On 5 April 2017 at 08:23, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 5 April 2017 at 08:00, Craig Ringer <craig@2ndquadrant.com> wrote:
> This patch fails to update the documentation at all.
>
> https://www.postgresql.org/docs/devel/static/spi.html

I'll fix that soon.

> missing newline

Fixed.

> +/* Execute a previously prepared plan with a callback Destination */
>
>
> caps "Destination"

Hmm, I capitalized it since DestReceiver is capitalized. I guess it's 
best to just drop Destination.

> +                // XXX throw error if callback is set

Fixed (opted to just use an Assert).

> +static DestReceiver spi_callbackDR = {
> +    donothingReceive, donothingStartup, donothingCleanup, donothingCleanup,
> +    DestSPICallback
> +};
> Presumably that's a default destination you're then supposed to modify
> with your own callbacks? There aren't any comments to indicate what's
> going on here.

Correct. Added the following comment:

> /*
>  * This is strictly a starting point for creating a callback. It should not
>  * actually be used.
>  */



> Comments on patch 2:
>
> If this is the "bare minimum" patch, what is pending? Does it impose
> any downsides or limits?

No limits. I'm not aware of any downsides.

It's "bare minimum" because a follow-on step is to allow different 
methods of returning results. In particular, my testing indicates that 
patch 1 + returning a dict of lists (as opposed to the current list of 
dicts) results in a 460% improvement vs ~30% with patch 2.

> +/* Get switch execution contexts */
> +extern PLyExecutionContext
> *PLy_switch_execution_context(PLyExecutionContext *new);
>
> Comment makes no sense to me. This seems something like memory context
> switch, where you supply the new and return the old. But the comment
> doesn't explain it at all.

Comment changed to
/* Switch execution context (similar to MemoryContextSwitchTo */

> +void PLy_CSStartup(DestReceiver *self, int operation, TupleDesc typeinfo);
> +void PLy_CSDestroy(DestReceiver *self);
>
> These are declared in the plpy_spi.c. Why aren't these static?

Derp. Fixed.

> +    volatile MemoryContext oldcontext;
> +    volatile ResourceOwner oldowner;
>      int         rv;
> -    volatile MemoryContext oldcontext;
> -    volatile ResourceOwner oldowner;
>
> Unnecessary code movement.

IMHO it reads better that way. I've left it for now so $COMMITTER can 
decide, but if it really bugs you let me know and I'll swap it.

> In PLy_Callback_New, I think your use of a sub-context is sensible. Is
> it necessary to palloc0 though?

Hrm, maybe not... but it seems like cheap insurance considering the 
amount of other stuff involved in just starting a new SPI call. And 
honestly, I'd rather not mess with it at this point. :) I have added an 
XXX comment though.

> +static CallbackState *
> +PLy_Callback_Free(CallbackState *callback)
>
> The code here looks like it could be part of spi.c's callback support,
> rather than plpy_spi specific, since you provide a destroy callback in
> the SPI callback struct.

I added this for use in PG_CATCH() blocks, because it's not clear to me 
that the portal gets cleaned up in those cases. It's certainly possible 
that it's pointless.

FWIW, I'm pretty sure I copied that pattern from somewhere else... 
probably plpgsql or pltcl.

> +    /* We need to store this because the TupleDesc the receive
> function gets has no names. */
> +    myState->desc = typeinfo;
>
> Is it safe to just store the pointer to the TupleDesc here? What's the lifetime?

Only Portal lifetime.

> +     * will clean it up when the time is right. XXX This might result in a leak
> +     * if an error happens and the result doesn't get dereferenced.
> +     */
> +    MemoryContextSwitchTo(TopMemoryContext);
> +    result->tupdesc = CreateTupleDescCopy(typeinfo);
>
> especially given this XXX comment...

I've changed the comment to the following. Hopefully this clarifies things:

>     /*
>      * Save tuple descriptor for later use by result set metadata
>      * functions.  Save it in TopMemoryContext so that it survives outside of
>      * an SPI context.  We trust that PLy_result_dealloc() will clean it up
>      * when the time is right. The difference between result and everything
>      * else is that result needs to survive after the portal is destroyed,
>      * because result is what's handed back to the plpython function. While
>      * it's tempting to use something other than TopMemoryContext, that won't
>      * work: the user could potentially put result into the global dictionary,
>      * which means it could survive as long as the session does.  This might
>      * result in a leak if an error happens and the result doesn't get
>      * dereferenced, but if that happens it means the python GC has failed us,
>      * at which point we probably have bigger problems.
>      *
>      * This still isn't perfect though; if something the result tupledesc
>      * references has it's OID changed then the tupledesc will be invalid. I'm
>      * not sure it's worth worrying about that though.
>      */

Updated patches attached, but I still need to update the docs.
-- 
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG                 http://OpenSCG.com

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Interval for launching the table sync worker
Next
From: Craig Ringer
Date:
Subject: Re: Faster methods for getting SPI results (460% improvement)