Re: Faster methods for getting SPI results (460% improvement) - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Faster methods for getting SPI results (460% improvement)
Date
Msg-id CAMsr+YHP2D44FG+oAzuUqZUEiJ=nkywo5XSmk4dptmoZK6Um9g@mail.gmail.com
Whole thread Raw
In response to Re: Faster methods for getting SPI results (460%improvement)  (Jim Nasby <jim.nasby@openscg.com>)
List pgsql-hackers
On 6 April 2017 at 11:50, Jim Nasby <jim.nasby@openscg.com> wrote:

> Attached is a complete series of patches that includes the docs patch.

+   <function>SPI_execute_callback</function> is the same as
+   <function>SPI_execute</function>, except that instead of returning results
+   via <structname>SPITupleTable</structname>, the user-supplied
<parameter>callback</parameter>
+   is used. Unlike
+   <function>SPI_execute</function>,
+   <function>SPI_execute_callback</function>
+   will run the callback for every SQL command passed in to
<parameter>command</parameter>.

This doesn't explain why the user should care or prefer this approach.
Maybe after "Unlike":

"<function>SPI_execute_callback</> does not need to accumulate all the
query results into memory before the caller can process them. Instead
of building a <literal>SPI_tuptable</> containing all the results, the
supplied callback is invoked for each row processed by SPI. The row
data is passed to the callback then discarded when the callback
returns. This reduces copying and allows the application to process
results sooner."

The docs do not discuss memory lifetimes. What memory context is each
call invoked in and when is it reset? Something like:

"<literal>rStartup</>, <literal>receiveSlot</> and
<literal>rShutdown</> are all called in a memory context that is reset
for each <function>SPI_execute_callback</>."

Also, what rules apply in terms of what you can/cannot do from within
a callback? Presumably it's unsafe to perform additional SPI calls,
perform transactions, call into the executor, change the current
snapshot, etc, but I would consider that reasonably obvious. Are there
any specific things to avoid?

Under what circumstances is  the query execution context [...] freed"
such that the destroy callback is called?

It is not clear to me when reading the document that the user of
SPI_execute_callback is expected to define their own
DestReceiver-compatible struct, and that the private fields that "may
appear beyond this point" are those defined by the application. How
does the app know that its DestReceiver is compatible with the result
of CreateDestReceiver(DestSPICallback)? Should it copy the fields?
etc. Your PLPython example incorporates DestReceiver by-value in its
own state struct; maybe your docs program listing should illustrate
that instead, show how to initialize the destreceiver member, and
separately list the callbacks in destreceiver in an <itemisedlist> ?


It definitely needs to mention the relevant parts of plpython to look
at for an example of usage.

> Right now, the docs don't include a concrete example, because adding one
> would be a pretty large if it demonstrated real usage, which presumably
> means Yet Another Contrib Module strictly for the purpose of demonstrating
> something. Rather than doing that, ISTM it'd be better to point the user at
> what plpythonu is doing.

Seems fine to me.

Notes on the docs aside, I am pretty happy with this and think it's
reasonable to proceed with it for Pg 10.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Implementation of SASLprep for SCRAM-SHA-256
Next
From: Craig Ringer
Date:
Subject: Re: Faster methods for getting SPI results (460% improvement)