Re: [patch] libpq one-row-at-a-time API - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: [patch] libpq one-row-at-a-time API
Msg-id CAHyXU0w5G25FshZtHe4DS1QY3GCUWW6mG-SBebmq3scV2CgyAA@mail.gmail.com
In response to Re: [patch] libpq one-row-at-a-time API  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [patch] libpq one-row-at-a-time API  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Tue, Jul 24, 2012 at 11:57 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> But, the faster rowbuf method is a generally incompatible way of
>> dealing with data vs current libpq -- this is bad.  If it's truly
>> impossible to get those benefits without bypassing result API that
>> then I remove my objection on the grounds it's optional behavior (my
>> gut tells me it is possible though).
>
> Um, please clarify what are you talking about here?
>
> What is the incompatibility of PGresult from branch 1?

Incompatibility in terms of usage -- we should be getting data with
PQgetvalue.  I think you're suspecting that I incorrectly believe
you're forced to use the rowbuf API -- I don't (although I wasn't
clear on that earlier).  Basically, I'm saying we should only buy into
the rowbuf approach if all other routes to the faster performance have
been exhausted.

On Tue, Jul 24, 2012 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> I think the dummy copy of PGresult is plausible (if by that you mean
>> optimizing PQgetResult when in single row mode).  That would be even
>> better: you'd remove the need for the rowbuf mode.
>
> I haven't spent any time looking at this, but my gut tells me that a big
> chunk of the expense is copying the PGresult's metadata (the column
> names, types, etc).  It has to be possible to make that cheaper.
>
> One idea is to rearrange the internal storage so that that part reduces
> to one memcpy().  Another thought is to allow PGresults to share
> metadata by treating the metadata as a separate reference-counted
> object.  The second could be a bit hazardous though, as we advertise
> that PGresults are independent objects that can be manipulated by
> separate threads.  I don't want to introduce mutexes into PGresults,
> but I'm not sure reference-counted metadata can be safe without them.
> So maybe the memcpy idea is the only workable one.

Yeah -- we had a very similar problem in libpqtypes and solved it
exactly as you're thinking.  libpqtypes potentially has to create a
result with each row iteration (we expose rows and composites as
on-the-fly created result objects) and stores some extra non-trivial
data with the result.  We solved it with the optimized-memcpy method
(look here: http://libpqtypes.esilo.com/browse_source.html?file=libpqtypes.h
and you'll see that all the important structs, like PGtypeHandler, are
somewhat haphazardly designed to be run through a memcpy).  We
couldn't do anything about internal libpq issues, but some
micro-optimization of PQsetResultAttrs (which is called via
PQcopyResult) might fit the bill.

The 'source' result (or the source data that would be copied into the
destination result) would be stored in the PGconn, right? So the idea
is that when you set up single-row mode, the connection generates a
template PGresult which is then copied out repeatedly during
row-by-row processing.  I like it, but only if we're reasonably
confident PGresult creation can be sufficiently optimized that way.

merlin

