Re: [patch] libpq one-row-at-a-time API - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: [patch] libpq one-row-at-a-time API
Msg-id CAHyXU0w5G25FshZtHe4DS1QY3GCUWW6mG-SBebmq3scV2CgyAA@mail.gmail.com
In response to Re: [patch] libpq one-row-at-a-time API  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [patch] libpq one-row-at-a-time API  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Tue, Jul 24, 2012 at 11:57 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> But, the faster rowbuf method is a generally incompatible way of
>> dealing with data vs current libpq -- this is bad.  If it's truly
>> impossible to get those benefits without bypassing result API that
>> then I remove my objection on the grounds it's optional behavior (my
>> gut tells me it is possible though).
>
> Um, please clarify what are you talking about here?
>
> What is the incompatibility of PGresult from branch 1?

Incompatibility in terms of usage -- we should be getting data with
PQgetvalue.  I think you're suspecting that I incorrectly believe
you're forced to use the rowbuf API -- I don't (although I wasn't
clear on that earlier).  Basically, I'm saying we should only buy into
the rowbuf approach if all other routes to the faster performance have
been exhausted.

On Tue, Jul 24, 2012 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> I think the dummy copy of PGresult is plausible (if by that you mean
>> optimizing PQgetResult when in single row mode).  That would be even
>> better: you'd remove the need for the rowbuf mode.
>
> I haven't spent any time looking at this, but my gut tells me that a big
> chunk of the expense is copying the PGresult's metadata (the column
> names, types, etc).  It has to be possible to make that cheaper.
>
> One idea is to rearrange the internal storage so that that part reduces
> to one memcpy().  Another thought is to allow PGresults to share
> metadata by treating the metadata as a separate reference-counted
> object.  The second could be a bit hazardous though, as we advertise
> that PGresults are independent objects that can be manipulated by
> separate threads.  I don't want to introduce mutexes into PGresults,
> but I'm not sure reference-counted metadata can be safe without them.
> So maybe the memcpy idea is the only workable one.

Yeah -- we had a very similar problem in libpqtypes and solved it
exactly as you're thinking.  libpqtypes potentially has to create a
result with each row iteration (we expose rows and composites as
on-the-fly created result objects) and stores some extra non-trivial
data with the result.  We solved it with the optimized-memcpy method
(look here: http://libpqtypes.esilo.com/browse_source.html?file=libpqtypes.h
and you'll see that all the important structs, like PGtypeHandler, are
somewhat haphazardly designed to be run through a memcpy).  We
couldn't do anything about internal libpq issues, but some
micro-optimization of PQsetResultAttrs (which is called via
PQcopyResult) might fit the bill.

The 'source' result (or the source data that would be copied into the
destination result) would be stored in the PGconn, right? So the idea
is that when you set up single-row mode, the connection generates a
template PGresult which is then copied out repeatedly during
row-by-row processing.  I like it, but only if we're reasonably
confident PGresult creation can be sufficiently optimized that way.

merlin

