Speed dblink using alternate libpq tuple storage - Mailing list pgsql-hackers

From Greg Smith
Subject Speed dblink using alternate libpq tuple storage
Date
Msg-id 4F110AD3.2090607@2ndQuadrant.com
Whole thread Raw
In response to Re: Allow substitute allocators for PGresult.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp>)
Responses Re: Speed dblink using alternate libpq tuple storage  (Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp>)
List pgsql-hackers
One patch that fell off the truck during a turn in the November
CommitFest was "Allow substitute allocators for PGresult".  Re-reading
the whole thing again, it actually turned into a rather different
submission in the middle, and I know I didn't follow that shift
correctly then.  I'm replying to its thread but have changed the subject
to reflect that change.  From a procedural point of view, I don't feel
right kicking this back to its author on a Friday night when the
deadline for resubmitting it would be Sunday.  Instead I've refreshed
the patch myself and am adding it to the January CommitFest.  The new
patch is a single file; it's easy enough to split out the dblink changes
if someone wants to work with the pieces separately.

After my meta-review I think we should get another reviewer familiar
with using dblink to look at this next.  This is fundamentally a
performance patch now.  Some results and benchmarking code were
submitted along with it; the other issues are moot if those aren't
reproducible.  The secondary goal for a new review here is to provide
another opinion on the naming issues and abstraction concerns raised so far.

To clear out the original line of thinking, this is not a replacement
low-level storage allocator anymore.  The idea of using such a mechanism
to help catch memory leaks has also been dropped.

Instead this adds and documents a new path for libpq callers to more
directly receive tuples, for both improved speed and lower memory
usage.  dblink has been modified to use this new mechanism.
Benchmarking by the author suggests no significant change in libpq speed
when only that change was made, while the modified dblink using the new
mechanism was significantly faster.  It jumped from 332K tuples/sec to
450K, a 35% gain, and had a lower memory footprint too.  Test
methodology and those results are at
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00008.php

Robert Haas did a quick code review of this already, it along with
author response mixed in are at
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01149.php  I see
two areas of contention there:

-There are several naming bits no one is happy with yet.  Robert didn't
like some of them, but neither did Kyotaro.  I don't have an opinion
myself.  Is it the case that some changes to the existing code's
terminology are what's actually needed to make this all better?  Or is
this just fundamentally warty and there's nothing to be done about it.
Dunno.

-There is an abstraction wrapper vs. coding convenience trade-off
centering around PGresAttValue.  It sounded to me like it raised always
fun questions like "where's the right place for the line between
lipq-fe.h and libpq-int.h to be?"

dblink is pretty popular, and this is a big performance win for it.  If
naming and API boundary issues are the worst problems here, this sounds
like something well worth pursuing as part of 9.2's still advancing
performance theme.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


Attachment

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Disabled features on Hot Standby
Next
From: Robert Haas
Date:
Subject: Re: Disabled features on Hot Standby