Re: Speed dblink using alternate libpq tuple storage - Mailing list pgsql-hackers

From Marko Kreen
Subject Re: Speed dblink using alternate libpq tuple storage
Date
Msg-id 20120127154214.GB4107@gmail.com
Whole thread Raw
In response to Re: Speed dblink using alternate libpq tuple storage  (Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp>)
List pgsql-hackers
On Fri, Jan 27, 2012 at 05:57:01PM +0900, Kyotaro HORIGUCHI wrote:
> Hello, This is a new version of the patch formerly known as
> 'alternative storage for libpq'.
> 
> - Changed the concept to 'Alternative Row Processor' from
>   'Storage handler'. Symbol names are also changed.
> 
> - Callback function is modified following to the comment.
> 
> - From the restriction of time, I did minimum check for this
>   patch. The purpose of this patch is to show the new implement.
> 
> - Proformance is not measured for this patch for the same
>   reason. I will do that on next monday.
> 
> - The meaning of PGresAttValue is changed. The field 'value' now
>   contains a value withOUT terminating zero. This change seems to
>   have no effect on any other portion within the whole source
>   tree of postgresql from what I've seen.


I think we have general structure in place.  Good.

Minor notes:

= rowhandler api =

* It returns bool, so void* is wrong.  Instead libpq style is to use int, with 1=OK, 0=Failure.  Seems that was also
oldpqAddTuple() convention.
 

* Drop PQgetRowProcessorParam(), instead give param as argument.

* PQsetRowProcessorErrMes() should strdup() the message.  That gets rid of allocator requirements in API.  This also
makessafe to pass static strings there.  If strdup() fails, fall back to generic no-mem message.
 

* Create new struct to replace PGresAttValue for rowhandler usage. RowHandler API is pretty unique and self-contained.
Itshould have it's own struct.  Main reason is that it allows to properly document it. Otherwise the minor details get
lostas they are different from libpq-internal usage.  Also this allows two structs to be improved separately.
(PGresRawValue?)

* Stop storing null_value into ->value.  It's libpq internal detail. Instead the ->value should always point into
bufferwhere the value info is located, even for NULL.  This makes safe to simply subtract pointers to get row size
estimate.Seems pqAddTuple() already does null_value logic, so no need to do it in rowhandler api.
 

= libpq =

Currently its confusing whether rowProcessor can be NULL, and what
should be done if so.  I think its better to fix usage so that
it is always set.

* PQregisterRowProcessor() should use default func if func==NULL. and set default handler if so.
* Never set rowProcessor directly, always via PQregisterRowProcessor()
* Drop all if(rowProcessor) checks.

= dblink =

* There are malloc failure checks missing in initStoreInfo() & storeHandler().


-- 
marko


PS.  You did not hear it from me, but most raw values are actually
nul-terminated in protocol.  Think big-endian.  And those which
are not, you can make so, as the data is not touched anymore.
You cannot do it for last value, as next byte may not be allocated.
But you could memmove() it lower address so you can null-terminate.

I'm not suggesting it for official patch, but it would be fun to know
if such hack is benchmarkable, and benchmarkable on realistic load.



pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Speed dblink using alternate libpq tuple storage
Next
From: "MauMau"
Date:
Subject: Unreliable "pg_ctl -w start" again