Thread: Re: Speed dblink using alternate libpq tuple storage

Re: Speed dblink using alternate libpq tuple storage

From

horiguchi.kyotaro@oss.ntt.co.jp

Date:

30 January 2012, 07:42:41

I'm sorry.

> Thank you for comments, this is revised version of the patch.

The malloc error handling in dblink.c of the patch is broken. It
leaks memory and breaks the control flow.

I'll re-send the properly fixed patch for dblink.c later.

# This severe back pain should have made me stupid :-p

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

30 January 2012, 23:00:13

This is the fixed version of dblink.c for the row processor.

> i'll re-send the properly fixed patch for dblink.c later.

- malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error)

- storeHandler() now returns FALSE on malloc failure. Garbage
  cleanup is done in dblink_fetch() or dblink_record_internal().
  As before, dblink reports this error to the user session as
  'unknown error/could not execute query'.
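
The division of labour described above (the handler only reports failure, the caller cleans up) can be sketched in plain C. This is a minimal illustration, not the dblink code: `row_state`, `row_handler` and `finish_row_state` are hypothetical names, and the real storeHandler() works on PGrowValue columns rather than a single string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-query state, loosely modeled on storeInfo in the patch. */
typedef struct row_state
{
    char   *buf;            /* scratch buffer, released by the caller */
    size_t  buflen;         /* allocated size of buf */
    int     error_occurred; /* set by the handler, inspected by the caller */
} row_state;

/*
 * Hypothetical row callback: copies one value into the scratch buffer and
 * zero-terminates it.  Returns 1 on success and 0 on failure.  On failure
 * it only records the error; it does not try to release other resources.
 */
static int
row_handler(row_state *st, const char *value, size_t len)
{
    if (st->error_occurred)
        return 0;

    if (st->buflen < len + 1)
    {
        /* Old contents are not needed, so free and allocate afresh. */
        free(st->buf);
        st->buf = malloc(len + 1);
        if (st->buf == NULL)
        {
            st->buflen = 0;
            st->error_occurred = 1;
            return 0;
        }
        st->buflen = len + 1;
    }

    memcpy(st->buf, value, len);
    st->buf[len] = '\0';
    return 1;
}

/* Caller-side cleanup, run once whether or not the handler failed. */
static void
finish_row_state(row_state *st)
{
    free(st->buf);
    st->buf = NULL;
    st->buflen = 0;
}
```

The caller would run `finish_row_state()` after the query returns and only then decide, from `error_occurred`, which error to report.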
 

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..7a82ea1 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char** valbuf;
+    int *valbuflen;
+    bool error_occurred;
+    bool nummismatch;
+    ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
                    int2vector *pkattnums_arg, int32 pknumatts_arg,
                    int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
     char       *curname = NULL;
     int         howmany = 0;
     bool        fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;
     DBLINK_INIT;
@@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
     appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
     /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*
      * Try to execute the query.  Note that since libpq uses malloc, the
      * PGresult will be long-lived even though we are still in a short-lived
      * memory context.
      */
     res = PQexec(conn, buf.data);
 
+    finishStoreInfo(&storeinfo);
+
     if (!res ||
         (PQresultStatus(res) != PGRES_COMMAND_OK &&
          PQresultStatus(res) != PGRES_TUPLES_OK))
     {
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        else if (storeinfo.edata)
+            ReThrowError(storeinfo.edata);
+
         dblink_res_error(conname, res, "could not fetch from cursor", fail);
         return (Datum) 0;
     }
@@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
                 (errcode(ERRCODE_INVALID_CURSOR_NAME),
                  errmsg("cursor \"%s\" does not exist", curname)));
     }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
@@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     remoteConn *rconn = NULL;
     bool        fail = true;    /* default to backward compatible */
     bool        freeconn = false;
 
+    storeInfo   storeinfo;
     /* check to see if caller supports us returning a tuplestore */
     if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
 
@@ -715,164 +754,217 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     rsinfo->setResult = NULL;
     rsinfo->setDesc = NULL;
 
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
     /* synchronous query, or async result retrieval */
     if (!is_async)
         res = PQexec(conn, sql);
     else
-    {
         res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
-    }
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+    finishStoreInfo(&storeinfo);
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)
     {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK))
+        {
+            /* finishStoreInfo saves the fields referred to below. */
+            if (storeinfo.nummismatch)
+            {
+                /* This is only for backward compatibility */
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATATYPE_MISMATCH),
+                         errmsg("remote query result rowtype does not match "
+                                "the specified FROM clause rowtype")));
+            }
+            else if (storeinfo.edata)
+                ReThrowError(storeinfo.edata);
+
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }
     }
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+    TupleDesc    tupdesc;
+    int i;
+    
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+    {
+        case TYPEFUNC_COMPOSITE:
+            /* success */
+            break;
+        case TYPEFUNC_RECORD:
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
+    
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
+
+    /* make sure we have a persistent copy of the tupdesc */
+    tupdesc = CreateTupleDescCopy(tupdesc);
+
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->edata = NULL;
+    sinfo->nattrs = tupdesc->natts;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuf = NULL;
+    sinfo->valbuflen = NULL;
+
+    /* Preallocate memory of same size with c string array for values. */
+    sinfo->valbuf = (char **) malloc(sinfo->nattrs * sizeof(char*));
+    sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+    if (sinfo->valbuf == NULL || sinfo->valbuflen == NULL)
+    {
+        finishStoreInfo(sinfo);
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));
+    }
-    Assert(rsinfo->returnMode == SFRM_Materialize);
-
-    PG_TRY();
+    for (i = 0 ; i < sinfo->nattrs ; i++)
     {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
+        sinfo->valbuf[i] = NULL;
+        sinfo->valbuflen[i] = -1;
+    }
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
-
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
-            tupdesc = CreateTemplateTupleDesc(1, false);
-            TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-                               TEXTOID, -1, 0);
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-            is_sql_cmd = false;
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    int i;
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+    if (sinfo->valbuf)
+    {
+        for (i = 0 ; i < sinfo->nattrs ; i++)
+        {
+            if (sinfo->valbuf[i])
             {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
+                free(sinfo->valbuf[i]);
+                sinfo->valbuf[i] = NULL;
             }
-
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
         }
+        free(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
-
-        if (ntuples > 0)
-        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
-
-            values = (char **) palloc(nfields * sizeof(char *));
+    if (sinfo->valbuflen)
+    {
+        free(sinfo->valbuflen);
+        sinfo->valbuflen = NULL;
+    }
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        fields = PQnfields(res);
+    int        i;
+    char      *cstrs[PQnfields(res)];
-                if (!is_sql_cmd)
-                {
-                    int            i;
+    if (sinfo->error_occurred)
+        return FALSE;
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
+
+        /* This error will be processed in
+         * dblink_record_internal(). So do not set error message
+         * here. */
+        return FALSE;
+    }
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
+    /*
+     * Value input functions assume that the input string is
+     * zero-terminated, so make each value zero-terminated here.
+     */
+    for(i = 0 ; i < fields ; i++)
+    {
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            if (sinfo->valbuf[i] == NULL)
+            {
+                sinfo->valbuf[i] = (char *)malloc(len + 1);
+                sinfo->valbuflen[i] = len + 1;
+            }
+            else if (sinfo->valbuflen[i] < len + 1)
+            {
+                sinfo->valbuf[i] = (char *)realloc(sinfo->valbuf[i], len + 1);
+                sinfo->valbuflen[i] = len + 1;
             }
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
+            if (sinfo->valbuf[i] == NULL)
+                return FALSE;
+
+            cstrs[i] = sinfo->valbuf[i];
+            memcpy(cstrs[i], columns[i].value, len);
+            cstrs[i][len] = '\0';
         }
+    }
-        PQclear(res);
+    PG_TRY();
+    {
+        tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+        tuplestore_puttuple(sinfo->tuplestore, tuple);
     }
     PG_CATCH();
     {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+        MemoryContext context;
+        /*
+         * Store exception for later ReThrow and cancel the exception.
+         */
+        sinfo->error_occurred = TRUE;
+        context = MemoryContextSwitchTo(sinfo->oldcontext);
+        sinfo->edata = CopyErrorData();
+        MemoryContextSwitchTo(context);
+        FlushErrorState();
+        return FALSE;
     }
     PG_END_TRY();
+
+    return TRUE;
 }
 /*

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

31 January 2012, 03:19:39

On Tue, Jan 31, 2012 at 4:59 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:
> This is fixed version of dblink.c for row processor.
>
>> i'll re-send the properly fixed patch for dblink.c later.
>
> - malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error)
>
> - storeHandler() now returns FALSE on malloc failure. Garbage
>  cleanup is done in dblink_fetch() or dblink_record_internal().
>  The behavior that this dblink displays this error as 'unkown
>  error/could not execute query' on the user session is as it did
>  before.

Another thing: if realloc() fails, the old pointer stays valid.
So it needs to be assigned to temp variable first, before
overwriting old pointer.
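
A minimal sketch of the pattern described above, in plain C. `ensure_capacity` is a hypothetical helper, not something from the patch; the point is only that the result of realloc() goes into a temporary first, so the old block is still reachable (and can be freed) if the call fails.

```c
#include <stdlib.h>

/*
 * Grow *bufp so that it holds at least `needed` bytes.
 *
 * Returns 1 on success.  On failure returns 0 and leaves *bufp and *lenp
 * untouched, so the caller still owns the old block and can free it.
 */
static int
ensure_capacity(char **bufp, size_t *lenp, size_t needed)
{
    char *tmp;

    if (*lenp >= needed)
        return 1;               /* already big enough */

    /* realloc(NULL, n) behaves like malloc(n), so this covers both cases. */
    tmp = realloc(*bufp, needed);
    if (tmp == NULL)
        return 0;               /* old pointer in *bufp is still valid */

    *bufp = tmp;
    *lenp = needed;
    return 1;
}
```

Writing `buf = realloc(buf, n)` directly would overwrite the only copy of the old pointer with NULL on failure, which is the leak being pointed out.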

And seems malloc() is preferable to palloc() to avoid
exceptions inside row processor.  Although latter
one could be made to work, it might be unnecessary
complexity.  (store current pqresult into remoteConn?)

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

01 February 2012, 04:32:39

Hello,

> Another thing: if realloc() fails, the old pointer stays valid.
> So it needs to be assigned to temp variable first, before
> overwriting old pointer.
Mmm. I misunderstood how realloc behaves. I'll fix that in the
next patch.

> And seems malloc() is preferable to palloc() to avoid
> exceptions inside row processor.  Although latter
> one could be made to work, it might be unnecessary
> complexity.  (store current pqresult into remoteConn?)

Hmm. palloc may throw ERRCODE_OUT_OF_MEMORY, so I would have to catch it
and return NULL. In that case there seems to be no difference from using
malloc after all. However, the restriction against throwing exceptions
in a RowProcessor is not based on any established fact, so palloc here
may make sense if we can do that.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

01 February 2012, 05:52:42

On Wed, Feb 1, 2012 at 10:32 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:
>> Another thing: if realloc() fails, the old pointer stays valid.
>> So it needs to be assigned to temp variable first, before
>> overwriting old pointer.
>
>  mmm. I've misunderstood of the realloc.. I'll fix there in the
> next patch.

Please wait a moment, I started doing small cleanups,
and now have some larger ones too.  I'll send it soon.

OTOH, if you have already done something, you can send it,
I have various states in GIT so it should not be hard
to merge things.

>> And seems malloc() is preferable to palloc() to avoid
>> exceptions inside row processor.  Although latter
>> one could be made to work, it might be unnecessary
>> complexity.  (store current pqresult into remoteConn?)
>
> Hmm.. palloc may throw ERRCODE_OUT_OF_MEMORY so I must catch it
> and return NULL. That seems there is no difference to using
> malloc after all.. However, the inhibition of throwing exceptions
> in RowProcessor is not based on any certain fact, so palloc here
> may make sense if we can do that.

No, I was thinking about storing the result in connection
struct and then letting the exception pass, as the
PGresult can be cleaned later.  Thus we could get rid
of TRY/CATCH in per-row handler.  (At that point
the PGresult is already under PGconn, so maybe
it's enough to clean that one later?)

But for now, as the TRY is already there, it should be
also simple to move palloc usage inside it.

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

01 February 2012, 10:20:26

On Wed, Feb 01, 2012 at 11:52:27AM +0200, Marko Kreen wrote:
> Please wait a moment, I started doing small cleanups,
> and now have some larger ones too.  I'll send it soon.

I started doing small comment cleanups, but then the changes
spread out a bit...

Kyotaro-san, please review.  If you don't find anything to fix,
then it's ready for committer, I think.

Changes:

== dblink ==

- Increase area in bigger steps

- Fix realloc leak on failure

== libpq ==

- Replace dynamic stack array with malloced rowBuf on PGconn.
  It is only increased, never decreased.

- Made PGresult.rowProcessorErrMsg allocated with pqResultStrdup.
  Seems more proper.  It would be even better to
  use ->errmsg for it, but its lifetime and usage
  are unclear to me.

- Removed the rowProcessor, rowProcessorParam from PGresult.
  They are unnecessary there.

- Move conditional msg outside from libpq_gettext()
- Removed the unnecessary memcpy() from pqAddRow

- Moved PQregisterRowProcessor to fe-exec.c, made pqAddRow static.

- Restored pqAddTuple export.

- Renamed a few symbols:
  PQregisterRowProcessor -> PQsetRowProcessor
  RowProcessor -> PQrowProcessor
  Mes -> Msg  (more common abbreviation)

- Updated some comments

- Updated sgml doc
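
To illustrate the "increase area in bigger steps" and "only increased, never decreased" items above, here is a small sketch of a grow-only buffer. The message does not say which step size the patch uses; doubling from a 64-byte minimum is an assumption, and `row_buf`, `next_capacity` and `row_buf_reserve` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical grow-only buffer, kept across rows and never shrunk. */
typedef struct row_buf
{
    void   *data;
    size_t  cap;    /* current capacity in bytes */
} row_buf;

/*
 * Pick the next capacity: start at 64 bytes (an assumed minimum) and
 * double until `needed` fits, so repeated small increases do not each
 * trigger a reallocation.
 */
static size_t
next_capacity(size_t cur, size_t needed)
{
    size_t cap = cur ? cur : 64;

    while (cap < needed)
    {
        if (cap > (size_t) -1 / 2)
            return needed;      /* doubling would overflow */
        cap *= 2;
    }
    return cap;
}

/* Make sure the buffer holds at least `needed` bytes; 1 = ok, 0 = failed. */
static int
row_buf_reserve(row_buf *rb, size_t needed)
{
    size_t  newcap;
    void   *tmp;

    if (needed <= rb->cap)
        return 1;               /* never shrink */

    newcap = next_capacity(rb->cap, needed);
    tmp = realloc(rb->data, newcap);
    if (tmp == NULL)
        return 0;               /* old buffer is still intact */

    rb->data = tmp;
    rb->cap = newcap;
    return 1;
}
```

With this policy a buffer that once held a wide row stays large for the rest of the connection, trading some memory for fewer allocator calls.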

== Benchmark ==

I tried to benchmark whether the patch affects stock libpq usage,
but could not draw any conclusions.  It *seems* that the patch is
a bit faster with few columns and a bit slower with many, but the
difference seems smaller than the test noise.  OTOH, the test noise
is pretty big, so maybe I don't have a stable enough setup to
benchmark properly.

As the test app does not actually touch the result set, it seems
probable that an application that actually does something with the
result set will not see the difference.

It would be nice if anybody who has a stable pgbench setup could
run a few tests to see whether the patch moves things either way.

Just in case, I attached the minimal test files.

--
marko

Attachment

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

02 February 2012, 03:52:07

Hello, this is a new version of dblink.c.

- Memory is properly freed when realloc returns NULL in storeHandler().

- Fixed the bug where free() in finishStoreInfo() would be fed a
  garbage pointer when malloc for sinfo->valbuflen fails.
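
The second fix can be sketched in isolation. The problem case is an array of pointers that has been allocated but not yet initialised when a later allocation fails; a cleanup routine that walks it would call free() on garbage. In this sketch `val_bufs`, `val_bufs_init` and `val_bufs_free` are hypothetical stand-ins for the valbuf/valbuflen fields of storeInfo, and `n` is assumed to be greater than zero.

```c
#include <stdlib.h>

/* Hypothetical pair of per-column arrays, as in valbuf/valbuflen. */
typedef struct val_bufs
{
    int     n;      /* number of columns, assumed > 0 */
    char  **val;    /* per-column scratch buffers */
    int    *len;    /* allocated size of each buffer, -1 if none */
} val_bufs;

/*
 * Allocate both arrays.  Returns 1 on success.  On failure returns 0 and
 * leaves both pointers NULL, so a later val_bufs_free() is harmless.
 */
static int
val_bufs_init(val_bufs *vb, int n)
{
    int i;

    vb->n = n;
    vb->val = NULL;
    vb->len = NULL;

    vb->val = malloc((size_t) n * sizeof(char *));
    if (vb->val == NULL)
        return 0;

    vb->len = malloc((size_t) n * sizeof(int));
    if (vb->len == NULL)
    {
        /* val[] entries are still uninitialised: free only the array. */
        free(vb->val);
        vb->val = NULL;
        return 0;
    }

    for (i = 0; i < n; i++)
    {
        vb->val[i] = NULL;
        vb->len[i] = -1;
    }
    return 1;
}

/* Release everything; safe to call after a failed or successful init. */
static void
val_bufs_free(val_bufs *vb)
{
    int i;

    if (vb->val)
    {
        for (i = 0; i < vb->n; i++)
            free(vb->val[i]);   /* free(NULL) is a no-op */
        free(vb->val);
        vb->val = NULL;
    }
    free(vb->len);
    vb->len = NULL;
}
```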

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..28c967c 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char** valbuf;
+    int *valbuflen;
+    bool error_occurred;
+    bool nummismatch;
+    ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
                    int2vector *pkattnums_arg, int32 pknumatts_arg,
                    int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
     char       *curname = NULL;
     int         howmany = 0;
     bool        fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;
     DBLINK_INIT;
@@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
     appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
     /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*
      * Try to execute the query.  Note that since libpq uses malloc, the
      * PGresult will be long-lived even though we are still in a short-lived
      * memory context.
      */
     res = PQexec(conn, buf.data);
 
+    finishStoreInfo(&storeinfo);
+
     if (!res ||
         (PQresultStatus(res) != PGRES_COMMAND_OK &&
          PQresultStatus(res) != PGRES_TUPLES_OK))
     {
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        else if (storeinfo.edata)
+            ReThrowError(storeinfo.edata);
+
         dblink_res_error(conname, res, "could not fetch from cursor", fail);
         return (Datum) 0;
     }
@@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
                 (errcode(ERRCODE_INVALID_CURSOR_NAME),
                  errmsg("cursor \"%s\" does not exist", curname)));
     }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
@@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     remoteConn *rconn = NULL;
     bool        fail = true;    /* default to backward compatible */
     bool        freeconn = false;
 
+    storeInfo   storeinfo;
     /* check to see if caller supports us returning a tuplestore */
     if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
 
@@ -715,164 +754,225 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     rsinfo->setResult = NULL;
     rsinfo->setDesc = NULL;
 
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
     /* synchronous query, or async result retrieval */
     if (!is_async)
         res = PQexec(conn, sql);
     else
-    {
         res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
-    }
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+    finishStoreInfo(&storeinfo);
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)
     {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK))
+        {
+            /* finishStoreInfo saves the fields referred to below. */
+            if (storeinfo.nummismatch)
+            {
+                /* This is only for backward compatibility */
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATATYPE_MISMATCH),
+                         errmsg("remote query result rowtype does not match "
+                                "the specified FROM clause rowtype")));
+            }
+            else if (storeinfo.edata)
+                ReThrowError(storeinfo.edata);
+
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }
     }
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+    TupleDesc    tupdesc;
+    int i;
+    
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+    {
+        case TYPEFUNC_COMPOSITE:
+            /* success */
+            break;
+        case TYPEFUNC_RECORD:
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
+    
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
+
+    /* make sure we have a persistent copy of the tupdesc */
+    tupdesc = CreateTupleDescCopy(tupdesc);
+
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->edata = NULL;
+    sinfo->nattrs = tupdesc->natts;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuf = NULL;
+    sinfo->valbuflen = NULL;
+
+    /* Preallocate memory of same size with c string array for values. */
+    sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+    if (sinfo->valbuf)
+        sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+    if (sinfo->valbuflen == NULL)
+    {
+        if (sinfo->valbuf)
+            free(sinfo->valbuf);
-    Assert(rsinfo->returnMode == SFRM_Materialize);
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));
+    }
-    PG_TRY();
+    for (i = 0 ; i < sinfo->nattrs ; i++)
     {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
+        sinfo->valbuf[i] = NULL;
+        sinfo->valbuflen[i] = -1;
+    }
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
-            tupdesc = CreateTemplateTupleDesc(1, false);
-            TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-                               TEXTOID, -1, 0);
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    int i;
+
+    if (sinfo->valbuf)
+    {
+        for (i = 0 ; i < sinfo->nattrs ; i++)
         {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+            if (sinfo->valbuf[i])
+                free(sinfo->valbuf[i]);
+        }
+        free(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-            is_sql_cmd = false;
+    if (sinfo->valbuflen)
+    {
+        free(sinfo->valbuflen);
+        sinfo->valbuflen = NULL;
+    }
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        fields = PQnfields(res);
+    int        i;
+    char      *cstrs[PQnfields(res)];
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
-        }
+    if (sinfo->error_occurred)
+        return FALSE;
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
+
+        /* This error will be processed in
+         * dblink_record_internal(). So do not set error message
+         * here. */
+        return FALSE;
+    }
-        if (ntuples > 0)
+    /*
+     * value input functions assumes that the input string is
+     * terminated by zero. We should make the values to be so.
+     */
+    for(i = 0 ; i < fields ; i++)
+    {
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
         {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
-
-            values = (char **) palloc(nfields * sizeof(char *));
-
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+            char *tmp = NULL;
-                if (!is_sql_cmd)
-                {
-                    int            i;
+            /*
+             * Divide calls to malloc and realloc so that things will
+             * go fine even on the systems of which realloc() does not
+             * accept NULL as old memory block.
+             */
+            if (sinfo->valbuf[i] == NULL)
+                tmp = (char *)malloc(len + 1);
+            else
+                tmp = (char *)realloc(sinfo->valbuf[i], len + 1);
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+            /*
+             * sinfo->valbuf[n] will be freed in finishStoreInfo()
+             * when realloc returns NULL.
+             */
+            if (tmp == NULL)
+                return FALSE;
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
-            }
+            sinfo->valbuf[i] = tmp;
+            sinfo->valbuflen[i] = len + 1;
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
+            cstrs[i] = sinfo->valbuf[i];
+            memcpy(cstrs[i], columns[i].value, len);
+            cstrs[i][len] = '\0';
         }
+    }
-        PQclear(res);
+    PG_TRY();
+    {
+        tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+        tuplestore_puttuple(sinfo->tuplestore, tuple);
     }
     PG_CATCH();
     {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+        MemoryContext context;
+        /*
+         * Store exception for later ReThrow and cancel the exception.
+         */
+        sinfo->error_occurred = TRUE;
+        context = MemoryContextSwitchTo(sinfo->oldcontext);
+        sinfo->edata = CopyErrorData();
+        MemoryContextSwitchTo(context);
+        FlushErrorState();
+        return FALSE;
     }
     PG_END_TRY();
+
+    return TRUE;
 }
 /*

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

02 February 2012, 10:31:31

On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote:
> Hello, This is new version of dblink.c
>
> - Memory is properly freed when realloc returns NULL in storeHandler().
>
> - The bug that free() in finishStoreInfo() will be fed with
>   garbage pointer when malloc for sinfo->valbuflen fails is
>   fixed.

Thanks, merged.  I also did some minor coding style cleanups.
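
For readers following the realloc discussion quoted above, here is a minimal sketch of the usual way to keep the old block reachable when realloc() fails. It is not the code from the patch: the function name store_value is made up for illustration, and it relies on standard C treating realloc(NULL, n) like malloc(n), whereas the patch deliberately splits the two calls for older systems.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `len` bytes of `src` into *buf as a NUL-terminated string,
 * growing *buf as needed.  Returns 1 on success, 0 on allocation
 * failure.  On failure *buf is left untouched, so the caller still
 * owns the old block and can free it during cleanup.
 * (store_value is a hypothetical name, not part of dblink.)
 */
int
store_value(char **buf, const char *src, size_t len)
{
    char *tmp;

    assert(buf != NULL && src != NULL);

    /* Assign to a temporary first: overwriting *buf directly would
     * lose the old pointer if realloc() returned NULL. */
    tmp = realloc(*buf, len + 1);
    if (tmp == NULL)
        return 0;

    memcpy(tmp, src, len);
    tmp[len] = '\0';
    *buf = tmp;
    return 1;
}
```

The point is only that the result of realloc() goes into a temporary, so a cleanup routine along the lines of finishStoreInfo() never sees a lost or garbage pointer.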

Tagging it Ready For Committer.  I don't see any notable
issues with the patch anymore.

There is some potential for experimenting with more aggressive
optimizations on dblink side, but I'd like to get a nod from
a committer for libpq changes first.

--
marko

Attachment

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

03 February 2012, 00:01:49

Hello,

> Thanks, merged.  I also did some minor coding style cleanups.

Thank you for editing the many comments and the code I had left
unchanged out of carelessness and a lack of understanding of your
comments. I'll be more careful about that...

> There is some potential for experimenting with more aggressive
> optimizations on dblink side, but I'd like to get a nod from
> a committer for libpq changes first.

I'm looking forward to the aggressiveness of that:-)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Shigeru Hanada

Date:

07 February 2012, 06:25:13

(2012/02/02 23:30), Marko Kreen wrote:
> On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote:
>> Hello, This is new version of dblink.c
>>
>> - Memory is properly freed when realloc returns NULL in storeHandler().
>>
>> - The bug that free() in finishStoreInfo() will be fed with
>>    garbage pointer when malloc for sinfo->valbuflen fails is
>>    fixed.
> 
> Thanks, merged.  I also did some minor coding style cleanups.
> 
> Tagging it Ready For Committer.  I don't see any notable
> issues with the patch anymore.
> 
> There is some potential for experimenting with more aggressive
> optimizations on dblink side, but I'd like to get a nod from
> a committer for libpq changes first.

I tried to use this feature in pgsql_fdw patch, and found some small issues.

- Typos
    - mes -> msg
    - funcion -> function
    - overritten -> overwritten
    - costom -> custom
- What is the right (or recommended) way to avoid throwing an
exception in a row-processor callback function?  When should the author
use a PG_TRY block to catch exceptions in the callback function?
- How to determine whether a column result is NULL should be
described clearly.  Perhaps the result value is NULL when
PGrowValue.len is less than 0, right?
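
To make the last question concrete, here is a small sketch of the check as I understand it from storeHandler() in dblink.c, which treats a negative len as NULL. The RowValue struct below is only a stand-in with the two fields the patch uses (len and value); it is not the real PGrowValue definition, and the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's PGrowValue. */
typedef struct RowValue
{
    int         len;    /* byte length; negative is assumed to mean NULL */
    const char *value;  /* column data, not NUL-terminated */
} RowValue;

/* Returns nonzero when the column is SQL NULL under that assumption. */
int
row_value_is_null(const RowValue *col)
{
    assert(col != NULL);
    return col->len < 0;
}

/* Count the NULL columns in one row. */
int
count_nulls(const RowValue *columns, int nfields)
{
    int i;
    int n = 0;

    for (i = 0; i < nfields; i++)
    {
        if (row_value_is_null(&columns[i]))
            n++;
    }
    return n;
}
```

Note that under this reading a zero-length value is an empty string, not NULL, which is why the documentation should state the rule explicitly.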

Regards,
-- 
Shigeru Hanada

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

07 February 2012, 10:44:39

On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote:
> (2012/02/02 23:30), Marko Kreen wrote:
> > On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote:
> >> Hello, This is new version of dblink.c
> >>
> >> - Memory is properly freed when realloc returns NULL in storeHandler().
> >>
> >> - The bug that free() in finishStoreInfo() will be fed with
> >>    garbage pointer when malloc for sinfo->valbuflen fails is
> >>    fixed.
> >
> > Thanks, merged.  I also did some minor coding style cleanups.
> >
> > Tagging it Ready For Committer.  I don't see any notable
> > issues with the patch anymore.
> >
> > There is some potential for experimenting with more aggressive
> > optimizations on dblink side, but I'd like to get a nod from
> > a committer for libpq changes first.
>
> I tried to use this feature in pgsql_fdw patch, and found some small issues.
>
> - Typos
>     - mes -> msg
>     - funcion -> function
>     - overritten -> overwritten
>     - costom -> custom

Fixed.

> - What is the right (or recommended) way to prevent from throwing
> exception in row-processor callback function?  When author should use
> PG_TRY block to catch exception in the callback function?

When it calls backend functions that can throw exceptions?
As all handlers running in backend will want to convert data
to Datums, that means "always wrap handler code in PG_TRY"?
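
As an illustration of what "wrap the handler" means in practice, here is a self-contained sketch of the pattern the dblink handler uses: catch the error inside the callback, remember it in the per-query state, and return failure so the caller can report it later. Plain setjmp/longjmp stands in for PG_TRY/PG_CATCH here, and HandlerState, convert_value and row_handler are hypothetical names, not backend or libpq API.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

typedef struct HandlerState
{
    int  rows_stored;
    int  error_occurred;
    char errmsg[64];
} HandlerState;

static jmp_buf handler_env;     /* stand-in for the PG_TRY jump target */

/* Stand-in for a backend function that may "throw" via longjmp. */
static void
convert_value(const char *text)
{
    if (strcmp(text, "bad") == 0)
        longjmp(handler_env, 1);
}

/* Returns 1 if the row was stored, 0 if an error was (or had been) captured. */
int
row_handler(HandlerState *st, const char *text)
{
    assert(st != NULL && text != NULL);

    if (st->error_occurred)
        return 0;               /* an earlier row already failed */

    if (setjmp(handler_env) != 0)
    {
        /* "catch": record the error instead of unwinding through the caller */
        st->error_occurred = 1;
        strcpy(st->errmsg, "conversion failed");
        return 0;
    }

    convert_value(text);
    st->rows_stored++;
    return 1;
}
```

The caller inspects error_occurred after the query finishes and raises the saved error there, which is roughly what dblink_record_internal() does with sinfo->edata.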

Although it seems we could allow exceptions, at least when we are
speaking of the Postgres backend, as the connection and result are in
an internally consistent state when the handler is called, and the
partial PGresult is stored under PGconn->result.  Even the
connection is in a consistent state, as the row packet is
fully processed.

So we have 3 variants, all work, but which one do we want to support?

1) Exceptions are not allowed.
2) Exceptions are allowed, but when it happens, user must call
   PQfinish() next, to avoid processing incoming data from
   potentially invalid state.
3) Exceptions are allowed, and further row processing can continue
   with PQisBusy() / PQgetResult()
3.1) The problematic row is retried.  (Current behaviour)
3.2) The problematic row is skipped.

No clue if that is ok for a handler written in C++; I have no idea
whether you can throw a C++ exception when part of the stack
contains raw C calls.

> - It would be better to describe how to determine whether a column
> result is NULL should be described clearly.  Perhaps result value is
> NULL when PGrowValue.len is less than 0, right?

Eh, seems it's documented everywhere except in sgml doc.  Fixed.
[ Is it better to document that it is "-1" or "< 0"? ]

Also I removed one remaining dynamic stack array in dblink.c

Current state of patch attached.

--
marko

Attachment

Re: Speed dblink using alternate libpq tuple storage

From

Robert Haas

Date:

07 February 2012, 11:58:25

On Tue, Feb 7, 2012 at 9:44 AM, Marko Kreen <markokr@gmail.com> wrote:
>> - What is the right (or recommended) way to prevent from throwing
>> exception in row-processor callback function?  When author should use
>> PG_TRY block to catch exception in the callback function?
>
> When it calls backend functions that can throw exceptions?
> As all handlers running in backend will want to convert data
> to Datums, that means "always wrap handler code in PG_TRY"?

I would hate to have such a rule.  PG_TRY isn't free, and it's prone
to subtle bugs, like failing to mark enough stuff in the same function
"volatile".

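To show the kind of bug meant here, a small sketch in plain C, with setjmp/longjmp standing in for PG_TRY (which is built on sigsetjmp). The function names are made up. A local variable that is modified after setjmp() and read after the longjmp() has an indeterminate value unless it is declared volatile.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf try_env;

/* Stand-in for a call that may raise an error. */
static void
may_fail(int fail)
{
    if (fail)
        longjmp(try_env, 1);
}

/* Returns how many of 10 steps completed before step `fail_at` failed. */
int
count_before_failure(int fail_at)
{
    /*
     * `done` is changed between setjmp() and longjmp() and read
     * afterwards, so it must be volatile.  Without the qualifier the
     * compiler may keep it in a register and the value after the jump
     * is indeterminate.
     */
    volatile int done = 0;
    int          i;

    assert(fail_at < 10);

    if (setjmp(try_env) == 0)
    {
        for (i = 0; i < 10; i++)
        {
            may_fail(i == fail_at);
            done++;
        }
    }
    return done;
}
```

Forgetting the qualifier often works at -O0 and breaks only with optimization, which is what makes it subtle.
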
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Speed dblink using alternate libpq tuple storage

From

Shigeru Hanada

Date:

08 February 2012, 01:44:18

(2012/02/07 23:44), Marko Kreen wrote:
> On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote:
>> - What is the right (or recommended) way to prevent from throwing
>> exception in row-processor callback function?  When author should use
>> PG_TRY block to catch exception in the callback function?
> 
> When it calls backend functions that can throw exceptions?
> As all handlers running in backend will want to convert data
> to Datums, that means "always wrap handler code in PG_TRY"?

ISTM that it would be necessary to tell a) what happens to PGresult and
PGconn when the row processor ends without returning, and b) how to
recover them.  We can't assume much about the caller because libpq is a
client library.  IMHO, it's most important to tell authors of row
processors clearly what should be done in the error case.

In such case, must we call PQfinish, or is calling PQgetResult until it
returns NULL enough to reuse the connection?  IIUC calling
pqClearAsyncResult seems sufficient, but it's not exported for client...

Regards,
-- 
Shigeru Hanada

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

08 February 2012, 07:51:37

On Wed, Feb 08, 2012 at 02:44:13PM +0900, Shigeru Hanada wrote:
> (2012/02/07 23:44), Marko Kreen wrote:
> > On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote:
> >> - What is the right (or recommended) way to prevent from throwing
> >> exception in row-processor callback function?  When author should use
> >> PG_TRY block to catch exception in the callback function?
> > 
> > When it calls backend functions that can throw exceptions?
> > As all handlers running in backend will want to convert data
> > to Datums, that means "always wrap handler code in PG_TRY"?
> 
> ISTM that telling a) what happens to PGresult and PGconn when row
> processor ends without return, and b) how to recover them would be
> necessary.  We can't assume much about caller because libpq is a client
> library.  IMHO, it's most important to tell authors of row processor
> clearly what should be done on error case.

Yes.

> In such case, must we call PQfinish, or is calling PQgetResult until it
> returns NULL enough to reuse the connection?  IIUC calling
> pqClearAsyncResult seems sufficient, but it's not exported for client...

Simply dropping ->result is not useful as there are still rows
coming in.  Now you cannot process them anymore.

The rule of "after exception it's valid to close the connection with PQfinish()
or continue processing with PQgetResult()/PQisBusy()/PQconsumeInput()" seems
quite simple and clear.  Perhaps only a clarification of what's valid on a sync
connection and what's valid on an async connection would be useful.

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

13 February 2012, 19:39:37

On Tue, Feb 07, 2012 at 04:44:09PM +0200, Marko Kreen wrote:
> Although it seems we could allow exceptions, at least when we are
> speaking of Postgres backend, as the connection and result are
> internally consistent state when the handler is called, and the
> partial PGresult is stored under PGconn->result.  Even the
> connection is in consistent state, as the row packet is
> fully processed.
>
> So we have 3 variants, all work, but which one do we want to support?
>
> 1) Exceptions are not allowed.
> 2) Exceptions are allowed, but when it happens, user must call
>    PQfinish() next, to avoid processing incoming data from
>    potentially invalid state.
> 3) Exceptions are allowed, and further row processing can continue
>    with PQisBusy() / PQgetResult()
> 3.1) The problematic row is retried.  (Current behaviour)
> 3.2) The problematic row is skipped.

I converted the patch to support 3.2), that is - skip row on exception.
That required some cleanups of getAnotherTuple() API, plus I made some
other cleanups.  Details below.

But during this I also started to think what happens if the user does
*not* throw exceptions.  Eg. Python does exceptions via regular return
values, which means complications when passing them upwards.

The main case I'm thinking of is actually a resultset iterator in a high-level
language.  The current callback-only style API requires that rows are
stuffed into a temporary buffer until the network blocks and user code
gets control again.  This feels clumsy for a performance-oriented API.
Another case is user code that wants to do complex processing.  Doing
a lot of stuff under a callback seems dubious.  And what if some part of
the code calls PQfinish() during some error recovery?

I tried imagining some sort of getFoo() style API for fetching in-flight
row data, but I always ended up with a "rewrite libpq" step, so I feel
it's not productive to go there.

Instead I added a simple feature: rowProcessor can return '2',
in which case getAnotherTuple() does an early exit without setting
any error state.  On the user side it appears as PQisBusy() having
returned TRUE.  All pointers stay valid, so the callback can just
stuff them into some temp area.  ATM there is no indication though
whether the exit was due to the callback or other reasons, so the user
must detect it based on whether new temp pointers appeared,
which means those must be cleaned before calling PQisBusy() again.
This actually feels like a feature, as those must not stay around
after a single call.

It's included in main patch, but I also attached it as separate patch
so that it can be examined separately and reverted if not acceptable.

But as it's really simple, I recommend including it.

Its usage might not be obvious though, should we include
example code in the doc?

New feature:

* Row processor can return 2, then PQisBusy() does early exit.
  Supported only when connection is in non-blocking mode.
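
A rough sketch of how the early exit is meant to be used follows. It is a simulation only: RowSource, parse_rows and take_one are hypothetical stand-ins for the connection buffer, getAnotherTuple()/PQisBusy() and a user callback, and the names for the return codes 0, 1 and 2 are my own. It does not model non-blocking I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the callback return codes 0, 1 and 2. */
enum { ROW_ERROR = 0, ROW_OK = 1, ROW_PAUSE = 2 };

typedef int (*row_cb) (void *param, const char *row);

typedef struct RowSource
{
    const char **rows;      /* rows already sitting in the input buffer */
    int          nrows;
    int          next;      /* index of the next unprocessed row */
} RowSource;

/*
 * Feed buffered rows to the callback.  Returns 1 ("busy", call again)
 * after an early exit, 0 when all rows are consumed, -1 on callback
 * error.  The row is marked processed before the callback runs, so it
 * is never handed out twice.
 */
int
parse_rows(RowSource *src, row_cb cb, void *param)
{
    assert(src != NULL && cb != NULL);

    while (src->next < src->nrows)
    {
        const char *row = src->rows[src->next++];
        int         rc = cb(param, row);

        if (rc == ROW_PAUSE)
            return 1;       /* early exit, no error state set */
        if (rc == ROW_ERROR)
            return -1;
    }
    return 0;
}

/* Example callback: stash one row pointer for the caller, then pause. */
int
take_one(void *param, const char *row)
{
    *(const char **) param = row;
    return ROW_PAUSE;
}
```

The caller clears its temp pointer before each call and treats "pointer set after the call" as the sign that the callback paused, which matches the detection rule described above.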

Cleanups:

* Row data is tagged as processed when rowProcessor is called,
  so exceptions will skip the row.  This simplifies non-exceptional
  handling as well.

* Converted 'return EOF' in V3 getAnotherTuple() to set error instead.
  AFAICS those EOFs are remnants from V2 getAnotherTuple()
  not something that is coded deliberately.  Note that when
  v3 getAnotherTuple() is called the row packet is fully in buffer.
  The EOF on broken packet to signify 'give me more data' does not
  result in any useful behaviour, instead you can simply get
  into infinite loop.

Fix bugs in my previous changes:

* Split the OOM error handling from custom error message handling,
  previously the error message in PGresult was lost due to OOM logic
  early free of PGresult.

* Drop unused goto label from experimental debug code.

--
marko

Attachment

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

16 February 2012, 01:24:57

Hello, sorry for the long absence.

As far as I can see, an out-of-memory condition in getAnotherTuple()
sets conn->result->resultStatus = PGRES_FATAL_ERROR, and
pqParseInput[23]() consequently skips the succeeding 'D' messages.

When an exception is raised within the row processor, pg_conn->inCursor
is always positioned consistently and result->resultStatus ==
PGRES_TUPLES_OK.

The choices for the libpq user at that point are:

- Continue to read the succeeding tuples.

  Call PQgetResult() to read the 'D' messages and hand them to the
  row processor in succession.

- Throw away the remaining results.

  Call pqClearAsyncResult() and pqSaveErrorResult(), then call
  PQgetResult() to skip over the succeeding 'D' messages. (Of
  course the user can't do that with the current implementation.)

To let users select the second choice (I think this is the more
common one), we only need to provide and export a new PQ* function
that does that, I think.

void
PQskipRemainingResult(PGconn *conn)
{
    pqClearAsyncResult(conn);

    /* conn->result is always NULL here */
    pqSaveErrorResult(conn);

    /* Skip over remaining 'D' messages. */
    PQgetResult(conn);
}

User may write code with this function.

...
PG_TRY();
{
    ...
    res = PQexec(conn, "....");
    ...
}
PG_CATCH();
{
    PQskipRemainingResult(conn);
    goto error;
}
PG_END_TRY();


Of course, this is applicable to a C++ client in the same manner.

try {
    ...
    res = PQexec(conn, "....");
    ...
} catch (const myexcep& ex) {
    PQskipRemainingResult(conn);
    throw;
}


By the way, where should I insert this function ?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

16 February 2012, 04:17:44

On Thu, Feb 16, 2012 at 02:24:19PM +0900, Kyotaro HORIGUCHI wrote:
> As far as I see, on an out-of-memory in getAnotherTuple() makes
> conn->result->resultStatus = PGRES_FATAL_ERROR and
> pqParseInput[23]() skips succeeding 'D' messages consequently.
> 
> When exception raised within row processor, pg_conn->inCursor
> always positioned in consistent and result->resultStatus ==
> PGRES_TUPLES_OK.
> 
> The choices of the libpq user on that point are,
> 
> - Continue to read succeeding tuples.
> 
>   Call PQgetResult() to read 'D' messages and hand it to row
>   processor succeedingly.
> 
> - Throw away the remaining results.
> 
>   Call pqClearAsyncResult() and pqSaveErrorResult(), then call
>   PQgetResult() to skip over the succeeding 'D' messages. (Of
>   course the user can't do that on current implement.)

There is also a third choice, which may be even more popular than
those two: PQfinish().

> To make the users able to select the second choice (I think this
> is rather major), we should only provide and export the new PQ*
> function to do that, I think.

I think we already have such function - PQsetRowProcessor().
Considering the user can use that to install skipping callback
or simply set some flag in its own per-connection state,
I suspect the need is not that big.
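
A sketch of the flag variant, to show how little is needed. The callback signature is simplified compared to the patch's row processor (no PGresult or PGrowValue argument), and ConnState and skipping_row_handler are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Application-owned per-connection state, passed as the callback param. */
typedef struct ConnState
{
    int skip_rest;      /* set by the application after an error */
    int stored;         /* rows actually processed */
    int skipped;        /* rows discarded after skip_rest was set */
} ConnState;

/*
 * Always returns 1 (continue), so libpq keeps consuming rows and the
 * connection stays usable.  Rows arriving after skip_rest is set are
 * simply dropped.
 */
int
skipping_row_handler(void *param, const char *row)
{
    ConnState *st = param;

    assert(st != NULL);
    (void) row;         /* a real handler would convert and store it */

    if (st->skip_rest)
    {
        st->skipped++;
        return 1;
    }

    st->stored++;
    return 1;
}
```

With something like this the application sets skip_rest in its error path and then drains the connection as usual.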

But if you want to set error state for skipping, I would instead
generalize PQsetRowProcessorErrMsg() to support setting error state
outside of callback.  That would also help the external processing with
'return 2'.  But I would keep the requirement that row processing must
be ongoing, standalone error setting does not make sense.  Thus the name
can stay.

There seem to be 2 ways to do it:

1) Replace the PGresult under PGconn.  This seems to require that
   PQsetRowProcessorErrMsg takes PGconn as argument instead of
   PGresult.  This also means the callback should get PGconn as
   argument.  Kind of makes sense even.

2) Convert the current partial PGresult to error state.  That also
   makes sense, as the current use of ->rowProcessorErrMsg to
   transport the message so the caller can set the error later
   feels weird.

I guess I slightly prefer 2) to 1).

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

16 February 2012, 04:50:26

Hello,
I added the function PQskipRemainingResult() and use it in
dblink. This reduces the number of try-catch blocks executed in
dblink from one per row to one per query.

- fe-exec.c : new function PQskipRemainingResult.
- dblink.c  : using PQskipRemainingResult in dblink_record_internal().
- libpq.sgml: documentation for PQskipRemainingResult and related
              change in RowProcessor.

> Instead I added simple feature: rowProcessor can return '2',
> in which case getAnotherTuple() does early exit without setting
> any error state.  In user side it appears as PQisBusy() returned
> with TRUE result.  All pointers stay valid, so callback can just
> stuff them into some temp area.
...
> It's included in main patch, but I also attached it as separate patch
> so that it can be examined separately and reverted if not acceptable.

This patch is based on the patch above and composed in the same
manner: the three main patches include all modifications, and the '2'
patch is attached separately.

This patch is not rebased to the HEAD because the HEAD yields
an error about the symbol LEAKPROOF...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/#fe-protocol3.c# b/src/interfaces/libpq/#fe-protocol3.c#
new file mode 100644
index 0000000..8b7eed2
--- /dev/null
+++ b/src/interfaces/libpq/#fe-protocol3.c#
@@ -0,0 +1,1967 @@
+/*-------------------------------------------------------------------------
+ *
+ * fe-protocol3.c
+ *      functions that are specific to frontend/backend protocol version 3
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *      src/interfaces/libpq/fe-protocol3.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <ctype.h>
+#include <fcntl.h>
+
+#include "libpq-fe.h"
+#include "libpq-int.h"
+
+#include "mb/pg_wchar.h"
+
+#ifdef WIN32
+#include "win32.h"
+#else
+#include <unistd.h>
+#include <netinet/in.h>
+#ifdef HAVE_NETINET_TCP_H
+#include <netinet/tcp.h>
+#endif
+#include <arpa/inet.h>
+#endif
+
+
+/*
+ * This macro lists the backend message types that could be "long" (more
+ * than a couple of kilobytes).
+ */
+#define VALID_LONG_MESSAGE_TYPE(id) \
+    ((id) == 'T' || (id) == 'D' || (id) == 'd' || (id) == 'V' || \
+     (id) == 'E' || (id) == 'N' || (id) == 'A')
+
+
+static void handleSyncLoss(PGconn *conn, char id, int msgLength);
+static int    getRowDescriptions(PGconn *conn);
+static int    getParamDescriptions(PGconn *conn);
+static int    getAnotherTuple(PGconn *conn, int msgLength);
+static int    getParameterStatus(PGconn *conn);
+static int    getNotify(PGconn *conn);
+static int    getCopyStart(PGconn *conn, ExecStatusType copytype);
+static int    getReadyForQuery(PGconn *conn);
+static void reportErrorPosition(PQExpBuffer msg, const char *query,
+                    int loc, int encoding);
+static int build_startup_packet(const PGconn *conn, char *packet,
+                     const PQEnvironmentOption *options);
+
+
+/*
+ * parseInput: if appropriate, parse input data from backend
+ * until input is exhausted or a stopping state is reached.
+ * Note that this function will NOT attempt to read more data from the backend.
+ */
+void
+pqParseInput3(PGconn *conn)
+{
+    char        id;
+    int            msgLength;
+    int            avail;
+
+    /*
+     * Loop to parse successive complete messages available in the buffer.
+     */
+    for (;;)
+    {
+        /*
+         * Try to read a message.  First get the type code and length. Return
+         * if not enough data.
+         */
+        conn->inCursor = conn->inStart;
+        if (pqGetc(&id, conn))
+            return;
+        if (pqGetInt(&msgLength, 4, conn))
+            return;
+
+        /*
+         * Try to validate message type/length here.  A length less than 4 is
+         * definitely broken.  Large lengths should only be believed for a few
+         * message types.
+         */
+        if (msgLength < 4)
+        {
+            handleSyncLoss(conn, id, msgLength);
+            return;
+        }
+        if (msgLength > 30000 && !VALID_LONG_MESSAGE_TYPE(id))
+        {
+            handleSyncLoss(conn, id, msgLength);
+            return;
+        }
+
+        /*
+         * Can't process if message body isn't all here yet.
+         */
+        msgLength -= 4;
+        avail = conn->inEnd - conn->inCursor;
+        if (avail < msgLength)
+        {
+            /*
+             * Before returning, enlarge the input buffer if needed to hold
+             * the whole message.  This is better than leaving it to
+             * pqReadData because we can avoid multiple cycles of realloc()
+             * when the message is large; also, we can implement a reasonable
+             * recovery strategy if we are unable to make the buffer big
+             * enough.
+             */
+            if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
+                                     conn))
+            {
+                /*
+                 * XXX add some better recovery code... plan is to skip over
+                 * the message using its length, then report an error. For the
+                 * moment, just treat this like loss of sync (which indeed it
+                 * might be!)
+                 */
+                handleSyncLoss(conn, id, msgLength);
+            }
+            return;
+        }
+
+        /*
+         * NOTIFY and NOTICE messages can happen in any state; always process
+         * them right away.
+         *
+         * Most other messages should only be processed while in BUSY state.
+         * (In particular, in READY state we hold off further parsing until
+         * the application collects the current PGresult.)
+         *
+         * However, if the state is IDLE then we got trouble; we need to deal
+         * with the unexpected message somehow.
+         *
+         * ParameterStatus ('S') messages are a special case: in IDLE state we
+         * must process 'em (this case could happen if a new value was adopted
+         * from config file due to SIGHUP), but otherwise we hold off until
+         * BUSY state.
+         */
+        if (id == 'A')
+        {
+            if (getNotify(conn))
+                return;
+        }
+        else if (id == 'N')
+        {
+            if (pqGetErrorNotice3(conn, false))
+                return;
+        }
+        else if (conn->asyncStatus != PGASYNC_BUSY)
+        {
+            /* If not IDLE state, just wait ... */
+            if (conn->asyncStatus != PGASYNC_IDLE)
+                return;
+
+            /*
+             * Unexpected message in IDLE state; need to recover somehow.
+             * ERROR messages are displayed using the notice processor;
+             * ParameterStatus is handled normally; anything else is just
+             * dropped on the floor after displaying a suitable warning
+             * notice.    (An ERROR is very possibly the backend telling us why
+             * it is about to close the connection, so we don't want to just
+             * discard it...)
+             */
+            if (id == 'E')
+            {
+                if (pqGetErrorNotice3(conn, false /* treat as notice */ ))
+                    return;
+            }
+            else if (id == 'S')
+            {
+                if (getParameterStatus(conn))
+                    return;
+            }
+            else
+            {
+                pqInternalNotice(&conn->noticeHooks,
+                        "message type 0x%02x arrived from server while idle",
+                                 id);
+                /* Discard the unexpected message */
+                conn->inCursor += msgLength;
+            }
+        }
+        else
+        {
+            /*
+             * In BUSY state, we can process everything.
+             */
+            switch (id)
+            {
+                case 'C':        /* command complete */
+                    if (pqGets(&conn->workBuffer, conn))
+                        return;
+                    if (conn->result == NULL)
+                    {
+                        conn->result = PQmakeEmptyPGresult(conn,
+                                                           PGRES_COMMAND_OK);
+                        if (!conn->result)
+                            return;
+                    }
+                    strncpy(conn->result->cmdStatus, conn->workBuffer.data,
+                            CMDSTATUS_LEN);
+                    conn->asyncStatus = PGASYNC_READY;
+                    break;
+                case 'E':        /* error return */
+                    if (pqGetErrorNotice3(conn, true))
+                        return;
+                    conn->asyncStatus = PGASYNC_READY;
+                    break;
+                case 'Z':        /* backend is ready for new query */
+                    if (getReadyForQuery(conn))
+                        return;
+                    conn->asyncStatus = PGASYNC_IDLE;
+                    break;
+                case 'I':        /* empty query */
+                    if (conn->result == NULL)
+                    {
+                        conn->result = PQmakeEmptyPGresult(conn,
+                                                           PGRES_EMPTY_QUERY);
+                        if (!conn->result)
+                            return;
+                    }
+                    conn->asyncStatus = PGASYNC_READY;
+                    break;
+                case '1':        /* Parse Complete */
+                    /* If we're doing PQprepare, we're done; else ignore */
+                    if (conn->queryclass == PGQUERY_PREPARE)
+                    {
+                        if (conn->result == NULL)
+                        {
+                            conn->result = PQmakeEmptyPGresult(conn,
+                                                           PGRES_COMMAND_OK);
+                            if (!conn->result)
+                                return;
+                        }
+                        conn->asyncStatus = PGASYNC_READY;
+                    }
+                    break;
+                case '2':        /* Bind Complete */
+                case '3':        /* Close Complete */
+                    /* Nothing to do for these message types */
+                    break;
+                case 'S':        /* parameter status */
+                    if (getParameterStatus(conn))
+                        return;
+                    break;
+                case 'K':        /* secret key data from the backend */
+
+                    /*
+                     * This is expected only during backend startup, but it's
+                     * just as easy to handle it as part of the main loop.
+                     * Save the data and continue processing.
+                     */
+                    if (pqGetInt(&(conn->be_pid), 4, conn))
+                        return;
+                    if (pqGetInt(&(conn->be_key), 4, conn))
+                        return;
+                    break;
+                case 'T':        /* Row Description */
+                    if (conn->result == NULL ||
+                        conn->queryclass == PGQUERY_DESCRIBE)
+                    {
+                        /* First 'T' in a query sequence */
+                        if (getRowDescriptions(conn))
+                            return;
+
+                        /*
+                         * If we're doing a Describe, we're ready to pass the
+                         * result back to the client.
+                         */
+                        if (conn->queryclass == PGQUERY_DESCRIBE)
+                            conn->asyncStatus = PGASYNC_READY;
+                    }
+                    else
+                    {
+                        /*
+                         * A new 'T' message is treated as the start of
+                         * another PGresult.  (It is not clear that this is
+                         * really possible with the current backend.) We stop
+                         * parsing until the application accepts the current
+                         * result.
+                         */
+                        conn->asyncStatus = PGASYNC_READY;
+                        return;
+                    }
+                    break;
+                case 'n':        /* No Data */
+
+                    /*
+                     * NoData indicates that we will not be seeing a
+                     * RowDescription message because the statement or portal
+                     * inquired about doesn't return rows.
+                     *
+                     * If we're doing a Describe, we have to pass something
+                     * back to the client, so set up a COMMAND_OK result,
+                     * instead of TUPLES_OK.  Otherwise we can just ignore
+                     * this message.
+                     */
+                    if (conn->queryclass == PGQUERY_DESCRIBE)
+                    {
+                        if (conn->result == NULL)
+                        {
+                            conn->result = PQmakeEmptyPGresult(conn,
+                                                           PGRES_COMMAND_OK);
+                            if (!conn->result)
+                                return;
+                        }
+                        conn->asyncStatus = PGASYNC_READY;
+                    }
+                    break;
+                case 't':        /* Parameter Description */
+                    if (getParamDescriptions(conn))
+                        return;
+                    break;
+                case 'D':        /* Data Row */
+                    if (conn->result != NULL &&
+                        conn->result->resultStatus == PGRES_TUPLES_OK)
+                    {
+                        /* Read another tuple of a normal query response */
+                        if (getAnotherTuple(conn, msgLength))
+                            return;
+                    }
+                    else if (conn->result != NULL &&
+                             conn->result->resultStatus == PGRES_FATAL_ERROR)
+                    {
+                        /*
+                         * We've already choked for some reason.  Just discard
+                         * tuples till we get to the end of the query.
+                         */
+                        conn->inCursor += msgLength;
+                    }
+                    else
+                    {
+                        /* Set up to report error at end of query */
+                        printfPQExpBuffer(&conn->errorMessage,
+                                          libpq_gettext("server sent data (\"D\" message) without prior row description (\"T\" message)\n"));
+                        pqSaveErrorResult(conn);
+                        /* Discard the unexpected message */
+                        conn->inCursor += msgLength;
+                    }
+                    break;
+                case 'G':        /* Start Copy In */
+                    if (getCopyStart(conn, PGRES_COPY_IN))
+                        return;
+                    conn->asyncStatus = PGASYNC_COPY_IN;
+                    break;
+                case 'H':        /* Start Copy Out */
+                    if (getCopyStart(conn, PGRES_COPY_OUT))
+                        return;
+                    conn->asyncStatus = PGASYNC_COPY_OUT;
+                    conn->copy_already_done = 0;
+                    break;
+                case 'W':        /* Start Copy Both */
+                    if (getCopyStart(conn, PGRES_COPY_BOTH))
+                        return;
+                    conn->asyncStatus = PGASYNC_COPY_BOTH;
+                    conn->copy_already_done = 0;
+                    break;
+                case 'd':        /* Copy Data */
+
+                    /*
+                     * If we see Copy Data, just silently drop it.    This would
+                     * only occur if application exits COPY OUT mode too
+                     * early.
+                     */
+                    conn->inCursor += msgLength;
+                    break;
+                case 'c':        /* Copy Done */
+
+                    /*
+                     * If we see Copy Done, just silently drop it.    This is
+                     * the normal case during PQendcopy.  We will keep
+                     * swallowing data, expecting to see command-complete for
+                     * the COPY command.
+                     */
+                    break;
+                default:
+                    printfPQExpBuffer(&conn->errorMessage,
+                                      libpq_gettext(
+                                                    "unexpected response from server; first received character was \"%c\"\n"),
+                                      id);
+                    /* build an error result holding the error message */
+                    pqSaveErrorResult(conn);
+                    /* not sure if we will see more, so go to ready state */
+                    conn->asyncStatus = PGASYNC_READY;
+                    /* Discard the unexpected message */
+                    conn->inCursor += msgLength;
+                    break;
+            }                    /* switch on protocol character */
+        }
+        /* Successfully consumed this message */
+        if (conn->inCursor == conn->inStart + 5 + msgLength)
+        {
+            /* Normal case: parsing agrees with specified length */
+            conn->inStart = conn->inCursor;
+        }
+        else
+        {
+            /* Trouble --- report it */
+            printfPQExpBuffer(&conn->errorMessage,
+                              libpq_gettext("message contents do not agree with length in message type \"%c\"\n"),
+                              id);
+            /* build an error result holding the error message */
+            pqSaveErrorResult(conn);
+            conn->asyncStatus = PGASYNC_READY;
+            /* trust the specified message length as what to skip */
+            conn->inStart += 5 + msgLength;
+        }
+    }
+}
+
+/*
+ * handleSyncLoss: clean up after loss of message-boundary sync
+ *
+ * There isn't really a lot we can do here except abandon the connection.
+ */
+static void
+handleSyncLoss(PGconn *conn, char id, int msgLength)
+{
+    printfPQExpBuffer(&conn->errorMessage,
+                      libpq_gettext(
+    "lost synchronization with server: got message type \"%c\", length %d\n"),
+                      id, msgLength);
+    /* build an error result holding the error message */
+    pqSaveErrorResult(conn);
+    conn->asyncStatus = PGASYNC_READY;    /* drop out of GetResult wait loop */
+
+    pqsecure_close(conn);
+    closesocket(conn->sock);
+    conn->sock = -1;
+    conn->status = CONNECTION_BAD;        /* No more connection to backend */
+}
+
+/*
+ * parseInput subroutine to read a 'T' (row descriptions) message.
+ * We'll build a new PGresult structure (unless called for a Describe
+ * command for a prepared statement) containing the attribute data.
+ * Returns: 0 if completed message, EOF if not enough data yet.
+ *
+ * Note that if we run out of data, we have to release the partially
+ * constructed PGresult, and rebuild it again next time.  Fortunately,
+ * that shouldn't happen often, since 'T' messages usually fit in a packet.
+ */
+static int
+getRowDescriptions(PGconn *conn)
+{
+    PGresult   *result;
+    int            nfields;
+    int            i;
+
+    /*
+     * When doing Describe for a prepared statement, there'll already be a
+     * PGresult created by getParamDescriptions, and we should fill data into
+     * that.  Otherwise, create a new, empty PGresult.
+     */
+    if (conn->queryclass == PGQUERY_DESCRIBE)
+    {
+        if (conn->result)
+            result = conn->result;
+        else
+            result = PQmakeEmptyPGresult(conn, PGRES_COMMAND_OK);
+    }
+    else
+        result = PQmakeEmptyPGresult(conn, PGRES_TUPLES_OK);
+    if (!result)
+        goto failure;
+
+    /* parseInput already read the 'T' label and message length. */
+    /* the next two bytes are the number of fields */
+    if (pqGetInt(&(result->numAttributes), 2, conn))
+        goto failure;
+    nfields = result->numAttributes;
+
+    /* allocate space for the attribute descriptors */
+    if (nfields > 0)
+    {
+        result->attDescs = (PGresAttDesc *)
+            pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
+        if (!result->attDescs)
+            goto failure;
+        MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
+    }
+
+    /* result->binary is true only if ALL columns are binary */
+    result->binary = (nfields > 0) ? 1 : 0;
+
+    /* get type info */
+    for (i = 0; i < nfields; i++)
+    {
+        int            tableid;
+        int            columnid;
+        int            typid;
+        int            typlen;
+        int            atttypmod;
+        int            format;
+
+        if (pqGets(&conn->workBuffer, conn) ||
+            pqGetInt(&tableid, 4, conn) ||
+            pqGetInt(&columnid, 2, conn) ||
+            pqGetInt(&typid, 4, conn) ||
+            pqGetInt(&typlen, 2, conn) ||
+            pqGetInt(&atttypmod, 4, conn) ||
+            pqGetInt(&format, 2, conn))
+        {
+            goto failure;
+        }
+
+        /*
+         * Since pqGetInt treats 2-byte integers as unsigned, we need to
+         * coerce these results to signed form.
+         */
+        columnid = (int) ((int16) columnid);
+        typlen = (int) ((int16) typlen);
+        format = (int) ((int16) format);
+
+        result->attDescs[i].name = pqResultStrdup(result,
+                                                  conn->workBuffer.data);
+        if (!result->attDescs[i].name)
+            goto failure;
+        result->attDescs[i].tableid = tableid;
+        result->attDescs[i].columnid = columnid;
+        result->attDescs[i].format = format;
+        result->attDescs[i].typid = typid;
+        result->attDescs[i].typlen = typlen;
+        result->attDescs[i].atttypmod = atttypmod;
+
+        if (format != 1)
+            result->binary = 0;
+    }
+
+    /* Success! */
+    conn->result = result;
+    return 0;
+
+failure:
+
+    /*
+     * Discard incomplete result, unless it's from getParamDescriptions.
+     *
+     * Note that if we hit a bufferload boundary while handling the
+     * describe-statement case, we'll forget any PGresult space we just
+     * allocated, and then reallocate it on next try.  This will bloat the
+     * PGresult a little bit but the space will be freed at PQclear, so it
+     * doesn't seem worth trying to be smarter.
+     */
+    if (result != conn->result)
+        PQclear(result);
+    return EOF;
+}
+
+/*
+ * parseInput subroutine to read a 't' (ParameterDescription) message.
+ * We'll build a new PGresult structure containing the parameter data.
+ * Returns: 0 if completed message, EOF if not enough data yet.
+ *
+ * Note that if we run out of data, we have to release the partially
+ * constructed PGresult, and rebuild it again next time.  Fortunately,
+ * that shouldn't happen often, since 't' messages usually fit in a packet.
+ */
+static int
+getParamDescriptions(PGconn *conn)
+{
+    PGresult   *result;
+    int            nparams;
+    int            i;
+
+    result = PQmakeEmptyPGresult(conn, PGRES_COMMAND_OK);
+    if (!result)
+        goto failure;
+
+    /* parseInput already read the 't' label and message length. */
+    /* the next two bytes are the number of parameters */
+    if (pqGetInt(&(result->numParameters), 2, conn))
+        goto failure;
+    nparams = result->numParameters;
+
+    /* allocate space for the parameter descriptors */
+    if (nparams > 0)
+    {
+        result->paramDescs = (PGresParamDesc *)
+            pqResultAlloc(result, nparams * sizeof(PGresParamDesc), TRUE);
+        if (!result->paramDescs)
+            goto failure;
+        MemSet(result->paramDescs, 0, nparams * sizeof(PGresParamDesc));
+    }
+
+    /* get parameter info */
+    for (i = 0; i < nparams; i++)
+    {
+        int            typid;
+
+        if (pqGetInt(&typid, 4, conn))
+            goto failure;
+        result->paramDescs[i].typid = typid;
+    }
+
+    /* Success! */
+    conn->result = result;
+    return 0;
+
+failure:
+    PQclear(result);
+    return EOF;
+}
+
+/*
+ * parseInput subroutine to read a 'D' (row data) message.
+ * We add another tuple to the existing PGresult structure.
+ * Returns: 0 if completed message, EOF if error or not enough data yet.
+ *
+ * Note that if we run out of data, we have to suspend and reprocess
+ * the message after more data is received.  We keep a partially constructed
+ * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ */
+static int
+getAnotherTuple(PGconn *conn, int msgLength)
+{
+    PGresult   *result = conn->result;
+    int            nfields = result->numAttributes;
+    PGrowValue  rowval[result->numAttributes + 1];
+    int            tupnfields;        /* # fields from tuple */
+    int            vlen;            /* length of the current field value */
+    int            i;
+
+    /* Allocate tuple space if first time for this data message */
+    /* Get the field count and make sure it's what we expect */
+    if (pqGetInt(&tupnfields, 2, conn))
+        return EOF;
+
+    if (tupnfields != nfields)
+    {
+        /* Replace partially constructed result with an error result */
+        printfPQExpBuffer(&conn->errorMessage,
+                 libpq_gettext("unexpected field count in \"D\" message\n"));
+        pqSaveErrorResult(conn);
+        /* Discard the failed message by pretending we read it */
+        conn->inCursor = conn->inStart + 5 + msgLength;
+        return 0;
+    }
+
+    /* Scan the fields */
+    for (i = 0; i < nfields; i++)
+    {
+        /* get the value length */
+        if (pqGetInt(&vlen, 4, conn))
+            return EOF;
+        if (vlen == -1)
+            vlen = NULL_LEN;
+        else if (vlen < 0)
+            vlen = 0;
+
+        /*
+         * The buffer contents may be shifted when additional data is
+         * loaded, so all pointers must be set again on every scan.
+         *
+         * For safe access, rowval[i].value always points to the address
+         * just past the length field, even if the value is zero-length
+         * or NULL.
+         */
+        rowval[i].value = conn->inBuffer + conn->inCursor;
+        rowval[i].len = vlen;
+
+        /* Skip to the next length field */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            return EOF;
+    }
+
+    /*
+     * Set rowval[nfields] as an end marker for safe access.  The size of
+     * the buffer needed to store the row can be estimated as
+     *
+     *    rowval[nfields].value - rowval[0].value - 4 * nfields.
+     */
+    rowval[nfields].value = conn->inBuffer + conn->inCursor;
+    rowval[nfields].len = NULL_LEN;
+
+    /* Success!  Pass the completed row values to rowProcessor */
+    if (!result->rowProcessor(result, result->rowProcessorParam, rowval))
+        goto rowProcessError;
+    
+    /* Free garbage error message. */
+    if (result->rowProcessorErrMes)
+    {
+        free(result->rowProcessorErrMes);
+        result->rowProcessorErrMes = NULL;
+    }
+
+    return 0;
+
+rowProcessError:
+
+    /*
+     * Replace partially constructed result with an error result. First
+     * discard the old result to try to win back some memory.
+     */
+    pqClearAsyncResult(conn);
+    resetPQExpBuffer(&conn->errorMessage);
+
+    /*
+     * If the row processor passed back an error message, set it into
+     * PGconn; otherwise assume out of memory.
+     */
+    appendPQExpBufferStr(&conn->errorMessage,
+                         libpq_gettext(result->rowProcessorErrMes ?
+                                       result->rowProcessorErrMes : 
+                                       "out of memory for query result\n"));
+    if (result->rowProcessorErrMes)
+    {
+        free(result->rowProcessorErrMes);
+        result->rowProcessorErrMes = NULL;
+    }
+    pqSaveErrorResult(conn);
+
+    /* Discard the failed message by pretending we read it */
+    conn->inCursor = conn->inStart + 5 + msgLength;
+    return 0;
+}
+
+
+/*
+ * Attempt to read an Error or Notice response message.
+ * This is possible in several places, so we break it out as a subroutine.
+ * Entry: 'E' or 'N' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed message.
+ *         returns EOF if not enough data.
+ */
+int
+pqGetErrorNotice3(PGconn *conn, bool isError)
+{
+    PGresult   *res = NULL;
+    PQExpBufferData workBuf;
+    char        id;
+    const char *val;
+    const char *querytext = NULL;
+    int            querypos = 0;
+
+    /*
+     * Since the fields might be pretty long, we create a temporary
+     * PQExpBuffer rather than using conn->workBuffer.    workBuffer is intended
+     * for stuff that is expected to be short.    We shouldn't use
+     * conn->errorMessage either, since this might be only a notice.
+     */
+    initPQExpBuffer(&workBuf);
+
+    /*
+     * Make a PGresult to hold the accumulated fields.    We temporarily lie
+     * about the result status, so that PQmakeEmptyPGresult doesn't uselessly
+     * copy conn->errorMessage.
+     */
+    res = PQmakeEmptyPGresult(conn, PGRES_EMPTY_QUERY);
+    if (!res)
+        goto fail;
+    res->resultStatus = isError ? PGRES_FATAL_ERROR : PGRES_NONFATAL_ERROR;
+
+    /*
+     * Read the fields and save into res.
+     */
+    for (;;)
+    {
+        if (pqGetc(&id, conn))
+            goto fail;
+        if (id == '\0')
+            break;                /* terminator found */
+        if (pqGets(&workBuf, conn))
+            goto fail;
+        pqSaveMessageField(res, id, workBuf.data);
+    }
+
+    /*
+     * Now build the "overall" error message for PQresultErrorMessage.
+     *
+     * Also, save the SQLSTATE in conn->last_sqlstate.
+     */
+    resetPQExpBuffer(&workBuf);
+    val = PQresultErrorField(res, PG_DIAG_SEVERITY);
+    if (val)
+        appendPQExpBuffer(&workBuf, "%s:  ", val);
+    val = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+    if (val)
+    {
+        if (strlen(val) < sizeof(conn->last_sqlstate))
+            strcpy(conn->last_sqlstate, val);
+        if (conn->verbosity == PQERRORS_VERBOSE)
+            appendPQExpBuffer(&workBuf, "%s: ", val);
+    }
+    val = PQresultErrorField(res, PG_DIAG_MESSAGE_PRIMARY);
+    if (val)
+        appendPQExpBufferStr(&workBuf, val);
+    val = PQresultErrorField(res, PG_DIAG_STATEMENT_POSITION);
+    if (val)
+    {
+        if (conn->verbosity != PQERRORS_TERSE && conn->last_query != NULL)
+        {
+            /* emit position as a syntax cursor display */
+            querytext = conn->last_query;
+            querypos = atoi(val);
+        }
+        else
+        {
+            /* emit position as text addition to primary message */
+            /* translator: %s represents a digit string */
+            appendPQExpBuffer(&workBuf, libpq_gettext(" at character %s"),
+                              val);
+        }
+    }
+    else
+    {
+        val = PQresultErrorField(res, PG_DIAG_INTERNAL_POSITION);
+        if (val)
+        {
+            querytext = PQresultErrorField(res, PG_DIAG_INTERNAL_QUERY);
+            if (conn->verbosity != PQERRORS_TERSE && querytext != NULL)
+            {
+                /* emit position as a syntax cursor display */
+                querypos = atoi(val);
+            }
+            else
+            {
+                /* emit position as text addition to primary message */
+                /* translator: %s represents a digit string */
+                appendPQExpBuffer(&workBuf, libpq_gettext(" at character %s"),
+                                  val);
+            }
+        }
+    }
+    appendPQExpBufferChar(&workBuf, '\n');
+    if (conn->verbosity != PQERRORS_TERSE)
+    {
+        if (querytext && querypos > 0)
+            reportErrorPosition(&workBuf, querytext, querypos,
+                                conn->client_encoding);
+        val = PQresultErrorField(res, PG_DIAG_MESSAGE_DETAIL);
+        if (val)
+            appendPQExpBuffer(&workBuf, libpq_gettext("DETAIL:  %s\n"), val);
+        val = PQresultErrorField(res, PG_DIAG_MESSAGE_HINT);
+        if (val)
+            appendPQExpBuffer(&workBuf, libpq_gettext("HINT:  %s\n"), val);
+        val = PQresultErrorField(res, PG_DIAG_INTERNAL_QUERY);
+        if (val)
+            appendPQExpBuffer(&workBuf, libpq_gettext("QUERY:  %s\n"), val);
+        val = PQresultErrorField(res, PG_DIAG_CONTEXT);
+        if (val)
+            appendPQExpBuffer(&workBuf, libpq_gettext("CONTEXT:  %s\n"), val);
+    }
+    if (conn->verbosity == PQERRORS_VERBOSE)
+    {
+        const char *valf;
+        const char *vall;
+
+        valf = PQresultErrorField(res, PG_DIAG_SOURCE_FILE);
+        vall = PQresultErrorField(res, PG_DIAG_SOURCE_LINE);
+        val = PQresultErrorField(res, PG_DIAG_SOURCE_FUNCTION);
+        if (val || valf || vall)
+        {
+            appendPQExpBufferStr(&workBuf, libpq_gettext("LOCATION:  "));
+            if (val)
+                appendPQExpBuffer(&workBuf, libpq_gettext("%s, "), val);
+            if (valf && vall)    /* unlikely we'd have just one */
+                appendPQExpBuffer(&workBuf, libpq_gettext("%s:%s"),
+                                  valf, vall);
+            appendPQExpBufferChar(&workBuf, '\n');
+        }
+    }
+
+    /*
+     * Either save error as current async result, or just emit the notice.
+     */
+    if (isError)
+    {
+        res->errMsg = pqResultStrdup(res, workBuf.data);
+        if (!res->errMsg)
+            goto fail;
+        pqClearAsyncResult(conn);
+        conn->result = res;
+        appendPQExpBufferStr(&conn->errorMessage, workBuf.data);
+    }
+    else
+    {
+        /* We can cheat a little here and not copy the message. */
+        res->errMsg = workBuf.data;
+        if (res->noticeHooks.noticeRec != NULL)
+            (*res->noticeHooks.noticeRec) (res->noticeHooks.noticeRecArg, res);
+        PQclear(res);
+    }
+
+    termPQExpBuffer(&workBuf);
+    return 0;
+
+fail:
+    PQclear(res);
+    termPQExpBuffer(&workBuf);
+    return EOF;
+}
+
+/*
+ * Add an error-location display to the error message under construction.
+ *
+ * The cursor location is measured in logical characters; the query string
+ * is presumed to be in the specified encoding.
+ */
+static void
+reportErrorPosition(PQExpBuffer msg, const char *query, int loc, int encoding)
+{
+#define DISPLAY_SIZE    60        /* screen width limit, in screen cols */
+#define MIN_RIGHT_CUT    10        /* try to keep this far away from EOL */
+
+    char       *wquery;
+    int            slen,
+                cno,
+                i,
+               *qidx,
+               *scridx,
+                qoffset,
+                scroffset,
+                ibeg,
+                iend,
+                loc_line;
+    bool        mb_encoding,
+                beg_trunc,
+                end_trunc;
+
+    /* Convert loc from 1-based to 0-based; no-op if out of range */
+    loc--;
+    if (loc < 0)
+        return;
+
+    /* Need a writable copy of the query */
+    wquery = strdup(query);
+    if (wquery == NULL)
+        return;                    /* fail silently if out of memory */
+
+    /*
+     * Each character might occupy multiple physical bytes in the string, and
+     * in some Far Eastern character sets it might take more than one screen
+     * column as well.    We compute the starting byte offset and starting
+     * screen column of each logical character, and store these in qidx[] and
+     * scridx[] respectively.
+     */
+
+    /* we need a safe allocation size... */
+    slen = strlen(wquery) + 1;
+
+    qidx = (int *) malloc(slen * sizeof(int));
+    if (qidx == NULL)
+    {
+        free(wquery);
+        return;
+    }
+    scridx = (int *) malloc(slen * sizeof(int));
+    if (scridx == NULL)
+    {
+        free(qidx);
+        free(wquery);
+        return;
+    }
+
+    /* We can optimize a bit if it's a single-byte encoding */
+    mb_encoding = (pg_encoding_max_length(encoding) != 1);
+
+    /*
+     * Within the scanning loop, cno is the current character's logical
+     * number, qoffset is its offset in wquery, and scroffset is its starting
+     * logical screen column (all indexed from 0).    "loc" is the logical
+     * character number of the error location.    We scan to determine loc_line
+     * (the 1-based line number containing loc) and ibeg/iend (first character
+     * number and last+1 character number of the line containing loc). Note
+     * that qidx[] and scridx[] are filled only as far as iend.
+     */
+    qoffset = 0;
+    scroffset = 0;
+    loc_line = 1;
+    ibeg = 0;
+    iend = -1;                    /* -1 means not set yet */
+
+    for (cno = 0; wquery[qoffset] != '\0'; cno++)
+    {
+        char        ch = wquery[qoffset];
+
+        qidx[cno] = qoffset;
+        scridx[cno] = scroffset;
+
+        /*
+         * Replace tabs with spaces in the writable copy.  (Later we might
+         * want to think about coping with their variable screen width, but
+         * not today.)
+         */
+        if (ch == '\t')
+            wquery[qoffset] = ' ';
+
+        /*
+         * If end-of-line, count lines and mark positions. Each \r or \n
+         * counts as a line except when \r \n appear together.
+         */
+        else if (ch == '\r' || ch == '\n')
+        {
+            if (cno < loc)
+            {
+                if (ch == '\r' ||
+                    cno == 0 ||
+                    wquery[qidx[cno - 1]] != '\r')
+                    loc_line++;
+                /* extract beginning = last line start before loc. */
+                ibeg = cno + 1;
+            }
+            else
+            {
+                /* set extract end. */
+                iend = cno;
+                /* done scanning. */
+                break;
+            }
+        }
+
+        /* Advance */
+        if (mb_encoding)
+        {
+            int            w;
+
+            w = pg_encoding_dsplen(encoding, &wquery[qoffset]);
+            /* treat any non-tab control chars as width 1 */
+            if (w <= 0)
+                w = 1;
+            scroffset += w;
+            qoffset += pg_encoding_mblen(encoding, &wquery[qoffset]);
+        }
+        else
+        {
+            /* We assume wide chars only exist in multibyte encodings */
+            scroffset++;
+            qoffset++;
+        }
+    }
+    /* Fix up if we didn't find an end-of-line after loc */
+    if (iend < 0)
+    {
+        iend = cno;                /* query length in chars, +1 */
+        qidx[iend] = qoffset;
+        scridx[iend] = scroffset;
+    }
+
+    /* Print only if loc is within computed query length */
+    if (loc <= cno)
+    {
+        /* If the line extracted is too long, we truncate it. */
+        beg_trunc = false;
+        end_trunc = false;
+        if (scridx[iend] - scridx[ibeg] > DISPLAY_SIZE)
+        {
+            /*
+             * We first truncate right if it is enough.  This code might be
+             * off a space or so on enforcing MIN_RIGHT_CUT if there's a wide
+             * character right there, but that should be okay.
+             */
+            if (scridx[ibeg] + DISPLAY_SIZE >= scridx[loc] + MIN_RIGHT_CUT)
+            {
+                while (scridx[iend] - scridx[ibeg] > DISPLAY_SIZE)
+                    iend--;
+                end_trunc = true;
+            }
+            else
+            {
+                /* Truncate right if not too close to loc. */
+                while (scridx[loc] + MIN_RIGHT_CUT < scridx[iend])
+                {
+                    iend--;
+                    end_trunc = true;
+                }
+
+                /* Truncate left if still too long. */
+                while (scridx[iend] - scridx[ibeg] > DISPLAY_SIZE)
+                {
+                    ibeg++;
+                    beg_trunc = true;
+                }
+            }
+        }
+
+        /* truncate working copy at desired endpoint */
+        wquery[qidx[iend]] = '\0';
+
+        /* Begin building the finished message. */
+        i = msg->len;
+        appendPQExpBuffer(msg, libpq_gettext("LINE %d: "), loc_line);
+        if (beg_trunc)
+            appendPQExpBufferStr(msg, "...");
+
+        /*
+         * While we have the prefix in the msg buffer, compute its screen
+         * width.
+         */
+        scroffset = 0;
+        for (; i < msg->len; i += pg_encoding_mblen(encoding, &msg->data[i]))
+        {
+            int            w = pg_encoding_dsplen(encoding, &msg->data[i]);
+
+            if (w <= 0)
+                w = 1;
+            scroffset += w;
+        }
+
+        /* Finish up the LINE message line. */
+        appendPQExpBufferStr(msg, &wquery[qidx[ibeg]]);
+        if (end_trunc)
+            appendPQExpBufferStr(msg, "...");
+        appendPQExpBufferChar(msg, '\n');
+
+        /* Now emit the cursor marker line. */
+        scroffset += scridx[loc] - scridx[ibeg];
+        for (i = 0; i < scroffset; i++)
+            appendPQExpBufferChar(msg, ' ');
+        appendPQExpBufferChar(msg, '^');
+        appendPQExpBufferChar(msg, '\n');
+    }
+
+    /* Clean up. */
+    free(scridx);
+    free(qidx);
+    free(wquery);
+}
+
+
+/*
+ * Attempt to read a ParameterStatus message.
+ * This is possible in several places, so we break it out as a subroutine.
+ * Entry: 'S' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed message.
+ *         returns EOF if not enough data.
+ */
+static int
+getParameterStatus(PGconn *conn)
+{
+    PQExpBufferData valueBuf;
+
+    /* Get the parameter name */
+    if (pqGets(&conn->workBuffer, conn))
+        return EOF;
+    /* Get the parameter value (could be large) */
+    initPQExpBuffer(&valueBuf);
+    if (pqGets(&valueBuf, conn))
+    {
+        termPQExpBuffer(&valueBuf);
+        return EOF;
+    }
+    /* And save it */
+    pqSaveParameterStatus(conn, conn->workBuffer.data, valueBuf.data);
+    termPQExpBuffer(&valueBuf);
+    return 0;
+}
+
+
+/*
+ * Attempt to read a Notify response message.
+ * This is possible in several places, so we break it out as a subroutine.
+ * Entry: 'A' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed Notify message.
+ *         returns EOF if not enough data.
+ */
+static int
+getNotify(PGconn *conn)
+{
+    int            be_pid;
+    char       *svname;
+    int            nmlen;
+    int            extralen;
+    PGnotify   *newNotify;
+
+    if (pqGetInt(&be_pid, 4, conn))
+        return EOF;
+    if (pqGets(&conn->workBuffer, conn))
+        return EOF;
+    /* must save name while getting extra string */
+    svname = strdup(conn->workBuffer.data);
+    if (!svname)
+        return EOF;
+    if (pqGets(&conn->workBuffer, conn))
+    {
+        free(svname);
+        return EOF;
+    }
+
+    /*
+     * Store the strings right after the PGnotify structure so it can all be
+     * freed at once.  We don't use NAMEDATALEN because we don't want to tie
+     * this interface to a specific server name length.
+     */
+    nmlen = strlen(svname);
+    extralen = strlen(conn->workBuffer.data);
+    newNotify = (PGnotify *) malloc(sizeof(PGnotify) + nmlen + extralen + 2);
+    if (newNotify)
+    {
+        newNotify->relname = (char *) newNotify + sizeof(PGnotify);
+        strcpy(newNotify->relname, svname);
+        newNotify->extra = newNotify->relname + nmlen + 1;
+        strcpy(newNotify->extra, conn->workBuffer.data);
+        newNotify->be_pid = be_pid;
+        newNotify->next = NULL;
+        if (conn->notifyTail)
+            conn->notifyTail->next = newNotify;
+        else
+            conn->notifyHead = newNotify;
+        conn->notifyTail = newNotify;
+    }
+
+    free(svname);
+    return 0;
+}
+
+/*
+ * getCopyStart - process CopyInResponse, CopyOutResponse or
+ * CopyBothResponse message
+ *
+ * parseInput already read the message type and length.
+ */
+static int
+getCopyStart(PGconn *conn, ExecStatusType copytype)
+{
+    PGresult   *result;
+    int            nfields;
+    int            i;
+
+    result = PQmakeEmptyPGresult(conn, copytype);
+    if (!result)
+        goto failure;
+
+    if (pqGetc(&conn->copy_is_binary, conn))
+        goto failure;
+    result->binary = conn->copy_is_binary;
+    /* the next two bytes are the number of fields    */
+    if (pqGetInt(&(result->numAttributes), 2, conn))
+        goto failure;
+    nfields = result->numAttributes;
+
+    /* allocate space for the attribute descriptors */
+    if (nfields > 0)
+    {
+        result->attDescs = (PGresAttDesc *)
+            pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
+        if (!result->attDescs)
+            goto failure;
+        MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
+    }
+
+    for (i = 0; i < nfields; i++)
+    {
+        int            format;
+
+        if (pqGetInt(&format, 2, conn))
+            goto failure;
+
+        /*
+         * Since pqGetInt treats 2-byte integers as unsigned, we need to
+         * coerce these results to signed form.
+         */
+        format = (int) ((int16) format);
+        result->attDescs[i].format = format;
+    }
+
+    /* Success! */
+    conn->result = result;
+    return 0;
+
+failure:
+    PQclear(result);
+    return EOF;
+}
+
+/*
+ * getReadyForQuery - process ReadyForQuery message
+ */
+static int
+getReadyForQuery(PGconn *conn)
+{
+    char        xact_status;
+
+    if (pqGetc(&xact_status, conn))
+        return EOF;
+    switch (xact_status)
+    {
+        case 'I':
+            conn->xactStatus = PQTRANS_IDLE;
+            break;
+        case 'T':
+            conn->xactStatus = PQTRANS_INTRANS;
+            break;
+        case 'E':
+            conn->xactStatus = PQTRANS_INERROR;
+            break;
+        default:
+            conn->xactStatus = PQTRANS_UNKNOWN;
+            break;
+    }
+
+    return 0;
+}
+
+/*
+ * getCopyDataMessage - fetch next CopyData message, process async messages
+ *
+ * Returns length word of CopyData message (> 0), or 0 if no complete
+ * message available, -1 if end of copy, -2 if error.
+ */
+static int
+getCopyDataMessage(PGconn *conn)
+{
+    char        id;
+    int            msgLength;
+    int            avail;
+
+    for (;;)
+    {
+        /*
+         * Do we have the next input message?  To make life simpler for async
+         * callers, we keep returning 0 until the next message is fully
+         * available, even if it is not Copy Data.
+         */
+        conn->inCursor = conn->inStart;
+        if (pqGetc(&id, conn))
+            return 0;
+        if (pqGetInt(&msgLength, 4, conn))
+            return 0;
+        if (msgLength < 4)
+        {
+            handleSyncLoss(conn, id, msgLength);
+            return -2;
+        }
+        avail = conn->inEnd - conn->inCursor;
+        if (avail < msgLength - 4)
+        {
+            /*
+             * Before returning, enlarge the input buffer if needed to hold
+             * the whole message.  See notes in parseInput.
+             */
+            if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength - 4,
+                                     conn))
+            {
+                /*
+                 * XXX add some better recovery code... plan is to skip over
+                 * the message using its length, then report an error. For the
+                 * moment, just treat this like loss of sync (which indeed it
+                 * might be!)
+                 */
+                handleSyncLoss(conn, id, msgLength);
+                return -2;
+            }
+            return 0;
+        }
+
+        /*
+         * If it's a legitimate async message type, process it.  (NOTIFY
+         * messages are not currently possible here, but we handle them for
+         * completeness.)  Otherwise, if it's anything except Copy Data,
+         * report end-of-copy.
+         */
+        switch (id)
+        {
+            case 'A':            /* NOTIFY */
+                if (getNotify(conn))
+                    return 0;
+                break;
+            case 'N':            /* NOTICE */
+                if (pqGetErrorNotice3(conn, false))
+                    return 0;
+                break;
+            case 'S':            /* ParameterStatus */
+                if (getParameterStatus(conn))
+                    return 0;
+                break;
+            case 'd':            /* Copy Data, pass it back to caller */
+                return msgLength;
+            default:            /* treat as end of copy */
+                return -1;
+        }
+
+        /* Drop the processed message and loop around for another */
+        conn->inStart = conn->inCursor;
+    }
+}
+
+/*
+ * PQgetCopyData - read a row of data from the backend during COPY OUT
+ * or COPY BOTH
+ *
+ * If successful, sets *buffer to point to a malloc'd row of data, and
+ * returns row length (always > 0) as result.
+ * Returns 0 if no row available yet (only possible if async is true),
+ * -1 if end of copy (consult PQgetResult), or -2 if error (consult
+ * PQerrorMessage).
+ */
+int
+pqGetCopyData3(PGconn *conn, char **buffer, int async)
+{
+    int            msgLength;
+
+    for (;;)
+    {
+        /*
+         * Collect the next input message.    To make life simpler for async
+         * callers, we keep returning 0 until the next message is fully
+         * available, even if it is not Copy Data.
+         */
+        msgLength = getCopyDataMessage(conn);
+        if (msgLength < 0)
+        {
+            /*
+             * On end-of-copy, exit COPY_OUT or COPY_BOTH mode and let caller
+             * read status with PQgetResult().    The normal case is that it's
+             * Copy Done, but we let parseInput read that.    If error, we
+             * expect the state was already changed.
+             */
+            if (msgLength == -1)
+                conn->asyncStatus = PGASYNC_BUSY;
+            return msgLength;    /* end-of-copy or error */
+        }
+        if (msgLength == 0)
+        {
+            /* Don't block if async read requested */
+            if (async)
+                return 0;
+            /* Need to load more data */
+            if (pqWait(TRUE, FALSE, conn) ||
+                pqReadData(conn) < 0)
+                return -2;
+            continue;
+        }
+
+        /*
+         * Drop zero-length messages (shouldn't happen anyway).  Otherwise
+         * pass the data back to the caller.
+         */
+        msgLength -= 4;
+        if (msgLength > 0)
+        {
+            *buffer = (char *) malloc(msgLength + 1);
+            if (*buffer == NULL)
+            {
+                printfPQExpBuffer(&conn->errorMessage,
+                                  libpq_gettext("out of memory\n"));
+                return -2;
+            }
+            memcpy(*buffer, &conn->inBuffer[conn->inCursor], msgLength);
+            (*buffer)[msgLength] = '\0';        /* Add terminating null */
+
+            /* Mark message consumed */
+            conn->inStart = conn->inCursor + msgLength;
+
+            return msgLength;
+        }
+
+        /* Empty, so drop it and loop around for another */
+        conn->inStart = conn->inCursor;
+    }
+}
+
+/*
+ * PQgetline - gets a newline-terminated string from the backend.
+ *
+ * See fe-exec.c for documentation.
+ */
+int
+pqGetline3(PGconn *conn, char *s, int maxlen)
+{
+    int            status;
+
+    if (conn->sock < 0 ||
+        conn->asyncStatus != PGASYNC_COPY_OUT ||
+        conn->copy_is_binary)
+    {
+        printfPQExpBuffer(&conn->errorMessage,
+                      libpq_gettext("PQgetline: not doing text COPY OUT\n"));
+        *s = '\0';
+        return EOF;
+    }
+
+    while ((status = PQgetlineAsync(conn, s, maxlen - 1)) == 0)
+    {
+        /* need to load more data */
+        if (pqWait(TRUE, FALSE, conn) ||
+            pqReadData(conn) < 0)
+        {
+            *s = '\0';
+            return EOF;
+        }
+    }
+
+    if (status < 0)
+    {
+        /* End of copy detected; gin up old-style terminator */
+        strcpy(s, "\\.");
+        return 0;
+    }
+
+    /* Add null terminator, and strip trailing \n if present */
+    if (s[status - 1] == '\n')
+    {
+        s[status - 1] = '\0';
+        return 0;
+    }
+    else
+    {
+        s[status] = '\0';
+        return 1;
+    }
+}
+
+/*
+ * PQgetlineAsync - gets a COPY data row without blocking.
+ *
+ * See fe-exec.c for documentation.
+ */
+int
+pqGetlineAsync3(PGconn *conn, char *buffer, int bufsize)
+{
+    int            msgLength;
+    int            avail;
+
+    if (conn->asyncStatus != PGASYNC_COPY_OUT)
+        return -1;                /* we are not doing a copy... */
+
+    /*
+     * Recognize the next input message.  To make life simpler for async
+     * callers, we keep returning 0 until the next message is fully available
+     * even if it is not Copy Data.  This should keep PQendcopy from blocking.
+     * (Note: unlike pqGetCopyData3, we do not change asyncStatus here.)
+     */
+    msgLength = getCopyDataMessage(conn);
+    if (msgLength < 0)
+        return -1;                /* end-of-copy or error */
+    if (msgLength == 0)
+        return 0;                /* no data yet */
+
+    /*
+     * Move data from libpq's buffer to the caller's.  In the case where a
+     * prior call found the caller's buffer too small, we use
+     * conn->copy_already_done to remember how much of the row was already
+     * returned to the caller.
+     */
+    conn->inCursor += conn->copy_already_done;
+    avail = msgLength - 4 - conn->copy_already_done;
+    if (avail <= bufsize)
+    {
+        /* Able to consume the whole message */
+        memcpy(buffer, &conn->inBuffer[conn->inCursor], avail);
+        /* Mark message consumed */
+        conn->inStart = conn->inCursor + avail;
+        /* Reset state for next time */
+        conn->copy_already_done = 0;
+        return avail;
+    }
+    else
+    {
+        /* We must return a partial message */
+        memcpy(buffer, &conn->inBuffer[conn->inCursor], bufsize);
+        /* The message is NOT consumed from libpq's buffer */
+        conn->copy_already_done += bufsize;
+        return bufsize;
+    }
+}
+
+/*
+ * PQendcopy
+ *
+ * See fe-exec.c for documentation.
+ */
+int
+pqEndcopy3(PGconn *conn)
+{
+    PGresult   *result;
+
+    if (conn->asyncStatus != PGASYNC_COPY_IN &&
+        conn->asyncStatus != PGASYNC_COPY_OUT)
+    {
+        printfPQExpBuffer(&conn->errorMessage,
+                          libpq_gettext("no COPY in progress\n"));
+        return 1;
+    }
+
+    /* Send the CopyDone message if needed */
+    if (conn->asyncStatus == PGASYNC_COPY_IN)
+    {
+        if (pqPutMsgStart('c', false, conn) < 0 ||
+            pqPutMsgEnd(conn) < 0)
+            return 1;
+
+        /*
+         * If we sent the COPY command in extended-query mode, we must issue a
+         * Sync as well.
+         */
+        if (conn->queryclass != PGQUERY_SIMPLE)
+        {
+            if (pqPutMsgStart('S', false, conn) < 0 ||
+                pqPutMsgEnd(conn) < 0)
+                return 1;
+        }
+    }
+
+    /*
+     * make sure no data is waiting to be sent, abort if we are non-blocking
+     * and the flush fails
+     */
+    if (pqFlush(conn) && pqIsnonblocking(conn))
+        return 1;
+
+    /* Return to active duty */
+    conn->asyncStatus = PGASYNC_BUSY;
+    resetPQExpBuffer(&conn->errorMessage);
+
+    /*
+     * Non blocking connections may have to abort at this point.  If everyone
+     * played the game there should be no problem, but in error scenarios the
+     * expected messages may not have arrived yet.    (We are assuming that the
+     * backend's packetizing will ensure that CommandComplete arrives along
+     * with the CopyDone; are there corner cases where that doesn't happen?)
+     */
+    if (pqIsnonblocking(conn) && PQisBusy(conn))
+        return 1;
+
+    /* Wait for the completion response */
+    result = PQgetResult(conn);
+
+    /* Expecting a successful result */
+    if (result && result->resultStatus == PGRES_COMMAND_OK)
+    {
+        PQclear(result);
+        return 0;
+    }
+
+    /*
+     * Trouble. For backwards-compatibility reasons, we issue the error
+     * message as if it were a notice (would be nice to get rid of this
+     * silliness, but too many apps probably don't handle errors from
+     * PQendcopy reasonably).  Note that the app can still obtain the error
+     * status from the PGconn object.
+     */
+    if (conn->errorMessage.len > 0)
+    {
+        /* We have to strip the trailing newline ... pain in neck... */
+        char        svLast = conn->errorMessage.data[conn->errorMessage.len - 1];
+
+        if (svLast == '\n')
+            conn->errorMessage.data[conn->errorMessage.len - 1] = '\0';
+        pqInternalNotice(&conn->noticeHooks, "%s", conn->errorMessage.data);
+        conn->errorMessage.data[conn->errorMessage.len - 1] = svLast;
+    }
+
+    PQclear(result);
+
+    return 1;
+}
+
+
+/*
+ * PQfn - Send a function call to the POSTGRES backend.
+ *
+ * See fe-exec.c for documentation.
+ */
+PGresult *
+pqFunctionCall3(PGconn *conn, Oid fnid,
+                int *result_buf, int *actual_result_len,
+                int result_is_int,
+                const PQArgBlock *args, int nargs)
+{
+    bool        needInput = false;
+    ExecStatusType status = PGRES_FATAL_ERROR;
+    char        id;
+    int            msgLength;
+    int            avail;
+    int            i;
+
+    /* PQfn already validated connection state */
+
+    if (pqPutMsgStart('F', false, conn) < 0 ||    /* function call msg */
+        pqPutInt(fnid, 4, conn) < 0 ||    /* function id */
+        pqPutInt(1, 2, conn) < 0 ||        /* # of format codes */
+        pqPutInt(1, 2, conn) < 0 ||        /* format code: BINARY */
+        pqPutInt(nargs, 2, conn) < 0)    /* # of args */
+    {
+        pqHandleSendFailure(conn);
+        return NULL;
+    }
+
+    for (i = 0; i < nargs; ++i)
+    {                            /* len.int4 + contents       */
+        if (pqPutInt(args[i].len, 4, conn))
+        {
+            pqHandleSendFailure(conn);
+            return NULL;
+        }
+        if (args[i].len == -1)
+            continue;            /* it's NULL */
+
+        if (args[i].isint)
+        {
+            if (pqPutInt(args[i].u.integer, args[i].len, conn))
+            {
+                pqHandleSendFailure(conn);
+                return NULL;
+            }
+        }
+        else
+        {
+            if (pqPutnchar((char *) args[i].u.ptr, args[i].len, conn))
+            {
+                pqHandleSendFailure(conn);
+                return NULL;
+            }
+        }
+    }
+
+    if (pqPutInt(1, 2, conn) < 0)        /* result format code: BINARY */
+    {
+        pqHandleSendFailure(conn);
+        return NULL;
+    }
+
+    if (pqPutMsgEnd(conn) < 0 ||
+        pqFlush(conn))
+    {
+        pqHandleSendFailure(conn);
+        return NULL;
+    }
+
+    for (;;)
+    {
+        if (needInput)
+        {
+            /* Wait for some data to arrive (or for the channel to close) */
+            if (pqWait(TRUE, FALSE, conn) ||
+                pqReadData(conn) < 0)
+                break;
+        }
+
+        /*
+         * Scan the message. If we run out of data, loop around to try again.
+         */
+        needInput = true;
+
+        conn->inCursor = conn->inStart;
+        if (pqGetc(&id, conn))
+            continue;
+        if (pqGetInt(&msgLength, 4, conn))
+            continue;
+
+        /*
+         * Try to validate message type/length here.  A length less than 4 is
+         * definitely broken.  Large lengths should only be believed for a few
+         * message types.
+         */
+        if (msgLength < 4)
+        {
+            handleSyncLoss(conn, id, msgLength);
+            break;
+        }
+        if (msgLength > 30000 && !VALID_LONG_MESSAGE_TYPE(id))
+        {
+            handleSyncLoss(conn, id, msgLength);
+            break;
+        }
+
+        /*
+         * Can't process if message body isn't all here yet.
+         */
+        msgLength -= 4;
+        avail = conn->inEnd - conn->inCursor;
+        if (avail < msgLength)
+        {
+            /*
+             * Before looping, enlarge the input buffer if needed to hold the
+             * whole message.  See notes in parseInput.
+             */
+            if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
+                                     conn))
+            {
+                /*
+                 * XXX add some better recovery code... plan is to skip over
+                 * the message using its length, then report an error. For the
+                 * moment, just treat this like loss of sync (which indeed it
+                 * might be!)
+                 */
+                handleSyncLoss(conn, id, msgLength);
+                break;
+            }
+            continue;
+        }
+
+        /*
+         * We should see V or E response to the command, but might get N
+         * and/or A notices first. We also need to swallow the final Z before
+         * returning.
+         */
+        switch (id)
+        {
+            case 'V':            /* function result */
+                if (pqGetInt(actual_result_len, 4, conn))
+                    continue;
+                if (*actual_result_len != -1)
+                {
+                    if (result_is_int)
+                    {
+                        if (pqGetInt(result_buf, *actual_result_len, conn))
+                            continue;
+                    }
+                    else
+                    {
+                        if (pqGetnchar((char *) result_buf,
+                                       *actual_result_len,
+                                       conn))
+                            continue;
+                    }
+                }
+                /* correctly finished function result message */
+                status = PGRES_COMMAND_OK;
+                break;
+            case 'E':            /* error return */
+                if (pqGetErrorNotice3(conn, true))
+                    continue;
+                status = PGRES_FATAL_ERROR;
+                break;
+            case 'A':            /* notify message */
+                /* handle notify and go back to processing return values */
+                if (getNotify(conn))
+                    continue;
+                break;
+            case 'N':            /* notice */
+                /* handle notice and go back to processing return values */
+                if (pqGetErrorNotice3(conn, false))
+                    continue;
+                break;
+            case 'Z':            /* backend is ready for new query */
+                if (getReadyForQuery(conn))
+                    continue;
+                /* consume the message and exit */
+                conn->inStart += 5 + msgLength;
+                /* if we saved a result object (probably an error), use it */
+                if (conn->result)
+                    return pqPrepareAsyncResult(conn);
+                return PQmakeEmptyPGresult(conn, status);
+            case 'S':            /* parameter status */
+                if (getParameterStatus(conn))
+                    continue;
+                break;
+            default:
+                /* The backend violates the protocol. */
+                printfPQExpBuffer(&conn->errorMessage,
+                                  libpq_gettext("protocol error: id=0x%x\n"),
+                                  id);
+                pqSaveErrorResult(conn);
+                /* trust the specified message length as what to skip */
+                conn->inStart += 5 + msgLength;
+                return pqPrepareAsyncResult(conn);
+        }
+        /* Completed this message, keep going */
+        /* trust the specified message length as what to skip */
+        conn->inStart += 5 + msgLength;
+        needInput = false;
+    }
+
+    /*
+     * We fall out of the loop only upon failing to read data.
+     * conn->errorMessage has been set by pqWait or pqReadData. We want to
+     * append it to any already-received error message.
+     */
+    pqSaveErrorResult(conn);
+    return pqPrepareAsyncResult(conn);
+}
+
+
+/*
+ * Construct startup packet
+ *
+ * Returns a malloc'd packet buffer, or NULL if out of memory
+ */
+char *
+pqBuildStartupPacket3(PGconn *conn, int *packetlen,
+                      const PQEnvironmentOption *options)
+{
+    char       *startpacket;
+
+    *packetlen = build_startup_packet(conn, NULL, options);
+    startpacket = (char *) malloc(*packetlen);
+    if (!startpacket)
+        return NULL;
+    *packetlen = build_startup_packet(conn, startpacket, options);
+    return startpacket;
+}
+
+/*
+ * Build a startup packet given a filled-in PGconn structure.
+ *
+ * We need to figure out how much space is needed, then fill it in.
+ * To avoid duplicate logic, this routine is called twice: the first time
+ * (with packet == NULL) just counts the space needed, the second time
+ * (with packet == allocated space) fills it in.  Return value is the number
+ * of bytes used.
+ */
+static int
+build_startup_packet(const PGconn *conn, char *packet,
+                     const PQEnvironmentOption *options)
+{
+    int            packet_len = 0;
+    const PQEnvironmentOption *next_eo;
+    const char *val;
+
+    /* Protocol version comes first. */
+    if (packet)
+    {
+        ProtocolVersion pv = htonl(conn->pversion);
+
+        memcpy(packet + packet_len, &pv, sizeof(ProtocolVersion));
+    }
+    packet_len += sizeof(ProtocolVersion);
+
+    /* Add user name, database name, options */
+
+#define ADD_STARTUP_OPTION(optname, optval) \
+    do { \
+        if (packet) \
+            strcpy(packet + packet_len, optname); \
+        packet_len += strlen(optname) + 1; \
+        if (packet) \
+            strcpy(packet + packet_len, optval); \
+        packet_len += strlen(optval) + 1; \
+    } while(0)
+
+    if (conn->pguser && conn->pguser[0])
+        ADD_STARTUP_OPTION("user", conn->pguser);
+    if (conn->dbName && conn->dbName[0])
+        ADD_STARTUP_OPTION("database", conn->dbName);
+    if (conn->replication && conn->replication[0])
+        ADD_STARTUP_OPTION("replication", conn->replication);
+    if (conn->pgoptions && conn->pgoptions[0])
+        ADD_STARTUP_OPTION("options", conn->pgoptions);
+    if (conn->send_appname)
+    {
+        /* Use appname if present, otherwise use fallback */
+        val = conn->appname ? conn->appname : conn->fbappname;
+        if (val && val[0])
+            ADD_STARTUP_OPTION("application_name", val);
+    }
+
+    if (conn->client_encoding_initial && conn->client_encoding_initial[0])
+        ADD_STARTUP_OPTION("client_encoding", conn->client_encoding_initial);
+
+    /* Add any environment-driven GUC settings needed */
+    for (next_eo = options; next_eo->envName; next_eo++)
+    {
+        if ((val = getenv(next_eo->envName)) != NULL)
+        {
+            if (pg_strcasecmp(val, "default") != 0)
+                ADD_STARTUP_OPTION(next_eo->pgName, val);
+        }
+    }
+
+    /* Add trailing terminator */
+    if (packet)
+        packet[packet_len] = '\0';
+    packet_len++;
+
+    return packet_len;
+}
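
The count-then-fill approach described for build_startup_packet (first call with packet == NULL to measure, second call to write) can be tried in isolation. What follows is only a minimal standalone sketch, not libpq code: kv_pair, build_kv_packet and make_kv_packet are hypothetical names made up for illustration, and the protocol version word is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value pair; stands in for the startup options. */
typedef struct kv_pair
{
    const char *name;
    const char *value;
} kv_pair;

/*
 * Pass 1 (packet == NULL): only count the bytes needed.
 * Pass 2 (packet != NULL): write the same bytes into the buffer.
 * Returns the number of bytes used, including the trailing terminator.
 */
static size_t
build_kv_packet(char *packet, const kv_pair *pairs, size_t npairs)
{
    size_t      len = 0;
    size_t      i;

    for (i = 0; i < npairs; i++)
    {
        if (packet)
            strcpy(packet + len, pairs[i].name);
        len += strlen(pairs[i].name) + 1;
        if (packet)
            strcpy(packet + len, pairs[i].value);
        len += strlen(pairs[i].value) + 1;
    }

    /* Add trailing terminator */
    if (packet)
        packet[len] = '\0';
    len++;

    return len;
}

/* Returns a malloc'd packet, or NULL if out of memory. */
static char *
make_kv_packet(const kv_pair *pairs, size_t npairs, size_t *packetlen)
{
    char       *packet;
    size_t      written;

    *packetlen = build_kv_packet(NULL, pairs, npairs);
    packet = malloc(*packetlen);
    if (!packet)
        return NULL;
    written = build_kv_packet(packet, pairs, npairs);
    /* Both passes run the same logic, so the sizes must agree. */
    assert(written == *packetlen);
    (void) written;
    return packet;
}
```

Because both passes share one routine, the size computation cannot drift from the fill logic, which is presumably why the original is written this way.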
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..23cf729 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor      161
+PQsetRowProcessorErrMsg      162
+PQskipRemainingResults      163
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
     conn->wait_ssl_try = false;
 #endif
 
+    /* set default row processor */
+    PQsetRowProcessor(conn, NULL, NULL);
+
     /*
      * We try to send at least 8K at a time, which is the usual size of pipe
      * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
     initPQExpBuffer(&conn->errorMessage);
     initPQExpBuffer(&conn->workBuffer);
 
+    /* set up initial row buffer */
+    conn->rowBufLen = 32;
+    conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
     if (conn->inBuffer == NULL ||
         conn->outBuffer == NULL ||
+        conn->rowBuf == NULL ||
         PQExpBufferBroken(&conn->errorMessage) ||
         PQExpBufferBroken(&conn->workBuffer))
     {
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
         free(conn->inBuffer);
     if (conn->outBuffer)
         free(conn->outBuffer);
+    if (conn->rowBuf)
+        free(conn->rowBuf);
     termPQExpBuffer(&conn->errorMessage);
     termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
     return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..d7f3ae9 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
           const char *desc_target);
 static int    check_field_number(const PGresult *res, int field_num);
+static int    pqAddRow(PGresult *res, void *param, PGrowValue *columns);
 
 
 /* ----------------
@@ -160,6 +161,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
     result->curBlock = NULL;
     result->curOffset = 0;
     result->spaceLeft = 0;
+    result->rowProcessorErrMsg = NULL;
 
     if (conn)
     {
@@ -701,7 +703,6 @@ pqClearAsyncResult(PGconn *conn)
     if (conn->result)
         PQclear(conn->result);
     conn->result = NULL;
-    conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +757,6 @@ pqPrepareAsyncResult(PGconn *conn)
      */
     res = conn->result;
     conn->result = NULL;        /* handing over ownership to caller */
-    conn->curTuple = NULL;        /* just in case */
     if (!res)
         res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
     else
@@ -828,6 +828,73 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+    conn->rowProcessor = (func ? func : pqAddRow);
+    conn->rowProcessorParam = param;
+}
+
+/*
+ * PQsetRowProcessorErrMsg
+ *    Set the error message to pass back to the caller of RowProcessor.
+ *
+ *    You can replace the previous message with an alternative message, or
+ *    clear it with NULL.
+ */
+void
+PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+{
+    if (msg)
+        res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+    else
+        res->rowProcessorErrMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *      add a row to the PGresult structure, growing it if necessary
+ *      Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+{
+    PGresAttValue *tup;
+    int            nfields = res->numAttributes;
+    int            i;
+
+    tup = (PGresAttValue *)
+        pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+    if (tup == NULL)
+        return FALSE;
+
+    for (i = 0 ; i < nfields ; i++)
+    {
+        tup[i].len = columns[i].len;
+        if (tup[i].len == NULL_LEN)
+        {
+            tup[i].value = res->null_field;
+        }
+        else
+        {
+            bool isbinary = (res->attDescs[i].format != 0);
+            tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+            if (tup[i].value == NULL)
+                return FALSE;
+
+            memcpy(tup[i].value, columns[i].value, tup[i].len);
+            /* We have to terminate this ourselves */
+            tup[i].value[tup[i].len] = '\0';
+        }
+    }
+
+    return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *      add a row pointer to the PGresult structure, growing it if necessary
  *      Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1290,6 @@ PQsendQueryStart(PGconn *conn)
 
     /* initialize async result-accumulation state */
     conn->result = NULL;
-    conn->curTuple = NULL;
 
     /* ready to send command message */
     return true;
@@ -1832,6 +1898,22 @@ PQexecFinish(PGconn *conn)
 }
 
 /*
+ * Exhaust remaining Data Rows in current conn.
+ */
+PGresult *
+PQskipRemainingResults(PGconn *conn)
+{
+    pqClearAsyncResult(conn);
+
+    /* conn->result is set NULL in pqClearAsyncResult() */
+    pqSaveErrorResult(conn);
+
+    /* Skip over remaining Data Row messages and hand the result back */
+    return PQgetResult(conn);
+}
+
+
+/*
  * PQdescribePrepared
  *      Obtain information about a previously prepared statement
  *
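
For readers following pqAddRow above: each column arrives as a length plus a pointer into the network buffer, is not NUL-terminated, and has to be copied and terminated by the row processor. Below is a minimal standalone sketch of that copy step only. It does not use libpq: row_value, SKETCH_NULL_LEN and copy_row are hypothetical stand-ins (modelled on the len/value fields the patch reads from PGrowValue, with -1 assumed as the NULL marker), and it uses plain malloc instead of pqResultAlloc, so it has to free partial work itself on failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed NULL marker for this sketch; plays the role of NULL_LEN. */
#define SKETCH_NULL_LEN (-1)

/* Hypothetical stand-in for the patch's PGrowValue. */
typedef struct row_value
{
    int         len;    /* length in bytes, or SKETCH_NULL_LEN */
    const char *value;  /* NOT NUL-terminated; points into a shared buffer */
} row_value;

/*
 * Copy one row out of the shared buffer into freshly malloc'd,
 * NUL-terminated strings.  SQL NULLs become NULL pointers in out[].
 * Returns 1 if OK, 0 if out of memory (nothing is leaked in that case).
 */
static int
copy_row(const row_value *columns, int nfields, char **out)
{
    int         i;

    assert(columns != NULL && out != NULL);

    for (i = 0; i < nfields; i++)
    {
        if (columns[i].len == SKETCH_NULL_LEN)
        {
            out[i] = NULL;
            continue;
        }

        out[i] = malloc((size_t) columns[i].len + 1);
        if (out[i] == NULL)
        {
            /* undo the columns copied so far; free(NULL) is harmless */
            while (--i >= 0)
                free(out[i]);
            return 0;
        }

        memcpy(out[i], columns[i].value, (size_t) columns[i].len);
        /* We have to terminate this ourselves */
        out[i][columns[i].len] = '\0';
    }

    return 1;
}
```

The explicit cleanup loop is the part pqAddRow does not need, since there the memory belongs to the PGresult and goes away with PQclear.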
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *    skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+    if (len > (size_t) (conn->inEnd - conn->inCursor))
+        return EOF;
+
+    conn->inCursor += len;
+
+    if (conn->Pfdebug)
+        fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+                (unsigned long) len);
+
+    return 0;
+}
+
+/*
  * pqPutnchar:
  *    write exactly len bytes to the current message
  */
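
The skip routine above boils down to a bounds check followed by a cursor bump, leaving the cursor untouched when not enough data has arrived. A minimal standalone sketch of the same idea, with a hypothetical in_buffer struct and skip_nchar function standing in for PGconn's inCursor/inEnd fields and pqSkipnchar (the debug trace is omitted):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>              /* for EOF */

/* Hypothetical stand-in for the input-buffer fields of PGconn. */
typedef struct in_buffer
{
    const char *data;
    size_t      cursor;         /* next byte to consume */
    size_t      end;            /* one past the last valid byte */
} in_buffer;

/*
 * Skip len bytes in the input buffer.
 * Returns 0 on success, EOF if fewer than len bytes are available;
 * on EOF the cursor is left unchanged so the caller can retry later.
 */
static int
skip_nchar(in_buffer *buf, size_t len)
{
    assert(buf->cursor <= buf->end);

    if (len > buf->end - buf->cursor)
        return EOF;

    buf->cursor += len;
    return 0;
}
```

Comparing len against the remaining byte count (rather than computing cursor + len first) keeps the check free of overflow for very large len values.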
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..ae4d7b0 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
                         /* Read another tuple of a normal query response */
                         if (getAnotherTuple(conn, FALSE))
                             return;
+                        /* getAnotherTuple moves inStart itself */
+                        continue;
                     }
                     else
                     {
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
                         /* Read another tuple of a normal query response */
                         if (getAnotherTuple(conn, TRUE))
                             return;
+                        /* getAnotherTuple moves inStart itself */
+                        continue;
                     }
                     else
                     {
@@ -703,52 +707,51 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
     PGresult   *result = conn->result;
     int            nfields = result->numAttributes;
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;
 
     /* the backend sends us a bitmap of which attributes are null */
     char        std_bitmap[64]; /* used unless it doesn't fit */
     char       *bitmap = std_bitmap;
     int            i;
+    int            rp;
     size_t        nbytes;            /* the number of bytes in bitmap  */
     char        bmap;            /* One byte of the bitmap */
     int            bitmap_index;    /* Its index */
     int            bitcnt;            /* number of bits examined in current byte */
     int            vlen;            /* length of the current field value */
 
+    /* resize row buffer if needed */
+    if (nfields > conn->rowBufLen)
+    {
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
+            goto rowProcessError;
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
+    }
+    else
+    {
+        rowbuf = conn->rowBuf;
+    }
+    result->binary = binary;
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
+    if (binary)
     {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-        /*
-         * If it's binary, fix the column format indicators.  We assume the
-         * backend will consistently send either B or D, not a mix.
-         */
-        if (binary)
-        {
-            for (i = 0; i < nfields; i++)
-                result->attDescs[i].format = 1;
-        }
+        for (i = 0; i < nfields; i++)
+            result->attDescs[i].format = 1;
     }
-    tup = conn->curTuple;
 
     /* Get the null-value bitmap */
     nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,7 +760,7 @@ getAnotherTuple(PGconn *conn, bool binary)
     {
         bitmap = (char *) malloc(nbytes);
         if (!bitmap)
-            goto outOfMemory;
+            goto rowProcessError;
     }
 
     if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +774,29 @@ getAnotherTuple(PGconn *conn, bool binary)
     for (i = 0; i < nfields; i++)
     {
         if (!(bmap & 0200))
-        {
-            /* if the field value is absent, make it a null string */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-        }
+            vlen = NULL_LEN;
+        else if (pqGetInt(&vlen, 4, conn))
+                goto EOFexit;
         else
         {
-            /* get the value length (the first four bytes are for length) */
-            if (pqGetInt(&vlen, 4, conn))
-                goto EOFexit;
             if (!binary)
                 vlen = vlen - 4;
             if (vlen < 0)
                 vlen = 0;
-            if (tup[i].value == NULL)
-            {
-                tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-                if (tup[i].value == NULL)
-                    goto outOfMemory;
-            }
-            tup[i].len = vlen;
-            /* read in the value */
-            if (vlen > 0)
-                if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                    goto EOFexit;
-            /* we have to terminate this ourselves */
-            tup[i].value[vlen] = '\0';
         }
+
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip the value */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            goto EOFexit;
+
         /* advance the bitmap stuff */
         bitcnt++;
         if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +809,56 @@ getAnotherTuple(PGconn *conn, bool binary)
             bmap <<= 1;
     }
 
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
-
     if (bitmap != std_bitmap)
         free(bitmap);
-    return 0;
+    bitmap = NULL;
+
+    /* tag the row as parsed */
+    conn->inStart = conn->inCursor;
+
+    /* Pass the completed row values to rowProcessor */
+    rp= conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
+    if (rp == 1)
+        return 0;
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;
+    else if (rp != 0)
+        PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
+
+rowProcessError:
-outOfMemory:
     /* Replace partially constructed result with an error result */
-    /*
-     * we do NOT use pqSaveErrorResult() here, because of the likelihood that
-     * there's not enough memory to concatenate messages...
-     */
-    pqClearAsyncResult(conn);
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
+    if (result->rowProcessorErrMsg)
+    {
+        printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+        pqSaveErrorResult(conn);
+    }
+    else
+    {
+        /*
+         * we do NOT use pqSaveErrorResult() here, because of the likelihood that
+         * there's not enough memory to concatenate messages...
+         */
+        pqClearAsyncResult(conn);
+        resetPQExpBuffer(&conn->errorMessage);
-    /*
-     * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
-     * do to recover...
-     */
-    conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+        /*
+         * If error message is passed from RowProcessor, set it into
+         * PGconn, assume out of memory if not.
+         */
+        appendPQExpBufferStr(&conn->errorMessage,
+                             libpq_gettext("out of memory for query result\n"));
+
+        /*
+         * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
+         * do to recover...
+         */
+        conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+    }
     conn->asyncStatus = PGASYNC_READY;
+
     /* Discard the failed message --- good idea? */
     conn->inStart = conn->inEnd;
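
For readers unfamiliar with the protocol 2 null bitmap scanned in getAnotherTuple() above: the loop tests bit 0200 of the current byte and shifts left once per field, so field 0 is the most significant bit of the first byte. A rough standalone sketch of the same lookup follows. It is not part of the patch, and the demo_* names and DEMO_BITS_PER_BYTE are assumptions made for illustration.

```c
#include <stddef.h>

#define DEMO_BITS_PER_BYTE 8

/*
 * Returns 1 if field number `field` (0-based) is marked present in the
 * bitmap, 0 if it is NULL.  Bit 0200 of the first byte is field 0,
 * bit 0100 is field 1, and so on.
 */
int
demo_field_present(const unsigned char *bitmap, int field)
{
    unsigned char byte = bitmap[field / DEMO_BITS_PER_BYTE];

    return (byte & (0200 >> (field % DEMO_BITS_PER_BYTE))) != 0;
}

/* Number of bitmap bytes needed for nfields fields, rounding up. */
size_t
demo_bitmap_bytes(int nfields)
{
    return (size_t) (nfields + DEMO_BITS_PER_BYTE - 1) / DEMO_BITS_PER_BYTE;
}
```

The patch keeps the shifting loop; random access as shown here is only meant to make the bit order easy to see.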
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..0260ba6 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
                         /* Read another tuple of a normal query response */
                         if (getAnotherTuple(conn, msgLength))
                             return;
+
+                        /* getAnotherTuple() moves inStart itself */
+                        continue;
                     }
                     else if (conn->result != NULL &&
                              conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,33 +616,22 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
     PGresult   *result = conn->result;
     int            nfields = result->numAttributes;
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;
     int            tupnfields;        /* # fields from tuple */
     int            vlen;            /* length of the current field value */
     int            i;
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
-    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-    }
-    tup = conn->curTuple;
+    int            rp;
 
     /* Get the field count and make sure it's what we expect */
     if (pqGetInt(&tupnfields, 2, conn))
@@ -652,52 +644,88 @@ getAnotherTuple(PGconn *conn, int msgLength)
                  libpq_gettext("unexpected field count in \"D\" message\n"));
         pqSaveErrorResult(conn);
         /* Discard the failed message by pretending we read it */
-        conn->inCursor = conn->inStart + 5 + msgLength;
+        conn->inStart += 5 + msgLength;
         return 0;
     }
 
+    /* resize row buffer if needed */
+    rowbuf = conn->rowBuf;
+    if (nfields > conn->rowBufLen)
+    {
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
+        {
+            goto outOfMemory1;
+        }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
+    }
+
     /* Scan the fields */
     for (i = 0; i < nfields; i++)
     {
         /* get the value length */
         if (pqGetInt(&vlen, 4, conn))
-            return EOF;
+            goto protocolError;
         if (vlen == -1)
-        {
-            /* null field */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-            continue;
-        }
-        if (vlen < 0)
+            vlen = NULL_LEN;
+        else if (vlen < 0)
             vlen = 0;
-        if (tup[i].value == NULL)
-        {
-            bool        isbinary = (result->attDescs[i].format != 0);
-            tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-            if (tup[i].value == NULL)
-                goto outOfMemory;
-        }
-        tup[i].len = vlen;
-        /* read in the value */
-        if (vlen > 0)
-            if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                return EOF;
-        /* we have to terminate this ourselves */
-        tup[i].value[vlen] = '\0';
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip to the next length field */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            goto protocolError;
     }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
+    /* tag the row as parsed, check if correctly */
+    conn->inStart += 5 + msgLength;
+    if (conn->inCursor != conn->inStart)
+        goto protocolError;
+    /* Pass the completed row values to rowProcessor */
+    rp = conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
+    if (rp == 1)
+    {
+        /* everything is good */
+        return 0;
+    }
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }
+
+    /* there was some problem */
+    if (rp == 0)
+    {
+        if (result->rowProcessorErrMsg == NULL)
+            goto outOfMemory2;
+
+        /* use supplied error message */
+        printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+    }
+    else
+    {
+        /* row processor messed up */
+        printfPQExpBuffer(&conn->errorMessage,
+                          libpq_gettext("invalid return value from row processor\n"));
+    }
+    pqSaveErrorResult(conn);
     return 0;
-outOfMemory:
+outOfMemory1:
+    /* Discard the failed message by pretending we read it */
+    conn->inStart += 5 + msgLength;
+outOfMemory2:
     /*
      * Replace partially constructed result with an error result. First
      * discard the old result to try to win back some memory.
@@ -706,9 +734,14 @@ outOfMemory:
     printfPQExpBuffer(&conn->errorMessage,
                       libpq_gettext("out of memory for query result\n"));
     pqSaveErrorResult(conn);
+    return 0;
+protocolError:
+    printfPQExpBuffer(&conn->errorMessage,
+                      libpq_gettext("invalid row contents\n"));
+    pqSaveErrorResult(conn);
 
     /* Discard the failed message by pretending we read it */
-    conn->inCursor = conn->inStart + 5 + msgLength;
+    conn->inStart += 5 + msgLength;
     return 0;
 }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..2d913c4 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
     struct pgNotify *next;        /* list link */
 } PGnotify;
 
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+    int            len;            /* length in bytes of the value */
+    char       *value;            /* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -367,6 +378,8 @@ extern PGresult *PQexecPrepared(PGconn *conn,
                const int *paramFormats,
                int resultFormat);
 
+extern PGresult *PQskipRemainingResults(PGconn *conn);
+
 /* Interface for multiple-result or asynchronous queries */
 extern int    PQsendQuery(PGconn *conn, const char *query);
 extern int PQsendQueryParams(PGconn *conn,
@@ -416,6 +429,37 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
              const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * The columns array contains PQnfields() entries, each one
+ * pointing to a particular column's data in the network buffer.
+ * This function is supposed to copy the data out from there
+ * and store it somewhere.  NULL is signified with len<0.
+ *
+ * This function must return 1 for success and 0 for failure,
+ * and may set an error message with PQsetRowProcessorErrMsg.  If no
+ * error message is set on failure, the caller assumes out of
+ * memory. This function is assumed not to throw any
+ * exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, void *param,
+                                PGrowValue *columns);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Registering a row processor disables pg_result's own result
+ * storage; the processor is called for each row, one by one.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+                                   void *rowProcessorParam);
+
 /* Force the write buffer to be written (or at least try) */
 extern int    PQflush(PGconn *conn);
@@ -454,6 +498,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int    PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int    PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void    PQsetRowProcessorErrMsg(PGresult *res, char *msg);
 extern int    PQnparams(const PGresult *res);
 extern Oid    PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..1fc5aab 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -209,6 +209,9 @@ struct pg_result
     PGresult_data *curBlock;    /* most recently allocated block */
     int            curOffset;        /* start offset of free space in block */
     int            spaceLeft;        /* number of free bytes remaining in block */
+
+    /* temp storage for message from row processor callback */
+    char       *rowProcessorErrMsg;
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -398,7 +401,6 @@ struct pg_conn
 
     /* Status for asynchronous result construction */
     PGresult   *result;            /* result being constructed */
-    PGresAttValue *curTuple;    /* tuple currently being read */
 
 #ifdef USE_SSL
     bool        allow_ssl_try;    /* Allowed to try SSL negotiation */
@@ -443,6 +445,14 @@ struct pg_conn
 
     /* Buffer for receiving various parts of messages */
     PQExpBufferData workBuffer; /* expansible string */
+
+    /*
+     * Read column data from network buffer.
+     */
+    PQrowProcessor rowProcessor;/* Function pointer */
+    void *rowProcessorParam;    /* Contextual parameter for rowProcessor */
+    PGrowValue *rowBuf;            /* Buffer for passing values to rowProcessor */
+    int rowBufLen;                /* Number of columns allocated in rowBuf */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +570,7 @@ extern int    pqGets(PQExpBuffer buf, PGconn *conn);
 extern int    pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int    pqPuts(const char *s, PGconn *conn);
 extern int    pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int    pqSkipnchar(size_t len, PGconn *conn);
 extern int    pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int    pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int    pqPutInt(int value, size_t bytes, PGconn *conn);
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..9cb6d65 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,274 @@ int PQisthreadsafe();
  </sect1>
 
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   By default, rows are stored in a <type>PGresult</type>
+   until the full result set has been received.  The completely filled
+   <type>PGresult</type> is then passed to the user.  This behaviour can be
+   changed by registering an alternative row processor function,
+   which sees each row as soon as it is received
+   from the network.  It has the option of processing the data
+   immediately, or storing it in a custom container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it cannot
+   know whether the SQL statement will actually finish successfully on the
+   server.  Some care must therefore be taken to get proper
+   transactional behaviour.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object on which to set the row processor function.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>func</parameter></term>
+       <listitem>
+         <para>
+           Storage handler function to set. NULL means to use the
+           default processor.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+           A pointer to a contextual parameter that is passed
+           to <parameter>func</parameter>.
+         </para>
+       </listitem>
+     </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, void *param, PGrowValue *columns);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to a column value in the network buffer.
+      The <parameter>len</parameter> field will contain the number of
+      bytes in the value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to the position where the value would have been in the buffer.
+      This allows estimating the row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy the row values out of the network
+       buffer before it returns, as the next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.
+       On failure this function should set the error message
+       with <function>PQsetRowProcessorErrMsg</function> if the cause
+       is other than out of memory.
+       When the non-blocking API is in use, it can also return 2
+       for early exit from the <function>PQisBusy</function> function.
+       The supplied <parameter>res</parameter> and <parameter>columns</parameter>
+       values will stay valid so the row can be processed outside of the callback.
+       The caller is responsible for tracking whether <function>PQisBusy</function>
+       returned early from the callback or for other reasons.
+       Usually this should happen via setting cached values to NULL
+       before calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed via <function>PQfinish</function>.
+       Processing can also be continued without closing the connection:
+       call <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing will then continue with a new row; the row that raised
+       the exception will be skipped. Alternatively, you can discard all
+       remaining rows by calling <function>PQskipRemainingResults</function>
+       without closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+     <term><parameter>res</parameter></term>
+     <listitem>
+       <para>
+         A pointer to the <type>PGresult</type> object.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>param</parameter></term>
+     <listitem>
+       <para>
+         Extra parameter that was given to <function>PQsetRowProcessor</function>.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>columns</parameter></term>
+     <listitem>
+       <para>
+         Column values of the row to process.  Column values
+         are located in the network buffer; the processor must
+         copy them out from there.
+       </para>
+       <para>
+         Column values are not null-terminated, so the processor cannot
+         use C string functions on them directly.
+       </para>
+     </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipremaingresults">
+    <term>
+     <function>PQskipRemainingResults</function>
+     <indexterm>
+      <primary>PQskipRemainingResults</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+        Discards all the remaining row data
+        after <function>PQexec</function>
+        or <function>PQgetResult</function> exits because of an exception raised
+        in the <type>PQrowProcessor</type>, without closing the connection.
+      </para>
+      <para>
+        Do not call this function when the functions above return normally.
+<synopsis>
+void PQskipRemainingResults(PGconn *conn)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+    <term>
+     <function>PQsetRowProcessorErrMsg</function>
+     <indexterm>
+      <primary>PQsetRowProcessorErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+    Sets the message for an error that occurred
+    in <type>PQrowProcessor</type>.  If this message is not set, the
+    caller assumes the error to be out of memory.
+<synopsis>
+void PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+      <varlistentry>
+        <term><parameter>res</parameter></term>
+        <listitem>
+          <para>
+        A pointer to the <type>PGresult</type> object
+        passed to <type>PQrowProcessor</type>.
+          </para>
+        </listitem>
+      </varlistentry>
+      <varlistentry>
+        <term><parameter>msg</parameter></term>
+        <listitem>
+          <para>
+        Error message. This will be copied internally so there is
+        no need to keep it in scope.
+          </para>
+          <para>
+        If <parameter>res</parameter> already has a message previously
+        set, it will be overwritten. Set NULL to cancel the custom
+        message.
+          </para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
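
The documentation above says that column values are not null-terminated and must be copied out before the processor returns. A minimal sketch of that copy step follows. It is not part of the patch and is written so that it compiles without the patched libpq: DemoRowValue only mirrors the PGrowValue layout, DemoStore and demo_row_processor are hypothetical names, and the PGresult argument of the real PQrowProcessor signature is left out for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Mirrors the PGrowValue layout from the patch (assumed, renamed). */
typedef struct DemoRowValue
{
    int         len;        /* length in bytes, negative if NULL */
    char       *value;      /* points into the buffer, not terminated */
} DemoRowValue;

/* Hypothetical per-query state handed to the processor via param. */
typedef struct DemoStore
{
    int         nfields;    /* number of columns per row */
    char      **cells;      /* nfields C strings, NULL for SQL NULL */
} DemoStore;

/*
 * Copy every column of one row out of the buffer.  Returns 1 on
 * success and 0 on out of memory, following the convention the patch
 * documents for row processors.
 */
int
demo_row_processor(void *param, const DemoRowValue *columns)
{
    DemoStore  *store = (DemoStore *) param;
    int         i;

    for (i = 0; i < store->nfields; i++)
    {
        /* drop whatever the previous row left in this slot */
        free(store->cells[i]);
        store->cells[i] = NULL;

        if (columns[i].len < 0)
            continue;           /* SQL NULL: nothing to copy */

        store->cells[i] = malloc((size_t) columns[i].len + 1);
        if (store->cells[i] == NULL)
            return 0;

        memcpy(store->cells[i], columns[i].value, (size_t) columns[i].len);
        store->cells[i][columns[i].len] = '\0';     /* terminate ourselves */
    }
    return 1;
}
```

With the patched libpq, such a function would presumably be registered through PQsetRowProcessor() with the store passed as the param argument, much as dblink does with storeHandler and storeInfo.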
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..b1c171a 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,24 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char** valbuf;
+    int *valbuflen;
+    char **cstrs;
+    bool error_occurred;
+    bool nummismatch;
+    ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +103,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
                    int2vector *pkattnums_arg, int32 pknumatts_arg,
                    int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +520,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
     char       *curname = NULL;
     int            howmany = 0;
     bool        fail = true;    /* default to backward compatible */
+    storeInfo   storeinfo;
 
     DBLINK_INIT;
@@ -559,15 +577,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
     appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
     /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*
      * Try to execute the query.  Note that since libpq uses malloc, the
      * PGresult will be long-lived even though we are still in a short-lived
      * memory context.
      */
     res = PQexec(conn, buf.data);
 
+    finishStoreInfo(&storeinfo);
+
     if (!res ||
         (PQresultStatus(res) != PGRES_COMMAND_OK &&
          PQresultStatus(res) != PGRES_TUPLES_OK))
     {
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        else if (storeinfo.edata)
+            ReThrowError(storeinfo.edata);
+
         dblink_res_error(conname, res, "could not fetch from cursor", fail);
         return (Datum) 0;
     }
@@ -579,8 +618,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
                 (errcode(ERRCODE_INVALID_CURSOR_NAME),
                  errmsg("cursor \"%s\" does not exist", curname)));
     }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
@@ -640,6 +679,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     remoteConn *rconn = NULL;
     bool        fail = true;    /* default to backward compatible */
     bool        freeconn = false;
+    storeInfo   storeinfo;
 
     /* check to see if caller supports us returning a tuplestore */
     if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +755,249 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     rsinfo->setResult = NULL;
     rsinfo->setDesc = NULL;
 
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
     /* synchronous query, or async result retrieval */
-    if (!is_async)
-        res = PQexec(conn, sql);
-    else
+    PG_TRY();
     {
-        res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
+        if (!is_async)
+            res = PQexec(conn, sql);
+        else
+            res = PQgetResult(conn);
+    }
+    PG_CATCH();
+    {
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipRemainingResults(conn);
+        storeinfo.error_occurred = TRUE;
     }
+    PG_END_TRY();
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+    finishStoreInfo(&storeinfo);
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)
     {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK))
+        {
+            /* finishStoreInfo saves the fields referred to below. */
+            if (storeinfo.nummismatch)
+            {
+                /* This is only for backward compatibility */
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATATYPE_MISMATCH),
+                         errmsg("remote query result rowtype does not match "
+                                "the specified FROM clause rowtype")));
+            }
+            else if (storeinfo.edata)
+                ReThrowError(storeinfo.edata);
+
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }
     }
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+    TupleDesc    tupdesc;
+    int i;
-    Assert(rsinfo->returnMode == SFRM_Materialize);
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+    {
+        case TYPEFUNC_COMPOSITE:
+            /* success */
+            break;
+        case TYPEFUNC_RECORD:
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
-    PG_TRY();
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
+
+    /* make sure we have a persistent copy of the tupdesc */
+    tupdesc = CreateTupleDescCopy(tupdesc);
+
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->edata = NULL;
+    sinfo->nattrs = tupdesc->natts;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuf = NULL;
+    sinfo->valbuflen = NULL;
+
+    /* Preallocate memory of same size with c string array for values. */
+    sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+    if (sinfo->valbuf)
+        sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+    if (sinfo->valbuflen)
+        sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+    if (sinfo->cstrs == NULL)
     {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
+        if (sinfo->valbuf)
+            free(sinfo->valbuf);
+        if (sinfo->valbuflen)
+            free(sinfo->valbuflen);
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));
+    }
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
-            tupdesc = CreateTemplateTupleDesc(1, false);
-            TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-                               TEXTOID, -1, 0);
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+    for (i = 0 ; i < sinfo->nattrs ; i++)
+    {
+        sinfo->valbuf[i] = NULL;
+        sinfo->valbuflen[i] = -1;
+    }
-            is_sql_cmd = false;
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    int i;
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
+    if (sinfo->valbuf)
+    {
+        for (i = 0 ; i < sinfo->nattrs ; i++)
+        {
+            if (sinfo->valbuf[i])
+                free(sinfo->valbuf[i]);
         }
+        free(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+    if (sinfo->valbuflen)
+    {
+        free(sinfo->valbuflen);
+        sinfo->valbuflen = NULL;
+    }
-        if (ntuples > 0)
-        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
+    if (sinfo->cstrs)
+    {
+        free(sinfo->cstrs);
+        sinfo->cstrs = NULL;
+    }
-            values = (char **) palloc(nfields * sizeof(char *));
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        fields = PQnfields(res);
+    int        i;
+    char      **cstrs = sinfo->cstrs;
-                if (!is_sql_cmd)
-                {
-                    int            i;
+    if (sinfo->error_occurred)
+        return FALSE;
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
+
+        /* This error will be processed in
+         * dblink_record_internal(). So do not set error message
+         * here. */
+        return FALSE;
+    }
+
+    /*
+     * value input functions assumes that the input string is
+     * terminated by zero. We should make the values to be so.
+     */
+    for(i = 0 ; i < fields ; i++)
+    {
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            char *tmp = sinfo->valbuf[i];
+            int tmplen = sinfo->valbuflen[i];
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
+            /*
+             * Divide calls to malloc and realloc so that things will
+             * go fine even on the systems of which realloc() does not
+             * accept NULL as old memory block.
+             *
+             * Also try to (re)allocate in bigger steps to
+             * avoid flood of allocations on weird data.
+             */
+            if (tmp == NULL)
+            {
+                tmplen = len + 1;
+                if (tmplen < 64)
+                    tmplen = 64;
+                tmp = (char *)malloc(tmplen);
+            }
+            else if (tmplen < len + 1)
+            {
+                if (len + 1 > tmplen * 2)
+                    tmplen = len + 1;
+                else
+                    tmplen = tmplen * 2;
+                tmp = (char *)realloc(tmp, tmplen);
             }
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
-        }
+            /*
+             * sinfo->valbuf[n] will be freed in finishStoreInfo()
+             * when realloc returns NULL.
+             */
+            if (tmp == NULL)
+                return FALSE;
-        PQclear(res);
-    }
-    PG_CATCH();
-    {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+            sinfo->valbuf[i] = tmp;
+            sinfo->valbuflen[i] = tmplen;
+
+            cstrs[i] = sinfo->valbuf[i];
+            memcpy(cstrs[i], columns[i].value, len);
+            cstrs[i][len] = '\0';
+        }
     }
-    PG_END_TRY();
+
+    /*
+     * These functions may throw exception. It will be caught in
+     * dblink_record_internal()
+     */
+    tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+    tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+    return TRUE;
 }
 /*
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index de95ea8..9cb6d65 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7352,6 +7352,14 @@ typedef struct
        On failure this function should set the error message
        with <function>PGsetRowProcessorErrMsg</function> if the cause
        is other than out of memory.
 
+       When non-blocking API is in use, it can also return 2
+       for early exit from <function>PQisBusy</function> function.
+       The supplied <parameter>res</parameter> and <parameter>columns</parameter>
+       values will stay valid so row can be processed outside of callback.
+       Caller is resposible for tracking whether the <parameter>PQisBusy</parameter>
+       returned early from callback or for other reasons.
+       Usually this should happen via setting cached values to NULL
+       before calling <function>PQisBusy</function>.
      </para>
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index 7498580..ae4d7b0 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -820,6 +820,9 @@ getAnotherTuple(PGconn *conn, bool binary)
     rp = conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
     if (rp == 1)
         return 0;
 
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;
     else if (rp != 0)
         PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
 
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index a67e3ac..0260ba6 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -697,6 +697,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
         /* everything is good */
         return 0;
     }
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }
     /* there was some problem */
     if (rp == 0)
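
The per-column buffer handling in storeHandler() above can be tried in isolation. What follows is only a minimal sketch: the names ColBuf and copy_value() are hypothetical and do not appear in the patch. It keeps the same policy as the patch, namely a first allocation of at least 64 bytes, later growth to the larger of twice the old size and len + 1, separate malloc() and realloc() calls, and a terminating zero byte after every copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-column buffer; mirrors valbuf[i] / valbuflen[i]. */
typedef struct ColBuf
{
    char *buf;      /* NULL until first use */
    int   buflen;   /* allocated size in bytes, -1 while buf is NULL */
} ColBuf;

/*
 * Copy len bytes of value into cb, growing the buffer when needed, and
 * terminate the copy with '\0'.  Returns 1 on success, 0 on out of memory.
 * On failure cb still owns its old buffer, so the caller can free it.
 */
static int
copy_value(ColBuf *cb, const char *value, int len)
{
    char *tmp = cb->buf;
    int   tmplen = cb->buflen;

    if (tmp == NULL)
    {
        /* first use: never allocate less than 64 bytes */
        tmplen = len + 1;
        if (tmplen < 64)
            tmplen = 64;
        tmp = malloc(tmplen);
    }
    else if (tmplen < len + 1)
    {
        /* too small: at least double, or jump straight to len + 1 */
        if (len + 1 > tmplen * 2)
            tmplen = len + 1;
        else
            tmplen = tmplen * 2;
        tmp = realloc(tmp, tmplen);
    }

    if (tmp == NULL)
        return 0;

    cb->buf = tmp;
    cb->buflen = tmplen;
    memcpy(tmp, value, len);
    tmp[len] = '\0';
    return 1;
}
```

Because the old pointer is left in cb when realloc() fails, a single cleanup routine (finishStoreInfo() in the patch) can free it later without a leak.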

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

16 February 2012, 11:17:46

On Thu, Feb 16, 2012 at 05:49:34PM +0900, Kyotaro HORIGUCHI wrote:
>  I added the function PQskipRemainingResults() and use it in
> dblink. This reduces the number of try-catch blocks executed from
> the number of rows to one per query in dblink.

This implementation is wrong - you must not simply call PQgetResult()
when the connection is in async mode.  And the resulting PGresult must
be freed.

Please just make PQsetRowProcessorErrMsg() callable from outside the
callback.  That also makes it clear to code that sees the final PGresult
what happened.  As a bonus, this will simply make the function
more robust and less special.

Although it's easy to create some PQsetRowSkipFlag() function
that will silently skip remaining rows, I think such logic
is best left to user to handle.  It creates unnecessary special
case when handling final PGresult, so in the big picture
it creates confusion.

> This patch is based on the patch above and composed in the same
> manner - main three patches include all modifications and the '2'
> patch separately.

I think there is no need to carry the early-exit patch separately.
It is visible in the archives and nobody has screamed about it yet,
so I guess it's acceptable.  Also it's so simple that there is no
point in spending time rebasing it.

> diff --git a/src/interfaces/libpq/#fe-protocol3.c# b/src/interfaces/libpq/#fe-protocol3.c#

There is some backup file in your git repo. 

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

18 February 2012, 17:00:19

Demos/tests of the new API:
 https://github.com/markokr/libpq-rowproc-demos

Comments resulting from that:

- PQsetRowProcessorErrMsg() should take const char*

- callback API should be (void *, PGresult *, PGrowValue *) or void * at the end, but not in the middle

I have not looked yet what needs to be done to get
ErrMsg callable outside of callback, if it requires PGconn,
then we should add PGconn also to callback args.

> On Thu, Feb 16, 2012 at 05:49:34PM +0900, Kyotaro HORIGUCHI wrote:
>>  I added the function PQskipRemainingResults() and use it in
>> dblink. This reduces the number of try-catch blocks executed from
>> the number of rows to one per query in dblink.

I still think we don't need extra skipping function.

Yes, the callback function needs to have a flag to know that
rows need to be skipped, but for such a low-level API that does
not seem to be a hard requirement.

If this really needs to be made easier, then getRowProcessor
might be a better approach, allowing easy implementation
of a generic skipping function by the user.
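
The flag-based skipping described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: RowValue stands in for the patch's PGrowValue, the result argument is an opaque pointer, and AppState and flagged_row_processor() are hypothetical application-side names. Only the argument order and the "return 1 means the row was handled" convention are taken from the patch in this thread.

```c
#include <stddef.h>

/* Stand-in for the patch's PGrowValue; only what the sketch needs. */
typedef struct RowValue
{
    int         len;    /* -1 means SQL NULL */
    const char *value;
} RowValue;

/* Hypothetical per-connection state owned by the application. */
typedef struct AppState
{
    int  skip_rows;     /* set once the application no longer wants rows */
    long stored;        /* rows actually processed */
    long skipped;       /* rows dropped after the flag was set */
} AppState;

/*
 * Row processor sketch.  It returns 1 ("row handled") on both paths, so
 * the library keeps reading until the end of the result and the
 * connection stays usable.
 */
static int
flagged_row_processor(void *res, void *param, RowValue *columns)
{
    AppState *st = (AppState *) param;

    (void) res;         /* unused in this sketch */
    (void) columns;

    if (st->skip_rows)
    {
        st->skipped++;
        return 1;
    }

    /* real code would convert and store the row here */
    st->stored++;
    return 1;
}
```

The application sets skip_rows when it hits an error it wants to report later, then lets the normal result-fetching call run to completion.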

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

21 February 2012, 01:12:22

Hello, 

I don't have any attachment to PQskipRemainingResults(), but I
think that some formal method to skip until Command-Complete('C')
without breaking session is necessary because current libpq does
so.

====
> On Thu, Feb 16, 2012 at 02:24:19PM +0900, Kyotaro HORIGUCHI wrote:
> > The choices of the libpq user on that point are,
> > 
> > - Continue to read succeeding tuples.
> > - Throw away the remaining results.
> 
> There is also third choice, which may be even more popular than
> those ones - PQfinish().

That's it. I had implicitly assumed that the current session
should not be torn down.

> I think we already have such function - PQsetRowProcessor().
> Considering the user can use that to install skipping callback
> or simply set some flag in it's own per-connection state,

PQsetRowProcessor() sets the row processor not on PGresult but on
PGconn. So using PQsetRowProcessor() makes no sense for the
PGresult on which the user is currently working. Another interface
function is needed to do that on PGresult.

But of course the users can do that by signalling using flags
within their code without PQsetRowProcessor or
PQskipRemainingResults.

Or going back to the original implementation, which used PG_TRY() to
prevent a longjmp out of the row processor itself, makes that possible too.

Although it is possible in those ways, it needs (currently)
undocumented (in human-readable langs :-) knowledge about the
client protocol and how libpq handles it, which
might be changed with no description in the release notes.

> I suspect the need is not that big.

I think so, too. But the current implementation of libpq does so for
out-of-memory while receiving result rows. So I think some formal
(documented, in other words) way to do that is indispensable.

> But if you want to set error state for skipping, I would instead
> generalize PQsetRowProcessorErrMsg() to support setting error state
> outside of callback.  That would also help the external processing with
> 'return 2'.  But I would keep the requirement that row processing must
> be ongoing, standalone error setting does not make sense.  Thus the name
> can stay.

mmm.. I consider the cause of the problem discussed here to be
the exceptions raised by certain server-side functions called in
the row processor, especially in C/C++ code. And we shouldn't use
PG_TRY() to catch them there, because that code is executed too
frequently. I think 'return 2' is not applicable to this case. Some aid
for non-local exit from row processors (out of PQexec and the like,
from the user's point of view) is needed. And I think it should be formal.
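
The cost argument above, one try block per query instead of one per row, can be illustrated with plain setjmp()/longjmp(), the standard C mechanism for this kind of non-local exit. This is only a sketch under stated assumptions: mock_exec() is a hypothetical stand-in for PQexec() driving a row processor, and throwing_row_processor() simulates a datatype input function that raises an error on the fourth row. It does not show the cleanup of the half-read result, which is exactly the part this thread is trying to give a formal API.

```c
#include <setjmp.h>

static jmp_buf query_jmp;   /* one jump target per query, not per row */

typedef int (*RowProc)(int rowno, void *param);

/* Simulated row processor: row number 3 "throws". */
static int
throwing_row_processor(int rowno, void *param)
{
    int *stored = (int *) param;

    if (rowno == 3)
        longjmp(query_jmp, 1);  /* non-local exit out of the fetch loop */

    (*stored)++;
    return 1;
}

/* Hypothetical stand-in for PQexec(): calls the processor once per row. */
static void
mock_exec(int nrows, RowProc proc, void *param)
{
    int i;

    for (i = 0; i < nrows; i++)
        proc(i, param);
}

/* Returns 1 if every row was stored, 0 if the processor bailed out. */
static int
run_query(int nrows, int *stored)
{
    if (setjmp(query_jmp) == 0)
    {
        /* "try" part: entered once, however many rows arrive */
        mock_exec(nrows, throwing_row_processor, stored);
        return 1;
    }

    /* "catch" part: a real caller must still drain the connection here */
    return 0;
}
```

With ten rows the processor stores rows 0 to 2 and then jumps out; the jump target is set up once in run_query(), regardless of the row count.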

> There seems to be 2 ways to do it:
> 
> 1) Replace the PGresult under PGconn.  This seems to require that
>    PQsetRowProcessorErrMsg takes PGconn as argument instead of
>    PGresult.  This also means callback should get PGconn as
>    argument.  Kind of makes sense even.
> 
> 2) Convert current partial PGresult to error state.  That also
>    makes sense, current use ->rowProcessorErrMsg to transport
>    the message to later set the error in caller feels weird.
> 
> I guess I slightly prefer 2) to 1).

The former might be inappropriate from the point of view of the
`undocumented knowledge' above. The latter seems to miss the
point about exceptions.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

21 February 2012, 03:41:39

On Tue, Feb 21, 2012 at 02:11:35PM +0900, Kyotaro HORIGUCHI wrote:
> I don't have any attachment to PQskipRemainingResults(), but I
> think that some formal method to skip until Command-Complete('C')
> without breaking session is necessary because current libpq does
> so.

We have such function: PQgetResult().  Only question is how
to flag that the rows should be dropped.

> > I think we already have such function - PQsetRowProcessor().
> > Considering the user can use that to install skipping callback
> > or simply set some flag in it's own per-connection state,
> 
> PQsetRowProcessor() sets row processor not to PGresult but
> PGconn. So using PGsetRowProcessor() makes no sense for the
> PGresult on which the user currently working. Another interface
> function is needed to do that on PGresult.

If we are talking about skipping incoming result rows,
it's PGconn feature, not PGresult.  Because you want to do
network traffic for that, yes?

Also, as the row handler is on the connection, changing its
behavior is connection context, not result.

> But of course the users can do that by signalling using flags
> within their code without PQsetRowProcessor or
> PQskipRemainingResults.
> 
> Or returning to the beginning implement using PG_TRY() to inhibit
> longjmp out of the row processor itself makes that possible too.
> 
> Altough it is possible in that ways, it needs (currently)
> undocumented (in human-readable langs :-) knowledge about the
> client protocol and the handling manner of that in libpq which
> might be changed with no description in the release notes.

You might be right that how to do it may not be obvious.

Ok, lets see how it looks.  But please do it like this:

- PQgetRowProcessor() that returns existing row processor.

- PQskipResult() that just replaces row processor, then calls
  PQgetResult() to eat the result.  It's behaviour fully
  matches PQgetResult() then.

I guess the main thing that annoys me with skipping is that
it would require an additional magic flag inside libpq.
With PQgetRowProcessor() that is not needed anymore, it's
just a helper function that the user can implement as well.

Only question here is what should happen if there are
several incoming resultsets (several queries in PQexec).
Should we leave to user to loop over them?

Seems there is 2 approaches here:

1) int PQskipResult(PGconn)
2) int PQskipResult(PGconn, int skipAll)

Both cases returning:
    1 - got resultset, there might be more
    0 - PQgetResult() returned NULL, connection is empty
   -1 - error

Although 1) mirrors regular PQgetResult() better, most users
might not know that function as they are using sync API.
They would have a simpler time with 2).  So 2) then?

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

21 February 2012, 05:43:16

Hello,

> > I don't have any attachment to PQskipRemainingResults(), but I
> > think that some formal method to skip until Command-Complete('C')
> > without breaking session is necessary because current libpq does
> > so.
> 
> We have such function: PQgetResult().  Only question is how
> to flag that the rows should be dropped.

I agree. I did this with conn->result->resultStatus ==
PGRES_FATAL_ERROR, which instructs pqParseInput[23]() to skip
calling getAnotherTuple(), but another means to do the same thing
without marking an error is needed.

> Also, as row handler is on connection, then changing it's
> behavior is connection context, not result.

OK. The current implementation copies the pointer to the row processor
from PGconn to PGresult, and getAnotherTuple() uses the copy in
PGresult to avoid unintended replacement of the row processor by
PQsetRowProcessor(). I understand that the row processor setting
should be in PGconn context and that a change by PQsetRowProcessor()
should become effective immediately. Is that right?

> Ok, lets see how it looks.  But please do it like this:
> 
> - PQgetRowProcessor() that returns existing row processor.

This could also be done through the return value of
PQsetRowProcessor() (currently void). Anyway, I will provide this
function and also change the return value of PQsetRowProcessor().

> - PQskipResult() that just replaces row processor, then calls
>   PQgetResult() to eat the result.  It's behaviour fully
>   matches PQgetResult() then.

There seem to be two choices. One is that PQskipResult()
replaces the row processor with a NULL pointer and
getAnotherTuple() calls the row processor only if it is not NULL.
The other is that PQskipResult() sets an empty function as the row
processor. I do the latter for the present.

This approach makes needless calls to getAnotherTuple(). Checking whether
the pointer to the row processor is NULL in pqParseInput[23]() could
avoid these extra calls and might reduce the time for skipping, but I
think there is no need to do so for such rare cases.

> I guess the main thing that annoys me with skipping is that
> it would require additional magic flag inside libpq.
> With PQgetRowProcessor() it does not need anymore, it's
> just a helper function that user can implement as well.

Ok.

> Only question here is what should happen if there are
> several incoming resultsets (several queries in PQexec).
> Should we leave to user to loop over them?
> 
> Seems there is 2 approaches here:
> 
> 1) int PQskipResult(PGconn)
> 2) int PQskipResult(PGconn, int skipAll)
> 
> Both cases returning:
>     1 - got resultset, there might be more
>     0 - PQgetResult() returned NULL, connection is empty
>    -1 - error
> 
> Although 1) mirrors regular PGgetResult() better, most users
> might not know that function as they are using sync API.
> They have simpler time with 2).  So 2) then?

Let me confirm the effects of this function. Is the below
description right?

- PQskipResult(conn, false) makes just following PQgetResult() to
  skip current bunch of rows and the consequent PQgetResult()'s
  gathers rows as usual.

- PQskipResult(conn, true) makes all consequent PQgetResult()'s
  to skip all the rows.

If this is right, row processor should stay also in PGresult
context. PQskipResult() replaces the row processor in PGconn when
the second parameter is true, and in PGresult for false.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

21 February 2012, 05:56:25

Sorry, I should fix a wrong word selection..

> Let me confirm the effects of this function. Is the below
> description right?
> 
> - PQskipResult(conn, false) makes just following PQgetResult() to
>   skip current bunch of rows and the consequent PQgetResult()'s
>   gathers rows as usual.

- PQskipResult(conn, false) makes just following PQgetResult() to
  skip current bunch of rows and the succeeding PQgetResult()'s
                                     ~~~~~~~~~~
  gathers rows as usual.
 

> - PQskipResult(conn, true) makes all consequent PQgetResult()'s
>   to skip all the rows.

- PQskipResult(conn, true) makes all succeeding PQgetResult()'s
                                     ~~~~~~~~~~
  to skip all the rows.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

21 February 2012, 05:57:14

On Tue, Feb 21, 2012 at 11:42 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:
>> Also, as row handler is on connection, then changing it's
>> behavior is connection context, not result.
>
> OK, current implement copying the pointer to the row processor
> from PGconn to PGresult and getAnotherTuple() using it on
> PGresult to avoid unintended replacement of row processor by
> PQsetRowProcessor(), and I understand that row processor setting
> should be in PGconn context and the change by PGsetRowProcessor()
> should immediately become effective. That's right?

Note I dropped the row processor from under PGresult.
Please don't put it back there.

>> Ok, lets see how it looks.  But please do it like this:
>>
>> - PQgetRowProcessor() that returns existing row processor.
>
> This seems also can be done by the return value of
> PQsetRowProcessor() (currently void). Anyway, I provide this
> function and also change the return value of PQsetRowProcessor().

Note you need processorParam as well.
I think it's simpler to rely on PQgetRowProcessor().

>> - PQskipResult() that just replaces row processor, then calls
>>   PQgetResult() to eat the result.  It's behaviour fully
>>   matches PQgetResult() then.
>
> There seems to be two choices, one is that PQskipResult()
> replaces the row processor with NULL pointer and
> getAnotherTuple() calls row processor if not NULL. And the
> another is PQskipResult() sets the empty function as row
> processor. I do the latter for the present.

Yes, let's avoid NULLs.

> This approach does needless call of getAnotherTuple(). Seeing if
> the pointer to row processor is NULL in pqParseInput[23]() could
> avoid this extra calls and may reduce the time for skip, but I
> think it is no need to do so for such rare cases.

We should optimize for common case, which is non-skipping
row processor.

>> Only question here is what should happen if there are
>> several incoming resultsets (several queries in PQexec).
>> Should we leave to user to loop over them?
>>
>> Seems there is 2 approaches here:
>>
>> 1) int PQskipResult(PGconn)
>> 2) int PQskipResult(PGconn, int skipAll)
>>
>> Both cases returning:
>>     1 - got resultset, there might be more
>>     0 - PQgetResult() returned NULL, connection is empty
>>    -1 - error
>>
>> Although 1) mirrors regular PGgetResult() better, most users
>> might not know that function as they are using sync API.
>> They have simpler time with 2).  So 2) then?
>
> Let me confirm the effects of this function. Is the below
> description right?
>
> - PQskipResult(conn, false) makes just following PQgetResult() to
>  skip current bunch of rows and the consequent PQgetResult()'s
>  gathers rows as usual.

Yes.

> - PQskipResult(conn, true) makes all consequent PQgetResult()'s
>  to skip all the rows.
>
> If this is right, row processor should stay also in PGresult
> context. PQskipResult() replaces the row processor in PGconn when
> the second parameter is true, and in PGresult for false.

No, let's keep row processor only under PGconn.

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

21 February 2012, 06:14:33

Hello,

> Note I dropped the row processor from under PGresult.
> Please don't put it back there.

I overlooked that. I understand it.

> > This seems also can be done by the return value of
> > PQsetRowProcessor() (currently void). Anyway, I provide this
> > function and also change the return value of PQsetRowProcessor().
>
> Note you need processorParam as well.
> I think it's simpler to rely on PQgetProcessor()

Hmm. Ok.

> > Let me confirm the effects of this function. Is the below
> > description right?
> >
> > - PQskipResult(conn, false) makes just following PQgetResult() to
> >  skip current bunch of rows and the consequent PQgetResult()'s
> >  gathers rows as usual.
>
> Yes.
>
> > - PQskipResult(conn, true) makes all consequent PQgetResult()'s
> >  to skip all the rows.

Well, Is this right?

> > If this is right, row processor should stay also in PGresult
> > context. PQskipResult() replaces the row processor in PGconn when
> > the second parameter is true, and in PGresult for false.
>
> No, let's keep row processor only under PGconn.

Then, should I add a stash in PGconn for the row processor (and,
needless to say, for its param) so that it can be restored afterwards?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

21 February 2012, 06:45:51

On Tue, Feb 21, 2012 at 12:13 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:
>> > - PQskipResult(conn, true) makes all consequent PQgetResult()'s
>> >  to skip all the rows.
>
> Well, Is this right?

Yes, call PQgetResult() until it returns NULL.

>> > If this is right, row processor should stay also in PGresult
>> > context. PQskipResult() replaces the row processor in PGconn when
>> > the second parameter is true, and in PGresult for false.
>>
>> No, let's keep row processor only under PGconn.
>
> Then, Should I add the stash for the row processor (and needless
> for param) to recall after in PGconn?

PQskipResult:
- store old callback and param in local vars
- set do-nothing row callback
- call PQgetResult() once, or until it returns NULL
- restore old callback
- return 1 if last result was non-NULL, 0 otherwise
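
The steps above can be sketched roughly as follows. This is only a
self-contained mock, not libpq code: MockConn, MockResult,
mock_get_result() and mock_skip_result() are hypothetical stand-ins for
PGconn, PGresult, PQgetResult() and the proposed PQskipResult(), and
free() stands in for PQclear().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PGresult/PGconn; not the real libpq types. */
typedef struct MockResult
{
    int         nrows;
} MockResult;

typedef int (*MockRowProcessor) (MockResult *res, void *param);

typedef struct MockConn
{
    MockRowProcessor rowProcessor;  /* current row callback */
    void       *rowProcessorParam;  /* its context pointer */
    int         pendingResults;     /* results not yet fetched */
    int         rowsPerResult;      /* rows delivered per result */
} MockConn;

/* A user callback that counts the rows it is shown. */
int
count_row(MockResult *res, void *param)
{
    (void) res;
    ++*(int *) param;
    return 1;
}

/* Do-nothing callback: accept the row and throw it away. */
int
discard_row(MockResult *res, void *param)
{
    (void) res;
    (void) param;
    return 1;
}

/* Stand-in for PQgetResult(): feed rows to the callback, return a result. */
MockResult *
mock_get_result(MockConn *conn)
{
    MockResult *res;
    int         i;

    if (conn->pendingResults == 0)
        return NULL;
    res = malloc(sizeof(MockResult));
    assert(res != NULL);
    res->nrows = conn->rowsPerResult;
    for (i = 0; i < conn->rowsPerResult; i++)
        conn->rowProcessor(res, conn->rowProcessorParam);
    conn->pendingResults--;
    return res;
}

/* The five steps: save, swap in the no-op, drain, restore, report. */
int
mock_skip_result(MockConn *conn, int skipAll)
{
    MockRowProcessor savedFunc = conn->rowProcessor;
    void       *savedParam = conn->rowProcessorParam;
    int         lastWasNonNull;

    conn->rowProcessor = discard_row;
    conn->rowProcessorParam = NULL;

    do
    {
        MockResult *res = mock_get_result(conn);

        lastWasNonNull = (res != NULL);
        free(res);          /* stands in for PQclear(); free(NULL) is a no-op */
    } while (skipAll && lastWasNonNull);

    conn->rowProcessor = savedFunc;
    conn->rowProcessorParam = savedParam;
    return lastWasNonNull;
}
```

Keeping the saved callback in local variables means nothing extra has to
be stored in the connection, and the caller's callback is back in place
as soon as the function returns.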

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

21 February 2012, 07:15:25

Thank you. Everything seems clear.
Please wait for a while.

> On Tue, Feb 21, 2012 at 12:13 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
> >> > - PQskipResult(conn, true) makes all consequent PQgetResult()'s
> >> >  to skip all the rows.
> >
> > Well, Is this right?
>
> Yes, call getResult() until it returns NULL.
>
> >> > If this is right, row processor should stay also in PGresult
> >> > context. PQskipResult() replaces the row processor in PGconn when
> >> > the second parameter is true, and in PGresult for false.
> >>
> >> No, let's keep row processor only under PGconn.
> >
> > Then, Should I add the stash for the row processor (and needless
> > for param) to recall after in PGconn?
>
> PQskipResult:
> - store old callback and param in local vars
> - set do-nothing row callback
> - call PQgetresult() once, or until it returns NULL
> - restore old callback
> - return 1 if last result was non-NULL, 0 otherwise
>
> --
> marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

21 February 2012, 11:28:17

Hello, this is the ... nth version of the patch.

> Thank you. Everything seems clear.
> Please wait for a while.

libpq-fe.h
- PQskipResult() is added instead of PGskipRemainigResults().

fe-exec.c
- PQskipResult() is added instead of PGskipRemainigResults().
- PQgetRowProcessor() is added.

dblink.c
- Use PQskipResult() to skip remaining rows.
- Shortened the path from catching an exception to rethrowing it;
  storeInfo.edata has been removed as a result.
- dblink_fetch() now catches exceptions properly.

libpq.sgml
- PQskipResult() is added instead of PGskipRemainigResults().
- Some misspellings have been corrected.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

21 February 2012, 19:36:52

On Wed, Feb 22, 2012 at 12:27:57AM +0900, Kyotaro HORIGUCHI wrote:
> fe-exec.c
> - PQskipResult() is added instead of PGskipRemainigResults().

It must free the PGresults that PQgetResult() returns.

Also, please fix the 2 issues mentioned here:
 http://archives.postgresql.org/message-id/CACMqXCLvpkjb9+c6sqJXitMHvrRCo+yu4q4bQ--0d7L=vw62Yg@mail.gmail.com

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

23 February 2012, 06:14:50

Hello, this is a new version of the patch.

# This patch is based on the commit
# 2bbd88f8f841b01efb073972b60d4dc1ff1f6fd0 @ Feb 13 to avoid the
# compile error caused by undeclared LEAKPROOF in kwlist.h.

> It must free the PGresults that PQgetResult() returns.
I'm sorry, it slipped my mind. Added PQclear() for the return value.

> Also, please fix 2 issues mentined here:

- PQsetRowProcessorErrMsg() now handles msg as const string.

- Changed the order of the parameters of the type PQrowProcessor.
  New order is (PGresult *res, PGrowValue *columns, void *params).


# PQsetRowProcessorErrMsg outside of callback is not implemented.

- Documentation and dblink are modified according to the changes above.
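
For illustration, a callback in the new parameter order might look like
the sketch below. It is not part of the patch: RowValue only mirrors the
PGrowValue layout (len, value) proposed here, RowSink and copy_row() are
hypothetical names, and the result handle is typed as void * so the
example compiles without the patched libpq headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the PGrowValue layout proposed in the patch. */
typedef struct RowValue
{
    int         len;        /* length in bytes, negative for SQL NULL */
    char       *value;      /* points into the buffer, not null-terminated */
} RowValue;

/* Hypothetical per-query context handed over as the last argument. */
typedef struct RowSink
{
    int         nfields;    /* number of columns per row */
    char      **copies;     /* nfields slots, filled for each row */
} RowSink;

/*
 * Row processor in the new argument order: result, columns, context.
 * Returns 1 on success and 0 on allocation failure.
 */
int
copy_row(void *res, RowValue *columns, void *param)
{
    RowSink    *sink = (RowSink *) param;
    int         i;

    (void) res;             /* the result handle is not needed here */

    for (i = 0; i < sink->nfields; i++)
    {
        if (columns[i].len < 0)
        {
            sink->copies[i] = NULL;     /* SQL NULL */
            continue;
        }
        sink->copies[i] = malloc((size_t) columns[i].len + 1);
        if (sink->copies[i] == NULL)
        {
            while (--i >= 0)            /* undo the copies made so far */
            {
                free(sink->copies[i]);
                sink->copies[i] = NULL;
            }
            return 0;
        }
        memcpy(sink->copies[i], columns[i].value, (size_t) columns[i].len);
        sink->copies[i][columns[i].len] = '\0'; /* terminate it ourselves */
    }
    return 1;
}
```

The values have to be copied out inside the callback because they point
into the network buffer and carry no terminating null byte.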


By the way, I would like to ask you one question. Why should the
void * parameter be at the head or tail of the parameter list?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..7e02497 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor      161
+PQgetRowProcessor      162
+PQsetRowProcessorErrMsg      163
+PQskipResult          164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)    conn->wait_ssl_try = false;#endif
+    /* set default row processor */
+    PQsetRowProcessor(conn, NULL, NULL);
+    /*     * We try to send at least 8K at a time, which is the usual size of pipe     * buffers on Unix systems.
Thatway, when we are sending a large amount
 
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)    initPQExpBuffer(&conn->errorMessage);
initPQExpBuffer(&conn->workBuffer);
+    /* set up initial row buffer */
+    conn->rowBufLen = 32;
+    conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+    if (conn->inBuffer == NULL ||        conn->outBuffer == NULL ||
+        conn->rowBuf == NULL ||        PQExpBufferBroken(&conn->errorMessage) ||
PQExpBufferBroken(&conn->workBuffer))   {
 
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)        free(conn->inBuffer);    if (conn->outBuffer)
free(conn->outBuffer);
+    if (conn->rowBuf)
+        free(conn->rowBuf);    termPQExpBuffer(&conn->errorMessage);    termPQExpBuffer(&conn->workBuffer);
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)    return prev;}
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..cd287cd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);static int PQsendDescribe(PGconn *conn, char desc_type,
           const char *desc_target);static int    check_field_number(const PGresult *res, int field_num);
 
+static int    pqAddRow(PGresult *res, PGrowValue *columns, void *param);/* ----------------
@@ -160,6 +161,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)    result->curBlock = NULL;
result->curOffset= 0;    result->spaceLeft = 0;
 
+    result->rowProcessorErrMsg = NULL;    if (conn)    {
@@ -701,7 +703,6 @@ pqClearAsyncResult(PGconn *conn)    if (conn->result)        PQclear(conn->result);    conn->result
=NULL;
 
-    conn->curTuple = NULL;}/*
@@ -756,7 +757,6 @@ pqPrepareAsyncResult(PGconn *conn)     */    res = conn->result;    conn->result = NULL;        /*
handingover ownership to caller */
 
-    conn->curTuple = NULL;        /* just in case */    if (!res)        res = PQmakeEmptyPGresult(conn,
PGRES_FATAL_ERROR);   else
 
@@ -828,6 +828,87 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)}/*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+    conn->rowProcessor = (func ? func : pqAddRow);
+    conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get the current row processor of conn. If param is not NULL, also
+ *   store the current row processor parameter in *param.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+    if (param)
+        *param = conn->rowProcessorParam;
+
+    return conn->rowProcessor;
+}
+
+/*
+ * PQsetRowProcessorErrMsg
+ *    Set the error message to pass back to the caller of the row processor.
+ *
+ *    You can replace the previous message with an alternative msg, or
+ *    clear it with NULL.
+ */
+void
+PQsetRowProcessorErrMsg(PGresult *res, const char *msg)
+{
+    if (msg)
+        res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+    else
+        res->rowProcessorErrMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *      add a row to the PGresult structure, growing it if necessary
+ *      Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+    PGresAttValue *tup;
+    int            nfields = res->numAttributes;
+    int            i;
+
+    tup = (PGresAttValue *)
+        pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+    if (tup == NULL)
+        return FALSE;
+
+    for (i = 0 ; i < nfields ; i++)
+    {
+        tup[i].len = columns[i].len;
+        if (tup[i].len == NULL_LEN)
+        {
+            tup[i].value = res->null_field;
+        }
+        else
+        {
+            bool isbinary = (res->attDescs[i].format != 0);
+            tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+            if (tup[i].value == NULL)
+                return FALSE;
+
+            memcpy(tup[i].value, columns[i].value, tup[i].len);
+            /* We have to terminate this ourselves */
+            tup[i].value[tup[i].len] = '\0';
+        }
+    }
+
+    return pqAddTuple(res, tup);
+}
+
+/* * pqAddTuple *      add a row pointer to the PGresult structure, growing it if necessary *      Returns TRUE if OK,
FALSEif not enough memory to add the row
 
@@ -1223,7 +1304,6 @@ PQsendQueryStart(PGconn *conn)    /* initialize async result-accumulation state */
conn->result= NULL;
 
-    conn->curTuple = NULL;    /* ready to send command message */    return true;
@@ -1831,6 +1911,55 @@ PQexecFinish(PGconn *conn)    return lastResult;}
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+    return 1;
+}
+
+/*
+ * Exhaust remaining Data Rows in current conn.
+ * 
+ * Exhaust current result if skipAll is false and all succeeding results if
+ * true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+    PQrowProcessor savedRowProcessor;
+    void * savedRowProcParam;
+    PGresult *res;
+    int ret = 0;
+
+    /* save the current row processor settings and set dummy processor */
+    savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+    PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+    
+    /*
+     * Throw away the remaining rows in current result, or all succeeding
+     * results if skipAll is not FALSE.
+     */
+    if (skipAll)
+    {
+        while ((res = PQgetResult(conn)) != NULL)
+            PQclear(res);
+    }
+    else if ((res = PQgetResult(conn)) != NULL)
+    {
+        PQclear(res);
+        ret = 1;
+    }
+    
+    PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+    return ret;
+}
+
+/* * PQdescribePrepared *      Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)}/*
+ * pqSkipnchar:
+ *    skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+    if (len > (size_t) (conn->inEnd - conn->inCursor))
+        return EOF;
+
+    conn->inCursor += len;
+
+    if (conn->Pfdebug)
+        fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+                (unsigned long) len);
+
+    return 0;
+}
+
+/* * pqPutnchar: *    write exactly len bytes to the current message */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..6578019 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, FALSE))                            return;
 
+                        /* getAnotherTuple moves inStart itself */
+                        continue;                    }                    else                    {
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, TRUE))                            return;
 
+                        /* getAnotherTuple moves inStart itself */
+                        continue;                    }                    else                    {
@@ -703,52 +707,51 @@ failure:/* * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor. * Returns: 0 if completed message, EOF if error
ornot enough data yet. * * Note that if we run out of data, we have to suspend and reprocess
 
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received. */static intgetAnotherTuple(PGconn *conn, bool binary){    PGresult
*result= conn->result;    int            nfields = result->numAttributes;
 
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;    /* the backend sends us a bitmap of which attributes are null */    char
std_bitmap[64];/* used unless it doesn't fit */    char       *bitmap = std_bitmap;    int            i;
 
+    int            rp;    size_t        nbytes;            /* the number of bytes in bitmap  */    char        bmap;
        /* One byte of the bitmap */    int            bitmap_index;    /* Its index */    int            bitcnt;
    /* number of bits examined in current byte */    int            vlen;            /* length of the current field
value*/
 
+    /* resize row buffer if needed */
+    if (nfields > conn->rowBufLen)
+    {
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
+            goto rowProcessError;
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
+    }
+    else
+    {
+        rowbuf = conn->rowBuf;
+    }
+    result->binary = binary;
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
+    if (binary)    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-        /*
-         * If it's binary, fix the column format indicators.  We assume the
-         * backend will consistently send either B or D, not a mix.
-         */
-        if (binary)
-        {
-            for (i = 0; i < nfields; i++)
-                result->attDescs[i].format = 1;
-        }
+        for (i = 0; i < nfields; i++)
+            result->attDescs[i].format = 1;    }
-    tup = conn->curTuple;    /* Get the null-value bitmap */    nbytes = (nfields + BITS_PER_BYTE - 1) /
BITS_PER_BYTE;
@@ -757,7 +760,7 @@ getAnotherTuple(PGconn *conn, bool binary)    {        bitmap = (char *) malloc(nbytes);        if
(!bitmap)
-            goto outOfMemory;
+            goto rowProcessError;    }    if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +774,29 @@ getAnotherTuple(PGconn *conn, bool binary)    for (i = 0; i < nfields; i++)    {        if
(!(bmap& 0200))
 
-        {
-            /* if the field value is absent, make it a null string */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-        }
+            vlen = NULL_LEN;
+        else if (pqGetInt(&vlen, 4, conn))
+                goto EOFexit;        else        {
-            /* get the value length (the first four bytes are for length) */
-            if (pqGetInt(&vlen, 4, conn))
-                goto EOFexit;            if (!binary)                vlen = vlen - 4;            if (vlen < 0)
      vlen = 0;
 
-            if (tup[i].value == NULL)
-            {
-                tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-                if (tup[i].value == NULL)
-                    goto outOfMemory;
-            }
-            tup[i].len = vlen;
-            /* read in the value */
-            if (vlen > 0)
-                if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                    goto EOFexit;
-            /* we have to terminate this ourselves */
-            tup[i].value[vlen] = '\0';        }
+
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip the value */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            goto EOFexit;
+        /* advance the bitmap stuff */        bitcnt++;        if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +809,56 @@ getAnotherTuple(PGconn *conn, bool binary)            bmap <<= 1;    }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
-    if (bitmap != std_bitmap)        free(bitmap);
-    return 0;
+    bitmap = NULL;
+
+    /* tag the row as parsed */
+    conn->inStart = conn->inCursor;
+
+    /* Pass the completed row values to rowProcessor */
+    rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+    if (rp == 1)
+        return 0;
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;
+    else if (rp != 0)
+        PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
+
+rowProcessError:
-outOfMemory:    /* Replace partially constructed result with an error result */
-    /*
-     * we do NOT use pqSaveErrorResult() here, because of the likelihood that
-     * there's not enough memory to concatenate messages...
-     */
-    pqClearAsyncResult(conn);
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
+    if (result->rowProcessorErrMsg)
+    {
+        printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+        pqSaveErrorResult(conn);
+    }
+    else
+    {
+        /*
+         * we do NOT use pqSaveErrorResult() here, because of the likelihood that
+         * there's not enough memory to concatenate messages...
+         */
+        pqClearAsyncResult(conn);
+        resetPQExpBuffer(&conn->errorMessage);
-    /*
-     * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
-     * do to recover...
-     */
-    conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+        /*
+         * If error message is passed from RowProcessor, set it into
+         * PGconn, assume out of memory if not.
+         */
+        appendPQExpBufferStr(&conn->errorMessage,
+                             libpq_gettext("out of memory for query result\n"));
+
+        /*
+         * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
+         * do to recover...
+         */
+        conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+    }    conn->asyncStatus = PGASYNC_READY;
+    /* Discard the failed message --- good idea? */    conn->inStart = conn->inEnd;
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..a19ee88 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, msgLength))                            return;
 
+
+                        /* getAnotherTuple() moves inStart itself */
+                        continue;                    }                    else if (conn->result != NULL &&
               conn->result->resultStatus == PGRES_FATAL_ERROR)
 
@@ -613,33 +616,22 @@ failure:/* * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor. * Returns: 0 if completed message, EOF if error
ornot enough data yet. * * Note that if we run out of data, we have to suspend and reprocess
 
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received. */static intgetAnotherTuple(PGconn *conn, int msgLength){    PGresult
*result= conn->result;    int            nfields = result->numAttributes;
 
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;    int            tupnfields;        /* # fields from tuple */    int            vlen;
   /* length of the current field value */    int            i;
 
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
-    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-    }
-    tup = conn->curTuple;
+    int            rp;    /* Get the field count and make sure it's what we expect */    if (pqGetInt(&tupnfields, 2,
conn))
@@ -652,52 +644,88 @@ getAnotherTuple(PGconn *conn, int msgLength)                 libpq_gettext("unexpected field
countin \"D\" message\n"));        pqSaveErrorResult(conn);        /* Discard the failed message by pretending we read
it*/
 
-        conn->inCursor = conn->inStart + 5 + msgLength;
+        conn->inStart += 5 + msgLength;        return 0;    }
+    /* resize row buffer if needed */
+    rowbuf = conn->rowBuf;
+    if (nfields > conn->rowBufLen)
+    {
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
+        {
+            goto outOfMemory1;
+        }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
+    }
+    /* Scan the fields */    for (i = 0; i < nfields; i++)    {        /* get the value length */        if
(pqGetInt(&vlen,4, conn))
 
-            return EOF;
+            goto protocolError;        if (vlen == -1)
-        {
-            /* null field */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-            continue;
-        }
-        if (vlen < 0)
+            vlen = NULL_LEN;
+        else if (vlen < 0)            vlen = 0;
-        if (tup[i].value == NULL)
-        {
-            bool        isbinary = (result->attDescs[i].format != 0);
-            tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-            if (tup[i].value == NULL)
-                goto outOfMemory;
-        }
-        tup[i].len = vlen;
-        /* read in the value */
-        if (vlen > 0)
-            if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                return EOF;
-        /* we have to terminate this ourselves */
-        tup[i].value[vlen] = '\0';
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip to the next length field */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            goto protocolError;    }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
+    /* tag the row as parsed, check if correctly */
+    conn->inStart += 5 + msgLength;
+    if (conn->inCursor != conn->inStart)
+        goto protocolError;
+    /* Pass the completed row values to rowProcessor */
+    rp = conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+    if (rp == 1)
+    {
+        /* everything is good */
+        return 0;
+    }
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }
+
+    /* there was some problem */
+    if (rp == 0)
+    {
+        if (result->rowProcessorErrMsg == NULL)
+            goto outOfMemory2;
+
+        /* use supplied error message */
+        printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+    }
+    else
+    {
+        /* row processor messed up */
+        printfPQExpBuffer(&conn->errorMessage,
+                          libpq_gettext("invalid return value from row processor\n"));
+    }
+    pqSaveErrorResult(conn);    return 0;
-outOfMemory:
+outOfMemory1:
+    /* Discard the failed message by pretending we read it */
+    conn->inStart += 5 + msgLength;
+outOfMemory2:    /*     * Replace partially constructed result with an error result. First     * discard the old
resultto try to win back some memory.
 
@@ -706,9 +734,14 @@ outOfMemory:    printfPQExpBuffer(&conn->errorMessage,                      libpq_gettext("out of
memoryfor query result\n"));    pqSaveErrorResult(conn);
 
+    return 0;
+protocolError:
+    printfPQExpBuffer(&conn->errorMessage,
+                      libpq_gettext("invalid row contents\n"));
+    pqSaveErrorResult(conn);    /* Discard the failed message by pretending we read it */
-    conn->inCursor = conn->inStart + 5 + msgLength;
+    conn->inStart += 5 + msgLength;    return 0;}
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..b7370e2 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify    struct pgNotify *next;        /* list link */} PGnotify;
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+    int            len;            /* length in bytes of the value */
+    char       *value;            /* actual value, without null termination */
+} PGrowValue;
+/* Function types for notice-handling callbacks */typedef void (*PQnoticeReceiver) (void *arg, const PGresult
*res);typedefvoid (*PQnoticeProcessor) (void *arg, const char *message);
 
@@ -416,6 +427,39 @@ extern PGPing PQping(const char *conninfo);extern PGPing PQpingParams(const char *const *
keywords,            const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * Columns array will contain PQnfields() entries, each one
+ * pointing to particular column data in network buffer.
+ * This function is supposed to copy data out from there
+ * and store somewhere.  NULL is signified with len<0.
+ *
+ * This function must return 1 for success and must return 0 for
+ * failure and may set error message by PQsetRowProcessorErrMsg.  It
+ * is assumed by caller as out of memory when the error message is not
+ * set on failure. This function is assumed not to throw any
+ * exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Once a row processor is registered, the PGresult no longer stores
+ * rows itself; the processor is called for each row instead.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+                                   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+/* Force the write buffer to be written (or at least try) */extern int    PQflush(PGconn *conn);
@@ -454,6 +498,7 @@ extern char *PQcmdTuples(PGresult *res);extern char *PQgetvalue(const PGresult *res, int tup_num,
intfield_num);extern int    PQgetlength(const PGresult *res, int tup_num, int field_num);extern int
PQgetisnull(constPGresult *res, int tup_num, int field_num);
 
+extern void    PQsetRowProcessorErrMsg(PGresult *res, const char *msg);extern int    PQnparams(const PGresult
*res);externOid    PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..1fc5aab 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -209,6 +209,9 @@ struct pg_result    PGresult_data *curBlock;    /* most recently allocated block */    int
 curOffset;        /* start offset of free space in block */    int            spaceLeft;        /* number of free
bytesremaining in block */
 
+
+    /* temp storage for message from row processor callback */
+    char       *rowProcessorErrMsg;};/* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -398,7 +401,6 @@ struct pg_conn    /* Status for asynchronous result construction */    PGresult   *result;
 /* result being constructed */
 
-    PGresAttValue *curTuple;    /* tuple currently being read */#ifdef USE_SSL    bool        allow_ssl_try;    /*
Allowedto try SSL negotiation */
 
@@ -443,6 +445,14 @@ struct pg_conn    /* Buffer for receiving various parts of messages */    PQExpBufferData
workBuffer;/* expansible string */
 
+
+    /*
+     * Read column data from network buffer.
+     */
+    PQrowProcessor rowProcessor;/* Function pointer */
+    void *rowProcessorParam;    /* Contextual parameter for rowProcessor */
+    PGrowValue *rowBuf;            /* Buffer for passing values to rowProcessor */
+    int rowBufLen;                /* Number of columns allocated in rowBuf */};/* PGcancel stores all data necessary
tocancel a connection. A copy of this
 
@@ -560,6 +570,7 @@ extern int    pqGets(PQExpBuffer buf, PGconn *conn);extern int    pqGets_append(PQExpBuffer buf,
PGconn*conn);extern int    pqPuts(const char *s, PGconn *conn);extern int    pqGetnchar(char *s, size_t len, PGconn
*conn);
+extern int    pqSkipnchar(size_t len, PGconn *conn);extern int    pqPutnchar(const char *s, size_t len, PGconn
*conn);externint    pqGetInt(int *result, size_t bytes, PGconn *conn);extern int    pqPutInt(int value, size_t bytes,
PGconn*conn); 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..0087b43 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,330 @@ int PQisthreadsafe(); </sect1>
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   By default, rows are stored in a <type>PGresult</type> until the
+   full result set is received.  The completely filled
+   <type>PGresult</type> is then passed to the user.  This behavior
+   can be changed by registering an alternative row processor
+   function, which sees each row as soon as it is received from the
+   network.  It has the option of processing the data immediately,
+   or storing it in a custom container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it
+   cannot know whether the SQL statement will finish successfully on
+   the server.  So some care must be taken to get proper
+   transactional behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object to set the row processor function.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>func</parameter></term>
+       <listitem>
+         <para>
+           Storage handler function to set. NULL means to use the
+           default processor.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+           A pointer to contextual parameter passed
+           to <parameter>func</parameter>.
+         </para>
+       </listitem>
+     </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to column value in network buffer.
+      The <parameter>len</parameter> field will contain number of
+      bytes in value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to position where the value would have been in buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from network
+       buffer before it returns, as next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.  On
+       failure this function should set the error message
+       with <function>PQsetRowProcessorErrMsg</function> if the cause
+       is other than out of memory.  When the non-blocking API is in use,
+       it can also return 2 for early exit
+       from the <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       the row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from
+       the callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in valid state.
+       The connection can later be closed via <function>PQfinish</function>.
+       Processing can also be continued without closing the connection:
+       call <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing then continues with a new row; the previous row that raised
+       the exception is skipped.  Alternatively, you can discard all remaining
+       rows by calling <function>PQskipResult</function> without
+       closing connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+     <term><parameter>res</parameter></term>
+     <listitem>
+       <para>
+         A pointer to the <type>PGresult</type> object.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>columns</parameter></term>
+     <listitem>
+       <para>
+         Column values of the row to process.  Column values
+         are located in network buffer, the processor must
+         copy them out from there.
+       </para>
+       <para>
+         Column values are not null-terminated, so processor cannot
+         use C string functions on them directly.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>param</parameter></term>
+     <listitem>
+       <para>
+         Extra parameter that was given to <function>PQsetRowProcessor</function>.
+       </para>
+     </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+        Discards all the remaining row data
+        after <function>PQexec</function>
+        or <function>PQgetResult</function> exits via an exception raised
+        in the <type>PQrowProcessor</type>, without closing the connection.
+<synopsis>
+void PQskipResult(PGconn *conn, int skipAll)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>skipAll</parameter></term>
+       <listitem>
+         <para>
+           Skip remaining rows in the current result
+           if <parameter>skipAll</parameter> is false (0). Skip
+           remaining rows in the current result and all rows in
+           succeeding results if true (non-zero).
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+    <term>
+     <function>PQsetRowProcessorErrMsg</function>
+     <indexterm>
+      <primary>PQsetRowProcessorErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+    Sets the message for an error that occurred
+    in <type>PQrowProcessor</type>.  If this message is not set, the
+    caller assumes the error to be out of memory.
+<synopsis>
+void PQsetRowProcessorErrMsg(PGresult *res, const char *msg)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+      <varlistentry>
+        <term><parameter>res</parameter></term>
+        <listitem>
+          <para>
+        A pointer to the <type>PGresult</type> object
+        passed to <type>PQrowProcessor</type>.
+          </para>
+        </listitem>
+      </varlistentry>
+      <varlistentry>
+        <term><parameter>msg</parameter></term>
+        <listitem>
+          <para>
+        Error message. This will be copied internally, so the caller
+        need not keep the string alive after the call.
+          </para>
+          <para>
+        If <parameter>res</parameter> already has a message previously
+        set, it will be overwritten. Pass NULL to cancel the custom
+        message.
+          </para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Get row processor and its context parameter currently set to
+       the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+              If not NULL, the current row processor parameter of the
+              connection is stored here.
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+ <sect1 id="libpq-build">  <title>Building <application>libpq</application> Programs</title>
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..d9f1b3a 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char** valbuf;
+    int *valbuflen;
+    char **cstrs;
+    bool error_occurred;
+    bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
                    int2vector *pkattnums_arg, int32 pknumatts_arg,
                    int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)    char       *curname = NULL;    int            howmany = 0;
bool       fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;    DBLINK_INIT;
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)    appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
/*
 
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*     * Try to execute the query.  Note that since libpq uses malloc, the     * PGresult will be long-lived even
thoughwe are still in a short-lived     * memory context.     */
 
-    res = PQexec(conn, buf.data);
+    PG_TRY();
+    {
+        res = PQexec(conn, buf.data);
+    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
+
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
+
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+    if (!res ||        (PQresultStatus(res) != PGRES_COMMAND_OK &&         PQresultStatus(res) != PGRES_TUPLES_OK))
{
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        dblink_res_error(conname, res, "could not fetch from cursor", fail);        return (Datum) 0;    }
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)                (errcode(ERRCODE_INVALID_CURSOR_NAME),
errmsg("cursor \"%s\" does not exist", curname)));    }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    remoteConn *rconn = NULL;    bool
      fail = true;    /* default to backward compatible */    bool        freeconn = false;
 
+    storeInfo   storeinfo;    /* check to see if caller supports us returning a tuplestore */    if (rsinfo == NULL ||
!IsA(rsinfo,ReturnSetInfo))
 
@@ -715,164 +769,252 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    rsinfo->setResult = NULL;
rsinfo->setDesc= NULL;
 
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+    /* synchronous query, or async result retrieval */
-    if (!is_async)
-        res = PQexec(conn, sql);
-    else
+    PG_TRY();    {
-        res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
+        if (!is_async)
+            res = PQexec(conn, sql);
+        else
+            res = PQgetResult(conn);    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)    {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK))
+        {
+            /* finishStoreInfo saves the fields referred to below. */
+            if (storeinfo.nummismatch)
+            {
+                /* This is only for backward compatibility */
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATATYPE_MISMATCH),
+                         errmsg("remote query result rowtype does not match "
+                                "the specified FROM clause rowtype")));
+            }
+
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }    }
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo){    ReturnSetInfo *rsinfo = (ReturnSetInfo *)
fcinfo->resultinfo;
+    TupleDesc    tupdesc;
+    int i;
+
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+    {
+        case TYPEFUNC_COMPOSITE:
+            /* success */
+            break;
+        case TYPEFUNC_RECORD:
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
-    Assert(rsinfo->returnMode == SFRM_Materialize);
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
-    PG_TRY();
+    /* make sure we have a persistent copy of the tupdesc */
+    tupdesc = CreateTupleDescCopy(tupdesc);
+
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->nattrs = tupdesc->natts;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuf = NULL;
+    sinfo->valbuflen = NULL;
+    sinfo->cstrs = NULL;    /* must be set before the NULL check below */
+
+    /* Preallocate memory of same size with c string array for values. */
+    sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+    if (sinfo->valbuf)
+        sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+    if (sinfo->valbuflen)
+        sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+    if (sinfo->cstrs == NULL)    {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
+        if (sinfo->valbuf)
+            free(sinfo->valbuf);
+        if (sinfo->valbuflen)
+            free(sinfo->valbuflen);
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));
+    }
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
-            tupdesc = CreateTemplateTupleDesc(1, false);
-            TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-                               TEXTOID, -1, 0);
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+    for (i = 0 ; i < sinfo->nattrs ; i++)
+    {
+        sinfo->valbuf[i] = NULL;
+        sinfo->valbuflen[i] = -1;
+    }
-            is_sql_cmd = false;
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    int i;
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
+    if (sinfo->valbuf)
+    {
+        for (i = 0 ; i < sinfo->nattrs ; i++)
+        {
+            if (sinfo->valbuf[i])
+                free(sinfo->valbuf[i]);        }
+        free(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+    if (sinfo->valbuflen)
+    {
+        free(sinfo->valbuflen);
+        sinfo->valbuflen = NULL;
+    }
-        if (ntuples > 0)
-        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
+    if (sinfo->cstrs)
+    {
+        free(sinfo->cstrs);
+        sinfo->cstrs = NULL;
+    }
-            values = (char **) palloc(nfields * sizeof(char *));
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        fields = PQnfields(res);
+    int        i;
+    char      **cstrs = sinfo->cstrs;
-                if (!is_sql_cmd)
-                {
-                    int            i;
+    if (sinfo->error_occurred)
+        return FALSE;
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
+
+        /* This error will be processed in
+         * dblink_record_internal(). So do not set error message
+         * here. */
+        return FALSE;
+    }
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
+    /*
+     * Value input functions assume that the input string is
+     * NUL-terminated, so make each value a NUL-terminated copy.
+     */
+    for(i = 0 ; i < fields ; i++)
+    {
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            char *tmp = sinfo->valbuf[i];
+            int tmplen = sinfo->valbuflen[i];
+
+            /*
+             * Call malloc and realloc separately so that this works
+             * even on systems whose realloc() does not
+             * accept NULL as the old memory block.
+             *
+             * Also (re)allocate in bigger steps to
+             * avoid a flood of allocations on weird data.
+             */
+            if (tmp == NULL)
+            {
+                tmplen = len + 1;
+                if (tmplen < 64)
+                    tmplen = 64;
+                tmp = (char *)malloc(tmplen);
+            }
+            else if (tmplen < len + 1)
+            {
+                if (len + 1 > tmplen * 2)
+                    tmplen = len + 1;
+                else
+                    tmplen = tmplen * 2;
+                tmp = (char *)realloc(tmp, tmplen);            }
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
-        }
+            /*
+             * sinfo->valbuf[n] will be freed in finishStoreInfo()
+             * when realloc returns NULL.
+             */
+            if (tmp == NULL)
+                return FALSE;
-        PQclear(res);
-    }
-    PG_CATCH();
-    {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+            sinfo->valbuf[i] = tmp;
+            sinfo->valbuflen[i] = tmplen;
+
+            cstrs[i] = sinfo->valbuf[i];
+            memcpy(cstrs[i], columns[i].value, len);
+            cstrs[i][len] = '\0';
+        }    }
-    PG_END_TRY();
+
+    /*
+     * These functions may throw an exception, which is caught by the
+     * caller: dblink_fetch() or dblink_record_internal().
+     */
+    tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+    tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+    return TRUE;}/*
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index 9c4c810..0087b43 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7351,7 +7351,17 @@ typedef struct
        This function must return 1 for success, and 0 for failure.  On
        failure this function should set the error message
        with <function>PQsetRowProcessorErrMsg</function> if the cause
-       is other than out of memory.
+       is other than out of memory.  When non-blocking API is in use,
+       it can also return 2 for early exit
+       from <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       row can be processed outside of callback.  Caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from
+       callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index c3220ed..6578019 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -820,6 +820,9 @@ getAnotherTuple(PGconn *conn, bool binary)    rp= conn->rowProcessor(result, rowbuf,
conn->rowProcessorParam);   if (rp == 1)        return 0;
 
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;    else if (rp != 0)        PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value
fromrow processor\n"));
 
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index 44d1ec4..a19ee88 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -697,6 +697,11 @@ getAnotherTuple(PGconn *conn, int msgLength)        /* everything is good */        return 0;
}
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }    /* there was some problem */    if (rp == 0)
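
The value-copying step in storeHandler() from the dblink patch above can be looked at in isolation. The following is only a minimal sketch, not the patch code: RowValue is a stand-in for the proposed PGrowValue, and ColBuf and copy_value() are hypothetical names. Unlike the patch, it relies on standard C realloc(NULL, n) behaving like malloc(n).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the PGrowValue struct proposed in the patch. */
typedef struct
{
    int         len;    /* length in bytes, -1 if SQL NULL */
    const char *value;  /* not NUL-terminated */
} RowValue;

/* Hypothetical reusable per-column buffer (role of valbuf/valbuflen). */
typedef struct
{
    char *buf;
    int   cap;
} ColBuf;

/*
 * Copy one column value into cb as a NUL-terminated string.
 * Returns the string, or NULL for SQL NULL.
 * *ok is set to 0 only when allocation fails.
 */
static char *
copy_value(ColBuf *cb, const RowValue *col, int *ok)
{
    int need;

    assert(cb != NULL && col != NULL && ok != NULL);
    *ok = 1;
    if (col->len < 0)
        return NULL;                /* SQL NULL: nothing to copy */

    need = col->len + 1;            /* room for the terminator */
    if (cb->cap < need)
    {
        int   newcap = cb->cap * 2; /* grow in bigger steps */
        char *tmp;

        if (newcap < need)
            newcap = need;
        if (newcap < 64)
            newcap = 64;            /* minimum size, as in the patch */

        /* Assumes standard C: realloc(NULL, n) acts like malloc(n). */
        tmp = realloc(cb->buf, newcap);
        if (tmp == NULL)
        {
            *ok = 0;                /* old buffer is still owned by cb */
            return NULL;
        }
        cb->buf = tmp;
        cb->cap = newcap;
    }

    memcpy(cb->buf, col->value, col->len);
    cb->buf[col->len] = '\0';
    return cb->buf;
}
```

Reusing one buffer per column keeps the number of allocations roughly constant over a large result set, which is presumably why the patch keeps valbuf around between rows.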

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

23 February 2012, 08:34:38

On Thu, Feb 23, 2012 at 07:14:03PM +0900, Kyotaro HORIGUCHI wrote:
> Hello, this is new version of the patch.

Looks good.

> By the way, I would like to ask you one question. What is the
> reason why void* should be head or tail of the parameter list?

Aesthetical reasons:

1) PGresult and PGrowValue belong together.

2) void* is probably the context object for handler.  When doing
object-oriented programming in C the main object is usually first.
Like libpq does - PGconn is always first argument.

But as libpq does not know the actual meaning of void* for handler,
it can be the last param as well.

When I wrote the demo code, I noticed that it is unnatural to have
void* in the middle.
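
To illustrate the argument order being discussed, here is a minimal, self-contained sketch with the context pointer last. Result and RowValue are stand-ins for PGresult and PGrowValue, and RowStats, stats_handler() and feed_rows() are hypothetical names used only for this example; feed_rows() plays the role of the library loop that would call the handler once per row.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PGrowValue and PGresult; not the real libpq types. */
typedef struct { int len; const char *value; } RowValue;
typedef struct { int nfields; } Result;

/* Handler type: result and columns belong together, context comes last. */
typedef int (*RowHandler)(Result *res, RowValue *columns, void *param);

/* Hypothetical per-query context object passed through void *param. */
typedef struct { long rows; long bytes; long nulls; } RowStats;

/* Count rows, value bytes and NULLs; len == -1 marks a NULL value. */
static int
stats_handler(Result *res, RowValue *columns, void *param)
{
    RowStats *st = param;
    int       i;

    for (i = 0; i < res->nfields; i++)
    {
        if (columns[i].len < 0)
            st->nulls++;
        else
            st->bytes += columns[i].len;
    }
    st->rows++;
    return 1;                   /* 1 means success */
}

/* Hypothetical driver: calls the handler once per row of nfields values. */
static int
feed_rows(Result *res, RowValue *rows, int nrows,
          RowHandler handler, void *param)
{
    int r;

    for (r = 0; r < nrows; r++)
    {
        if (handler(res, rows + (size_t) r * res->nfields, param) != 1)
            return 0;
    }
    return 1;
}
```

A caller would declare a RowStats on its own stack, pass its address as param, and read the counters after the loop finishes.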

Last comment - if we drop the plan to make PQsetRowProcessorErrMsg()
usable outside of handler, we can simplify internal usage as well:
the PGresult->rowProcessorErrMsg can be dropped and let's use
->errMsg to transport the error message.

The PGresult is long-lived structure and adding fields for such
temporary usage feels wrong.  There is no other libpq code between
row processor and getAnotherTuple, so the use is safe.

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

24 February 2012, 06:54:06

Hello, this is new version of the patch.

> > By the way, I would like to ask you one question. What is the
> > reason why void* should be head or tail of the parameter list?
> 
> Aesthetical reasons:
I got it. Thank you.

> Last comment - if we drop the plan to make PQsetRowProcessorErrMsg()
> usable outside of handler, we can simplify internal usage as well:
> the PGresult->rowProcessorErrMsg can be dropped and let's use
> ->errMsg to transport the error message.
> 
> The PGresult is long-lived structure and adding fields for such
> temporary usage feels wrong.  There is no other libpq code between
> row processor and getAnotherTuple, so the use is safe.

I mostly agree. In addition, I see no reason to treat out of memory
as a special case now that the row processor has become generic.
The previous patch handled OOM differently from other errors,
but I could not see an obvious reason to do so.

- PGresult.rowProcessorErrMsg is removed; PGresult.errMsg is used instead, set through the new interface function
PQresultSetErrMsg().

- Row processors should now set a message for any error that occurs within them. pqAddRow and dblink.c are modified to
do so.

- getAnotherTuple() sets the error message `unknown error' for the condition rp == 0 && ->errMsg == NULL.

- In fe-protocol3.c, getAnotherTuple() now always advances the input cursor and calls pqClearAsyncResult() and
pqSaveErrorResult() when rp == 0, regardless of whether ->errMsg is NULL.

- conn->inStart has already been moved to the start of the next message when the row processor is called, so advancing
inStart in outOfMemory1 seems redundant. I removed it.

- printfPQExpBuffer() complains about a variable message, so resetPQExpBuffer() and appendPQExpBufferStr() are used instead.
=====
- dblink did not use conn->errorMessage before and still does not; all errors are displayed as `Error occurred on dblink
connection...'.

- TODO: No NLS messages for error messages.

- For some reason, make check fails on the base revision, so I have not run it.

- I have no idea how to test protocol 2...
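
The error-message handling described in the list above can be sketched as follows. This is an illustration under assumptions, not the patch itself: Result and RowValue are stand-ins for PGresult and PGrowValue, result_set_errmsg() imitates the proposed PQresultSetErrMsg() but uses plain malloc instead of pqResultStrdup(), and checking_handler() and row_error_message() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for PGrowValue and PGresult; not the real libpq types. */
typedef struct { int len; const char *value; } RowValue;
typedef struct { int nfields; char *errMsg; } Result;

/* Imitates the proposed PQresultSetErrMsg(): copy msg, or clear with NULL. */
static void
result_set_errmsg(Result *res, const char *msg)
{
    free(res->errMsg);
    res->errMsg = NULL;
    if (msg)
    {
        size_t n = strlen(msg) + 1;

        res->errMsg = malloc(n);
        if (res->errMsg)
            memcpy(res->errMsg, msg, n);
    }
}

/* Example handler: fails on a column-count mismatch and says why. */
static int
checking_handler(Result *res, RowValue *columns, void *param)
{
    int expected = *(int *) param;

    (void) columns;             /* values are not inspected here */
    if (res->nfields != expected)
    {
        result_set_errmsg(res, "unexpected number of columns");
        return 0;               /* 0 means failure */
    }
    return 1;                   /* 1 means success */
}

/*
 * Caller side: choose the message to report for return value rp.
 * Falls back to "unknown error" when the handler failed without
 * setting a message, mirroring the rp == 0 && errMsg == NULL case.
 */
static const char *
row_error_message(const Result *res, int rp)
{
    if (rp == 1)
        return NULL;
    if (rp != 0)
        return "invalid return value from row processor";
    return res->errMsg ? res->errMsg : "unknown error";
}
```

With this shape, out of memory needs no special return path: a handler that cannot allocate simply returns 0, and the caller reports whatever message is present or the generic fallback.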

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..239edb8 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157PQping                    158PQpingParams              159PQlibVersion
            160
 
+PQsetRowProcessor      161
+PQgetRowProcessor      162
+PQresultSetErrMsg      163
+PQskipResult          164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)    conn->wait_ssl_try = false;#endif
+    /* set default row processor */
+    PQsetRowProcessor(conn, NULL, NULL);
+    /*     * We try to send at least 8K at a time, which is the usual size of pipe     * buffers on Unix systems.
Thatway, when we are sending a large amount
 
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)    initPQExpBuffer(&conn->errorMessage);
initPQExpBuffer(&conn->workBuffer);
+    /* set up initial row buffer */
+    conn->rowBufLen = 32;
+    conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+    if (conn->inBuffer == NULL ||        conn->outBuffer == NULL ||
+        conn->rowBuf == NULL ||        PQExpBufferBroken(&conn->errorMessage) ||
PQExpBufferBroken(&conn->workBuffer))   {
 
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)        free(conn->inBuffer);    if (conn->outBuffer)
free(conn->outBuffer);
+    if (conn->rowBuf)
+        free(conn->rowBuf);    termPQExpBuffer(&conn->errorMessage);    termPQExpBuffer(&conn->workBuffer);
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)    return prev;}
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..7fd3c9c 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);static int PQsendDescribe(PGconn *conn, char desc_type,
           const char *desc_target);static int    check_field_number(const PGresult *res, int field_num);
 
+static int    pqAddRow(PGresult *res, PGrowValue *columns, void *param);/* ----------------
@@ -701,7 +702,6 @@ pqClearAsyncResult(PGconn *conn)    if (conn->result)        PQclear(conn->result);    conn->result
=NULL;
 
-    conn->curTuple = NULL;}/*
@@ -756,7 +756,6 @@ pqPrepareAsyncResult(PGconn *conn)     */    res = conn->result;    conn->result = NULL;        /*
handingover ownership to caller */
 
-    conn->curTuple = NULL;        /* just in case */    if (!res)        res = PQmakeEmptyPGresult(conn,
PGRES_FATAL_ERROR);   else
 
@@ -828,6 +827,93 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+    conn->rowProcessor = (func ? func : pqAddRow);
+    conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get the current row processor of conn.  If param is not NULL, store
+ *   the current row processor parameter in *param.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+    if (param)
+        *param = conn->rowProcessorParam;
+
+    return conn->rowProcessor;
+}
+
+/*
+ * PQresultSetErrMsg
+ *    Set the error message in a PGresult.
+ *
+ *    The previous message is replaced by msg, or cleared if msg
+ *    is NULL.
+ */
+void
+PQresultSetErrMsg(PGresult *res, const char *msg)
+{
+    if (msg)
+        res->errMsg = pqResultStrdup(res, msg);
+    else
+        res->errMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *      add a row to the PGresult structure, growing it if necessary
+ *      Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+    PGresAttValue *tup;
+    int            nfields = res->numAttributes;
+    int            i;
+
+    tup = (PGresAttValue *)
+        pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+    if (tup == NULL)
+    {
+        PQresultSetErrMsg(res, "out of memory for query result\n");
+        return FALSE;
+    }
+
+    for (i = 0 ; i < nfields ; i++)
+    {
+        tup[i].len = columns[i].len;
+        if (tup[i].len == NULL_LEN)
+        {
+            tup[i].value = res->null_field;
+        }
+        else
+        {
+            bool isbinary = (res->attDescs[i].format != 0);
+            tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+            if (tup[i].value == NULL)
+            {
+                PQresultSetErrMsg(res, "out of memory for query result\n");
+                return FALSE;
+            }
+
+            memcpy(tup[i].value, columns[i].value, tup[i].len);
+            /* We have to terminate this ourselves */
+            tup[i].value[tup[i].len] = '\0';
+        }
+    }
+
+    return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *      add a row pointer to the PGresult structure, growing it if necessary
  *      Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1309,6 @@ PQsendQueryStart(PGconn *conn)    /* initialize async result-accumulation state */
conn->result= NULL;
 
-    conn->curTuple = NULL;    /* ready to send command message */    return true;
@@ -1831,6 +1916,55 @@ PQexecFinish(PGconn *conn)    return lastResult;}
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+    return 1;
+}
+
+/*
+ * Exhaust remaining Data Rows in the current conn.
+ *
+ * Exhaust the current result if skipAll is false, and all succeeding results
+ * as well if true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+    PQrowProcessor savedRowProcessor;
+    void * savedRowProcParam;
+    PGresult *res;
+    int ret = 0;
+
+    /* save the current row processor settings and set dummy processor */
+    savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+    PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+    
+    /*
+     * Throw away the remaining rows in current result, or all succeeding
+     * results if skipAll is not FALSE.
+     */
+    if (skipAll)
+    {
+        while ((res = PQgetResult(conn)) != NULL)
+            PQclear(res);
+    }
+    else if ((res = PQgetResult(conn)) != NULL)
+    {
+        PQclear(res);
+        ret = 1;
+    }
+    
+    PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+    return ret;
+}
+
+/*
  * PQdescribePrepared
  *      Obtain information about a previously prepared statement
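A minimal sketch of the save/swap/restore pattern that PQskipResult follows in the hunk above, written against stand-in types so it compiles without libpq. MockConn, RowProcessor, mock_set_processor, mock_drain and mock_skip_result are hypothetical names used only for illustration; just the pattern itself (NULL selects the default processor, a do-nothing processor is installed while rows are discarded, and the caller's settings are restored afterwards) is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PGconn and PQrowProcessor; not libpq types. */
typedef int (*RowProcessor)(const char *row, void *param);

typedef struct MockConn
{
    RowProcessor processor;     /* currently installed row processor */
    void       *param;          /* its contextual parameter */
    const char **pending;       /* rows still waiting to be consumed */
    int         npending;
} MockConn;

/* Default processor: counts rows through the int that param points to. */
static int
default_processor(const char *row, void *param)
{
    (void) row;
    if (param)
        ++*(int *) param;
    return 1;
}

/* Do-nothing processor, mirroring dummyRowProcessor in the patch. */
static int
dummy_processor(const char *row, void *param)
{
    (void) row;
    (void) param;
    return 1;
}

/* A NULL func selects the default, as PQsetRowProcessor does with pqAddRow. */
static void
mock_set_processor(MockConn *conn, RowProcessor func, void *param)
{
    assert(conn != NULL);
    conn->processor = func ? func : default_processor;
    conn->param = param;
}

/* Feed pending rows to the installed processor; returns rows consumed. */
static int
mock_drain(MockConn *conn)
{
    int         consumed = 0;

    while (conn->npending > 0)
    {
        if (conn->processor(*conn->pending, conn->param) != 1)
            break;
        conn->pending++;
        conn->npending--;
        consumed++;
    }
    return consumed;
}

/* Discard pending rows, then restore the caller's processor settings. */
static int
mock_skip_result(MockConn *conn)
{
    RowProcessor saved_func = conn->processor;
    void       *saved_param = conn->param;
    int         skipped;

    mock_set_processor(conn, dummy_processor, NULL);
    skipped = mock_drain(conn);
    mock_set_processor(conn, saved_func, saved_param);
    return skipped;
}
```

Restoring the saved processor unconditionally means the caller's row storage never sees the discarded rows and the connection is left with its original settings afterwards.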
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *    skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+    if (len > (size_t) (conn->inEnd - conn->inCursor))
+        return EOF;
+
+    conn->inCursor += len;
+
+    if (conn->Pfdebug)
+        fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+                (unsigned long) len);
+
+    return 0;
+}
+
+/*
  * pqPutnchar:
  *    write exactly len bytes to the current message
  */
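The bounds check in pqSkipnchar can be tried in isolation with a small sketch. InputBuf and skip_nchar are hypothetical stand-ins for the inCursor/inEnd fields of PGconn and for pqSkipnchar itself; the debug trace to Pfdebug is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the inCursor/inEnd fields of PGconn. */
typedef struct InputBuf
{
    int         inCursor;       /* offset of the next byte to read */
    int         inEnd;          /* offset one past the last valid byte */
} InputBuf;

/* Advance the cursor by len bytes; EOF if fewer than len are buffered. */
static int
skip_nchar(size_t len, InputBuf *buf)
{
    assert(buf != NULL && buf->inCursor <= buf->inEnd);

    if (len > (size_t) (buf->inEnd - buf->inCursor))
        return EOF;             /* cursor is left untouched */

    buf->inCursor += (int) len;
    return 0;
}
```

Because the cursor does not move on EOF, the caller can retry the same message once more data has arrived.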
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..3b0520d 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, FALSE))                            return;
 
+                        /* getAnotherTuple moves inStart itself */
+                        continue;                    }                    else                    {
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, TRUE))                            return;
 
+                        /* getAnotherTuple moves inStart itself */
+                        continue;                    }                    else                    {
@@ -703,52 +707,55 @@ failure:/* * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor. * Returns: 0 if completed message, EOF if error
ornot enough data yet. * * Note that if we run out of data, we have to suspend and reprocess
 
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received. */static intgetAnotherTuple(PGconn *conn, bool binary){    PGresult
*result= conn->result;    int            nfields = result->numAttributes;
 
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;    /* the backend sends us a bitmap of which attributes are null */    char
std_bitmap[64];/* used unless it doesn't fit */    char       *bitmap = std_bitmap;    int            i;
 
+    int            rp;    size_t        nbytes;            /* the number of bytes in bitmap  */    char        bmap;
        /* One byte of the bitmap */    int            bitmap_index;    /* Its index */    int            bitcnt;
    /* number of bits examined in current byte */    int            vlen;            /* length of the current field
value*/
 
+    char        *errmsg = libpq_gettext("unknown error\n");
-    result->binary = binary;
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
+    /* resize row buffer if needed */
+    if (nfields > conn->rowBufLen)    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-        /*
-         * If it's binary, fix the column format indicators.  We assume the
-         * backend will consistently send either B or D, not a mix.
-         */
-        if (binary)
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)        {
-            for (i = 0; i < nfields; i++)
-                result->attDescs[i].format = 1;
+            errmsg = libpq_gettext("out of memory for query result\n");
+            goto error;        }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
+    }
+    else
+    {
+        rowbuf = conn->rowBuf;
+    }
+
+    result->binary = binary;
+
+    if (binary)
+    {
+        for (i = 0; i < nfields; i++)
+            result->attDescs[i].format = 1;    }
-    tup = conn->curTuple;    /* Get the null-value bitmap */    nbytes = (nfields + BITS_PER_BYTE - 1) /
BITS_PER_BYTE;
@@ -757,7 +764,10 @@ getAnotherTuple(PGconn *conn, bool binary)    {        bitmap = (char *) malloc(nbytes);        if
(!bitmap)
-            goto outOfMemory;
+        {
+            errmsg = libpq_gettext("out of memory for query result\n");
+            goto error;
+        }    }    if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +781,29 @@ getAnotherTuple(PGconn *conn, bool binary)    for (i = 0; i < nfields; i++)    {        if
(!(bmap& 0200))
 
-        {
-            /* if the field value is absent, make it a null string */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-        }
+            vlen = NULL_LEN;
+        else if (pqGetInt(&vlen, 4, conn))
+                goto EOFexit;        else        {
-            /* get the value length (the first four bytes are for length) */
-            if (pqGetInt(&vlen, 4, conn))
-                goto EOFexit;            if (!binary)                vlen = vlen - 4;            if (vlen < 0)
      vlen = 0;
 
-            if (tup[i].value == NULL)
-            {
-                tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-                if (tup[i].value == NULL)
-                    goto outOfMemory;
-            }
-            tup[i].len = vlen;
-            /* read in the value */
-            if (vlen > 0)
-                if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                    goto EOFexit;
-            /* we have to terminate this ourselves */
-            tup[i].value[vlen] = '\0';        }
+
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip the value */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            goto EOFexit;
+        /* advance the bitmap stuff */        bitcnt++;        if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +816,53 @@ getAnotherTuple(PGconn *conn, bool binary)            bmap <<= 1;    }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
-    if (bitmap != std_bitmap)        free(bitmap);
-    return 0;
+    bitmap = NULL;
-outOfMemory:
-    /* Replace partially constructed result with an error result */
+    /* tag the row as parsed */
+    conn->inStart = conn->inCursor;
+    /* Pass the completed row values to rowProcessor */
+    rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+    if (rp == 1)
+        return 0;
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;
+    else if (rp == 0)
+    {
+        errmsg = result->errMsg;
+        result->errMsg = NULL;
+        if (errmsg == NULL)
+            errmsg = libpq_gettext("unknown error in row processor\n");
+        goto error;
+    }
+
+    errmsg = libpq_gettext("invalid return value from row processor\n");
+
+error:    /*     * we do NOT use pqSaveErrorResult() here, because of the likelihood that     * there's not enough
memoryto concatenate messages...     */    pqClearAsyncResult(conn);
 
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
-
+    resetPQExpBuffer(&conn->errorMessage);
+    
+    /*
+     * If error message is passed from RowProcessor, set it into
+     * PGconn, assume out of memory if not.
+     */
+    appendPQExpBufferStr(&conn->errorMessage, errmsg);
+        /*     * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can     * do to recover...     */
conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 
+    conn->asyncStatus = PGASYNC_READY;
+    /* Discard the failed message --- good idea? */    conn->inStart = conn->inEnd;
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..c8202c2 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, msgLength))                            return;
 
+
+                        /* getAnotherTuple() moves inStart itself */
+                        continue;                    }                    else if (conn->result != NULL &&
               conn->result->resultStatus == PGRES_FATAL_ERROR)
 
@@ -613,33 +616,23 @@ failure:/* * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor. * Returns: 0 if completed message, EOF if error
ornot enough data yet. * * Note that if we run out of data, we have to suspend and reprocess
 
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received. */static intgetAnotherTuple(PGconn *conn, int msgLength){    PGresult
*result= conn->result;    int            nfields = result->numAttributes;
 
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;    int            tupnfields;        /* # fields from tuple */    int            vlen;
   /* length of the current field value */    int            i;
 
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
-    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-    }
-    tup = conn->curTuple;
+    int            rp;
+    char        *errmsg = libpq_gettext("unknown error\n");    /* Get the field count and make sure it's what we
expect*/    if (pqGetInt(&tupnfields, 2, conn))
 
@@ -647,13 +640,22 @@ getAnotherTuple(PGconn *conn, int msgLength)    if (tupnfields != nfields)    {
-        /* Replace partially constructed result with an error result */
-        printfPQExpBuffer(&conn->errorMessage,
-                 libpq_gettext("unexpected field count in \"D\" message\n"));
-        pqSaveErrorResult(conn);
-        /* Discard the failed message by pretending we read it */
-        conn->inCursor = conn->inStart + 5 + msgLength;
-        return 0;
+        errmsg = libpq_gettext("unexpected field count in \"D\" message\n");
+        goto error_and_forward;
+    }
+
+    /* resize row buffer if needed */
+    rowbuf = conn->rowBuf;
+    if (nfields > conn->rowBufLen)
+    {
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
+        {
+            errmsg = libpq_gettext("out of memory for query result\n");
+            goto error_and_forward;
+        }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;    }    /* Scan the fields */
@@ -661,54 +663,78 @@ getAnotherTuple(PGconn *conn, int msgLength)    {        /* get the value length */        if
(pqGetInt(&vlen,4, conn))
 
-            return EOF;
-        if (vlen == -1)        {
-            /* null field */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-            continue;
+            /*
+             * Forwarding inCursor makes no sense on a protocol error
+             */
+            errmsg = libpq_gettext("invalid row contents\n");
+            goto error;        }
-        if (vlen < 0)
+        if (vlen == -1)
+            vlen = NULL_LEN;
+        else if (vlen < 0)            vlen = 0;
-        if (tup[i].value == NULL)
-        {
-            bool        isbinary = (result->attDescs[i].format != 0);
-            tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-            if (tup[i].value == NULL)
-                goto outOfMemory;
-        }
-        tup[i].len = vlen;
-        /* read in the value */
-        if (vlen > 0)
-            if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                return EOF;
-        /* we have to terminate this ourselves */
-        tup[i].value[vlen] = '\0';
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip to the next length field */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            return EOF;    }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
+    /* tag the row as parsed, and check that the cursor ended at the message end */
+    conn->inStart += 5 + msgLength;
+    if (conn->inCursor != conn->inStart)
+    {
+        errmsg = libpq_gettext("invalid row contents\n");
+        goto error;
+    }
-    return 0;
+    /* Pass the completed row values to rowProcessor */
+    rp = conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+    if (rp == 1)
+    {
+        /* everything is good */
+        return 0;
+    }
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }
-outOfMemory:
+    /* there was some problem */
+    if (rp == 0)
+    {
+        errmsg = result->errMsg;
+        result->errMsg = NULL;
+        if (errmsg == NULL)
+            errmsg = libpq_gettext("unknown error in row processor\n");
+        goto error;
+    }
+
+    errmsg = libpq_gettext("invalid return value from row processor\n");
+    goto error;
+
+error_and_forward:
+    /* Discard the failed message by pretending we read it */
+    conn->inCursor = conn->inStart + 5 + msgLength;
+error:    /*     * Replace partially constructed result with an error result. First     * discard the old result to
tryto win back some memory.     */    pqClearAsyncResult(conn);
 
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
+    resetPQExpBuffer(&conn->errorMessage);
+    appendPQExpBufferStr(&conn->errorMessage, errmsg);    pqSaveErrorResult(conn);
-
-    /* Discard the failed message by pretending we read it */
-    conn->inCursor = conn->inStart + 5 + msgLength;    return 0;}
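A minimal sketch of the zero-copy idea in the protocol-3 getAnotherTuple above: walk the body of a DataRow message (a 16-bit field count, then for each field a 32-bit length followed by that many bytes, with length -1 meaning NULL) and record only pointers and lengths. RowValue, read_int32 and index_data_row are hypothetical names, and the function works on a plain byte array rather than on a PGconn. Unlike the patch, which clamps other negative lengths to 0, this sketch rejects them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the PGrowValue layout proposed in the patch. */
typedef struct
{
    int         len;            /* length in bytes, -1 if NULL */
    const char *value;          /* points into the message buffer */
} RowValue;

/* Read a big-endian 32-bit integer. */
static int32_t
read_int32(const unsigned char *p)
{
    return (int32_t) (((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                      ((uint32_t) p[2] << 8) | (uint32_t) p[3]);
}

/*
 * Fill cols[] with pointer/length pairs for each field of a DataRow body.
 * Returns the field count, or -1 if the body is malformed or has more
 * than maxcols fields.  No column data is copied.
 */
static int
index_data_row(const unsigned char *body, size_t bodylen,
               RowValue *cols, int maxcols)
{
    size_t      pos = 2;
    int         nfields;
    int         i;

    assert(body != NULL && cols != NULL);
    if (bodylen < 2)
        return -1;
    nfields = (body[0] << 8) | body[1];
    if (nfields > maxcols)
        return -1;

    for (i = 0; i < nfields; i++)
    {
        int32_t     vlen;

        if (bodylen - pos < 4)
            return -1;
        vlen = read_int32(body + pos);
        pos += 4;
        if (vlen < -1)
            return -1;

        /* value points just past the length field, even for NULL */
        cols[i].len = (int) vlen;
        cols[i].value = (const char *) body + pos;

        if (vlen > 0)
        {
            if ((size_t) vlen > bodylen - pos)
                return -1;
            pos += (size_t) vlen;
        }
    }

    /* the cursor must land exactly on the end of the message */
    return (pos == bodylen) ? nfields : -1;
}
```

The final comparison plays the same role as the inCursor/inStart check in the patch: a row whose field lengths do not add up to the message length is reported as invalid.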
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..810b04e 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify    struct pgNotify *next;        /* list link */} PGnotify;
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+    int            len;            /* length in bytes of the value */
+    char       *value;            /* actual value, without null termination */
+} PGrowValue;
+/* Function types for notice-handling callbacks */typedef void (*PQnoticeReceiver) (void *arg, const PGresult
*res);typedefvoid (*PQnoticeProcessor) (void *arg, const char *message);
 
@@ -416,6 +427,38 @@ extern PGPing PQping(const char *conninfo);extern PGPing PQpingParams(const char *const *
keywords,            const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * Columns array will contain PQnfields() entries, each one
+ * pointing to particular column data in network buffer.
+ * This function is supposed to copy data out from there
+ * and store somewhere.  NULL is signified with len<0.
+ *
+ * This function must return 1 for success and 0 for failure.  On failure
+ * it may set an error message with PQresultSetErrMsg; if no message is
+ * set, the caller reports an unknown error in the row processor.
+ * This function is assumed not to throw any exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Once a row processor is registered, libpq no longer stores rows in
+ * the PGresult itself and instead calls the processor for each row.
+ *
+ * func is the row processor function.  See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+                                   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+/* Force the write buffer to be written (or at least try) */extern int    PQflush(PGconn *conn);
@@ -454,6 +497,7 @@ extern char *PQcmdTuples(PGresult *res);extern char *PQgetvalue(const PGresult *res, int tup_num,
intfield_num);extern int    PQgetlength(const PGresult *res, int tup_num, int field_num);extern int
PQgetisnull(constPGresult *res, int tup_num, int field_num);
 
+extern void    PQresultSetErrMsg(PGresult *res, const char *msg);extern int    PQnparams(const PGresult *res);extern
Oid   PQparamtype(const PGresult *res, int param_num);
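Because PGrowValue entries are not null-terminated and point into the network buffer, a row processor has to copy each value out before it returns. A minimal sketch of such a copy helper, assuming the struct layout from the header above; RowValue and copy_column are hypothetical names, and a real processor would more likely store into its own container than call malloc per column.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the PGrowValue layout proposed in the patch. */
typedef struct
{
    int         len;            /* length in bytes, negative if NULL */
    char       *value;          /* not null-terminated */
} RowValue;

/*
 * Return a malloc'd, null-terminated copy of one column value.
 * Returns NULL for an SQL NULL (with *isnull set to 1) or on out of
 * memory (with *isnull set to 0).  The caller frees the result.
 */
static char *
copy_column(const RowValue *col, int *isnull)
{
    char       *copy;

    assert(col != NULL && isnull != NULL);

    *isnull = (col->len < 0);
    if (*isnull)
        return NULL;

    copy = malloc((size_t) col->len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, col->value, (size_t) col->len);
    copy[col->len] = '\0';      /* we have to terminate this ourselves */
    return copy;
}
```

In a row processor, an out-of-memory return here would be the point to set a message with PQresultSetErrMsg and return 0.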
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..9cabd20 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -398,7 +398,6 @@ struct pg_conn    /* Status for asynchronous result construction */    PGresult   *result;
 /* result being constructed */
 
-    PGresAttValue *curTuple;    /* tuple currently being read */#ifdef USE_SSL    bool        allow_ssl_try;    /*
Allowedto try SSL negotiation */
 
@@ -443,6 +442,14 @@ struct pg_conn    /* Buffer for receiving various parts of messages */    PQExpBufferData
workBuffer;/* expansible string */
 
+
+    /*
+     * Read column data from network buffer.
+     */
+    PQrowProcessor rowProcessor;/* Function pointer */
+    void *rowProcessorParam;    /* Contextual parameter for rowProcessor */
+    PGrowValue *rowBuf;            /* Buffer for passing values to rowProcessor */
+    int rowBufLen;                /* Number of columns allocated in rowBuf */};/* PGcancel stores all data necessary
tocancel a connection. A copy of this
 
@@ -560,6 +567,7 @@ extern int    pqGets(PQExpBuffer buf, PGconn *conn);extern int    pqGets_append(PQExpBuffer buf,
PGconn*conn);extern int    pqPuts(const char *s, PGconn *conn);extern int    pqGetnchar(char *s, size_t len, PGconn
*conn);
+extern int    pqSkipnchar(size_t len, PGconn *conn);extern int    pqPutnchar(const char *s, size_t len, PGconn
*conn);externint    pqGetInt(int *result, size_t bytes, PGconn *conn);extern int    pqPutInt(int value, size_t bytes,
PGconn*conn); 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..1f91f98 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,329 @@ int PQisthreadsafe(); </sect1>
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, rows are stored in a <type>PGresult</type>
+   until the full result set is received.  The completely filled
+   <type>PGresult</type> is then passed to the user.  This behavior can be
+   changed by registering an alternative row processor function
+   that sees each row as soon as it is received
+   from the network.  It has the option of processing the data
+   immediately or storing it in a custom container.
+  </para>
+
+  <para>
+   Note that because a row processor sees rows as they arrive, it cannot
+   know whether the SQL statement will actually finish successfully on
+   the server.  Some care must therefore be taken to get proper
+   transactional behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object on which to set the row processor function.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>func</parameter></term>
+       <listitem>
+         <para>
+           Storage handler function to set. NULL means to use the
+           default processor.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+           A pointer to contextual parameter passed
+           to <parameter>func</parameter>.
+         </para>
+       </listitem>
+     </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to a column value in the network buffer.
+      The <parameter>len</parameter> field will contain the number of
+      bytes in the value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to the position where the value would have been in the buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from the network
+       buffer before it returns, as the next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.  On
+       failure this function should set the error message
+       with <function>PQresultSetErrMsg</function>.  When the non-blocking
+       API is in use, it can also return 2 for an early exit
+       from the <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid, so the
+       row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from the
+       callback or for other reasons.  Usually this is done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed via <function>PQfinish</function>.
+       Processing can also be continued without closing the connection:
+       call <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing will then continue with a new row; the previous row that
+       raised the exception will be skipped.  Alternatively, you can discard
+       all remaining rows by calling <function>PQskipResult</function> without
+       closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+     <term><parameter>res</parameter></term>
+     <listitem>
+       <para>
+         A pointer to the <type>PGresult</type> object.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>columns</parameter></term>
+     <listitem>
+       <para>
+         Column values of the row to process.  Column values
+         are located in the network buffer; the processor must
+         copy them out from there.
+       </para>
+       <para>
+         Column values are not null-terminated, so the processor cannot
+         use C string functions on them directly.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>param</parameter></term>
+     <listitem>
+       <para>
+         Extra parameter that was given to <function>PQsetRowProcessor</function>.
+       </para>
+     </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+        Discard all the remaining row data
+        after <function>PQexec</function>
+        or <function>PQgetResult</function> exits via an exception raised
+        in the <type>PQrowProcessor</type>, without closing the connection.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>skipAll</parameter></term>
+       <listitem>
+         <para>
+           Skip the remaining rows in the current result
+           if <parameter>skipAll</parameter> is false (0). Skip the
+           remaining rows in the current result and all rows in
+           succeeding results if true (non-zero).
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqresultseterrmsg">
+    <term>
+     <function>PQresultSetErrMsg</function>
+     <indexterm>
+      <primary>PQresultSetErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+    Set the message for an error that occurred
+    in the <type>PQrowProcessor</type>.  If this message is not set, the
+    caller assumes the error to be an unknown error.
+<synopsis>
+void PQresultSetErrMsg(PGresult *res, const char *msg)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+      <varlistentry>
+        <term><parameter>res</parameter></term>
+        <listitem>
+          <para>
+        A pointer to the <type>PGresult</type> object
+        passed to <type>PQrowProcessor</type>.
+          </para>
+        </listitem>
+      </varlistentry>
+      <varlistentry>
+        <term><parameter>msg</parameter></term>
+        <listitem>
+          <para>
+        Error message. It is copied internally, so the caller does
+        not need to keep the string valid after the call.
+          </para>
+          <para>
+        If <parameter>res</parameter> already has a message set, it
+        is overwritten. Pass NULL to cancel the custom
+        message.
+          </para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Get the row processor and its context parameter currently set on
+       the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+              If not NULL, the current row processor parameter of the
+              connection is stored here.
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+ <sect1 id="libpq-build">  <title>Building <application>libpq</application> Programs</title>
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..9ce1466 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn    bool        newXactForCursor;        /* Opened a transaction for a
cursor*/} remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char** valbuf;
+    int *valbuflen;
+    char **cstrs;
+    bool error_occurred;
+    bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);static void validate_pkattnums(Relation rel,
          int2vector *pkattnums_arg, int32 pknumatts_arg,                   int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)    char       *curname = NULL;    int            howmany = 0;
bool       fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;    DBLINK_INIT;
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)    appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
/*
 
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*     * Try to execute the query.  Note that since libpq uses malloc, the     * PGresult will be long-lived even
thoughwe are still in a short-lived     * memory context.     */
 
-    res = PQexec(conn, buf.data);
+    PG_TRY();
+    {
+        res = PQexec(conn, buf.data);
+    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
+
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
+
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+    if (!res ||        (PQresultStatus(res) != PGRES_COMMAND_OK &&         PQresultStatus(res) != PGRES_TUPLES_OK))
{
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        dblink_res_error(conname, res, "could not fetch from cursor", fail);        return (Datum) 0;    }
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)                (errcode(ERRCODE_INVALID_CURSOR_NAME),
errmsg("cursor \"%s\" does not exist", curname)));    }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    remoteConn *rconn = NULL;    bool
      fail = true;    /* default to backward compatible */    bool        freeconn = false;
 
+    storeInfo   storeinfo;    /* check to see if caller supports us returning a tuplestore */    if (rsinfo == NULL ||
!IsA(rsinfo,ReturnSetInfo))
 
@@ -715,164 +769,259 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    rsinfo->setResult = NULL;
rsinfo->setDesc= NULL;
 
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+    /* synchronous query, or async result retrieval */
-    if (!is_async)
-        res = PQexec(conn, sql);
-    else
+    PG_TRY();    {
-        res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
+        if (!is_async)
+            res = PQexec(conn, sql);
+        else
+            res = PQgetResult(conn);    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)    {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK))
+        {
+            /* finishStoreInfo saves the fields referred to below. */
+            if (storeinfo.nummismatch)
+            {
+                /* This is only for backward compatibility */
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATATYPE_MISMATCH),
+                         errmsg("remote query result rowtype does not match "
+                                "the specified FROM clause rowtype")));
+            }
+
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }    }
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo){    ReturnSetInfo *rsinfo = (ReturnSetInfo *)
fcinfo->resultinfo;
+    TupleDesc    tupdesc;
+    int i;
+
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+    {
+        case TYPEFUNC_COMPOSITE:
+            /* success */
+            break;
+        case TYPEFUNC_RECORD:
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
-    Assert(rsinfo->returnMode == SFRM_Materialize);
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
-    PG_TRY();
+    /* make sure we have a persistent copy of the tupdesc */
+    tupdesc = CreateTupleDescCopy(tupdesc);
+
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->nattrs = tupdesc->natts;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuf = NULL;
+    sinfo->valbuflen = NULL;
+    sinfo->cstrs = NULL;    /* checked below even if an earlier malloc fails */
+
+    /* Preallocate memory of same size with c string array for values. */
+    sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+    if (sinfo->valbuf)
+        sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+    if (sinfo->valbuflen)
+        sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+    if (sinfo->cstrs == NULL)    {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
+        if (sinfo->valbuf)
+            free(sinfo->valbuf);
+        if (sinfo->valbuflen)
+            free(sinfo->valbuflen);
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));
+    }
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
-            tupdesc = CreateTemplateTupleDesc(1, false);
-            TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-                               TEXTOID, -1, 0);
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+    for (i = 0 ; i < sinfo->nattrs ; i++)
+    {
+        sinfo->valbuf[i] = NULL;
+        sinfo->valbuflen[i] = -1;
+    }
-            is_sql_cmd = false;
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    int i;
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
+    if (sinfo->valbuf)
+    {
+        for (i = 0 ; i < sinfo->nattrs ; i++)
+        {
+            if (sinfo->valbuf[i])
+                free(sinfo->valbuf[i]);        }
+        free(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+    if (sinfo->valbuflen)
+    {
+        free(sinfo->valbuflen);
+        sinfo->valbuflen = NULL;
+    }
-        if (ntuples > 0)
-        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
+    if (sinfo->cstrs)
+    {
+        free(sinfo->cstrs);
+        sinfo->cstrs = NULL;
+    }
-            values = (char **) palloc(nfields * sizeof(char *));
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        fields = PQnfields(res);
+    int        i;
+    char      **cstrs = sinfo->cstrs;
-                if (!is_sql_cmd)
-                {
-                    int            i;
+    if (sinfo->error_occurred)
+    {
+        PQresultSetErrMsg(res, "storeHandler is called after error\n");
+        return FALSE;
+    }
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
+
+        /* This error will be processed in
+         * dblink_record_internal(). So do not set error message
+         * here. */
+        
+        PQresultSetErrMsg(res, "unexpected field count in \"D\" message\n");
+        return FALSE;
+    }
+
+    /*
+     * Value input functions assume that the input string is
+     * zero-terminated, so make a terminated copy of each value.
+     */
+    for(i = 0 ; i < fields ; i++)
+    {
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            char *tmp = sinfo->valbuf[i];
+            int tmplen = sinfo->valbuflen[i];
+
+            /*
+             * Call malloc and realloc separately so that this works
+             * even on systems whose realloc() does not
+             * accept NULL as the old memory block.
+             *
+             * Also (re)allocate in bigger steps to avoid a
+             * flood of allocations on weird data.
+             */
+            if (tmp == NULL)
+            {
+                tmplen = len + 1;
+                if (tmplen < 64)
+                    tmplen = 64;
+                tmp = (char *)malloc(tmplen);
+            }
+            else if (tmplen < len + 1)
+            {
+                if (len + 1 > tmplen * 2)
+                    tmplen = len + 1;                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+                    tmplen = tmplen * 2;
+                tmp = (char *)realloc(tmp, tmplen);
+            }
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
+            /*
+             * sinfo->valbuf[n] will be freed in finishStoreInfo()
+             * when realloc returns NULL.
+             */
+            if (tmp == NULL)
+            {
+                PQresultSetErrMsg(res, "out of memory for query result\n");                        return FALSE;
    }
 
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
-        }
+            sinfo->valbuf[i] = tmp;
+            sinfo->valbuflen[i] = tmplen;
-        PQclear(res);
-    }
-    PG_CATCH();
-    {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+            cstrs[i] = sinfo->valbuf[i];
+            memcpy(cstrs[i], columns[i].value, len);
+            cstrs[i][len] = '\0';
+        }    }
-    PG_END_TRY();
+
+    /*
+     * These functions may throw an exception, which is caught in
+     * dblink_record_internal()
+     */
+    tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+    tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+    return TRUE;}/*
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index 2deb432..1f91f98 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7350,7 +7350,17 @@ typedef struct     <para>       This function must return 1 for success, and 0 for failure.  On
    failure this function should set the error message
 
-       with <function>PQresultSetErrMsg</function>.
+       with <function>PQresultSetErrMsg</function>.  When the non-blocking
+       API is in use, it can also return 2 to request an early exit
+       from <function>PQisBusy</function>.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values stay valid, so the
+       row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early because of the
+       callback or for other reasons.  Usually this is done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.     </para>     <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index da36bfa..3b0520d 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -827,6 +827,9 @@ getAnotherTuple(PGconn *conn, bool binary)    rp= conn->rowProcessor(result, rowbuf,
conn->rowProcessorParam);   if (rp == 1)        return 0;
 
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;    else if (rp == 0)    {        errmsg = result->errMsg;
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index 12fef30..c8202c2 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -703,6 +703,11 @@ getAnotherTuple(PGconn *conn, int msgLength)        /* everything is good */        return 0;
}
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }    /* there was some problem */    if (rp == 0)
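
The buffer growth policy used by storeHandler() in the dblink.c patch above can be isolated into a small helper. This is only a sketch: `grow_buf` and `MIN_BUF` are hypothetical names that do not appear in the patch, and integer overflow of the doubled size is not handled.

```c
#include <assert.h>
#include <stdlib.h>

#define MIN_BUF 64      /* same minimum first allocation as in the patch */

/*
 * Ensure *buf can hold `need` bytes.  The first allocation is at least
 * MIN_BUF bytes; later growth at least doubles the current size.
 * On failure returns 0 and leaves *buf and *buflen untouched, so the
 * caller can still free the old block.
 */
static int
grow_buf(char **buf, int *buflen, int need)
{
    char *tmp;
    int   newlen;

    assert(need > 0);

    if (*buf != NULL && *buflen >= need)
        return 1;                   /* already large enough */

    if (*buf == NULL)
    {
        /* malloc and realloc are kept separate, as in the patch */
        newlen = (need < MIN_BUF) ? MIN_BUF : need;
        tmp = malloc(newlen);
    }
    else
    {
        newlen = (need > *buflen * 2) ? need : *buflen * 2;
        tmp = realloc(*buf, newlen);
    }

    if (tmp == NULL)
        return 0;

    *buf = tmp;
    *buflen = newlen;
    return 1;
}
```

A caller would pass `len + 1` as `need` for each column value, so that there is room for the terminating zero byte.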

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

24 February 2012, 08:12:06

On Fri, Feb 24, 2012 at 07:53:14PM +0900, Kyotaro HORIGUCHI wrote:
> Hello, this is new version of the patch.
> 
> > > By the way, I would like to ask you one question. What is the
> > > reason why void* should be head or tail of the parameter list?
> > 
> > Aesthetical reasons:
> 
>  I got it. Thank you.
> 
> > Last comment - if we drop the plan to make PQsetRowProcessorErrMsg()
> > usable outside of handler, we can simplify internal usage as well:
> > the PGresult->rowProcessorErrMsg can be dropped and let's use
> > ->errMsg to transport the error message.
> > 
> > The PGresult is long-lived structure and adding fields for such
> > temporary usage feels wrong.  There is no other libpq code between
> > row processor and getAnotherTuple, so the use is safe.
> 
> I almost agree with it. Plus, I think it is no reason to consider
> out of memory as particular because now row processor becomes
> generic. But the previous patch does different process for OOM
> and others, but I couldn't see obvious reason to do so.

On OOM, the old result is freed to have higher chance that
constructing new result succeeds.  But if we want to transport
error message, we need to keep old PGresult around.  Thus
two separate paths.

This also means your new code is broken, the errmsg becomes
invalid after pqClearAsyncResult().

It's better to restore old two-path error handling.
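
The lifetime problem can be illustrated with a toy model. This is a sketch only: `toy_result`, `toy_make`, `toy_clear` and `toy_save_error` are hypothetical stand-ins, not libpq structures or functions. It covers just the path where a message has to be transported, and shows why the text must be copied before the result that owns it is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a result object that owns its error text. */
typedef struct toy_result
{
    char *errMsg;       /* heap-allocated, freed together with the result */
} toy_result;

/* Create a result holding a private copy of msg (msg may be NULL). */
static toy_result *
toy_make(const char *msg)
{
    toy_result *res = malloc(sizeof(*res));

    if (res == NULL)
        return NULL;
    res->errMsg = NULL;
    if (msg != NULL)
    {
        size_t len = strlen(msg) + 1;

        res->errMsg = malloc(len);
        if (res->errMsg == NULL)
        {
            free(res);
            return NULL;
        }
        memcpy(res->errMsg, msg, len);
    }
    return res;
}

static void
toy_clear(toy_result *res)
{
    if (res == NULL)
        return;
    free(res->errMsg);
    free(res);
}

/*
 * Replace *slot with a fresh error result carrying the old message.
 * The message is copied into the new result BEFORE the old one is
 * freed; reading old->errMsg after toy_clear(old) would be a
 * use-after-free.
 */
static int
toy_save_error(toy_result **slot)
{
    toy_result *old = *slot;
    const char *msg = (old != NULL && old->errMsg != NULL)
        ? old->errMsg : "unknown error";
    toy_result *fresh = toy_make(msg);

    if (fresh == NULL)
        return 0;       /* old result is kept; caller handles the failure */
    toy_clear(old);
    *slot = fresh;
    return 1;
}
```

The out-of-memory path is deliberately different: there the old result is freed first, and no message from it can be reused.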

> - Now row processors should set message for any error status
>   occurred within. pqAddRow and dblink.c is modified to do so.

I don't think that should be required.  Just use a dummy msg.

> - getAnotherTuple() sets the error message `unknown error' for
>   the condition rp == 0 && ->errMsg == NULL.

Ok.  I think most clients will want to drop connection
on error from rowproc, so exact message does not matter.

> - Always forward input cursor and do pqClearAsyncResult() and
>   pqSaveErrorResult() when rp == 0 in getAnotherTuple()
>   regardless whether ->errMsg is NULL or not in fe-protocol3.c.

Ok.  Although skipping the packet on OOM is dubious,
we will skip all further packets anyway, so let's be
consistent on problems.

There is still one EOF in v3 getAnotherTuple() - pqGetInt(tupnfields),
please turn that one also to protocolerror.

> - conn->inStart is already moved to the start point of the next
>   message when row processor is called. So forwarding inStart in
>   outOfMemory1 seems redundant. I removed it.
> 
> - printfPQExpBuffer() complains for variable message. So use
>   resetPQExpBuffer() and appendPQExpBufferStr() instead.

Instead use ("%s", errmsg) as argument there.  libpq code
is noisy enough, no need to add more.
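
The point about the format argument can be shown with plain `snprintf`; the same reasoning applies to any printf-style function. This is a minimal sketch, and `format_error` is a hypothetical helper, not libpq code. The "unknown error" fallback mirrors the behaviour discussed earlier in this mail for a missing message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy a caller-supplied message into buf.
 *
 * Passing errmsg itself as the format string draws a compiler warning
 * and misinterprets any '%' in the text.  A constant "%s" format
 * avoids both problems and still needs only one call.
 */
static int
format_error(char *buf, size_t size, const char *errmsg)
{
    if (errmsg == NULL)
        errmsg = "unknown error";   /* fallback when no message was set */
    return snprintf(buf, size, "%s", errmsg);
}
```

With this form a message such as "bad value: 100%d" is copied verbatim instead of being parsed as a conversion specification.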

> - I have no idea how to do test for protocol 2...

I have an urge to test with "rm fe-protocol2.c"...

-- 
marko

Let's drop V2 protocol

From

Marko Kreen

Date:

24 February 2012, 09:52:31

On Fri, Feb 24, 2012 at 02:11:45PM +0200, Marko Kreen wrote:
> On Fri, Feb 24, 2012 at 07:53:14PM +0900, Kyotaro HORIGUCHI wrote:
> > - I have no idea how to do test for protocol 2...
> 
> I have an urge to test with "rm fe-protocol2.c"...

Now I tested with 7.3.21 and the non-error case works fine.  Error state
does not - and not because patch is buggy, but because it has never
worked - V2 protocol has no working concept of skipping packets because
of a pending error state.

On OOM, V2 code does:
  conn->inStart = conn->inEnd;

and hopes for the best, but it does not work, because on short results
it moves past ReadyForQuery, on long results it moves into middle of
some packet.

With user-specified row processor, we need to have a working
error state handling.  With some surgery, it's possible to
introduce something like
  if (conn->result->resultStatus != PGRES_TUPLES_OK)

into various places in the code, to ignore but still
parse the packets.  But it will be rather non-trivial patch.
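
The "ignore but still parse" idea can be sketched on a toy stream. Everything here is an assumption made for illustration: the framing (type byte, one-byte payload length, payload) is NOT the real wire format, and `toy_conn` and `toy_parse` are hypothetical names. The point is only that the read position advances by exactly one message at a time, even after a failure, instead of jumping to the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy framing: each message is a type byte, a one-byte payload length,
 * then the payload.  'D' is a data row, 'Z' ends the query cycle.
 */
typedef struct toy_conn
{
    const unsigned char *buf;
    size_t inStart;         /* next unread byte */
    size_t inEnd;           /* end of valid data */
    int    failed;          /* set once a row could not be stored */
    int    rows_stored;
} toy_conn;

/*
 * Consume complete messages.  Returns 1 when 'Z' was consumed and 0
 * when more input is needed.  In the failed state rows are still
 * parsed, so the stream stays in sync, but their contents are ignored.
 */
static int
toy_parse(toy_conn *conn)
{
    while (conn->inEnd - conn->inStart >= 2)
    {
        unsigned char type = conn->buf[conn->inStart];
        size_t        len = conn->buf[conn->inStart + 1];

        if (conn->inEnd - conn->inStart - 2 < len)
            return 0;               /* incomplete message, wait for more */

        if (type == 'D' && !conn->failed)
            conn->rows_stored++;

        conn->inStart += 2 + len;   /* advance by exactly one message */

        if (type == 'Z')
            return 1;
    }
    return 0;
}
```

Setting `inStart = inEnd` instead would discard whatever happens to be buffered, which is either too much (data after 'Z') or too little (a partial message).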

So could we like, uh, not do it and simply drop the V2 code?

Of course, the row-processor patch does not make the situation worse,
so we could just say "don't use custom row processor with V2 servers",
but it still raises the question: "Does anyone have pre-7.4
servers around and if yes, then why does he need to use 9.2 libpq
to access those?"

-- 
marko

Re: Let's drop V2 protocol

From

Merlin Moncure

Date:

24 February 2012, 10:26:52

On Fri, Feb 24, 2012 at 7:52 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Fri, Feb 24, 2012 at 02:11:45PM +0200, Marko Kreen wrote:
>> On Fri, Feb 24, 2012 at 07:53:14PM +0900, Kyotaro HORIGUCHI wrote:
>> > - I have no idea how to do test for protocol 2...
>>
>> I have an urge to test with "rm fe-protocol2.c"...
>
> Now I tested with 7.3.21 and the non-error case works fine.  Error state
> does not - and not because patch is buggy, but because it has never
> worked - V2 protocol has no working concept of skipping packets because
> of a pending error state.
>
> On OOM, V2 code does:
>
>   conn->inStart = conn->inEnd;
>
> and hopes for the best, but it does not work, because on short results
> it moves past ReadyForQuery, on long results it moves into middle of
> some packet.
>
> With user-specified row processor, we need to have a working
> error state handling.  With some surgery, it's possible to
> introduce something like
>
>   if (conn->result->resultStatus != PGRES_TUPLES_OK)
>
> into various places in the code, to ignore but still
> parse the packets.  But it will be rather non-trivial patch.
>
> So could we like, uh, not do it and simply drop the V2 code?
>
>
> Of course, the row-processor patch does not make the situation worse,
> so we could just say "don't use custom row processor with V2 servers",
> but it still raises the question: "Does anyone have pre-7.4
> servers around and if yes, then why does he need to use 9.2 libpq
> to access those?"

I think it's plausible that very old client libraries could connect to
a modern server.  But it's pretty unlikely to have a 9.2 app contact
an ancient server IMO.

merlin

Re: Let's drop V2 protocol

From

Marko Kreen

Date:

24 February 2012, 10:40:42

On Fri, Feb 24, 2012 at 4:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Feb 24, 2012 at 7:52 AM, Marko Kreen <markokr@gmail.com> wrote:
>> So could we like, uh, not do it and simply drop the V2 code?
>
> I think it's plausible that very old client libraries could connect to
> a modern server.  But it's pretty unlikely to have a 9.2 app contact
> an ancient server IMO.

We can drop it from libpq but keep  the server-side support,
the codebase is different.

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

24 February 2012, 11:46:34

On Tue, Feb 14, 2012 at 01:39:06AM +0200, Marko Kreen wrote:
> I tried imagining some sort of getFoo() style API for fetching in-flight
> row data, but I always ended up with "rewrite libpq" step, so I feel
> it's not productive to go there.
> 
> Instead I added simple feature: rowProcessor can return '2',
> in which case getAnotherTuple() does early exit without setting
> any error state.  In user side it appears as PQisBusy() returned
> with TRUE result.  All pointers stay valid, so callback can just
> stuff them into some temp area.  ATM there is no indication though
> whether the exit was due to callback or other reasons, so user
> must detect it based on whether new temp pointers appeared,
> which means those must be cleaned before calling PQisBusy() again.
> This actually feels like feature, those must not stay around
> after single call.

To see what iterating a resultset would look like, I implemented a PQgetRow()
function using the currently available public API:
/*
 * Wait and return next row in resultset.
 *
 * returns:
 *   1 - row data available, the pointers are owned by PGconn
 *   0 - result done, use PQgetResult() to get final result
 *  -1 - some problem, check connection error
 */
int PQgetRow(PGconn *db, PGresult **hdr_p, PGrowValue **row_p);

code at:
 https://github.com/markokr/libpq-rowproc-demos/blob/master/getrow.c

usage:
/* send query */
if (!PQsendQuery(db, q))
    die(db, "PQsendQuery");

/* fetch rows one-by-one */
while (1) {
    rc = PQgetRow(db, &hdr, &row);
    if (rc > 0)
        proc_row(hdr, row);
    else if (rc == 0)
        break;
    else
        die(db, "streamResult");
}

/* final PGresult, either PGRES_TUPLES_OK or error */
final = PQgetResult(db);
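
The early-exit mechanism behind such a function can be sketched with a mock that needs no server. All names below (`toy_conn`, `toy_is_busy`, `stash_row`, `toy_get_row`) are hypothetical and none of them is libpq API. The mock only returns "busy" because of the callback, whereas a real connection can also be busy while waiting for data, which is exactly why the stash is cleared before every call.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_row_proc)(const char *row, void *param);

typedef struct toy_conn
{
    const char **rows;      /* rows still "on the wire" */
    size_t       nrows;
    size_t       next;
    toy_row_proc proc;
    void        *param;
} toy_conn;

/*
 * Mock of a busy check: feeds pending rows to the row processor.
 * Returns 1 (still busy) if the processor asked for an early exit by
 * returning 2, and 0 once all rows have been consumed.
 */
static int
toy_is_busy(toy_conn *conn)
{
    while (conn->next < conn->nrows)
    {
        int rp = conn->proc(conn->rows[conn->next++], conn->param);

        if (rp == 2)
            return 1;       /* early exit, no error state is set */
    }
    return 0;
}

/* Row processor: stash the row pointer and request an early exit. */
static int
stash_row(const char *row, void *param)
{
    *(const char **) param = row;
    return 2;
}

/*
 * Returns 1 with *row_p set when a row is available, 0 when the result
 * is done.  The stash starts out NULL on every call, so a non-NULL
 * value afterwards can only mean that the callback ran.
 */
static int
toy_get_row(toy_conn *conn, const char **row_p)
{
    const char *stash = NULL;

    conn->proc = stash_row;
    conn->param = &stash;
    (void) toy_is_busy(conn);
    conn->param = NULL;     /* do not keep a pointer to a dead local */

    *row_p = stash;
    return stash != NULL;
}
```

Calling `toy_get_row()` in a loop until it returns 0 yields the rows one at a time, in the same shape as the usage loop above.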

It does not look like it can replace the public callback API,
because it does not work with fully-async connections well.
But it *does* continue the line of synchronous APIs:

- PQexec(): last result only
- PQsendQuery() + PQgetResult(): each result separately
- PQsendQuery() + PQgetRow() + PQgetResult(): each row separately

Also the generic implementation is slightly messy, because
it cannot assume anything about surrounding usage patterns,
while same code living in some user framework can.  But
for simple users who just want to synchronously iterate
over resultset, it might be good enough API?

It does have an inconsistency problem - the row data does
not live in PGresult but in a custom container.  Proper
API pattern would be to have PQgetRow() that gives
functional PGresult, but that is not interesting for
high-performance users.  Solutions:

- rename to PQrecvRow()
- rename to PQrecvRow() and additionally provide PQgetRow()
- Don't bother, let users implement it themselves via callback API.

Comments?

-- 
marko

Re: Let's drop V2 protocol

From

Tom Lane

Date:

24 February 2012, 12:46:38

Marko Kreen <markokr@gmail.com> writes:
> On Fri, Feb 24, 2012 at 4:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> I think it's plausible that very old client libraries could connect to
>> a modern server.  But it's pretty unlikely to have a 9.2 app contact
>> an ancient server IMO.

> We can drop it from libpq but keep  the server-side support,
> the codebase is different.

I believe it's still somewhat common among JDBC users to force
V2-protocol connections as a workaround for over-eager prepared
statement planning.  Although I think the issue they're trying to dodge
will be fixed as of 9.2, that doesn't mean the server-side support can
go away.

As for taking it out of libpq, I would vote against doing that as long
as we have pg_dump support for pre-7.4 servers.  Now, I think it would
be entirely reasonable to kill pg_dump's support for pre-7.3 servers,
because that would simplify life in numerous ways for pg_dump; but 7.4
was not a big compatibility break for pg_dump so it seems a bit
arbitrary to kill its support for 7.3 specifically.

As near as I can tell the argument here is basically that we don't want
to try to fix unfixable bugs in the V2-protocol code.  Well, yeah,
they're unfixable ... why do you think we invented V3?  But they are
what they are, and as long as you don't run into those limitations the
protocol does still work.  There are plenty of features that require V3
already, so I see no reason not to classify the row-processor stuff as
one more.
        regards, tom lane

Re: Let's drop V2 protocol

From

Marko Kreen

Date:

24 February 2012, 13:11:29

On Fri, Feb 24, 2012 at 11:46:19AM -0500, Tom Lane wrote:
> As for taking it out of libpq, I would vote against doing that as long
> as we have pg_dump support for pre-7.4 servers.  Now, I think it would
> be entirely reasonable to kill pg_dump's support for pre-7.3 servers,
> because that would simplify life in numerous ways for pg_dump; but 7.4
> was not a big compatibility break for pg_dump so it seems a bit
> arbitrary to kill its support for 7.3 specifically.

So we need to maintain V2 protocol in libpq to specifically support 7.3?

What's so special about 7.3?

> As near as I can tell the argument here is basically that we don't want
> to try to fix unfixable bugs in the V2-protocol code.  Well, yeah,
> they're unfixable ... why do you think we invented V3?  But they are
> what they are, and as long as you don't run into those limitations the
> protocol does still work.  There are plenty of features that require V3
> already, so I see no reason not to classify the row-processor stuff as
> one more.

Agreed.  But still - having to reorg the never-used V2 code in parallel
to actual development in V3 code makes all changes in the area
unnecessarily harder.

-- 
marko

Re: Let's drop V2 protocol

From

Tom Lane

Date:

24 February 2012, 13:18:05

Marko Kreen <markokr@gmail.com> writes:
> On Fri, Feb 24, 2012 at 11:46:19AM -0500, Tom Lane wrote:
>> As for taking it out of libpq, I would vote against doing that as long
>> as we have pg_dump support for pre-7.4 servers.  Now, I think it would
>> be entirely reasonable to kill pg_dump's support for pre-7.3 servers,
>> because that would simplify life in numerous ways for pg_dump; but 7.4
>> was not a big compatibility break for pg_dump so it seems a bit
>> arbitrary to kill its support for 7.3 specifically.

> So we need to maintain V2 protocol in libpq to specifically support 7.3?

> What's so special about 7.3?

From pg_dump's viewpoint, the main thing about 7.3 is it's where we
added server-tracked object dependencies.  It also has schemas, though
I don't recall at the moment how much effort pg_dump has to spend on
faking those for an older server.
        regards, tom lane

Re: Let's drop V2 protocol

From

Merlin Moncure

Date:

24 February 2012, 14:25:17

On Fri, Feb 24, 2012 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I believe it's still somewhat common among JDBC users to force
> V2-protocol connections as a workaround for over-eager prepared
> statement planning.  Although I think the issue they're trying to dodge
> will be fixed as of 9.2, that doesn't mean the server-side support can
> go away.

good point. I just went through this.  The JDBC driver has a
'prepareThreshold' directive that *mostly* disables server side
prepared statements so you can work with tools like pgbouncer.
Unfortunately, it doesn't work for explicit transaction control
statements so that you have to downgrade jdbc protocol or patch the
driver (which is really the better way to go, but I can understand why
that won't fly for a lot of people).

merlin

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

26 February 2012, 18:19:46

On Fri, Feb 24, 2012 at 05:46:16PM +0200, Marko Kreen wrote:
> - rename to PQrecvRow() and additionally provide PQgetRow()

I tried it and it seems to work as an API - there is valid behaviour
for both sync and async connections.

Sync connection - PQgetRow() waits for data from network:

    if (!PQsendQuery(db, q))
        die(db, "PQsendQuery");
    while (1) {
        r = PQgetRow(db);
        if (!r)
            break;
        handle(r);
        PQclear(r);
    }
    r = PQgetResult(db);

Async connection - PQgetRow() does PQisBusy() loop internally,
but does not read from network:

   on read event:
    PQconsumeInput(db);
    while (1) {
        r = PQgetRow(db);
        if (!r)
            break;
        handle(r);
        PQclear(r);
    }
    if (!PQisBusy(db))
        r = PQgetResult(db);
    else
        waitForMoredata();


As it seems to simplify life for quite many potential users,
it seems worth including in libpq properly.

Attached patch is on top of v20120223 of the row-processor
patch.  The only change in general code is allowing
early exit for synchronous connections, as we now have
a valid use-case for it.

--
marko

Attachment

pqgetrow.diff

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

27 February 2012, 04:21:30

Hello, 

> On OOM, the old result is freed to have higher chance that
> constructing new result succeeds.  But if we want to transport
> error message, we need to keep old PGresult around.  Thus
> two separate paths.

Ok, I understand the aim. But now we can use a non-local exit to do
that not only for asynchronous reading (PQgetResult()) but also for
synchronous reading (PQexec()). If we should provide a means other than
exceptions to do that, I think it should be usable for both
synchronous and asynchronous reading. conn->asyncStatus seems to
be usable for this.

How about the modification below?

- getAnotherTuple() now returns 0 to continue as before, and 1
  instead of EOF to signal EOF state, and 2 to instruct to exit
  immediately.

- pqParseInput3 sets conn->asyncStatus to PGASYNC_BREAK for the
  last case,

- then PQgetResult() returns immediately when
  asyncStatus == PGASYNC_TUPLES_BREAK after parseInput() returns.

- and PQexecFinish() returns immediately if PQgetResult() returns
  with asyncStatus == PGASYNC_TUPLES_BREAK.

- PQgetResult() sets asyncStatus = PGRES_TUPLES_OK if called with
  asyncStatus == PGRES_TUPLES_BREAK

- New libpq API PQisBreaked(PGconn*) returns if asyncStatus ==
  PGRES_TUPLES_BREAK can be used to check if the transfer is breaked.

> Instead use ("%s", errmsg) as argument there.  libpq code
> is noisy enough, no need to add more.

Ok. I will do so.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

27 February 2012, 06:20:28

On Mon, Feb 27, 2012 at 05:20:30PM +0900, Kyotaro HORIGUCHI wrote:
> Hello, 
> 
> > On OOM, the old result is freed to have higher chance that
> > constructing new result succeeds.  But if we want to transport
> > error message, we need to keep old PGresult around.  Thus
> > two separate paths.
> 
> Ok, I understand the aim. But now we can use a non-local exit to do
> that not only for asynchronous reading (PQgetResult()) but also for
> synchronous reading (PQexec()). If we should provide a means other than
> exceptions to do that, I think it should be usable for both
> synchronous and asynchronous reading. conn->asyncStatus seems to
> be usable for this.
> 
> How about the modification below?
> 
> - getAnotherTuple() now returns 0 to continue as before, and 1
>   instead of EOF to signal EOF state, and 2 to instruct to exit
>   immediately.
> 
> - pqParseInput3 set conn->asyncStatus to PGASYNC_BREAK for the
>   last case,
> 
> - then PQgetResult() returns immediately when
>   asyncStatus == PGASYNC_TUPLES_BREAK after parseInput() returns.
> 
> - and PQexecFinish() returns immediately if PQgetResult() returns
>   with asyncStatus == PGASYNC_TUPLES_BREAK.
> 
> - PQgetResult() sets asyncStatus = PGRES_TUPLES_OK if called with
>   asyncStatus == PGRES_TUPLES_BREAK
> 
> - New libpq API PQisBreaked(PGconn*) returns if asyncStatus ==
>   PGRES_TUPLES_BREAK can be used to check if the transfer is breaked.

I don't get where you are going with such changes.  Are you trying to
make V2 code survive row-processor errors?  (Only V2 has concept of
"EOF state".)  Then forget about it.  It's more important to not
destabilize V3 code.

And error from row processor is not something special from other errors.
Why does it need special state?  And note that adding new PGASYNC or
PGRES state needs review of all if() and switch() statements in the
code that use those fields...  So there must be a serious need for it.
At the moment I don't see such need.

I just asked you to replace ->rowProcessorErrMsg with ->errMsg
to get rid of unnecessary field.

But if you want to do bigger cleanup, then you can instead make
PQsetRowProcessorErrMsg() more generic:
 PQsetErrorMessage(PGconn *conn, const char *msg)

Current code has the tiny problem that it adds new special state between
PQsetRowProcessorErrMsg() and return from callback to getAnotherTuple()
where getAnotherTuple() sets actual error state.  When the callback
exits via exception, the state can leak to other code.  It should not
break anything, but it's still annoying.

Also, with the PQgetRow() patch, the need for doing complex processing
under callback has decreased and the need to set error outside callback
has increased.

As a bonus, such generic error-setting function would lose any extra
special state introduced by row-processor patch.

Previously I mentioned that callback would need to have additional
PGconn* argument to make connection available to callback to use such
generic error setting function, but now I think it does not need it -
simple callbacks don't need to set errors and complex callback can make
the PGconn available via Param.  Eg. the internal callback should set
Param to PGconn, instead of keeping NULL there.
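
A rough sketch of the last two ideas combined, using stand-in types rather than the real libpq structs. MockConn, mock_set_error_message() and checking_row_processor() are hypothetical names made up for illustration; PQsetErrorMessage() itself is only a proposal in this thread, and the callback signature is simplified to a single column value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PGconn; only the error buffer matters here (hypothetical). */
typedef struct MockConn
{
    char        errorMessage[256];
} MockConn;

/* Stand-in for the proposed PQsetErrorMessage(PGconn *conn, const char *msg). */
static void
mock_set_error_message(MockConn *conn, const char *msg)
{
    assert(conn != NULL);
    snprintf(conn->errorMessage, sizeof(conn->errorMessage),
             "%s", msg ? msg : "");
}

/*
 * Hypothetical row processor: returns 1 to continue, 0 on error.
 * It takes no connection argument; the caller hands the connection
 * over through the opaque param pointer instead.
 */
static int
checking_row_processor(const char *value, void *param)
{
    MockConn   *conn = (MockConn *) param;

    if (value == NULL)
    {
        mock_set_error_message(conn, "unexpected NULL column\n");
        return 0;
    }
    return 1;
}
```

A simple callback that never reports errors would just ignore param, so the callback signature does not have to grow a PGconn argument.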

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

27 February 2012, 22:59:26

Hello.

I will show you a fixed version of the patch later, please wait for a
while.

======
> It's more important to not destabilize V3 code.

Ok, I withdraw that, agreeing with that point. And I noticed that
my previous proposal was totally crap because I had mixed up
asyncStatus with resultStatus in it.

> And error from row processor is not something special from
> other errors.  Why does it need special state?

I'm sorry to have upset the discussion. What I wanted there is a
means other than exceptions to exit out of PQexec() by row
processor trigger without discarding the result built halfway,
like async.

> I just asked you to replace ->rowProcessorErrMsg with ->errMsg
> to get rid of unnecessary field.

Ok, I will remove extra code.

> Also, with the PQgetRow() patch, the need for doing complex processing
> under callback has decreased and the need to set error outside callback
> has increased.
> 
> As a bonus, such generic error-setting function would lose any extra
> special state introduced by row-processor patch.

That sounds nice. I will show you the patch without it for the
present, then try to include.

> Previously I mentioned that callback would need to have additional
> PGconn* argument to make connection available to callback to use such
> generic error setting function, but now I think it does not need it -
> simple callbacks don't need to set errors and complex callback can make
> the PGconn available via Param.  Eg. the internal callback should set
> Param to PGconn, instead of keeping NULL there.

I agree with it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

28 February 2012, 04:05:14

This is the new version of the patch.
It is not rebased to the HEAD because of a build error.

> It's better to restore old two-path error handling.

I restored the "OOM and save result" route. But it seems necessary to
get back as much memory as possible on REAL OOM, as the comment in
the original code says.

So I restored the meaning of rp == 0 && errMsg == NULL as REAL
OOM, which throws the async result away; the result is
preserved if errMsg is not NULL. `unknown error' has been
removed.

As a result, if the row processor returns 0, the parser skips to the
end of the rows and returns the working result or an error result,
according to whether errMsg was set in the row processor.
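
To spell out how I read the return-value convention in this version, here is a small stand-alone sketch. RowOutcome and classify_row_result() are hypothetical names for illustration only; the patch itself does this with goto labels inside getAnotherTuple(), and the early-exit value 2 is honoured only on non-blocking connections.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what the parser does with a row processor result. */
typedef enum
{
    ROW_OK,                 /* rp == 1: row accepted, keep parsing */
    ROW_EARLY_EXIT,         /* rp == 2 on a non-blocking connection */
    ROW_ERROR_KEEP,         /* rp == 0, errMsg set: result is preserved */
    ROW_ERROR_DISCARD,      /* rp == 0, errMsg NULL: assumed real OOM */
    ROW_INVALID             /* anything else is an invalid return value */
} RowOutcome;

static RowOutcome
classify_row_result(int rp, const char *errMsg, int nonblocking)
{
    if (rp == 1)
        return ROW_OK;
    if (rp == 2 && nonblocking)
        return ROW_EARLY_EXIT;
    if (rp == 0)
        return (errMsg != NULL) ? ROW_ERROR_KEEP : ROW_ERROR_DISCARD;
    return ROW_INVALID;
}
```

The point of the NULL case is that a row processor which ran out of memory cannot be expected to allocate a message, so a missing message is taken to mean OOM.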


> I don't think that should be required.  Just use a dummy msg.

Considering the above, pqAddRow is also restored to leave errMsg
NULL on OOM.

> There is still one EOF in v3 getAnotherTuple() -
> pqGetInt(tupnfields), please turn that one also to
> protocolerror.

pqGetInt() returns EOF only when it needs more data to be read from
the network, provided the parameter `bytes' is appropriate. A non-zero
return from it should therefore be handled as EOF, not as a protocol
error. The one point I had modified buggily is also restored. The
so-called 'protocol error' has vanished as a result.
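
A minimal model of that behaviour, assuming a simplified input buffer. MockInput and mock_get_int4() are hypothetical stand-ins, not the real pqGetInt(); the only point is that EOF here means "not enough bytes buffered yet", so the cursor is left alone and the caller retries once more data has arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>              /* for EOF */

/* Simplified stand-in for the input side of PGconn (hypothetical). */
typedef struct MockInput
{
    const unsigned char *buf;
    size_t      inCursor;       /* next byte to read */
    size_t      inEnd;          /* one past the last valid byte */
} MockInput;

/*
 * Read a 4-byte big-endian integer.  Returns 0 on success, or EOF if
 * fewer than 4 bytes are buffered; in that case nothing is consumed.
 */
static int
mock_get_int4(int *result, MockInput *in)
{
    const unsigned char *p;

    assert(in->inCursor <= in->inEnd);
    if (in->inEnd - in->inCursor < 4)
        return EOF;             /* not an error: wait for more input */

    p = in->buf + in->inCursor;
    *result = (int) (((unsigned int) p[0] << 24) |
                     ((unsigned int) p[1] << 16) |
                     ((unsigned int) p[2] << 8) |
                     (unsigned int) p[3]);
    in->inCursor += 4;
    return 0;
}
```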

> Instead use ("%s", errmsg) as argument there.  libpq code
> is noisy enough, no need to add more.

done

Is there something left undone?


By the way, I noticed that dblink always reports the current
connection as 'unnamed' in error messages raised from
dblink_record_internal@dblink.  dblink_record_internal defines
the local variable conname = NULL and passes it to
dblink_res_error to display the error message, but nothing in the
function ever assigns to it.

The name was shown properly when I added code to set conname
from PG_GETARG_TEXT_PP(0) if available, in other words just after
the DBLINK_GET_CONN/DBLINK_GET_NAMED_CONN calls. That seems to be
dblink's manner...  This is not included in this patch.

Furthermore, dblink_res_error looks only at the returned PGresult to
display the error, and always says only `Error occurred on dblink
connection..: could not execute query'..

Is it right to consider this as follows?
- dblink is wrong in error handling. A client of libpq should
  look at PGconn via PQerrorMessage() if (or regardless of
  whether?) PGresult says nothing about the error.
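
One way to read that suggestion, as a sketch only. pick_error_message() is a hypothetical helper; in a real client the two inputs would come from the existing libpq calls PQresultErrorMessage(res) and PQerrorMessage(conn), and the final fallback string is simply the message dblink prints today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Prefer the message attached to the result; if it is missing or empty,
 * fall back to the connection-level message, then to a generic text.
 */
static const char *
pick_error_message(const char *res_msg, const char *conn_msg)
{
    if (res_msg != NULL && res_msg[0] != '\0')
        return res_msg;
    if (conn_msg != NULL && conn_msg[0] != '\0')
        return conn_msg;
    return "could not execute query";
}
```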
 


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..239edb8 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor      161
+PQgetRowProcessor      162
+PQresultSetErrMsg      163
+PQskipResult          164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
     conn->wait_ssl_try = false;
 #endif
 
+    /* set default row processor */
+    PQsetRowProcessor(conn, NULL, NULL);
+
     /*
      * We try to send at least 8K at a time, which is the usual size of pipe
      * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
     initPQExpBuffer(&conn->errorMessage);
     initPQExpBuffer(&conn->workBuffer);
 
+    /* set up initial row buffer */
+    conn->rowBufLen = 32;
+    conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
     if (conn->inBuffer == NULL ||
         conn->outBuffer == NULL ||
+        conn->rowBuf == NULL ||
         PQExpBufferBroken(&conn->errorMessage) ||
         PQExpBufferBroken(&conn->workBuffer))
     {
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
         free(conn->inBuffer);
     if (conn->outBuffer)
         free(conn->outBuffer);
+    if (conn->rowBuf)
+        free(conn->rowBuf);
     termPQExpBuffer(&conn->errorMessage);
     termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
     return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..ce58778 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
                const char *desc_target);
 static int    check_field_number(const PGresult *res, int field_num);
+static int    pqAddRow(PGresult *res, PGrowValue *columns, void *param);
 
 
 /* ----------------
@@ -701,7 +702,6 @@ pqClearAsyncResult(PGconn *conn)
     if (conn->result)
         PQclear(conn->result);
     conn->result = NULL;
-    conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +756,6 @@ pqPrepareAsyncResult(PGconn *conn)
      */
     res = conn->result;
     conn->result = NULL;        /* handing over ownership to caller */
-    conn->curTuple = NULL;        /* just in case */
     if (!res)
         res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
     else
@@ -828,6 +827,87 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+    conn->rowProcessor = (func ? func : pqAddRow);
+    conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get current row processor of conn. set pointer to current parameter for
+ *   row processor to param if not NULL.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+    if (param)
+        *param = conn->rowProcessorParam;
+
+    return conn->rowProcessor;
+}
+
+/*
+ * PQresultSetErrMsg
+ *    Set the error message to PGresult.
+ *
+ *    You can replace the previous message with an alternative message,
+ *    or clear it with NULL.
+ */
+void
+PQresultSetErrMsg(PGresult *res, const char *msg)
+{
+    if (msg)
+        res->errMsg = pqResultStrdup(res, msg);
+    else
+        res->errMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *      add a row to the PGresult structure, growing it if necessary
+ *      Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+    PGresAttValue *tup;
+    int            nfields = res->numAttributes;
+    int            i;
+
+    tup = (PGresAttValue *)
+        pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+    if (tup == NULL)
+        return FALSE;
+
+    for (i = 0 ; i < nfields ; i++)
+    {
+        tup[i].len = columns[i].len;
+        if (tup[i].len == NULL_LEN)
+        {
+            tup[i].value = res->null_field;
+        }
+        else
+        {
+            bool isbinary = (res->attDescs[i].format != 0);
+            tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+            if (tup[i].value == NULL)
+                return FALSE;
+
+            memcpy(tup[i].value, columns[i].value, tup[i].len);
+            /* We have to terminate this ourselves */
+            tup[i].value[tup[i].len] = '\0';
+        }
+    }
+
+    return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *      add a row pointer to the PGresult structure, growing it if necessary
  *      Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1303,6 @@ PQsendQueryStart(PGconn *conn)
 
     /* initialize async result-accumulation state */
     conn->result = NULL;
-    conn->curTuple = NULL;
 
     /* ready to send command message */
     return true;
@@ -1831,6 +1910,55 @@ PQexecFinish(PGconn *conn)
 
     return lastResult;
 }
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+    return 1;
+}
+
+/*
+ * Exhaust remaining data rows in the current conn.
+ *
+ * Exhaust the current result if skipAll is false, and all succeeding
+ * results if true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+    PQrowProcessor savedRowProcessor;
+    void * savedRowProcParam;
+    PGresult *res;
+    int ret = 0;
+
+    /* save the current row processor settings and set dummy processor */
+    savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+    PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+    
+    /*
+     * Throw away the remaining rows in current result, or all succeeding
+     * results if skipAll is not FALSE.
+     */
+    if (skipAll)
+    {
+        while ((res = PQgetResult(conn)) != NULL)
+            PQclear(res);
+    }
+    else if ((res = PQgetResult(conn)) != NULL)
+    {
+        PQclear(res);
+        ret = 1;
+    }
+    
+    PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+    return ret;
+}
+
+/*
  * PQdescribePrepared
  *      Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *    skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+    if (len > (size_t) (conn->inEnd - conn->inCursor))
+        return EOF;
+
+    conn->inCursor += len;
+
+    if (conn->Pfdebug)
+        fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+                (unsigned long) len);
+
+    return 0;
+}
+
+/*
  * pqPutnchar:
  *    write exactly len bytes to the current message
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..36773cb 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
                         /* Read another tuple of a normal query response */
                         if (getAnotherTuple(conn, FALSE))
                             return;
+                        /* getAnotherTuple moves inStart itself */
+                        continue;
                     }
                     else
                     {
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
                         /* Read another tuple of a normal query response */
                         if (getAnotherTuple(conn, TRUE))
                             return;
+                        /* getAnotherTuple moves inStart itself */
+                        continue;
                     }
                     else
                     {
@@ -703,52 +707,55 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
     PGresult   *result = conn->result;
     int            nfields = result->numAttributes;
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;
 
     /* the backend sends us a bitmap of which attributes are null */
     char        std_bitmap[64]; /* used unless it doesn't fit */
     char       *bitmap = std_bitmap;
     int            i;
+    int            rp;
     size_t        nbytes;            /* the number of bytes in bitmap  */
     char        bmap;            /* One byte of the bitmap */
     int            bitmap_index;    /* Its index */
     int            bitcnt;            /* number of bits examined in current byte */
     int            vlen;            /* length of the current field value */
+    char        *errmsg = libpq_gettext("unknown error\n");
 
-    result->binary = binary;
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
+    /* resize row buffer if needed */
+    if (nfields > conn->rowBufLen)
     {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-        /*
-         * If it's binary, fix the column format indicators.  We assume the
-         * backend will consistently send either B or D, not a mix.
-         */
-        if (binary)
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
         {
-            for (i = 0; i < nfields; i++)
-                result->attDescs[i].format = 1;
+            errmsg = libpq_gettext("out of memory for query result\n");
+            goto error_clearresult;
         }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
+    }
+    else
+    {
+        rowbuf = conn->rowBuf;
+    }
+
+    result->binary = binary;
+
+    if (binary)
+    {
+        for (i = 0; i < nfields; i++)
+            result->attDescs[i].format = 1;
     }
-    tup = conn->curTuple;
 
     /* Get the null-value bitmap */
     nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,11 +764,15 @@ getAnotherTuple(PGconn *conn, bool binary)
     {
         bitmap = (char *) malloc(nbytes);
         if (!bitmap)
-            goto outOfMemory;
+        {
+            errmsg = libpq_gettext("out of memory for query result\n");
+            goto error_clearresult;
+        }
     }
 
     if (pqGetnchar(bitmap, nbytes, conn))
-        goto EOFexit;
+        goto error_clearresult;
+
 
     /* Scan the fields */
     bitmap_index = 0;
@@ -771,34 +782,29 @@ getAnotherTuple(PGconn *conn, bool binary)
     for (i = 0; i < nfields; i++)
     {
         if (!(bmap & 0200))
-        {
-            /* if the field value is absent, make it a null string */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-        }
+            vlen = NULL_LEN;
+        else if (pqGetInt(&vlen, 4, conn))
+                goto EOFexit;
         else
         {
-            /* get the value length (the first four bytes are for length) */
-            if (pqGetInt(&vlen, 4, conn))
-                goto EOFexit;
             if (!binary)
                 vlen = vlen - 4;
             if (vlen < 0)
                 vlen = 0;
-            if (tup[i].value == NULL)
-            {
-                tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-                if (tup[i].value == NULL)
-                    goto outOfMemory;
-            }
-            tup[i].len = vlen;
-            /* read in the value */
-            if (vlen > 0)
-                if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                    goto EOFexit;
-            /* we have to terminate this ourselves */
-            tup[i].value[vlen] = '\0';
         }
+
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip the value */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            goto EOFexit;
+
         /* advance the bitmap stuff */
         bitcnt++;
         if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +817,64 @@ getAnotherTuple(PGconn *conn, bool binary)            bmap <<= 1;    }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
-    if (bitmap != std_bitmap)        free(bitmap);
-    return 0;
+    bitmap = NULL;
-outOfMemory:
-    /* Replace partially constructed result with an error result */
+    /* tag the row as parsed */
+    conn->inStart = conn->inCursor;
+
+    /* Pass the completed row values to rowProcessor */
+    rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+    if (rp == 1)
+        return 0;
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;
+    else if (rp == 0)
+    {
+        errmsg = result->errMsg;
+        result->errMsg = NULL;
+        if (errmsg == NULL)
+        {
+            /* If errmsg == NULL, we assume that the row processor
+             * notices out of memory. We should immediately free any
+             * space to go forward. */
+            errmsg = "out of memory";
+            goto error_clearresult;
+        }
+        /*
+         * We assume that some ancestor which has a relation with the
+         * row processor wants the result built halfway when row
+         * processor sets any errMsg for rp == 0.
+         */
+        goto error_saveresult;
+    }
+    errmsg = libpq_gettext("invalid return value from row processor\n");
+    /* FALL THROUGH */
+error_clearresult:    /*     * we do NOT use pqSaveErrorResult() here, because of the likelihood that     * there's
notenough memory to concatenate messages...     */    pqClearAsyncResult(conn);
 
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
+error_saveresult:
+    /*
+     * If error message is passed from RowProcessor, set it into
+     * PGconn, assume out of memory if not.
+     */
+    printfPQExpBuffer(&conn->errorMessage, "%s", errmsg);
+        /*     * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can     * do to recover...     */
conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 
+    conn->asyncStatus = PGASYNC_READY;
+    /* Discard the failed message --- good idea? */    conn->inStart = conn->inEnd;
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..2693ce0 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
                         /* Read another tuple of a normal query response */
                         if (getAnotherTuple(conn, msgLength))
                             return;
+
+                        /* getAnotherTuple() moves inStart itself */
+                        continue;
                     }
                     else if (conn->result != NULL &&
                              conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,33 +616,23 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
     PGresult   *result = conn->result;
     int            nfields = result->numAttributes;
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;
     int            tupnfields;        /* # fields from tuple */
     int            vlen;            /* length of the current field value */
     int            i;
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
-    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-    }
-    tup = conn->curTuple;
+    int            rp;
+    char        *errmsg = libpq_gettext("unknown error\n");
 
     /* Get the field count and make sure it's what we expect */
     if (pqGetInt(&tupnfields, 2, conn))
@@ -647,13 +640,22 @@ getAnotherTuple(PGconn *conn, int msgLength)
 
     if (tupnfields != nfields)
     {
-        /* Replace partially constructed result with an error result */
-        printfPQExpBuffer(&conn->errorMessage,
-                 libpq_gettext("unexpected field count in \"D\" message\n"));
-        pqSaveErrorResult(conn);
-        /* Discard the failed message by pretending we read it */
-        conn->inCursor = conn->inStart + 5 + msgLength;
-        return 0;
+        errmsg = libpq_gettext("unexpected field count in \"D\" message\n");
+        goto error_and_forward;
+    }
+
+    /* resize row buffer if needed */
+    rowbuf = conn->rowBuf;
+    if (nfields > conn->rowBufLen)
+    {
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
+        {
+            errmsg = libpq_gettext("out of memory for query result\n");
+            goto error_and_forward;
+        }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
     }
 
     /* Scan the fields */
@@ -662,53 +664,88 @@ getAnotherTuple(PGconn *conn, int msgLength)        /* get the value length */        if
(pqGetInt(&vlen,4, conn))            return EOF;
 
+        if (vlen == -1)
-        {
-            /* null field */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-            continue;
-        }
-        if (vlen < 0)
+            vlen = NULL_LEN;
+        else if (vlen < 0)            vlen = 0;
-        if (tup[i].value == NULL)
-        {
-            bool        isbinary = (result->attDescs[i].format != 0);
-            tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-            if (tup[i].value == NULL)
-                goto outOfMemory;
-        }
-        tup[i].len = vlen;
-        /* read in the value */
-        if (vlen > 0)
-            if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                return EOF;
-        /* we have to terminate this ourselves */
-        tup[i].value[vlen] = '\0';
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip to the next length field */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            return EOF;
     }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
+    /* mark the row as consumed, then check that parsing stopped exactly at its end */
+    conn->inStart += 5 + msgLength;
+    if (conn->inCursor != conn->inStart)
+    {
+        errmsg = libpq_gettext("invalid row contents\n");
+        goto error_clearresult;
+    }
-    return 0;
+    /* Pass the completed row values to rowProcessor */
+    rp = conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+    if (rp == 1)
+    {
+        /* everything is good */
+        return 0;
+    }
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }
+
+    /* there was some problem */
+    if (rp == 0)
+    {
+        /*
+         * Unlink errMsg from result here to use it after
+         * pqClearAsyncResult() is called.
+         */
+        errmsg = result->errMsg;
+        result->errMsg = NULL;
+        if (errmsg == NULL)
+        {
+            /*
+             * If errmsg == NULL, we assume that the row processor
+             * ran out of memory.  Free the partial result right away
+             * so that we can proceed.
+             */
+            errmsg = libpq_gettext("out of memory for query result\n");
+            goto error_clearresult;
+        }
+        /*
+         * When the row processor sets an errMsg and returns 0, we
+         * assume that some caller further up the stack wants to keep
+         * the partially built result.
+         */
+        goto error_saveresult;
+    }
-outOfMemory:
+    errmsg = libpq_gettext("invalid return value from row processor\n");
+    goto error_clearresult;
+
+error_and_forward:
+    /* Discard the failed message by pretending we read it */
+    conn->inCursor = conn->inStart + 5 + msgLength;
+error_clearresult:
+    pqClearAsyncResult(conn);
+    
+error_saveresult:
     /*
      * Replace partially constructed result with an error result. First
      * discard the old result to try to win back some memory.
      */
 
-    pqClearAsyncResult(conn);
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
+    printfPQExpBuffer(&conn->errorMessage, "%s", errmsg);
     pqSaveErrorResult(conn);
-
-    /* Discard the failed message by pretending we read it */
-    conn->inCursor = conn->inStart + 5 + msgLength;
     return 0;
 }
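
Editor's note: the rewritten getAnotherTuple() above no longer copies each value; it records a pointer into the input buffer and skips ahead. The following is a minimal, self-contained sketch of that idea only. RowValue and parse_row are hypothetical stand-ins written for illustration (they are not PGrowValue or any libpq function), and the flat buffer stands in for conn->inBuffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's PGrowValue. */
typedef struct RowValue
{
    int         len;    /* length in bytes, or -1 for SQL NULL */
    const char *value;  /* points into the input buffer, not NUL-terminated */
} RowValue;

/*
 * Parse nfields (4-byte big-endian length, value bytes) pairs from buf
 * without copying any value.  Returns 0 on success and -1 if the data is
 * truncated or if bytes are left over after the last field.
 */
static int
parse_row(const char *buf, size_t buflen, int nfields, RowValue *out)
{
    size_t      cursor = 0;
    int         i;

    for (i = 0; i < nfields; i++)
    {
        uint32_t    raw;
        int         vlen;

        if (buflen - cursor < 4)
            return -1;
        raw = ((uint32_t) (unsigned char) buf[cursor] << 24) |
              ((uint32_t) (unsigned char) buf[cursor + 1] << 16) |
              ((uint32_t) (unsigned char) buf[cursor + 2] << 8) |
              (uint32_t) (unsigned char) buf[cursor + 3];
        cursor += 4;

        if (raw == 0xFFFFFFFFu)
            vlen = -1;          /* SQL NULL */
        else if (raw > 0x7FFFFFFFu)
            vlen = 0;           /* other negative lengths count as empty */
        else
            vlen = (int) raw;

        /* value points just past the length field even when it is NULL */
        out[i].value = buf + cursor;
        out[i].len = vlen;

        if (vlen > 0)
        {
            if ((size_t) vlen > buflen - cursor)
                return -1;
            cursor += (size_t) vlen;
        }
    }

    /* the row must end exactly where the message ends */
    return (cursor == buflen) ? 0 : -1;
}
```

The final comparison plays the role of the "invalid row contents" check in the patch: a mismatch between the cursor and the declared message length is treated as an error rather than silently ignored.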
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..810b04e 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
     struct pgNotify *next;        /* list link */
 } PGnotify;
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented by len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+    int            len;            /* length in bytes of the value */
+    char       *value;            /* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
 
@@ -416,6 +427,38 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
             const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * The columns array contains PQnfields() entries, each one
+ * pointing to a particular column's data in the network buffer.
+ * This function is supposed to copy the data out from there
+ * and store it somewhere.  NULL is signified by len < 0.
+ *
+ * This function must return 1 for success and 0 for failure, and
+ * may set an error message with PQresultSetErrMsg.  If no error
+ * message is set on failure, the caller assumes out of memory.
+ * This function is assumed not to throw any exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Registering this function disables libpq's own result storage;
+ * the function is called for each row, one by one.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+                                   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+
 /* Force the write buffer to be written (or at least try) */
 extern int    PQflush(PGconn *conn);
@@ -454,6 +497,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int    PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int    PQgetisnull(const PGresult *res, int tup_num, int field_num);
 
+extern void    PQresultSetErrMsg(PGresult *res, const char *msg);
 extern int    PQnparams(const PGresult *res);
 extern Oid   PQparamtype(const PGresult *res, int param_num);
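
Editor's note: the return-value contract of the row processor (1 for success, 0 for failure with or without a message, 2 for early exit in non-blocking mode) can be summarized as a small dispatcher. This is only a sketch of how getAnotherTuple() in this patch appears to interpret the value. Result, RowValue, RowProcessor, RowOutcome, dispatch_row and the three sample processors are all hypothetical names invented here; none of them is part of libpq.

```c
#include <stddef.h>

/* Hypothetical stand-ins for PGresult and PGrowValue; not the libpq types. */
typedef struct Result
{
    const char *errMsg;         /* message set by the processor, or NULL */
} Result;

typedef struct RowValue
{
    int         len;
    const char *value;
} RowValue;

typedef int (*RowProcessor) (Result *res, RowValue *columns, void *param);

typedef enum RowOutcome
{
    ROW_OK,                     /* processor returned 1 */
    ROW_EARLY_EXIT,             /* processor returned 2 in non-blocking mode */
    ROW_OUT_OF_MEMORY,          /* returned 0 without setting a message */
    ROW_ERROR_WITH_MESSAGE,     /* returned 0 after setting a message */
    ROW_BAD_RETURN              /* any other value */
} RowOutcome;

/* Call the processor once and classify its return value. */
static RowOutcome
dispatch_row(RowProcessor proc, Result *res, RowValue *columns,
             void *param, int nonblocking)
{
    int         rp = proc(res, columns, param);

    if (rp == 1)
        return ROW_OK;
    if (rp == 2 && nonblocking)
        return ROW_EARLY_EXIT;
    if (rp == 0)
        return res->errMsg ? ROW_ERROR_WITH_MESSAGE : ROW_OUT_OF_MEMORY;
    return ROW_BAD_RETURN;      /* includes 2 on a blocking connection */
}

/* Sample processor: counts rows through the context parameter. */
static int
count_rows(Result *res, RowValue *columns, void *param)
{
    (void) res;
    (void) columns;
    ++*(int *) param;
    return 1;
}

/* Sample processor: fails without a message (read as out of memory). */
static int
fail_silently(Result *res, RowValue *columns, void *param)
{
    (void) res;
    (void) columns;
    (void) param;
    return 0;
}

/* Sample processor: fails after setting a message. */
static int
reject_row(Result *res, RowValue *columns, void *param)
{
    (void) columns;
    (void) param;
    res->errMsg = "row rejected";
    return 0;
}
```

Note that in this reading a return value of 2 on a blocking connection falls through to the "invalid return value" case, which matches the `rp == 2 && pqIsnonblocking(conn)` test in the patch.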
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..9cabd20 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -398,7 +398,6 @@ struct pg_conn
     /* Status for asynchronous result construction */
     PGresult   *result;            /* result being constructed */
 
-    PGresAttValue *curTuple;    /* tuple currently being read */
 #ifdef USE_SSL
     bool        allow_ssl_try;    /* Allowed to try SSL negotiation */
 
@@ -443,6 +442,14 @@ struct pg_conn
     /* Buffer for receiving various parts of messages */
     PQExpBufferData workBuffer; /* expansible string */
 
+
+    /*
+     * Read column data from network buffer.
+     */
+    PQrowProcessor rowProcessor;/* Function pointer */
+    void *rowProcessorParam;    /* Contextual parameter for rowProcessor */
+    PGrowValue *rowBuf;            /* Buffer for passing values to rowProcessor */
+    int rowBufLen;                /* Number of columns allocated in rowBuf */
 };
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +567,7 @@ extern int    pqGets(PQExpBuffer buf, PGconn *conn);
 extern int    pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int    pqPuts(const char *s, PGconn *conn);
 extern int    pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int    pqSkipnchar(size_t len, PGconn *conn);
 extern int    pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int    pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int    pqPutInt(int value, size_t bytes, PGconn *conn);
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..5ef89e7 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,332 @@ int PQisthreadsafe();
  </sect1>
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   By default, rows are stored in a <type>PGresult</type> until the
+   full result set has been received, and only then is the completed
+   <type>PGresult</type> passed to the user.  This behavior can be
+   changed by registering an alternative row processor function,
+   which sees each row as soon as it is received from the network.
+   It can either process the data immediately or store it in a
+   custom container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it
+   cannot know whether the SQL statement will actually finish
+   successfully on the server.  Some care must therefore be taken to
+   get proper transactional behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object on which to set the row processor function.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>func</parameter></term>
+       <listitem>
+         <para>
+           Storage handler function to set. NULL means to use the
+           default processor.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+           A pointer to a contextual parameter that is passed
+           to <parameter>func</parameter>.
+         </para>
+       </listitem>
+     </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array has PQnfields()
+      elements, each one pointing to a column value in the network
+      buffer.  The <parameter>len</parameter> field contains the number
+      of bytes in the value.  If the field value is NULL then
+      <parameter>len</parameter> is -1 and value points to the
+      position where the value would have been in the buffer.
+      This allows estimating the row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process the row values, or copy them out of
+       the network buffer, before it returns, as the next row might
+       overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success and 0 for failure.  On
+       failure, if no message has been set, the caller assumes an
+       out-of-memory error and releases the PGresult under
+       construction.  If a message has been set with
+       <function>PQresultSetErrMsg</function>, it becomes the
+       PGconn's error message and the PGresult is preserved.  When
+       the non-blocking API is in use, the function can also return
+       2 to exit early from <function>PQisBusy</function>.
+       The supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values then stay valid, so
+       the row can be processed outside of the callback.  The caller
+       is responsible for tracking whether
+       <function>PQisBusy</function> returned early from the
+       callback or for other reasons.  Usually this is done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed
+       via <function>PQfinish</function>.  Processing can also be
+       continued without closing the connection: call
+       <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing then continues with the next row; the row that
+       raised the exception is skipped.  Alternatively, all remaining
+       rows can be discarded by calling
+       <function>PQskipResult</function> without closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+     <term><parameter>res</parameter></term>
+     <listitem>
+       <para>
+         A pointer to the <type>PGresult</type> object.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>columns</parameter></term>
+     <listitem>
+       <para>
+         Column values of the row to process.  Column values
+         are located in network buffer, the processor must
+         copy them out from there.
+       </para>
+       <para>
+         Column values are not null-terminated, so processor cannot
+         use C string functions on them directly.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>param</parameter></term>
+     <listitem>
+       <para>
+         Extra parameter that was given to <function>PQsetRowProcessor</function>.
+       </para>
+     </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+        Discards all the remaining row data
+        after <function>PQexec</function>
+        or <function>PQgetResult</function> has exited because of an
+        exception raised in the <type>PQrowProcessor</type>, without
+        closing the connection.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>skipAll</parameter></term>
+       <listitem>
+         <para>
+           If <parameter>skipAll</parameter> is false (0), skip the
+           remaining rows in the current result.  If it is true
+           (non-zero), skip the remaining rows in the current result
+           and all rows in succeeding results.
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqresultseterrmsg">
+    <term>
+     <function>PQresultSetErrMsg</function>
+     <indexterm>
+      <primary>PQresultSetErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+    Sets the message for an error that occurred
+    in the <type>PQrowProcessor</type>.  If this message is not set,
+    the caller assumes the error to be an out-of-memory error.
+<synopsis>
+void PQresultSetErrMsg(PGresult *res, const char *msg)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+      <varlistentry>
+        <term><parameter>res</parameter></term>
+        <listitem>
+          <para>
+        A pointer to the <type>PGresult</type> object
+        passed to <type>PQrowProcessor</type>.
+          </para>
+        </listitem>
+      </varlistentry>
+      <varlistentry>
+        <term><parameter>msg</parameter></term>
+        <listitem>
+          <para>
+        Error message. It is copied internally, so the caller does
+        not need to keep the string alive after the call.
+          </para>
+          <para>
+        If <parameter>res</parameter> already has a message previously
+        set, it will be overwritten. Pass NULL to cancel the custom
+        message.
+          </para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Gets the row processor and its context parameter currently set
+       on the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+              If not NULL, the connection's current row processor
+              parameter is stored here.
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
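
Editor's note: the documentation above says that column values are not null-terminated and that a NULL column still points to where its value would have been, which allows a size estimate by pointer arithmetic. A minimal sketch of both points follows. RowValue, copy_column and row_data_span are hypothetical names used for illustration, not part of the proposed API, and the span computed here includes the 4-byte length fields that sit between values in the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's PGrowValue. */
typedef struct RowValue
{
    int         len;    /* length in bytes, or negative for SQL NULL */
    const char *value;  /* points into the network buffer */
} RowValue;

/*
 * Copy one column into a malloc'ed, NUL-terminated string.
 * Returns 1 on success (*out is NULL for an SQL NULL) and 0 on
 * out of memory, following the 1/0 convention of the row processor.
 */
static int
copy_column(const RowValue *col, char **out)
{
    char       *s;

    if (col->len < 0)
    {
        *out = NULL;
        return 1;
    }
    s = malloc((size_t) col->len + 1);
    if (s == NULL)
        return 0;
    memcpy(s, col->value, (size_t) col->len);
    s[col->len] = '\0';         /* terminate it ourselves */
    *out = s;
    return 1;
}

/*
 * Estimate the buffer span of a row: from the first value to the end of
 * the last one.  This works because value is set even for NULL columns.
 */
static size_t
row_data_span(const RowValue *columns, int nfields)
{
    const RowValue *last;

    if (nfields <= 0)
        return 0;
    last = &columns[nfields - 1];
    return (size_t) ((last->value + (last->len > 0 ? last->len : 0))
                     - columns[0].value);
}
```

A real row processor would call something like copy_column for each entry before returning, since the next row may overwrite the buffer.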
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..8bf0759 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char** valbuf;
+    int *valbuflen;
+    char **cstrs;
+    bool error_occurred;
+    bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
                    int2vector *pkattnums_arg, int32 pknumatts_arg,
                    int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
     char       *curname = NULL;
     int            howmany = 0;
     bool        fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;
     DBLINK_INIT;
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
     appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
     /*
 
+     * Result is stored into storeinfo.tuplestore instead of
+     * the PGresult returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*
      * Try to execute the query.  Note that since libpq uses malloc, the
      * PGresult will be long-lived even though we are still in a short-lived
      * memory context.
      */
 
-    res = PQexec(conn, buf.data);
+    PG_TRY();
+    {
+        res = PQexec(conn, buf.data);
+    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
+
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
+
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
     if (!res ||
         (PQresultStatus(res) != PGRES_COMMAND_OK &&
          PQresultStatus(res) != PGRES_TUPLES_OK))
     {
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+
         dblink_res_error(conname, res, "could not fetch from cursor", fail);
         return (Datum) 0;
     }
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
                 (errcode(ERRCODE_INVALID_CURSOR_NAME),
                  errmsg("cursor \"%s\" does not exist", curname)));
     }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     remoteConn *rconn = NULL;
     bool        fail = true;    /* default to backward compatible */
     bool        freeconn = false;
 
+    storeInfo   storeinfo;
     /* check to see if caller supports us returning a tuplestore */
     if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
 
@@ -660,6 +714,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
         {
             /* text,text,bool */
             DBLINK_GET_CONN;
 
+            conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
             sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
             fail = PG_GETARG_BOOL(2);
         }
 
@@ -675,6 +730,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
             else
             {
                 DBLINK_GET_CONN;
 
+                conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
                 sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
             }
         }
 
@@ -705,6 +761,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
         else
             /* shouldn't happen */
             elog(ERROR, "wrong number of arguments");
 
+
+        conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
     }
     if (!conn)
@@ -715,164 +773,257 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
     rsinfo->setResult = NULL;
     rsinfo->setDesc = NULL;
 
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * the PGresult returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
     /* synchronous query, or async result retrieval */
-    if (!is_async)
-        res = PQexec(conn, sql);
-    else
+    PG_TRY();
     {
-        res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
+        if (!is_async)
+            res = PQexec(conn, sql);
+        else
+            res = PQgetResult(conn);
     }
+    PG_CATCH();
+    {
+        ErrorData *edata;
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)
     {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK))
+        {
+            /* finishStoreInfo saves the fields referred to below. */
+            if (storeinfo.nummismatch)
+            {
+                /* This is only for backward compatibility */
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATATYPE_MISMATCH),
+                         errmsg("remote query result rowtype does not match "
+                                "the specified FROM clause rowtype")));
+            }
+
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }
     }
+    PQclear(res);
-    materializeResult(fcinfo, res);
     return (Datum) 0;
 }
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+    TupleDesc    tupdesc;
+    int i;
+
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+    {
+        case TYPEFUNC_COMPOSITE:
+            /* success */
+            break;
+        case TYPEFUNC_RECORD:
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
-    Assert(rsinfo->returnMode == SFRM_Materialize);
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
-    PG_TRY();
+    /* make sure we have a persistent copy of the tupdesc */
+    tupdesc = CreateTupleDescCopy(tupdesc);
+
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->nattrs = tupdesc->natts;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
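+    sinfo->valbuf = NULL;
+    sinfo->valbuflen = NULL;
+    sinfo->cstrs = NULL;    /* must be set before the NULL check below */
+
+    /*
+     * Preallocate the per-column arrays: value buffers, their allocated
+     * lengths, and the C-string array of the same size for the values.
+     */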
+    sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+    if (sinfo->valbuf)
+        sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+    if (sinfo->valbuflen)
+        sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+    if (sinfo->cstrs == NULL)
     {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
+        if (sinfo->valbuf)
+            free(sinfo->valbuf);
+        if (sinfo->valbuflen)
+            free(sinfo->valbuflen);
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));
+    }
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
-            tupdesc = CreateTemplateTupleDesc(1, false);
-            TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-                               TEXTOID, -1, 0);
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+    for (i = 0 ; i < sinfo->nattrs ; i++)
+    {
+        sinfo->valbuf[i] = NULL;
+        sinfo->valbuflen[i] = -1;
+    }
-            is_sql_cmd = false;
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    int i;
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
+    if (sinfo->valbuf)
+    {
+        for (i = 0 ; i < sinfo->nattrs ; i++)
+        {
+            if (sinfo->valbuf[i])
+                free(sinfo->valbuf[i]);
         }
+        free(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+    if (sinfo->valbuflen)
+    {
+        free(sinfo->valbuflen);
+        sinfo->valbuflen = NULL;
+    }
-        if (ntuples > 0)
-        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
+    if (sinfo->cstrs)
+    {
+        free(sinfo->cstrs);
+        sinfo->cstrs = NULL;
+    }
-            values = (char **) palloc(nfields * sizeof(char *));
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        fields = PQnfields(res);
+    int        i;
+    char      **cstrs = sinfo->cstrs;
-                if (!is_sql_cmd)
-                {
-                    int            i;
+    if (sinfo->error_occurred)
+    {
+        PQresultSetErrMsg(res, "storeHandler is called after error\n");
+        return FALSE;
+    }
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
+
+        /* This error is reported by dblink_fetch() or
+         * dblink_record_internal() through sinfo->nummismatch. The
+         * message set here only keeps libpq from treating the
+         * failure as out of memory. */
+        
+        PQresultSetErrMsg(res, "unexpected field count in \"D\" message\n");
+        return FALSE;
+    }
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
+    /*
+     * value input functions assumes that the input string is
+     * terminated by zero. We should make the values to be so.
+     */
+    for(i = 0 ; i < fields ; i++)
+    {
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            char *tmp = sinfo->valbuf[i];
+            int tmplen = sinfo->valbuflen[i];
+
+            /*
+             * Divide calls to malloc and realloc so that things will
+             * go fine even on the systems of which realloc() does not
+             * accept NULL as old memory block.
+             *
+             * Also try to (re)allocate in bigger steps to
+             * avoid flood of allocations on weird data.
+             */
+            if (tmp == NULL)
+            {
+                tmplen = len + 1;
+                if (tmplen < 64)
+                    tmplen = 64;
+                tmp = (char *)malloc(tmplen);
+            }
+            else if (tmplen < len + 1)
+            {
+                if (len + 1 > tmplen * 2)
+                    tmplen = len + 1;
+                else
+                    tmplen = tmplen * 2;
+                tmp = (char *)realloc(tmp, tmplen);
             }
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
-        }
+            /*
+             * sinfo->valbuf[n] will be freed in finishStoreInfo()
+             * when realloc returns NULL.
+             */
+            if (tmp == NULL)
+                return FALSE;  /* Inform out of memory to the caller */
-        PQclear(res);
-    }
-    PG_CATCH();
-    {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+            sinfo->valbuf[i] = tmp;
+            sinfo->valbuflen[i] = tmplen;
+
+            cstrs[i] = sinfo->valbuf[i];
+            memcpy(cstrs[i], columns[i].value, len);
+            cstrs[i][len] = '\0';
+        }
     }
-    PG_END_TRY();
+
+    /*
+     * These functions may throw exception. It will be caught in
+     * dblink_record_internal()
+     */
+    tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+    tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+    return TRUE;
 }
 
 /*
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index 1245e85..5ef89e7 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7353,7 +7353,16 @@ typedef struct
        releases the PGresult under construction. If you set any
        message with <function>PQresultSetErrMsg</function>,it is set
        as the PGconn's error message and the PGresult will be
-       preserved.
+       preserved.  When non-blocking API is in use, it can also return
+       2 for early exit from <function>PQisBusy</function> function.
+       The supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       row can be processed outside of callback.  Caller is
+       responsible for tracking whether
+       the <parameter>PQisBusy</parameter> returned early from
+       callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index 6555f85..36773cb 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -828,6 +828,9 @@ getAnotherTuple(PGconn *conn, bool binary)
     rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
     if (rp == 1)
         return 0;
+    else if (rp == 2 && pqIsnonblocking(conn))
+        /* processor requested early exit */
+        return EOF;
     else if (rp == 0)
     {
         errmsg = result->errMsg;
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index 3725de2..2693ce0 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -698,6 +698,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
         /* everything is good */
         return 0;
     }
+    if (rp == 2 && pqIsnonblocking(conn))
+    {
+        /* processor requested early exit */
+        return EOF;
+    }
     /* there was some problem */
     if (rp == 0)

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

01 March 2012, 12:14:56

On Tue, Feb 28, 2012 at 05:04:44PM +0900, Kyotaro HORIGUCHI wrote:
> > There is still one EOF in v3 getAnotherTuple() -
> > pqGetInt(tupnfields), please turn that one also to
> > protocolerror.
> 
> pqGetInt() returns EOF only when it wants additional reading from
> network if the parameter `bytes' is appropriate. Non-zero return
> from it seems should be handled as EOF, not a protocol error. The
> one point I had modified buggily is also restored. The so-called
> 'protocol error' has vanished eventually.

But it's broken in V3 protocol - getAnotherTuple() will be called
only if the packet is fully read.  If the packet contents do not
agree with packet header, it's protocol error.  Only valid EOF
return in V3 getAnotherTuple() is when row processor asks
for early exit.

> Is there something left undone?

* Convert old EOFs to protocol errors in V3 getAnotherTuple()

* V2 getAnotherTuple() can leak PGresult when handling custom error from row processor.

* remove pqIsnonblocking(conn) check when row processor returned 2. I missed that it's valid to call
PQisBusy/PQconsumeInput/PQgetResult on sync connection.

* It seems the return codes from callback should be remapped, (0, 1, 2) is unusual pattern.  Better would be:

   -1 - error
    0 - stop parsing / early exit ("I'm not done yet")
    1 - OK ("I'm done with the row")

* Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg(). Main problem is that it needs to be synced with error
handling in rest of libpq, which is unlike the rest of row processor patch, which consists only of local changes.  All
solutions here are either ugly hacks or too complex to be part of this patch.

Also considering that we have working exceptions and PQgetRow,
I don't see much need for custom error messages.  If really needed,
it should be introduced as separate patch, as the area of code it
affects is completely different.

Currently the custom error messaging seems to be the blocker for
this patch, because of raised complexity when implementing it and
when reviewing it.  Considering how unimportant the provided
functionality is, compared to rest of the patch, I think we should
simply drop it.

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then.  This allows internal pqAddRow
to set regular "out of memory" error.  Otherwise give generic
"row processor error".

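A minimal sketch of how the remapped return codes and the "leave an
existing error alone" suggestion could fit together. Everything here is
a stand-in for illustration: MockResult, the MOCK_* status values, the
ROWPROC_* names and dispatch_row_result() are hypothetical and are not
libpq code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a result and its status; not the libpq types. */
typedef enum { MOCK_TUPLES_OK, MOCK_FATAL_ERROR } MockStatus;

typedef struct
{
    MockStatus  status;
    char        errmsg[64];
} MockResult;

/* Proposed callback return codes (names are assumptions). */
#define ROWPROC_ERROR  (-1)
#define ROWPROC_STOP     0      /* early exit, "I'm not done yet" */
#define ROWPROC_OK       1      /* "I'm done with the row" */

/*
 * Map a row processor return code to a parser action.
 * Returns 0 when parsing may continue and -1 (standing in for EOF)
 * when the parser should stop.
 */
static int
dispatch_row_result(MockResult *res, int rp)
{
    if (rp == ROWPROC_OK)
        return 0;
    if (rp == ROWPROC_STOP)
        return -1;              /* early exit; the result is left untouched */

    /*
     * Error path (also taken for unknown codes): keep a message that
     * was already set, e.g. "out of memory"; otherwise use a generic one.
     */
    if (res->status != MOCK_FATAL_ERROR)
    {
        res->status = MOCK_FATAL_ERROR;
        snprintf(res->errmsg, sizeof(res->errmsg), "row processor error\n");
    }
    return -1;
}
```

With this shape the callback needs no way to pass its own message: it
either marks the result as failed before returning -1 or accepts the
generic text.
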
> By the way, I noticed that dblink always says that the current
> connection is 'unnamed' in the error messages from
> dblink_record_internal@dblink.  I could see that
> dblink_record_internal defines the local variable conname = NULL
> and pass it to dblink_res_error to display the error message. But
> no assignment on it in the function.
> 
> It seemed properly shown when I added the code to set conname
> from PG_GETARG_TEXT_PP(0) if available, in other words do that
> just after DBLINK_GET_CONN/DBLINK_GET_NAMED_CONN's. It seems the
> dblink's manner...  This is not included in this patch.
> 
> Furthermore dblink_res_error looks only into returned PGresult to
> display the error and always says only `Error occurred on dblink
> connection..: could not execute query'..
> 
> Is it right to consider this as follows?
> 
>  - dblink is wrong in error handling. A client of libpq should
>    see PGconn by PQerrorMessage() if (or regardless of whether?)
>    PGresult says nothing about error.

Yes, it seems like bug.

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

05 March 2012, 22:14:11

Hello, I'm sorry for the absence.

> But it's broken in V3 protocol - getAnotherTuple() will be called
> only if the packet is fully read.  If the packet contents do not
> agree with packet header, it's protocol error.  Only valid EOF
> return in V3 getAnotherTuple() is when row processor asks
> for early exit.
Original code of getAnotherTuple returns EOF when the bytes to
be read is not fully loaded. I understand that this was
inappropriately (redundant checks?) written at least for the
pqGetInt() for the field length in getAnotherTuple.  But I don't
understand how to secure the rows (or table data) fully loaded at
the point of getAnotherTuple called...

Nevertheless the first pqGetInt() can return EOF when the
previous row is fully loaded but the next row is not loaded, so
the EOF return seems necessary even if each row is passed on
only after it is fully loaded.

> * Convert old EOFs to protocol errors in V3 getAnotherTuple()

Ok. I will do that.

> * V2 getAnotherTuple() can leak PGresult when handling custom
>   error from row processor.

mmm. I will confirm it.

> * remove pqIsnonblocking(conn) check when row processor returned 2.
>   I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
>   on sync connection.

mmm. EOF from getAnotherTuple makes PQgetResult try further
reading until asyncStatus != PGASYNC_BUSY as far as I saw. And it
seemed to do so when I tried to remove 'return 2'. I think that
at least one additional state for asyncStatus is needed for
EOF to work as desired here.


> * It seems the return codes from callback should be remapped,
>   (0, 1, 2) is unusual pattern.  Better would be:
> 
>    -1 - error
>     0 - stop parsing / early exit ("I'm not done yet")
>     1 - OK ("I'm done with the row")

I almost agree with it. I will consider the suggestion related to
pqAddRow together.


> * Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().
>   Main problem is that it needs to be synced with error handling
>   in rest of libpq, which is unlike the rest of row processor patch,
>   which consists only of local changes.  All solutions here
>   are either ugly hacks or too complex to be part of this patch.

Ok, I will take your advice. 

> Also considering that we have working exceptions and PQgetRow,
> I don't see much need for custom error messages.  If really needed,
> it should be introduced as separate patch, as the area of code it
> affects is completely different.

I agree with it.

> Currently the custom error messaging seems to be the blocker for
> this patch, because of raised complexity when implementing it and
> when reviewing it.  Considering how unimportant the provided
> functionality is, compared to rest of the patch, I think we should
> simply drop it.

Ok.

> My suggestion - check in getAnotherTuple whether resultStatus is
> already error and do nothing then.  This allows internal pqAddRow
> to set regular "out of memory" error.  Otherwise give generic
> "row processor error".

regards, 

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

06 March 2012, 04:13:54

On Tue, Mar 06, 2012 at 11:13:45AM +0900, Kyotaro HORIGUCHI wrote:
> > But it's broken in V3 protocol - getAnotherTuple() will be called
> > only if the packet is fully read.  If the packet contents do not
> > agree with packet header, it's protocol error.  Only valid EOF
> > return in V3 getAnotherTuple() is when row processor asks
> > for early exit.
> 
>  Original code of getAnotherTuple returns EOF when the bytes to
> be read is not fully loaded. I understand that this was
> inappropriately (redundant checks?) written at least for the
> pqGetInt() for the field length in getAnotherTuple.  But I don't
> understand how to secure the rows (or table data) fully loaded at
> the point of getAnotherTuple called...

Look into pqParseInput3():
    if (avail < msgLength)
    {
        ...
        return;
    }


> > * remove pqIsnonblocking(conn) check when row processor returned 2.
> >   I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
> >   on sync connection.
> 
> mmm. EOF from getAnotherTuple makes PQgetResult try further
> reading until asyncStatus != PGASYNC_BUSY as far as I saw. And it
> seemed to do so when I tried to remove 'return 2'. I think that
> at least one additional state for asyncStatus is needed for
> EOF to work as desired here.

No.  It's valid to do PQisBusy() + PQconsumeInput() loop until
PQisBusy() returns 0.  Otherwise, yes, PQgetResult() will
block until final result is available.  But that's OK.
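
The polling pattern above can be sketched without a server. The
FakeConn type and the fake_* functions below are hypothetical stand-ins
for PGconn, PQconsumeInput() and PQisBusy(); a real client would also
wait on the socket (PQsocket() plus select() or poll()) before each
PQconsumeInput() call.

```c
#include <assert.h>

/* Hypothetical connection: 'pending' counts network reads still needed. */
typedef struct
{
    int pending;
    int reads;
} FakeConn;

/* Stand-in for PQconsumeInput(): returns 1 on success, 0 on trouble. */
static int
fake_consume_input(FakeConn *conn)
{
    if (conn->pending > 0)
        conn->pending--;
    conn->reads++;
    return 1;
}

/* Stand-in for PQisBusy(): 1 while the result is not yet complete. */
static int
fake_is_busy(const FakeConn *conn)
{
    return conn->pending > 0;
}

/*
 * Loop until the connection is no longer busy.
 * Returns 1 when a result is ready to fetch, 0 on input failure.
 */
static int
wait_until_ready(FakeConn *conn)
{
    while (fake_is_busy(conn))
    {
        /* real code: block in select()/poll() on the socket here */
        if (!fake_consume_input(conn))
            return 0;
    }
    return 1;       /* fetching the result would not block now */
}
```

Calling PQgetResult() without such a loop is the other valid choice;
it simply blocks inside the library instead of in the caller.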

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

06 March 2012, 20:44:13

Hello, 

> But I don't understand how to secure the rows (or table data)
> fully loaded at the point of getAnotherTuple called...

I found how pqParseInput ensures the entire message is loaded
before getAnotherTuple is called.

fe-protocol3.c:107
|   avail = conn->inEnd - conn->inCursor;
|   if (avail < msgLength)
|   {
|     if (pqCheckInBufferSpace(conn->inCursor + (size_t)msgLength, conn))

So now I convinced that the whole row data is loaded at the point
that getAnotherTuple is called. I agree that getAnotherTuple
should not return EOF to request for unloaded part of the
message.
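
The "whole message is loaded first" check can be illustrated on a bare
byte buffer. This is only a sketch: message_fully_loaded() is a
hypothetical helper, not a libpq function, and it assumes the V3 layout
of one type byte followed by a 4-byte big-endian length that counts
itself but not the type byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if buf[0..len) holds one complete message, 0 if more
 * bytes are needed, and -1 if the length word is invalid.
 */
static int
message_fully_loaded(const unsigned char *buf, size_t len)
{
    uint32_t    msg_length;
    size_t      avail;

    if (len < 5)
        return 0;                   /* the header itself is incomplete */

    msg_length = ((uint32_t) buf[1] << 24) | ((uint32_t) buf[2] << 16) |
                 ((uint32_t) buf[3] << 8) | (uint32_t) buf[4];
    if (msg_length < 4)
        return -1;                  /* the length must cover itself */

    msg_length -= 4;                /* payload bytes after the length word */
    avail = len - 5;                /* bytes present after the header */
    return (avail < msg_length) ? 0 : 1;
}
```

Only when this kind of check passes does the row parser run, so a short
read inside the row means the payload disagrees with its header rather
than that more data is on the way.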

Please wait for a while for the new patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

06 March 2012, 21:04:55

Hello,

Nevertheless, one problem is still there: an exit from parseInput()
caused by a non-zero return of getAnotherTuple() results in immediate
re-entry into getAnotherTuple() via parseInput(), with no other
side effect. But I will address that in the next patch.

> So now I convinced that the whole row data is loaded at the point
           ^am
> that getAnotherTuple is called. I agree that getAnotherTuple
> should not return EOF to request for unloaded part of the
> message.
> 
> Please wait for a while for the new patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

07 March 2012, 02:15:21

Hello, this is new version of the patch.

# early_exit.diff is not included for this time and maybe also
# later.  The set of the return values of PQrowProcessor looks
# unnatural if the 0 is removed.

> * Convert old EOFs to protocol errors in V3 getAnotherTuple()

done.

> * V2 getAnotherTuple() can leak PGresult when handling custom
>   error from row processor.

Custom error message is removed from both V2 and V3.

> * remove pqIsnonblocking(conn) check when row processor returned 2.
>   I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
>   on sync connection.

done. This affects PQisBusy, but PQgetResult won't be affected as
far as I see. I have no idea for PQconsumeInput()..

> * It seems the return codes from callback should be remapped,
>   (0, 1, 2) is unusual pattern.  Better would be:
> 
>    -1 - error
>     0 - stop parsing / early exit ("I'm not done yet")
>     1 - OK ("I'm done with the row")

done.

This might be described more precisely as follows,

|    -1 - error - erase result and change result status to
|               - FATAL_ERROR. All the remaining rows in the current
|               - result will be skipped (consumed).
|     0 - stop parsing / early exit ("I'm not done yet")
|                - getAnotherTuple returns EOF without dropping PGresult.
#                - We expect PQisBusy(), PQconsumeInput()(?) and 
#                - PQgetResult() to exit immediately and we can
#                - call PQgetResult(), PQskipResult() or
#                - PQisBusy() after.
|     1 - OK ("I'm done with the row")
|                - save result and getAnotherTuple returns 0.

The lines prefixed with '#' are the desirable behavior as I have
understood it from the discussion so far. And I doubt that it works
as we expect for PQgetResult().
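
For illustration, a row processor that follows this convention might
look like the sketch below. MockRowValue is a hypothetical mirror of
the patch's PGrowValue (a length plus a pointer to bytes that are not
NUL-terminated, with a negative length meaning SQL NULL); CollectState
and collect_row() are made-up names, and the PGresult argument is left
out to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the patch's PGrowValue: len < 0 means SQL NULL. */
typedef struct
{
    int         len;
    const char *value;      /* not NUL-terminated */
} MockRowValue;

typedef struct
{
    int     nfields;
    int     rows_seen;
    int     max_rows;       /* ask for early exit after this many rows */
    char   *last;           /* copy of column 0 of the latest row, or NULL */
} CollectState;

/* Returns 1 = row consumed, 0 = early exit requested, -1 = error. */
static int
collect_row(const MockRowValue *columns, void *param)
{
    CollectState *st = (CollectState *) param;

    free(st->last);
    st->last = NULL;

    if (st->nfields > 0 && columns[0].len >= 0)
    {
        size_t  n = (size_t) columns[0].len;

        st->last = malloc(n + 1);
        if (st->last == NULL)
            return -1;                      /* out of memory */
        memcpy(st->last, columns[0].value, n);
        st->last[n] = '\0';                 /* terminate it ourselves */
    }

    st->rows_seen++;
    return (st->rows_seen >= st->max_rows) ? 0 : 1;
}
```

The copy is what makes the value safe to keep: the pointer handed to
the callback refers to the input buffer, which is reused for later
messages.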

> * Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().

done.

> My suggestion - check in getAnotherTuple whether resultStatus is
> already error and do nothing then.  This allows internal pqAddRow
> to set regular "out of memory" error.  Otherwise give generic
> "row processor error".

The current implementation already seems to do this in
parseInput3(). Could you give me further explanation?

regards, 

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..a6418ec 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor            161
+PQgetRowProcessor            162
+PQskipResult                163
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)    conn->wait_ssl_try = false;#endif
+    /* set default row processor */
+    PQsetRowProcessor(conn, NULL, NULL);
+    /*     * We try to send at least 8K at a time, which is the usual size of pipe     * buffers on Unix systems.
Thatway, when we are sending a large amount
 
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)    initPQExpBuffer(&conn->errorMessage);
initPQExpBuffer(&conn->workBuffer);
+    /* set up initial row buffer */
+    conn->rowBufLen = 32;
+    conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+    if (conn->inBuffer == NULL ||        conn->outBuffer == NULL ||
+        conn->rowBuf == NULL ||        PQExpBufferBroken(&conn->errorMessage) ||
PQExpBufferBroken(&conn->workBuffer))   {
 
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)        free(conn->inBuffer);    if (conn->outBuffer)
free(conn->outBuffer);
+    if (conn->rowBuf)
+        free(conn->rowBuf);    termPQExpBuffer(&conn->errorMessage);    termPQExpBuffer(&conn->workBuffer);
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)    return prev;}
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..161d210 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);static int PQsendDescribe(PGconn *conn, char desc_type,
           const char *desc_target);static int    check_field_number(const PGresult *res, int field_num);
 
+static int    pqAddRow(PGresult *res, PGrowValue *columns, void *param);/* ----------------
@@ -701,7 +702,6 @@ pqClearAsyncResult(PGconn *conn)    if (conn->result)        PQclear(conn->result);    conn->result
=NULL;
 
-    conn->curTuple = NULL;}/*
@@ -756,7 +756,6 @@ pqPrepareAsyncResult(PGconn *conn)     */    res = conn->result;    conn->result = NULL;        /*
handingover ownership to caller */
 
-    conn->curTuple = NULL;        /* just in case */    if (!res)        res = PQmakeEmptyPGresult(conn,
PGRES_FATAL_ERROR);   else
 
@@ -828,6 +827,71 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)}/*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+    conn->rowProcessor = (func ? func : pqAddRow);
+    conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get current row processor of conn. set pointer to current parameter for
+ *   row processor to param if not NULL.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+    if (param)
+        *param = conn->rowProcessorParam;
+
+    return conn->rowProcessor;
+}
+
+/*
+ * pqAddRow
+ *      add a row to the PGresult structure, growing it if necessary
+ *      Returns 1 if OK, -1 if error occurred.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+    PGresAttValue *tup;
+    int            nfields = res->numAttributes;
+    int            i;
+
+    tup = (PGresAttValue *)
+        pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+    if (tup == NULL)
+        return -1;
+
+    for (i = 0 ; i < nfields ; i++)
+    {
+        tup[i].len = columns[i].len;
+        if (tup[i].len == NULL_LEN)
+        {
+            tup[i].value = res->null_field;
+        }
+        else
+        {
+            bool isbinary = (res->attDescs[i].format != 0);
+            tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+            if (tup[i].value == NULL)
+                return -1;
+
+            memcpy(tup[i].value, columns[i].value, tup[i].len);
+            /* We have to terminate this ourselves */
+            tup[i].value[tup[i].len] = '\0';
+        }
+    }
+
+    return (pqAddTuple(res, tup) ? 1 : -1);
+}
+
+/* * pqAddTuple *      add a row pointer to the PGresult structure, growing it if necessary *      Returns TRUE if OK,
FALSEif not enough memory to add the row
 
@@ -1223,7 +1287,6 @@ PQsendQueryStart(PGconn *conn)    /* initialize async result-accumulation state */
conn->result= NULL;
 
-    conn->curTuple = NULL;    /* ready to send command message */    return true;
@@ -1831,6 +1894,55 @@ PQexecFinish(PGconn *conn)    return lastResult;}
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+    return 1;
+}
+
+/*
+ * Exhaust remaining Data Rows in current conn.
+ * 
+ * Exhaust current result if skipAll is false and all succeeding results if
+ * true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+    PQrowProcessor savedRowProcessor;
+    void * savedRowProcParam;
+    PGresult *res;
+    int ret = 0;
+
+    /* save the current row processor settings and set dummy processor */
+    savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+    PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+    
+    /*
+     * Throw away the remaining rows in current result, or all succeeding
+     * results if skipAll is not FALSE.
+     */
+    if (skipAll)
+    {
+        while ((res = PQgetResult(conn)) != NULL)
+            PQclear(res);
+    }
+    else if ((res = PQgetResult(conn)) != NULL)
+    {
+        PQclear(res);
+        ret = 1;
+    }
+    
+    PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+    return ret;
+}
+
+/* * PQdescribePrepared *      Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)}/*
+ * pqSkipnchar:
+ *    skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+    if (len > (size_t) (conn->inEnd - conn->inCursor))
+        return EOF;
+
+    conn->inCursor += len;
+
+    if (conn->Pfdebug)
+        fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+                (unsigned long) len);
+
+    return 0;
+}
+
+/* * pqPutnchar: *    write exactly len bytes to the current message */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..7fcb10f 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, FALSE))                            return;
 
+                        /* getAnotherTuple moves inStart itself */
+                        continue;                    }                    else                    {
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, TRUE))                            return;
 
+                        /* getAnotherTuple moves inStart itself */
+                        continue;                    }                    else                    {
@@ -703,19 +707,18 @@ failure:/* * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor. * Returns: 0 if completed message, EOF if error
ornot enough data yet. * * Note that if we run out of data, we have to suspend and reprocess
 
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received. */static intgetAnotherTuple(PGconn *conn, bool binary){    PGresult
*result= conn->result;    int            nfields = result->numAttributes;
 
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;    /* the backend sends us a bitmap of which attributes are null */    char
std_bitmap[64];/* used unless it doesn't fit */
 
@@ -726,29 +729,32 @@ getAnotherTuple(PGconn *conn, bool binary)    int            bitmap_index;    /* Its index */
int           bitcnt;            /* number of bits examined in current byte */    int            vlen;            /*
lengthof the current field value */
 
+    char        *errmsg = "unknown error\n";
-    result->binary = binary;
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
+    /* resize row buffer if needed */
+    if (nfields > conn->rowBufLen)    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-        /*
-         * If it's binary, fix the column format indicators.  We assume the
-         * backend will consistently send either B or D, not a mix.
-         */
-        if (binary)
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)        {
-            for (i = 0; i < nfields; i++)
-                result->attDescs[i].format = 1;
+            errmsg = "out of memory for query result\n";
+            goto error_clearresult;        }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
+    }
+    else
+    {
+        rowbuf = conn->rowBuf;
+    }
+
+    result->binary = binary;
+
+    if (binary)
+    {
+        for (i = 0; i < nfields; i++)
+            result->attDescs[i].format = 1;    }
-    tup = conn->curTuple;    /* Get the null-value bitmap */    nbytes = (nfields + BITS_PER_BYTE - 1) /
BITS_PER_BYTE;
@@ -757,11 +763,15 @@ getAnotherTuple(PGconn *conn, bool binary)    {        bitmap = (char *) malloc(nbytes);
if(!bitmap)
 
-            goto outOfMemory;
+        {
+            errmsg = "out of memory for query result\n";
+            goto error_clearresult;
+        }    }    if (pqGetnchar(bitmap, nbytes, conn))
-        goto EOFexit;
+        goto error_clearresult;
+    /* Scan the fields */    bitmap_index = 0;
@@ -771,34 +781,29 @@ getAnotherTuple(PGconn *conn, bool binary)    for (i = 0; i < nfields; i++)    {        if
(!(bmap& 0200))
 
-        {
-            /* if the field value is absent, make it a null string */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-        }
+            vlen = NULL_LEN;
+        else if (pqGetInt(&vlen, 4, conn))
+                goto EOFexit;        else        {
-            /* get the value length (the first four bytes are for length) */
-            if (pqGetInt(&vlen, 4, conn))
-                goto EOFexit;            if (!binary)                vlen = vlen - 4;            if (vlen < 0)
      vlen = 0;
 
-            if (tup[i].value == NULL)
-            {
-                tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-                if (tup[i].value == NULL)
-                    goto outOfMemory;
-            }
-            tup[i].len = vlen;
-            /* read in the value */
-            if (vlen > 0)
-                if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                    goto EOFexit;
-            /* we have to terminate this ourselves */
-            tup[i].value[vlen] = '\0';        }
+
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip the value */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+            goto EOFexit;
+        /* advance the bitmap stuff */        bitcnt++;        if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +816,51 @@ getAnotherTuple(PGconn *conn, bool binary)            bmap <<= 1;    }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
-    if (bitmap != std_bitmap)        free(bitmap);
-    return 0;
+    bitmap = NULL;
+
+    /* tag the row as parsed */
+    conn->inStart = conn->inCursor;
+
+    /* Pass the completed row values to rowProcessor */
+    switch (conn->rowProcessor(result, rowbuf, conn->rowProcessorParam))
+    {
+        case 1:
+            /* everything is good */
+            return 0;
-outOfMemory:
-    /* Replace partially constructed result with an error result */
+        case 0:
+            /* processor requested early exit */
+            return EOF;
+            
+        case -1:
+            errmsg = "error in row processor";
+            goto error_clearresult;
+        default:
+            /* Illegal return code */
+            errmsg = "invalid return value from row processor\n";
+            goto error_clearresult;
+    }
+
+error_clearresult:    /*     * we do NOT use pqSaveErrorResult() here, because of the likelihood that     * there's
notenough memory to concatenate messages...     */    pqClearAsyncResult(conn);
 
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
+    printfPQExpBuffer(&conn->errorMessage, "%s", libpq_gettext(errmsg));
+        /*     * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can     * do to recover...     */
conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 
+    conn->asyncStatus = PGASYNC_READY;
+    /* Discard the failed message --- good idea? */    conn->inStart = conn->inEnd;
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..b51b04c 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)                        /* Read another tuple of a normal query response
*/                       if (getAnotherTuple(conn, msgLength))                            return;
 
+
+                        /* getAnotherTuple() moves inStart itself */
+                        continue;                    }                    else if (conn->result != NULL &&
               conn->result->resultStatus == PGRES_FATAL_ERROR)
 
@@ -613,47 +616,49 @@ failure:
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
- * Returns: 0 if completed message, EOF if error or not enough data yet.
+ * It fills rowbuf with column pointers and then calls the row processor.
+ * Returns: 0 if the message was completed or failed with an error result,
+ * EOF if the row processor requested early exit.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
     PGresult   *result = conn->result;
     int            nfields = result->numAttributes;
-    PGresAttValue *tup;
+    PGrowValue  *rowbuf;
     int            tupnfields;        /* # fields from tuple */
     int            vlen;            /* length of the current field value */
     int            i;
 
-
-    /* Allocate tuple space if first time for this data message */
-    if (conn->curTuple == NULL)
-    {
-        conn->curTuple = (PGresAttValue *)
-            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-        if (conn->curTuple == NULL)
-            goto outOfMemory;
-        MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-    }
-    tup = conn->curTuple;
+    char        *errmsg = "unknown error\n";
     /* Get the field count and make sure it's what we expect */
     if (pqGetInt(&tupnfields, 2, conn))
-        return EOF;
+    {
+        /* The whole message must be loaded in the buffer here */
+        errmsg = "protocol error\n";
+        goto error_saveresult;
+    }
     if (tupnfields != nfields)
     {
-        /* Replace partially constructed result with an error result */
-        printfPQExpBuffer(&conn->errorMessage,
-                 libpq_gettext("unexpected field count in \"D\" message\n"));
-        pqSaveErrorResult(conn);
-        /* Discard the failed message by pretending we read it */
-        conn->inCursor = conn->inStart + 5 + msgLength;
-        return 0;
+        errmsg = "unexpected field count in \"D\" message\n";
+        goto error_and_forward;
+    }
+
+    /* resize row buffer if needed */
+    rowbuf = conn->rowBuf;
+    if (nfields > conn->rowBufLen)
+    {
+        rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+        if (!rowbuf)
+        {
+            errmsg = "out of memory for query result\n";
+            goto error_and_forward;
+        }
+        conn->rowBuf = rowbuf;
+        conn->rowBufLen = nfields;
     }
     /* Scan the fields */
@@ -661,54 +666,78 @@ getAnotherTuple(PGconn *conn, int msgLength)
     {
         /* get the value length */
         if (pqGetInt(&vlen, 4, conn))
-            return EOF;
-        if (vlen == -1)
         {
-            /* null field */
-            tup[i].value = result->null_field;
-            tup[i].len = NULL_LEN;
-            continue;
+            /* The whole message must be loaded in the buffer here */
+            errmsg = "protocol error\n";
+            goto error_saveresult;
         }
-        if (vlen < 0)
+
+        if (vlen == -1)
+            vlen = NULL_LEN;
+        else if (vlen < 0)
             vlen = 0;
-        if (tup[i].value == NULL)
-        {
-            bool        isbinary = (result->attDescs[i].format != 0);
-            tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-            if (tup[i].value == NULL)
-                goto outOfMemory;
+        /*
+         * rowbuf[i].value always points to the next address of the
+         * length field even if the value is NULL, to allow safe
+         * size estimates and data copy.
+         */
+        rowbuf[i].value = conn->inBuffer + conn->inCursor;
+        rowbuf[i].len = vlen;
+
+        /* Skip to the next length field */
+        if (vlen > 0 && pqSkipnchar(vlen, conn))
+        {
+            /* The whole message must be loaded in the buffer here */
+            errmsg = "protocol error\n";
+            goto error_saveresult;
         }
-        tup[i].len = vlen;
-        /* read in the value */
-        if (vlen > 0)
-            if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                return EOF;
-        /* we have to terminate this ourselves */
-        tup[i].value[vlen] = '\0';
     }
-    /* Success!  Store the completed tuple in the result */
-    if (!pqAddTuple(result, tup))
-        goto outOfMemory;
-    /* and reset for a new message */
-    conn->curTuple = NULL;
+    /* mark the row as parsed and check that it was consumed exactly */
+    conn->inStart += 5 + msgLength;
+    if (conn->inCursor != conn->inStart)
+    {
+        errmsg = "invalid row contents\n";
+        goto error_clearresult;
+    }
+
+    /* Pass the completed row values to rowProcessor */
+    switch (conn->rowProcessor(result, rowbuf, conn->rowProcessorParam))
+    {
+        case 1:
+            /* everything is good */
+            return 0;
+
+        case 0:
+            /* processor requested early exit - stop parsing without error */
+            return EOF;
+
+        case -1:
+            errmsg = "error in row processor\n";
+            goto error_clearresult;
+
+        default:
+            /* Illegal return code */
+            errmsg = "invalid return value from row processor\n";
+            goto error_clearresult;
+    }
-    return 0;
-outOfMemory:
+error_and_forward:
+    /* Discard the failed message by pretending we read it */
+    conn->inCursor = conn->inStart + 5 + msgLength;
+error_clearresult:
+    pqClearAsyncResult(conn);
+    
+error_saveresult:
     /*
      * Replace partially constructed result with an error result. First
      * discard the old result to try to win back some memory.
      */
-    pqClearAsyncResult(conn);
-    printfPQExpBuffer(&conn->errorMessage,
-                      libpq_gettext("out of memory for query result\n"));
+    printfPQExpBuffer(&conn->errorMessage, "%s", libpq_gettext(errmsg));
     pqSaveErrorResult(conn);
-
-    /* Discard the failed message by pretending we read it */
-    conn->inCursor = conn->inStart + 5 + msgLength;
     return 0;
 }
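
To make the return-code contract handled by the switch above concrete, here is a minimal standalone sketch of a callback that follows it: 1 means the row was consumed, 0 requests an early exit, and -1 reports a failure. It does not link against libpq. DemoRowValue, DemoLimit and demo_row_processor are hypothetical names used only for illustration, and the PGresult argument of the real PQrowProcessor is left out so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's PGrowValue. */
typedef struct DemoRowValue
{
    int         len;        /* length in bytes, negative for SQL NULL */
    const char *value;      /* not null-terminated */
} DemoRowValue;

/* Hypothetical per-query state handed to the callback as "param". */
typedef struct DemoLimit
{
    int         rows_seen;
    int         max_rows;   /* ask for early exit once this many rows arrived */
    int         failed;     /* sticky flag, similar to storeInfo.error_occurred */
} DemoLimit;

/*
 * Same shape as PQrowProcessor, minus the PGresult argument.
 * Returns 1 (row consumed), 0 (early exit requested) or -1 (failure).
 */
int
demo_row_processor(const DemoRowValue *columns, int ncolumns, void *param)
{
    DemoLimit  *st = (DemoLimit *) param;

    assert(st != NULL);

    /* Once a failure has been seen, keep reporting it for later rows. */
    if (st->failed || columns == NULL || ncolumns <= 0)
    {
        st->failed = 1;
        return -1;
    }

    st->rows_seen++;
    if (st->rows_seen >= st->max_rows)
        return 0;           /* caller should stop parsing without error */

    return 1;
}
```

The sticky failed flag mirrors what storeHandler() does with error_occurred: after the first failure every later row is rejected immediately.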
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..e1d3339 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
     struct pgNotify *next;        /* list link */
 } PGnotify;
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+    int            len;            /* length in bytes of the value */
+    char       *value;            /* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
 
@@ -416,6 +427,38 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
              const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * The columns array contains PQnfields() entries, each one pointing
+ * to that column's data in the network buffer.  This function is
+ * expected to copy the data out and store it somewhere.  NULL is
+ * signified with len<0.
+ *
+ * This function must return 1 for success and -1 for failure; on
+ * failure the caller releases the current PGresult. Returning 0
+ * instructs libpq to exit immediately from the topmost libpq function
+ * without releasing the PGresult being built.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Once a row processor is registered, libpq no longer stores rows in
+ * the PGresult itself and instead calls the processor for each row.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+                                   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+
 /* Force the write buffer to be written (or at least try) */
 extern int    PQflush(PGconn *conn);
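
Since PGrowValue values are not null-terminated and live in the network buffer, a row processor has to copy each value before it returns. A minimal standalone sketch of that copy step follows. It is not taken from the patch: DemoRowValue is a hypothetical mirror of PGrowValue and demo_copy_value is an illustrative helper, so nothing here depends on libpq.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the patch's PGrowValue. */
typedef struct DemoRowValue
{
    int         len;        /* length in bytes, negative for SQL NULL */
    const char *value;      /* points into the receive buffer, no terminator */
} DemoRowValue;

/*
 * Return a malloc'ed, null-terminated copy of one column value.
 * Returns NULL for SQL NULL and also on allocation failure;
 * *is_null tells the two cases apart.  The caller frees the result.
 */
char *
demo_copy_value(const DemoRowValue *col, int *is_null)
{
    char       *copy;

    assert(col != NULL && is_null != NULL);

    *is_null = (col->len < 0);
    if (*is_null)
        return NULL;

    copy = malloc((size_t) col->len + 1);
    if (copy == NULL)
        return NULL;

    /* The source has no terminator, so copy exactly len bytes. */
    memcpy(copy, col->value, (size_t) col->len);
    copy[col->len] = '\0';
    return copy;
}
```

storeHandler() in the dblink part of the patch does the same job but reuses one buffer per column across rows instead of allocating for every value.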
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..9cabd20 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -398,7 +398,6 @@ struct pg_conn
     /* Status for asynchronous result construction */
     PGresult   *result;            /* result being constructed */
-    PGresAttValue *curTuple;    /* tuple currently being read */
 #ifdef USE_SSL
     bool        allow_ssl_try;    /* Allowed to try SSL negotiation */
 
@@ -443,6 +442,14 @@ struct pg_conn
     /* Buffer for receiving various parts of messages */
     PQExpBufferData workBuffer; /* expansible string */
 
+
+    /*
+     * Read column data from network buffer.
+     */
+    PQrowProcessor rowProcessor;/* Function pointer */
+    void *rowProcessorParam;    /* Contextual parameter for rowProcessor */
+    PGrowValue *rowBuf;            /* Buffer for passing values to rowProcessor */
+    int rowBufLen;                /* Number of columns allocated in rowBuf */
 };
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
 
@@ -560,6 +567,7 @@ extern int    pqGets(PQExpBuffer buf, PGconn *conn);
 extern int    pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int    pqPuts(const char *s, PGconn *conn);
 extern int    pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int    pqSkipnchar(size_t len, PGconn *conn);
 extern int    pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int    pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int    pqPutInt(int value, size_t bytes, PGconn *conn);
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..e9233bd 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,281 @@ int PQisthreadsafe(); </sect1>
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   By default, rows are stored in a <type>PGresult</type> until the
+   full result set has been received, and only then is the completed
+   <type>PGresult</type> passed to the user.  This behavior can be
+   changed by registering an alternative row processor function,
+   which sees each row as soon as it is received from the network.
+   It can process the data immediately or store it in a custom
+   container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it
+   cannot know whether the SQL statement will eventually finish
+   successfully on the server.  Some care must therefore be taken to
+   preserve transactional behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object on which to set the row processor.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>func</parameter></term>
+       <listitem>
+         <para>
+           Row processor function to set. NULL means to use the
+           default processor.
+         </para>
+       </listitem>
+     </varlistentry>
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+           A pointer to the contextual parameter passed
+           to <parameter>func</parameter>.
+         </para>
+       </listitem>
+     </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array has PQnfields()
+      elements, each one pointing to a column value in the network
+      buffer.  The <parameter>len</parameter> field contains the number
+      of bytes in the value.  If the field value is NULL,
+      <parameter>len</parameter> is -1 and value points to the
+      position where the value would have been in the buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process the row values or copy them out of the
+       network buffer before it returns, because the next row may overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success and -1 for failure; on
+       failure the caller releases the <type>PGresult</type> being
+       built.  It can also return 0 to request an early exit
+       from <function>PQisBusy</function>.  In that case the
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values stay valid, so the
+       row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from the
+       callback or for other reasons.  Usually this is done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed
+       via <function>PQfinish</function>.  Processing can also be
+       continued without closing the connection: call
+       <function>PQgetResult</function> in synchronous mode or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing then continues with the next row; the row that
+       raised the exception is skipped.  Alternatively, discard all
+       remaining rows by calling <function>PQskipResult</function>
+       without closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+     <term><parameter>res</parameter></term>
+     <listitem>
+       <para>
+         A pointer to the <type>PGresult</type> object.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>columns</parameter></term>
+     <listitem>
+       <para>
+         Column values of the row to process.  The values
+         are located in the network buffer, and the processor must
+         copy them out from there.
+       </para>
+       <para>
+         Column values are not null-terminated, so the processor cannot
+         use C string functions on them directly.
+       </para>
+     </listitem>
+       </varlistentry>
+       <varlistentry>
+
+     <term><parameter>param</parameter></term>
+     <listitem>
+       <para>
+         Extra parameter that was given to <function>PQsetRowProcessor</function>.
+       </para>
+     </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+        Discards all the remaining row data
+        after <function>PQexec</function>
+        or <function>PQgetResult</function> exits because of an exception raised
+        in the <type>PQrowProcessor</type>, without closing the connection.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll);
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>skipAll</parameter></term>
+       <listitem>
+         <para>
+           If <parameter>skipAll</parameter> is false (0), skip the
+           remaining rows in the current result.  If it is true
+           (non-zero), skip the remaining rows in the current result
+           and all rows in succeeding results.
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Gets the row processor and the context parameter currently set on
+       the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+    <variablelist>
+     <varlistentry>
+       <term><parameter>conn</parameter></term>
+       <listitem>
+         <para>
+           The connection object.
+         </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><parameter>param</parameter></term>
+       <listitem>
+         <para>
+              If not NULL, the connection's current row processor
+              parameter is stored here.
+         </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+ <sect1 id="libpq-build">  <title>Building <application>libpq</application> Programs</title>
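
The documentation above says that a NULL column still carries a pointer to where its value would have been, which "allows estimating row size by pointer arithmetic". One way to read that, as a rough standalone sketch: DemoRowValue and demo_row_span are hypothetical names, and the layout assumed is the one getAnotherTuple() walks, where each column is a 4-byte length word followed by its data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the patch's PGrowValue. */
typedef struct DemoRowValue
{
    int         len;        /* length in bytes, negative for SQL NULL */
    const char *value;      /* points into the receive buffer */
} DemoRowValue;

/*
 * Bytes spanned in the receive buffer from the start of the first
 * column's data to the end of the last column's data.  The 4-byte length
 * words between columns are included, so this is an upper bound on the
 * payload size rather than an exact figure.
 */
size_t
demo_row_span(const DemoRowValue *columns, int ncolumns)
{
    const DemoRowValue *last;
    const char *end;

    if (ncolumns <= 0)
        return 0;
    assert(columns != NULL);

    last = &columns[ncolumns - 1];
    /* A NULL column contributes no data bytes but still has a valid pointer. */
    end = last->value + (last->len > 0 ? last->len : 0);
    return (size_t) (end - columns[0].value);
}
```

For a row holding "ab", NULL and "xyz" the span is 2 + 4 + 0 + 4 + 3 = 13 bytes. This only works because the value pointer is set even for NULL columns.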
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..09d6de8 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char** valbuf;
+    int *valbuflen;
+    char **cstrs;
+    bool error_occurred;
+    bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);static void validate_pkattnums(Relation rel,
          int2vector *pkattnums_arg, int32 pknumatts_arg,                   int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+/* Global */static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)    char       *curname = NULL;    int            howmany = 0;
bool       fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;    DBLINK_INIT;
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
     appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
     /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * the PGresult returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*
      * Try to execute the query.  Note that since libpq uses malloc, the
      * PGresult will be long-lived even though we are still in a short-lived
      * memory context.
      */
 
-    res = PQexec(conn, buf.data);
+    PG_TRY();
+    {
+        res = PQexec(conn, buf.data);
+    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
+
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
+
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
     if (!res ||
         (PQresultStatus(res) != PGRES_COMMAND_OK &&
          PQresultStatus(res) != PGRES_TUPLES_OK))
     {
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        dblink_res_error(conname, res, "could not fetch from cursor", fail);        return (Datum) 0;    }
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)                (errcode(ERRCODE_INVALID_CURSOR_NAME),
errmsg("cursor \"%s\" does not exist", curname)));    }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    remoteConn *rconn = NULL;    bool
      fail = true;    /* default to backward compatible */    bool        freeconn = false;
 
+    storeInfo   storeinfo;    /* check to see if caller supports us returning a tuplestore */    if (rsinfo == NULL ||
!IsA(rsinfo,ReturnSetInfo))
 
@@ -660,6 +714,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)        {            /*
text,text,bool*/            DBLINK_GET_CONN;
 
+            conname = text_to_cstring(PG_GETARG_TEXT_PP(0));            sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
        fail = PG_GETARG_BOOL(2);        }
 
@@ -675,6 +730,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)            else            {
      DBLINK_GET_CONN;
 
+                conname = text_to_cstring(PG_GETARG_TEXT_PP(0));                sql =
text_to_cstring(PG_GETARG_TEXT_PP(1));           }        }
 
@@ -705,6 +761,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)        else            /* shouldn't
happen*/            elog(ERROR, "wrong number of arguments");
 
+
+        conname = text_to_cstring(PG_GETARG_TEXT_PP(0));    }    if (!conn)
@@ -715,164 +773,251 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    rsinfo->setResult = NULL;
rsinfo->setDesc= NULL;
 
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * the PGresult returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+    /* synchronous query, or async result retrieval */
-    if (!is_async)
-        res = PQexec(conn, sql);
-    else
+    PG_TRY();    {
-        res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
+        if (!is_async)
+            res = PQexec(conn, sql);
+        else
+            res = PQgetResult(conn);    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, FALSE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)    {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK))
+        {
+            /* finishStoreInfo saves the fields referred to below. */
+            if (storeinfo.nummismatch)
+            {
+                /* This is only for backward compatibility */
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATATYPE_MISMATCH),
+                         errmsg("remote query result rowtype does not match "
+                                "the specified FROM clause rowtype")));
+            }
+
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }    }
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+    TupleDesc    tupdesc;
+    int i;
-    Assert(rsinfo->returnMode == SFRM_Materialize);
-
-    PG_TRY();
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))    {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
+        case TYPEFUNC_COMPOSITE:
+            /* success */
+            break;
+        case TYPEFUNC_RECORD:
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
-            tupdesc = CreateTemplateTupleDesc(1, false);
-            TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-                               TEXTOID, -1, 0);
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+    /* make sure we have a persistent copy of the tupdesc */
+    tupdesc = CreateTupleDescCopy(tupdesc);
-            is_sql_cmd = false;
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->nattrs = tupdesc->natts;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuf = NULL;
+    sinfo->valbuflen = NULL;
+    sinfo->cstrs = NULL;    /* tested below even if it was never allocated */
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+    /* Preallocate per-attribute value buffers and the C string array. */
+    sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+    if (sinfo->valbuf)
+        sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+    if (sinfo->valbuflen)
+        sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
-        }
+    if (sinfo->cstrs == NULL)
+    {
+        if (sinfo->valbuf)
+            free(sinfo->valbuf);
+        if (sinfo->valbuflen)
+            free(sinfo->valbuflen);
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));
+    }
+
+    for (i = 0 ; i < sinfo->nattrs ; i++)
+    {
+        sinfo->valbuf[i] = NULL;
+        sinfo->valbuflen[i] = -1;
+    }
-        if (ntuples > 0)
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
+
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    int i;
+
+    if (sinfo->valbuf)
+    {
+        for (i = 0 ; i < sinfo->nattrs ; i++)        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
+            if (sinfo->valbuf[i])
+                free(sinfo->valbuf[i]);
+        }
+        free(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-            values = (char **) palloc(nfields * sizeof(char *));
+    if (sinfo->valbuflen)
+    {
+        free(sinfo->valbuflen);
+        sinfo->valbuflen = NULL;
+    }
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+    if (sinfo->cstrs)
+    {
+        free(sinfo->cstrs);
+        sinfo->cstrs = NULL;
+    }
-                if (!is_sql_cmd)
-                {
-                    int            i;
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+/* Prototype of this function is PQrowProcessor */
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        fields = PQnfields(res);
+    int        i;
+    char      **cstrs = sinfo->cstrs;
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
-            }
+    if (sinfo->error_occurred)
+        return -1;
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
-        }
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
-        PQclear(res);
+        /* This error will be processed in dblink_record_internal() */
+        return -1;
     }
-    PG_CATCH();
+
+    /*
+     * value input functions assumes that the input string is
+     * terminated by zero. We should make the values to be so.
+     */
+    for(i = 0 ; i < fields ; i++)
     {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            char *tmp = sinfo->valbuf[i];
+            int tmplen = sinfo->valbuflen[i];
+
+            /*
+             * Divide calls to malloc and realloc so that things will
+             * go fine even on the systems of which realloc() does not
+             * accept NULL as old memory block.
+             *
+             * Also try to (re)allocate in bigger steps to
+             * avoid flood of allocations on weird data.
+             */
+            if (tmp == NULL)
+            {
+                tmplen = len + 1;
+                if (tmplen < 64)
+                    tmplen = 64;
+                tmp = (char *)malloc(tmplen);
+            }
+            else if (tmplen < len + 1)
+            {
+                if (len + 1 > tmplen * 2)
+                    tmplen = len + 1;
+                else
+                    tmplen = tmplen * 2;
+                tmp = (char *)realloc(tmp, tmplen);
+            }
+
+            /*
+             * sinfo->valbuf[n] will be freed in finishStoreInfo()
+             * when realloc returns NULL.
+             */
+            if (tmp == NULL)
+                return -1;  /* Inform out of memory to the caller */
+
+            sinfo->valbuf[i] = tmp;
+            sinfo->valbuflen[i] = tmplen;
+
+            cstrs[i] = sinfo->valbuf[i];
+            memcpy(cstrs[i], columns[i].value, len);
+            cstrs[i][len] = '\0';
+        }
     }
-    PG_END_TRY();
+
+    /*
+     * These functions may throw exception. It will be caught in
+     * dblink_record_internal()
+     */
+    tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+    tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+    return 1;
 }

 /*

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

07 March 2012, 16:22:45

On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote:
> > * remove pqIsnonblocking(conn) check when row processor returned 2.
> >   I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
> >   on sync connection.
>
> done. This affects PQisBusy, but PQgetResult won't be affected as
> far as I see. I have no idea for PQconsumeInput()..

PQconsumeInput does not parse the row and there is no need to
do anything with PQgetResult().

> > * It seems the return codes from callback should be remapped,
> >   (0, 1, 2) is unusual pattern.  Better would be:
> >
> >    -1 - error
> >     0 - stop parsing / early exit ("I'm not done yet")
> >     1 - OK ("I'm done with the row")
>
> done.
>
> This might be described more precisely as follows,
>
> |    -1 - error - erase result and change result status to
> |               - FATAL_ERROR All the rest rows in current result
> |               - will skipped(consumed).
> |     0 - stop parsing / early exit ("I'm not done yet")
> |                - getAnotherTuple returns EOF without dropping PGresult.
> #                - We expect PQisBusy(), PQconsumeInput()(?) and
> #                - PQgetResult() to exit immediately and we can
> #                - call PQgetResult(), PQskipResult() or
> #                - PQisBusy() after.
> |     1 - OK ("I'm done with the row")
> |                - save result and getAnotherTuple returns 0.
>
> The lines prefixed with '#' is the desirable behavior I have
> understood from the discussion so far. And I doubt that it works
> as we expected for PQgetResult().

No, the desirable behavior is already implemented and documented -
the "stop parsing" return code affects only PQisBusy().  As that
is the function that does the actual parsing.
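
A minimal sketch of that return-code convention, written against a mock
rather than libpq itself. Everything here (mock_conn, mock_is_busy,
count_rows, the ROW_* names) is hypothetical and only stands in for the
parsing that happens inside PQisBusy(). It assumes the row is consumed
whatever the callback returns, and that an early exit reports "busy" so the
caller comes back later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the -1 / 0 / 1 convention discussed above. */
enum { ROW_ERROR = -1, ROW_STOP = 0, ROW_OK = 1 };

typedef int (*row_cb)(const char *row, void *param);

typedef struct
{
    const char **rows;   /* pending rows, a stand-in for the input buffer */
    size_t       nrows;
    size_t       next;   /* index of the next unparsed row */
    int          failed; /* set once the callback reports an error */
} mock_conn;

/*
 * Stand-in for the parsing loop behind PQisBusy(): feed rows to the
 * callback until input runs out, the callback asks for an early exit,
 * or the callback fails.  Returns 1 ("busy, call again") after an early
 * exit and 0 otherwise.
 */
static int
mock_is_busy(mock_conn *conn, row_cb cb, void *param)
{
    assert(conn != NULL && cb != NULL);

    while (conn->next < conn->nrows)
    {
        int rc = cb(conn->rows[conn->next], param);

        conn->next++;           /* the row is consumed in every case */

        if (rc == ROW_ERROR)
        {
            conn->failed = 1;
            conn->next = conn->nrows;   /* skip the remaining rows */
            return 0;
        }
        if (rc == ROW_STOP)
            return 1;           /* early exit, cursor stays consistent */
    }
    return 0;
}

typedef struct
{
    int seen;   /* rows handled so far */
    int batch;  /* ask for an early exit every 'batch' rows */
} counter;

/* Example callback: count rows and request an early exit per batch. */
static int
count_rows(const char *row, void *param)
{
    counter *c = param;

    if (row == NULL)
        return ROW_ERROR;
    c->seen++;
    return (c->seen % c->batch == 0) ? ROW_STOP : ROW_OK;
}
```

With three pending rows and a batch of 2, the first call hands over two
rows and reports busy; the second call finishes the remaining row.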

The main plus of such a scheme is that we do not change the behaviour
of any existing APIs.

> > * Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().
>
> done.

You optimized libpq_gettext() calls, but it's wrong - they must
wrap the actual strings so that the strings can be extracted
for translating.

Fixed in attached patch.

> > My suggestion - check in getAnotherTuple whether resultStatus is
> > already error and do nothing then.  This allows internal pqAddRow
> > to set regular "out of memory" error.  Otherwise give generic
> > "row processor error".
>
> Current implement seems already doing this in
> parseInput3(). Could you give me further explanation?

The suggestion was about getAnotherTuple() - currently it sets
always "error in row processor".  With such check, the callback
can set the error result itself.  Currently only callbacks that
live inside libpq can set errors, but if we happen to expose
error-setting function in outside API, then the getAnotherTuple()
would already be ready for it.


See attached patch, I did some generic comment/docs cleanups
but also minor code cleanups:

- PQsetRowProcessor(NULL,NULL) sets Param to PGconn instead of NULL;
  this makes PGconn available to pqAddRow.

- pqAddRow sets "out of memory" error itself on PGconn.

- getAnotherTuple(): when callback returns -1, it checks if error
  message is set, does nothing then.

- put libpq_gettext() back around strings.

- dropped the error_saveresult label, it was unnecessary branch.

- made functions survive conn==NULL.

- dblink: changed skipAll parameter for PQskipResult() to TRUE,
  as dblink uses PQexec which can send several queries.

- dblink: refreshed regtest result, as now we get actual
  connection name in error message.

- Synced PQgetRow patch with return value changes.

- Synced demos at https://github.com/markokr/libpq-rowproc-demos
  with return value changes.


I'm pretty happy with current state.  So tagging it
ReadyForCommitter.

--
marko

Attachment

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

08 March 2012, 01:35:47

Hello,

> > #                - We expect PQisBusy(), PQconsumeInput()(?) and 
> > #                - PQgetResult() to exit immediately and we can
> > #                - call PQgetResult(), PQskipResult() or
> > #                - PQisBusy() after.
> > |     1 - OK ("I'm done with the row")
> > |                - save result and getAnotherTuple returns 0.
> > 
> > The lines prefixed with '#' is the desirable behavior I have
> > understood from the discussion so far. And I doubt that it works
> > as we expected for PQgetResult().
> 
> No, the desirable behavior is already implemented and documented -
> the "stop parsing" return code affects only PQisBusy().  As that
> is the function that does the actual parsing.

I am satisfied with your answer. Thank you.

> The main plus of such a scheme is that we do not change the behaviour
> of any existing APIs.

I agree with the policy.

> You optimized libpq_gettext() calls, but it's wrong - they must
> wrap the actual strings so that the strings can be extracted
> for translating.

Ouch. I remember having had a build error about that before...

> Fixed in attached patch.

I'm sorry to annoy you.

> > > My suggestion - check in getAnotherTuple whether resultStatus is
> > > already error and do nothing then.  This allows internal pqAddRow
> > > to set regular "out of memory" error.  Otherwise give generic
> > > "row processor error".
..
> The suggestion was about getAnotherTuple() - currently it sets
> always "error in row processor".  With such check, the callback
> can set the error result itself.  Currently only callbacks that
> live inside libpq can set errors, but if we happen to expose
> error-setting function in outside API, then the getAnotherTuple()
> would already be ready for it.

I see. And I found it implemented in your patch.

> See attached patch, I did some generic comment/docs cleanups
> but also minor code cleanups:
> 
> - PQsetRowProcessor(NULL,NULL) sets Param to PGconn, instead NULL,
> - pqAddRow sets "out of memory" error itself on PGconn.
> - getAnotherTuple(): when callback returns -1, it checks if error
> - dropped the error_saveresult label, it was unnecessary branch.

Ok, I've confirmed them.

> - put libpq_gettext() back around strings.
> - made functions survive conn==NULL.
> - dblink: refreshed regtest result, as now we get actual

Thank you for fixing my bugs.

> - dblink: changed skipAll parameter for PQskipResult() to TRUE,
>   as dblink uses PQexec which can send several queries.

I agree with the change. I intended to allow receiving the remaining
results after skipping the current one on failure, but that seems not
only unlikely but also dangerous.

> - Synced PQgetRow patch with return value changes.
> 
> - Synced demos at https://github.com/markokr/libpq-rowproc-demos
>   with return value changes.


> I'm pretty happy with current state.  So tagging it
> ReadyForCommitter.

Thank you very much for all your help.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

22 March 2012, 19:07:35

Marko Kreen <markokr@gmail.com> writes:
> [ patches against libpq and dblink ]

I think this patch needs a lot of work.

AFAICT it breaks async processing entirely, because many changes have been
made that fail to distinguish "insufficient data available as yet" from
"hard error".  As an example, this code at the beginning of
getAnotherTuple:

      /* Get the field count and make sure it's what we expect */
      if (pqGetInt(&tupnfields, 2, conn))
!         return EOF;

is considering three cases: it got a 2-byte integer (and can continue on),
or there aren't yet 2 more bytes available in the buffer, in which case it
should return EOF without doing anything, or pqGetInt detected a hard
error and updated the connection error state accordingly, in which case
again there is nothing to do except return EOF.  In the patched code we
have:

      /* Get the field count and make sure it's what we expect */
      if (pqGetInt(&tupnfields, 2, conn))
!     {
!         /* Whole the message must be loaded on the buffer here */
!         errmsg = libpq_gettext("protocol error\n");
!         goto error_and_forward;
!     }

which handles neither the second nor third case correctly: it thinks that
"data not here yet" is a hard error, and then makes sure it is an error by
destroying the parsing state :-(.  And if in fact pqGetInt did log an
error, that possibly-useful error message is overwritten with an entirely
useless "protocol error" text.
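
The unpatched behaviour described above can be sketched outside libpq as
follows. This is only an illustration under assumptions: mock_input,
get_int16 and parse_field_count are hypothetical stand-ins, not libpq
functions, and the sketch covers just the "not enough bytes yet" case, where
the parser returns EOF and leaves the cursor alone so the same message can be
parsed again once more data arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>      /* EOF */

typedef struct
{
    const unsigned char *buf;
    size_t cursor;      /* next byte to read */
    size_t end;         /* number of valid bytes in buf */
} mock_input;

/*
 * Read a big-endian 2-byte integer.  Returns EOF without moving the
 * cursor when fewer than 2 bytes are buffered, 0 on success.
 */
static int
get_int16(mock_input *in, int *out)
{
    assert(in != NULL && out != NULL && in->cursor <= in->end);

    if (in->end - in->cursor < 2)
        return EOF;
    *out = (in->buf[in->cursor] << 8) | in->buf[in->cursor + 1];
    in->cursor += 2;
    return 0;
}

/*
 * Parse the field count at the start of a row message.  On short input
 * nothing is treated as an error: the cursor is restored and EOF is
 * returned, so the caller can simply retry later.
 */
static int
parse_field_count(mock_input *in, int *nfields)
{
    size_t start = in->cursor;

    if (get_int16(in, nfields))
    {
        in->cursor = start;     /* leave the parsing state untouched */
        return EOF;
    }
    return 0;
}
```

Calling parse_field_count() with one byte buffered returns EOF and keeps the
cursor at 0; calling it again after the second byte has arrived succeeds.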

I don't think the error return cases for the row processor have been
thought out too well either.  The row processor is not in charge of what
happens to the PGresult, and it certainly has no business telling libpq to
just "exit immediately from the topmost libpq function".  If we do that
we'll probably lose sync with the data stream and be unable to recover use
of the connection at all.  Also, do we need to consider any error cases
for the row processor other than out-of-memory?  If so it might be a good
idea for it to have some ability to store a custom error message into the
PGconn, which it cannot do given the current function API.

In the same vein, I am fairly uncomfortable with the blithe assertion that
a row processor can safely longjmp out of libpq.  This was never foreseen
in the original library coding and there are any number of places that
that might break, now or in the future.  Do we really need to allow it?
If we do, it would be a good idea to decorate the libpq functions that are
now expected to possibly longjmp with comments saying so.  Tracing all the
potential call paths that might be aborted by a longjmp is an essential
activity anyway.

Another design deficiency is PQskipResult().  This is badly designed for
async operation because once you call it, it will absolutely not give back
control until it's read the whole query result.  (It'd be better to have
it set some kind of flag that makes future library calls discard data
until the query result end is reached.)  Something else that's not very
well-designed about it is that it might throw away more than just incoming
tuples.  As an example, suppose that the incoming data at the time you
call it consists of half a dozen rows followed by an ErrorResponse.  The
ErrorResponse will be eaten and discarded, leaving the application no clue
why its transaction has been aborted, or perhaps even the entire session
cancelled.  What we probably want here is just to transiently install a
row processor that discards all incoming data, but the application is
still expected to eventually fetch a PGresult that will tell it whether
the server side of the query completed or not.
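
A rough sketch of that idea, using a mock message stream instead of libpq.
All names (mock_msg, mock_conn, discard_row, skip_rows) are hypothetical. The
loop below still runs to the end of the result in one call, so it does not
address the async concern; it only shows a discarding row processor being
installed transiently while the terminating message, here an error, is still
handed back to the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*row_cb)(const char *row, void *param);

typedef enum { MSG_ROW, MSG_ERROR, MSG_COMPLETE } msg_type;

typedef struct
{
    msg_type    type;
    const char *text;
} mock_msg;

typedef struct
{
    const mock_msg *msgs;   /* stand-in for the incoming data stream */
    size_t          nmsgs;
    size_t          next;
    row_cb          cb;     /* currently installed row processor */
    void           *param;
} mock_conn;

/* Row processor that throws every row away and reports success. */
static int
discard_row(const char *row, void *param)
{
    (void) row;
    (void) param;
    return 1;
}

/*
 * Discard the remaining rows of the current result, then restore the
 * caller's row processor.  The message that ends the result (error or
 * completion) is returned instead of being eaten; NULL means the stream
 * ended before such a message arrived.
 */
static const mock_msg *
skip_rows(mock_conn *conn)
{
    row_cb          saved_cb = conn->cb;
    void           *saved_param = conn->param;
    const mock_msg *final = NULL;

    assert(conn != NULL);
    conn->cb = discard_row;
    conn->param = NULL;

    while (conn->next < conn->nmsgs && final == NULL)
    {
        const mock_msg *m = &conn->msgs[conn->next++];

        if (m->type == MSG_ROW)
            conn->cb(m->text, conn->param);
        else
            final = m;
    }

    conn->cb = saved_cb;
    conn->param = saved_param;
    return final;
}
```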

In the dblink patch, given that you have to worry about elogs coming out
of BuildTupleFromCStrings and tuplestore_puttuple anyway, it's not clear
what is the benefit of using malloc rather than palloc to manage the
intermediate buffers in storeHandler --- what that seems to mostly
accomplish is increase the risk of permanent memory leaks.  I don't see
much value in per-column buffers either; it'd likely be cheaper to just
palloc one workspace that's big enough for all the column strings
together.  And speaking of leaks, doesn't storeHandler leak the
constructed tuple on each call, not to mention whatever might be leaked by
the called datatype input functions?
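
The single-workspace idea can be sketched like this. It is a standalone
illustration, not the dblink code: col_value is a hypothetical type modelled
on the len/value fields that storeHandler reads from PGrowValue,
pack_columns is a made-up helper, and malloc is used only so the sketch
compiles outside the backend (inside dblink the suggestion above is palloc).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical column descriptor: len < 0 marks a NULL value. */
typedef struct
{
    int         len;
    const char *value;  /* not NUL-terminated */
} col_value;

/*
 * Copy all non-NULL columns into one workspace, adding a terminating
 * zero after each, and point cstrs[i] at the copies.  Returns the
 * workspace (released with a single free) or NULL on allocation failure.
 */
static char *
pack_columns(const col_value *cols, int ncols, char **cstrs)
{
    size_t  total = 1;  /* never request zero bytes */
    char   *buf;
    char   *p;
    int     i;

    assert(cols != NULL && cstrs != NULL && ncols >= 0);

    /* First pass: total size, one extra byte per value for the '\0'. */
    for (i = 0; i < ncols; i++)
        if (cols[i].len >= 0)
            total += (size_t) cols[i].len + 1;

    buf = malloc(total);
    if (buf == NULL)
        return NULL;

    /* Second pass: copy and terminate each value. */
    p = buf;
    for (i = 0; i < ncols; i++)
    {
        if (cols[i].len < 0)
        {
            cstrs[i] = NULL;
            continue;
        }
        memcpy(p, cols[i].value, (size_t) cols[i].len);
        p[cols[i].len] = '\0';
        cstrs[i] = p;
        p += cols[i].len + 1;
    }
    return buf;
}
```

One allocation and one free per row replace the per-column buffers; the
cstrs array is what would then be passed to BuildTupleFromCStrings.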

It also seems to me that the dblink patch breaks the case formerly handled
in materializeResult() of a PGRES_COMMAND_OK rather than PGRES_TUPLES_OK
result.  The COMMAND case is supposed to be converted to a single-column
text result, and I sure don't see where that's happening now.

BTW, am I right in thinking that some of the hunks in this patch are meant
to fix a pre-existing bug of failing to report the correct connection
name?  If so, it'd likely be better to split those out and submit as a
separate patch, instead of conflating them with a feature addition.

Lastly, as to the pqgetrow patch, the lack of any demonstrated test case
for these functions makes me uncomfortable as to whether they are well
designed.  Again, I'm unconvinced that the error handling is good or that
they work sanely in async mode.  I'm inclined to just drop these for this
go-round, and to stick to providing the features that we can test via the
dblink patch.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

23 March 2012, 06:08:26

Thank you for picking this up.

> is considering three cases: it got a 2-byte integer (and can continue on),
> or there aren't yet 2 more bytes available in the buffer, in which case it
> should return EOF without doing anything, or pqGetInt detected a hard
> error and updated the connection error state accordingly, in which case
> again there is nothing to do except return EOF.  In the patched code we
> have:
...
> which handles neither the second nor third case correctly: it thinks that
> "data not here yet" is a hard error, and then makes sure it is an error by
> destroying the parsing state :-(.

Marko and I think that, in protocol 3, all bytes of the incoming
message must already have been loaded when entering
getAnotherTuple(). The following part of pqParseInput3() ensures
this.

| if (avail < msgLength)
| {
|     /*
|      * Before returning, enlarge the input buffer if needed to hold
|      * the whole message.
|      (snipped)..
|      */
|     if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
|                              conn))
|     {
|         /*
|          * XXX add some better recovery code...
|         (snipped)..
|          */
|         handleSyncLoss(conn, id, msgLength);
|     }
|     return;


So, if the cursor state is broken just after exiting
getAnotherTuple(), it had already been broken BEFORE entering
getAnotherTuple() according to the current design. That is what the
'protocol error' means. pqGetInt there should not detect any
errors except for a broken message.


> error, that possibly-useful error message is overwritten with an entirely
> useless "protocol error" text.

Plus, the current pqGetInt seems to set its own error message only
for a wrong 'bytes' parameter.

On the other hand, in protocol 2 (to be removed?) the error
handling mechanism is affected, because full loading of the message
is not guaranteed.

> I don't think the error return cases for the row processor have been
> thought out too well either.  The row processor is not in charge of what
> happens to the PGresult,

The default row processor fills the PGresult with tuples; another
(say, that of dblink) leaves it empty. Each row processor manages the
PGresult to the extent it needs.

>  and it certainly has no business telling libpq to just "exit
> immediately from the topmost libpq function". If we do that
> we'll probably lose sync with the data stream and be unable to
> recover use of the connection at all.

I don't think the PGresult plays any part in error handling in the
current implementation. The phrase 'exit immediately from the topmost
libpq function' should not appear anywhere in the patch.

The exit routes from the row processor are as follows:
- longjmp (or the PG_TRY-CATCH mechanism) out of the row processor.
- The row processor returns 0 when entered from PQisBusy(), which
  then exits immediately.

Cursor consistency is kept in both cases: the cursor is already
positioned just past the last byte of the current message.

> Also, do we need to consider any error cases for the row
> processor other than out-of-memory?  If so it might be a good
> idea for it to have some ability to store a custom error
> message into the PGconn, which it cannot do given the current
> function API.

Neither dblink nor PQgetRow seems to need it strongly enough to
justify the expected additional complexity, so this patch does not
include it.

> In the same vein, I am fairly uncomfortable with the blithe assertion that
> a row processor can safely longjmp out of libpq.  This was never foreseen
> in the original library coding and there are any number of places that
> that might break, now or in the future.  Do we really need to allow it?

To keep the row processor from longjmp'ing out, I enclosed the
functions that can throw exceptions in a PG_TRY-CATCH clause in the
early version. This was totally safe, but the penalty was not
negligible because the TRY-CATCH was entered for every row.


> If we do, it would be a good idea to decorate the libpq
> functions that are now expected to possibly longjmp with
> comments saying so.  Tracing all the potential call paths that
> might be aborted by a longjmp is an essential activity anyway.

As for the present (leaving the future aside), I can show you how I
confirmed this.

- There is no difference between the patched and unpatched code at
  the level of getAnotherTuple() from the viewpoint of consistency.

- Suppose pqParseInput3 detects that the next message has not arrived
  after getAnotherTuple returned. It exits immediately on reading the
  length of the next message. This is the same condition as after a
  longjmp.

  >   if (pqGetInt(&msgLength, 4, conn))
  >       return;

- parseInput passes it through and immediately exits in a consistent
  state.

- The caller of PQgetResult, PQisBusy, PQskipResult, PQnotifies,
  PQputCopyData or pqHandleSendFailure finally regains control. I am
  convinced that the async status at that time must be PGASYNC_BUSY
  and the conn cursor is in a consistent state.

So the documentation encourages the ancestor of the row processor to
call PQfinish, PQgetResult or PQskipResult after a longjmp. These
functions should resolve the intermediate state described above by
restarting the parsing of the stream afterwards.

As for the future: although it goes without saying that any change to
the code can break something, a longjmp stepping over libpq internal
code seems complicated enough to cause trouble. I will mark such
functions with 'This function is skipped over by the longjmp invoked
in the descendents.' (Better expressions are welcome.)

> Another design deficiency is PQskipResult().  This is badly
> designed for async operation because once you call it, it will
> absolutely not give back control until it's read the whole
> query result.  (It'd be better to have it set some kind of flag
> that makes future library calls discard data until the query
> result end is reached.)

If this function is called just after a longjmp out of the row
processor, the async status of the connection at that time must be
PGASYNC_BUSY. So I think this function will always return, even
if the longjmp takes place at the last row of a result: a 'C'
message must follow unless the stream is broken.

> Something else that's not very well-designed about it is that
> it might throw away more than just incoming tuples.  As an
> example, suppose that the incoming data at the time you call it
> consists of half a dozen rows followed by an ErrorResponse.
> The ErrorResponse will be eaten and discarded, leaving the
> application no clue why its transaction has been aborted, or
> perhaps even the entire session cancelled.

If the caller needs to see the ErrorResponses that follow, I think
calling PQgetResult is enough.

> What we probably want here is just to transiently install a
> row processor that
> discards all incoming data, but the application is still
> expected to eventually fetch a PGresult that will tell it
> whether the server side of the query completed or not.

The all-discarding row processor should not be visible to the
user. Users could design their own row processor to behave that way
if they want. This is merely a shortcut, but it seems difficult to
do without it.

> In the dblink patch, ... it's not clear what is the benefit of
> using malloc rather than palloc to manage the intermediate
> buffers in storeHandler --- what that seems to mostly
> accomplish is increase the risk of permanent memory leaks.

Hmm. I thought that palloc was heavier than malloc, and the
allocated blocks looked well enough controlled that they were
unlikely to leak. I will switch to palloc, taking into account the
discussion about block granularity below.

> I don't see much value in per-column buffers either; it'd
> likely be cheaper to just palloc one workspace that's big
> enough for all the column strings together.

In the early version I disliked counting the total length before
copying the contents. In the latest version the total memory
requirement is easily obtained, so allocating everything together
seems good. I'll do so.

> And speaking of leaks, doesn't storeHandler leak the
> constructed tuple on each call, not to mention whatever might
> be leaked by the called datatype input functions?

I copied the processing of the original materializeResult into
storeHandler. tuplestore_donestoring was removed because it is
obsolete. So I think there is no problem there. Am I wrong?

> It also seems to me that the dblink patch breaks the case
> formerly handled in materializeResult() of a PGRES_COMMAND_OK
> rather than PGRES_TUPLES_OK result.  The COMMAND case is
> supposed to be converted to a single-column text result, and I
> sure don't see where that's happening now.

I'm sorry. I will restore that route.

> BTW, am I right in thinking that some of the hunks in this
> patch are meant to fix a pre-existing bug of failing to report
> the correct connection name?  If so, it'd likely be better to
> split those out and submit as a separate patch, instead of
> conflating them with a feature addition.

Ok. I will split it.

> Lastly, as to the pqgetrow patch, the lack of any demonstrated
> test case for these functions makes me uncomfortable as to
> whether they are well designed.  Again, I'm unconvinced that
> the error handling is good or that they work sanely in async
> mode.  I'm inclined to just drop these for this go-round, and
> to stick to providing the features that we can test via the
> dblink patch.

Testing pqGetRow via dblink?

By 'drop these', do you mean pqGetRow? If so, that part can be
dropped independently of the rest.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

23 March 2012, 21:22:48

I saw Kyotaro already answered, but I'll give my view as well.

On Thu, Mar 22, 2012 at 06:07:16PM -0400, Tom Lane wrote:
> AFAICT it breaks async processing entirely, because many changes have been
> made that fail to distinguish "insufficient data available as yet" from
> "hard error".  As an example, this code at the beginning of
> getAnotherTuple:
>
>       /* Get the field count and make sure it's what we expect */
>       if (pqGetInt(&tupnfields, 2, conn))
> !         return EOF;
>
> is considering three cases: it got a 2-byte integer (and can continue on),
> or there aren't yet 2 more bytes available in the buffer, in which case it
> should return EOF without doing anything, or pqGetInt detected a hard
> error and updated the connection error state accordingly, in which case
> again there is nothing to do except return EOF.  In the patched code we
> have:
>
>       /* Get the field count and make sure it's what we expect */
>       if (pqGetInt(&tupnfields, 2, conn))
> !     {
> !         /* Whole the message must be loaded on the buffer here */
> !         errmsg = libpq_gettext("protocol error\n");
> !         goto error_and_forward;
> !     }
>
> which handles neither the second nor third case correctly: it thinks that
> "data not here yet" is a hard error, and then makes sure it is an error by
> destroying the parsing state :-(.  And if in fact pqGetInt did log an
> error, that possibly-useful error message is overwritten with an entirely
> useless "protocol error" text.

No, "protocol error" really is the only error case here.

- pqGetInt() does not set errors.

- V3 getAnotherTuple() is called only if packet is fully in buffer.

> I don't think the error return cases for the row processor have been
> thought out too well either.  The row processor is not in charge of what
> happens to the PGresult, and it certainly has no business telling libpq to
> just "exit immediately from the topmost libpq function".  If we do that
> we'll probably lose sync with the data stream and be unable to recover use
> of the connection at all.  Also, do we need to consider any error cases
> for the row processor other than out-of-memory?

No, the rule is *not* "exit to topmost", but "exit PQisBusy()".

This is exactly so that any code that does not expect row-processor
behaviour continues to work.

Also, from the programmer's POV, this means the row-processor callback
causes minimal changes to existing APIs.

> If so it might be a good
> idea for it to have some ability to store a custom error message into the
> PGconn, which it cannot do given the current function API.

There already was such a function, but it was a row-processor-specific
hack that could leak out and create confusion.  I rejected it.  Instead
there should be a generic error-setting function, equivalent to the
current libpq internal error setting.

But such a generic error-setting function would need a review of all
libpq error states, as it allows an error state to appear in new
situations.  Also we need to have well-defined behaviour of client-side
errors vs. incoming server errors.

Considering that even current cut-down patch is troubling committers,
I would definitely suggest postponing such generic error setter to 9.3.

Especially as it does not change anything coding-style-wise.

> In the same vein, I am fairly uncomfortable with the blithe assertion that
> a row processor can safely longjmp out of libpq.  This was never foreseen
> in the original library coding and there are any number of places that
> that might break, now or in the future.  Do we really need to allow it?
> If we do, it would be a good idea to decorate the libpq functions that are
> now expected to possibly longjmp with comments saying so.  Tracing all the
> potential call paths that might be aborted by a longjmp is an essential
> activity anyway.

I think we *should* allow exceptions, but in limited number of APIs.

Basically, the usefulness for users vs. effort from our side
is clearly on the side of providing it.

But it's up to us to define what *limited* means (what needs the
least effort from us), so that later, when users want to use exceptions
in a callback, they need to pick the right API.

Currently it seems only PQexec() + multiple SELECTs can give trouble,
as the previous PGresult is kept on the stack.  Should we stop
supporting PQexec or multiple SELECTs?

But such a case is broken even without exceptions - or at least
very confusing.  Not sure what to do with it.

In any case, "decorating" libpq functions is wrong approach.  This gives
suggestion that caller of eg. PQexec() needs to take care of any possible
behaviour of unknown callback.  This will not work.   Instead allowed
functions should be simply listed in row-processor documentation.

Basically custom callback should be always matched by caller that
knows about it and knows how to handle it.  Not sure how to put
such suggestion into documentation tho'.

> Another design deficiency is PQskipResult().  This is badly designed for
> async operation because once you call it, it will absolutely not give back
> control until it's read the whole query result.  (It'd be better to have
> it set some kind of flag that makes future library calls discard data
> until the query result end is reached.)  Something else that's not very
> well-designed about it is that it might throw away more than just incoming
> tuples.  As an example, suppose that the incoming data at the time you
> call it consists of half a dozen rows followed by an ErrorResponse.  The
> ErrorResponse will be eaten and discarded, leaving the application no clue
> why its transaction has been aborted, or perhaps even the entire session
> cancelled.  What we probably want here is just to transiently install a
> row processor that discards all incoming data, but the application is
> still expected to eventually fetch a PGresult that will tell it whether
> the server side of the query completed or not.

I guess it's designed for rolling the connection forward in an
exception handler...  And it's blocking-only indeed.  Considering that
the better approach is to drop the connection instead of trying to
save it, its usefulness is questionable.

I'm OK with dropping it.

> Lastly, as to the pqgetrow patch, the lack of any demonstrated test case
> for these functions makes me uncomfortable as to whether they are well
> designed.  Again, I'm unconvinced that the error handling is good or that
> they work sanely in async mode.  I'm inclined to just drop these for this
> go-round, and to stick to providing the features that we can test via the
> dblink patch.

I simplified the github test cases and attached as patch.

Could you please give more concrete critique of the API?

The main idea behind PQgetRow is that it does not replace any existing
API function; instead it acts as an addition:

* Sync case: PQsendQuery() + PQgetResult().  PQgetRow() should be called
  before PQgetResult() until it returns NULL, then proceed with PQgetResult()
  to get the final state.

* Async case: PQsendQuery() + PQconsumeInput() + PQisBusy() + PQgetResult().
  Here PQgetRow() should be called before PQisBusy() until it returns
  NULL, then proceed with PQisBusy() as usual.

It only returns rows, never any error state PGresults.

The main advantage of including PQgetRow() together with the low-level
rowproc API is that it allows de-emphasizing the more complex parts of the
rowproc API (exceptions, early exit, skipresult, custom error msg).
We could then drop or undocument them, or simply call them postgres-internal-only.

--
marko

Attachment

rowproc-demos.diff

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

26 March 2012, 05:55:13

Hello, this is a new version of the patch for dblink using the row processor.
- Use palloc to allocate temporary memory blocks.
- Memory allocation is now done at once: a block of initial size is preallocated, and using palloc simplified the reallocation code.
- Resurrected the route for command invoking, with a small adjustment of behavior on error.
- The modification to fix the missing connection name bug is moved out to another patch.
- Commenting on the functions skipped over by longjmp is withheld, following the discussion with Marko.
- Rebased to e8476f46fc847060250c92ec9b310559293087fc
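
The single-buffer approach described in the list above can be sketched in
plain C as follows. This is only an illustration, not the patch itself:
`RowValue` is a stand-in for the patch's PGrowValue, `packed_size()` and
`pack_row()` are hypothetical names, and the caller supplies the buffer
(the patch uses palloc/repalloc for that).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the patch's PGrowValue: len < 0 means SQL NULL. */
typedef struct RowValue
{
    int         len;
    const char *value;      /* not zero-terminated */
} RowValue;

/* Bytes needed to hold every non-NULL value plus one terminator each. */
static size_t
packed_size(const RowValue *columns, int nfields)
{
    size_t      need = 0;
    int         i;

    for (i = 0; i < nfields; i++)
        if (columns[i].len >= 0)
            need += (size_t) columns[i].len + 1;
    return need;
}

/*
 * Copy all values into buf (at least packed_size() bytes) and point
 * cstrs[i] at each zero-terminated copy, or set it to NULL for SQL NULL.
 */
static void
pack_row(const RowValue *columns, int nfields, char *buf, char **cstrs)
{
    char       *p = buf;
    int         i;

    assert(buf != NULL && cstrs != NULL);
    for (i = 0; i < nfields; i++)
    {
        int         len = columns[i].len;

        if (len < 0)
        {
            cstrs[i] = NULL;
            continue;
        }
        cstrs[i] = p;
        memcpy(p, columns[i].value, (size_t) len);
        p += len;
        *p++ = '\0';
    }
}
```

The zero termination is needed because, as the comment in storeHandler()
notes, the value input functions expect C strings; one shared buffer
avoids a separate allocation per column.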

dblink_use_rowproc_20120326.patch - dblink row processor patch.
dblink_connname_20120326.patch    - dblink connname fix patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..dd73aa5 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char* valbuf;
+    int valbuflen;
+    char **cstrs;
+    bool error_occurred;
+    bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
                    int2vector *pkattnums_arg, int32 pknumatts_arg,
                    int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -111,6 +127,9 @@ typedef struct remoteConnHashEnt
 /* initial number of connection hashes */
 #define NUMCONN 16
 
+/* Initial block size for value buffer in storeHandler */
+#define INITBUFLEN 64
+
 /* general utility */
 #define xpfree(var_) \
     do { \
@@ -503,6 +522,7 @@ dblink_fetch(PG_FUNCTION_ARGS)    char       *curname = NULL;    int            howmany = 0;
bool       fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;    DBLINK_INIT;
@@ -559,15 +579,51 @@ dblink_fetch(PG_FUNCTION_ARGS)    appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
/*
 
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*     * Try to execute the query.  Note that since libpq uses malloc, the     * PGresult will be long-lived even
thoughwe are still in a short-lived     * memory context.     */
 
-    res = PQexec(conn, buf.data);
+    PG_TRY();
+    {
+        res = PQexec(conn, buf.data);
+    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
+
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
+
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, TRUE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+    if (!res ||        (PQresultStatus(res) != PGRES_COMMAND_OK &&         PQresultStatus(res) != PGRES_TUPLES_OK))
{
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        dblink_res_error(conname, res, "could not fetch from cursor", fail);        return (Datum) 0;    }
@@ -579,8 +635,8 @@ dblink_fetch(PG_FUNCTION_ARGS)                (errcode(ERRCODE_INVALID_CURSOR_NAME),
errmsg("cursor \"%s\" does not exist", curname)));    }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
@@ -640,6 +696,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    remoteConn *rconn = NULL;    bool
      fail = true;    /* default to backward compatible */    bool        freeconn = false;
 
+    storeInfo   storeinfo;    /* check to see if caller supports us returning a tuplestore */    if (rsinfo == NULL ||
!IsA(rsinfo,ReturnSetInfo))
 
@@ -660,6 +717,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)        {            /*
text,text,bool*/            DBLINK_GET_CONN;
 
+            conname = text_to_cstring(PG_GETARG_TEXT_PP(0));            sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
        fail = PG_GETARG_BOOL(2);        }
 
@@ -715,164 +773,229 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    rsinfo->setResult = NULL;
rsinfo->setDesc= NULL;
 
-    /* synchronous query, or async result retrieval */
-    if (!is_async)
-        res = PQexec(conn, sql);
-    else
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    PG_TRY();    {
-        res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
+        /* synchronous query, or async result retrieval */
+        if (!is_async)
+            res = PQexec(conn, sql);
+        else
+            res = PQgetResult(conn);    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, TRUE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)    {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        /*
+         * exclude a mismatch in the number of columns here so as to
+         * behave as before.
+         */
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK &&
+             !storeinfo.nummismatch))
+        {
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }
+
+        /* Set command return status when the query was a command. */
+        if (PQresultStatus(res) == PGRES_COMMAND_OK)
+        {
+            char *values[1];
+            HeapTuple tuple;
+            AttInMetadata *attinmeta;
+            ReturnSetInfo *rcinfo = (ReturnSetInfo*)fcinfo->resultinfo;
+            
+            values[0] = PQcmdStatus(res);
+            attinmeta = TupleDescGetAttInMetadata(rcinfo->setDesc);
+            tuple = BuildTupleFromCStrings(attinmeta, values);
+            tuplestore_puttuple(rcinfo->setResult, tuple);
+        }
+        else if (get_call_result_type(fcinfo, NULL, NULL) == TYPEFUNC_RECORD)
+        {
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+        }
+
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }    }
-    materializeResult(fcinfo, res);
+    if (res)
+        PQclear(res);
+    return (Datum) 0;}
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo){    ReturnSetInfo *rsinfo = (ReturnSetInfo *)
fcinfo->resultinfo;
+    TupleDesc    tupdesc;
-    Assert(rsinfo->returnMode == SFRM_Materialize);
-
-    PG_TRY();
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
+        
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))    {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
-
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
-
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
+        case TYPEFUNC_COMPOSITE:
+            tupdesc = CreateTupleDescCopy(tupdesc);
+            sinfo->nattrs = tupdesc->natts;
+            break;
+        case TYPEFUNC_RECORD:            tupdesc = CreateTemplateTupleDesc(1, false);
TupleDescInitEntry(tupdesc,(AttrNumber) 1, "status",                               TEXTOID, -1, 0);
 
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
-
-            is_sql_cmd = false;
+            sinfo->nattrs = 1;
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+    /* make sure we have a persistent copy of the tupdesc */
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
-        }
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuflen = INITBUFLEN;
+    sinfo->valbuf = (char *)palloc(sinfo->valbuflen);
+    sinfo->cstrs = (char **)palloc(sinfo->nattrs * sizeof(char *));
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-        if (ntuples > 0)
-        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    if (sinfo->valbuf)
+    {
+        pfree(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-            values = (char **) palloc(nfields * sizeof(char *));
+    if (sinfo->cstrs)
+    {
+        pfree(sinfo->cstrs);
+        sinfo->cstrs = NULL;
+    }
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-                if (!is_sql_cmd)
-                {
-                    int            i;
+/* Prototype of this function is PQrowProcessor */
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        newbuflen;
+    int        fields = PQnfields(res);
+    int        i;
+    char       **cstrs = sinfo->cstrs;
+    char       *pbuf;
+
+    if (sinfo->error_occurred)
+        return -1;
+
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+        /* This error will be processed in dblink_record_internal() */
+        return -1;
+    }
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
-            }
+    /*
+     * value input functions assumes that the input string is
+     * terminated by zero. We should make the values to be so.
+     */
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
-        }
+    /*
+     * The length of the buffer for each field is value length + 1 for
+     * zero-termination
+     */
+    newbuflen = fields;
+    for(i = 0 ; i < fields ; i++)
+        newbuflen += columns[i].len;
+
+    if (newbuflen > sinfo->valbuflen)
+    {
+        /*
+         * Try to (re)allocate in bigger steps to avoid flood of allocations
+         * on weird data.
+         */
+        if (newbuflen < sinfo->valbuflen * 2)
+            newbuflen = sinfo->valbuflen * 2;
-        PQclear(res);
+        sinfo->valbuf = (char *)repalloc(sinfo->valbuf, newbuflen);
+        sinfo->valbuflen = newbuflen;    }
-    PG_CATCH();
+
+    pbuf = sinfo->valbuf;
+    for(i = 0 ; i < fields ; i++)    {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            cstrs[i] = pbuf;
+            memcpy(pbuf, columns[i].value, len);
+            pbuf += len;
+            *pbuf++ = '\0';
+        }    }
-    PG_END_TRY();
+
+    /*
+     * These functions may throw exception. It will be caught in
+     * dblink_record_internal()
+     */
+    tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+    tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+    return 1;}/*
diff --git b/contrib/dblink/dblink.c a/contrib/dblink/dblink.c
index dd73aa5..b79f0c0 100644
--- b/contrib/dblink/dblink.c
+++ a/contrib/dblink/dblink.c
@@ -733,6 +733,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)            else            {
      DBLINK_GET_CONN;
 
+                conname = text_to_cstring(PG_GETARG_TEXT_PP(0));                sql =
text_to_cstring(PG_GETARG_TEXT_PP(1));           }        }
 
@@ -763,6 +764,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)        else            /* shouldn't
happen*/            elog(ERROR, "wrong number of arguments");
 
+
+        conname = text_to_cstring(PG_GETARG_TEXT_PP(0));    }    if (!conn)

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

26 March 2012, 23:21:17

I'm sorry to have coded a silly bug.

The previous patch has a bug in the realloc size calculation.
Also, the separation of the 'connname patch' was incomplete in the regression tests.
Both are fixed in this patch.
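
For reference, the size calculation can be sketched as a standalone
function. This is a hedged illustration rather than the code in the patch:
`grow_buflen()` is a hypothetical name, and unlike the loop in
storeHandler(), which detects wraparound after the fact by testing for a
negative value (signed overflow is formally undefined in C), this variant
checks against INT_MAX before each doubling.

```c
#include <assert.h>
#include <limits.h>

/*
 * Starting from cur, double until the size is at least needed.
 * Returns the resulting size, or -1 if another doubling would
 * exceed INT_MAX.
 */
static int
grow_buflen(int cur, int needed)
{
    int         newlen = cur;

    assert(cur > 0);
    while (newlen < needed)
    {
        if (newlen > INT_MAX / 2)
            return -1;          /* doubling would overflow */
        newlen *= 2;
    }
    return newlen;
}
```

A caller would treat -1 the way the patch treats a negative tmplen, by
raising an error instead of reallocating.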

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..4de28ef 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
     bool        newXactForCursor;        /* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+    Tuplestorestate *tuplestore;
+    int nattrs;
+    MemoryContext oldcontext;
+    AttInMetadata *attinmeta;
+    char* valbuf;
+    int valbuflen;
+    char **cstrs;
+    bool error_occurred;
+    bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
                    int2vector *pkattnums_arg, int32 pknumatts_arg,
                    int **pkattnums, int *pknumatts);
 
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 /* Global */
 static remoteConn *pconn = NULL;
@@ -111,6 +127,9 @@ typedef struct remoteConnHashEnt
 /* initial number of connection hashes */
 #define NUMCONN 16
 
+/* Initial block size for value buffer in storeHandler */
+#define INITBUFLEN 64
+
 /* general utility */
 #define xpfree(var_) \
     do { \
@@ -503,6 +522,7 @@ dblink_fetch(PG_FUNCTION_ARGS)    char       *curname = NULL;    int            howmany = 0;
bool       fail = true;    /* default to backward compatible */
 
+    storeInfo   storeinfo;    DBLINK_INIT;
@@ -559,15 +579,51 @@ dblink_fetch(PG_FUNCTION_ARGS)    appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
/*
 
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    /*     * Try to execute the query.  Note that since libpq uses malloc, the     * PGresult will be long-lived even
thoughwe are still in a short-lived     * memory context.     */
 
-    res = PQexec(conn, buf.data);
+    PG_TRY();
+    {
+        res = PQexec(conn, buf.data);
+    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
+
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
+
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, TRUE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+    if (!res ||        (PQresultStatus(res) != PGRES_COMMAND_OK &&         PQresultStatus(res) != PGRES_TUPLES_OK))
{
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }
+        dblink_res_error(conname, res, "could not fetch from cursor", fail);        return (Datum) 0;    }
@@ -579,8 +635,8 @@ dblink_fetch(PG_FUNCTION_ARGS)                (errcode(ERRCODE_INVALID_CURSOR_NAME),
errmsg("cursor \"%s\" does not exist", curname)));    }
 
+    PQclear(res);
-    materializeResult(fcinfo, res);    return (Datum) 0;}
@@ -640,6 +696,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    remoteConn *rconn = NULL;    bool
      fail = true;    /* default to backward compatible */    bool        freeconn = false;
 
+    storeInfo   storeinfo;    /* check to see if caller supports us returning a tuplestore */    if (rsinfo == NULL ||
!IsA(rsinfo,ReturnSetInfo))
 
@@ -660,6 +717,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)        {            /*
text,text,bool*/            DBLINK_GET_CONN;
 
+            conname = text_to_cstring(PG_GETARG_TEXT_PP(0));            sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
        fail = PG_GETARG_BOOL(2);        }
 
@@ -715,164 +773,234 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)    rsinfo->setResult = NULL;
rsinfo->setDesc= NULL;
 
-    /* synchronous query, or async result retrieval */
-    if (!is_async)
-        res = PQexec(conn, sql);
-    else
+
+    /*
+     * Result is stored into storeinfo.tuplestore instead of
+     * res->result returned by PQexec/PQgetResult below
+     */
+    initStoreInfo(&storeinfo, fcinfo);
+    PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+    PG_TRY();    {
-        res = PQgetResult(conn);
-        /* NULL means we're all done with the async results */
-        if (!res)
-            return (Datum) 0;
+        /* synchronous query, or async result retrieval */
+        if (!is_async)
+            res = PQexec(conn, sql);
+        else
+            res = PQgetResult(conn);    }
+    PG_CATCH();
+    {
+        ErrorData *edata;
-    /* if needed, close the connection to the database and cleanup */
-    if (freeconn)
-        PQfinish(conn);
+        finishStoreInfo(&storeinfo);
+        edata = CopyErrorData();
+        FlushErrorState();
-    if (!res ||
-        (PQresultStatus(res) != PGRES_COMMAND_OK &&
-         PQresultStatus(res) != PGRES_TUPLES_OK))
+        /* Skip remaining results when storeHandler raises exception. */
+        PQskipResult(conn, TRUE);
+        ReThrowError(edata);
+    }
+    PG_END_TRY();
+
+    finishStoreInfo(&storeinfo);
+
+    /* NULL res from async get means we're all done with the results */
+    if (res || !is_async)    {
-        dblink_res_error(conname, res, "could not execute query", fail);
-        return (Datum) 0;
+        if (freeconn)
+            PQfinish(conn);
+
+        /*
+         * exclude a mismatch in the number of columns here so as to
+         * behave as before.
+         */
+        if (!res ||
+            (PQresultStatus(res) != PGRES_COMMAND_OK &&
+             PQresultStatus(res) != PGRES_TUPLES_OK &&
+             !storeinfo.nummismatch))
+        {
+            dblink_res_error(conname, res, "could not execute query", fail);
+            return (Datum) 0;
+        }
+
+        /* Set command return status when the query was a command. */
+        if (PQresultStatus(res) == PGRES_COMMAND_OK)
+        {
+            char *values[1];
+            HeapTuple tuple;
+            AttInMetadata *attinmeta;
+            ReturnSetInfo *rcinfo = (ReturnSetInfo*)fcinfo->resultinfo;
+            
+            values[0] = PQcmdStatus(res);
+            attinmeta = TupleDescGetAttInMetadata(rcinfo->setDesc);
+            tuple = BuildTupleFromCStrings(attinmeta, values);
+            tuplestore_puttuple(rcinfo->setResult, tuple);
+        }
+        else if (get_call_result_type(fcinfo, NULL, NULL) == TYPEFUNC_RECORD)
+        {
+            /* failed to determine actual type of RECORD */
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+        }
+
+        /* finishStoreInfo saves the fields referred to below. */
+        if (storeinfo.nummismatch)
+        {
+            /* This is only for backward compatibility */
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("remote query result rowtype does not match "
+                            "the specified FROM clause rowtype")));
+        }    }
-    materializeResult(fcinfo, res);
+    if (res)
+        PQclear(res);
+    return (Datum) 0;}
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo){    ReturnSetInfo *rsinfo = (ReturnSetInfo *)
fcinfo->resultinfo;
+    TupleDesc    tupdesc;
-    Assert(rsinfo->returnMode == SFRM_Materialize);
-
-    PG_TRY();
+    sinfo->oldcontext = MemoryContextSwitchTo(
+        rsinfo->econtext->ecxt_per_query_memory);
+        
+    switch (get_call_result_type(fcinfo, NULL, &tupdesc))    {
-        TupleDesc    tupdesc;
-        bool        is_sql_cmd = false;
-        int            ntuples;
-        int            nfields;
-
-        if (PQresultStatus(res) == PGRES_COMMAND_OK)
-        {
-            is_sql_cmd = true;
-
-            /*
-             * need a tuple descriptor representing one TEXT column to return
-             * the command status string as our result tuple
-             */
+        case TYPEFUNC_COMPOSITE:
+            tupdesc = CreateTupleDescCopy(tupdesc);
+            sinfo->nattrs = tupdesc->natts;
+            break;
+        case TYPEFUNC_RECORD:            tupdesc = CreateTemplateTupleDesc(1, false);
TupleDescInitEntry(tupdesc,(AttrNumber) 1, "status",                               TEXTOID, -1, 0);
 
-            ntuples = 1;
-            nfields = 1;
-        }
-        else
-        {
-            Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+            sinfo->nattrs = 1;
+            break;
+        default:
+            /* result type isn't composite */
+            elog(ERROR, "return type must be a row type");
+            break;
+    }
-            is_sql_cmd = false;
+    /* make sure we have a persistent copy of the tupdesc */
-            /* get a tuple descriptor for our result type */
-            switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-            {
-                case TYPEFUNC_COMPOSITE:
-                    /* success */
-                    break;
-                case TYPEFUNC_RECORD:
-                    /* failed to determine actual type of RECORD */
-                    ereport(ERROR,
-                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                        errmsg("function returning record called in context "
-                               "that cannot accept type record")));
-                    break;
-                default:
-                    /* result type isn't composite */
-                    elog(ERROR, "return type must be a row type");
-                    break;
-            }
+    sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+    sinfo->error_occurred = FALSE;
+    sinfo->nummismatch = FALSE;
+    sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+    sinfo->valbuflen = INITBUFLEN;
+    sinfo->valbuf = (char *)palloc(sinfo->valbuflen);
+    sinfo->cstrs = (char **)palloc(sinfo->nattrs * sizeof(char *));
-            /* make sure we have a persistent copy of the tupdesc */
-            tupdesc = CreateTupleDescCopy(tupdesc);
-            ntuples = PQntuples(res);
-            nfields = PQnfields(res);
-        }
+    rsinfo->setResult = sinfo->tuplestore;
+    rsinfo->setDesc = tupdesc;
+}
-        /*
-         * check result and tuple descriptor have the same number of columns
-         */
-        if (nfields != tupdesc->natts)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("remote query result rowtype does not match "
-                            "the specified FROM clause rowtype")));
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+    if (sinfo->valbuf)
+    {
+        pfree(sinfo->valbuf);
+        sinfo->valbuf = NULL;
+    }
-        if (ntuples > 0)
-        {
-            AttInMetadata *attinmeta;
-            Tuplestorestate *tupstore;
-            MemoryContext oldcontext;
-            int            row;
-            char      **values;
-
-            attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-            oldcontext = MemoryContextSwitchTo(
-                                    rsinfo->econtext->ecxt_per_query_memory);
-            tupstore = tuplestore_begin_heap(true, false, work_mem);
-            rsinfo->setResult = tupstore;
-            rsinfo->setDesc = tupdesc;
-            MemoryContextSwitchTo(oldcontext);
+    if (sinfo->cstrs)
+    {
+        pfree(sinfo->cstrs);
+        sinfo->cstrs = NULL;
+    }
-            values = (char **) palloc(nfields * sizeof(char *));
+    MemoryContextSwitchTo(sinfo->oldcontext);
+}
-            /* put all tuples into the tuplestore */
-            for (row = 0; row < ntuples; row++)
-            {
-                HeapTuple    tuple;
+/* Prototype of this function is PQrowProcessor */
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+    storeInfo *sinfo = (storeInfo *)param;
+    HeapTuple  tuple;
+    int        newbuflen;
+    int        fields = PQnfields(res);
+    int        i;
+    char       **cstrs = sinfo->cstrs;
+    char       *pbuf;
+
+    if (sinfo->error_occurred)
+        return -1;
+
+    if (sinfo->nattrs != fields)
+    {
+        sinfo->error_occurred = TRUE;
+        sinfo->nummismatch = TRUE;
+        finishStoreInfo(sinfo);
-                if (!is_sql_cmd)
-                {
-                    int            i;
+        /* This error will be processed in dblink_record_internal() */
+        return -1;
+    }
-                    for (i = 0; i < nfields; i++)
-                    {
-                        if (PQgetisnull(res, row, i))
-                            values[i] = NULL;
-                        else
-                            values[i] = PQgetvalue(res, row, i);
-                    }
-                }
-                else
-                {
-                    values[0] = PQcmdStatus(res);
-                }
+    /*
+     * Value input functions assume that the input string is
+     * zero-terminated, so copy each value and terminate it.
+     */
-                /* build the tuple and put it into the tuplestore. */
-                tuple = BuildTupleFromCStrings(attinmeta, values);
-                tuplestore_puttuple(tupstore, tuple);
-            }
+    /*
+     * The length of the buffer for each field is value length + 1 for
+     * zero-termination
+     */
+    newbuflen = fields;
+    for(i = 0 ; i < fields ; i++)
+        newbuflen += columns[i].len;
+
+    if (newbuflen > sinfo->valbuflen)
+    {
+        int tmplen = sinfo->valbuflen * 2;
+        /*
+         * Try to (re)allocate in bigger steps to avoid flood of allocations
+         * on weird data.
+         */
+        while (newbuflen > tmplen && tmplen >= 0)
+            tmplen *= 2;
-            /* clean up and return the tuplestore */
-            tuplestore_donestoring(tupstore);
-        }
+        /* Check if the integer was wrap-rounded. */
+        if (tmplen < 0)
+            elog(ERROR, "Buffer size for one row exceeds integer limit");
-        PQclear(res);
+        sinfo->valbuf = (char *)repalloc(sinfo->valbuf, tmplen);
+        sinfo->valbuflen = tmplen;
     }
-    PG_CATCH();
+
+    pbuf = sinfo->valbuf;
+    for(i = 0 ; i < fields ; i++)
     {
-        /* be sure to release the libpq result */
-        PQclear(res);
-        PG_RE_THROW();
+        int len = columns[i].len;
+        if (len < 0)
+            cstrs[i] = NULL;
+        else
+        {
+            cstrs[i] = pbuf;
+            memcpy(pbuf, columns[i].value, len);
+            pbuf += len;
+            *pbuf++ = '\0';
+        }
     }
-    PG_END_TRY();
+
+    /*
+     * These functions may throw exception. It will be caught in
+     * dblink_record_internal()
+     */
+    tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+    tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+    return 1;
 }
 /*
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 4de28ef..05d7e98 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -733,6 +733,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
             else
             {
                 DBLINK_GET_CONN;
+                conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
                 sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
             }
         }
@@ -763,6 +764,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
         else
             /* shouldn't happen */
             elog(ERROR, "wrong number of arguments");
+
+        conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
     }
 
     if (!conn)
diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index 511dd5e..2dcba15 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -371,7 +371,7 @@ SELECT *
 FROM dblink('myconn','SELECT * FROM foobar',false) AS t(a int, b text, c text[])
 WHERE t.a > 7;
 NOTICE:  relation "foobar" does not exist
-CONTEXT:  Error occurred on dblink connection named "unnamed": could not execute query.
+CONTEXT:  Error occurred on dblink connection named "myconn": could not execute query.
  a | b | c 
 ---+---+---
 (0 rows)
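
A note on the buffer-growth step in storeHandler() above: the doubling loop detects overflow by waiting for a signed int to wrap negative, and it would not terminate if the starting length were ever zero. A minimal sketch of an alternative size calculation follows. The helper name grow_buffer_size() and its limit parameter are hypothetical and not part of the patch; this only illustrates the arithmetic, not the palloc/repalloc handling.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): starting from "current",
 * double until the size is >= "needed".  The result is clamped to
 * "limit"; 0 is returned when "needed" itself exceeds "limit".
 * An existing buffer is never shrunk.
 */
static size_t
grow_buffer_size(size_t current, size_t needed, size_t limit)
{
    /* Start from 1 so that doubling makes progress when current == 0. */
    size_t newlen = (current > 0) ? current : 1;

    assert(limit > 0);

    if (needed > limit)
        return 0;               /* caller reports "row too large" */

    while (newlen < needed)
    {
        if (newlen > limit / 2) /* doubling would pass the limit */
            return limit;
        newlen *= 2;
    }
    return newlen;
}
```

The overflow test happens before the multiplication, so the calculation never depends on wraparound behaviour.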

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

27 March 2012, 17:41:44

On Sat, Mar 24, 2012 at 02:22:24AM +0200, Marko Kreen wrote:
> Main advantage of including PQgetRow() together with low-level
> rowproc API is that it allows de-emphasizing more complex parts of
> rowproc API (exceptions, early exit, skipresult, custom error msg).
> And drop/undocument them or simply call them postgres-internal-only.

I thought more about exceptions and PQgetRow and found an
interesting pattern:

- Exceptions are problematic if always allowed.  Although PQexec() is
  easy to fix in the current code, trying to keep the promise that
  "exceptions are allowed from everywhere" adds non-trivial
  maintenance overhead to future libpq changes, so instead we should
  simply fix the documentation.  Especially as I cannot see any usage
  scenario that would benefit from such a promise.

- Multiple SELECTs from PQexec() are problematic even without
  exceptions: additional documentation is needed on how to detect
  that rows are coming from a new statement.

Now the interesting one:

- The PQregisterProcessor() API allows changing the callback
  permanently, thus breaking any user code which simply calls PQexec()
  and expects a regular PGresult back.  Again, there is nothing to fix
  code-wise; we need to document that the callback should be set only
  for the current query and changed back afterwards.

My conclusion is that row-processor API is low-level expert API and
quite easy to misuse.  It would be preferable to have something more
robust as end-user API, the PQgetRow() is my suggestion for that.
Thus I see 3 choices:

1) Push row-processor as main API anyway and describe all dangerous
   scenarios in documentation.
2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
   PQgetRow() as preferred API and row-processor for expert usage,
   with proper documentation what works and what does not.
3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.

I guess this needs committer decision which way to go?

Second conclusion is that current dblink row-processor usage is broken
when user uses multiple SELECTs in SQL as dblink uses plain PQexec().
Simplest fix would be to use PQexecParams() instead, but if keeping old
behaviour is important, then dblink needs to emulate PQexec() resultset
behaviour with row-processor or PQgetRow().

-- 
marko
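
To illustrate the "set the callback only for the current query, then change it back" rule, here is a toy model in C. Nothing in it is libpq API: toy_conn, toy_exec() and toy_exec_with_processor() are hypothetical stand-ins that only mimic a connection with a replaceable row handler.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these names exist in libpq. */
typedef int (*toy_row_processor)(int row, void *param);

typedef struct toy_conn
{
    toy_row_processor proc;     /* NULL means "buffer rows as usual" */
    void             *param;
    int               buffered; /* rows stored by the default path */
} toy_conn;

/* Deliver nrows rows through whatever processor is installed. */
static int
toy_exec(toy_conn *conn, int nrows)
{
    int row;

    assert(conn != NULL);
    conn->buffered = 0;
    for (row = 0; row < nrows; row++)
    {
        if (conn->proc == NULL)
            conn->buffered++;
        else if (conn->proc(row, conn->param) < 0)
            return -1;
    }
    return 0;
}

/* Install a processor for exactly one query, then restore the old one. */
static int
toy_exec_with_processor(toy_conn *conn, toy_row_processor proc,
                        void *param, int nrows)
{
    toy_row_processor saved_proc = conn->proc;
    void             *saved_param = conn->param;
    int               rc;

    conn->proc = proc;
    conn->param = param;
    rc = toy_exec(conn, nrows);

    /* Restore on success and on failure alike. */
    conn->proc = saved_proc;
    conn->param = saved_param;
    return rc;
}

/* Example processor: count the rows it sees. */
static int
count_rows(int row, void *param)
{
    (void) row;
    ++*(int *) param;
    return 1;
}
```

Note that a non-local exit (longjmp or a backend exception) out of the processor would skip the restore step in this sketch, which is exactly the kind of leak the rule is meant to prevent.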

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

29 March 2012, 18:57:09

Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> writes:
> I'm sorry to have coded a silly bug.
> The previous patch has a bug in realloc size calculation.
> And separation of the 'connname patch' was incomplete in regtest.
> It is fixed in this patch.

I've applied a modified form of the conname update patch.  It seemed to
me that the fault is really in the DBLINK_GET_CONN and
DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting
the surrounding function's conname variable along with conn, rconn, etc.
There was actually a second error of the same type visible in the dblink
regression test, which is also fixed by this more general method.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

29 March 2012, 19:56:52

Marko Kreen <markokr@gmail.com> writes:
> My conclusion is that row-processor API is low-level expert API and
> quite easy to misuse.  It would be preferable to have something more
> robust as end-user API, the PQgetRow() is my suggestion for that.
> Thus I see 3 choices:

> 1) Push row-processor as main API anyway and describe all dangerous
>    scenarios in documentation.
> 2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
>    PQgetRow() as preferred API and row-processor for expert usage,
>    with proper documentation what works and what does not.
> 3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.

I still am failing to see the use-case for PQgetRow.  ISTM the entire
point of a special row processor is to reduce the per-row processing
overhead, but PQgetRow greatly increases that overhead.  And it doesn't
reduce complexity much either IMO: you still have all the primary risk
factors arising from processing rows in advance of being sure that the
whole query completed successfully.  Plus it conflates "no more data"
with "there was an error receiving the data" or "there was an error on
the server side".  PQrecvRow alleviates the per-row-overhead aspect of
that but doesn't really do a thing from the complexity standpoint;
it doesn't look to me to be noticeably easier to use than a row
processor callback.

I think PQgetRow and PQrecvRow just add more API calls without making
any fundamental improvements, and so we could do without them.  "There's
more than one way to do it" is not necessarily a virtue.

> Second conclusion is that current dblink row-processor usage is broken
> when user uses multiple SELECTs in SQL as dblink uses plain PQexec().

Yeah.  Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There's multiple ways to express that but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any
rows have arrived.  So maybe we could express that as a callback
with valid "res" but "columns" set to NULL?

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0.  This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.
        regards, tom lane
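
For readers following along, the row-counter variant can be sketched as below. This is a toy model under stated assumptions, not the proposed libpq signature: store_row(), toy_store and feed_resultset() are hypothetical names, and a plain counter stands in for the tuplestore.

```c
#include <assert.h>
#include <stddef.h>

/* Toy replacement for the tuplestore: it only counts rows. */
typedef struct toy_store
{
    int nrows;      /* rows kept from the current resultset */
    int resets;     /* how many times a new resultset started */
} toy_store;

/* Hypothetical row-processor shape with a row counter argument. */
static int
store_row(int rowcounter, const char *value, void *param)
{
    toy_store *store = (toy_store *) param;

    assert(store != NULL);
    if (rowcounter == 0)
    {
        /* New resultset: throw away what was stored and start over. */
        store->nrows = 0;
        store->resets++;
    }
    (void) value;               /* a real processor would copy this */
    store->nrows++;
    return 1;
}

/* Toy driver: deliver one resultset of nrows rows. */
static void
feed_resultset(toy_store *store, int nrows)
{
    int row;

    for (row = 0; row < nrows; row++)
        store_row(row, "value", store);
}
```

Feeding a three-row resultset followed by a two-row one leaves only the two rows of the last resultset in the store, which matches PQexec() returning only the last result of a multi-statement string.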

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

30 March 2012, 01:31:02

> I've applied a modified form of the conname update patch.  It seemed to
> me that the fault is really in the DBLINK_GET_CONN and
> DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting
> the surrounding function's conname variable along with conn, rconn, etc.
> There was actually a second error of the same type visible in the dblink
> regression test, which is also fixed by this more general method.

Come to think of it, that patch was merely a confirmation fix for
the bug, and by all appearances it got mixed in with the other
parts of the dblink patch. I totally agree with you. It should be
dropped this time and done separately.

I'm sorry for bothering you with such a trivial thing.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

30 March 2012, 12:53:13

On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > My conclusion is that row-processor API is low-level expert API and
> > quite easy to misuse.  It would be preferable to have something more
> > robust as end-user API, the PQgetRow() is my suggestion for that.
> > Thus I see 3 choices:
> 
> > 1) Push row-processor as main API anyway and describe all dangerous
> >    scenarios in documentation.
> > 2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
> >    PQgetRow() as preferred API and row-processor for expert usage,
> >    with proper documentation what works and what does not.
> > 3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.
> 
> I still am failing to see the use-case for PQgetRow.  ISTM the entire
> point of a special row processor is to reduce the per-row processing
> overhead, but PQgetRow greatly increases that overhead.

No, decreasing CPU overhead is a minor win.  I guess in a realistic
application, like dblink, you can't even measure the difference.

The *major* win comes from avoiding buffering of all rows in PGresult.
Of course, this is noticeable only with bigger resultsets.
I guess such buffering pessimizes memory usage: code always
works on cold cache.  And simply keeping RSS low is good for
long-term health of a process.

The second major win is avoiding the need to use a cursor with small
chunks to access a resultset of unknown size, which stalls the
application until the next block arrives from the network.

The "PGresult *PQgetRow()" is for applications that do not convert
rows immediately to some internal format, but keep using PGresult.
So they can be converted to row-by-row processing with minimal
changes to actual code.

Note that the PGrowValue is temporary struct that application *must*
move data away from.  If app internally uses PGresult, then it's
pretty annoying to invent a new internal format for long-term
storage.

But maybe I'm overestimating the number of such applications.

> And it doesn't
> reduce complexity much either IMO: you still have all the primary risk
> factors arising from processing rows in advance of being sure that the
> whole query completed successfully.

It avoids the complexity of:

* How to pass error from callback to upper code

* Needing to know how exceptions behave

* How to use early exit to pass rows to upper code one-by-one (by
  storing the PGresult and PGrowValue in a temp place and later
  checking their values)

* How to detect that a new resultset has started (keeping track of
  the previous PGresult or noting some quirky API behaviour we may
  invent for such a case)

* Needing to make sure the callback does not leak to call-sites that
  expect regular libpq behaviour.  ("Always call
  PQregisterRowProcessor(db, NULL, NULL) after query finishes")
  ["But now I'm in exception handler, how do I find the connection?"]

I've now reviewed the callback code and even done some coding with it
and IMHO it's too low-level to be end-user-API.

Yes, the "query-may-still-fail" complexity remains, but that's not unlike
the usual "multi-statement-transaction-is-not-guaranteed-to-succeed"
complexity.

Another complexity that remains is "how-to-skip-current-resultset",
but that is a problem only on sync connections and the answer is
simple - "call PQgetResult()".  Or "call PQgetRow/PQrecvRow" if
the user wants to avoid buffering.

> Plus it conflates "no more data"
> with "there was an error receiving the data" or "there was an error on
> the server side".

Well, the current PQgetRow() is written in the style: "return only a
single-row PGresult; to see errors the user must call PQgetResult()".
Basically so that the user is forced to fall back to the familiar
libpq usage pattern.

It can be changed so that PQgetRow() also returns errors.

Or we can drop it and just keep PQrecvRow().

> PQrecvRow alleviates the per-row-overhead aspect of
> that but doesn't really do a thing from the complexity standpoint;
> it doesn't look to me to be noticeably easier to use than a row
> processor callback.

> I think PQgetRow and PQrecvRow just add more API calls without making
> any fundamental improvements, and so we could do without them.  "There's
> more than one way to do it" is not necessarily a virtue.

Please re-read the above list of problematic situations that this API
fixes.  Then, if you still think that PQrecvRow() is pointless, sure,
let's drop it.

We can also postpone it to 9.3, to poll users on whether they want
an easier API or whether maximum performance is important.
(PQrecvRow() *does* have a few cycles of overhead compared to callbacks.)

The only option that we have on the table for 9.2 but not later
is moving the callback API to <libpq-int.h>.

> > Second conclusion is that current dblink row-processor usage is broken
> > when user uses multiple SELECTs in SQL as dblink uses plain PQexec().
> 
> Yeah.  Perhaps we should tweak the row-processor callback API so that
> it gets an explicit notification that "this is a new resultset".
> Duplicating PQexec's behavior would then involve having the dblink row
> processor throw away any existing tuplestore and start over when it
> gets such a call.
> 
> There's multiple ways to express that but the most convenient thing
> from libpq's viewpoint, I think, is to have a callback that occurs
> immediately after collecting a RowDescription message, before any
> rows have arrived.  So maybe we could express that as a callback
> with valid "res" but "columns" set to NULL?
> 
> A different approach would be to add a row counter to the arguments
> provided to the row processor; then you'd know a new resultset had
> started if you saw rowcounter == 0.  This might have another advantage
> of not requiring the row processor to count the rows for itself, which
> I think many row processors would otherwise have to do.

Try to imagine what the final documentation will look like.

Then imagine the documentation for PQrecvRow() / PQgetRow().

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

30 March 2012, 12:59:30

Marko Kreen <markokr@gmail.com> writes:
> On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:
>> Marko Kreen <markokr@gmail.com> writes:
>>> Second conclusion is that current dblink row-processor usage is broken
>>> when user uses multiple SELECTs in SQL as dblink uses plain PQexec().

>> Yeah.  Perhaps we should tweak the row-processor callback API so that
>> it gets an explicit notification that "this is a new resultset".
>> Duplicating PQexec's behavior would then involve having the dblink row
>> processor throw away any existing tuplestore and start over when it
>> gets such a call.
>> 
>> There's multiple ways to express that but the most convenient thing
>> from libpq's viewpoint, I think, is to have a callback that occurs
>> immediately after collecting a RowDescription message, before any
>> rows have arrived.  So maybe we could express that as a callback
>> with valid "res" but "columns" set to NULL?
>> 
>> A different approach would be to add a row counter to the arguments
>> provided to the row processor; then you'd know a new resultset had
>> started if you saw rowcounter == 0.  This might have another advantage
>> of not requiring the row processor to count the rows for itself, which
>> I think many row processors would otherwise have to do.

> Try to imagine how final documentation will look like.

> Then imagine documentation for PGrecvRow() / PQgetRow().

What's your point, exactly?  PQrecvRow() / PQgetRow() aren't going to
make that any better as currently defined, because there's noplace to
indicate "this is a new resultset" in those APIs either.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

30 March 2012, 13:05:20

On Fri, Mar 30, 2012 at 11:59:12AM -0400, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:
> >> Marko Kreen <markokr@gmail.com> writes:
> >>> Second conclusion is that current dblink row-processor usage is broken
> >>> when user uses multiple SELECTs in SQL as dblink uses plain PQexec().
> 
> >> Yeah.  Perhaps we should tweak the row-processor callback API so that
> >> it gets an explicit notification that "this is a new resultset".
> >> Duplicating PQexec's behavior would then involve having the dblink row
> >> processor throw away any existing tuplestore and start over when it
> >> gets such a call.
> >> 
> >> There's multiple ways to express that but the most convenient thing
> >> from libpq's viewpoint, I think, is to have a callback that occurs
> >> immediately after collecting a RowDescription message, before any
> >> rows have arrived.  So maybe we could express that as a callback
> >> with valid "res" but "columns" set to NULL?
> >> 
> >> A different approach would be to add a row counter to the arguments
> >> provided to the row processor; then you'd know a new resultset had
> >> started if you saw rowcounter == 0.  This might have another advantage
> >> of not requiring the row processor to count the rows for itself, which
> >> I think many row processors would otherwise have to do.
> 
> > Try to imagine how final documentation will look like.
> 
> > Then imagine documentation for PGrecvRow() / PQgetRow().
> 
> What's your point, exactly?  PGrecvRow() / PQgetRow() aren't going to
> make that any better as currently defined, because there's noplace to
> indicate "this is a new resultset" in those APIs either.

Have you looked at the examples?  PQgetResult() is a pretty good hint
that one resultset finished...

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

30 March 2012, 13:30:41

On Fri, Mar 30, 2012 at 7:04 PM, Marko Kreen <markokr@gmail.com> wrote:
> Have you looked at the examples?  PQgetResult() is pretty good hint
> that one resultset finished...

Ok, the demos are scattered around this long thread and hard to find,
so here is a summary of links:

Original design mail:
 http://archives.postgresql.org/message-id/20120224154616.GA16985@gmail.com

First patch with quick demos:
 http://archives.postgresql.org/message-id/20120226221922.GA6981@gmail.com

Demos as diff:
 http://archives.postgresql.org/message-id/20120324002224.GA19635@gmail.com

Demos/experiments/tests (bit messier than the demos-as-diffs):
 https://github.com/markokr/libpq-rowproc-demos

Note - the point is that the user *must* call PQgetResult() when a
resultset ends.  Hence also the "PQgetRow() does not return errors"
decision.

I'll put this mail into commitfest page too, seems I've forgotten to
put some mails there.

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

30 March 2012, 18:19:00

Marko Kreen <markokr@gmail.com> writes:
> On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote:
>>> My suggestion - check in getAnotherTuple whether resultStatus is
>>> already error and do nothing then.  This allows internal pqAddRow
>>> to set regular "out of memory" error.  Otherwise give generic
>>> "row processor error".

>> Current implement seems already doing this in
>> parseInput3(). Could you give me further explanation?

> The suggestion was about getAnotherTuple() - currently it sets
> always "error in row processor".  With such check, the callback
> can set the error result itself.  Currently only callbacks that
> live inside libpq can set errors, but if we happen to expose
> error-setting function in outside API, then the getAnotherTuple()
> would already be ready for it.

I'm pretty dissatisfied with the error reporting situation for row
processors.  You can't just decide not to solve it, which seems to be
the current state of affairs.  What I'm inclined to do is to add a
"char **" parameter to the row processor, and say that when the
processor returns -1 it can store an error message string there.
If it does so, that's what we report.  If it doesn't (which we'd detect
by presetting the value to NULL), then use a generic "error in row
processor" message.  This is cheap and doesn't prevent the row processor
from using some application-specific error reporting method if it wants;
but it does allow the processor to make use of libpq's error mechanism
when that's preferable.
        regards, tom lane
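
A minimal sketch of that error-reporting convention, as a toy model rather than libpq code. The names toy_row_processor and toy_deliver_rows() are hypothetical, and the const qualifier on the message pointer is an assumption made here so that string literals can be stored without any allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical processor shape: returns -1 on error, 1 on success. */
typedef int (*toy_row_processor)(int row, const char **errmsg, void *param);

/*
 * Toy caller: returns NULL when all rows were processed, otherwise the
 * message to report.  The message pointer is preset to NULL before each
 * call so an unset message can be detected.
 */
static const char *
toy_deliver_rows(toy_row_processor proc, void *param, int nrows)
{
    int row;

    assert(proc != NULL);
    for (row = 0; row < nrows; row++)
    {
        const char *errmsg = NULL;

        if (proc(row, &errmsg, param) < 0)
            return errmsg ? errmsg : "error in row processor";
    }
    return NULL;
}

/* Stores a constant string; no allocation is needed. */
static int
fail_with_message(int row, const char **errmsg, void *param)
{
    (void) row;
    (void) param;
    *errmsg = "out of memory";
    return -1;
}

/* Fails without saying why; the caller supplies the generic text. */
static int
fail_silently(int row, const char **errmsg, void *param)
{
    (void) row;
    (void) errmsg;
    (void) param;
    return -1;
}

/* Accepts every row. */
static int
accept_all(int row, const char **errmsg, void *param)
{
    (void) row;
    (void) errmsg;
    (void) param;
    return 1;
}
```

Because the caller never frees or copies the string in this sketch, the processor stays free to use its own reporting mechanism and simply leave the pointer untouched.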

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

30 March 2012, 19:00:05

On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote:
> >>> My suggestion - check in getAnotherTuple whether resultStatus is
> >>> already error and do nothing then.  This allows internal pqAddRow
> >>> to set regular "out of memory" error.  Otherwise give generic
> >>> "row processor error".
> 
> >> Current implement seems already doing this in
> >> parseInput3(). Could you give me further explanation?
> 
> > The suggestion was about getAnotherTuple() - currently it sets
> > always "error in row processor".  With such check, the callback
> > can set the error result itself.  Currently only callbacks that
> > live inside libpq can set errors, but if we happen to expose
> > error-setting function in outside API, then the getAnotherTuple()
> > would already be ready for it.
> 
> I'm pretty dissatisfied with the error reporting situation for row
> processors.  You can't just decide not to solve it, which seems to be
> the current state of affairs.  What I'm inclined to do is to add a
> "char **" parameter to the row processor, and say that when the
> processor returns -1 it can store an error message string there.
> If it does so, that's what we report.  If it doesn't (which we'd detect
> by presetting the value to NULL), then use a generic "error in row
> processor" message.  This is cheap and doesn't prevent the row processor
> from using some application-specific error reporting method if it wants;
> but it does allow the processor to make use of libpq's error mechanism
> when that's preferable.

Yeah.

But such an API seems to require specifying an allocator, which seems
ugly.  I think it would be better to just use Kyotaro's original idea
of PQsetRowProcessorError(), which is nicer to use.

Few thoughts on the issue:

----------

As libpq already provides quite good coverage of PGresult
manipulation APIs, how about:
 void PQsetResultError(PGresult *res, const char *msg);

that does:
 res->errMsg = pqResultStrdup(msg);
 res->resultStatus = PGRES_FATAL_ERROR;

that would also cause minimal fuss in getAnotherTuple().

------------

I would actually like even more:
 void PQsetConnectionError(PGconn *conn, const char *msg);

that does full-blown libpq error logic.  Thus it would be
useful everywhere in libpq.  But it seems a bit too disruptive,
so I would like an ACK from somebody who knows libpq better.
(well, from you...)

-----------

Another thought - if we have API to set error from *outside*
of row-processor callback, that would immediately solve the
"how to skip incoming resultset without buffering it" problem.

And it would be usable for PQgetRow()/PQrecvRow() too.

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

30 March 2012, 19:14:10

Marko Kreen <markokr@gmail.com> writes:
> On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote:
>> I'm pretty dissatisfied with the error reporting situation for row
>> processors.  You can't just decide not to solve it, which seems to be
>> the current state of affairs.  What I'm inclined to do is to add a
>> "char **" parameter to the row processor, and say that when the
>> processor returns -1 it can store an error message string there.

> But such API seems to require specifying allocator, which seems ugly.

Not if the message is a constant string, which seems like the typical
situation (think "out of memory").  If the row processor does need a
buffer for a constructed string, it could make use of some space in its
"void *param" area, for instance.

> I think it would be better to just use Kyotaro's original idea
> of PQsetRowProcessorError() which nicer to use.

I don't particularly care for that idea because it opens up all sorts of
potential issues when such a function is called at the wrong time.
Moreover, you have to remember that the typical situation here is that
we're going to be out of memory or otherwise in trouble, which means
you've got to be really circumspect about what you assume will work.
Row processors that think they can do a lot of fancy message
construction should be discouraged, and an API that requires
construction of a new PGresult in order to return an error is right out.
(This is why getAnotherTuple is careful to clear the failed result
before it tries to build a new one.  But that trick isn't going to be
available to an external row processor.)
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

30 March 2012, 19:27:03

On Sat, Mar 31, 2012 at 1:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote:
>>> I'm pretty dissatisfied with the error reporting situation for row
>>> processors.  You can't just decide not to solve it, which seems to be
>>> the current state of affairs.  What I'm inclined to do is to add a
>>> "char **" parameter to the row processor, and say that when the
>>> processor returns -1 it can store an error message string there.
>
>> But such API seems to require specifying allocator, which seems ugly.
>
> Not if the message is a constant string, which seems like the typical
> situation (think "out of memory").  If the row processor does need a
> buffer for a constructed string, it could make use of some space in its
> "void *param" area, for instance.

If it's specified as string that libpq does not own, then I'm fine with it.

>> I think it would be better to just use Kyotaro's original idea
>> of PQsetRowProcessorError() which nicer to use.
>
> I don't particularly care for that idea because it opens up all sorts of
> potential issues when such a function is called at the wrong time.
> Moreover, you have to remember that the typical situation here is that
> we're going to be out of memory or otherwise in trouble, which means
> you've got to be really circumspect about what you assume will work.
> Row processors that think they can do a lot of fancy message
> construction should be discouraged, and an API that requires
> construction of a new PGresult in order to return an error is right out.
> (This is why getAnotherTuple is careful to clear the failed result
> before it tries to build a new one.  But that trick isn't going to be
> available to an external row processor.)

Kyotaro's original idea was to assume out-of-memory if error
string was not set, thus the callback needed to set the string
only when it really had something to say.

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

30 March 2012, 19:35:48

Marko Kreen <markokr@gmail.com> writes:
> On Sat, Mar 31, 2012 at 1:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Not if the message is a constant string, which seems like the typical
>> situation (think "out of memory").  If the row processor does need a
>> buffer for a constructed string, it could make use of some space in its
>> "void *param" area, for instance.

> If it's specified as string that libpq does not own, then I'm fine with it.

Check.  Let's make it "const char **" in fact, just to be clear on that.

>> (This is why getAnotherTuple is careful to clear the failed result
>> before it tries to build a new one.  But that trick isn't going to be
>> available to an external row processor.)

> Kyotaro's original idea was to assume out-of-memory if error
> string was not set, thus the callback needed to set the string
> only when it really had something to say.

Hmm.  We could still do that in conjunction with the idea of returning
the string from the row processor, but I'm not sure if it's useful or
just overly cute.

[ thinks... ]  A small advantage of assuming NULL means that is that
we could postpone the libpq_gettext("out of memory") call until after
clearing the overflowed PGresult, which would greatly improve the odds
of getting a nicely translated result and not just ASCII.  Might be
worth it just for that.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

31 March 2012, 12:50:46

Marko Kreen <markokr@gmail.com> writes:
> On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:
>> Yeah.  Perhaps we should tweak the row-processor callback API so that
>> it gets an explicit notification that "this is a new resultset".
>> Duplicating PQexec's behavior would then involve having the dblink row
>> processor throw away any existing tuplestore and start over when it
>> gets such a call.
>> 
>> There's multiple ways to express that but the most convenient thing
>> from libpq's viewpoint, I think, is to have a callback that occurs
>> immediately after collecting a RowDescription message, before any
>> rows have arrived.  So maybe we could express that as a callback
>> with valid "res" but "columns" set to NULL?
>> 
>> A different approach would be to add a row counter to the arguments
>> provided to the row processor; then you'd know a new resultset had
>> started if you saw rowcounter == 0.  This might have another advantage
>> of not requiring the row processor to count the rows for itself, which
>> I think many row processors would otherwise have to do.

I had been leaning towards the second approach with a row counter,
because it seemed cleaner, but I thought of another consideration that
makes the first way seem better.  Suppose that your row processor has to
do some setup work at the start of a result set, and that work needs to
see the resultset properties (eg number of columns) so it can't be done
before starting PQgetResult.  In the patch as submitted, the
only way to manage that is to keep enough state to recognize that the
current row processor call is the first one, which we realized is
inadequate for multiple-result-set cases.  With a row counter argument
you can do the setup whenever rowcount == 0, which fixes that.  But
neither of these methods deals correctly with an empty result set!
To make that work, you need to add extra logic after the PQgetResult
call to do the setup work the row processor should have done but never
got a chance to.  So that's ugly, and it makes for an easy-to-miss bug.
A call that occurs when we receive RowDescription, independently of
whether the result set contains any rows, makes this a lot cleaner.
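
To illustrate, here is a rough sketch of a row processor that does its
setup on the RowDescription call.  SetupResult, SetupValue and SetupStore
are hypothetical stand-ins (not the real PGresult or PGdataValue), and the
"columns == NULL means new result set" convention is the one proposed
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PGresult and the per-column value struct. */
typedef struct SetupResult
{
    int         nfields;
} SetupResult;

typedef struct SetupValue
{
    int         len;            /* negative means SQL NULL */
    const char *value;
} SetupValue;

/* Application state reached through the passthrough pointer. */
typedef struct SetupStore
{
    int         setups;         /* result sets initialised so far */
    int         nattrs;         /* column count of the current result set */
    int         rows;           /* rows seen in the current result set */
} SetupStore;

static int
setup_row_processor(SetupResult *res, const SetupValue *columns,
                    const char **errmsg, void *param)
{
    SetupStore *store = (SetupStore *) param;

    assert(res != NULL && store != NULL);
    (void) errmsg;

    if (columns == NULL)
    {
        /* New result set: this runs even if no data rows ever follow. */
        store->setups++;
        store->nattrs = res->nfields;
        store->rows = 0;
        return 1;
    }

    /* Ordinary data row. */
    store->rows++;
    return 1;
}
```

An empty result set still produces exactly one call with columns == NULL,
so the application needs no extra setup logic after PQgetResult.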
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

01 April 2012, 18:51:40

I've been thinking some more about the early-termination cases (where
the row processor returns zero or longjmps), and I believe I have got
proposals that smooth off most of the rough edges there.

First off, returning zero is really pretty unsafe as it stands, because
it only works more-or-less-sanely if the connection is being used in
async style.  If the row processor returns zero within a regular
PQgetResult call, that will cause PQgetResult to block waiting for more
input.  Which might not be forthcoming, if we're in the last bufferload
of a query response from the server.  Even in the async case, I think
it's a bad design to have PQisBusy return true when the row processor
requested stoppage.  In that situation, there is work available for the
outer application code to do, whereas normally PQisBusy == true means
we're still waiting for the server.

I think we can fix this by introducing a new PQresultStatus, called say
PGRES_SUSPENDED, and having PQgetResult return an empty PGresult with
status PGRES_SUSPENDED after the row processor has returned zero.
Internally, there'd also be a new asyncStatus PGASYNC_SUSPENDED,
which we'd set before exiting from the getAnotherTuple call.  This would
cause PQisBusy and PQgetResult to do the right things.  In PQgetResult,
we'd switch back to PGASYNC_BUSY state after producing a PGRES_SUSPENDED
result, so that subsequent calls would resume parsing input.

With this design, a suspending row processor can be used safely in
either async or non-async mode.  It does cost an extra PGresult creation
and deletion per cycle, but that's not much more than a malloc and free.

Also, we can document that a longjmp out of the row processor leaves the
library in the same state as if the row processor had returned zero and
a PGRES_SUSPENDED result had been returned to the application; which
will be a true statement in all cases, sync or async.
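
The state transitions can be sketched as a toy model.  Everything here is
hypothetical (SuspConn, susp_is_busy(), susp_get_result() and the SUSP_*
names are not libpq code), and it assumes the whole server response is
already buffered, so blocking on the network is not modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    SUSP_ASYNC_IDLE,
    SUSP_ASYNC_BUSY,
    SUSP_ASYNC_SUSPENDED,
    SUSP_ASYNC_READY
} SuspAsync;

typedef enum
{
    SUSP_RES_NONE,              /* stands in for a NULL PGresult */
    SUSP_RES_TUPLES_OK,
    SUSP_RES_SUSPENDED
} SuspStatus;

typedef struct SuspConn
{
    SuspAsync   async;
    int         rows_pending;   /* buffered rows not yet parsed */
    int         suspend_each;   /* nonzero: row processor returns 0 per row */
    int         rows_seen;
} SuspConn;

/* Parse buffered rows until the data runs out or the processor suspends. */
static void
susp_parse_input(SuspConn *conn)
{
    while (conn->async == SUSP_ASYNC_BUSY && conn->rows_pending > 0)
    {
        conn->rows_pending--;
        conn->rows_seen++;
        if (conn->suspend_each)
            conn->async = SUSP_ASYNC_SUSPENDED; /* processor returned zero */
    }
    if (conn->async == SUSP_ASYNC_BUSY && conn->rows_pending == 0)
        conn->async = SUSP_ASYNC_READY;
}

/* Busy means "still waiting for the server", so suspended is not busy. */
static int
susp_is_busy(SuspConn *conn)
{
    assert(conn != NULL);
    susp_parse_input(conn);
    return conn->async == SUSP_ASYNC_BUSY;
}

static SuspStatus
susp_get_result(SuspConn *conn)
{
    assert(conn != NULL);
    susp_parse_input(conn);
    switch (conn->async)
    {
        case SUSP_ASYNC_SUSPENDED:
            conn->async = SUSP_ASYNC_BUSY;      /* next call resumes parsing */
            return SUSP_RES_SUSPENDED;
        case SUSP_ASYNC_READY:
            conn->async = SUSP_ASYNC_IDLE;
            return SUSP_RES_TUPLES_OK;
        default:
            return SUSP_RES_NONE;
    }
}
```

In this model a two-row result with a suspending processor yields
SUSPENDED, SUSPENDED, TUPLES_OK and then "no more results", whether the
caller polls susp_is_busy() first or goes straight to susp_get_result().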

I also mentioned earlier that I wasn't too thrilled with the design of
PQskipResult; in particular that it would encourage application writers
to miss server-sent error results, which would inevitably be a bad idea.
I think what we ought to do is define (and implement) it as being
exactly equivalent to PQgetResult, except that it temporarily installs
a dummy row processor so that data rows are discarded rather than
accumulated.  Then, the documented way to clean up after deciding to
abandon a suspended query will be to do PQskipResult until it returns
null, paying normal attention to any result statuses other than
PGRES_TUPLES_OK.  This is still not terribly helpful for async-mode
applications, but what they'd probably end up doing is installing their
own dummy row processors and then flushing results as part of their
normal outer loop.  The only thing we could do for them is to expose
a dummy row processor, which seems barely worthwhile given that it's
a one-line function.
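
As a rough sketch of that cleanup pattern: skip_dummy_row_processor() is
the sort of one-line function meant above, and skip_drain() loops until
"NULL" while still counting any status other than TUPLES_OK.  SkipQueue
and skip_next_result() are hypothetical stand-ins for the connection and
for PQskipResult; none of these names are libpq symbols.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    SKIP_TUPLES_OK,
    SKIP_FATAL_ERROR
} SkipStatus;

/* Hypothetical stand-in for a connection with queued result statuses. */
typedef struct SkipQueue
{
    const SkipStatus *pending;
    int         count;
    int         next;
} SkipQueue;

/* Dummy row processor: accept every row and discard it. */
static int
skip_dummy_row_processor(void *res, const void *columns,
                         const char **errmsg, void *param)
{
    (void) res;
    (void) columns;
    (void) errmsg;
    (void) param;
    return 1;
}

/* Stand-in for PQskipResult: next result's status, or NULL once idle. */
static const SkipStatus *
skip_next_result(SkipQueue *q)
{
    if (q->next >= q->count)
        return NULL;
    return &q->pending[q->next++];
}

/* Drain until NULL; returns how many results were not TUPLES_OK. */
static int
skip_drain(SkipQueue *q)
{
    const SkipStatus *st;
    int         errors = 0;

    assert(q != NULL);
    while ((st = skip_next_result(q)) != NULL)
    {
        if (*st != SKIP_TUPLES_OK)
            errors++;           /* a server-sent error must not be missed */
    }
    return errors;
}
```

The point of counting rather than ignoring is that a server-reported error
arriving during the skip still reaches the application.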

I remain of the opinion that PQgetRow/PQrecvRow aren't adding much
usability-wise.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

01 April 2012, 20:04:17

On Sun, Apr 01, 2012 at 05:51:19PM -0400, Tom Lane wrote:
> I've been thinking some more about the early-termination cases (where
> the row processor returns zero or longjmps), and I believe I have got
> proposals that smooth off most of the rough edges there.
> 
> First off, returning zero is really pretty unsafe as it stands, because
> it only works more-or-less-sanely if the connection is being used in
> async style.  If the row processor returns zero within a regular
> PQgetResult call, that will cause PQgetResult to block waiting for more
> input.  Which might not be forthcoming, if we're in the last bufferload
> of a query response from the server.  Even in the async case, I think
> it's a bad design to have PQisBusy return true when the row processor
> requested stoppage.  In that situation, there is work available for the
> outer application code to do, whereas normally PQisBusy == true means
> we're still waiting for the server.
> 
> I think we can fix this by introducing a new PQresultStatus, called say
> PGRES_SUSPENDED, and having PQgetResult return an empty PGresult with
> status PGRES_SUSPENDED after the row processor has returned zero.
> Internally, there'd also be a new asyncStatus PGASYNC_SUSPENDED,
> which we'd set before exiting from the getAnotherTuple call.  This would
> cause PQisBusy and PQgetResult to do the right things.  In PQgetResult,
> we'd switch back to PGASYNC_BUSY state after producing a PGRES_SUSPENDED
> result, so that subsequent calls would resume parsing input.
> 
> With this design, a suspending row processor can be used safely in
> either async or non-async mode.  It does cost an extra PGresult creation
> and deletion per cycle, but that's not much more than a malloc and free.

I added extra magic to PQisBusy(), you are adding extra magic to
PQgetResult().  Not much difference.

It seems we both lost sight of the actual usage scenario for the early-exit
logic: both the callback and the upper-level code *must* cooperate
for it to be useful.  Instead, we designed the API for the non-cooperating
case, which is wrong.

So the proper approach would be to have a new API call, designed to
handle it, and allow early exit only from there.

That would also avoid any breakage of old APIs.  It would also avoid
any accidental data loss if the user code does not have exactly
the right sequence of calls.

How about PQisBusy2(), which returns '2' when early-exit is requested?
Please suggest something better...


-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

01 April 2012, 20:23:28

Marko Kreen <markokr@gmail.com> writes:
> It seems we both lost sight of the actual usage scenario for the early-exit
> logic: both the callback and the upper-level code *must* cooperate
> for it to be useful.  Instead, we designed the API for the non-cooperating
> case, which is wrong.

Exactly.  So you need an extra result state, or something isomorphic.

> So the proper approach would be to have a new API call, designed to
> handle it, and allow early exit only from there.

> That would also avoid any breakage of old APIs.  It would also avoid
> any accidental data loss if the user code does not have exactly
> the right sequence of calls.

> How about PQisBusy2(), which returns '2' when early-exit is requested?
> Please suggest something better...

My proposal is way better than that.  You apparently aren't absorbing my
point, which is that making this behavior unusable with every existing
API (whether intentionally or by oversight) isn't an improvement.
The row processor needs to be able to do this *without* assuming a
particular usage style, and most particularly it should not force people
to use async mode.

An alternative that I'd prefer to that one is to get rid of the
suspension return mode altogether.  However, that leaves us needing
to document what it means to longjmp out of a row processor without
having any comparable API concept, so I don't really find it better.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

03 April 2012, 12:34:08

I've whacked the libpq part of this patch around to the point where I'm
reasonably satisfied with it (attached), and am now looking at the
dblink part.  I soon realized that there's a rather nasty issue with the
dblink patch, which is that it fundamentally doesn't work for async
operations.  In an async setup what you would do is dblink_send_query(),
then periodically poll with dblink_is_busy(), then when it says the
query is done, collect the results with dblink_get_result().  The
trouble with this is that PQisBusy will invoke the standard row
processor, so by the time dblink_get_result runs it's way too late to
switch row processors.

I thought about fixing that by installing dblink's custom row processor
permanently, but that doesn't really work because we don't know the
expected tuple column datatypes until we see the call environment for
dblink_get_result().

A hack on top of that hack would be to collect the data into a
tuplestore that contains all text columns, and then convert to the
correct rowtype during dblink_get_result, but that seems rather ugly
and not terribly high-performance.

What I'm currently thinking we should do is just use the old method
for async queries, and only optimize the synchronous case.

I thought for a while that this might represent a generic deficiency
in the whole concept of a row processor, but probably it's mostly
down to dblink's rather bizarre API.  It would be unusual I think for
people to want a row processor that couldn't know what to do until
after the entire query result is received.

            regards, tom lane


diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 96064bbb0de8cdb0ad3bd52f35c7d845038acf1f..af4db64a06da0a6fc53b77d5e2d7cb366b384e8b 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*************** ExecStatusType PQresultStatus(const PGre
*** 2304,2309 ****
--- 2304,2319 ----
             </para>
            </listitem>
           </varlistentry>
+
+          <varlistentry id="libpq-pgres-suspended">
+           <term><literal>PGRES_SUSPENDED</literal></term>
+           <listitem>
+            <para>
+             A custom row processor requested suspension of processing of
+             a query result (see <xref linkend="libpq-row-processor">).
+            </para>
+           </listitem>
+          </varlistentry>
          </variablelist>

          If the result status is <literal>PGRES_TUPLES_OK</literal>, then
*************** defaultNoticeProcessor(void *arg, const
*** 5581,5586 ****
--- 5591,5879 ----

   </sect1>

+  <sect1 id="libpq-row-processor">
+   <title>Custom Row Processing</title>
+
+   <indexterm zone="libpq-row-processor">
+    <primary>PQrowProcessor</primary>
+   </indexterm>
+
+   <indexterm zone="libpq-row-processor">
+    <primary>row processor</primary>
+    <secondary>in libpq</secondary>
+   </indexterm>
+
+   <para>
+    Ordinarily, when receiving a query result from the server,
+    <application>libpq</> adds each row value to the current
+    <type>PGresult</type> until the entire result set is received; then
+    the <type>PGresult</type> is returned to the application as a unit.
+    This approach is simple to work with, but becomes inefficient for large
+    result sets.  To improve performance, an application can register a
+    custom <firstterm>row processor</> function that processes each row
+    as the data is received from the network.  The custom row processor could
+    process the data fully, or store it into some application-specific data
+    structure for later processing.
+   </para>
+
+   <caution>
+    <para>
+     The row processor function sees the rows before it is known whether the
+     query will succeed overall, since the server might return some rows before
+     encountering an error.  For proper transactional behavior, it must be
+     possible to discard or undo whatever the row processor has done, if the
+     query ultimately fails.
+    </para>
+   </caution>
+
+   <para>
+    When using a custom row processor, row data is not accumulated into the
+    <type>PGresult</type>, so the <type>PGresult</type> ultimately delivered to
+    the application will contain no rows (<function>PQntuples</> =
+    <literal>0</>).  However, it still has <function>PQresultStatus</> =
+    <literal>PGRES_TUPLES_OK</>, and it contains correct information about the
+    set of columns in the query result.  On the other hand, if the query fails
+    partway through, the returned <type>PGresult</type> has
+    <function>PQresultStatus</> = <literal>PGRES_FATAL_ERROR</>.  The
+    application must be prepared to undo any actions of the row processor
+    whenever it gets a <literal>PGRES_FATAL_ERROR</> result.
+   </para>
+
+   <para>
+    A custom row processor is registered for a particular connection by
+    calling <function>PQsetRowProcessor</function>, described below.
+    This row processor will be used for all subsequent query results on that
+    connection until changed again.  A row processor function must have a
+    signature matching
+
+ <synopsis>
+ typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
+                                const char **errmsg, void *param);
+ </synopsis>
+    where <type>PGdataValue</> is described by
+ <synopsis>
+ typedef struct pgDataValue
+ {
+     int         len;            /* data length in bytes, or <0 if NULL */
+     const char *value;          /* data value, without zero-termination */
+ } PGdataValue;
+ </synopsis>
+   </para>
+
+   <para>
+    The <parameter>res</> parameter is the <literal>PGRES_TUPLES_OK</>
+    <type>PGresult</type> that will eventually be delivered to the calling
+    application (if no error intervenes).  It contains information about
+    the set of columns in the query result, but no row data.  In particular the
+    row processor must fetch <literal>PQnfields(res)</> to know the number of
+    data columns.
+   </para>
+
+   <para>
+    Immediately after <application>libpq</> has determined the result set's
+    column information, it will make a call to the row processor with
+    <parameter>columns</parameter> set to NULL, but the other parameters as
+    usual.  The row processor can use this call to initialize for a new result
+    set; if it has nothing to do, it can just return <literal>1</>.  In
+    subsequent calls, one per received row, <parameter>columns</parameter>
+    is non-NULL and points to an array of length/pointer structs, one per
+    data column.
+   </para>
+
+   <para>
+    <parameter>errmsg</parameter> is an output parameter used only for error
+    reporting.  If the row processor needs to report an error, it can set
+    <literal>*</><parameter>errmsg</parameter> to point to a suitable message
+    string (and then return <literal>-1</>).  As a special case, returning
+    <literal>-1</> without changing <literal>*</><parameter>errmsg</parameter>
+    from its initial value of NULL is taken to mean <quote>out of memory</>.
+   </para>
+
+   <para>
+    The last parameter, <parameter>param</parameter>, is just a void pointer
+    passed through from <function>PQsetRowProcessor</function>.  This can be
+    used for communication between the row processor function and the
+    surrounding application.
+   </para>
+
+   <para>
+    The row processor function must return one of three values.
+    <literal>1</> is the normal, successful result value; <application>libpq</>
+    will continue with receiving row values from the server and passing them to
+    the row processor.
+    <literal>-1</> indicates that the row processor has encountered an error.
+    <application>libpq</> will discard all remaining rows in the result set
+    and then return a <literal>PGRES_FATAL_ERROR</> <type>PGresult</type> to
+    the application (containing the specified error message, or <quote>out of
+    memory for query result</> if <literal>*</><parameter>errmsg</parameter>
+    was left as NULL).
+    <literal>0</> means that <application>libpq</> should suspend receiving the
+    query result and return a <literal>PGRES_SUSPENDED</> <type>PGresult</type>
+    to the application.  This option allows the surrounding application to deal
+    with the row value and then resume reading the query result.
+   </para>
+
+   <para>
+    In the <type>PGdataValue</> array passed to a row processor, data values
+    cannot be assumed to be zero-terminated, whether the data format is text
+    or binary.  A SQL NULL value is indicated by a negative length field.
+   </para>
+
+   <para>
+    The row processor <emphasis>must</> process the row data values
+    immediately, or else copy them into application-controlled storage.
+    The value pointers passed to the row processor point into
+    <application>libpq</>'s internal data input buffer, which will be
+    overwritten by the next packet fetch.  However, if the row processor
+    returns <literal>0</> to request suspension, the passed values will
+    still be valid when control returns to the calling application.
+   </para>
+
+   <para>
+    After a row processor requests suspension, <application>libpq</> returns a
+    <literal>PGRES_SUSPENDED</> <type>PGresult</type> to the application.
+    This <type>PGresult</type> contains no other information and can be dropped
+    immediately with <function>PQclear</>.  When the application is ready to
+    resume processing of the query result, it should call
+    <function>PQgetResult</>; this may result in another
+    <literal>PGRES_SUSPENDED</> result, etc, until the result set is completely
+    processed.  As with any usage of <function>PQgetResult</>, the application
+    should continue calling <function>PQgetResult</> until it gets a NULL
+    result before issuing any new query.
+   </para>
+
+   <para>
+    Another option for exiting a row processor is to throw an exception using
+    C's <function>longjmp()</> or C++'s <literal>throw</>.  If this is done,
+    the state of the <application>libpq</> connection is the same as if the row
+    processor had requested suspension, and the application can resume
+    processing with <function>PQgetResult</>.
+   </para>
+
+   <para>
+    In some cases, a suspension or exception may mean that the remainder of the
+    query result is not interesting.  In such cases the application can discard
+    the remaining rows with <function>PQskipResult</>, described below.
+    Another possible recovery option is to close the connection altogether with
+    <function>PQfinish</>.
+   </para>
+
+   <para>
+    <variablelist>
+     <varlistentry id="libpq-pqsetrowprocessor">
+      <term>
+       <function>PQsetRowProcessor</function>
+       <indexterm>
+        <primary>PQsetRowProcessor</primary>
+       </indexterm>
+      </term>
+
+      <listitem>
+       <para>
+        Sets a callback function to process each row.
+
+ <synopsis>
+ void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+ </synopsis>
+       </para>
+
+       <para>
+        The specified row processor function <parameter>func</> is installed as
+        the active row processor for the given connection <parameter>conn</>.
+        Also, <parameter>param</> is installed as the passthrough pointer to
+        pass to it.  Alternatively, if <parameter>func</> is NULL, the standard
+        row processor is reinstalled on the given connection (and
+        <parameter>param</> is ignored).
+       </para>
+
+       <para>
+        Although the row processor can be changed at any time in the life of a
+        connection, it's generally unwise to do so while a query is active.
+        In particular, when using asynchronous mode, be aware that both
+        <function>PQisBusy</> and <function>PQgetResult</> can call the current
+        row processor.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="libpq-pqgetrowprocessor">
+      <term>
+       <function>PQgetRowProcessor</function>
+       <indexterm>
+        <primary>PQgetRowProcessor</primary>
+       </indexterm>
+      </term>
+
+      <listitem>
+       <para>
+        Fetches the current row processor for the specified connection.
+
+ <synopsis>
+ PQrowProcessor PQgetRowProcessor(const PGconn *conn, void **param);
+ </synopsis>
+       </para>
+
+       <para>
+        In addition to returning the row processor function pointer, the
+        current passthrough pointer will be returned at
+        <literal>*</><parameter>param</>, if <parameter>param</> is not NULL.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="libpq-pqskipresult">
+      <term>
+       <function>PQskipResult</function>
+       <indexterm>
+        <primary>PQskipResult</primary>
+       </indexterm>
+      </term>
+
+      <listitem>
+       <para>
+        Discard all the remaining rows in the incoming result set.
+
+ <synopsis>
+ PGresult *PQskipResult(PGconn *conn);
+ </synopsis>
+       </para>
+
+       <para>
+        This is a simple convenience function to discard incoming data after a
+        row processor has failed or it's determined that the rest of the result
+        set is not interesting.  <function>PQskipResult</> is exactly
+        equivalent to <function>PQgetResult</> except that it transiently
+        installs a dummy row processor function that just discards data.
+        The returned <type>PGresult</> can be discarded without further ado
+        if it has status <literal>PGRES_TUPLES_OK</>; but other status values
+        should be handled normally.  (In particular,
+        <literal>PGRES_FATAL_ERROR</> indicates a server-reported error that
+        will still need to be dealt with.)
+        As when using <function>PQgetResult</>, one should usually repeat the
+        call until NULL is returned to ensure the connection has reached an
+        idle state.  Another possible usage is to call
+        <function>PQskipResult</> just once, and then resume using
+        <function>PQgetResult</> to process subsequent result sets normally.
+       </para>
+
+       <para>
+        Because <function>PQskipResult</> will wait for server input, it is not
+        very useful in asynchronous applications.  In particular you should not
+        code a loop of <function>PQisBusy</> and <function>PQskipResult</>,
+        because that will result in the normal row processor being called
+        within <function>PQisBusy</>.  To get the proper behavior in an
+        asynchronous application, you'll need to install a dummy row processor
+        (or set a flag to make your normal row processor do nothing) and leave
+        it there until you have discarded all incoming data via your normal
+        <function>PQisBusy</> and <function>PQgetResult</> loop.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+
+  </sect1>
+
   <sect1 id="libpq-events">
    <title>Event System</title>

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 715e23167de82709da99b8c8afa58d81935eb25c..33dc97e95f249576165adb28d7c59b556a2ea08b 100644
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*************** AcceptResult(const PGresult *result)
*** 450,456 ****
              case PGRES_EMPTY_QUERY:
              case PGRES_COPY_IN:
              case PGRES_COPY_OUT:
-             case PGRES_COPY_BOTH:
                  /* Fine, do nothing */
                  OK = true;
                  break;
--- 450,455 ----
*************** ProcessResult(PGresult **results)
*** 673,701 ****
          result_status = PQresultStatus(*results);
          switch (result_status)
          {
-             case PGRES_COPY_BOTH:
-                 /*
-                  * No now-existing SQL command can yield PGRES_COPY_BOTH, but
-                  * defend against the future.  PQexec() can't short-circuit
-                  * it's way out of a PGRES_COPY_BOTH, so the connection will
-                  * be useless at this point.  XXX is there a method for
-                  * clearing this status that's likely to work with every
-                  * future command that can initiate it?
-                  */
-                 psql_error("unexpected PQresultStatus (%d)", result_status);
-                 return false;
-
-             case PGRES_COPY_OUT:
-             case PGRES_COPY_IN:
-                 is_copy = true;
-                 break;
-
              case PGRES_EMPTY_QUERY:
              case PGRES_COMMAND_OK:
              case PGRES_TUPLES_OK:
                  is_copy = false;
                  break;

              default:
                  /* AcceptResult() should have caught anything else. */
                  is_copy = false;
--- 672,688 ----
          result_status = PQresultStatus(*results);
          switch (result_status)
          {
              case PGRES_EMPTY_QUERY:
              case PGRES_COMMAND_OK:
              case PGRES_TUPLES_OK:
                  is_copy = false;
                  break;

+             case PGRES_COPY_OUT:
+             case PGRES_COPY_IN:
+                 is_copy = true;
+                 break;
+
              default:
                  /* AcceptResult() should have caught anything else. */
                  is_copy = false;
*************** PrintQueryResults(PGresult *results)
*** 817,823 ****

          case PGRES_COPY_OUT:
          case PGRES_COPY_IN:
-         case PGRES_COPY_BOTH:
              /* nothing to do here */
              success = true;
              break;
--- 804,809 ----
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df699edf641ef4defecbdd375d7ff4b3a515..1251455f1f6d92c74e206b5a9b8dcdeb36ac9b98 100644
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
*************** PQconnectStartParams      157
*** 160,162 ****
--- 160,165 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
+ PQsetRowProcessor         161
+ PQgetRowProcessor         162
+ PQskipResult              163
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 6a20a1485d17b6fdea79cb922ff26c78335388a9..03fd6e45bb9a9b996f39b6c2625643ac8a288ebe 100644
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*************** keep_going:                        /* We will come back to
*** 2425,2431 ****
                      conn->status = CONNECTION_AUTH_OK;

                      /*
!                      * Set asyncStatus so that PQsetResult will think that
                       * what comes back next is the result of a query.  See
                       * below.
                       */
--- 2425,2431 ----
                      conn->status = CONNECTION_AUTH_OK;

                      /*
!                      * Set asyncStatus so that PQgetResult will think that
                       * what comes back next is the result of a query.  See
                       * below.
                       */
*************** makeEmptyPGconn(void)
*** 2686,2693 ****
--- 2686,2696 ----
      /* Zero all pointers and booleans */
      MemSet(conn, 0, sizeof(PGconn));

+     /* install default row processor and notice hooks */
+     PQsetRowProcessor(conn, NULL, NULL);
      conn->noticeHooks.noticeRec = defaultNoticeReceiver;
      conn->noticeHooks.noticeProc = defaultNoticeProcessor;
+
      conn->status = CONNECTION_BAD;
      conn->asyncStatus = PGASYNC_IDLE;
      conn->xactStatus = PQTRANS_IDLE;
*************** makeEmptyPGconn(void)
*** 2721,2731 ****
--- 2724,2737 ----
      conn->inBuffer = (char *) malloc(conn->inBufSize);
      conn->outBufSize = 16 * 1024;
      conn->outBuffer = (char *) malloc(conn->outBufSize);
+     conn->rowBufLen = 32;
+     conn->rowBuf = (PGdataValue *) malloc(conn->rowBufLen * sizeof(PGdataValue));
      initPQExpBuffer(&conn->errorMessage);
      initPQExpBuffer(&conn->workBuffer);

      if (conn->inBuffer == NULL ||
          conn->outBuffer == NULL ||
+         conn->rowBuf == NULL ||
          PQExpBufferBroken(&conn->errorMessage) ||
          PQExpBufferBroken(&conn->workBuffer))
      {
*************** freePGconn(PGconn *conn)
*** 2829,2834 ****
--- 2835,2842 ----
          free(conn->inBuffer);
      if (conn->outBuffer)
          free(conn->outBuffer);
+     if (conn->rowBuf)
+         free(conn->rowBuf);
      termPQExpBuffer(&conn->errorMessage);
      termPQExpBuffer(&conn->workBuffer);

*************** closePGconn(PGconn *conn)
*** 2888,2894 ****
      conn->status = CONNECTION_BAD;        /* Well, not really _bad_ - just
                                           * absent */
      conn->asyncStatus = PGASYNC_IDLE;
!     pqClearAsyncResult(conn);    /* deallocate result and curTuple */
      pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
      conn->addrlist = NULL;
      conn->addr_cur = NULL;
--- 2896,2902 ----
      conn->status = CONNECTION_BAD;        /* Well, not really _bad_ - just
                                           * absent */
      conn->asyncStatus = PGASYNC_IDLE;
!     pqClearAsyncResult(conn);    /* deallocate result */
      pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
      conn->addrlist = NULL;
      conn->addr_cur = NULL;
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566a5ddda2e9f30a4148bfcac78ed898fd84..b79cd7e45d6ff63c6beb85232ffbdcc32e7397df 100644
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
*************** char       *const pgresStatus[] = {
*** 38,44 ****
      "PGRES_BAD_RESPONSE",
      "PGRES_NONFATAL_ERROR",
      "PGRES_FATAL_ERROR",
!     "PGRES_COPY_BOTH"
  };

  /*
--- 38,45 ----
      "PGRES_BAD_RESPONSE",
      "PGRES_NONFATAL_ERROR",
      "PGRES_FATAL_ERROR",
!     "PGRES_COPY_BOTH",
!     "PGRES_SUSPENDED"
  };

  /*
*************** static bool static_std_strings = false;
*** 50,55 ****
--- 51,59 ----


  static PGEvent *dupEvents(PGEvent *events, int count);
+ static bool pqAddTuple(PGresult *res, PGresAttValue *tup);
+ static int pqStdRowProcessor(PGresult *res, const PGdataValue *columns,
+                   const char **errmsg, void *param);
  static bool PQsendQueryStart(PGconn *conn);
  static int PQsendQueryGuts(PGconn *conn,
                  const char *command,
*************** static int PQsendQueryGuts(PGconn *conn,
*** 61,66 ****
--- 65,72 ----
                  const int *paramFormats,
                  int resultFormat);
  static void parseInput(PGconn *conn);
+ static int dummyRowProcessor(PGresult *res, const PGdataValue *columns,
+                   const char **errmsg, void *param);
  static bool PQexecStart(PGconn *conn);
  static PGresult *PQexecFinish(PGconn *conn);
  static int PQsendDescribe(PGconn *conn, char desc_type,
*************** PQmakeEmptyPGresult(PGconn *conn, ExecSt
*** 176,181 ****
--- 182,188 ----
              case PGRES_COPY_OUT:
              case PGRES_COPY_IN:
              case PGRES_COPY_BOTH:
+             case PGRES_SUSPENDED:
                  /* non-error cases */
                  break;
              default:
*************** PQclear(PGresult *res)
*** 694,707 ****
  /*
   * Handy subroutine to deallocate any partially constructed async result.
   */
-
  void
  pqClearAsyncResult(PGconn *conn)
  {
      if (conn->result)
          PQclear(conn->result);
      conn->result = NULL;
-     conn->curTuple = NULL;
  }

  /*
--- 701,712 ----
*************** pqPrepareAsyncResult(PGconn *conn)
*** 756,762 ****
       */
      res = conn->result;
      conn->result = NULL;        /* handing over ownership to caller */
-     conn->curTuple = NULL;        /* just in case */
      if (!res)
          res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
      else
--- 761,766 ----
*************** pqInternalNotice(const PGNoticeHooks *ho
*** 832,838 ****
   *      add a row pointer to the PGresult structure, growing it if necessary
   *      Returns TRUE if OK, FALSE if not enough memory to add the row
   */
! int
  pqAddTuple(PGresult *res, PGresAttValue *tup)
  {
      if (res->ntups >= res->tupArrSize)
--- 836,842 ----
   *      add a row pointer to the PGresult structure, growing it if necessary
   *      Returns TRUE if OK, FALSE if not enough memory to add the row
   */
! static bool
  pqAddTuple(PGresult *res, PGresAttValue *tup)
  {
      if (res->ntups >= res->tupArrSize)
*************** pqSaveParameterStatus(PGconn *conn, cons
*** 979,984 ****
--- 983,1106 ----


  /*
+  * PQsetRowProcessor
+  *      Set function that copies row data out from the network buffer,
+  *      along with a passthrough parameter for it.
+  */
+ void
+ PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+ {
+     if (!conn)
+         return;
+
+     if (func)
+     {
+         /* set custom row processor */
+         conn->rowProcessor = func;
+         conn->rowProcessorParam = param;
+     }
+     else
+     {
+         /* set default row processor */
+         conn->rowProcessor = pqStdRowProcessor;
+         conn->rowProcessorParam = conn;
+     }
+ }
+
+ /*
+  * PQgetRowProcessor
+  *      Get current row processor of PGconn.
+  *      If param is not NULL, also store the passthrough parameter at *param.
+  */
+ PQrowProcessor
+ PQgetRowProcessor(const PGconn *conn, void **param)
+ {
+     if (!conn)
+     {
+         if (param)
+             *param = NULL;
+         return NULL;
+     }
+
+     if (param)
+         *param = conn->rowProcessorParam;
+     return conn->rowProcessor;
+ }
+
+ /*
+  * pqStdRowProcessor
+  *      Add the received row to the PGresult structure
+  *      Returns 1 if OK, -1 if error occurred.
+  *
+  * Note: "param" should point to the PGconn, but we don't actually need that
+  * as of the current coding.
+  */
+ static int
+ pqStdRowProcessor(PGresult *res, const PGdataValue *columns,
+                   const char **errmsg, void *param)
+ {
+     int            nfields = res->numAttributes;
+     PGresAttValue *tup;
+     int            i;
+
+     if (columns == NULL)
+     {
+         /* New result set ... we have nothing to do in this function. */
+         return 1;
+     }
+
+     /*
+      * Basically we just allocate space in the PGresult for each field and
+      * copy the data over.
+      *
+      * Note: on malloc failure, we return -1 leaving *errmsg still NULL, which
+      * caller will take to mean "out of memory".  This is preferable to trying
+      * to set up such a message here, because evidently there's not enough
+      * memory for gettext() to do anything.
+      */
+     tup = (PGresAttValue *)
+         pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+     if (tup == NULL)
+         return -1;
+
+     for (i = 0; i < nfields; i++)
+     {
+         int        clen = columns[i].len;
+
+         if (clen < 0)
+         {
+             /* null field */
+             tup[i].len = NULL_LEN;
+             tup[i].value = res->null_field;
+         }
+         else
+         {
+             bool        isbinary = (res->attDescs[i].format != 0);
+             char       *val;
+
+             val = (char *) pqResultAlloc(res, clen + 1, isbinary);
+             if (val == NULL)
+                 return -1;
+
+             /* copy and zero-terminate the data (even if it's binary) */
+             memcpy(val, columns[i].value, clen);
+             val[clen] = '\0';
+
+             tup[i].len = clen;
+             tup[i].value = val;
+         }
+     }
+
+     /* And add the tuple to the PGresult's tuple array */
+     if (!pqAddTuple(res, tup))
+         return -1;
+
+     /* Success */
+     return 1;
+ }
+
+
+ /*
   * PQsendQuery
   *     Submit a query, but don't wait for it to finish
   *
*************** PQsendQueryStart(PGconn *conn)
*** 1223,1229 ****

      /* initialize async result-accumulation state */
      conn->result = NULL;
-     conn->curTuple = NULL;

      /* ready to send command message */
      return true;
--- 1345,1350 ----
*************** PQconsumeInput(PGconn *conn)
*** 1468,1473 ****
--- 1589,1597 ----
   * parseInput: if appropriate, parse input data from backend
   * until input is exhausted or a stopping state is reached.
   * Note that this function will NOT attempt to read more data from the backend.
+  *
+  * Note: callers of parseInput must be prepared for a longjmp exit when we are
+  * in PGASYNC_BUSY state, since an external row processor might do that.
   */
  static void
  parseInput(PGconn *conn)
*************** PQgetResult(PGconn *conn)
*** 1562,1567 ****
--- 1686,1697 ----
              /* Set the state back to BUSY, allowing parsing to proceed. */
              conn->asyncStatus = PGASYNC_BUSY;
              break;
+         case PGASYNC_SUSPENDED:
+             /* Note we do not touch the async result here */
+             res = PQmakeEmptyPGresult(conn, PGRES_SUSPENDED);
+             /* Set the state back to BUSY, allowing parsing to proceed. */
+             conn->asyncStatus = PGASYNC_BUSY;
+             break;
          case PGASYNC_COPY_IN:
              if (conn->result && conn->result->resultStatus == PGRES_COPY_IN)
                  res = pqPrepareAsyncResult(conn);
*************** PQgetResult(PGconn *conn)
*** 1615,1620 ****
--- 1745,1792 ----
      return res;
  }

+ /*
+  * PQskipResult
+  *      Get the next PGresult produced by a query, but discard any data rows.
+  *
+  * This is mainly useful for cleaning up after deciding to abandon a suspended
+  * query.  Note that it's of little value in an async-style application, since
+  * any preceding calls to PQisBusy would have already called the regular row
+  * processor.
+  */
+ PGresult *
+ PQskipResult(PGconn *conn)
+ {
+     PGresult   *res;
+     PQrowProcessor savedRowProcessor;
+
+     if (!conn)
+         return NULL;
+
+     /* temporarily install dummy row processor */
+     savedRowProcessor = conn->rowProcessor;
+     conn->rowProcessor = dummyRowProcessor;
+     /* no need to save/change rowProcessorParam */
+
+     /* fetch the next result */
+     res = PQgetResult(conn);
+
+     /* restore previous row processor */
+     conn->rowProcessor = savedRowProcessor;
+
+     return res;
+ }
+
+ /*
+  * Do-nothing row processor for PQskipResult
+  */
+ static int
+ dummyRowProcessor(PGresult *res, const PGdataValue *columns,
+                   const char **errmsg, void *param)
+ {
+     return 1;
+ }
+

  /*
   * PQexec
*************** PQexecStart(PGconn *conn)
*** 1721,1727 ****
       * Silently discard any prior query result that application didn't eat.
       * This is probably poor design, but it's here for backward compatibility.
       */
!     while ((result = PQgetResult(conn)) != NULL)
      {
          ExecStatusType resultStatus = result->resultStatus;

--- 1893,1899 ----
       * Silently discard any prior query result that application didn't eat.
       * This is probably poor design, but it's here for backward compatibility.
       */
!     while ((result = PQskipResult(conn)) != NULL)
      {
          ExecStatusType resultStatus = result->resultStatus;

diff --git a/src/interfaces/libpq/fe-lobj.c b/src/interfaces/libpq/fe-lobj.c
index 29752270a1d8072fabc73eea0e64383d32b12402..13fd98c2f913d3818758e75bac96822306981b52 100644
*** a/src/interfaces/libpq/fe-lobj.c
--- b/src/interfaces/libpq/fe-lobj.c
***************
*** 40,48 ****
  #define LO_BUFSIZE          8192

  static int    lo_initialize(PGconn *conn);
!
! static Oid
!             lo_import_internal(PGconn *conn, const char *filename, const Oid oid);

  /*
   * lo_open
--- 40,46 ----
  #define LO_BUFSIZE          8192

  static int    lo_initialize(PGconn *conn);
! static Oid    lo_import_internal(PGconn *conn, const char *filename, Oid oid);

  /*
   * lo_open
*************** lo_open(PGconn *conn, Oid lobjId, int mo
*** 59,65 ****
      PQArgBlock    argv[2];
      PGresult   *res;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 57,63 ----
      PQArgBlock    argv[2];
      PGresult   *res;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_close(PGconn *conn, int fd)
*** 101,107 ****
      int            retval;
      int            result_len;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 99,105 ----
      int            retval;
      int            result_len;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_truncate(PGconn *conn, int fd, size_t
*** 139,145 ****
      int            retval;
      int            result_len;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 137,143 ----
      int            retval;
      int            result_len;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_read(PGconn *conn, int fd, char *buf,
*** 192,198 ****
      PGresult   *res;
      int            result_len;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 190,196 ----
      PGresult   *res;
      int            result_len;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_write(PGconn *conn, int fd, const cha
*** 234,240 ****
      int            result_len;
      int            retval;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 232,238 ----
      int            result_len;
      int            retval;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_lseek(PGconn *conn, int fd, int offse
*** 280,286 ****
      int            retval;
      int            result_len;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 278,284 ----
      int            retval;
      int            result_len;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_creat(PGconn *conn, int mode)
*** 328,334 ****
      int            retval;
      int            result_len;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return InvalidOid;
--- 326,332 ----
      int            retval;
      int            result_len;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return InvalidOid;
*************** lo_create(PGconn *conn, Oid lobjId)
*** 367,373 ****
      int            retval;
      int            result_len;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return InvalidOid;
--- 365,371 ----
      int            retval;
      int            result_len;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return InvalidOid;
*************** lo_tell(PGconn *conn, int fd)
*** 413,419 ****
      PGresult   *res;
      int            result_len;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 411,417 ----
      PGresult   *res;
      int            result_len;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_unlink(PGconn *conn, Oid lobjId)
*** 451,457 ****
      int            result_len;
      int            retval;

!     if (conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
--- 449,455 ----
      int            result_len;
      int            retval;

!     if (conn == NULL || conn->lobjfuncs == NULL)
      {
          if (lo_initialize(conn) < 0)
              return -1;
*************** lo_import_with_oid(PGconn *conn, const c
*** 505,511 ****
  }

  static Oid
! lo_import_internal(PGconn *conn, const char *filename, const Oid oid)
  {
      int            fd;
      int            nbytes,
--- 503,509 ----
  }

  static Oid
! lo_import_internal(PGconn *conn, const char *filename, Oid oid)
  {
      int            fd;
      int            nbytes,
*************** lo_initialize(PGconn *conn)
*** 684,691 ****
--- 682,694 ----
      int            n;
      const char *query;
      const char *fname;
+     PQrowProcessor savedRowProcessor;
+     void       *savedRowProcessorParam;
      Oid            foid;

+     if (!conn)
+         return -1;
+
      /*
       * Allocate the structure to hold the functions OID's
       */
*************** lo_initialize(PGconn *conn)
*** 729,735 ****
--- 732,747 ----
              "or proname = 'loread' "
              "or proname = 'lowrite'";

+     /* Ensure the standard row processor is used to collect the result */
+     savedRowProcessor = conn->rowProcessor;
+     savedRowProcessorParam = conn->rowProcessorParam;
+     PQsetRowProcessor(conn, NULL, NULL);
+
      res = PQexec(conn, query);
+
+     conn->rowProcessor = savedRowProcessor;
+     conn->rowProcessorParam = savedRowProcessorParam;
+
      if (res == NULL)
      {
          free(lobjfuncs);
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3f712eab7f2974619f930494c941783874..b5e5519c416aa767921a5549f7947f80e4cf8250 100644
*** a/src/interfaces/libpq/fe-misc.c
--- b/src/interfaces/libpq/fe-misc.c
*************** pqGetnchar(char *s, size_t len, PGconn *
*** 219,224 ****
--- 219,250 ----
  }

  /*
+  * pqSkipnchar:
+  *    skip over len bytes in input buffer.
+  *
+  * Note: this is primarily useful for its debug output, which should
+  * be exactly the same as for pqGetnchar.  We assume the data in question
+  * will actually be used, but just isn't getting copied anywhere as yet.
+  */
+ int
+ pqSkipnchar(size_t len, PGconn *conn)
+ {
+     if (len > (size_t) (conn->inEnd - conn->inCursor))
+         return EOF;
+
+     if (conn->Pfdebug)
+     {
+         fprintf(conn->Pfdebug, "From backend (%lu)> ", (unsigned long) len);
+         fputnbytes(conn->Pfdebug, conn->inBuffer + conn->inCursor, len);
+         fprintf(conn->Pfdebug, "\n");
+     }
+
+     conn->inCursor += len;
+
+     return 0;
+ }
+
+ /*
   * pqPutnchar:
   *    write exactly len bytes to the current message
   */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c38993b8b69f23e7c33f1c41158b1ff9a7d0d8..ec1b2d71bd92bbc1314a410062ec6470d07460cf 100644
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
*************** static int    getNotify(PGconn *conn);
*** 49,59 ****
--- 49,67 ----
  PostgresPollingStatusType
  pqSetenvPoll(PGconn *conn)
  {
+     PostgresPollingStatusType result;
      PGresult   *res;
+     PQrowProcessor savedRowProcessor;
+     void       *savedRowProcessorParam;

      if (conn == NULL || conn->status == CONNECTION_BAD)
          return PGRES_POLLING_FAILED;

+     /* Ensure the standard row processor is used to collect any results */
+     savedRowProcessor = conn->rowProcessor;
+     savedRowProcessorParam = conn->rowProcessorParam;
+     PQsetRowProcessor(conn, NULL, NULL);
+
      /* Check whether there are any data for us */
      switch (conn->setenv_state)
      {
*************** pqSetenvPoll(PGconn *conn)
*** 69,75 ****
                  if (n < 0)
                      goto error_return;
                  if (n == 0)
!                     return PGRES_POLLING_READING;

                  break;
              }
--- 77,86 ----
                  if (n < 0)
                      goto error_return;
                  if (n == 0)
!                 {
!                     result = PGRES_POLLING_READING;
!                     goto normal_return;
!                 }

                  break;
              }
*************** pqSetenvPoll(PGconn *conn)
*** 83,89 ****

              /* Should we raise an error if called when not active? */
          case SETENV_STATE_IDLE:
!             return PGRES_POLLING_OK;

          default:
              printfPQExpBuffer(&conn->errorMessage,
--- 94,101 ----

              /* Should we raise an error if called when not active? */
          case SETENV_STATE_IDLE:
!             result = PGRES_POLLING_OK;
!             goto normal_return;

          default:
              printfPQExpBuffer(&conn->errorMessage,
*************** pqSetenvPoll(PGconn *conn)
*** 180,186 ****
              case SETENV_STATE_CLIENT_ENCODING_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

--- 192,201 ----
              case SETENV_STATE_CLIENT_ENCODING_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 205,211 ****
              case SETENV_STATE_OPTION_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

--- 220,229 ----
              case SETENV_STATE_OPTION_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 244,256 ****
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY1_WAIT;
!                     return PGRES_POLLING_READING;
                  }

              case SETENV_STATE_QUERY1_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

--- 262,278 ----
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY1_WAIT;
!                     result = PGRES_POLLING_READING;
!                     goto normal_return;
                  }

              case SETENV_STATE_QUERY1_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 327,339 ****
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY2_WAIT;
!                     return PGRES_POLLING_READING;
                  }

              case SETENV_STATE_QUERY2_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

--- 349,365 ----
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY2_WAIT;
!                     result = PGRES_POLLING_READING;
!                     goto normal_return;
                  }

              case SETENV_STATE_QUERY2_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 380,386 ****
                      {
                          /* Query finished, so we're done */
                          conn->setenv_state = SETENV_STATE_IDLE;
!                         return PGRES_POLLING_OK;
                      }
                      break;
                  }
--- 406,413 ----
                      {
                          /* Query finished, so we're done */
                          conn->setenv_state = SETENV_STATE_IDLE;
!                         result = PGRES_POLLING_OK;
!                         goto normal_return;
                      }
                      break;
                  }
*************** pqSetenvPoll(PGconn *conn)
*** 398,404 ****

  error_return:
      conn->setenv_state = SETENV_STATE_IDLE;
!     return PGRES_POLLING_FAILED;
  }


--- 425,436 ----

  error_return:
      conn->setenv_state = SETENV_STATE_IDLE;
!     result = PGRES_POLLING_FAILED;
!
! normal_return:
!     conn->rowProcessor = savedRowProcessor;
!     conn->rowProcessorParam = savedRowProcessorParam;
!     return result;
  }


*************** error_return:
*** 406,411 ****
--- 438,446 ----
   * parseInput: if appropriate, parse input data from backend
   * until input is exhausted or a stopping state is reached.
   * Note that this function will NOT attempt to read more data from the backend.
+  *
+  * Note: callers of parseInput must be prepared for a longjmp exit when we are
+  * in PGASYNC_BUSY state, since an external row processor might do that.
   */
  void
  pqParseInput2(PGconn *conn)
*************** pqParseInput2(PGconn *conn)
*** 549,554 ****
--- 584,591 ----
                          /* First 'T' in a query sequence */
                          if (getRowDescriptions(conn))
                              return;
+                         /* getRowDescriptions() moves inStart itself */
+                         continue;
                      }
                      else
                      {
*************** pqParseInput2(PGconn *conn)
*** 569,574 ****
--- 606,613 ----
                          /* Read another tuple of a normal query response */
                          if (getAnotherTuple(conn, FALSE))
                              return;
+                         /* getAnotherTuple() moves inStart itself */
+                         continue;
                      }
                      else
                      {
*************** pqParseInput2(PGconn *conn)
*** 585,590 ****
--- 624,631 ----
                          /* Read another tuple of a normal query response */
                          if (getAnotherTuple(conn, TRUE))
                              return;
+                         /* getAnotherTuple() moves inStart itself */
+                         continue;
                      }
                      else
                      {
*************** pqParseInput2(PGconn *conn)
*** 627,653 ****
  /*
   * parseInput subroutine to read a 'T' (row descriptions) message.
   * We build a PGresult structure containing the attribute data.
!  * Returns: 0 if completed message, EOF if not enough data yet.
   *
!  * Note that if we run out of data, we have to release the partially
!  * constructed PGresult, and rebuild it again next time.  Fortunately,
!  * that shouldn't happen often, since 'T' messages usually fit in a packet.
   */
  static int
  getRowDescriptions(PGconn *conn)
  {
!     PGresult   *result = NULL;
      int            nfields;
      int            i;

      result = PQmakeEmptyPGresult(conn, PGRES_TUPLES_OK);
      if (!result)
!         goto failure;

      /* parseInput already read the 'T' label. */
      /* the next two bytes are the number of fields    */
      if (pqGetInt(&(result->numAttributes), 2, conn))
!         goto failure;
      nfields = result->numAttributes;

      /* allocate space for the attribute descriptors */
--- 668,699 ----
  /*
   * parseInput subroutine to read a 'T' (row descriptions) message.
   * We build a PGresult structure containing the attribute data.
!  * Returns: 0 if completed message, EOF if error, suspend, or not enough
!  * data received yet.
   *
!  * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  Otherwise, conn->inStart
!  * must get advanced past the processed data.
   */
  static int
  getRowDescriptions(PGconn *conn)
  {
!     PGresult   *result;
      int            nfields;
+     const char *errmsg;
      int            i;

      result = PQmakeEmptyPGresult(conn, PGRES_TUPLES_OK);
      if (!result)
!     {
!         errmsg = NULL;            /* means "out of memory", see below */
!         goto advance_and_error;
!     }

      /* parseInput already read the 'T' label. */
      /* the next two bytes are the number of fields    */
      if (pqGetInt(&(result->numAttributes), 2, conn))
!         goto EOFexit;
      nfields = result->numAttributes;

      /* allocate space for the attribute descriptors */
*************** getRowDescriptions(PGconn *conn)
*** 656,662 ****
          result->attDescs = (PGresAttDesc *)
              pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
          if (!result->attDescs)
!             goto failure;
          MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
      }

--- 702,711 ----
          result->attDescs = (PGresAttDesc *)
              pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
          if (!result->attDescs)
!         {
!             errmsg = NULL;        /* means "out of memory", see below */
!             goto advance_and_error;
!         }
          MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
      }

*************** getRowDescriptions(PGconn *conn)
*** 671,677 ****
              pqGetInt(&typid, 4, conn) ||
              pqGetInt(&typlen, 2, conn) ||
              pqGetInt(&atttypmod, 4, conn))
!             goto failure;

          /*
           * Since pqGetInt treats 2-byte integers as unsigned, we need to
--- 720,726 ----
              pqGetInt(&typid, 4, conn) ||
              pqGetInt(&typlen, 2, conn) ||
              pqGetInt(&atttypmod, 4, conn))
!             goto EOFexit;

          /*
           * Since pqGetInt treats 2-byte integers as unsigned, we need to
*************** getRowDescriptions(PGconn *conn)
*** 682,688 ****
          result->attDescs[i].name = pqResultStrdup(result,
                                                    conn->workBuffer.data);
          if (!result->attDescs[i].name)
!             goto failure;
          result->attDescs[i].tableid = 0;
          result->attDescs[i].columnid = 0;
          result->attDescs[i].format = 0;
--- 731,740 ----
          result->attDescs[i].name = pqResultStrdup(result,
                                                    conn->workBuffer.data);
          if (!result->attDescs[i].name)
!         {
!             errmsg = NULL;        /* means "out of memory", see below */
!             goto advance_and_error;
!         }
          result->attDescs[i].tableid = 0;
          result->attDescs[i].columnid = 0;
          result->attDescs[i].format = 0;
*************** getRowDescriptions(PGconn *conn)
*** 693,722 ****

      /* Success! */
      conn->result = result;
-     return 0;

! failure:
!     if (result)
          PQclear(result);
      return EOF;
  }

  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
!  * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
      PGresult   *result = conn->result;
      int            nfields = result->numAttributes;
!     PGresAttValue *tup;
!
      /* the backend sends us a bitmap of which attributes are null */
      char        std_bitmap[64]; /* used unless it doesn't fit */
      char       *bitmap = std_bitmap;
--- 745,839 ----

      /* Success! */
      conn->result = result;

!     /*
!      * Advance inStart to show that the "T" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
!     conn->inStart = conn->inCursor;
!
!     /* Give the row processor a chance to initialize for new result set */
!     errmsg = NULL;
!     switch ((*conn->rowProcessor) (result, NULL, &errmsg,
!                                    conn->rowProcessorParam))
!     {
!         case 1:
!             /* everything is good */
!             return 0;
!
!         case 0:
!             /* row processor requested suspension */
!             conn->asyncStatus = PGASYNC_SUSPENDED;
!             return EOF;
!
!         case -1:
!             /* error, report the errmsg below */
!             break;
!
!         default:
!             /* unrecognized return code */
!             errmsg = libpq_gettext("unrecognized return value from row processor");
!             break;
!     }
!     goto set_error_result;
!
! advance_and_error:
!     /*
!      * Discard the failed message.  Unfortunately we don't know for sure
!      * where the end is, so just throw away everything in the input buffer.
!      * This is not very desirable but it's the best we can do in protocol v2.
!      */
!     conn->inStart = conn->inEnd;
!
! set_error_result:
!
!     /*
!      * Replace partially constructed result with an error result. First
!      * discard the old result to try to win back some memory.
!      */
!     pqClearAsyncResult(conn);
!
!     /*
!      * If row processor didn't provide an error message, assume "out of
!      * memory" was meant.  The advantage of having this special case is that
!      * freeing the old result first greatly improves the odds that gettext()
!      * will succeed in providing a translation.
!      */
!     if (!errmsg)
!         errmsg = libpq_gettext("out of memory for query result");
!
!     printfPQExpBuffer(&conn->errorMessage, "%s\n", errmsg);
!
!     /*
!      * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
!      * do to recover...
!      */
!     conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
!     conn->asyncStatus = PGASYNC_READY;
!
! EOFexit:
!     if (result && result != conn->result)
          PQclear(result);
      return EOF;
  }

  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * We fill rowbuf with column pointers and then call the row processor.
!  * Returns: 0 if completed message, EOF if error, suspend, or not enough
!  * data received yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  Otherwise, conn->inStart
!  * must get advanced past the processed data.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
      PGresult   *result = conn->result;
      int            nfields = result->numAttributes;
!     const char *errmsg;
!     PGdataValue *rowbuf;
      /* the backend sends us a bitmap of which attributes are null */
      char        std_bitmap[64]; /* used unless it doesn't fit */
      char       *bitmap = std_bitmap;
*************** getAnotherTuple(PGconn *conn, bool binar
*** 727,754 ****
      int            bitcnt;            /* number of bits examined in current byte */
      int            vlen;            /* length of the current field value */

!     result->binary = binary;
!
!     /* Allocate tuple space if first time for this data message */
!     if (conn->curTuple == NULL)
      {
!         conn->curTuple = (PGresAttValue *)
!             pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
!         if (conn->curTuple == NULL)
!             goto outOfMemory;
!         MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
!
!         /*
!          * If it's binary, fix the column format indicators.  We assume the
!          * backend will consistently send either B or D, not a mix.
!          */
!         if (binary)
          {
!             for (i = 0; i < nfields; i++)
!                 result->attDescs[i].format = 1;
          }
      }
-     tup = conn->curTuple;

      /* Get the null-value bitmap */
      nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
--- 844,876 ----
      int            bitcnt;            /* number of bits examined in current byte */
      int            vlen;            /* length of the current field value */

!     /* Resize row buffer if needed */
!     rowbuf = conn->rowBuf;
!     if (nfields > conn->rowBufLen)
      {
!         rowbuf = (PGdataValue *) realloc(rowbuf,
!                                          nfields * sizeof(PGdataValue));
!         if (!rowbuf)
          {
!             errmsg = NULL;        /* means "out of memory", see below */
!             goto advance_and_error;
          }
+         conn->rowBuf = rowbuf;
+         conn->rowBufLen = nfields;
+     }
+
+     /* Save format specifier */
+     result->binary = binary;
+
+     /*
+      * If it's binary, fix the column format indicators.  We assume the
+      * backend will consistently send either B or D, not a mix.
+      */
+     if (binary)
+     {
+         for (i = 0; i < nfields; i++)
+             result->attDescs[i].format = 1;
      }

      /* Get the null-value bitmap */
      nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
*************** getAnotherTuple(PGconn *conn, bool binar
*** 757,763 ****
      {
          bitmap = (char *) malloc(nbytes);
          if (!bitmap)
!             goto outOfMemory;
      }

      if (pqGetnchar(bitmap, nbytes, conn))
--- 879,888 ----
      {
          bitmap = (char *) malloc(nbytes);
          if (!bitmap)
!         {
!             errmsg = NULL;        /* means "out of memory", see below */
!             goto advance_and_error;
!         }
      }

      if (pqGetnchar(bitmap, nbytes, conn))
*************** getAnotherTuple(PGconn *conn, bool binar
*** 770,804 ****

      for (i = 0; i < nfields; i++)
      {
          if (!(bmap & 0200))
!         {
!             /* if the field value is absent, make it a null string */
!             tup[i].value = result->null_field;
!             tup[i].len = NULL_LEN;
!         }
          else
          {
-             /* get the value length (the first four bytes are for length) */
-             if (pqGetInt(&vlen, 4, conn))
-                 goto EOFexit;
              if (!binary)
                  vlen = vlen - 4;
              if (vlen < 0)
                  vlen = 0;
-             if (tup[i].value == NULL)
-             {
-                 tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-                 if (tup[i].value == NULL)
-                     goto outOfMemory;
-             }
-             tup[i].len = vlen;
-             /* read in the value */
-             if (vlen > 0)
-                 if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-                     goto EOFexit;
-             /* we have to terminate this ourselves */
-             tup[i].value[vlen] = '\0';
          }
          /* advance the bitmap stuff */
          bitcnt++;
          if (bitcnt == BITS_PER_BYTE)
--- 895,928 ----

      for (i = 0; i < nfields; i++)
      {
+         /* get the value length */
          if (!(bmap & 0200))
!             vlen = NULL_LEN;
!         else if (pqGetInt(&vlen, 4, conn))
!             goto EOFexit;
          else
          {
              if (!binary)
                  vlen = vlen - 4;
              if (vlen < 0)
                  vlen = 0;
          }
+         rowbuf[i].len = vlen;
+
+         /*
+          * rowbuf[i].value always points to the next address in the data
+          * buffer even if the value is NULL.  This allows row processors to
+          * estimate data sizes more easily.
+          */
+         rowbuf[i].value = conn->inBuffer + conn->inCursor;
+
+         /* Skip over the data value */
+         if (vlen > 0)
+         {
+             if (pqSkipnchar(vlen, conn))
+                 goto EOFexit;
+         }
+
          /* advance the bitmap stuff */
          bitcnt++;
          if (bitcnt == BITS_PER_BYTE)
*************** getAnotherTuple(PGconn *conn, bool binar
*** 811,836 ****
              bmap <<= 1;
      }

!     /* Success!  Store the completed tuple in the result */
!     if (!pqAddTuple(result, tup))
!         goto outOfMemory;
!     /* and reset for a new message */
!     conn->curTuple = NULL;
!
      if (bitmap != std_bitmap)
          free(bitmap);
!     return 0;

! outOfMemory:
!     /* Replace partially constructed result with an error result */

      /*
!      * we do NOT use pqSaveErrorResult() here, because of the likelihood that
!      * there's not enough memory to concatenate messages...
       */
      pqClearAsyncResult(conn);
!     printfPQExpBuffer(&conn->errorMessage,
!                       libpq_gettext("out of memory for query result\n"));

      /*
       * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
--- 935,1002 ----
              bmap <<= 1;
      }

!     /* Release bitmap now if we allocated it */
      if (bitmap != std_bitmap)
          free(bitmap);
!     bitmap = NULL;

!     /*
!      * Advance inStart to show that the "D" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
!     conn->inStart = conn->inCursor;

+     /* Pass the completed row values to rowProcessor */
+     errmsg = NULL;
+     switch ((*conn->rowProcessor) (result, rowbuf, &errmsg,
+                                    conn->rowProcessorParam))
+     {
+         case 1:
+             /* everything is good */
+             return 0;
+
+         case 0:
+             /* row processor requested suspension */
+             conn->asyncStatus = PGASYNC_SUSPENDED;
+             return EOF;
+
+         case -1:
+             /* error, report the errmsg below */
+             break;
+
+         default:
+             /* unrecognized return code */
+             errmsg = libpq_gettext("unrecognized return value from row processor");
+             break;
+     }
+     goto set_error_result;
+
+ advance_and_error:
      /*
!      * Discard the failed message.  Unfortunately we don't know for sure
!      * where the end is, so just throw away everything in the input buffer.
!      * This is not very desirable but it's the best we can do in protocol v2.
!      */
!     conn->inStart = conn->inEnd;
!
! set_error_result:
!
!     /*
!      * Replace partially constructed result with an error result. First
!      * discard the old result to try to win back some memory.
       */
      pqClearAsyncResult(conn);
!
!     /*
!      * If row processor didn't provide an error message, assume "out of
!      * memory" was meant.  The advantage of having this special case is that
!      * freeing the old result first greatly improves the odds that gettext()
!      * will succeed in providing a translation.
!      */
!     if (!errmsg)
!         errmsg = libpq_gettext("out of memory for query result");
!
!     printfPQExpBuffer(&conn->errorMessage, "%s\n", errmsg);

      /*
       * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
*************** outOfMemory:
*** 838,845 ****
       */
      conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
      conn->asyncStatus = PGASYNC_READY;
-     /* Discard the failed message --- good idea? */
-     conn->inStart = conn->inEnd;

  EOFexit:
      if (bitmap != NULL && bitmap != std_bitmap)
--- 1004,1009 ----
*************** pqGetline2(PGconn *conn, char *s, int ma
*** 1122,1128 ****
  {
      int            result = 1;        /* return value if buffer overflows */

!     if (conn->sock < 0)
      {
          *s = '\0';
          return EOF;
--- 1286,1293 ----
  {
      int            result = 1;        /* return value if buffer overflows */

!     if (conn->sock < 0 ||
!         conn->asyncStatus != PGASYNC_COPY_OUT)
      {
          *s = '\0';
          return EOF;
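
Note on the protocol v2 null bitmap handled above: the hunks keep the existing convention of one bit per column, rounded up to whole bytes, consumed most-significant bit first (the `bmap & 0200` test followed by `bmap <<= 1`), with a set bit meaning "value present". A minimal standalone sketch of that convention follows. The helper names `null_bitmap_bytes` and `field_is_present` are made up for illustration and are not part of libpq, and `BITS_PER_BYTE` is defined locally as a stand-in for the PostgreSQL macro.

```c
#include <stddef.h>

/* Local stand-in for the PostgreSQL macro of the same name (assumption). */
#define BITS_PER_BYTE 8

/* Bytes needed for a bitmap covering nfields columns (same rounding as the patch). */
static size_t
null_bitmap_bytes(int nfields)
{
    return ((size_t) nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}

/*
 * Return 1 if column `col` carries a value, 0 if it is NULL.
 * Bits are read most-significant first within each byte, which is what
 * testing (bmap & 0200) and then shifting left by one amounts to.
 */
static int
field_is_present(const unsigned char *bitmap, int col)
{
    unsigned char byte = bitmap[col / BITS_PER_BYTE];

    return (byte & (0x80 >> (col % BITS_PER_BYTE))) != 0;
}
```

With a first bitmap byte of 0xA0 (binary 1010 0000), columns 0 and 2 are present and columns 1 and 3 are NULL.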
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbcb00f15e9a675543b874603058ebf012f1..ccf8a814280653e310d533b4cf3a929d17686da2 100644
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
***************
*** 44,50 ****


  static void handleSyncLoss(PGconn *conn, char id, int msgLength);
! static int    getRowDescriptions(PGconn *conn);
  static int    getParamDescriptions(PGconn *conn);
  static int    getAnotherTuple(PGconn *conn, int msgLength);
  static int    getParameterStatus(PGconn *conn);
--- 44,50 ----


  static void handleSyncLoss(PGconn *conn, char id, int msgLength);
! static int    getRowDescriptions(PGconn *conn, int msgLength);
  static int    getParamDescriptions(PGconn *conn);
  static int    getAnotherTuple(PGconn *conn, int msgLength);
  static int    getParameterStatus(PGconn *conn);
*************** static int build_startup_packet(const PG
*** 61,66 ****
--- 61,69 ----
   * parseInput: if appropriate, parse input data from backend
   * until input is exhausted or a stopping state is reached.
   * Note that this function will NOT attempt to read more data from the backend.
+  *
+  * Note: callers of parseInput must be prepared for a longjmp exit when we are
+  * in PGASYNC_BUSY state, since an external row processor might do that.
   */
  void
  pqParseInput3(PGconn *conn)
*************** pqParseInput3(PGconn *conn)
*** 269,283 ****
                          conn->queryclass == PGQUERY_DESCRIBE)
                      {
                          /* First 'T' in a query sequence */
!                         if (getRowDescriptions(conn))
                              return;
!
!                         /*
!                          * If we're doing a Describe, we're ready to pass the
!                          * result back to the client.
!                          */
!                         if (conn->queryclass == PGQUERY_DESCRIBE)
!                             conn->asyncStatus = PGASYNC_READY;
                      }
                      else
                      {
--- 272,281 ----
                          conn->queryclass == PGQUERY_DESCRIBE)
                      {
                          /* First 'T' in a query sequence */
!                         if (getRowDescriptions(conn, msgLength))
                              return;
!                         /* getRowDescriptions() moves inStart itself */
!                         continue;
                      }
                      else
                      {
*************** pqParseInput3(PGconn *conn)
*** 327,332 ****
--- 325,332 ----
                          /* Read another tuple of a normal query response */
                          if (getAnotherTuple(conn, msgLength))
                              return;
+                         /* getAnotherTuple() moves inStart itself */
+                         continue;
                      }
                      else if (conn->result != NULL &&
                               conn->result->resultStatus == PGRES_FATAL_ERROR)
*************** handleSyncLoss(PGconn *conn, char id, in
*** 443,459 ****
   * parseInput subroutine to read a 'T' (row descriptions) message.
   * We'll build a new PGresult structure (unless called for a Describe
   * command for a prepared statement) containing the attribute data.
!  * Returns: 0 if completed message, EOF if not enough data yet.
   *
!  * Note that if we run out of data, we have to release the partially
!  * constructed PGresult, and rebuild it again next time.  Fortunately,
!  * that shouldn't happen often, since 'T' messages usually fit in a packet.
   */
  static int
! getRowDescriptions(PGconn *conn)
  {
      PGresult   *result;
      int            nfields;
      int            i;

      /*
--- 443,461 ----
   * parseInput subroutine to read a 'T' (row descriptions) message.
   * We'll build a new PGresult structure (unless called for a Describe
   * command for a prepared statement) containing the attribute data.
!  * Returns: 0 if processed message successfully, EOF if suspended parsing.
!  * In either case, conn->inStart has been advanced past the message.
   *
!  * Note: the row processor could also choose to longjmp out of libpq,
!  * in which case the library's state must be the same as if we'd gotten
!  * a "suspend parsing" return and exited normally.
   */
  static int
! getRowDescriptions(PGconn *conn, int msgLength)
  {
      PGresult   *result;
      int            nfields;
+     const char *errmsg;
      int            i;

      /*
*************** getRowDescriptions(PGconn *conn)
*** 471,482 ****
      else
          result = PQmakeEmptyPGresult(conn, PGRES_TUPLES_OK);
      if (!result)
!         goto failure;

      /* parseInput already read the 'T' label and message length. */
      /* the next two bytes are the number of fields */
      if (pqGetInt(&(result->numAttributes), 2, conn))
!         goto failure;
      nfields = result->numAttributes;

      /* allocate space for the attribute descriptors */
--- 473,491 ----
      else
          result = PQmakeEmptyPGresult(conn, PGRES_TUPLES_OK);
      if (!result)
!     {
!         errmsg = NULL;            /* means "out of memory", see below */
!         goto advance_and_error;
!     }

      /* parseInput already read the 'T' label and message length. */
      /* the next two bytes are the number of fields */
      if (pqGetInt(&(result->numAttributes), 2, conn))
!     {
!         /* We should not run out of data here, so complain */
!         errmsg = libpq_gettext("insufficient data in \"T\" message");
!         goto advance_and_error;
!     }
      nfields = result->numAttributes;

      /* allocate space for the attribute descriptors */
*************** getRowDescriptions(PGconn *conn)
*** 485,491 ****
          result->attDescs = (PGresAttDesc *)
              pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
          if (!result->attDescs)
!             goto failure;
          MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
      }

--- 494,503 ----
          result->attDescs = (PGresAttDesc *)
              pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
          if (!result->attDescs)
!         {
!             errmsg = NULL;        /* means "out of memory", see below */
!             goto advance_and_error;
!         }
          MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
      }

*************** getRowDescriptions(PGconn *conn)
*** 510,516 ****
              pqGetInt(&atttypmod, 4, conn) ||
              pqGetInt(&format, 2, conn))
          {
!             goto failure;
          }

          /*
--- 522,530 ----
              pqGetInt(&atttypmod, 4, conn) ||
              pqGetInt(&format, 2, conn))
          {
!             /* We should not run out of data here, so complain */
!             errmsg = libpq_gettext("insufficient data in \"T\" message");
!             goto advance_and_error;
          }

          /*
*************** getRowDescriptions(PGconn *conn)
*** 524,530 ****
          result->attDescs[i].name = pqResultStrdup(result,
                                                    conn->workBuffer.data);
          if (!result->attDescs[i].name)
!             goto failure;
          result->attDescs[i].tableid = tableid;
          result->attDescs[i].columnid = columnid;
          result->attDescs[i].format = format;
--- 538,547 ----
          result->attDescs[i].name = pqResultStrdup(result,
                                                    conn->workBuffer.data);
          if (!result->attDescs[i].name)
!         {
!             errmsg = NULL;        /* means "out of memory", see below */
!             goto advance_and_error;
!         }
          result->attDescs[i].tableid = tableid;
          result->attDescs[i].columnid = columnid;
          result->attDescs[i].format = format;
*************** getRowDescriptions(PGconn *conn)
*** 536,559 ****
              result->binary = 0;
      }

      /* Success! */
      conn->result = result;
-     return 0;

! failure:

      /*
!      * Discard incomplete result, unless it's from getParamDescriptions.
!      *
!      * Note that if we hit a bufferload boundary while handling the
!      * describe-statement case, we'll forget any PGresult space we just
!      * allocated, and then reallocate it on next try.  This will bloat the
!      * PGresult a little bit but the space will be freed at PQclear, so it
!      * doesn't seem worth trying to be smarter.
       */
!     if (result != conn->result)
          PQclear(result);
!     return EOF;
  }

  /*
--- 553,641 ----
              result->binary = 0;
      }

+     /* Sanity check that we absorbed all the data */
+     if (conn->inCursor != conn->inStart + 5 + msgLength)
+     {
+         errmsg = libpq_gettext("extraneous data in \"T\" message");
+         goto advance_and_error;
+     }
+
      /* Success! */
      conn->result = result;

!     /*
!      * Advance inStart to show that the "T" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
!     conn->inStart = conn->inCursor;

      /*
!      * If we're doing a Describe, we're done, and ready to pass the result
!      * back to the client.
       */
!     if (conn->queryclass == PGQUERY_DESCRIBE)
!     {
!         conn->asyncStatus = PGASYNC_READY;
!         return 0;
!     }
!
!     /* Give the row processor a chance to initialize for new result set */
!     errmsg = NULL;
!     switch ((*conn->rowProcessor) (result, NULL, &errmsg,
!                                    conn->rowProcessorParam))
!     {
!         case 1:
!             /* everything is good */
!             return 0;
!
!         case 0:
!             /* row processor requested suspension */
!             conn->asyncStatus = PGASYNC_SUSPENDED;
!             return EOF;
!
!         case -1:
!             /* error, report the errmsg below */
!             break;
!
!         default:
!             /* unrecognized return code */
!             errmsg = libpq_gettext("unrecognized return value from row processor");
!             break;
!     }
!     goto set_error_result;
!
! advance_and_error:
!     /* Discard unsaved result, if any */
!     if (result && result != conn->result)
          PQclear(result);
!
!     /* Discard the failed message by pretending we read it */
!     conn->inStart += 5 + msgLength;
!
! set_error_result:
!
!     /*
!      * Replace partially constructed result with an error result. First
!      * discard the old result to try to win back some memory.
!      */
!     pqClearAsyncResult(conn);
!
!     /*
!      * If row processor didn't provide an error message, assume "out of
!      * memory" was meant.
!      */
!     if (!errmsg)
!         errmsg = libpq_gettext("out of memory for query result");
!
!     printfPQExpBuffer(&conn->errorMessage, "%s\n", errmsg);
!     pqSaveErrorResult(conn);
!
!     /*
!      * Return zero to allow input parsing to continue.  Subsequent "D"
!      * messages will be ignored until we get to end of data, since an error
!      * result is already set up.
!      */
!     return 0;
  }

  /*
*************** failure:
*** 613,659 ****

  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
!  * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
!  * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
      PGresult   *result = conn->result;
      int            nfields = result->numAttributes;
!     PGresAttValue *tup;
      int            tupnfields;        /* # fields from tuple */
      int            vlen;            /* length of the current field value */
      int            i;

-     /* Allocate tuple space if first time for this data message */
-     if (conn->curTuple == NULL)
-     {
-         conn->curTuple = (PGresAttValue *)
-             pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-         if (conn->curTuple == NULL)
-             goto outOfMemory;
-         MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-     }
-     tup = conn->curTuple;
-
      /* Get the field count and make sure it's what we expect */
      if (pqGetInt(&tupnfields, 2, conn))
!         return EOF;

      if (tupnfields != nfields)
      {
!         /* Replace partially constructed result with an error result */
!         printfPQExpBuffer(&conn->errorMessage,
!                  libpq_gettext("unexpected field count in \"D\" message\n"));
!         pqSaveErrorResult(conn);
!         /* Discard the failed message by pretending we read it */
!         conn->inCursor = conn->inStart + 5 + msgLength;
!         return 0;
      }

      /* Scan the fields */
--- 695,746 ----

  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * We fill rowbuf with column pointers and then call the row processor.
!  * Returns: 0 if processed message successfully, EOF if suspended parsing.
!  * In either case, conn->inStart has been advanced past the message.
   *
!  * Note: the row processor could also choose to longjmp out of libpq,
!  * in which case the library's state must be the same as if we'd gotten
!  * a "suspend parsing" return and exited normally.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
      PGresult   *result = conn->result;
      int            nfields = result->numAttributes;
!     const char *errmsg;
!     PGdataValue *rowbuf;
      int            tupnfields;        /* # fields from tuple */
      int            vlen;            /* length of the current field value */
      int            i;

      /* Get the field count and make sure it's what we expect */
      if (pqGetInt(&tupnfields, 2, conn))
!     {
!         /* We should not run out of data here, so complain */
!         errmsg = libpq_gettext("insufficient data in \"D\" message");
!         goto advance_and_error;
!     }

      if (tupnfields != nfields)
      {
!         errmsg = libpq_gettext("unexpected field count in \"D\" message");
!         goto advance_and_error;
!     }
!
!     /* Resize row buffer if needed */
!     rowbuf = conn->rowBuf;
!     if (nfields > conn->rowBufLen)
!     {
!         rowbuf = (PGdataValue *) realloc(rowbuf,
!                                          nfields * sizeof(PGdataValue));
!         if (!rowbuf)
!         {
!             errmsg = NULL;        /* means "out of memory", see below */
!             goto advance_and_error;
!         }
!         conn->rowBuf = rowbuf;
!         conn->rowBufLen = nfields;
      }

      /* Scan the fields */
*************** getAnotherTuple(PGconn *conn, int msgLen
*** 661,714 ****
      {
          /* get the value length */
          if (pqGetInt(&vlen, 4, conn))
-             return EOF;
-         if (vlen == -1)
          {
!             /* null field */
!             tup[i].value = result->null_field;
!             tup[i].len = NULL_LEN;
!             continue;
          }
!         if (vlen < 0)
!             vlen = 0;
!         if (tup[i].value == NULL)
!         {
!             bool        isbinary = (result->attDescs[i].format != 0);

!             tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
!             if (tup[i].value == NULL)
!                 goto outOfMemory;
!         }
!         tup[i].len = vlen;
!         /* read in the value */
          if (vlen > 0)
!             if (pqGetnchar((char *) (tup[i].value), vlen, conn))
!                 return EOF;
!         /* we have to terminate this ourselves */
!         tup[i].value[vlen] = '\0';
      }

!     /* Success!  Store the completed tuple in the result */
!     if (!pqAddTuple(result, tup))
!         goto outOfMemory;
!     /* and reset for a new message */
!     conn->curTuple = NULL;

!     return 0;

! outOfMemory:

      /*
       * Replace partially constructed result with an error result. First
       * discard the old result to try to win back some memory.
       */
      pqClearAsyncResult(conn);
!     printfPQExpBuffer(&conn->errorMessage,
!                       libpq_gettext("out of memory for query result\n"));
      pqSaveErrorResult(conn);

!     /* Discard the failed message by pretending we read it */
!     conn->inCursor = conn->inStart + 5 + msgLength;
      return 0;
  }

--- 748,846 ----
      {
          /* get the value length */
          if (pqGetInt(&vlen, 4, conn))
          {
!             /* We should not run out of data here, so complain */
!             errmsg = libpq_gettext("insufficient data in \"D\" message");
!             goto advance_and_error;
          }
!         rowbuf[i].len = vlen;

!         /*
!          * rowbuf[i].value always points to the next address in the data
!          * buffer even if the value is NULL.  This allows row processors to
!          * estimate data sizes more easily.
!          */
!         rowbuf[i].value = conn->inBuffer + conn->inCursor;
!
!         /* Skip over the data value */
          if (vlen > 0)
!         {
!             if (pqSkipnchar(vlen, conn))
!             {
!                 /* We should not run out of data here, so complain */
!                 errmsg = libpq_gettext("insufficient data in \"D\" message");
!                 goto advance_and_error;
!             }
!         }
      }

!     /* Sanity check that we absorbed all the data */
!     if (conn->inCursor != conn->inStart + 5 + msgLength)
!     {
!         errmsg = libpq_gettext("extraneous data in \"D\" message");
!         goto advance_and_error;
!     }

!     /*
!      * Advance inStart to show that the "D" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
!     conn->inStart = conn->inCursor;

!     /* Pass the completed row values to rowProcessor */
!     errmsg = NULL;
!     switch ((*conn->rowProcessor) (result, rowbuf, &errmsg,
!                                    conn->rowProcessorParam))
!     {
!         case 1:
!             /* everything is good */
!             return 0;
!
!         case 0:
!             /* row processor requested suspension */
!             conn->asyncStatus = PGASYNC_SUSPENDED;
!             return EOF;
!
!         case -1:
!             /* error, report the errmsg below */
!             break;
!
!         default:
!             /* unrecognized return code */
!             errmsg = libpq_gettext("unrecognized return value from row processor");
!             break;
!     }
!     goto set_error_result;
!
! advance_and_error:
!     /* Discard the failed message by pretending we read it */
!     conn->inStart += 5 + msgLength;
!
! set_error_result:

      /*
       * Replace partially constructed result with an error result. First
       * discard the old result to try to win back some memory.
       */
      pqClearAsyncResult(conn);
!
!     /*
!      * If row processor didn't provide an error message, assume "out of
!      * memory" was meant.  The advantage of having this special case is that
!      * freeing the old result first greatly improves the odds that gettext()
!      * will succeed in providing a translation.
!      */
!     if (!errmsg)
!         errmsg = libpq_gettext("out of memory for query result");
!
!     printfPQExpBuffer(&conn->errorMessage, "%s\n", errmsg);
      pqSaveErrorResult(conn);

!     /*
!      * Return zero to allow input parsing to continue.  Subsequent "D"
!      * messages will be ignored until we get to end of data, since an error
!      * result is already set up.
!      */
      return 0;
  }

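
Note on the protocol v3 "D" handling above: the new getAnotherTuple() no longer copies values. It records a length and a pointer into the input buffer for each column, then checks that the cursor landed exactly at the end of the message. The sketch below shows the same idea on a plain byte array, assuming the standard DataRow layout (16-bit field count, then per field a 32-bit length, with -1 for NULL, followed by that many bytes). `DataValue`, `read_be32` and `parse_data_row` are hypothetical names for illustration. They are not libpq functions, and the real code reads through pqGetInt()/pqSkipnchar() instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the patch's PGdataValue. */
typedef struct
{
    int         len;    /* data length in bytes, or <0 if NULL */
    const char *value;  /* points into the message buffer, not zero-terminated */
} DataValue;

/* Read a big-endian (network order) 32-bit integer. */
static int32_t
read_be32(const unsigned char *p)
{
    uint32_t v = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                 ((uint32_t) p[2] << 8) | (uint32_t) p[3];

    return (int32_t) v;
}

/*
 * Parse the body of a "D" message (everything after the type byte and the
 * length word).  Returns the field count on success, or -1 if the body is
 * truncated, has more than maxcols fields, or carries trailing bytes.
 */
static int
parse_data_row(const unsigned char *body, size_t bodylen,
               DataValue *cols, int maxcols)
{
    size_t cursor;
    int    nfields;
    int    i;

    if (bodylen < 2)
        return -1;              /* "insufficient data" */
    nfields = (body[0] << 8) | body[1];
    cursor = 2;
    if (nfields > maxcols)
        return -1;

    for (i = 0; i < nfields; i++)
    {
        int32_t vlen;

        if (bodylen - cursor < 4)
            return -1;          /* "insufficient data" */
        vlen = read_be32(body + cursor);
        cursor += 4;

        cols[i].len = vlen;
        /* value points at the next byte even when the field is NULL */
        cols[i].value = (const char *) (body + cursor);

        if (vlen > 0)
        {
            if (bodylen - cursor < (size_t) vlen)
                return -1;      /* "insufficient data" */
            cursor += (size_t) vlen;    /* skip, do not copy */
        }
    }

    /* Sanity check that we absorbed all the data */
    if (cursor != bodylen)
        return -1;              /* "extraneous data" */
    return nfields;
}
```

The pointers stored in `cols` are only valid while the underlying buffer is untouched, which is presumably why a row processor has to copy anything it wants to keep.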
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9e0d8b3c510c75f16ec79ace35ad7f3cd0..2800be6e97e9383176ba1f837abd32e04ca5a95f 100644
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
*************** extern        "C"
*** 38,50 ****

  /* Application-visible enum types */

  typedef enum
  {
-     /*
-      * Although it is okay to add to this list, values which become unused
-      * should never be removed, nor should constants be redefined - that would
-      * break compatibility with existing code.
-      */
      CONNECTION_OK,
      CONNECTION_BAD,
      /* Non-blocking mode only below here */
--- 38,51 ----

  /* Application-visible enum types */

+ /*
+  * Although it is okay to add to these lists, values which become unused
+  * should never be removed, nor should constants be redefined - that would
+  * break compatibility with existing code.
+  */
+
  typedef enum
  {
      CONNECTION_OK,
      CONNECTION_BAD,
      /* Non-blocking mode only below here */
*************** typedef enum
*** 89,95 ****
                                   * backend */
      PGRES_NONFATAL_ERROR,        /* notice or warning message */
      PGRES_FATAL_ERROR,            /* query failed */
!     PGRES_COPY_BOTH                /* Copy In/Out data transfer in progress */
  } ExecStatusType;

  typedef enum
--- 90,97 ----
                                   * backend */
      PGRES_NONFATAL_ERROR,        /* notice or warning message */
      PGRES_FATAL_ERROR,            /* query failed */
!     PGRES_COPY_BOTH,            /* Copy In/Out data transfer in progress */
!     PGRES_SUSPENDED                /* row processor requested suspension */
  } ExecStatusType;

  typedef enum
*************** typedef struct pg_conn PGconn;
*** 128,133 ****
--- 130,146 ----
   */
  typedef struct pg_result PGresult;

+ /* PGdataValue represents a data field value being passed to a row processor.
+  * It could be either text or binary data; text data is not zero-terminated.
+  * A SQL NULL is represented by len < 0; then value is still valid but there
+  * are no data bytes there.
+  */
+ typedef struct pgDataValue
+ {
+     int            len;            /* data length in bytes, or <0 if NULL */
+     const char *value;            /* data value, without zero-termination */
+ } PGdataValue;
+
  /* PGcancel encapsulates the information needed to cancel a running
   * query on an existing connection.
   * The contents of this struct are not supposed to be known to applications.
*************** typedef struct pgNotify
*** 149,154 ****
--- 162,171 ----
      struct pgNotify *next;        /* list link */
  } PGnotify;

+ /* Function type for row-processor callback */
+ typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
+                                const char **errmsg, void *param);
+
  /* Function types for notice-handling callbacks */
  typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
  typedef void (*PQnoticeProcessor) (void *arg, const char *message);
*************** extern int PQsendQueryPrepared(PGconn *c
*** 388,398 ****
--- 405,420 ----
                      const int *paramFormats,
                      int resultFormat);
  extern PGresult *PQgetResult(PGconn *conn);
+ extern PGresult *PQskipResult(PGconn *conn);

  /* Routines for managing an asynchronous query */
  extern int    PQisBusy(PGconn *conn);
  extern int    PQconsumeInput(PGconn *conn);

+ /* Override default per-row processing */
+ extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+ extern PQrowProcessor PQgetRowProcessor(const PGconn *conn, void **param);
+
  /* LISTEN/NOTIFY support */
  extern PGnotify *PQnotifies(PGconn *conn);

diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 2103af88329894d5cecee7f4de0214ab8613bbe4..b44ca29f65e2eedbe3e994608250a1489ca77ddf 100644
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
*************** typedef enum
*** 217,222 ****
--- 217,223 ----
      PGASYNC_IDLE,                /* nothing's happening, dude */
      PGASYNC_BUSY,                /* query in progress */
      PGASYNC_READY,                /* result ready for PQgetResult */
+     PGASYNC_SUSPENDED,            /* row processor requested suspension */
      PGASYNC_COPY_IN,            /* Copy In data transfer in progress */
      PGASYNC_COPY_OUT,            /* Copy Out data transfer in progress */
      PGASYNC_COPY_BOTH            /* Copy In/Out data transfer in progress */
*************** struct pg_conn
*** 324,329 ****
--- 325,334 ----
      /* Optional file to write trace info to */
      FILE       *Pfdebug;

+     /* Callback procedure for per-row processing */
+     PQrowProcessor rowProcessor;    /* function pointer */
+     void       *rowProcessorParam;    /* passthrough argument */
+
      /* Callback procedures for notice message processing */
      PGNoticeHooks noticeHooks;

*************** struct pg_conn
*** 396,404 ****
                                   * msg has no length word */
      int            outMsgEnd;        /* offset to msg end (so far) */

      /* Status for asynchronous result construction */
      PGresult   *result;            /* result being constructed */
!     PGresAttValue *curTuple;    /* tuple currently being read */

  #ifdef USE_SSL
      bool        allow_ssl_try;    /* Allowed to try SSL negotiation */
--- 401,414 ----
                                   * msg has no length word */
      int            outMsgEnd;        /* offset to msg end (so far) */

+     /* Row processor interface workspace */
+     PGdataValue *rowBuf;        /* array for passing values to rowProcessor */
+     int            rowBufLen;        /* number of entries allocated in rowBuf */
+
      /* Status for asynchronous result construction */
      PGresult   *result;            /* result being constructed */
!
!     /* Assorted state for SSL, GSS, etc */

  #ifdef USE_SSL
      bool        allow_ssl_try;    /* Allowed to try SSL negotiation */
*************** struct pg_conn
*** 435,441 ****
                                   * connection */
  #endif

-
      /* Buffer for current error message */
      PQExpBufferData errorMessage;        /* expansible string */

--- 445,450 ----
*************** extern void
*** 505,511 ****
  pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
  /* This lets gcc check the format string for consistency. */
  __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
- extern int    pqAddTuple(PGresult *res, PGresAttValue *tup);
  extern void pqSaveMessageField(PGresult *res, char code,
                     const char *value);
  extern void pqSaveParameterStatus(PGconn *conn, const char *name,
--- 514,519 ----
*************** extern int    pqGets(PQExpBuffer buf, PGcon
*** 558,563 ****
--- 566,572 ----
  extern int    pqGets_append(PQExpBuffer buf, PGconn *conn);
  extern int    pqPuts(const char *s, PGconn *conn);
  extern int    pqGetnchar(char *s, size_t len, PGconn *conn);
+ extern int    pqSkipnchar(size_t len, PGconn *conn);
  extern int    pqPutnchar(const char *s, size_t len, PGconn *conn);
  extern int    pqGetInt(int *result, size_t bytes, PGconn *conn);
  extern int    pqPutInt(int value, size_t bytes, PGconn *conn);
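
The PGdataValue comment in the libpq-fe.h hunk above says that text values are not zero-terminated and that a SQL NULL is signalled by len < 0, so a row processor has to copy by length rather than use strlen(). A minimal sketch of that rule, assuming a local stand-in struct (DataValue, hypothetical, mirroring the two fields shown in the patch) and a hypothetical helper copy_value(), so that it compiles without libpq:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the PGdataValue struct from the patch (same two fields). */
typedef struct
{
    int         len;    /* data length in bytes, or <0 if NULL */
    const char *value;  /* data value, without zero-termination */
} DataValue;

/*
 * Hypothetical helper: return a malloc'd, zero-terminated copy of a text
 * value, or NULL for a SQL NULL.  *failed is set to 1 only when malloc
 * fails, so the caller can tell "SQL NULL" apart from "out of memory".
 */
static char *
copy_value(const DataValue *col, int *failed)
{
    char *buf;

    assert(col != NULL && failed != NULL);
    *failed = 0;

    if (col->len < 0)
        return NULL;            /* SQL NULL: no data bytes to copy */

    buf = malloc((size_t) col->len + 1);
    if (buf == NULL)
    {
        *failed = 1;
        return NULL;
    }
    memcpy(buf, col->value, (size_t) col->len);
    buf[col->len] = '\0';       /* add the terminator the value lacks */
    return buf;
}
```

The separate failed flag is only one possible design; the point is that a NULL column and an allocation failure must not be confused.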

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

03 April 2012, 15:48:04

On Sun, Apr 01, 2012 at 07:23:06PM -0400, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > So the proper approach would be to have new API call, designed to
> > handle it, and allow early-exit only from there.
> 
> > That would also avoid any breakage of old APIs.  Also it would avoid
> > any accidental data loss, if the user code does not have exactly
> > right sequence of calls.
> 
> > How about PQisBusy2(), which returns '2' when early-exit is requested?
> > Please suggest something better...
> 
> My proposal is way better than that.  You apparently aren't absorbing my
> point, which is that making this behavior unusable with every existing
> API (whether intentionally or by oversight) isn't an improvement.
> The row processor needs to be able to do this *without* assuming a
> particular usage style,

I don't get what kind of usage scenario you think of when you 
"early-exit without assuming anything about upper-level usage."

Could you show example code where it is useful?

The fact remains that upper-level code must cooperate with callback.
Why is it useful to hijack PQgetResult() to do so?  Especially as the
PGresult it returns is not useful in any way and the callback still needs
to use side channel to pass actual values to upper level.

IMHO it's much better to remove the concept of early-exit from public
API completely and instead give "get" style API that does the early-exit
internally.  See below for example.

> and most particularly it should not force people
> to use async mode.

Seems our concept of "async mode" is different.  I associate
PQisnonblocking() with it.  And old code, eg. PQrecvRow()
works fine in both modes.

> An alternative that I'd prefer to that one is to get rid of the
> suspension return mode altogether.

That might be a good idea.  I would prefer to postpone controversial
features instead of having a hurried design for them.

Also - is there really a need to make the callback API ready for *all*
potential usage scenarios?  IMHO it would be better to make sure
it's good enough that higher-level and easier-to-use APIs can
be built on top of it.  And that's it.

> However, that leaves us needing
> to document what it means to longjmp out of a row processor without
> having any comparable API concept, so I don't really find it better.

Why do you think any new API concept is needed?  Why is the following rule
not enough (for a sync connection):
 "After exception you need to call PQgetResult() / PQfinish()".

The only thing that needs fixing is storing lastResult under PGconn
to support exceptions for PQexec().  (If we need to support
exceptions everywhere.)

---------------

Again, note that if we would provide PQrecvRow()-style API, then we *don't*
need early-exit in callback API.  Nor exceptions...

If current PQrecvRow() / PQgetRow() are too bloated for you, then how about
thin wrapper around PQisBusy():

/* 0 - need more data, 1 - have result, 2 - have row */
int PQhasRowOrResult(PGconn *conn, PGresult **hdr, PGrowValue **cols)
{
    int gotrow = 0;
    PQrowProcessor oldproc;
    void *oldarg;
    int busy;

    /* call PQisBusy with our callback */
    oldproc = PQgetRowProcessor(conn, &oldarg);
    PQsetRowProcessor(conn, hasrow_cb, &gotrow);
    busy = PQisBusy(conn);
    PQsetRowProcessor(conn, oldproc, oldarg);

    if (gotrow)
    {
        *hdr = conn->result;
        *cols = conn->rowBuf;
        return 2;
    }
    return busy ? 0 : 1;
}

static int hasrow_cb(PGresult *res, PGrowValue *columns, void *param)
{
    int *gotrow = param;
    *gotrow = 1;
    return 0;
}

Also, instead of hasrow_cb(), we could have an integer flag under PGconn that
getAnotherTuple() checks, thus getting rid of it even in the internal
callback API.

Yes, it requires async-style usage pattern, but works for both
sync and async connections.  And it can be used to build even
simpler API on top of it.

Summary: it would be better to keep "early-exit" an internal detail,
as the actual feature needed is processing rows outside of the callback.
So why not provide an API for it?

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

03 April 2012, 18:32:48

Marko Kreen <markokr@gmail.com> writes:
> The fact remains that upper-level code must cooperate with callback.
> Why is it useful to hijack PQgetResult() to do so?

Because that's the defined communications channel.  We're not
"hijacking" it.  If we're going to start using pejorative words here,
I will assert that your proposal hijacks PQisBusy to make it do
something substantially different than the traditional understanding
of it (more about that below).

> Seems our concept of "async mode" is different.  I associate
> PQisnonblocking() with it.

Well, there are really four levels to the API design:
    * Plain old PQexec.
    * Break down PQexec into PQsendQuery and PQgetResult.
    * Avoid waiting in PQgetResult by testing PQisBusy.
    * Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
      on socket writes) by using PQisnonblocking.

Any given app might choose to run at any one of those four levels,
although the first one probably isn't interesting for an app that would
care about using a suspending row processor.  But I see no reason that
we should design the suspension feature such that it doesn't work at the
second level.  PQisBusy is, and should be, an optional state-testing
function, not something that changes the set of things you can do with
the connection.

> IMHO it's much better to remove the concept of early-exit from public
> API completely and instead give "get" style API that does the early-exit
> internally.

If the row processor has an early-exit option, it hardly seems
reasonable to me to claim that that's not part of the public API.

In particular, I flat out will not accept a design in which that option
doesn't work unless the current call came via PQisBusy, much less some
entirely new call like PQhasRowOrResult.  It's unusably fragile (ie,
timing sensitive) if that has to be true.

There's another way in which I think your proposal breaks the existing
API, which is that without an internal PGASYNC_SUSPENDED state that is
cleared by PQgetResult, it's unsafe to probe PQisBusy multiple times
before doing something useful.  That shouldn't be the case.  PQisBusy
is supposed to test whether data is ready for the application to do
something with --- it should not throw away data until a separate call
has been made to consume the data.  Changing its behavior like that
would risk creating subtle bugs in existing event-loop logic.  A
possibly useful analogy is that select() and poll() don't clear a
socket's read-ready state, you have to do a separate read() to do that.
There are good reasons for that separation of actions.
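
The separation argued for here (a state test that can be repeated freely, and a distinct call that consumes data) can be illustrated with a toy model. This is not libpq code: ToyConn, toy_is_busy() and toy_consume_row() are hypothetical names, and the "connection" is just a counter of rows that have arrived.

```c
#include <assert.h>
#include <stddef.h>

/* Toy connection: NOT libpq, just enough state to show the contract. */
typedef struct
{
    int pending_rows;   /* rows received but not yet consumed */
} ToyConn;

/* State test: reports whether we would have to wait; consumes nothing. */
static int
toy_is_busy(const ToyConn *conn)
{
    assert(conn != NULL);
    return conn->pending_rows == 0;
}

/* Consumer: the only call that changes what the application can observe. */
static int
toy_consume_row(ToyConn *conn)
{
    assert(conn != NULL);
    if (conn->pending_rows == 0)
        return 0;               /* nothing to consume */
    conn->pending_rows--;
    return 1;
}
```

Calling toy_is_busy() any number of times between consume calls leaves pending_rows untouched, which is the same property select() and poll() have with respect to read().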

> Again, note that if we would provide PQrecvRow()-style API, then we *don't*
> need early-exit in callback API.  Nor exceptions...

AFAICS, we *do* need exceptions for dblink's usage.  So most of what's
at stake here is not avoidable, unless you're proposing we put this
whole set of patches off till 9.3 so we can think it over some more.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

03 April 2012, 19:21:03

On Tue, Apr 03, 2012 at 05:32:25PM -0400, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > The fact remains that upper-level code must cooperate with callback.
> > Why is it useful to hijack PQgetResult() to do so?
> 
> Because that's the defined communications channel.  We're not
> "hijacking" it.  If we're going to start using pejorative words here,
> I will assert that your proposal hijacks PQisBusy to make it do
> something substantially different than the traditional understanding
> of it (more about that below).

And I would agree with it - and I already did:
   Seems we both lost sight of actual usage scenario for the early-exit
   logic - that both callback and upper-level code *must* cooperate
   for it to be useful.  Instead, we designed API for non-cooperating case,
   which is wrong.

> > Seems our concept of "async mode" is different.  I associate
> > PQisnonblocking() with it.
> 
> Well, there are really four levels to the API design:
> 
>     * Plain old PQexec.
>     * Break down PQexec into PQsendQuery and PQgetResult.
>     * Avoid waiting in PQgetResult by testing PQisBusy.
>     * Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
>       on socket writes) by using PQisnonblocking.
> 
> Any given app might choose to run at any one of those four levels,
> although the first one probably isn't interesting for an app that would
> care about using a suspending row processor.  But I see no reason that
> we should design the suspension feature such that it doesn't work at the
> second level.  PQisBusy is, and should be, an optional state-testing
> function, not something that changes the set of things you can do with
> the connection.

That's actually a nice overview.  I think our basic disagreement comes
from how we map the early-exit into those modes.

I want to think of the early-exit row-processing as 5th and 6th modes:

* Row-by-row processing on sync connection (PQsendQuery() + ???)
* Row-by-row processing on async connection (PQsendQuery() + ???)

But instead you want it to work with almost no changes to the existing modes.

And I don't like that because, as I've said previously, the upper
level must know about the callback and handle it properly, so it does
not make sense to keep the existing loop structure set in stone.

Instead we should design the new mode that is logical for user
and also logical inside libpq.

> > IMHO it's much better to remove the concept of early-exit from public
> > API completely and instead give "get" style API that does the early-exit
> > internally.
> 
> If the row processor has an early-exit option, it hardly seems
> reasonable to me to claim that that's not part of the public API.

Please keep in mind the final goal - nobody is interested in
"early-exit" on its own, the interesting goal is processing rows
outside of libpq.

And the early-exit return code is a clumsy hack to make it possible
with callbacks.

But why should we provide complex way of achieving something,
when we can provide easy and direct way?

> In particular, I flat out will not accept a design in which that option
> doesn't work unless the current call came via PQisBusy, much less some
> entirely new call like PQhasRowOrResult.  It's unusably fragile (ie,
> timing sensitive) if that has to be true.

Agreed for PQisBusy, but why is PQhasRowOrResult() fragile?

It's easy to make PQhasRowOrResult more robust - let it set a flag under
PGconn and let getAnotherTuple() do the early exit on its own, thus keeping
the callback completely out of the loop.  That avoids any chance that a user
callback can accidentally trigger the behaviour.
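
The flag-under-PGconn idea can be sketched with a toy parser loop. This is only an illustration under stated assumptions: ToyConn, its early_exit field and toy_parse_input() are hypothetical names and do not correspond to actual libpq internals.

```c
#include <assert.h>
#include <stddef.h>

/* Toy connection: NOT libpq.  All fields are hypothetical. */
typedef struct
{
    int rows_in_buffer; /* complete rows waiting in the input buffer */
    int early_exit;     /* set by the "get row" style API call */
    int rows_stored;    /* rows accumulated into the normal result */
} ToyConn;

/*
 * Toy stand-in for the row-parsing loop.  Returns 2 when it stopped with
 * one row handed to the caller, 1 when the buffer has been drained.
 */
static int
toy_parse_input(ToyConn *conn)
{
    assert(conn != NULL);
    while (conn->rows_in_buffer > 0)
    {
        conn->rows_in_buffer--;
        if (conn->early_exit)
            return 2;           /* stop here, no user callback involved */
        conn->rows_stored++;    /* default path: keep accumulating */
    }
    return 1;
}
```

Because the decision to stop is taken from the connection flag, a user-supplied callback has no return code that could trigger it by accident.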

> There's another way in which I think your proposal breaks the existing
> API, which is that without an internal PQASYNC_SUSPENDED state that is
> cleared by PQgetResult, it's unsafe to probe PQisBusy multiple times
> before doing something useful.  That shouldn't be the case.  PQisBusy
> is supposed to test whether data is ready for the application to do
> something with --- it should not throw away data until a separate call
> has been made to consume the data.  Changing its behavior like that
> would risk creating subtle bugs in existing event-loop logic.  A
> possibly useful analogy is that select() and poll() don't clear a
> socket's read-ready state, you have to do a separate read() to do that.
> There are good reasons for that separation of actions.

It does look like an argument for new "modes" for row-by-row processing,
preferably with new API calls.

Because adding "magic" to existing APIs seems to only cause confusion.

> > Again, note that if we would provide PQrecvRow()-style API, then we *don't*
> > need early-exit in callback API.  Nor exceptions...
> 
> AFAICS, we *do* need exceptions for dblink's usage.  So most of what's
> at stake here is not avoidable, unless you're proposing we put this
> whole set of patches off till 9.3 so we can think it over some more.

My point was that if we have an API for processing rows outside libpq,
the need for exceptions for callbacks is mostly gone.  Converting dblink
to PQhasRowOrResult() should not be hard.

I don't mind keeping exceptions, but only if we don't need to add extra
complexity just for them.

-- 
marko

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

04 April 2012, 11:46:47

Marko Kreen <markokr@gmail.com> writes:
> On Tue, Apr 03, 2012 at 05:32:25PM -0400, Tom Lane wrote:
>> Well, there are really four levels to the API design:
>> * Plain old PQexec.
>> * Break down PQexec into PQsendQuery and PQgetResult.
>> * Avoid waiting in PQgetResult by testing PQisBusy.
>> * Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
>>   on socket writes) by using PQisnonblocking.

> Thats actually nice overview.  I think our basic disagreement comes
> from how we map the early-exit into those modes.
> I want to think of the early-exit row-processing as 5th and 6th modes:

> * Row-by-row processing on sync connection (PQsendQuery() + ???)
> * Row-by-row processing on async connection (PQsendQuery() + ???)

> But instead you want work with almost no changes on existing modes.

Well, the trouble with the proposed PQgetRow/PQrecvRow is that they only
work safely at the second API level.  They're completely unsafe to use
with PQisBusy, and I think that is a show-stopper.  In your own terms,
the "6th mode" doesn't work.

More generally, it's not very safe to change the row processor while a
query is in progress.  PQskipResult can get away with doing so, but only
because the entire point of that function is to lose data, and we don't
much care whether some rows already got handled differently.  For every
other use-case, you have to set up the row processor in advance and
leave it in place, which is a guideline that PQgetRow/PQrecvRow violate.

So I think the only way to use row-by-row processing is to permanently
install a row processor that normally returns zero.  It's possible that
we could provide a predefined row processor that acts that way and
invite people to install it.  However, I think it's premature to suppose
that we know all the details of how somebody might want to use this.
In particular the notion of cloning the PGresult for each row seems
expensive and not obviously more useful than direct access to the
network buffer.  So I'd rather leave it as-is and see if any common
usage patterns arise, then add support for those patterns.

>> In particular, I flat out will not accept a design in which that option
>> doesn't work unless the current call came via PQisBusy, much less some
>> entirely new call like PQhasRowOrResult.  It's unusably fragile (ie,
>> timing sensitive) if that has to be true.

> Agreed for PQisBusy, but why is PQhasRowOrResult() fragile?

Because it breaks if you use PQisBusy *anywhere* in the application.
That's not just a bug hazard but a loss of functionality.  I think it's
important to have a pure "is data available" state test function that
doesn't cause data to be consumed from the connection, and there's no
way to have that if there are API functions that change the row
processor setting mid-query.  (Another way to say this is that PQisBusy
ought to be idempotent from the standpoint of the application --- we
know that it does perform work inside libpq, but it doesn't change the
state of the connection so far as the app can tell, and so it doesn't
matter if you call it zero, one, or N times between other calls.)
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

04 April 2012, 14:29:03

Hello, this is the new version of the dblink patch.

- Calling dblink_is_busy prevents the row processor from being used.

- Fixed some PGresult leaks.

- Rebased to current head.

> A hack on top of that hack would be to collect the data into a
> tuplestore that contains all text columns, and then convert to the
> correct rowtype during dblink_get_result, but that seems rather ugly
> and not terribly high-performance.
>
> What I'm currently thinking we should do is just use the old method
> for async queries, and only optimize the synchronous case.

OK, I agree with you except for the performance issue. I give up on using
the row processor for async queries when dblink_is_busy is called.

> I thought for awhile that this might represent a generic deficiency
> in the whole concept of a row processor, but probably it's mostly
> down to dblink's rather bizarre API.  It would be unusual I think for
> people to want a row processor that couldn't know what to do until
> after the entire query result is received.

I hope so. Thank you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

dblink_rowproc_20120405.patch

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

04 April 2012, 16:17:59

Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> writes:
>> What I'm currently thinking we should do is just use the old method
>> for async queries, and only optimize the synchronous case.

> Ok, I agree with you except for performance issue. I give up to use
> row processor for async query with dblink_is_busy called.

Yeah, that seems like a reasonable idea.


Given the lack of consensus around the suspension API, maybe the best
way to get the underlying libpq patch to a committable state is to take
it out --- that is, remove the "return zero" option for row processors.
Since we don't have a test case for it in dblink, it's hard to escape
the feeling that we may be expending a lot of effort for something that
nobody really wants, and/or misdesigning it for lack of a concrete use
case.  Is anybody going to be really unhappy if that part of the patch
gets left behind?
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

04 April 2012, 18:18:58

On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Given the lack of consensus around the suspension API, maybe the best
> way to get the underlying libpq patch to a committable state is to take
> it out --- that is, remove the "return zero" option for row processors.
> Since we don't have a test case for it in dblink, it's hard to escape
> the feeling that we may be expending a lot of effort for something that
> nobody really wants, and/or misdesigning it for lack of a concrete use
> case.  Is anybody going to be really unhappy if that part of the patch
> gets left behind?

Agreed.

--
marko

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

04 April 2012, 19:04:37

I'm afraid that materialize_needed is not re-initialized for the next query
in the latest dblink patch.
I will confirm that and send another one if needed in a few hours.

# I need to catch the train I usually get on..

> Hello, This is the new version of dblink patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

04 April 2012, 19:41:20

Marko Kreen <markokr@gmail.com> writes:
> On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Given the lack of consensus around the suspension API, maybe the best
>> way to get the underlying libpq patch to a committable state is to take
>> it out --- that is, remove the "return zero" option for row processors.

> Agreed.

Done that way.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

04 April 2012, 19:42:45

Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> writes:
> I'm afraid not re-initializing materialize_needed for the next query
> in the latest dblink patch.
> I will confirm that and send the another one if needed in a few hours.

I've committed a revised version of the previous patch.  I'm not sure
that the case of dblink_is_busy not being used is interesting enough
to justify contorting the logic, and I'm worried about introducing
corner case bugs for that.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Kyotaro HORIGUCHI

Date:

04 April 2012, 21:29:36

> > I'm afraid not re-initializing materialize_needed for the next query
> > in the latest dblink patch.

I've found no need to worry about the re-initializing issue.

> I've committed a revised version of the previous patch.

Thank you for that. 

> I'm not sure that the case of dblink_is_busy not being used is
> interesting enough to justify contorting the logic, and I'm
> worried about introducing corner case bugs for that.

I'm afraid of ending up in an indeterminate state when sync and async
queries are mixed, or when the API call sequence for an async query falls
outside my expectations (which are rather narrow).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

05 April 2012, 13:30:27

On Wed, Apr 04, 2012 at 06:41:00PM -0400, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Given the lack of consensus around the suspension API, maybe the best
> >> way to get the underlying libpq patch to a committable state is to take
> >> it out --- that is, remove the "return zero" option for row processors.
>
> > Agreed.
>
> Done that way.

Minor cleanups:

* Change callback return value to be 'bool': 0 is error.
  Currently the accepted return codes are 1 and -1,
  which is weird.

  If we happen to have the 'early-exit' logic in the future,
  it should not work via callback return code.  So keeping the 0
  in reserve is unnecessary.

* Support exceptions in multi-statement PQexec() by storing
  finished result under PGconn temporarily.  Without doing it,
  the result can leak if callback longjmps while processing
  next result.

* Add <caution> to docs for permanently keeping custom callback.

  This API fragility is also the reason why early-exit (if it appears)
  should not work via callback - instead it should give a safer API.
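
The leak described in the second item, and the proposed remedy of parking the finished result under the connection, can be illustrated with a toy model. The attached rowproc-cleanups.diff is not reproduced here, so this is only a sketch of the idea: ToyConn, last_result and every function below are hypothetical and not libpq code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Toy connection: NOT libpq.  The finished result is parked here. */
typedef struct
{
    void *last_result;
} ToyConn;

static jmp_buf on_error;
static int live_results = 0;    /* results allocated and not yet freed */

static void *
result_new(void)
{
    void *r = malloc(1);

    if (r == NULL)
        abort();
    live_results++;
    return r;
}

static void
result_free(void *r)
{
    if (r != NULL)
    {
        free(r);
        live_results--;
    }
}

/* Models a multi-statement exec whose second statement "throws". */
static void
toy_exec_two_statements(ToyConn *conn)
{
    assert(conn != NULL);
    conn->last_result = result_new();   /* statement 1 finished */
    longjmp(on_error, 1);               /* row callback of statement 2 */
}

/* Cleanup the caller performs after catching the exception. */
static void
toy_finish(ToyConn *conn)
{
    result_free(conn->last_result);
    conn->last_result = NULL;
}

static int
run_demo(void)
{
    /* static: must stay well-defined across the longjmp */
    static ToyConn conn;

    conn.last_result = NULL;
    if (setjmp(on_error) == 0)
        toy_exec_two_statements(&conn);
    toy_finish(&conn);
    return live_results;        /* 0 means nothing leaked */
}
```

Had the first result been held only in a local variable inside toy_exec_two_statements(), the longjmp would have discarded the only pointer to it.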

--
marko

Attachment

rowproc-cleanups.diff

Re: Speed dblink using alternate libpq tuple storage

From

Tom Lane

Date:

05 April 2012, 13:50:01

Marko Kreen <markokr@gmail.com> writes:
> Minor cleanups:

> * Change callback return value to be 'bool': 0 is error.
>   Currently the accepted return codes are 1 and -1,
>   which is weird.

No, I did it that way intentionally, with the thought that we might add
back return code zero (or other return codes) in the future.  If we go
with bool then sensible expansion is impossible, or at least ugly.
(I think it was you that objected to 0/1/2 in the first place, no?)
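
The expansion argument can be sketched as follows, assuming a hypothetical driver loop: with an int return, the caller dispatches on the codes it knows (1 and -1 here) and rejects everything else, so a further code could later become one more case without changing the callback signature. The names row_cb, process_rows() and sum_cb() are illustrative only and are not libpq API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: 1 = row handled, -1 = error; 0 is unused. */
typedef int (*row_cb) (int row, void *param);

/*
 * Hypothetical driver loop.  Returns the number of rows handled, or -1 if
 * the callback reported an error or returned a code we do not understand.
 */
static int
process_rows(int nrows, row_cb cb, void *param)
{
    int i;

    assert(cb != NULL);
    for (i = 0; i < nrows; i++)
    {
        switch (cb(i, param))
        {
            case 1:
                break;          /* keep going */
            case -1:
                return -1;      /* callback reported an error */
            default:
                return -1;      /* 0 and others: reserved, reject */
        }
    }
    return nrows;
}

/* Example callback: sums row numbers, fails on row 3. */
static int
sum_cb(int row, void *param)
{
    int *sum = param;

    if (row == 3)
        return -1;
    *sum += row;
    return 1;
}
```

With a bool return the default branch would have nothing left to reserve, which is the limitation being pointed out.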

>   If we happen to have the 'early-exit' logic in the future,
>   it should not work via callback return code.

This assumption seems unproven to me, and even if it were,
it doesn't mean we will never have any other exit codes.

> * Support exceptions in multi-statement PQexec() by storing
>   finished result under PGconn temporarily.  Without doing it,
>   the result can leak if callback longjmps while processing
>   next result.

I'm unconvinced this is a good thing either.
        regards, tom lane

Re: Speed dblink using alternate libpq tuple storage

From

Marko Kreen

Date:

05 April 2012, 14:05:37

On Thu, Apr 05, 2012 at 12:49:37PM -0400, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > Minor cleanups:
> 
> > * Change callback return value to be 'bool': 0 is error.
> >   Currently the accepted return codes are 1 and -1,
> >   which is weird.
> 
> No, I did it that way intentionally, with the thought that we might add
> back return code zero (or other return codes) in the future.  If we go
> with bool then sensible expansion is impossible, or at least ugly.
> (I think it was you that objected to 0/1/2 in the first place, no?)

Well, I was the one who added the 0/1/2 in the first place,
then I noticed that -1/0/1 works better as a triple.

But the early-exit from callback creates unnecessary fragility
so now I'm convinced we don't want to do it that way.

> >   If we happen to have the 'early-exit' logic in the future,
> >   it should not work via callback return code.
> 
> This assumption seems unproven to me, and even if it were,
> it doesn't mean we will never have any other exit codes.

I cannot even imagine why we would want new codes, so it seems
like an unnecessary mess in the API.

But it's a minor issue, so if you intend to keep it, I won't
push it further.

> > * Support exceptions in multi-statement PQexec() by storing
> >   finished result under PGconn temporarily.  Without doing it,
> >   the result can leak if callback longjmps while processing
> >   next result.
> 
> I'm unconvinced this is a good thing either.

This is a less minor issue.  Do we support longjmp() or not?

Why are you convinced that it's not needed?

-- 
marko