Re: Why is pq_begintypsend so slow? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why is pq_begintypsend so slow?
Msg-id 6648.1589819885@sss.pgh.pa.us
In response to Re: Why is pq_begintypsend so slow?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
>> FWIW, I've also observed, in another thread (the node func generation
>> thing [1]), that inlining enlargeStringInfo() helps a lot, especially
>> when inlining some of its callers. Moving e.g. appendStringInfo() inline
>> allows the compiler to sometimes optimize away the strlen. But if
>> e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo()
>> unconditionally, successive appends cannot optimize away memory accesses
>> for ->len/->data.

> With a set of patches doing so, int4send itself is not a significant
> factor for my test benchmark [1] anymore.

This thread seems to have died out, possibly because the last set of
patches that Andres posted was sufficiently complicated and invasive
that nobody wanted to review it.  I thought about this again after
seeing that [1] is mostly about pq_begintypsend overhead, and had
an epiphany: there isn't really a strong reason for pq_begintypsend
to be writing anything into the buffer at all.  The four reserved bytes
will be filled in by pq_endtypsend, and nothing in between should be
touching them.  So I propose 0001 attached.  It's poking into the stringinfo
abstraction a bit more than I would want to do if there weren't a
compelling performance reason to do so, but there evidently is.
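
To illustrate the idea outside of PostgreSQL, here is a minimal sketch of
"reserve the length word now, backfill it at the end".  DemoBuf and the
demo_* functions are hypothetical stand-ins invented for this example, not
the real StringInfo or pqformat API, and the header is stored in host byte
order purely for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfoData; not the PostgreSQL definition. */
typedef struct DemoBuf
{
    char       *data;
    size_t      len;            /* bytes currently used */
    size_t      maxlen;         /* bytes allocated */
} DemoBuf;

/* Allocate the buffer and reserve 4 header bytes without writing them. */
void
demo_begin(DemoBuf *buf)
{
    buf->maxlen = 64;
    buf->data = malloc(buf->maxlen);
    assert(buf->data != NULL && buf->maxlen > 4);
    buf->len = 4;               /* skip the header; demo_end() fills it */
}

/* Append payload bytes after the reserved header, growing if needed. */
void
demo_append(DemoBuf *buf, const void *src, size_t n)
{
    if (buf->len + n > buf->maxlen)
    {
        while (buf->len + n > buf->maxlen)
            buf->maxlen *= 2;
        buf->data = realloc(buf->data, buf->maxlen);
        assert(buf->data != NULL);
    }
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Backfill the reserved header with the total length and return it. */
uint32_t
demo_end(DemoBuf *buf)
{
    uint32_t    total = (uint32_t) buf->len;

    memcpy(buf->data, &total, sizeof(total));
    return total;
}
```

The reserved bytes hold garbage between demo_begin() and demo_end(), which
is harmless only as long as nothing reads them in between; that is the same
assumption the proposal relies on.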

With 0001, pq_begintypsend drops from being the top single routine
in a profile of a test case like [1] to being well down the list.
The next biggest cost relative to text-format output is in printtup()
itself, which is noticeably more expensive.  A lot of the extra
cost there seems to be from pq_sendint32(), which is getting inlined
into printtup(), and there probably isn't much we can do to make that
cheaper. But eliminating a common subexpression as in 0002 below does
help noticeably, at least with the rather old gcc I'm using.
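
As a standalone sketch of that kind of change: the length is derived from
the value's header once and kept in a local, rather than being recomputed
for each use.  One plausible reason this can matter (an assumption on my
part, not something measured here) is that stores through a char pointer
may alias the header, so a compiler can feel obliged to reload it.
DemoValue, DEMO_HDRSZ and demo_emit() are hypothetical names for this
example only, not PostgreSQL's varlena macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HDRSZ 4            /* hypothetical header size in bytes */

/* Hypothetical length-prefixed value: total size, then payload. */
typedef struct DemoValue
{
    uint32_t    size;           /* total size, header included */
    char        payload[16];
} DemoValue;

/* Write a 4-byte payload length and then the payload; return bytes written. */
size_t
demo_emit(char *out, const DemoValue *val)
{
    uint32_t    len;

    assert(val->size >= DEMO_HDRSZ);

    /* Compute the payload length once; later stores to out cannot change it. */
    len = val->size - DEMO_HDRSZ;

    memcpy(out, &len, sizeof(len));
    memcpy(out + sizeof(len), val->payload, len);
    return sizeof(len) + len;
}
```

Whether a given compiler already does this on its own will vary, so treat
it as something to check in a profile rather than a guaranteed win.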

For me, the combination of these two eliminates most but not quite
all of the cost penalty of binary over text output as seen in [1].

            regards, tom lane

[1] https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com

diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index a6f990c..03b7404 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -328,11 +328,16 @@ void
 pq_begintypsend(StringInfo buf)
 {
     initStringInfo(buf);
-    /* Reserve four bytes for the bytea length word */
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
+
+    /*
+     * Reserve four bytes for the bytea length word.  We don't need to fill
+     * them with anything (pq_endtypsend will do that), and this function is
+     * enough of a hot spot that it's worth cheating to save some cycles. Note
+     * in particular that we don't bother to guarantee that the buffer is
+     * null-terminated.
+     */
+    Assert(buf->maxlen > 4);
+    buf->len = 4;
 }
 
 /* --------------------------------
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index dd1bac0..a9315c6 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -438,11 +438,12 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
         {
             /* Binary output */
             bytea       *outputbytes;
+            int            outputlen;

             outputbytes = SendFunctionCall(&thisState->finfo, attr);
-            pq_sendint32(buf, VARSIZE(outputbytes) - VARHDRSZ);
-            pq_sendbytes(buf, VARDATA(outputbytes),
-                         VARSIZE(outputbytes) - VARHDRSZ);
+            outputlen = VARSIZE(outputbytes) - VARHDRSZ;
+            pq_sendint32(buf, outputlen);
+            pq_sendbytes(buf, VARDATA(outputbytes), outputlen);
         }
     }

