Re: Why is pq_begintypsend so slow? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Why is pq_begintypsend so slow? |
Date | |
Msg-id | 6648.1589819885@sss.pgh.pa.us |
In response to | Re: Why is pq_begintypsend so slow? (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Why is pq_begintypsend so slow?
Re: Why is pq_begintypsend so slow? |
List | pgsql-hackers |
Andres Freund <andres@anarazel.de> writes:
>> FWIW, I've also observed, in another thread (the node func generation
>> thing [1]), that inlining enlargeStringInfo() helps a lot, especially
>> when inlining some of its callers. Moving e.g. appendStringInfo() inline
>> allows the compiler to sometimes optimize away the strlen. But if
>> e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo()
>> unconditionally, successive appends cannot optimize away memory accesses
>> for ->len/->data.

> With a set of patches doing so, int4send itself is not a significant
> factor for my test benchmark [1] anymore.

This thread seems to have died out, possibly because the last set of
patches that Andres posted was sufficiently complicated and invasive
that nobody wanted to review it. I thought about this again after
seeing that [1] is mostly about pq_begintypsend overhead, and had
an epiphany: there isn't really a strong reason for pq_begintypsend
to be inserting bits into the buffer at all. The bytes will be
filled by pq_endtypsend, and nothing in between should be touching
them. So I propose 0001 attached. It's poking into the stringinfo
abstraction a bit more than I would want to do if there weren't a
compelling performance reason to do so, but there evidently is.

With 0001, pq_begintypsend drops from being the top single routine
in a profile of a test case like [1] to being well down the list.
The next biggest cost compared to text-format output is that
printtup() itself is noticeably more expensive. A lot of the extra
cost there seems to be from pq_sendint32(), which is getting inlined
into printtup(), and there probably isn't much we can do to make that
cheaper. But eliminating a common subexpression as in 0002 below does
help noticeably, at least with the rather old gcc I'm using.

For me, the combination of these two eliminates most but not quite
all of the cost penalty of binary over text output as seen in [1].
regards, tom lane

[1] https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com

diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index a6f990c..03b7404 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -328,11 +328,16 @@ void
 pq_begintypsend(StringInfo buf)
 {
     initStringInfo(buf);
-    /* Reserve four bytes for the bytea length word */
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
+
+    /*
+     * Reserve four bytes for the bytea length word. We don't need to fill
+     * them with anything (pq_endtypsend will do that), and this function is
+     * enough of a hot spot that it's worth cheating to save some cycles. Note
+     * in particular that we don't bother to guarantee that the buffer is
+     * null-terminated.
+     */
+    Assert(buf->maxlen > 4);
+    buf->len = 4;
 }
 
 /* --------------------------------

diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index dd1bac0..a9315c6 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -438,11 +438,12 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
         {
             /* Binary output */
             bytea      *outputbytes;
+            int         outputlen;
 
             outputbytes = SendFunctionCall(&thisState->finfo, attr);
-            pq_sendint32(buf, VARSIZE(outputbytes) - VARHDRSZ);
-            pq_sendbytes(buf, VARDATA(outputbytes),
-                         VARSIZE(outputbytes) - VARHDRSZ);
+            outputlen = VARSIZE(outputbytes) - VARHDRSZ;
+            pq_sendint32(buf, outputlen);
+            pq_sendbytes(buf, VARDATA(outputbytes), outputlen);
         }
     }
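
For readers without the PostgreSQL tree at hand, the reserve-then-backfill idea behind the two patches can be sketched in standalone C. This is only an illustration under stated assumptions: ToyBuf and the toy_* functions are hypothetical stand-ins, not the real StringInfo or pqformat API, and the header here is a plain native-endian 32-bit total length rather than a real varlena header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfoData; not the real PostgreSQL type. */
typedef struct ToyBuf
{
    char   *data;
    size_t  len;        /* bytes currently used */
    size_t  maxlen;     /* bytes allocated */
} ToyBuf;

/* Allocate a small buffer; the initial size is assumed to exceed 4 bytes. */
static void
toy_init(ToyBuf *buf)
{
    buf->maxlen = 64;
    buf->data = malloc(buf->maxlen);
    assert(buf->data != NULL);
    buf->len = 0;
}

/*
 * Reserve the 4-byte length word by bumping len only, without writing to
 * the reserved bytes (the idea in the first patch).
 */
static void
toy_begin(ToyBuf *buf)
{
    toy_init(buf);
    assert(buf->maxlen > 4);
    buf->len = 4;
}

/* Append raw bytes, doubling the allocation until the data fits. */
static void
toy_append(ToyBuf *buf, const void *src, size_t n)
{
    if (buf->len + n > buf->maxlen)
    {
        while (buf->len + n > buf->maxlen)
            buf->maxlen *= 2;
        buf->data = realloc(buf->data, buf->maxlen);
        assert(buf->data != NULL);
    }
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Backfill the reserved header with the total length; caller frees result. */
static char *
toy_end(ToyBuf *buf)
{
    uint32_t    total = (uint32_t) buf->len;

    memcpy(buf->data, &total, sizeof(total));
    return buf->data;
}

/*
 * Extract the payload length once so callers can reuse the value instead of
 * recomputing it per use (the idea in the second patch).
 */
static uint32_t
toy_payload_len(const char *result)
{
    uint32_t    total;

    memcpy(&total, result, sizeof(total));
    return total - 4;
}
```

A caller would run toy_begin(), any number of toy_append() calls, then toy_end(), and read the payload starting at offset 4 of the result. The reserved bytes hold garbage until toy_end() runs, so nothing in between may read them, which is the same constraint the message relies on for pq_begintypsend and pq_endtypsend.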