Thread: [COMMITTERS] pgsql: Speedup pgstat_report_activity by moving mb-aware truncationto

Speedup pgstat_report_activity by moving mb-aware truncation to read side.

Previously multi-byte aware truncation was done on every
pgstat_report_activity() call - proving to be a bottleneck for
workloads with long query strings that execute quickly.

Instead move the truncation to the read side, which commonly is
executed far less frequently. That's possible because all server
encodings allow to determine the length of a multi-byte string from
the first byte.

Rename PgBackendStatus.st_activity to st_activity_raw so existing
extension users of the field break - their code has to be adjusted to
use pgstat_clip_activity().

Author: Andres Freund
Tested-By: Khuntal Ghosh
Reviewed-By: Robert Haas, Tom Lane
Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkviecpz@alap3.anarazel.de

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/54b6cd589ac2f5635a42511236a5eb7299e2dcaf

Modified Files
--------------
src/backend/postmaster/pgstat.c     | 63 ++++++++++++++++++++++++++++---------
src/backend/utils/adt/pgstatfuncs.c | 17 +++++++---
src/include/pgstat.h                | 12 +++++--
3 files changed, 72 insertions(+), 20 deletions(-)


--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Andres Freund <andres@anarazel.de> writes:
> Speedup pgstat_report_activity by moving mb-aware truncation to read side.

Looks like you're going to need to not depend on strnlen().
        regards, tom lane


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

On 2017-09-19 16:25:31 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Speedup pgstat_report_activity by moving mb-aware truncation to read side.
> 
> Looks like you're going to need to not depend on strnlen().

Yup noticed - we have pg_strnlen as a static function in
src/port/snprintf. I'm tempted to just make that public, rather than
open code the equivalent?

Greetings,

Andres Freund


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Andres Freund <andres@anarazel.de> writes:
> On 2017-09-19 16:25:31 -0400, Tom Lane wrote:
>> Looks like you're going to need to not depend on strnlen().

> Yup noticed - we have pg_strnlen as a static function in
> src/port/snprintf. I'm tempted to just make that public, rather than
> open code the equivalent?

No, we should not tie this to whether we're using port/snprintf.c.

Personally, I don't see why this isn't coded more like
int rawlen = strlen(activity);
rawlen = Min(rawlen, pgstat_track_activity_query_size - 1);

It seems likely to me that strlen() is going to be optimized a lot
harder than strnlen() on most platforms, so that this way would win
not only for strings up to pgstat_track_activity_query_size but for
strings some way beyond that.  Yes, for really really long queries
the strnlen way would win, but that's optimizing for the wrong case IMO.

If you really insist on strnlen() then we're going to have to move
it to its own src/port/ file with a configure test.
        regards, tom lane


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

On 2017-09-19 16:53:09 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-09-19 16:25:31 -0400, Tom Lane wrote:
> >> Looks like you're going to need to not depend on strnlen().
> 
> > Yup noticed - we have pg_strnlen as a static function in
> > src/port/snprintf. I'm tempted to just make that public, rather than
> > open code the equivalent?
> 
> No, we should not tie this to whether we're using port/snprintf.c.

Oh, I would have moved it to src/common/string.c, sorry for not
mentioning that.


> Personally, I don't see why this isn't coded more like
> 
>     int rawlen = strlen(activity);
> 
>     rawlen = Min(rawlen, pgstat_track_activity_query_size - 1);

Pure paranoia, but probably not justified paranoia. We take care that
there's always a trailing NULL byte, even under concurrent
modifications (which exist due to pgstat_get_backend_current_activity()).

I'll just change things so that the pnstrdup() happens first, and then
do a plain strlen() on that. Same paranoia, less cost.


> It seems likely to me that strlen() is going to be optimized a lot
> harder than strnlen() on most platforms, so that this way would win
> not only for strings up to pgstat_track_activity_query_size but for
> strings some way beyond that.  Yes, for really really long queries
> the strnlen way would win, but that's optimizing for the wrong case IMO.

I wasn't thinking of optimizing this - and I've a hard time seing a case
where even the naive snprintf pg_strnlen(), or a plain strlen() would be
relevant performancewise.

Greetings,

Andres Freund


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Andres Freund <andres@anarazel.de> writes:
> I'll just change things so that the pnstrdup() happens first, and then
> do a plain strlen() on that. Same paranoia, less cost.

WFM.
        regards, tom lane


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers