False sharing for PgBackendStatus, made worse by in-core query_id handling - Mailing list pgsql-hackers

From Andres Freund
Subject False sharing for PgBackendStatus, made worse by in-core query_id handling
Date
Msg-id 20230627013458.axge7iylw7llyvww@awork3.anarazel.de
Whole thread Raw
List pgsql-hackers
Hi,

On Twitter Thomas thoroughly nerdsniped me [1]. As part of that I ran a
concurrent readonly pgbench workload and analyzed cacheline "contention" using
perf c2c.

One of the cacheline conflicts, by far not the most common, but significant,
is one I hadn't noticed in the past. The cacheline accesses are by
pgstat_report_query_id() and pgstat_report_activity().

The reason for the conflict is simple - we don't ensure any aligment for
PgBackendStatus. Thus the end of one backend's PgBackendStatus will be in the
same cacheline as another backend's start of PgBackendStatus.

Historically that didn't show up too much, because the end of PgBackendStatus
happened to contain less-frequently changing data since at least 9.6, namely
    int64        st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];

which effectively avoided any relevant false sharing.


But in 4f0b0966c866 a new trailing element was added to PgBackendStatus:

    /* query identifier, optionally computed using post_parse_analyze_hook */
    uint64        st_query_id;

which is very frequently set, due to the following in ExecutorStart:
    /*
     * In some cases (e.g. an EXECUTE statement) a query execution will skip
     * parse analysis, which means that the query_id won't be reported.  Note
     * that it's harmless to report the query_id multiple times, as the call
     * will be ignored if the top level query_id has already been reported.
     */
    pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);



The benchmarks I ran used -c 48 -j 48 clients on my two socket workstation, 2x
10/20 cores/threads.

With a default pgbench -S workload, the regression is barely visible - the
context switches between pgbench and backend use too many resources. But a
pipelined pgbench -S shows a 1-2% regression and server-side query execution
of a simple statement [2] regresses by ~5.5%.

Note that this is with compute_query_id = auto, without any extensions
loaded.

The fix for this is quite simple, something like:
#ifdef pg_attribute_aligned
    pg_attribute_aligned(PG_CACHE_LINE_SIZE)
#endif

at the start of PgBackendStatus.


Unfortunately we can't fix that in the backbranches, as it obviously is an ABI
violation.


Leaving the performance issue aside for a moment, I'm somewhat confused by the
maintenance of PgBackendStatus->st_query_id:

1) Why are there pgstat_report_query_id() calls in parse_analyze_*()? We aren't
   executing the statements at that point?

2) pgstat_report_query_id() doesn't overwrite a non-zero query_id unless force
   is passed in. Force is only passed in exec_simple_query(). query_id is also
   reset when pgstat_report_activity(STATE_RUNNING) is called.

   I think this means that e.g. bgworkers issuing queries will often get stuck
   on the first query_id used, unless they call pgstat_report_activity()?

Greetings,

Andres Freund

[1] https://twitter.com/MengTangmu/status/1673439083518115840
[2] DO $$ BEGIN FOR i IN 1..10000 LOOP EXECUTE 'SELECT'; END LOOP;END;$$;



pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL
Next
From: Andres Freund
Date:
Subject: ReadRecentBuffer() doesn't scale well