Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead
Date
Msg-id 20210824.151228.73442993506757234.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-hackers
At Wed, 18 Aug 2021 05:16:38 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in 
> On Tue, 2021-08-17 at 02:14 -0700, Andres Freund wrote:
> > On 2021-08-17 10:44:51 +0200, Laurenz Albe wrote:
> > > On Sun, 2021-08-01 at 13:55 -0700, Andres Freund wrote:
> > > > We maintain last_report as GetCurrentTransactionStopTimestamp(), but then use
> > > > a separate timestamp in pgstat_send_connstats() to compute the difference from
> > > > last_report, which is then set to GetCurrentTransactionStopTimestamp()'s
> > > > return value.
> > > 
> > > Makes sense to me.  How about passing "now", which was just initialized from
> > > GetCurrentTransactionStopTimestamp(), as additional parameter to
> > > pgstat_send_connstats() and use that value instead of taking the current time?
> > 
> > Yes.
> 
> Here is a patch for that.

FWIW, looks good to me.

> > > > I'm also not all that happy with sending yet another UDP packet for this.
> > > 
> > > Are you suggesting that connection statistics should be shoehorned into
> > > some other statistics message?  That would reduce the number of UDP packets,
> > > but it sounds ugly and confusing to me.
> > 
> > That ship already has sailed. Look at struct PgStat_MsgTabstat
> > 
> > Given that we transport number of commits/commits, block read/write time
> > adding the time the connection was active/inactive doesn't really seem like it
> > makes things meaningfully worse?
> 
> Point taken.
> 
> I looked at the other statistics sent in pgstat_report_stat(), and I see
> none that are sent unconditionally.  Are you thinking of this:

IIUC, that means that pg_stat_report sends at least one
PgStat_MsgTabstat struct for the database stats purpose if any stats
are sent.  So the connection stats can piggy-back on the packet.

>     /*
>      * Send partial messages.  Make sure that any pending xact commit/abort
>      * gets counted, even if there are no table stats to send.
>      */
>     if (regular_msg.m_nentries > 0 ||
>         pgStatXactCommit > 0 || pgStatXactRollback > 0)
>         pgstat_send_tabstat(®ular_msg);
>     if (shared_msg.m_nentries > 0)
>         pgstat_send_tabstat(&shared_msg);
> 
> I can't think of a way to hack this up that wouldn't make my stomach turn.

No need to change the condition. It's sufficient that the connection
stats are sent at the same time with transaction stats are sent.

> > > > Alternatively we could just not send an update to connection stats every 500ms
> > > > for every active connection, but only do so at connection end. The database
> > > > stats would only contain stats for disconnected sessions, while the stats for
> > > > "live" connections are maintained via backend_status.c.
> > > 
> > > That was my original take, but if I remember right, Magnus convinced me that
> > > it would be more useful to have statistics for open sessions as well.
> > > With a connection pool, connections can stay open for a very long time,
> > > and the accuracy and usefulness of the statistics would become questionable.
> > 
> > That's not a contradiction to what I propose. Having the data available via
> > backend_status.c allows to sum up the data for active connections and the data
> > for past connections.
> > 
> > I think it's also just cleaner to not constantly report changing preliminary
> > data as pgstat_send_connstats() does.
> 
> Currently, the data are kept in static variables in the backend process.
> That would have to change for such an approach, right?

Total session time can be summarized from beentry any time, but I'm
not sure how we can summarize active/idle time.. Anyway it's not
needed if the attached works.

> > Doubling the number of UDP messages in common workloads seems also problematic
> > enough that it should be addressed for 14.
> 
> Ok, but I don't know how to go about it.

The attached is a heavy-WIP on:

- remove redundant gettimeofday().
- avoid sending dedicate UCP packet for connection stats.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a3c35bdf60..f48ee80fc7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -329,11 +329,11 @@ static bool pgstat_db_requested(Oid databaseid);
 static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
 static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
-static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
+static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, bool disconnect,
+                                TimestampTz last_report, TimestampTz now);
 static void pgstat_send_funcstats(void);
 static void pgstat_send_slru(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
-static void pgstat_send_connstats(bool disconnect, TimestampTz last_report);
 
 static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
 
@@ -365,7 +365,6 @@ static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
-static void pgstat_recv_connstat(PgStat_MsgConn *msg, int len);
 static void pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
@@ -889,12 +888,6 @@ pgstat_report_stat(bool disconnect)
         !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
         return;
 
-    /* for backends, send connection statistics */
-    if (MyBackendType == B_BACKEND)
-        pgstat_send_connstats(disconnect, last_report);
-
-    last_report = now;
-
     /*
      * Destroy pgStatTabHash before we start invalidating PgStat_TableEntry
      * entries it points to.  (Should we fail partway through the loop below,
@@ -946,7 +939,7 @@ pgstat_report_stat(bool disconnect)
                    sizeof(PgStat_TableCounts));
             if (++this_msg->m_nentries >= PGSTAT_NUM_TABENTRIES)
             {
-                pgstat_send_tabstat(this_msg);
+                pgstat_send_tabstat(this_msg, disconnect, last_report, now);
                 this_msg->m_nentries = 0;
             }
         }
@@ -962,9 +955,11 @@ pgstat_report_stat(bool disconnect)
      */
     if (regular_msg.m_nentries > 0 ||
         pgStatXactCommit > 0 || pgStatXactRollback > 0)
-        pgstat_send_tabstat(®ular_msg);
+        pgstat_send_tabstat(®ular_msg, disconnect, last_report, now);
     if (shared_msg.m_nentries > 0)
-        pgstat_send_tabstat(&shared_msg);
+        pgstat_send_tabstat(&shared_msg, disconnect, last_report, now);
+
+    last_report = now;
 
     /* Now, send function statistics */
     pgstat_send_funcstats();
@@ -978,9 +973,15 @@ pgstat_report_stat(bool disconnect)
 
 /*
  * Subroutine for pgstat_report_stat: finish and send a tabstat message
+ *
+ *  The last three parameters are required only on normal backends.
+ *    The parameter "disconnect" will be true when the backend exits.
+ *    "last_report" is the last time we were called (0 if never).
+ *    "now" is the current time.
  */
 static void
-pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg)
+pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, bool disconnect,
+                    TimestampTz last_report, TimestampTz now)
 {
     int            n;
     int            len;
@@ -989,6 +990,42 @@ pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg)
     if (pgStatSock == PGINVALID_SOCKET)
         return;
 
+    /*
+     * for backends, send connection statistics
+     *
+     * This should run no frequently than once per a transaction.
+     */
+    if (OidIsValid(tsmsg->m_databaseid) && MyBackendType == B_BACKEND &&
+        (pgStatXactCommit > 0 || pgStatXactRollback > 0))
+    {
+        long        secs;
+        int            usecs;
+
+        /* session time since the last report */
+        TimestampDifference(
+            ((last_report == 0) ? MyStartTimestamp : last_report), now,
+            &secs, &usecs);
+        tsmsg->m_session_time = secs * 1000000 + usecs;
+        tsmsg->m_disconnect =
+            disconnect ? pgStatSessionEndCause : DISCONNECT_NOT_YET;
+        tsmsg->m_active_time = pgStatActiveTime;
+        tsmsg->m_idle_in_xact_time = pgStatTransactionIdleTime;
+            
+        pgStatActiveTime = 0;
+        pgStatTransactionIdleTime = 0;
+
+        /* report a new session only the first time */
+        tsmsg->m_count = (last_report == 0) ? 1 : 0;
+    }
+    else
+    {
+        tsmsg->m_session_time = 0;
+        tsmsg->m_disconnect = 0;
+        tsmsg->m_active_time = 0;
+        tsmsg->m_idle_in_xact_time = 0;
+        tsmsg->m_count = 0;
+    }
+
     /*
      * Report and reset accumulated xact commit/rollback and I/O timings
      * whenever we send a normal tabstat message
@@ -1378,48 +1415,6 @@ pgstat_drop_relation(Oid relid)
 #endif                            /* NOT_USED */
 
 
-/* ----------
- * pgstat_send_connstats() -
- *
- *    Tell the collector about session statistics.
- *    The parameter "disconnect" will be true when the backend exits.
- *    "last_report" is the last time we were called (0 if never).
- * ----------
- */
-static void
-pgstat_send_connstats(bool disconnect, TimestampTz last_report)
-{
-    PgStat_MsgConn msg;
-    long        secs;
-    int            usecs;
-
-    if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
-        return;
-
-    pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CONNECTION);
-    msg.m_databaseid = MyDatabaseId;
-
-    /* session time since the last report */
-    TimestampDifference(((last_report == 0) ? MyStartTimestamp : last_report),
-                        GetCurrentTimestamp(),
-                        &secs, &usecs);
-    msg.m_session_time = secs * 1000000 + usecs;
-
-    msg.m_disconnect = disconnect ? pgStatSessionEndCause : DISCONNECT_NOT_YET;
-
-    msg.m_active_time = pgStatActiveTime;
-    pgStatActiveTime = 0;
-
-    msg.m_idle_in_xact_time = pgStatTransactionIdleTime;
-    pgStatTransactionIdleTime = 0;
-
-    /* report a new session only the first time */
-    msg.m_count = (last_report == 0) ? 1 : 0;
-
-    pgstat_send(&msg, sizeof(PgStat_MsgConn));
-}
-
-
 /* ----------
  * pgstat_reset_counters() -
  *
@@ -3459,10 +3454,6 @@ PgstatCollectorMain(int argc, char *argv[])
                     pgstat_recv_replslot(&msg.msg_replslot, len);
                     break;
 
-                case PGSTAT_MTYPE_CONNECTION:
-                    pgstat_recv_connstat(&msg.msg_conn, len);
-                    break;
-
                 default:
                     break;
             }
@@ -4898,6 +4889,27 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
     dbentry->n_block_read_time += msg->m_block_read_time;
     dbentry->n_block_write_time += msg->m_block_write_time;
 
+    dbentry->n_sessions += msg->m_count;
+    dbentry->total_session_time += msg->m_session_time;
+    dbentry->total_active_time += msg->m_active_time;
+    dbentry->total_idle_in_xact_time += msg->m_idle_in_xact_time;
+    switch (msg->m_disconnect)
+    {
+        case DISCONNECT_NOT_YET:
+        case DISCONNECT_NORMAL:
+            /* we don't collect these */
+            break;
+        case DISCONNECT_CLIENT_EOF:
+            dbentry->n_sessions_abandoned++;
+            break;
+        case DISCONNECT_FATAL:
+            dbentry->n_sessions_fatal++;
+            break;
+        case DISCONNECT_KILLED:
+            dbentry->n_sessions_killed++;
+            break;
+    }
+
     /*
      * Process all table entries in the message.
      */
@@ -5557,41 +5569,6 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
     }
 }
 
-/* ----------
- * pgstat_recv_connstat() -
- *
- *  Process connection information.
- * ----------
- */
-static void
-pgstat_recv_connstat(PgStat_MsgConn *msg, int len)
-{
-    PgStat_StatDBEntry *dbentry;
-
-    dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
-    dbentry->n_sessions += msg->m_count;
-    dbentry->total_session_time += msg->m_session_time;
-    dbentry->total_active_time += msg->m_active_time;
-    dbentry->total_idle_in_xact_time += msg->m_idle_in_xact_time;
-    switch (msg->m_disconnect)
-    {
-        case DISCONNECT_NOT_YET:
-        case DISCONNECT_NORMAL:
-            /* we don't collect these */
-            break;
-        case DISCONNECT_CLIENT_EOF:
-            dbentry->n_sessions_abandoned++;
-            break;
-        case DISCONNECT_FATAL:
-            dbentry->n_sessions_fatal++;
-            break;
-        case DISCONNECT_KILLED:
-            dbentry->n_sessions_killed++;
-            break;
-    }
-}
-
 /* ----------
  * pgstat_recv_tempfile() -
  *
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 509849c7ff..5364943b55 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -81,7 +81,6 @@ typedef enum StatMsgType
     PGSTAT_MTYPE_DEADLOCK,
     PGSTAT_MTYPE_CHECKSUMFAILURE,
     PGSTAT_MTYPE_REPLSLOT,
-    PGSTAT_MTYPE_CONNECTION,
 } StatMsgType;
 
 /* ----------
@@ -289,6 +288,11 @@ typedef struct PgStat_MsgTabstat
     int            m_nentries;
     int            m_xact_commit;
     int            m_xact_rollback;
+    PgStat_Counter m_count;
+    PgStat_Counter m_session_time;
+    PgStat_Counter m_active_time;
+    PgStat_Counter m_idle_in_xact_time;
+    SessionEndType m_disconnect;
     PgStat_Counter m_block_read_time;    /* times in microseconds */
     PgStat_Counter m_block_write_time;
     PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
@@ -652,22 +656,6 @@ typedef struct PgStat_MsgChecksumFailure
     TimestampTz m_failure_time;
 } PgStat_MsgChecksumFailure;
 
-/* ----------
- * PgStat_MsgConn            Sent by the backend to update connection statistics.
- * ----------
- */
-typedef struct PgStat_MsgConn
-{
-    PgStat_MsgHdr m_hdr;
-    Oid            m_databaseid;
-    PgStat_Counter m_count;
-    PgStat_Counter m_session_time;
-    PgStat_Counter m_active_time;
-    PgStat_Counter m_idle_in_xact_time;
-    SessionEndType m_disconnect;
-} PgStat_MsgConn;
-
-
 /* ----------
  * PgStat_Msg                    Union over all possible messages.
  * ----------
@@ -700,7 +688,6 @@ typedef union PgStat_Msg
     PgStat_MsgTempFile msg_tempfile;
     PgStat_MsgChecksumFailure msg_checksumfailure;
     PgStat_MsgReplSlot msg_replslot;
-    PgStat_MsgConn msg_conn;
 } PgStat_Msg;
 


pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Next
From: "tanghy.fnst@fujitsu.com"
Date:
Subject: RE: Skipping logical replication transactions on subscriber side