Re: Flush some statistics within running transactions - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Flush some statistics within running transactions
Date
Msg-id CAA5RZ0tGhu9jXGgFjzx7yK4cseE+t40Q4qZ+KhY66B2MZ=pBrQ@mail.gmail.com
Whole thread Raw
In response to Re: Flush some statistics within running transactions  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Flush some statistics within running transactions
List pgsql-hackers
> Also the attached is now split in 4 sub-patches with 0002 introducing a new
> GUC to control the flush interval (default is 10s). Note that 0001 to 0003 could
> be merged as one patch but I did it that way to ease the review.

Thanks for the patches!

I do like the GUC introduced in 0002 to control the ANYTIME stats
flush interval.

But, as I was looking at this a bit more today and testing it, I am
not quite sure
I like what is happening here:

+void
+pgstat_report_anytime_stat(bool force)
+{
+       bool            nowait = !force;
+
+       pgstat_assert_is_up();
+
+       /*
+        * Exit if no pending stats at all. This avoids unnecessary work when
+        * backends are idle or in sessions without stats accumulation.
+        *
+        * Note: This check isn't precise as there might be only transactional
+        * stats pending, which we'll skip during the flush. However,
maintaining
+        * precise tracking would add complexity that does not seem
worth it from
+        * a performance point of view (no noticeable performance regression has
+        * been observed with the current implementation).
+        */
+       if (dlist_is_empty(&pgStatPending) && !pgstat_report_fixed)
+               return;
+
+       /* Flush stats outside of transaction boundary */
+       pgstat_flush_pending_entries(nowait, true);
+       pgstat_flush_fixed_stats(nowait, true);
+}

There is a check if pgStatPending is empty so we can return early.
However, with FLUSH_MIXED, the scenario is we will more than likely always
have TXN_BOUNDARY stats that can only be flushed at the end of a transaction.
This means that most of the time we will call pgstat_flush_pending_entries
and then leave it up to the anytime_cb to check for pending ANYTIME stats to
flush (before taking the lock ).

+bool
+pgstat_relation_flush_anytime_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+       Oid                     dboid;
+       PgStat_TableStatus *lstats; /* pending stats entry */
+       PgStatShared_Relation *shtabstats;
+       PgStat_StatTabEntry *tabentry;  /* table entry of shared stats */
+       PgStat_StatDBEntry *dbentry;    /* pending database entry */
+
+       dboid = entry_ref->shared_entry->key.dboid;
+       lstats = (PgStat_TableStatus *) entry_ref->pending;
+       shtabstats = (PgStatShared_Relation *) entry_ref->shared_stats;
+
+       /*
+        * Check if there are any non-transactional stats to flush. Avoid
+        * unnecessarily locking the entry if nothing accumulated.
+        */
+       if (!(lstats->counts.numscans > 0 ||
+                 lstats->counts.tuples_returned > 0 ||
+                 lstats->counts.tuples_fetched > 0 ||
+                 lstats->counts.blocks_fetched > 0 ||
+                 lstats->counts.blocks_hit > 0))
+               return true;

This makes things confusing because instead of just relying on
dlist_is_empty(&pgStatPending) to check if we need to flush anything,
the responsibility now moves to the callback, which now also has to
account for all the ANYTIME fields.

The way I can think about making this better is to somehow track if
we have ANYTIME data that needs to be flushed a different way
( maybe a second pgStatPending list for anytime stats ?? )

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Document NULL
Next
From: John Naylor
Date:
Subject: Re: More speedups for tuple deformation