Re: shared-memory based stats collector - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: shared-memory based stats collector
Date
Msg-id 20210323.112052.1952896078786703992.horikyota.ntt@gmail.com
Whole thread Raw
In response to shared-memory based stats collector  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
At Fri, 19 Mar 2021 14:27:38 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2021-03-10 20:26:56 -0800, Andres Freund wrote:
> > > +         * We expose this shared entry now.  You might think that the entry
> > > +         * can be removed by a concurrent backend, but since we are creating
> > > +         * an stats entry, the object actually exists and used in the upper
> > > +         * layer. Such an object cannot be dropped until the first vacuum
> > > +         * after the current transaction ends.
> > > +         */
> > > +        dshash_release_lock(pgStatSharedHash, shhashent);
> >
> > I don't think you can safely release the lock before you incremented the
> > refcount?  What if, once the lock is released, somebody looks up that
> > entry, increments the refcount, and decrements it again? It'll see a
> > refcount of 0 at the end and decide to free the memory. Then the code
> > below will access already freed / reused memory, no?
> 
> Yep, it's not even particularly hard to hit:
>
> S0: CREATE TABLE a_table();
> S0: INSERT INTO a_table();
> S0: disconnect
> S1: set a breakpoint to just after the dshash_release_lock(), with an
>  if objid == a_table_oid
> S1: SELECT pg_stat_get_live_tuples('a_table'::regclass);
>   (will break at above breakpoint, without having incremented the
>   refcount yet)
> S2: DROP TABLE a_table;
> S2: VACUUM pg_class;
> 
> At that point S2's call to pgstat_vacuum_stat() will find the shared
> stats entry for a_table, delete the entry from the shared hash table,
> see that the stats data has a zero refcount, and free it. Once S1 wakes
> up it'll use already freed (and potentially since reused) memory.

Sorry for the delay.  You're right. I actually see permanent-block when
continue running S1 after the vacuum.  That happens at LWLockRelease
on freed block.

Moving the refcount bumping before the dshash_release_lock call fixes
that.  One issue doing that *was* get_stat_entry() had the path for
the case pgStatCacheContext is not available, which is already
dead. After the early lock releasing is removed, the comment is no
longer needed, too.

While working on this, I noticed that the previous diff
v56-57-func-diff.txt was slightly stale (missing a later bug fix).  So
attached contains a fix on the amendment path.

Please find the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f546afe6a..3e2b90e92b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -884,6 +884,8 @@ get_stat_entry(PgStatTypes type, Oid dbid, Oid objid, bool nowait, bool create,
                                      create, nowait, create, &shfound);
     if (shhashent)
     {
+        bool        lofound;
+
         if (create && !shfound)
         {
             /* Create new stats entry. */
@@ -900,38 +902,27 @@ get_stat_entry(PgStatTypes type, Oid dbid, Oid objid, bool nowait, bool create,
         else
             shheader = dsa_get_address(area, shhashent->body);
 
-        /*
-         * We expose this shared entry now.  You might think that the entry
-         * can be removed by a concurrent backend, but since we are creating
-         * an stats entry, the object actually exists and used in the upper
-         * layer. Such an object cannot be dropped until the first vacuum
-         * after the current transaction ends.
-         */
+        pg_atomic_add_fetch_u32(&shheader->refcount, 1);
         dshash_release_lock(pgStatSharedHash, shhashent);
 
-        /* register to local hash if possible */
-        if (pgStatEntHash || pgStatCacheContext)
+        /* Create local hash if not yet */
+        if (pgStatEntHash == NULL)
         {
-            bool        lofound;
+            Assert(pgStatCacheContext);
 
-            if (pgStatEntHash == NULL)
-            {
-                pgStatEntHash =
-                    pgstat_localhash_create(pgStatCacheContext,
-                                            PGSTAT_TABLE_HASH_SIZE, NULL);
-                pgStatEntHashAge =
-                    pg_atomic_read_u64(&StatsShmem->gc_count);
-            }
-
-            lohashent =
-                pgstat_localhash_insert(pgStatEntHash, key, &lofound);
-
-            Assert(!lofound);
-            lohashent->body = shheader;
-            lohashent->dsapointer = shhashent->body;
-
-            pg_atomic_add_fetch_u32(&shheader->refcount, 1);
+            pgStatEntHash =
+                pgstat_localhash_create(pgStatCacheContext,
+                                        PGSTAT_TABLE_HASH_SIZE, NULL);
+            pgStatEntHashAge =
+                pg_atomic_read_u64(&StatsShmem->gc_count);
         }
+
+        lohashent =
+            pgstat_localhash_insert(pgStatEntHash, key, &lofound);
+
+        Assert(!lofound);
+        lohashent->body = shheader;
+        lohashent->dsapointer = shhashent->body;
     }
 
     if (found)
@@ -1260,10 +1251,12 @@ pgstat_report_stat(bool force)
         pgstat_localhash_start_iterate(pgStatLocalHash, &i);
         while ((lent = pgstat_localhash_iterate(pgStatLocalHash, &i)) != NULL)
         {
-            /* no other types of entry must be found here */
-            Assert(lent->key.type == PGSTAT_TYPE_DB);
-
-            if (flush_dbstat(lent, nowait))
+            /*
+             * The loop above may have left some non-db entries while system is
+             * busy.  Process only database stats entries here.
+             */
+            if (lent->key.type == PGSTAT_TYPE_DB &&
+                flush_dbstat(lent, nowait))
             {
                 remains--;
                 /* Remove the successfully flushed entry */

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Key management with tests
Next
From: Fujii Masao
Date:
Subject: Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.