Re: shared-memory based stats collector - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: shared-memory based stats collector |
Date | |
Msg-id | 20210323.112052.1952896078786703992.horikyota.ntt@gmail.com Whole thread Raw |
In response to | shared-memory based stats collector (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
List | pgsql-hackers |
At Fri, 19 Mar 2021 14:27:38 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2021-03-10 20:26:56 -0800, Andres Freund wrote: > > > + * We expose this shared entry now. You might think that the entry > > > + * can be removed by a concurrent backend, but since we are creating > > > + * an stats entry, the object actually exists and used in the upper > > > + * layer. Such an object cannot be dropped until the first vacuum > > > + * after the current transaction ends. > > > + */ > > > + dshash_release_lock(pgStatSharedHash, shhashent); > > > > I don't think you can safely release the lock before you incremented the > > refcount? What if, once the lock is released, somebody looks up that > > entry, increments the refcount, and decrements it again? It'll see a > > refcount of 0 at the end and decide to free the memory. Then the code > > below will access already freed / reused memory, no? > > Yep, it's not even particularly hard to hit: > > S0: CREATE TABLE a_table(); > S0: INSERT INTO a_table(); > S0: disconnect > S1: set a breakpoint to just after the dshash_release_lock(), with an > if objid == a_table_oid > S1: SELECT pg_stat_get_live_tuples('a_table'::regclass); > (will break at above breakpoint, without having incremented the > refcount yet) > S2: DROP TABLE a_table; > S2: VACUUM pg_class; > > At that point S2's call to pgstat_vacuum_stat() will find the shared > stats entry for a_table, delete the entry from the shared hash table, > see that the stats data has a zero refcount, and free it. Once S1 wakes > up it'll use already freed (and potentially since reused) memory. Sorry for the delay. You're right. I actually see permanent-block when continue running S1 after the vacuum. That happens at LWLockRelease on freed block. Moving the refcount bumping before the dshash_release_lock call fixes that. One issue doing that *was* get_stat_entry() had the path for the case pgStatCacheContext is not available, which is already dead. After the early lock releasing is removed, the comment is no longer needed, too. While working on this, I noticed that the previous diff v56-57-func-diff.txt was slightly stale (missing a later bug fix). So attached contains a fix on the amendment path. Please find the attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 3f546afe6a..3e2b90e92b 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -884,6 +884,8 @@ get_stat_entry(PgStatTypes type, Oid dbid, Oid objid, bool nowait, bool create, create, nowait, create, &shfound); if (shhashent) { + bool lofound; + if (create && !shfound) { /* Create new stats entry. */ @@ -900,38 +902,27 @@ get_stat_entry(PgStatTypes type, Oid dbid, Oid objid, bool nowait, bool create, else shheader = dsa_get_address(area, shhashent->body); - /* - * We expose this shared entry now. You might think that the entry - * can be removed by a concurrent backend, but since we are creating - * an stats entry, the object actually exists and used in the upper - * layer. Such an object cannot be dropped until the first vacuum - * after the current transaction ends. - */ + pg_atomic_add_fetch_u32(&shheader->refcount, 1); dshash_release_lock(pgStatSharedHash, shhashent); - /* register to local hash if possible */ - if (pgStatEntHash || pgStatCacheContext) + /* Create local hash if not yet */ + if (pgStatEntHash == NULL) { - bool lofound; + Assert(pgStatCacheContext); - if (pgStatEntHash == NULL) - { - pgStatEntHash = - pgstat_localhash_create(pgStatCacheContext, - PGSTAT_TABLE_HASH_SIZE, NULL); - pgStatEntHashAge = - pg_atomic_read_u64(&StatsShmem->gc_count); - } - - lohashent = - pgstat_localhash_insert(pgStatEntHash, key, &lofound); - - Assert(!lofound); - lohashent->body = shheader; - lohashent->dsapointer = shhashent->body; - - pg_atomic_add_fetch_u32(&shheader->refcount, 1); + pgStatEntHash = + pgstat_localhash_create(pgStatCacheContext, + PGSTAT_TABLE_HASH_SIZE, NULL); + pgStatEntHashAge = + pg_atomic_read_u64(&StatsShmem->gc_count); } + + lohashent = + pgstat_localhash_insert(pgStatEntHash, key, &lofound); + + Assert(!lofound); + lohashent->body = shheader; + lohashent->dsapointer = shhashent->body; } if (found) @@ -1260,10 +1251,12 @@ pgstat_report_stat(bool force) pgstat_localhash_start_iterate(pgStatLocalHash, &i); while ((lent = pgstat_localhash_iterate(pgStatLocalHash, &i)) != NULL) { - /* no other types of entry must be found here */ - Assert(lent->key.type == PGSTAT_TYPE_DB); - - if (flush_dbstat(lent, nowait)) + /* + * The loop above may have left some non-db entries while system is + * busy. Process only database stats entries here. + */ + if (lent->key.type == PGSTAT_TYPE_DB && + flush_dbstat(lent, nowait)) { remains--; /* Remove the successfully flushed entry */
pgsql-hackers by date: