Re: shared-memory based stats collector - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: shared-memory based stats collector
Date
Msg-id 11936.1537430127@linux-at0a
Whole thread Raw
In response to Re: shared-memory based stats collector  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: shared-memory based stats collector
List pgsql-hackers
I've spent some time reviewing this version.

Design
------

1. Even with your patch the stats collector still uses an UDP socket to
   receive data. Now that the shared memory API is there, shouldn't the
   messages be sent via shared memory queue? [1] That would increase the
   reliability of message delivery.

   I can actually imagine backends inserting data into the shared hash tables
   themselves, but that might make them wait if the same entries are accessed
   by another backend. It should be much cheaper just to insert message into
   the queue and let the collector process it. In future version the collector
   can launch parallel workers so that writes by backends do not get blocked
   due to full queue.

2. I think the access to the shared hash tables introduces more contention
   than necessary. For example, pgstat_recv_tabstat() retrieves "dbentry" and
   leaves the containing hash table partition locked *exclusively* even if it
   changes only the containing table entries, while changes of the containing
   dbentry are done.

   It appears that the shared hash tables are only modified by the stats
   collector. The unnecessary use of the exclusive lock might be a bigger
   issue in the future if the stats collector will use parallel
   workers. Monitoring functions and autovacuum are affected by the locking
   now.

   (I see that the it's not trivial to get just-created entry locked in shared
   mode: it may need a loop in which we release the exclusive lock and acquire
   the shared lock unless the entry was already removed.)

3. Data in both shared_archiverStats and shared_globalStats is mostly accessed
   w/o locking. Is that ok? I'd expect the StatsLock to be used for these.

Coding
------

* git apply v4-0003-dshash-based-stats-collector.patch needed manual
  resolution of one conflict.

* pgstat_quickdie_handler() appears to be the only "quickdie handler" that
  calls on_exit_reset(), although the comments are almost copy & pasted from
  such a handler of other processes. Can you please explain what's specific
  about pgstat.c?

* the variable name "area" would be sufficient if it was local to some
  function, otherwise I think the name is too generic.

* likewise db_stats is too generic for a global variable. How about
  "snapshot_db_stats_local"?

* backend_get_db_entry() passes 0 for handle to snapshot_statentry(). How
  about DSM_HANDLE_INVALID ?

* I only see one call of snapshot_statentry_all() and it receives 0 for
  handle. Thus the argument can be removed and the function does not have to
  attach / detach to / from the shared hash table.

* backend_snapshot_global_stats() switches to TopMemoryContext before it calls
  pgstat_attach_shared_stats(), but the latter takes care of the context
  itself.

* pgstat_attach_shared_stats() - header comment should explain what the return
  value means.

* reset_dbentry_counters() does more than just resetting the counters. Name
  like initialize_dbentry() would be more descriptive.

* typos:

    ** backend_snapshot_global_stats(): "liftime" -> "lifetime"

    ** snapshot_statentry(): "entriy" -> "entry"

    ** backend_get_func_etnry(): "onshot" -> "oneshot"

    ** snapshot_statentry_all(): "Returns a local hash contains ..." -> "Returns a local hash containing ..."


[1] https://www.postgresql.org/message-id/20180711000605.sqjik3vqe5opqz33@alap3.anarazel.de

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


pgsql-hackers by date:

Previous
From: ROS Didier
Date:
Subject: impact of SPECTRE and MELTDOWN patches
Next
From: Christoph Berg
Date:
Subject: Re: [patch] Support LLVM 7