Re: shared-memory based stats collector - Mailing list pgsql-hackers
From: Kyotaro HORIGUCHI
Subject: Re: shared-memory based stats collector
Msg-id: 20181108.204222.203165659.horiguchi.kyotaro@lab.ntt.co.jp
In response to: Re: shared-memory based stats collector (Antonin Houska <ah@cybertec.at>)
List: pgsql-hackers
Thank you for the comments. This message contains the whole refactored patch set.

At Mon, 29 Oct 2018 15:10:10 +0100, Antonin Houska <ah@cybertec.at> wrote in <28855.1540822210@localhost>
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> > This is a saner version of the previous v5-0008, which didn't pass the
> > regression test. v6-0008 to v6-0010 are attached and they are
> > applied on top of v5-0001-0007.
> >
> > - stats collector has been removed.
> >
> > - modified dshash further so that deletion is allowed during
> >   sequential scan.
> >
> > - I'm not sure about the following existing comment at the
> >   beginning of pgstat.c
> >
> >   * - Add a pgstat config column to pg_database, so this
> >   *   entire thing can be enabled/disabled on a per db basis.
>
> Following is the next handful of my comments:
>
> * If you remove the stats collector, I think the remaining code in pgstat.c
>   no longer fits into the backend/postmaster/ directory.

I hadn't considered that, and I still don't find a nice place among the existing directories. backend/statistics might be it, but I feel somewhat uneasy about that. In the end I split pgstat.c into two files (as suggested in the file comment) and put both of them into a new directory, backend/statmon. One part is the backend status facility, named bestatus (BackEnd Status), and the other is pgstat, the access statistics part. The last patch, 0008, does that. I tried to do the move earlier in the series but it was a bit tough.

> * I'm not sure it's o.k. to call pgstat_write_statsfiles() from
>   postmaster.c:reaper(): the function can raise ERROR (I see at least one code
>   path: pgstat_write_statsfile() -> get_dbstat_filename()) and, as reaper() is
>   a signal handler, it's hard to imagine the consequences. Maybe a reason to
>   leave some functionality in a separate worker, although the "stats
>   collector" would have to be changed.

I was careless about that. longjmp() is forbidden in a signal handler, so we must not emit ERROR there.
In the first place, those calls were in the wrong place from several perspectives, so I changed the way stats are loaded and stored. In the attached patch (0003), loading is performed while initializing shared memory in postmaster, and storing is done in a shutdown hook in postmaster. Since the shared memory area is inherited by all children, no process does the initial attaching any longer. In addition, the archiver process became an auxiliary process (0004), since it writes to the shared stats.

> * Question still remains whether all the statistics should be loaded into
>   shared memory, see the note on paging near the bottom of [1].

Even counting possible page-in latency on stats writes, I agree with what Robert said in that message: we win on average for users who don't create very many databases. If some stats page to be written has been paged out, the related heap pages will likewise have been evicted from shared buffers (or the buffer pages themselves paged out), and every resource that can be stashed out may be stashed out, so I don't think it becomes a serious problem. As for reading stats, we currently read a file, sometimes after waiting for an up-to-date file to be written, so I think that case goes without saying. For a cluster with very many databases, a backend running in one database mainly sees only the stats for the current database (and for shared tables), so we can split the stats by that criterion in a next step.

> * if dshash_seq_init() is passed consistent=false, shouldn't we call
>   ensure_valid_bucket_pointers() also from dshash_seq_next()? If the scan
>   needs to access the next partition and the old partition lock got released,
>   the table can be resized before the next partition lock is acquired, and
>   thus the backend-local copy of buckets becomes obsolete.

Oops, you're right. In addition, resizing can happen while dshash_seq_next() moves the lock to the next partition, and a resize on the way breaks the sequential scan semantics.
I added ensure_valid_bucket_pointers() after the initial acquisition of the partition lock, and the lock is now moved seamlessly during the scan. (0001)

> * Neither snapshot_statentry_all() nor backend_snapshot_all_db_entries() seems
>   to be used in the current patch version.

Thanks. They became unused once we concluded that we no longer need strict consistency of the stats numbers. Removed. (0003)

> * pgstat_initstats(): I think WaitLatch() should be used instead of sleep().

Bgwriter and checkpointer waited for postmaster's loading of the stats files, but I changed the startup sequence (as mentioned above), so the wait became useless. Most of the sleeps are replaced with Assert. (0003)

> * pgstat_get_db_entry(): "return false" should probably be "return NULL".

I can't find that. (Wouldn't it be caught by the compiler?) Maybe you mean "found = false"? (That might be a bit tricky.)

> * Is the PGSTAT_TABLE_WRITE flag actually used? Unlike PGSTAT_TABLE_CREATE, I
>   couldn't find a place where its value is tested.

Thank you for finding that. As you point out, PGSTAT_TABLE_WRITE ended up unused, since WRITE is always accompanied by CREATE in the patch. I think WRITE is more readable than CREATE there, so I removed CREATE. While fixing this I renamed all the PGSTAT_TABLE_* symbols as follows:

  PGSTAT_TABLE_READ        -> PGSTAT_FETCH_SHARED
  PGSTAT_TABLE_WRITE       -> PGSTAT_FETCH_EXCLUSIVE
  PGSTAT_TABLE_NOWAIT      -> PGSTAT_FETCH_NOWAIT
  PGSTAT_TABLE_NOT_FOUND   -> PGSTAT_ENTRY_NOT_FOUND
  PGSTAT_TABLE_FOUND       -> PGSTAT_ENTRY_FOUND
  PGSTAT_TABLE_LOCK_FAILED -> PGSTAT_LOCK_FAILED

> * dshash_seq_init(): does it need to be called with consistent=true from
>   pgstat_vacuum_stat() when the entries returned by the scan are just
>   dropped?
>
>   dshash_seq_init(&dshstat, db_stats, true, true);
>
>   I suspect this is a thinko because another call from the same function looks
>   like
>
>   dshash_seq_init(&dshstat, dshtable, false, true);

It's a leftover, just migrated from the previous (v4) snapshot-based code.
Snapshots had that kind of consistency, but it no longer looks useful. (0003) As a result, consistent=false at all call sites, so I could remove the parameter, but I'll leave it alone for a while.

> * I'm not sure about usefulness of dshash_get_num_entries(). It passes
>   consistent=false to dshash_seq_init(), so the number of entries can change
>   during the execution. And even if the function requested a "consistent
>   scan", entries can be added / removed as soon as the scan is over.
>
>   If the returned value is used to decide whether the hashtable should be
>   scanned or not, I think you don't break anything if you simply start the
>   scan unconditionally and see if you find some entries.
>   And if you really need to count the entries, I suggest that you use the
>   per-partition counts (dshash_partition.count) instead of scanning individual
>   entries.

It is mainly there to avoid a useless call to pgstat_collect_oid(); the shortcut is useful because function stats are usually not collected. Instead, I removed the function and now create the function-stats dshash on demand, and changed the condition "dshash_get_num_entries() > 0" to "dbentry->functions != DSM_HANDLE_INVALID". (0003)

> [1] https://www.postgresql.org/message-id/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com

A new version of this patch is attached to my reply to Tomas's message.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center