Re: [HACKERS] More stats about skipped vacuums - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: [HACKERS] More stats about skipped vacuums
Date
Msg-id CABUevEyW5hhf_cacw3uRY+a4+2J2tk2mTvfdcCikFj6mf5W-cg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] More stats about skipped vacuums  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] More stats about skipped vacuums  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Nov 27, 2017 at 7:53 PM, Robert Haas wrote: > On Sun, Nov 26, 2017 at 3:19 PM, Tom Lane wrote: > > Robert Haas writes: > >> On Sat, Nov 25, 2017 at 12:09 PM, Tom Lane wrote: > >>> Mumble. It's a property I'm pretty hesitant to give up, especially > >>> since the stats views have worked like that since day one. It's > >>> inevitable that weakening that guarantee would break peoples' queries, > >>> probably subtly. > > > >> You mean, queries against the stats views, or queries in general? If > >> the latter, by what mechanism would the breakage happen? > > > > Queries against the stats views, of course. > > Hmm. Those are probably rare. If we only took a snapshot of the > statistics for the backends that explicitly access those views, that > probably wouldn't be too crazy. > > Sorry if this is a stupid question, but how often and for what purpose > to regular backends need the stats collector data for purposes other > than querying the stats views? I thought that the data was only used > to decide whether to VACUUM/ANALYZE, and therefore would be accessed > mostly by autovacuum, and for that you'd actually want the most > up-to-date view of the stats for a particular table that is available, > not any older snapshot. > > Autovacuum resets the stats to make sure. Autovacuum in particular can probably be made a lot more efficient, because it only ever looks at one relation at a time, I think. What I've been thinking about for that one before is if we could just invent a protocol (shmq based maybe) whereby autovacuum can ask the stats collector for a single table or index stat. If autovacuum never needs to see a consistent view between multiple tables, I would think that's going to be a win in a lot of cases. I don't think regular backends use them at all. But anybody looking at the stats do, and it is pretty important there. However, when it comes to the stats system, I'd say that on any busy system (which would be the ones to care about), the stats structures are still going to be *written* a lot more than they are read. We certainly don't read them at the rate of once per transaction. A lot of the reads are also limited to one database of course. I wonder if we want to implement some sort of copy-on-read-snapshot in the stats collector itself. So instead of unconditionally publishing everything, have the backends ask for it. When a backend asks for it it gets a "snapshot counter" or something from the stats collector, and on the next write after that we do a copy-write if the snapshot it still available. (no, i have not thought in detail) Or -- if we keep a per-database hashtable in dynamic shared memory (which we can now). Can we copy it into local memory in the backend fast enough that we can hold a lock and just queue up the stats updates during the copy? If we can copy the complete structure, that would fix one of the bigger bottlenecks with it today which is that we dump and rebuild the hashtables as we go through the tempfiles. -- Magnus HaganderMe: https://www.hagander.net/ Work: https://www.redpill-linpro.com/

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: default range partition and constraint exclusion
Next
From: Hadi Moshayedi
Date:
Subject: Re: [PATCH] Fix crash in int8_avg_combine().