Re: shared memory stats: high level design decisions: consistency, dropping - Mailing list pgsql-hackers

From	Magnus Hagander
Subject	Re: shared memory stats: high level design decisions: consistency, dropping
Date	March 24, 2021 16:26:12
Msg-id	CABUevEwkXVxW65f-rWUg56zrH_kAgrfT1byag-=3wJsX-VECqQ@mail.gmail.com
In response to	Re: shared memory stats: high level design decisions: consistency, dropping (Greg Stark <stark@mit.edu>)
List	pgsql-hackers

On Tue, Mar 23, 2021 at 4:21 AM Greg Stark <stark@mit.edu> wrote:
>
> On Sun, 21 Mar 2021 at 18:16, Stephen Frost <sfrost@snowman.net> wrote:
> >
> > Greetings,
> >
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > > I also believe that the snapshotting behavior has advantages in terms
> > > of being able to perform multiple successive queries and get consistent
> > > results from them.  Only the most trivial sorts of analysis don't need
> > > that.
> > >
> > > In short, what you are proposing sounds absolutely disastrous for
> > > usability of the stats views, and I for one will not sign off on it
> > > being acceptable.
> > >
> > > I do think we could relax the consistency guarantees a little bit,
> > > perhaps along the lines of only caching view rows that have already
> > > been read, rather than grabbing everything up front.  But we can't
> > > just toss the snapshot concept out the window.  It'd be like deciding
> > > that nobody needs MVCC, or even any sort of repeatable read.
> >
> > This isn't the same use-case as traditional tables or relational
> > concepts in general- there aren't any foreign keys for the fields that
> > would actually be changing across these accesses to the shared memory
> > stats- we're talking about gross stats numbers like the number of
> > inserts into a table, not an employee_id column.  In short, I don't
> > agree that this is a fair comparison.
>
> I use these stats quite a bit and do lots of slicing and dicing with
> them. I don't think it's as bad as Tom says but I also don't think we
> can be quite as loosey-goosey as I think Andres or Stephen might be
> proposing either (though I note that they haven't said they don't
> want any consistency at all).
>
> The case where consistency really matters for me is when I'm doing
> math involving more than one statistic.
>
> Typically that's ratios. E.g. with pg_stat_*_tables I routinely divide
> seq_tup_read by seq_scan or idx_tup_* by idx_scan. I also often look
> at the ratio between n_tup_upd and n_tup_hot_upd.
>
> And no, it doesn't help that these are often large numbers after a
> long time because I'm actually working with the first derivative of
> these numbers using snapshots or a time series database. So if you
> have the seq_tup_read incremented but not seq_scan incremented you
> could get a wildly incorrect calculation of "tup read per seq scan"
> which actually matters.
>
> I don't think I've ever done math across stats for different objects.
> I mean, I've plotted them together and looked at which was higher but
> I don't think that's affected by some plots having peaks slightly out
> of sync with the other. I suppose you could look at the ratio of
> access patterns between two tables and know that they're only ever
> accessed by a single code path at the same time and therefore the
> ratios would be meaningful. But I don't think users would be surprised
> to find they're not consistent that way either.

Yeah, it's important to distinguish whether things can be inconsistent
within a single object, or only between objects. And I agree that in a
lot of cases, having per-object consistent data is probably enough.
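
To make the quoted ratio concern concrete, here is a minimal Python
sketch. It is not from the thread: the TableStats class, the function
name and all numbers are invented for illustration. It computes tuples
read per sequential scan from the deltas between two readings, and
shows what happens when the second reading already includes a bump to
seq_tup_read that seq_scan has not caught up with yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableStats:
    """One reading of two cumulative counters for a single table."""
    seq_scan: int
    seq_tup_read: int


def tuples_per_seq_scan(before: TableStats, after: TableStats) -> Optional[float]:
    """Tuples read per sequential scan over the interval, from counter deltas."""
    scans = after.seq_scan - before.seq_scan
    tuples = after.seq_tup_read - before.seq_tup_read
    if scans == 0:
        return None  # no scans recorded in the interval, ratio undefined
    return tuples / scans


# Invented numbers: two scans of 10,000 tuples each happen in the interval.
before = TableStats(seq_scan=100, seq_tup_read=1_000_000)
consistent = TableStats(seq_scan=102, seq_tup_read=1_020_000)
# Same interval, but only one of the two seq_scan increments is visible.
torn = TableStats(seq_scan=101, seq_tup_read=1_020_000)

consistent_ratio = tuples_per_seq_scan(before, consistent)
torn_ratio = tuples_per_seq_scan(before, torn)
print(consistent_ratio, torn_ratio)
```

With these made-up values the consistent reading gives 10000 tuples per
scan and the torn one gives 20000, even though the cumulative totals
barely differ. That is the sense in which large totals don't help once
you work with deltas.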

Normally when you graph things, for example, your peaks will span more
than one sample point anyway, and in that case it doesn't much matter,
does it?
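
A small sketch of why a one-sample skew tends to wash out once a peak
is read across several samples. This is illustrative only; the counter
readings and helper names are invented, and it assumes the late update
does eventually show up in the next sample.

```python
from typing import List


def deltas(samples: List[int]) -> List[int]:
    """Per-interval increase of a cumulative counter."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def window_sums(values: List[int], size: int) -> List[int]:
    """Sum of each run of `size` consecutive values."""
    return [sum(values[i:i + size]) for i in range(len(values) - size + 1)]


# Invented cumulative counter readings at five sample times.
in_sync = [0, 10, 60, 70, 80]
# Same activity, but 30 units of the burst show up one sample late.
one_late = [0, 10, 30, 70, 80]

per_sample_in_sync = deltas(in_sync)
per_sample_one_late = deltas(one_late)

# Sum over windows of three intervals, wide enough to cover the burst.
windowed_in_sync = window_sums(per_sample_in_sync, 3)
windowed_one_late = window_sums(per_sample_one_late, 3)
print(per_sample_in_sync, per_sample_one_late)
print(windowed_in_sync, windowed_one_late)
```

The per-sample deltas differ (the peak is lower and spread over two
intervals in the late case), but the three-interval sums are the same
in both series.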

But if we say we offer only per-object consistency, then for example
the idx_scan value in the tables view may reflect changes to some but
not all indexes on that table. Would that be acceptable?
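
As a rough illustration of that question, the sketch below assumes the
table-level value is obtained by summing per-index scan counters, and
that each index is only consistent with itself. The index names and
counts are hypothetical.

```python
from typing import Dict


def table_idx_scan(per_index_scans: Dict[str, int]) -> int:
    """Table-level index scan count as the sum over the table's indexes."""
    return sum(per_index_scans.values())


# Hypothetical per-index counters at two points in time.
at_t0 = {"orders_pkey": 500, "orders_customer_idx": 200}
at_t1 = {"orders_pkey": 520, "orders_customer_idx": 230}

# Every index read at the same moment.
consistent_total = table_idx_scan(at_t1)

# Per-object consistency only: one index is seen as of t1, the other
# still as of t0, so the sum mixes two different moments.
mixed = {
    "orders_pkey": at_t1["orders_pkey"],
    "orders_customer_idx": at_t0["orders_customer_idx"],
}
mixed_total = table_idx_scan(mixed)
print(consistent_total, mixed_total)
```

Here the mixed read reports 720 where the consistent read at t1
reports 750. Each index's own number is right, but the table-level sum
reflects the new scans on one index and not the other.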

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
