Re: shared memory stats: high level design decisions: consistency, dropping - Mailing list pgsql-hackers

From Robert Haas
Subject Re: shared memory stats: high level design decisions: consistency, dropping
Date
Msg-id CA+Tgmob4FG0yVyLXmxcgnEADG57Es9BSLyv_racax73Zrqex=Q@mail.gmail.com
In response to Re: shared memory stats: high level design decisions: consistency, dropping  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Mar 21, 2021 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If I understand what you are proposing, all stats views would become
> completely volatile, without even within-query consistency.  That really
> is not gonna work.  As an example, you could get not-even-self-consistent
> results from a join to a stats view if the planner decides to implement
> it as a nestloop with the view on the inside.
>
> I also believe that the snapshotting behavior has advantages in terms
> of being able to perform multiple successive queries and get consistent
> results from them.  Only the most trivial sorts of analysis don't need
> that.
>
> In short, what you are proposing sounds absolutely disastrous for
> usability of the stats views, and I for one will not sign off on it
> being acceptable.
>
> I do think we could relax the consistency guarantees a little bit,
> perhaps along the lines of only caching view rows that have already
> been read, rather than grabbing everything up front.  But we can't
> just toss the snapshot concept out the window.  It'd be like deciding
> that nobody needs MVCC, or even any sort of repeatable read.
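
(For illustration, the kind of join Tom is describing might look roughly
like this hypothetical query, if the planner puts the stats view on the
inner side of a nested loop:

    SELECT c.relname, s.seq_scan, s.n_tup_ins
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid;
    -- With the stats view on the inner side of a nestloop, each outer row
    -- re-reads the view, so without a snapshot two rows of the result
    -- could reflect different moments in time.
)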

So, just as a data point, the output of pg_locks is not stable within
a transaction. In fact, the pg_locks output could technically include
the same exact lock more than once, if it's being moved from the
fast-path table to the main table just as you are reading all the
data. In theory, that could produce the same kinds of problems that
you're concerned about here, and I suspect sometimes it does. But I
haven't seen a lot of people yelling and screaming about it. The
situation isn't ideal, but it's not disastrous either.
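
To illustrate the kind of query where that duplication could in principle
surface (a sketch; the fastpath column is what distinguishes fast-path
entries from main lock table entries):

    -- Illustrative only: a lock being promoted from the fast-path array
    -- to the main lock table while this scan runs could, in principle,
    -- show up twice, once with fastpath = true and once with false.
    SELECT locktype, relation::regclass, pid, mode, granted, fastpath
    FROM pg_locks
    ORDER BY pid, relation;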

I think it's really hard for us as developers to predict what kinds of
effects these kinds of decisions will have in real-world
deployments. All of us have probably had the experience of making some
behavioral change that we thought would not be too big a deal and it
actually pissed off a bunch of users who were relying on the old
behavior. I know I have. Conversely, I've reluctantly made changes
that seemed rather dangerous to me and heard nary a peep. If somebody
takes the position that changing this behavior is scary because we
don't know how many users will be inconvenienced or how badly, I can
only agree. But saying that it's tantamount to deciding that nobody
needs MVCC is completely over the top. This is statistical data, not
user data, and there are good reasons to think that people don't have
the same expectations in both cases, starting with the fact that we
have some stuff that works like that already.

More than that, there's a huge problem with the way this works today
that can't be fixed without making some compromises. In the test case
Andres mentioned upthread, the stats snapshot burned through 170MB of
RAM. Now, you might dismiss that as not much memory in 2021, but if
you have a lot of backends accessing the stats, that value could be
multiplied by a two-digit or even three-digit number, and that is
*definitely* a lot of memory, even in 2021. But even if it's not
multiplied by anything, we're shipping with a default work_mem of just
4MB. So, the position we're implicitly taking today is: if you join
two 5MB tables, it's too risky to put the entire contents of one of
them into a single in-memory hash table, because we might run the
machine out of RAM. But if you have millions of objects in your
database and touch the statistics for one of those objects, once, it's
absolutely OK to slurp tens or even hundreds of megabytes of data into
backend-private memory to avoid the possibility that you might later
access another one of those counters and expect snapshot semantics.
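
To put rough numbers on it (illustrative arithmetic using the figures
above, not measurements):

    -- 170 MB per snapshot x 100 backends is roughly 17 GB of
    -- backend-private memory, versus the default hash-join budget:
    SHOW work_mem;   -- '4MB' in a stock configuration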

To be honest, I don't find either of those positions very believable.
I do not think it likely that the typical user really wants a 5MB hash
join to be done in batches to save memory, and I think it equally
unlikely that everybody wants to read and cache tens or hundreds of
megabytes of data to get MVCC semantics for volatile statistics. I
think there are probably some cases where having that information be
stable across a transaction lifetime is really useful, so if we can
provide that as an optional behavior, I think that would be a pretty
good idea. But I don't think it's reasonable for that to be the only
behavior, and I'm doubtful about whether it should even be the
default. I bet there are a lot of cases where somebody just wants to
take a quick glance at some of the values as they exist right now, and
has no intention of running any more queries that might examine the
same data again later. Not only does caching all the data use a lot of
memory, but having to read all the data in order to cache it is
potentially a lot slower than just reading the data actually
requested. I'm unwilling to dismiss that as a negligible problem.
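
As a sketch of what such an optional behavior could look like at the SQL
level (the GUC name and values here are hypothetical, just to make the
idea concrete):

    -- Hypothetical per-session knob controlling how consistent the stats
    -- views are, with the cheap behavior as the default.
    SET stats_fetch_consistency = 'snapshot';   -- opt in to stable stats
    SELECT relname, seq_scan, n_tup_ins
    FROM pg_stat_user_tables
    WHERE relname = 'my_table';                 -- made-up table name
    RESET stats_fetch_consistency;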

In short, I agree that there's stuff to worry about here, but I don't
agree that a zero-consistency model is a crazy idea, even though I
also think it would be nice if we can make stronger consistency
available upon request.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


