Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections
Msg-id: CA+TgmoZXNEXyTnzsukdR5mNtipr0UrQJbUb4ATOfv-Dfit3bbw@mail.gmail.com
In response to: Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses: Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections
List: pgsql-hackers
On Thu, Jul 7, 2022 at 10:39 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Thu, 7 Jul 2022 13:58:06 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
> > I agree that this is a bug, since it can (and did) cause false positives in a
> > monitoring system.
>
> I'm not sure this is undoubtedly a bug but agree about the rest.

I don't agree that this is a bug, and even if it were, I don't think this patch can fix it.

Let's start with the second point first: pgstat_report_wait_start() and pgstat_report_wait_end() change the advertised wait event for a process, while the backend state is changed by pgstat_report_activity(). Since those function calls are in different places, those changes are bound to happen at different times, and therefore you can observe drift between the two values. Now perhaps there are some one-directional guarantees: I think we probably always set the state to idle before we start reading from the client, and always finish reading from the client before the state ceases to be idle. But I don't really see how that helps anything, because when you read those values, you must read one and then the other. If you read the activity before the wait event, you might see the state before it goes idle and then the wait event after it's reached ClientRead. If you read the wait event before the activity, you might see the wait event as ClientRead, and then by the time you check the activity the backend might have gotten some data from the client and no longer be idle. The very best a patch like this can hope to do is narrow the race condition enough that the discrepancies are observed less frequently in practice.

And that's why I think this is not a bug fix, or even a good idea. It's just encouraging people to rely on something which can never be fully reliable in the way that the original poster is hoping. There was never any intention of having wait events synchronized with the pgstat_report_activity() stuff, and I think that's perfectly fine. Both systems are trying to provide visibility into states that can change very quickly, and therefore they need to be low-overhead, and therefore they use very lightweight synchronization, which means that ephemeral discrepancies are possible by nature. There are plenty of other examples of that as well. You can't, for example, query pg_locks and pg_stat_activity in the same query and expect that all and only those backends that are apparently waiting for a lock in pg_stat_activity will have an ungranted lock in pg_locks. It just doesn't work like that, and there's a very good reason for that: trying to make all of these introspection facilities behave in MVCC-like ways would be painful to code and would probably end up slowing the system down substantially.

I think the right fix here is to change nothing in the code, and stop expecting these things to be perfectly consistent with each other.

--
Robert Haas
EDB: http://www.enterprisedb.com
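The race described above can be demonstrated outside PostgreSQL. Below is a minimal standalone C sketch, not PostgreSQL source: a "backend" thread updates two advertised values at slightly different moments, mirroring the separate call sites of pgstat_report_activity() and pgstat_report_wait_start(), while a "monitor" thread reads them one after the other, as a query on pg_stat_activity would. The names STATE_IDLE and WAIT_EVENT_CLIENT_READ are borrowed from the server's identifiers for readability only; the structure and synchronization here are illustrative assumptions, not the actual server code.

```c
/*
 * Toy reproduction of the wait_event/state drift: two values published
 * by separate stores can be observed in a transiently inconsistent
 * combination by a concurrent reader. Not PostgreSQL code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum { STATE_RUNNING, STATE_IDLE };
enum { WAIT_EVENT_NONE, WAIT_EVENT_CLIENT_READ };

static atomic_int backend_state = STATE_RUNNING;
static atomic_int wait_event = WAIT_EVENT_NONE;
static atomic_int stop;

/*
 * Backend: advertise "idle" first, then (a moment later) the ClientRead
 * wait, matching the one-directional ordering Haas describes. The gaps
 * between the stores are the windows a reader can land in.
 */
static void *backend(void *arg)
{
    (void) arg;
    while (!atomic_load(&stop))
    {
        atomic_store(&backend_state, STATE_IDLE);          /* idle first... */
        atomic_store(&wait_event, WAIT_EVENT_CLIENT_READ); /* ...then the wait */
        atomic_store(&wait_event, WAIT_EVENT_NONE);        /* client sent data */
        atomic_store(&backend_state, STATE_RUNNING);       /* busy again */
    }
    return NULL;
}

/*
 * Monitor: read the two values one after the other and count sightings
 * of "idle but not in ClientRead", the false positive from the thread.
 */
int main(void)
{
    pthread_t tid;
    long inconsistent = 0;

    pthread_create(&tid, NULL, backend, NULL);
    for (long i = 0; i < 10 * 1000 * 1000; i++)
    {
        int s = atomic_load(&backend_state);
        int w = atomic_load(&wait_event);

        if (s == STATE_IDLE && w != WAIT_EVENT_CLIENT_READ)
            inconsistent++;
    }
    atomic_store(&stop, 1);
    pthread_join(tid, NULL);
    printf("transiently inconsistent reads: %ld\n", inconsistent);
    return 0;
}
```

Built with `cc -pthread race.c`, this typically reports a nonzero count even though both values are individually always correct: each read pair is a snapshot taken at two different instants, which is exactly the kind of ephemeral discrepancy a monitoring system must tolerate.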