Thread: Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.
Robert Haas <rhaas@postgresql.org> writes:
> With the old code, a backend that read pg_stat_activity without ever
> having executed a parallel query might see a backend in the midst of
> executing one waiting on a DSA LWLock, resulting in a crash.  The
> solution is for backends to register the tranche at startup time, not
> the first time a parallel query is executed.

While I have no objection to the patch as committed, I have to wonder
if this isn't papering over the underlying problem rather than solving it.
It seems like this direction means that there's no such thing as dynamic
registration of LWLock tranches and we should just give up on that concept
entirely.  If we do want to preserve the concept, don't we need to fix the
pg_stat_activity code so it doesn't fail on tranches that aren't known
locally?

			regards, tom lane

From: Robert Haas

On Thu, Jan 5, 2017 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> With the old code, a backend that read pg_stat_activity without ever
>> having executed a parallel query might see a backend in the midst of
>> executing one waiting on a DSA LWLock, resulting in a crash.  The
>> solution is for backends to register the tranche at startup time, not
>> the first time a parallel query is executed.
>
> While I have no objection to the patch as committed, I have to wonder
> if this isn't papering over the underlying problem rather than solving it.
> It seems like this direction means that there's no such thing as dynamic
> registration of LWLock tranches and we should just give up on that concept
> entirely.  If we do want to preserve the concept, don't we need to fix the
> pg_stat_activity code so it doesn't fail on tranches that aren't known
> locally?

It actually has such a safeguard already (see GetLWLockIdentifier).

Now that you mention it, I think I mis-stated the problem in the
commit message: the problem is not if the tranche is unregistered, but
rather if it is registered but the pointer references an address that
is no longer valid.  Registering the tranche with a fixed string
rather than a pointer into a DSM segment that can go away fixes that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

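To illustrate the distinction, here is a rough sketch (not the committed
patch): it assumes the tranche name string is "parallel_query_dsa" and uses
dsm_segment_address() only as a stand-in for however the old code ended up
with a pointer into the segment.

#include "postgres.h"
#include "storage/dsm.h"
#include "storage/lwlock.h"

/*
 * Illustrative sketch only.  The registered name pointer can be
 * dereferenced much later, while some backend is reading pg_stat_activity,
 * so it has to stay valid for the whole life of the registering backend.
 */
static void
register_dsa_tranche_sketch(dsm_segment *seg)
{
	/*
	 * Risky: the name lives inside a DSM segment that is detached when the
	 * parallel query finishes, leaving a dangling pointer in this backend's
	 * tranche table.
	 */
	char	   *name_in_segment = (char *) dsm_segment_address(seg);

	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA, name_in_segment);

	/*
	 * Safe: a fixed string constant lives as long as the backend, so a
	 * later lookup of this tranche ID cannot follow a stale pointer.
	 */
	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA, "parallel_query_dsa");
}
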
Robert Haas <robertmhaas@gmail.com> writes:
> Now that you mention it, I think I mis-stated the problem in the
> commit message: the problem is not if the tranche is unregistered, but
> rather if it is registered but the pointer references an address that
> is no longer valid.  Registering the tranche with a fixed string
> rather than a pointer into a DSM segment that can go away fixes that.

Got it.  That's fine then, but perhaps the comment on
LWLockRegisterTranche needs to be reconsidered.  It's not good enough for
the tranche name to be "backend lifetime", it has to be something valid in
other backends as well.  That probably means either (1) compile-time
constant in the core backend (not loadable modules!) or (2) allocated in
the main shared-memory segment.

			regards, tom lane

From: Robert Haas

On Thu, Jan 5, 2017 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Now that you mention it, I think I mis-stated the problem in the
>> commit message: the problem is not if the tranche is unregistered, but
>> rather if it is registered but the pointer references an address that
>> is no longer valid.  Registering the tranche with a fixed string
>> rather than a pointer into a DSM segment that can go away fixes that.
>
> Got it.  That's fine then, but perhaps the comment on
> LWLockRegisterTranche needs to be reconsidered.  It's not good enough for
> the tranche name to be "backend lifetime", it has to be something valid in
> other backends as well.  That probably means either (1) compile-time
> constant in the core backend (not loadable modules!) or (2) allocated in
> the main shared-memory segment.

No, I think backend-lifetime is right.  The tranche registrations are
all backend-local state, so there's no problem with backend A
registering a string at one address and backend B registering a string
at a different address.  It's just important that neither of those
strings goes away before the corresponding backend does.  To put that
another way, registration is performed separately in every backend.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Robert Haas <robertmhaas@gmail.com> writes:
> No, I think backend-lifetime is right.  The tranche registrations are
> all backend-local state, so there's no problem with backend A
> registering a string at one address and backend B registering a string
> at a different address.  It's just important that neither of those
> strings goes away before the corresponding backend does.

Then I don't understand what's going on.  Isn't the crash you fixed
because backend A was looking at the tranche containing the lock backend B
was blocked on?  How can a tranche name local to backend B work?

			regards, tom lane

From: Robert Haas

On Thu, Jan 5, 2017 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> No, I think backend-lifetime is right.  The tranche registrations are
>> all backend-local state, so there's no problem with backend A
>> registering a string at one address and backend B registering a string
>> at a different address.  It's just important that neither of those
>> strings goes away before the corresponding backend does.
>
> Then I don't understand what's going on.  Isn't the crash you fixed
> because backend A was looking at the tranche containing the lock backend B
> was blocked on?  How can a tranche name local to backend B work?

I think the chain of events must be as follows:

1. Backend A executed a parallel query at least once.  During the course
of doing so, it registered the parallel_query_dsa tranche, but after the
execution of the parallel query completed, the tranche_name pointer was
no longer valid, because it pointed into a DSM segment that got removed
at the end of the query, rather than using backend-lifespan memory as
advocated by the header comment of LWLockRegisterTranche.

2. Backend B began executing a parallel query and, somehow, ended up
blocking on one of the DSA LWLocks.  I'm not quite clear how this could've
happened, because I didn't think we had any committed code that did DSA
allocations yet, but maybe those locks are taken while attaching to the
DSA or something like that.  The details don't really matter: somehow, B
managed to advertise LWTRANCHE_PARALLEL_QUERY_DSA in pg_stat_activity.

3. At exactly that moment, backend A examined pg_stat_activity, saw that
tranche ID, and tried to look it up, following the dangling pointer left
behind by step #1.  Boom.

If step #1 had not occurred, then at the last step backend A would have
seen the tranche ID as unregistered, and pg_stat_activity would have shown
the wait event as "extension" rather than crashing.  That's still wrong,
of course, though not as bad as crashing.

Basically, there are two problems here:

- Every backend needs to register every built-in tranche ID at backend
startup.  If it doesn't, then it might read some other backend's wait
event as "extension" instead of the correct value.  The DSA code had the
mistaken idea that tranche IDs only need to be registered before acquiring
locks that use that tranche ID, which isn't true: we might need to
interpret a tranche ID that some OTHER backend has advertised via
pg_stat_activity.  A corollary is that an extension that uses a tranche ID
should register that tranche ID as early as possible (i.e. _PG_init()) so
that a backend which has loaded but not used an extension can interpret
that tranche ID if some other backend advertises it in pg_stat_activity.

- Every backend needs to follow the directions and store the tranche name
in memory that won't go away before the backend itself does.  If it
doesn't, an attempt to interpret a tranche ID read from pg_stat_activity
may follow a dangling pointer and crash, which is what must have been
happening here.

I suspect you're going to tell me this all needs to be better
documented, which is probably a valid criticism.  Suggestions as to
where such documentation should be added - either as code comments or
in a README somewhere or in doc/src/sgml - will be gratefully
accepted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

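To make the failure mode concrete, here is a hedged sketch of the lookup
side, modeled loosely on the tranche-name lookup behind GetLWLockIdentifier;
the function and parameter names are hypothetical stand-ins for the real
backend-local bookkeeping in lwlock.c, not the actual code.

/*
 * Sketch: turn a tranche ID advertised by some other backend into a name
 * to show in pg_stat_activity.  "names" and "names_allocated" stand in for
 * the backend-local registration table.
 */
static const char *
tranche_name_for_display(int tranche_id, const char **names,
						 int names_allocated)
{
	/* Unregistered in this backend: fall back to the generic label. */
	if (tranche_id >= names_allocated || names[tranche_id] == NULL)
		return "extension";

	/*
	 * Registered: return whatever pointer this backend stored.  If that
	 * pointer has gone stale (step #1 above), dereferencing the result is
	 * the crash under discussion.
	 */
	return names[tranche_id];
}
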
Robert Haas <robertmhaas@gmail.com> writes:
> I suspect you're going to tell me this all needs to be better
> documented, which is probably a valid criticism.  Suggestions as to
> where such documentation should be added - either as code comments or
> in a README somewhere or in doc/src/sgml - will be gratefully
> accepted.

Better documentation seems required, but really the whole design seems
rather wacko.  Backends must agree on numeric tranche IDs, but every
backend has its own copy of the tranche name?  How do we even know what
the agreement is?  And every one has to "register" every tranche ID for
itself?  Why in the world isn't registration done *once* and the tranche
name stored in shared memory?

			regards, tom lane

From: Robert Haas

On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I suspect you're going to tell me this all needs to be better
>> documented, which is probably a valid criticism.  Suggestions as to
>> where such documentation should be added - either as code comments or
>> in a README somewhere or in doc/src/sgml - will be gratefully
>> accepted.
>
> Better documentation seems required, but really the whole design seems
> rather wacko.  Backends must agree on numeric tranche IDs, but every
> backend has its own copy of the tranche name?  How do we even know what
> the agreement is?  And every one has to "register" every tranche ID for
> itself?  Why in the world isn't registration done *once* and the tranche
> name stored in shared memory?

Well, with the original design, that wasn't feasible, because each
backend had to store not only the name (which was constant across all
backends) but also the array_base (which might not be, if the locks
were stored in DSM) and array_stride.  Since commit
3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
do what you're proposing.

It's still not without difficulties, though.  There are 65,536 possible
tranche IDs, and allocating an array of 64k pointers in shared memory
would consume half a megabyte of shared memory, the vast majority of
which would normally be completely unused.  So I'm not very enthused
about that solution, but you aren't the first person to propose it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

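(For the record, the arithmetic behind "half a megabyte": 65,536 tranche
IDs × 8 bytes per pointer on a typical 64-bit build = 524,288 bytes, i.e.
512 kB for the pointer array alone, before counting the name strings
themselves.)
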
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Better documentation seems required, but really the whole design seems
>> rather wacko.  Backends must agree on numeric tranche IDs, but every
>> backend has its own copy of the tranche name?  How do we even know what
>> the agreement is?  And every one has to "register" every tranche ID for
>> itself?  Why in the world isn't registration done *once* and the tranche
>> name stored in shared memory?

> Well, with the original design, that wasn't feasible, because each
> backend had to store not only the name (which was constant across all
> backends) but also the array_base (which might not be, if the locks
> were stored in DSM) and array_stride.  Since commit
> 3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
> do what you're proposing.  It's still not without difficulties,
> though.  There are 65,536 possible tranche IDs, and allocating an
> array of 64k pointers in shared memory would consume half a megabyte
> of shared memory, the vast majority of which would normally be
> completely unused.  So I'm not very enthused about that solution, but
> you aren't the first person to propose it.

So, um, how do we know that backend A and backend B have the same idea
about what tranche id 37 means?

			regards, tom lane

From: Thomas Munro

On Fri, Jan 6, 2017 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So, um, how do we know that backend A and backend B have the same idea
> about what tranche id 37 means?

[butting in]

In the particular case of dsa.c, the client has to supply a tranche ID
when creating the DSA area, and then the ID is recorded in dsa_area_control
(the area's header in either DSM or fixed shared memory) and used when
other backends attach.  How other backends get their hands on the DSA area
handle or pointer to attach is the problem of the client code that's using
DSA, but it doesn't have to worry about sharing the tranche ID.

There are two cases to consider:

1.  There are well known tranche IDs defined in lwlock.h, like the one
used here.  As far as I can see, any tranche ID defined that way should
also be registered with a name in RegisterLWLockTranches in every backend,
and the name should be a pointer to constant data so that it is always
dereferenceable.  That is the case for the tranche that's being discussed
here as of this recent commit, and any backend will be able to see the
tranche name for that one.

2.  There are tranche IDs that are allocated at runtime.  An extension
that wants to create DSA areas or anything else that involves setting up
new LWLocks would probably need to allocate one with LWLockNewTrancheId,
and should ideally arrange to register a name in all backends.  Sharing
the tranche ID and name with other backends is the extension's problem (it
probably needs to put the ID in shared memory for other backends to find,
like dsa.c does, and the name probably needs to be a string constant).  If
you look at pg_stat_activity at a moment when some backend is waiting for
an LWLock with one of these dynamically allocated tranche IDs, and the
extension hasn't arranged to register the name for that tranche ID in your
backend, then pg_stat_activity will just show "extension" because it has
no better information.

To do better we'd probably need a whole registry and recycling mechanism
that tracks both IDs and names in shared memory, as several people have
mentioned.  But the motivation to do that has been reduced significantly
now that T_ID tracking has been dropped.  Previously, you might have
wanted to create and register a variable number of tranche IDs in order to
support, say, a bank of LWLocks used by an executor node (because there
might be more than one instance of that executor node in a query plan, and
they'd need distinct IDs in order for you to be able to register their
different base + stride for T_ID purposes, if we still had that concept).
The infrastructure was somewhat lacking for that.  Now that tranche IDs
are only used to find a name to show in pg_stat_activity, there is nothing
stopping you from using the same tranche ID in several places at the same
time; it just means that you can't tell which of them is involved when
reading that view.

--
Thomas Munro
http://www.enterprisedb.com

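To flesh out case 2 above, here is a hedged sketch of the runtime-allocated
pattern; the struct, function names, and tranche name string are
hypothetical, and locking and error handling are omitted.

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/dsa.h"

/*
 * Hypothetical state an extension might keep in the main shared memory
 * segment so that every backend can find the tranche ID and DSA handle.
 */
typedef struct MyExtensionShared
{
	int			tranche_id;
	dsa_handle	area_handle;
} MyExtensionShared;

/* Run once, by whichever backend sets up the shared state. */
static dsa_area *
my_extension_create_area(MyExtensionShared *shared)
{
	dsa_area   *area;

	/* Allocate a tranche ID from the shared counter, exactly once. */
	shared->tranche_id = LWLockNewTrancheId();

	/* Register a name for it in this backend; must be a constant string. */
	LWLockRegisterTranche(shared->tranche_id, "my_extension_dsa");

	/* Create the DSA area; its internal LWLocks use that tranche ID. */
	area = dsa_create(shared->tranche_id);
	shared->area_handle = dsa_get_handle(area);

	return area;
}

/* Run in every other backend that wants to use the area. */
static dsa_area *
my_extension_attach_area(MyExtensionShared *shared)
{
	/*
	 * Each backend registers the ID -> name mapping for itself, so that it
	 * can decode this tranche if it shows up in pg_stat_activity.
	 */
	LWLockRegisterTranche(shared->tranche_id, "my_extension_dsa");

	return dsa_attach(shared->area_handle);
}
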
From: Robert Haas

On Thu, Jan 5, 2017 at 5:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Better documentation seems required, but really the whole design seems
>>> rather wacko.  Backends must agree on numeric tranche IDs, but every
>>> backend has its own copy of the tranche name?  How do we even know what
>>> the agreement is?  And every one has to "register" every tranche ID for
>>> itself?  Why in the world isn't registration done *once* and the tranche
>>> name stored in shared memory?
>
>> Well, with the original design, that wasn't feasible, because each
>> backend had to store not only the name (which was constant across all
>> backends) but also the array_base (which might not be, if the locks
>> were stored in DSM) and array_stride.  Since commit
>> 3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
>> do what you're proposing.  It's still not without difficulties,
>> though.  There are 65,536 possible tranche IDs, and allocating an
>> array of 64k pointers in shared memory would consume half a megabyte
>> of shared memory, the vast majority of which would normally be
>> completely unused.  So I'm not very enthused about that solution, but
>> you aren't the first person to propose it.
>
> So, um, how do we know that backend A and backend B have the same idea
> about what tranche id 37 means?

Well, if they just call exposed C functions at random with arguments
picked out of a hat, then we don't.  But if they use the APIs in the
manner documented in lwlock.h, then we do:

/*
 * There is another, more flexible method of obtaining lwlocks.  First, call
 * LWLockNewTrancheId just once to obtain a tranche ID; this allocates from
 * a shared counter.  Next, each individual process using the tranche should
 * call LWLockRegisterTranche() to associate that tranche ID with a name.
 * Finally, LWLockInitialize should be called just once per lwlock, passing
 * the tranche ID as an argument.
 *
 * It may seem strange that each process using the tranche must register it
 * separately, but dynamic shared memory segments aren't guaranteed to be
 * mapped at the same address in all coordinating backends, so storing the
 * registration in the main shared memory segment wouldn't work for that
 * case.
 */

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

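A minimal sketch of that recipe, assuming the LWLock lives in the main
shared memory segment and is set up from a shmem startup hook; the struct
and names are hypothetical, and the usual AddinShmemInitLock locking is
omitted for brevity.

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

/* Hypothetical shared structure protected by one dynamically-ID'd LWLock. */
typedef struct MyCounterShared
{
	LWLock		lock;
	int			tranche_id;
	uint64		value;
} MyCounterShared;

static MyCounterShared *my_counter = NULL;

static void
my_counter_shmem_startup(void)
{
	bool		found;

	my_counter = ShmemInitStruct("my_counter", sizeof(MyCounterShared), &found);

	if (!found)
	{
		/* Step 1: obtain a tranche ID from the shared counter, just once. */
		my_counter->tranche_id = LWLockNewTrancheId();

		/* Step 3: initialize each lwlock with that tranche ID. */
		LWLockInitialize(&my_counter->lock, my_counter->tranche_id);

		my_counter->value = 0;
	}

	/*
	 * Step 2: every process using the tranche registers the name for
	 * itself, using a constant string that stays valid for the life of
	 * the backend.
	 */
	LWLockRegisterTranche(my_counter->tranche_id, "my_counter");
}
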