Thread: Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

Robert Haas <rhaas@postgresql.org> writes:
> With the old code, a backend that read pg_stat_activity without ever
> having executed a parallel query might see a backend in the midst of
> executing one waiting on a DSA LWLock, resulting in a crash.  The
> solution is for backends to register the tranche at startup time, not
> the first time a parallel query is executed.

While I have no objection to the patch as committed, I have to wonder
if this isn't papering over the underlying problem rather than solving it.
It seems like this direction means that there's no such thing as dynamic
registration of LWLock tranches and we should just give up on that concept
entirely.  If we do want to preserve the concept, don't we need to fix the
pg_stat_activity code so it doesn't fail on tranches that aren't known
locally?
        regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

From
Robert Haas
Date:
On Thu, Jan 5, 2017 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> With the old code, a backend that read pg_stat_activity without ever
>> having executed a parallel query might see a backend in the midst of
>> executing one waiting on a DSA LWLock, resulting in a crash.  The
>> solution is for backends to register the tranche at startup time, not
>> the first time a parallel query is executed.
>
> While I have no objection to the patch as committed, I have to wonder
> if this isn't papering over the underlying problem rather than solving it.
> It seems like this direction means that there's no such thing as dynamic
> registration of LWLock tranches and we should just give up on that concept
> entirely.  If we do want to preserve the concept, don't we need to fix the
> pg_stat_activity code so it doesn't fail on tranches that aren't known
> locally?

It actually has such a safeguard already (see GetLWLockIdentifier).
Not that you mention it, I think I mis-stated the problem in the
commit message: the problem is not if the tranche is unregistered, but
rather if it is registered but the pointer references an address that
is no longer valid.  Registering the tranche with a fixed string
rather than a pointer into a DSM segment that can go away fixes that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> Not that you mention it, I think I mis-stated the problem in the
> commit message: the problem is not if the tranche is unregistered, but
> rather if it is registered but the pointer references an address that
> is no longer valid.  Registering the tranche with a fixed string
> rather than a pointer into a DSM segment that can go away fixes that.

Got it.  That's fine then, but perhaps the comment on
LWLockRegisterTranche needs to be reconsidered.  It's not good enough for
the tranche name to be "backend lifetime", it has to be something valid in
other backends as well.  That probably means either (1) compile-time
constant in the core backend (not loadable modules!) or (2) allocated in
the main shared-memory segment.
        regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

From
Robert Haas
Date:
On Thu, Jan 5, 2017 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Not that you mention it, I think I mis-stated the problem in the
>> commit message: the problem is not if the tranche is unregistered, but
>> rather if it is registered but the pointer references an address that
>> is no longer valid.  Registering the tranche with a fixed string
>> rather than a pointer into a DSM segment that can go away fixes that.
>
> Got it.  That's fine then, but perhaps the comment on
> LWLockRegisterTranche needs to be reconsidered.  It's not good enough for
> the tranche name to be "backend lifetime", it has to be something valid in
> other backends as well.  That probably means either (1) compile-time
> constant in the core backend (not loadable modules!) or (2) allocated in
> the main shared-memory segment.

No, I think backend-lifetime is right.  The tranche registrations are
all backend-local state, so there's no problem with backend A
registering a string at one address and backend B registering a string
at a different address.  It's just important that neither of those
strings goes away before the corresponding backend does.

To put that another way, registration is performed separately in every backend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> No, I think backend-lifetime is right.  The tranche registrations are
> all backend-local state, so there's no problem with backend A
> registering a string at one address and backend B registering a string
> at a different address.  It's just important that neither of those
> strings goes away before the corresponding backend does.

Then I don't understand what's going on.  Isn't the crash you fixed
because backend A was looking at the tranche containing the lock backend B
was blocked on?  How can a tranche name local to backend B work?
        regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

From
Robert Haas
Date:
On Thu, Jan 5, 2017 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> No, I think backend-lifetime is right.  The tranche registrations are
>> all backend-local state, so there's no problem with backend A
>> registering a string at one address and backend B registering a string
>> at a different address.  It's just important that neither of those
>> strings goes away before the corresponding backend does.
>
> Then I don't understand what's going on.  Isn't the crash you fixed
> because backend A was looking at the tranche containing the lock backend B
> was blocked on?  How can a tranche name local to backend B work?

I think the chain of events must be as follows:

1. Backend A executed a parallel query at least once.  During the
course of doing so, it registered the parallel_query_dsa tranche, but
after the execution of the parallel query completed, the tranche_name
pointer was no longer valid, because it pointed into a DSM segment
that got removed at the end of the query, rather than using
backend-lifespan memory as advocated by the header comment of
LWLockRegisterTranche.

2. Backend B began executing a parallel query and, somehow, ended up
blocking on one of the DSA LWLocks.  I'm not quite clear how this
could've happened, because I didn't think we had any committed code
that did DSA allocations yet, but maybe those locks are taken while
attaching to the DSA or something like that.  The details don't really
matter: somehow, B managed to advertise LWTRANCHE_PARALLEL_QUERY_DSA
in pg_stat_activity.

3. At exactly that moment, Backend A examined pg_stat_activity and saw
that tranche ID and tried to look it up, following the dangling
pointer left behind by step #1.  Boom.

If step #1 had not occurred, at the last step, backend A would have
seen the tranche ID as unregistered and it would not have crashed.
Instead, pg_stat_activity would have shown the wait event as
"extension" rather than crashing.  That's still wrong, of course,
though not as bad as crashing.  Basically, there are two problems
here:

- Every backend needs to register every built-in tranche ID at backend
startup.  If it doesn't, then it might read some other backend's wait
event as "extension" instead of the correct value.  The DSA code had
the mistaken idea that tranche IDs only need to be registered before
acquiring locks that use that tranche ID, which isn't true: we might
need to interpret a tranche ID that some OTHER backend has advertised
via pg_stat_activity.  A corollary is that an extension that uses a
tranche ID should register that tranche ID as early as possible (i.e.
_PG_init()) so that a backend which has loaded but not used an
extension can interpret that tranche ID if some other backend
advertises it in pg_stat_activity.

- Every backend needs to follow the directions and store the tranche
name in memory that won't go away before the backend itself.  If it
doesn't, an attempt to interpret a tranche ID read from
pg_stat_activity may follow a dangling pointer and crash, which is
what must have been happening here.

I suspect you're going to tell me this all needs to be better
documented, which is probably a valid criticism.  Suggestions as to
where such documentation should be added - either as code comments or
in a README somewhere or in doc/src/sgml - will be gratefully
accepted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> I suspect you're going to tell me this all needs to be better
> documented, which is probably a valid criticism.  Suggestions as to
> where such documentation should be added - either as code comments or
> in a README somewhere or in doc/src/sgml - will be gratefully
> accepted.

Better documentation seems required, but really the whole design seems
rather wacko.  Backends must agree on numeric tranche IDs, but every
backend has its own copy of the tranche name?  How do we even know what
agreement is?  And every one has to "register" every tranche ID for
itself?  Why in the world isn't registration done *once* and the tranche
name stored in shared memory?
        regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

From
Robert Haas
Date:
On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I suspect you're going to tell me this all needs to be better
>> documented, which is probably a valid criticism.  Suggestions as to
>> where such documentation should be added - either as code comments or
>> in a README somewhere or in doc/src/sgml - will be gratefully
>> accepted.
>
> Better documentation seems required, but really the whole design seems
> rather wacko.  Backends must agree on numeric tranche IDs, but every
> backend has its own copy of the tranche name?  How do we even know what
> agreement is?  And every one has to "register" every tranche ID for
> itself?  Why in the world isn't registration done *once* and the tranche
> name stored in shared memory?

Well, with the original design, that wasn't feasible, because each
backend had to store not only the name (which was constant across all
backends) but also the array_base (which might not be, if the locks
were stored in DSM) and array_stride.  Since commit
3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
do what you're proposing.  It's still not without difficulties,
though.  There are 65,536 possible tranche IDs, and allocating an
array of 64k pointers in shared memory would consume half a megabyte
of shared memory the vast majority of which would normally be
completely unused.  So I'm not very enthused about that solution, but
you aren't the first person to propose it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Better documentation seems required, but really the whole design seems
>> rather wacko.  Backends must agree on numeric tranche IDs, but every
>> backend has its own copy of the tranche name?  How do we even know what
>> agreement is?  And every one has to "register" every tranche ID for
>> itself?  Why in the world isn't registration done *once* and the tranche
>> name stored in shared memory?

> Well, with the original design, that wasn't feasible, because each
> backend had to store not only the name (which was constant across all
> backends) but also the array_base (which might not be, if the locks
> were stored in DSM) and array_stride.  Since commit
> 3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
> do what you're proposing.  It's still not without difficulties,
> though.  There are 65,536 possible tranche IDs, and allocating an
> array of 64k pointers in shared memory would consume half a megabyte
> of shared memory the vast majority of which would normally be
> completely unused.  So I'm not very enthused about that solution, but
> you aren't the first person to propose it.

So, um, how do we know that backend A and backend B have the same idea
about what tranche id 37 means?
        regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

From
Thomas Munro
Date:
On Fri, Jan 6, 2017 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So, um, how do we know that backend A and backend B have the same idea
> about what tranche id 37 means?

[butting in]

In the particular case of dsa.c, the client has to supply a tranche ID
when creating the DSA area, and then the ID is recorded in
dsa_area_control (the area's header in either DSM or fixed shared
memory) and used when other backends attach.  How other backends get
their hands on the DSA area handle or pointer to attach is the problem
of the client code that's using DSA, but it doesn't have to worry
about sharing the tranche ID.

There are two cases to consider:

1.  There are well known tranche IDs defined in lwlock.h, like the one
used here.  As far as I can see, any tranche ID defined that way
should also be registered with a name in RegisterLWLockTranches in
every backend, and the name should be a pointer to constant data so
that it is always dereferenceable.  That is the case for the tranche
that's being discussed here as of this recent commit, and any backend
will be able to see the tranche name for that one.

2.  There are tranche IDs that are allocated at runtime.  An extension
that wants to create DSA areas or anything else that involves setting
up new LWLocks would probably need to allocate one with
LWLockNewTrancheId, and should ideally arrange to register a name in
all backends.  Sharing the tranche ID and name with other backends is
the extension's problem (it probably needs to put it the ID in shared
memory for other backends to find, like dsa.c does, and the name
probably needs to be a string constant).  If you look at
pg_stat_activity at a moment when some backend is waiting for an
LWLock with one of these dynamically allocated tranche IDs, and the
extension hasn't arranged to register the name for that tranche ID in
your backend, then pg_stat_activity will just show "extension" because
it has no better information.

To do better we'd probably need a whole registry and recycling
mechanism that tracks both IDs and names in shared memory as several
people have mentioned.  But the motivation to do that has reduced
significantly now that T_ID tracking has been dropped.  Previously,
you might have wanted to create and register a variable number of
tranche IDs in order to support, say, a bank of LWLocks used by an
executor node (because there might be more than one instance of that
executor node in query plan, and they'd need distinct IDs in order for
you to be able to register their different base + stride for T_ID
purposes, if we still had that concept).  The infrastructure was
somewhat lacking for that.  Now that tranche IDs are only used to find
a name to show in pg_stat_activity, there is nothing stopping you from
using the same tranche ID in several places at the same time, it just
means that you can't tell which of them is involved when reading that
view.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

From
Robert Haas
Date:
On Thu, Jan 5, 2017 at 5:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Better documentation seems required, but really the whole design seems
>>> rather wacko.  Backends must agree on numeric tranche IDs, but every
>>> backend has its own copy of the tranche name?  How do we even know what
>>> agreement is?  And every one has to "register" every tranche ID for
>>> itself?  Why in the world isn't registration done *once* and the tranche
>>> name stored in shared memory?
>
>> Well, with the original design, that wasn't feasible, because each
>> backend had to store not only the name (which was constant across all
>> backends) but also the array_base (which might not be, if the locks
>> were stored in DSM) and array_stride.  Since commit
>> 3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
>> do what you're proposing.  It's still not without difficulties,
>> though.  There are 65,536 possible tranche IDs, and allocating an
>> array of 64k pointers in shared memory would consume half a megabyte
>> of shared memory the vast majority of which would normally be
>> completely unused.  So I'm not very enthused about that solution, but
>> you aren't the first person to propose it.
>
> So, um, how do we know that backend A and backend B have the same idea
> about what tranche id 37 means?

Well, if they just call C exposed functions at random with arguments
picked out of a hat, then we don't.  But if they use the APIs in the
manner documented in lwlock.h, then we do:

/** There is another, more flexible method of obtaining lwlocks. First, call* LWLockNewTrancheId just once to obtain a
trancheID; this allocates from* a shared counter.  Next, each individual process using the tranche should* call
LWLockRegisterTranche()to associate that tranche ID with a name.* Finally, LWLockInitialize should be called just once
perlwlock, passing* the tranche ID as an argument.** It may seem strange that each process using the tranche must
registerit* separately, but dynamic shared memory segments aren't guaranteed to be* mapped at the same address in all
coordinatingbackends, so storing the* registration in the main shared memory segment wouldn't work for that case.*/
 

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company