Thread: ALTER TABLE lock strength reduction patch is unsafe
If you set up a pgbench test case that hits the database with a lot of
concurrent selects and non-exclusive-locking ALTER TABLEs, 9.1 soon falls
over. For example:

$ cat foo.script
alter table pgbench_accounts set (fillfactor = 100);
SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
$ createdb bench
$ pgbench -i -s 10 bench
...
$ pgbench -c 50 -t 1000000 -f foo.script bench
starting vacuum...end.
Client 10 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 5 aborted in state 1: ERROR:  cache lookup failed for relation 46260
Client 44 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 3 aborted in state 1: ERROR:  relation "pgbench_accounts" does not exist
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 45 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 15 aborted in state 1: ERROR:  cache lookup failed for relation 46260
Client 34 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 43 aborted in state 1: ERROR:  cache lookup failed for relation 46260
Client 49 aborted in state 1: ERROR:  relation "pgbench_accounts" does not exist
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 12 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 23 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 14 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 6 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 11 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 4 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
... etc etc ...

On my four-core workstation, the failures are infrequent at up to 30
clients but come pretty fast and furious at 50.

What is happening here is this:

1. Some backend commits an ALTER TABLE and sends out an sinval message.

2. In response, some other backend starts to reload its relcache entry
for pgbench_accounts when it begins its next command. It does an
indexscan with SnapshotNow on pg_class to find the updated pg_class row.

3. Meanwhile, some third backend commits another ALTER TABLE, updating
the pg_class row another time. Since we have removed the
AccessExclusiveLock that all variants of ALTER TABLE used to take, this
commit can happen while backend #2 is in process of scanning pg_class.

4. Backend #2 visits the new, about-to-be-committed version of
pgbench_accounts' pg_class row just before backend #3 commits. It sees
the row as not good and keeps scanning. By the time it reaches the
previous version of the row, however, backend #3 *has* committed. So
that version isn't good according to SnapshotNow either.

5. Thus, backend #2 fails to find any version of the pg_class row that
satisfies SnapshotNow, and it reports an error.

Depending on just when this happens during the cache load process, you
can get any of the errors displayed above, or probably some other ones.
The particular case I'm showing here only updates pg_class, but other
non-exclusive-lock variants of ALTER TABLE can probably provoke similar
failures with respect to other catalogs, leading to yet different
misbehaviors.

In typical cases where both versions of the row are on the same page, the
window for the concurrent commit to happen is very narrow --- that's why
you need so many clients to make it happen easily. With enough clients
there's a good chance of losing the CPU between tuple visits. But of
course Murphy's Law says this will happen in production situations even
if the load isn't so high.

I believe that this is fundamentally unavoidable so long as we use
SnapshotNow to read catalogs --- which is something we've talked about
changing, but it will require a pretty major R&D effort to make it
happen. In the meantime, we have to go back to using AccessExclusiveLock
for table alterations. It doesn't help to have a lower lock level if
that means that concurrent transactions will unpredictably fail instead
of waiting.

			regards, tom lane
On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> In typical cases where both versions of the row are on the same page,
> the window for the concurrent commit to happen is very narrow --- that's
> why you need so many clients to make it happen easily. With enough
> clients there's a good chance of losing the CPU between tuple visits.
> But of course Murphy's Law says this will happen in production
> situations even if the load isn't so high.

Thanks for doing the research on this. Much better we know now than
enter production like this.

> I believe that this is fundamentally unavoidable so long as we use
> SnapshotNow to read catalogs --- which is something we've talked about
> changing, but it will require a pretty major R&D effort to make it
> happen. In the meantime, we have to go back to using
> AccessExclusiveLock for table alterations. It doesn't help to have
> a lower lock level if that means that concurrent transactions will
> unpredictably fail instead of waiting.

If we were to change ALTER TABLE so it takes a session lock rather than a
normal lock, then we could commit the change and then wait until the
relcache invalidations have been received before we release the lock.
That way we would be able to avoid the concurrency issues you describe.

Or alternatively we could just re-scan if we can't find a valid row when
building the cache. We have time in the failure path...

Do you have the stomach for any of this in 9.1?
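In very rough terms, the session-lock idea might look like this (an
untested sketch; WaitForRelcacheInvalidationAck() does not exist and is
the piece we would have to write):

    LockRelId   relid = rel->rd_lockInfo.lockRelId;

    LockRelationIdForSession(&relid, ShareUpdateExclusiveLock);
    /* ... perform the ALTER TABLE work; commit the transaction ... */
    WaitForRelcacheInvalidationAck(relid.relId);    /* hypothetical */
    UnlockRelationIdForSession(&relid, ShareUpdateExclusiveLock);

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services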
On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I believe that this is fundamentally unavoidable so long as we use > SnapshotNow to read catalogs --- which is something we've talked about > changing, but it will require a pretty major R&D effort to make it > happen. In the meantime, we have to go back to using > AccessExclusiveLock for table alterations. It doesn't help to have > a lower lock level if that means that concurrent transactions will > unpredictably fail instead of waiting. Ouch. I wonder if we could avoid this anomaly by taking a throwaway MVCC snapshot at the beginning of each system catalog scan and using it just for the duration of that scan. If nothing that has touched the catalog commits while the scan is open, then this is logically equivalent to SnapshotNow. If something does commit in mid-scan, then we might not get the latest version of the row, but we should end up with exactly one. If it's not the latest one, we'll do the rebuild again upon seeing the next sinval message; in the meantime, the version we're using mustn't be too intolerably bad or it was an error not to use AccessExclusiveLock in the first place. IIUC, the problem with this approach is not correctness but performance. Taking snapshots is (currently) expensive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
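For a pg_class rebuild scan, the shape of the thing would be roughly this
(a sketch only, assuming the usual scan-key setup for the rel and key
variables; not tested):

    Snapshot    snap;

    /* fresh MVCC snapshot, used only for this one catalog scan */
    snap = RegisterSnapshot(GetLatestSnapshot());
    scan = systable_beginscan(rel, ClassOidIndexId, true, snap, 1, key);
    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        /* ... rebuild the cache entry from tup ... */
    }
    systable_endscan(scan);
    UnregisterSnapshot(snap);

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company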
On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2. In response, some other backend starts to reload its relcache entry
> for pgbench_accounts when it begins its next command. It does an
> indexscan with SnapshotNow on pg_class to find the updated pg_class row.
>
> 3. Meanwhile, some third backend commits another ALTER TABLE, updating
> the pg_class row another time. Since we have removed the
> AccessExclusiveLock that all variants of ALTER TABLE used to take, this
> commit can happen while backend #2 is in process of scanning pg_class.

This part is the core of the problem: we must not be able to update the
catalog entry while a relcache rebuild scan is in place.

So I'm prototyping something that allows

    LockRelationDefinitionOid(targetRelId, ShareLock);

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 9:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> 2. In response, some other backend starts to reload its relcache entry
>> for pgbench_accounts when it begins its next command. It does an
>> indexscan with SnapshotNow on pg_class to find the updated pg_class row.
>>
>> 3. Meanwhile, some third backend commits another ALTER TABLE, updating
>> the pg_class row another time. Since we have removed the
>> AccessExclusiveLock that all variants of ALTER TABLE used to take, this
>> commit can happen while backend #2 is in process of scanning pg_class.
>
> This part is the core of the problem: we must not be able to update the
> catalog entry while a relcache rebuild scan is in place.
>
> So I'm prototyping something that allows
>
>    LockRelationDefinitionOid(targetRelId, ShareLock);

Similar to the way we lock a relation for extension, as a sub-lock of the
main relation lock. Relcache rebuilds use a ShareLock, ALTER TABLE uses
an ExclusiveLock.

I've written the patch, just need to test later today... gotta step out
now.
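Something like this, with all names tentative (modeled on
SET_LOCKTAG_RELATION_EXTEND; LOCKTAG_RELATION_DEFINITION would be a new
LockTagType member that doesn't exist yet):

    #define SET_LOCKTAG_RELATION_DEFINITION(locktag, dboid, reloid) \
        ((locktag).locktag_field1 = (dboid), \
         (locktag).locktag_field2 = (reloid), \
         (locktag).locktag_field3 = 0, \
         (locktag).locktag_field4 = 0, \
         (locktag).locktag_type = LOCKTAG_RELATION_DEFINITION, \
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)

    void
    LockRelationDefinitionOid(Oid relid, LOCKMODE lockmode)
    {
        LOCKTAG     tag;

        SET_LOCKTAG_RELATION_DEFINITION(tag, MyDatabaseId, relid);
        (void) LockAcquire(&tag, lockmode, false, false);
    }

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services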
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I believe that this is fundamentally unavoidable so long as we use
>> SnapshotNow to read catalogs --- which is something we've talked about
>> changing, but it will require a pretty major R&D effort to make it
>> happen.

> Ouch.

> I wonder if we could avoid this anomaly by taking a throwaway MVCC
> snapshot at the beginning of each system catalog scan and using it
> just for the duration of that scan. If nothing that has touched the
> catalog commits while the scan is open, then this is logically
> equivalent to SnapshotNow. If something does commit in mid-scan, then
> we might not get the latest version of the row, but we should end up
> with exactly one. If it's not the latest one, we'll do the rebuild
> again upon seeing the next sinval message; in the meantime, the
> version we're using mustn't be too intolerably bad or it was an error
> not to use AccessExclusiveLock in the first place.

Yeah, this seems like a possibly workable direction to explore. I like
this better than what Simon is proposing, because it would fix the
generic issue for all types of catalog SnapshotNow scans.

> IIUC, the problem with this approach is not correctness but
> performance. Taking snapshots is (currently) expensive.

Yeah. After mulling it for awhile, what about this idea: we could
redefine SnapshotNow as a snapshot type that includes a list of
transactions-in-progress, somewhat like an MVCC snapshot, but we don't
fill that list from the PGPROC array. Instead, while running a scan with
SnapshotNow, anytime we determine that a particular XID is
still-in-progress, we add that XID to the snapshot's list. Subsequently,
the SnapshotNow code assumes that XID to be still-in-progress without
consulting its actual state. We reset the XID list to empty when
starting a new SnapshotNow scan. (We might be able to do so less often
than that, like only when we do AcceptInvalidationMessages, but it's not
clear to me that there's any real benefit in hanging onto the state
longer.)

This costs no performance; if anything it should be faster than now,
because we'll be replacing expensive transaction state probes with
relatively-cheap searches of an XID array that should almost always be
quite short.

With this approach, we would have no serialization anomalies from single
transactions committing while a scan is in progress. There could be
anomalies resulting from considering an earlier XID to be in-progress
while a later XID is considered committed (because we didn't observe it
until later). So far as I can see offhand, the impact of that would be
that there might be multiple versions of a tuple that are considered
good, but never that there would be no version considered good (so long
as the other XIDs simply updated the tuple and didn't delete it). I
think this would be all right, since the scan would just seize on the
first good version it finds. As you argue above, if that's not good
enough for our purposes then the updater(s) should have taken a stronger
lock.

I am not, however, particularly pleased with the idea of trying to make
this work in 9.1. I still think that we should back off the attempt to
reduce lock strength in 9.1, and take it up for 9.2. We need to be
stabilizing 9.1 for release, not introducing new untested mechanisms in
it.
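To be concrete, the core of the visibility change would be about like
this (pseudo-code only; the now_xip field and the list helpers are
invented names):

    /* In the SnapshotNow tuple-visibility code, wherever we currently
     * test whether an XID is still running: */
    static bool
    XidConsideredRunning(Snapshot snapshot, TransactionId xid)
    {
        /* once seen in progress during this scan, always in progress */
        if (XidListMember(snapshot->now_xip, xid))
            return true;
        if (TransactionIdIsInProgress(xid))
        {
            XidListAdd(snapshot->now_xip, xid);
            return true;
        }
        return false;
    }

			regards, tom lane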
On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, this seems like a possibly workable direction to explore. I like
> this better than what Simon is proposing, because it would fix the
> generic issue for all types of catalog SnapshotNow scans.

It would also avoid adding more lock manager traffic which - as recently
discussed on the relevant threads - turns out to be a significant
performance bottleneck for us right now on some workloads.

>> IIUC, the problem with this approach is not correctness but
>> performance. Taking snapshots is (currently) expensive.
>
> Yeah. After mulling it for awhile, what about this idea: we could
> redefine SnapshotNow as a snapshot type that includes a list of
> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
> fill that list from the PGPROC array. Instead, while running a scan
> with SnapshotNow, anytime we determine that a particular XID is
> still-in-progress, we add that XID to the snapshot's list.
> Subsequently, the SnapshotNow code assumes that XID to be
> still-in-progress without consulting its actual state. We reset the XID
> list to empty when starting a new SnapshotNow scan. (We might be able
> to do so less often than that, like only when we do
> AcceptInvalidationMessages, but it's not clear to me that there's any
> real benefit in hanging onto the state longer.)

I think that something like that might possibly work, but what if the
XID array overflows?

A while back I proposed the idea of a "lazy" snapshot, by which I had in
mind something similar to what you are suggesting but different in
detail (see the sketch below). Initially, when asked to acquire a
snapshot, the snapshot manager acknowledges having taken one but does not
actually do any work. As long as it sees only XIDs that either precede
the oldest XID still running anywhere in the cluster, or have aborted, it
can provide answers that are 100% correct without any further data. If
it ever sees a newer, non-aborted XID then it goes and really gets an
MVCC snapshot at that point, which it can use from that point onward. I
think that it might be possible to make such a system work even for MVCC
snapshots generally, but even if not, it might be sufficient for this
purpose. Unlike your approach, it would avoid both the "see no rows" and
the "see multiple rows" cases, which might be thought an advantage.

> I am not, however, particularly pleased with the idea of trying to make
> this work in 9.1. I still think that we should back off the attempt
> to reduce lock strength in 9.1, and take it up for 9.2. We need to be
> stabilizing 9.1 for release, not introducing new untested mechanisms in
> it.

I like this feature a lot, but it's hard to imagine that any of the fixes
anyone has so far suggested can be implemented without collateral damage.
Nor is there any certainty that this is the last bug.
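To flesh out the lazy-snapshot idea a little, the visibility test might
look roughly like this (a hand-wavy sketch; the LazySnapshot structure
and its helpers are invented names):

    static bool
    XidIsRunningLazy(LazySnapshot *snap, TransactionId xid)
    {
        /* XIDs older than the oldest XID running anywhere in the
         * cluster, or known-aborted XIDs, need no snapshot at all */
        if (TransactionIdPrecedes(xid, snap->cluster_oldest_xid) ||
            TransactionIdDidAbort(xid))
            return false;

        /* first interesting XID: now really take the MVCC snapshot */
        if (!snap->materialized)
            MaterializeLazySnapshot(snap);

        return XidInMVCCSnapshot(xid, &snap->mvcc);
    }

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company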
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah. After mulling it for awhile, what about this idea: we could
>> redefine SnapshotNow as a snapshot type that includes a list of
>> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
>> fill that list from the PGPROC array. Instead, while running a scan
>> with SnapshotNow, anytime we determine that a particular XID is
>> still-in-progress, we add that XID to the snapshot's list.

> I think that something like that might possibly work, but what if the
> XID array overflows?

Well, you repalloc it bigger. In either this idea or yours below, I fear
SnapshotNow snaps will have to become dynamically-allocated structures
instead of being simple references to a shared constant object. (This is
because we can sometimes do a SnapshotNow scan when another one is
already in progress, and we couldn't let the inner one change the outer
one's state.) That's not really a performance problem; one more palloc
to do a catalog scan is a non-issue. But it is likely to be a large
notational change compared to what we've got now.

> A while back I proposed the idea of a "lazy" snapshot, by which I had
> in mind something similar to what you are suggesting but different in
> detail. Initially, when asked to acquire a snapshot, the snapshot
> manager acknowledges having taken one but does not actually do any
> work. As long as it sees only XIDs that either precede the oldest XID
> still running anywhere in the cluster, or have aborted, it can provide
> answers that are 100% correct without any further data. If it ever
> sees a newer, non-aborted XID then it goes and really gets an MVCC
> snapshot at that point, which it can use from that point onward. I
> think that it might be possible to make such a system work even for
> MVCC snapshots generally, but even if not, it might be sufficient for
> this purpose. Unlike your approach, it would avoid both the "see no
> rows" and the "see multiple rows" cases, which might be thought an
> advantage.

Hmm, yeah, I think this idea is probably better than mine, just because
of the less dubious semantics. I don't see how you'd make it work for
generic MVCC scans, because the behavior will be "the database state as
of some hard-to-predict time after the scan starts", which is not what we
want for MVCC. But it ought to be fine for SnapshotNow.

			regards, tom lane
On Fri, Jun 17, 2011 at 7:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, this seems like a possibly workable direction to explore. I like
>> this better than what Simon is proposing, because it would fix the
>> generic issue for all types of catalog SnapshotNow scans.
>
> It would also avoid adding more lock manager traffic which - as
> recently discussed on the relevant threads - turns out to be a
> significant performance bottleneck for us right now on some workloads.

>> I am not, however, particularly pleased with the idea of trying to make
>> this work in 9.1. I still think that we should back off the attempt
>> to reduce lock strength in 9.1, and take it up for 9.2. We need to be
>> stabilizing 9.1 for release, not introducing new untested mechanisms in
>> it.
>
> I like this feature a lot, but it's hard to imagine that any of the
> fixes anyone has so far suggested can be implemented without
> collateral damage. Nor is there any certainty that this is the last
> bug.

Not so. The extra locking would only occur on the first lock acquisition
after DDL operations occur. If that was common, then your other
performance patch would not be an effective optimisation. There is no
additional locking from what I've proposed in the common code path -
that's why we have a relcache.

Any effects from the additional locking will only be felt by people
issuing a stream of DDL statements against a table, even assuming there
are any effects of real note. So there is no "collateral damage", and
additional locking is a viable solution for 9.1.

It's possible that we may have a better solution in 9.2+, but we've said
that before and had it never happen, many times.

Having spent a few hours mulling through this, I think there is a
reasonable solution for 9.1 and I continue to work on it.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm, yeah, I think this idea is probably better than mine, just because
> of the less dubious semantics. I don't see how you'd make it work for
> generic MVCC scans, because the behavior will be "the database state as
> of some hard-to-predict time after the scan starts", which is not what
> we want for MVCC. But it ought to be fine for SnapshotNow.

Department of second thoughts: I think I see a problem.

Suppose we have a tuple that has not been updated for a long time. Its
XMIN is committed and all-visible, and its XMAX is invalid. As we're
scanning the table (transaction T1), we see that tuple and say, oh, it's
visible. Now, another transaction (T2) begins, updates the tuple, and
commits. Our scan then reaches the page where the new tuple is located,
and says, oh, this is recent, I'd better take a real snapshot. Of
course, the new snapshot can see the new version of the tuple, too.

Of course, if T1 had taken its snapshot before starting the scan, the
second tuple would have been invisible. But since we didn't take it
until later, after T2 had already committed, we see a duplicate.

That's still no worse than your idea, which rests on the theory that
duplicates don't matter anyway, but the case for it being better is a lot
thinner. I'd sure prefer something that had less crazy semantics than
either of these ideas, if we can think of something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 17, 2011 at 3:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Not so. The extra locking would only occur on the first lock
> acquisition after DDL operations occur. If that was common, then your
> other performance patch would not be an effective optimisation. There
> is no additional locking from what I've proposed in the common code
> path - that's why we have a relcache.

The extra locking would also occur when *initially* building relcache
entries. In other words, this would increase - likely quite
significantly - the overhead of backend startup. It's not going to be
sufficient to do this just for pg_class; I think you'll have to do it for
pg_attribute, pg_attrdef, pg_constraint, pg_index, pg_trigger,
pg_rewrite, and maybe a few others I'm not thinking of right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 17, 2011 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2011 at 3:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Not so. The extra locking would only occur on the first lock
>> acquisition after DDL operations occur. If that was common, then your
>> other performance patch would not be an effective optimisation. There
>> is no additional locking from what I've proposed in the common code
>> path - that's why we have a relcache.
>
> The extra locking would also occur when *initially* building relcache
> entries. In other words, this would increase - likely quite
> significantly - the overhead of backend startup. It's not going to be
> sufficient to do this just for pg_class; I think you'll have to do it
> for pg_attribute, pg_attrdef, pg_constraint, pg_index, pg_trigger,
> pg_rewrite, and maybe a few others I'm not thinking of right now.

Nothing you say here is accurate, regrettably.

The "extra locking" would be one call to the lock manager per relation,
taken in shared mode, so it doesn't block. I see no reason at all to
have separate locks for each catalog table, since it's the relation lock
that is the top level lock.

Locking is a very well known solution to such problems. We use it
everywhere and we can use it here, and now.

I think you'd better wait to see the patch.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes:
> Department of second thoughts: I think I see a problem.

Um, yeah, so that doesn't really work any better than my idea.

On further reflection, there's a problem at a higher level than this
anyway. Even if we can get a single SnapshotNow scan to produce
guaranteed-self-consistent results, that doesn't ensure consistency
between the results of scans occurring serially. An example here is
ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only
writers. However, suppose that a concurrent relcache load fetches the
pg_attribute row, notes that it has atthasdef = true, and then the ALTER
commits before we start to scan pg_attrdef. The consistency checks in
AttrDefaultFetch() will complain about a missing pg_attrdef entry, and
rightly so. We could lobotomize those checks, but it doesn't feel right
to do so; and anyway there may be other cases that are harder to kluge
up.

So really we need consistency across *at least* one entire relcache load
cycle. We could maybe arrange to take an MVCC snap (or some lighter
weight version of that) at the start, and use that for all the resulting
scans, but I think that would be notationally messy. It's not clear that
it'd solve everything anyhow. There are parts of a relcache entry that
we fetch only on-demand, so they are typically loaded later than the core
items, and probably couldn't use the same snapshot. Worse, there are
lots of places where we assume that use of catcache entries or direct
examination of the catalogs will yield results consistent with the
relcache.

I suspect these latter problems will impact Simon's idea as well.

			regards, tom lane
On Fri, Jun 17, 2011 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Department of second thoughts: I think I see a problem.
>
> Um, yeah, so that doesn't really work any better than my idea.
>
> On further reflection, there's a problem at a higher level than this
> anyway. Even if we can get a single SnapshotNow scan to produce
> guaranteed-self-consistent results, that doesn't ensure consistency
> between the results of scans occurring serially. An example here is
> ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only
> writers. However, suppose that a concurrent relcache load fetches the
> pg_attribute row, notes that it has atthasdef = true, and then the ALTER
> commits before we start to scan pg_attrdef. The consistency checks in
> AttrDefaultFetch() will complain about a missing pg_attrdef entry, and
> rightly so. We could lobotomize those checks, but it doesn't feel right
> to do so; and anyway there may be other cases that are harder to kluge up.
>
> So really we need consistency across *at least* one entire relcache load
> cycle. We could maybe arrange to take an MVCC snap (or some lighter
> weight version of that) at the start, and use that for all the resulting
> scans, but I think that would be notationally messy. It's not clear
> that it'd solve everything anyhow. There are parts of a relcache entry
> that we fetch only on-demand, so they are typically loaded later than
> the core items, and probably couldn't use the same snapshot. Worse,
> there are lots of places where we assume that use of catcache entries or
> direct examination of the catalogs will yield results consistent with
> the relcache.
>
> I suspect these latter problems will impact Simon's idea as well.

I suspect we're going to be told that they don't. I suspect I'm not
going to believe it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Tom Lane's message of Fri Jun 17 13:22:40 -0400 2011:
> With this approach, we would have no serialization anomalies from single
> transactions committing while a scan is in progress. There could be
> anomalies resulting from considering an earlier XID to be in-progress
> while a later XID is considered committed (because we didn't observe
> it until later). So far as I can see offhand, the impact of that would
> be that there might be multiple versions of a tuple that are considered
> good, but never that there would be no version considered good (so long
> as the other XIDs simply updated the tuple and didn't delete it). I
> think this would be all right, since the scan would just seize on the
> first good version it finds. As you argue above, if that's not good
> enough for our purposes then the updater(s) should have taken a stronger
> lock.

Hmm, would there be a problem if a scan on catalog A yields results from
supposedly-running transaction X but another scan on catalog B yields
results from transaction Y (X != Y)? For example, a scan on pg_class
says that there are N triggers but scanning pg_trigger says N-1?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes:
> I like this feature a lot, but it's hard to imagine that any of the
> fixes anyone has so far suggested can be implemented without
> collateral damage. Nor is there any certainty that this is the last
> bug.

And in fact, here's something else to worry about: consider pg_dump.
pg_dump is pretty heavily reliant on backend catalog-interpretation code
(such as ruleutils.c) that mostly runs on SnapshotNow time. But it also
does a fair amount of work on the basis of its own inspection of the
catalogs, which is done according to the serializable snapshot it gets at
the beginning of the dump run. If these two views of a table's schema
aren't consistent, you might get a pg_dump error, but it's at least as
likely that you'll get a silently incorrect dump.

pg_dump tries to minimize the risk by taking AccessShareLock right away
on each table it's going to dump. This is not perfect but it at least
results in a narrow window for conflicting table changes to occur.
However, that strategy has been blown out of the water by the ALTER TABLE
lock strength reduction. There is now a *very* wide window for
concurrent ALTERs to occur and possibly break the dump results.

As far as I can see, the only simple way to return pg_dump to its
previous level of safety while retaining this patch is to make it take
ShareUpdateExclusiveLocks, so that it will still block all forms of ALTER
TABLE. This is rather unpleasant, since it will also block autovacuum
for the duration of the dump.

In the long run, we really ought to fix things so that ruleutils.c runs
on the transaction snapshot, but that's a massive rewrite that is
certainly not getting done for 9.1, and will likely result in
considerable code bloat :-(.

(BTW, I just noticed that dumpSchema does a pretty fair amount of work
before it gets around to calling getTables, which is where said locks get
taken. Seems like we'd better rearrange the order of operations
there...)
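To be concrete about the ShareUpdateExclusiveLock point: pg_dump would
have to take its per-table locks with the equivalent of

    LOCK TABLE sometable IN SHARE UPDATE EXCLUSIVE MODE;

rather than its traditional

    LOCK TABLE sometable IN ACCESS SHARE MODE;

(sometable standing in for whichever table it's about to dump).

			regards, tom lane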
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmm, would there be a problem if a scan on catalog A yields results from
> supposedly-running transaction X but another scan on catalog B yields
> results from transaction Y (X != Y)? For example, a scan on pg_class
> says that there are N triggers but scanning pg_trigger says N-1?

Yeah, I came to that same conclusion downthread.

			regards, tom lane
Excerpts from Tom Lane's message of Fri Jun 17 17:08:25 -0400 2011:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Hmm, would there be a problem if a scan on catalog A yields results from
>> supposedly-running transaction X but another scan on catalog B yields
>> results from transaction Y (X != Y)? For example, a scan on pg_class
>> says that there are N triggers but scanning pg_trigger says N-1?
>
> Yeah, I came to that same conclusion downthread.

Something is seriously wrong with my email :-(

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Jun 17, 2011 at 5:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As far as I can see, the only simple way to return pg_dump to its
> previous level of safety while retaining this patch is to make it take
> ShareUpdateExclusiveLocks, so that it will still block all forms of
> ALTER TABLE. This is rather unpleasant, since it will also block
> autovacuum for the duration of the dump.

I have been thinking for a while now that it would be sensible to make
vacuum use a different lock type, much as we do for relation extension.
DROP TABLE and CLUSTER and at least some forms of ALTER TABLE and maybe a
few other things like CREATE INDEX would need to grab that lock in
addition to the ones they already acquire, but a whole lot of other
things wouldn't. In particular, it's currently not possible to lock a
table against SELECT without also locking it against VACUUM - and booting
off any autovacuum worker that happens to already be processing it. If
you imagine a large table with a bunch of short-duration exclusive locks,
it's not too hard to see how you can end up with VACUUM starvation.

But that's not something I want to do in 9.1, and I doubt it would
completely solve this problem anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I have been thinking for a while now that it would be sensible to make
> vacuum use a different lock type, much as we do for relation
> extension.

Hmm. I had just been toying with the idea of introducing a new
user-visible locking level to allow separation of anti-vacuum locks from
anti-schema-alteration locks. But I think you're probably right that it
could be done as a specialized LockTag. That would make it not easily
user-accessible, but it's hard to think of reasons for users to lock out
vacuum anyway, unless they want to lock out everything via
AccessExclusiveLock.

> ... In particular, it's currently not
> possible to lock a table against SELECT without also locking it
> against VACUUM

Well, it still wouldn't be, since AccessExclusiveLock certainly had
better lock out vacuum. As said above, I think the important thing is to
distinguish vacuum from schema changes.

> But that's not something I want to do in 9.1,

Definitely.

			regards, tom lane
On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 4. Backend #2 visits the new, about-to-be-committed version of
> pgbench_accounts' pg_class row just before backend #3 commits.
> It sees the row as not good and keeps scanning. By the time it
> reaches the previous version of the row, however, backend #3
> *has* committed. So that version isn't good according to SnapshotNow
> either.

<thinks some more>

Why isn't this a danger for every pg_class update? For example, it would
seem that if VACUUM updates relpages/reltuples, it would be prone to this
same hazard.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 4. Backend #2 visits the new, about-to-be-committed version of
>> pgbench_accounts' pg_class row just before backend #3 commits.
>> It sees the row as not good and keeps scanning. By the time it
>> reaches the previous version of the row, however, backend #3
>> *has* committed. So that version isn't good according to SnapshotNow
>> either.

> <thinks some more>

> Why isn't this a danger for every pg_class update? For example, it
> would seem that if VACUUM updates relpages/reltuples, it would be
> prone to this same hazard.

VACUUM does that with an in-place, nontransactional update. But yes,
this is a risk for every transactional catalog update.
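(The in-place path is heap_inplace_update(), which overwrites the tuple
where it sits rather than creating a new row version, so there is never a
moment when a SnapshotNow scan can see zero good versions. Roughly:

    /* nontransactional overwrite; no new tuple version is created */
    heap_inplace_update(pg_class_rel, tuple);

as opposed to the regular heap_update() path used by transactional
catalog updates.)

			regards, tom lane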
On Fri, Jun 17, 2011 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> 4. Backend #2 visits the new, about-to-be-committed version of
>>> pgbench_accounts' pg_class row just before backend #3 commits.
>>> It sees the row as not good and keeps scanning. By the time it
>>> reaches the previous version of the row, however, backend #3
>>> *has* committed. So that version isn't good according to SnapshotNow
>>> either.
>
>> <thinks some more>
>
>> Why isn't this a danger for every pg_class update? For example, it
>> would seem that if VACUUM updates relpages/reltuples, it would be
>> prone to this same hazard.
>
> VACUUM does that with an in-place, nontransactional update. But yes,
> this is a risk for every transactional catalog update.

Well, after various efforts to fix the problem, I notice that there are
two distinct problems brought out by your test case. One of them is
caused by my patch; one of them was already there in the code, and this
latter one is actually the hardest to fix. It took me about an hour to
fix the first bug, but it's taken a while of thinking about the second
before I realised it was a pre-existing bug.

The core problem is, as you observed, that a pg_class update can cause
rows to be lost with concurrent scans.

We scan pg_class in two ways: to rebuild a relcache entry based on a
relation's oid (easy fix), and to resolve the name to oid mapping. The
name to oid mapping is performed *without* a lock on the relation, since
we don't know which relation to lock. So the name lookup can fail if we
are in the middle of a pg_class update. This is an existing potential
bug in Postgres unrelated to my patch. Ref: SearchCatCache()

I've been looking at ways to lock the relation name and namespace prior
to the lookup (or more precisely the hash), but it's worth discussing
whether we want that at all?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jun 19, 2011 at 10:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> pre-existing bug

I've been able to reproduce the bug on 9.0 stable as well, using the OP's
test script. Multiple errors like this:

ERROR:  relation "pgbench_accounts" does not exist
STATEMENT:  alter table pgbench_accounts set (fillfactor = 100);

I presume we want to fix both bugs, not just the most recent one, so I
will continue with both patches.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jun 17, 2011 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Why isn't this a danger for every pg_class update? For example, it
>>> would seem that if VACUUM updates relpages/reltuples, it would be
>>> prone to this same hazard.
>>
>> VACUUM does that with an in-place, nontransactional update. But yes,
>> this is a risk for every transactional catalog update.
>
> Well, after various efforts to fix the problem, I notice that there
> are two distinct problems brought out by your test case.
>
> One of them is caused by my patch; one of them was already there in
> the code, and this latter one is actually the hardest to fix.
>
> It took me about an hour to fix the first bug, but it's taken a while
> of thinking about the second before I realised it was a pre-existing
> bug.
>
> The core problem is, as you observed, that a pg_class update can cause
> rows to be lost with concurrent scans.
>
> We scan pg_class in two ways: to rebuild a relcache entry based on a
> relation's oid (easy fix), and to resolve the name to oid mapping.
> The name to oid mapping is performed *without* a lock on the relation,
> since we don't know which relation to lock. So the name lookup can
> fail if we are in the middle of a pg_class update. This is an existing
> potential bug in Postgres unrelated to my patch. Ref: SearchCatCache()
>
> I've been looking at ways to lock the relation name and namespace
> prior to the lookup (or more precisely the hash), but it's worth
> discussing whether we want that at all?

If this is a pre-existing bug, then it's not clear to me why we need to
do anything about it at all right now. I mean, it would be nice to have
a fix, but it's hard to imagine that any proposed fix will be low-risk,
and I don't remember user complaints about this. I continue to think
that the root of the problem here is that SnapshotNow is Evil (TM). If
we get rid of that, then this problem goes away, but that strikes me as a
long-term project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 20, 2011 at 2:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If this is a pre-existing bug, then it's not clear to me why we need
> to do anything about it at all right now. I mean, it would be nice to
> have a fix, but it's hard to imagine that any proposed fix will be
> low-risk, and I don't remember user complaints about this. I continue
> to think that the root of the problem here is that SnapshotNow is Evil
> (TM). If we get rid of that, then this problem goes away, but that
> strikes me as a long-term project.

There are 2 bugs, one caused by my patch in 9.1, one that is
pre-existing.

The 9.1 bug can be fixed easily. I will edit my patch down and repost
here shortly.

The pre-existing bug is slightly harder and more contentious, because we
have to lock the name of a possible relation, even before we know it
exists. I've been looking to implement that as a lock on the uint32 hash
of the relation's name and namespace (sketch below).

I'm looking for opinions ranging from fix-now-and-backpatch thru to
ignore and discuss for 9.2.
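For illustration, the lock-the-name idea amounts to something like this
(untested; SET_LOCKTAG_RELATION_NAME is an invented lock tag):

    uint32      h;
    LOCKTAG     tag;

    /* hash the name and fold in the namespace, so the tag fits a uint32 */
    h = DatumGetUInt32(hash_any((const unsigned char *) relname,
                                strlen(relname))) ^ (uint32) namespaceId;
    SET_LOCKTAG_RELATION_NAME(tag, MyDatabaseId, h);    /* invented */
    (void) LockAcquire(&tag, ShareLock, false, false);

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services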
Simon Riggs <simon@2ndQuadrant.com> wrote:

> I'm looking for opinions ranging from fix-now-and-backpatch thru
> to ignore and discuss for 9.2.

If it's a pre-existing bug, I would think that one option would be to put
it into the next bug-fix release of each supported major release in which
it is manifest. Of course, if it is *safe* to work it into 9.1, that'd
be great.

-Kevin
On Mon, Jun 20, 2011 at 9:42 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>
>> I'm looking for opinions ranging from fix-now-and-backpatch thru
>> to ignore and discuss for 9.2.
>
> If it's a pre-existing bug, I would think that one option would be to
> put it into the next bug-fix release of each supported major release
> in which it is manifest. Of course, if it is *safe* to work it into
> 9.1, that'd be great.

I'm currently on the other end of the spectrum: ignore and consider for
9.2. But that's mostly based on the belief that there isn't going to be
a way of fixing this that isn't far too invasive to back-patch. Should
that turn out to be incorrect, that's a different matter, of course...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:

> I'm currently on the other end of the spectrum: ignore and
> consider for 9.2.

I guess part of my point was that if we can't safely get something into
the initial 9.1 release, it doesn't mean we necessarily need to wait for
9.2. Bugs can be fixed along the way in minor releases. A 9.1.1 fix
might give us more time to work through details and ensure that it is a
safe and well-targeted fix.

-Kevin
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> We scan pg_class in two ways: to rebuild a relcache entry based on a
>> relation's oid (easy fix). We also scan pg_class to resolve the name
>> to oid mapping. The name to oid mapping is performed *without* a lock
>> on the relation, since we don't know which relation to lock. So the
>> name lookup can fail if we are in the middle of a pg_class update.
>> This is an existing potential bug in Postgres unrelated to my patch.

> If this is a pre-existing bug, then it's not clear to me why we need
> to do anything about it at all right now.

Yeah. This behavior has been there since day zero, and there have been
very few complaints about it. But note that there's only a risk for
pg_class updates, not any other catalog, and there is exactly one kind of
failure with very predictable consequences. The ALTER TABLE patch has
greatly expanded the scope of the issue, and that *is* a regression
compared to prior releases.

BTW, it seems to me that this issue is closely related to what Noah is
trying to fix here:
http://archives.postgresql.org/message-id/20110612191843.GF21098@tornado.leadboat.com

			regards, tom lane
On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If this is a pre-existing bug, then it's not clear to me why we need
>> to do anything about it at all right now.
>
> Yeah. This behavior has been there since day zero, and there have been
> very few complaints about it. But note that there's only a risk for
> pg_class updates, not any other catalog, and there is exactly one kind
> of failure with very predictable consequences.

I agree we shouldn't do anything about the name lookups for 9.1. That is
SearchCatCache using RELNAMENSP lookups, to be precise, as well as
triggers and a few other similar call types.

> The ALTER TABLE patch
> has greatly expanded the scope of the issue, and that *is* a regression
> compared to prior releases.

I agree the scope for RELOID errors increased with my 9.1 patch. I'm now
happy with the locking patch (attached), which significantly reduces the
scope - back to the original error scope, in my testing.

I tried to solve both, but I think that's a step too far given the
timing.

It seems likely that there will be objections to this patch. All I would
say is that issuing a stream of ALTER TABLEs against the same table is
not a common situation; if it were, we would have seen more of the
pre-existing bug. The ALTER TABLE command encompasses many subcommands,
and we should evaluate each subcommand differently when we decide what to
do.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 20, 2011 at 11:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I agree we shouldn't do anything about the name lookups for 9.1.
> That is SearchCatCache using RELNAMENSP lookups, to be precise, as
> well as triggers and a few other similar call types.

Name lookups give ERRORs that look like this:

    relation "pgbench_accounts" does not exist

>> The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.
>
> I agree the scope for RELOID errors increased with my 9.1 patch.

Which originally generated ERRORs like

    cache lookup failed for relation 16390

or

    could not open relation with OID 16390

which should now be absent from the server log.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 20, 2011 at 5:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If this is a pre-existing bug, then it's not clear to me why we need
>> to do anything about it at all right now.
>
> Yeah. This behavior has been there since day zero, and there have been
> very few complaints about it. But note that there's only a risk for
> pg_class updates, not any other catalog, and there is exactly one kind
> of failure with very predictable consequences. The ALTER TABLE patch
> has greatly expanded the scope of the issue, and that *is* a regression
> compared to prior releases.

It's not entirely clear to me how many additional failure cases we've
bought ourselves with this patch. The particular one you've demonstrated
seems pretty similar to the one we already had, although possibly the
window for it is wider. Did you run the same test you used on 9.1 on 9.0
for comparison?

> BTW, it seems to me that this issue is closely related to what Noah is
> trying to fix here:
> http://archives.postgresql.org/message-id/20110612191843.GF21098@tornado.leadboat.com

I think there are two general problems here. One, we use SnapshotNow to
scan system catalogs, and SnapshotNow semantics are a mess. Two, even if
we used an MVCC snapshot to scan the system catalogs, it's not
necessarily safe or correct to latch onto an old row version, because the
row update is combined with other actions that are not MVCC-safe, like
removing files on disk.

Breaking it down a bit more:

A. The problem with using a DDL lock level < AccessExclusiveLock, AFAICS,
is entirely due to SnapshotNow semantics. Any row version we can
possibly see is OK, but we had better see exactly one. Or at least not
less than one.

B. The problem with name lookups failing in the middle of a pg_class
update is also entirely due to SnapshotNow semantics.

C. The problem Noah is complaining about is NOT due to SnapshotNow
semantics. If someone does BEGIN; DROP TABLE foo; CREATE TABLE foo();
COMMIT, it is no longer OK for anyone to be looking at the old foo. An
MVCC snapshot is enough to guarantee that we'll see some row, but
examining the old one won't cut it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 20, 2011 at 6:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
> now happy with the locking patch (attached), which significantly
> reduces the scope - back to the original error scope, in my testing.
>
> I tried to solve both, but I think that's a step too far given the timing.
>
> It seems likely that there will be objections to this patch. All I
> would say is that issuing a stream of ALTER TABLEs against the same
> table is not a common situation; if it were, we would have seen more of
> the pre-existing bug. The ALTER TABLE command encompasses many
> subcommands, and we should evaluate each subcommand differently when we
> decide what to do.

Well, my principal objection is that I think heavyweight locking is an
excessively expensive solution to this problem. I think the patch is
simple enough that I wouldn't object to applying it on those grounds even
at this late date, but I bet if we do some benchmarking on the right
workload we'll find a significant performance regression.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 20, 2011 at 5:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah. This behavior has been there since day zero, and there have been
>> very few complaints about it. But note that there's only a risk for
>> pg_class updates, not any other catalog, and there is exactly one kind
>> of failure with very predictable consequences. The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.

> It's not entirely clear to me how many additional failure cases we've
> bought ourselves with this patch. The particular one you've
> demonstrated seems pretty similar to the one we already had, although
> possibly the window for it is wider.

It's not so much the probability of failure that is bothering me, as the
variety of possible symptoms. There was exactly one failure mode before,
namely "no such relation". I'm not sure how many possible symptoms there
are now, but there's a lot, and most of them are going to be weird "what
the heck was that??" behaviors. If we let 9.1 ship like this, we are
going to be creating a support headache. Even worse, knowing that those
bugs exist will tempt us to write off reports of weird cache lookup
failures as being instances of this problem, when closer investigation
might show that they're something else.

Please note that this position should not be regarded as support for
Simon's proposed patch. I still think the right decision is to revert
the ALTER TABLE feature, mainly because I do not believe this is the last
bug in it. And the fact that there's a pre-existing bug with a vaguely
similar symptom is no justification for introducing more bugs.

			regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.

> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
> now happy with the locking patch (attached), which significantly
> reduces the scope - back to the original error scope, in my testing.

> I tried to solve both, but I think that's a step too far given the timing.

> It seems likely that there will be objections to this patch.

Yup, you're right. Having read this patch, I have absolutely zero
confidence in it. It introduces some locks in random places, with no
rhyme or reason that I can see. There is no reason to think that this is
a complete solution, and considerable reason to think that it isn't
(notably, the RELOID syscache is hardly the only one at risk). Worse,
it's adding more locking in performance-critical places, which seems to
me to severely degrade the argument for the original feature, namely that
it was supposed to give us *less* locking.

			regards, tom lane
Excerpts from Robert Haas's message of Tue Jun 21 09:40:16 -0400 2011:
> On Mon, Jun 20, 2011 at 6:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
>> now happy with the locking patch (attached), which significantly
>> reduces the scope - back to the original error scope, in my testing.
>>
>> I tried to solve both, but I think that's a step too far given the timing.
>>
>> It seems likely that there will be objections to this patch. All I
>> would say is that issuing a stream of ALTER TABLEs against the same
>> table is not a common situation; if it were, we would have seen more of
>> the pre-existing bug. The ALTER TABLE command encompasses many
>> subcommands, and we should evaluate each subcommand differently when we
>> decide what to do.
>
> Well, my principal objection is that I think heavyweight locking is an
> excessively expensive solution to this problem. I think the patch is
> simple enough that I wouldn't object to applying it on those grounds
> even at this late date, but I bet if we do some benchmarking on the
> right workload we'll find a significant performance regression.

Yeah, taking a hw lock at each relcache item build is likely to be
prohibitively expensive. Heck, relation extension is expensive already
in some loads. (I'm guessing that things will tank when there's a
relcache reset.)

Still, this seems to be an overall better approach to the problem than
what's been proposed elsewhere. Maybe we can do something with a map of
relations that are protected by a bunch of lwlocks instead. (We could
use that for relation extension too; that'd rock.)

The patch may be simple, but is it complete? I think you need to have
lock acquisition on CREATE RULE and CREATE INDEX too. Right now it only
has locks on trigger stuff and ALTER TABLE.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Tom Lane's message of mar jun 21 11:06:22 -0400 2011: > Please note that this position should not be regarded as support for > Simon's proposed patch. I still think the right decision is to revert > the ALTER TABLE feature, mainly because I do not believe this is the > last bug in it. And the fact that there's a pre-existing bug with a > vaguely similar symptom is no justification for introducing more bugs. Note that this feature can be disabled by tweaking AlterTableGetLockLevel so that it always returns AccessExclusive. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of mar jun 21 11:06:22 -0400 2011: >> Please note that this position should not be regarded as support for >> Simon's proposed patch. I still think the right decision is to revert >> the ALTER TABLE feature, mainly because I do not believe this is the >> last bug in it. And the fact that there's a pre-existing bug with a >> vaguely similar symptom is no justification for introducing more bugs. > Note that this feature can be disabled by tweaking > AlterTableGetLockLevel so that it always returns AccessExclusive. I think Simon had also hacked a couple of other places such as CREATE TRIGGER, but yeah, I was thinking of just lobotomizing that function with an #ifdef. When and if we get these problems worked out, it'll be easy to re-enable the feature. regards, tom lane
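[Concretely, the "lobotomizing" could be as small as the sketch below. AlterTableGetLockLevel() is the real function in tablecmds.c; the #ifdef symbol and the helper name are invented for this sketch, not code from the tree.]

    LOCKMODE
    AlterTableGetLockLevel(List *cmds)
    {
    #ifdef REDUCED_ALTER_TABLE_LOCKS            /* hypothetical; not defined by default */
        /* hypothetical wrapper around the existing per-subcommand analysis */
        return AlterTableAnalyzeLockLevel(cmds);
    #else
        /* Revert to 9.0 behaviour: every ALTER TABLE takes the full lock. */
        return AccessExclusiveLock;
    #endif
    }

[Keeping the analysis code in the source under the #ifdef is what makes it easy to re-enable later, as Tom notes.]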
On Tue, Jun 21, 2011 at 4:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The ALTER TABLE patch >>> has greatly expanded the scope of the issue, and that *is* a regression >>> compared to prior releases. > >> I agree the scope for RELOID errors increased with my 9.1 patch. I'm >> now happy with the locking patch (attached), which significantly >> reduces the scope - back to the original error scope, in my testing. > >> I tried to solve both, but I think that's a step too far given the timing. > >> It seems likely that there will be objections to this patch. > > Yup, you're right. Having read this patch, I have absolutely zero > confidence in it. It introduces some locks in random places, with no > rhyme or reason that I can see. There is something to be salvaged here and I have additional analysis and suggestions. Please read all the way through. We already allowed INHERIT/NO INHERIT with a lower lock level than AccessExclusiveLock and have received no complaints to speak of. We aren't discussing moving that operation to AccessExclusiveLock in the light of this discussion, nor are we discussing TYPE OF operations. I don't think it's sensible to reset those operations to AccessExclusiveLock. The patch has extensive comments that explain the carefully analysed and tested locations for the locking. We take an ExclusiveLock on the RelationDefinition during an ALTER TABLE. We take a ShareLock on CatCache and RelCache rebuilds. The only time these can block is during an ALTER TABLE with a lower lock than AccessExclusiveLock that occurs immediately after other DDL. The locking has been placed to minimise the lock window for simple ALTER TABLE statements.

> There is no reason to think that this
> is a complete solution, and considerable reason to think that it isn't
> (notably, the RELOID syscache is hardly the only one at risk).

The whole of the relcache rebuild is protected, not just access to pg_class, so that is safe, no matter which catalog tables are accessed. Catcache does look more complex, so your fears here are worth looking into. I was already doing that since my last post and have now finished. Based on the above, I think we should limit the patch to these areas:

1. Simple operations on pg_class and/or pg_attribute. This includes:

case AT_SetStatistics: ATTNAME catcache
case AT_SetOptions: ATTNAME catcache
case AT_ResetOptions: ATTNAME catcache
case AT_SetStorage: ATTNAME catcache
case AT_SetRelOptions: RELOID catcache
case AT_ResetRelOptions: RELOID catcache

2. FK VALIDATION, which touches CONSTROID.

Limiting the scope of the patch to those areas provides good benefit, whilst at the same time limiting our risk footprint. Doing that means we can evaluate the patch against those aspects. For people that still think there is still risk, I've added a parameter called "relaxed_ddl_locking" which defaults to "off". That way people can use this feature in an informed way without exposing us to support risks from its use. [Patch needs minor docs and test documentation changes prior to commit]

> Worse,
> it's adding more locking in performance-critical places, which seems
> to me to severely degrade the argument for the original feature,
> namely that it was supposed to give us *less* locking.

I think we should be clear that this adds *one* lock/unlock for each object for each session, ONCE only after each transaction that executed a DDL statement against an object.
So with 200 sessions, a typical ALTER TABLE statement will cause 400 locks. The current lock manager maxes out above 50,000 locks per second, so any comparative analysis will show this is a minor blip on even a heavy workload. It isn't in a performance-critical place, and the lock held in case of a rebuild is a ShareLock, so it won't block except against an ALTER TABLE statement. This is a marked *reduction* in locking overhead, in comparison with ALTER TABLE taking an AccessExclusiveLock, which would make the SQL statement wait for potentially a very long time. With relaxed_ddl_locking = off (default) there is zero effect. So that is not a valid argument to refuse this patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
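[The attachment isn't reproduced in this archive. The shape of the locking Simon describes might be something like the sketch below; PostgreSQL has no "relation definition" lock tag, so SET_LOCKTAG_RELATION_DEFINITION and the helper are assumptions about the patch, not quotations from it. LockAcquire() and the lock mode constants are the real lock-manager API.]

    /* Hypothetical: serialize ALTER TABLE against cache rebuilds without
     * blocking ordinary DML on the table itself.  Rebuilds take ShareLock
     * on the definition; ALTER TABLE takes ExclusiveLock. */
    static void
    LockRelationDefinition(Oid relid, LOCKMODE lockmode)
    {
        LOCKTAG     tag;

        /* invented lock tag macro; no such tag exists in the tree */
        SET_LOCKTAG_RELATION_DEFINITION(tag, MyDatabaseId, relid);
        (void) LockAcquire(&tag, lockmode, false, false);
    }

    /* catcache/relcache rebuild: LockRelationDefinition(relid, ShareLock);     */
    /* reduced-lock ALTER TABLE:  LockRelationDefinition(relid, ExclusiveLock); */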
On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > For people that still think there is still risk, I've added a > parameter called "relaxed_ddl_locking" which defaults to "off". That > way people can use this feature in an informed way without exposing us > to support risks from its use. I can't get excited about adding a GUC that says "you can turn this off if you want but don't blame us if it breaks". That just doesn't seem like good software engineering to me. > I think we should be clear that this adds *one* lock/unlock for each > object for each session, ONCE only after each transaction that > executed a DDL statement against an object. So with 200 sessions, a > typical ALTER TABLE statement will cause 400 locks. The current lock > manager maxes out above 50,000 locks per second, so any comparative > analysis will show this is a minor blip on even a heavy workload. I think that using locks to address this problem is the wrong approach. I think the right fix is to use something other than SnapshotNow to do the system catalog scans. However, if we were going to use locking, then we'd need to ensure that:

(1) No transaction which has made changes to a system catalog may commit while some other transaction is in the middle of scanning that catalog.
(2) No transaction which has made changes to a set of system catalogs may commit while some other transaction is in the midst of fetching interrelated data from a subset of those catalogs.

It's important to note, I think, that the problem here doesn't occur at the time that the table rows are updated, because those rows aren't visible yet. The problem happens at commit time. So unless I'm missing something, ALTER TABLE would only need to acquire the relation definition lock after it has finished all of its other work. In fact, it doesn't even really need to acquire it then, either. It could just queue up a list of relation definition locks to acquire immediately before commit, which would be advantageous if the transaction went on to abort, since in that case we'd skip the work of acquiring the locks at all. In fact, without doing that, this patch would undo much of the point of the original ALTER TABLE lock strength reduction:

begin;
alter table foo alter column a set storage plain;
<start a new psql session in another window>
select * from foo;

In master, the SELECT completes without blocking. With this patch applied, the SELECT blocks, just as it did in 9.0, because it can't rebuild the relcache entry without getting the relation definition lock, which the alter table will hold until commit. In fact, the same thing happens even if foo has been accessed previously in the same session. If there is already an open *transaction* that has accessed foo, then, it seems, it can continue to do so until it commits. But as soon as it tries to start a new transaction, it blocks again. I don't quite understand why that is - I didn't think we invalidated the entire relcache on every commit. But that's what happens. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
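[The queue-before-commit idea could look roughly like this, reusing the LockRelationDefinition sketch earlier in the thread. All names are hypothetical; lappend_oid(), lfirst_oid(), and foreach() are the real list API.]

    /* Hypothetical: remember which definitions need locking, and only
     * take the locks on the way into commit. */
    static List *pending_defn_locks = NIL;

    /* Called by ALTER TABLE instead of locking immediately. */
    void
    RememberRelationDefinitionLock(Oid relid)
    {
        pending_defn_locks = lappend_oid(pending_defn_locks, relid);
    }

    /* Called from the pre-commit path.  An aborting transaction never
     * gets here, so a rolled-back ALTER TABLE skips the locking work. */
    void
    AtPreCommit_RelationDefinitionLocks(void)
    {
        ListCell   *lc;

        foreach(lc, pending_defn_locks)
            LockRelationDefinition(lfirst_oid(lc), ExclusiveLock);
        pending_defn_locks = NIL;   /* the locks themselves go away at commit */
    }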
On Thu, Jun 23, 2011 at 8:59 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> For people that still think there is still risk, I've added a >> parameter called "relaxed_ddl_locking" which defaults to "off". That >> way people can use this feature in an informed way without exposing us >> to support risks from its use. > > I can't get excited about adding a GUC that says "you can turn this > off if you want but don't blame us if it breaks". That just doesn't > seem like good software engineering to me. Nobody is suggesting we fix the pre-existing bug - we're just going to leave it. That doesn't sound like good software engineering either, but I'm not going to complain because I understand. We're in a bind here and I'm trying to suggest practical ways out, as well as cater for the levels of risk people are willing to accept. I don't think we need that parameter at all, but if you do then I'm willing to tolerate it. >> I think we should be clear that this adds *one* lock/unlock for each >> object for each session, ONCE only after each transaction that >> executed a DDL statement against an object. So with 200 sessions, a >> typical ALTER TABLE statement will cause 400 locks. The current lock >> manager maxes out above 50,000 locks per second, so any comparative >> analysis will show this is a minor blip on even a heavy workload. > > I think that using locks to address this problem is the wrong > approach. I think the right fix is to use something other than > SnapshotNow to do the system catalog scans. I agree with that, but it's a much bigger job than you think and I really doubt that you will do it for 9.2, definitely not for 9.1. There are so many call points, I would not bother personally to attempt such a critical and widely invasive fix. So I'd put that down as a Blue Sky will-fix-one-day thing.

> However, if we were going to use locking, then we'd need to ensure that:
>
> (1) No transaction which has made changes to a system catalog may commit while some other transaction is in the middle of scanning that catalog.
> (2) No transaction which has made changes to a set of system catalogs may commit while some other transaction is in the midst of fetching interrelated data from a subset of those catalogs.
>
> It's important to note, I think, that the problem here doesn't occur at the time that the table rows are updated, because those rows aren't visible yet. The problem happens at commit time. So unless I'm missing something, ALTER TABLE would only need to acquire the relation definition lock after it has finished all of its other work. In fact, it doesn't even really need to acquire it then, either. It could just queue up a list of relation definition locks to acquire immediately before commit, which would be advantageous if the transaction went on to abort, since in that case we'd skip the work of acquiring the locks at all.

That would work if the only thing ALTER TABLE does is write. There are various places where we read catalogs and all of those catalogs need to be relationally integrated. So the only safe way, without lots of incredibly detailed analysis (which you would then find fault with), is to lock at the top and hold the lock throughout most of the processing. That's the safe thing to do, at least.
I do already use the point you make in the patch, where it's used to unlock before and then relock after the FK constraint check, so that the RelationDefinition lock is never held for long periods in any code path.

> In fact, without doing that, this patch would undo much of the point of the original ALTER TABLE lock strength reduction:
>
> begin;
> alter table foo alter column a set storage plain;
> <start a new psql session in another window>
> select * from foo;
>
> In master, the SELECT completes without blocking. With this patch applied, the SELECT blocks, just as it did in 9.0, because it can't rebuild the relcache entry without getting the relation definition lock, which the alter table will hold until commit.

It's not the same at all. Yes, they are both locks; what counts is the frequency and duration of locking. With AccessExclusiveLock we prevent all SELECTs from accessing the table while we queue for the lock, which can be blocked behind a long-running query. So the total outage to the application for one minor change can be hours - unpredictably. (Which is why I never wrote the ALTER TABLE set ndistinct myself, even though I suggested it, cos it's mostly unusable). With the reduced locking level AND this patch the *rebuild* can be blocked behind the DDL. But the DDL itself is not blocked in the same way, so the total outage is much reduced and the whole set of actions goes through fairly quickly. If you only submit one DDL operation then the rebuild has nothing to block against, so the whole thing goes through smoothly. > In fact, the same thing happens even if foo has been accessed previously in the same session. If there is already an open *transaction* that has accessed foo, then, it seems, it can continue to do so until it commits. But as soon as it tries to start a new transaction, it blocks again. I don't quite understand why that is - I didn't think we invalidated the entire relcache on every commit. We don't invalidate the relcache on every commit. > But that's what happens. Test case please. I don't understand the problem you're describing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Test case please. I don't understand the problem you're describing.

S1: select * from foo;
S2: begin;
S2: alter table foo alter column a set storage plain;
S1: select * from foo;
<blocks>

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Test case please. I don't understand the problem you're describing.
>
> S1: select * from foo;
> S2: begin;
> S2: alter table foo alter column a set storage plain;
> S1: select * from foo;
> <blocks>

Er... yes, that's what locks do. Where is the bug? We have these choices of behaviour:

1. It doesn't error and doesn't block - not possible for 9.1, probably not for 9.2 either
2. It doesn't block, but may throw an error sometimes - the reported bug
3. It blocks in some cases for short periods where people do repeated DDL, but never throws errors - this patch
4. Full scale locking - human sacrifice, cats and dogs living together, mass hysteria

If you want to avoid the blocking, then don't hold open the transaction. Do this:

S1: select * from foo
S2: alter table.... run in its own transaction
S1: select * from foo

Doesn't block, no errors. Which is exactly what most people do on their production servers. The ALTER TABLE statements we're talking about are not schema changes. They don't need to be coordinated with other DDL. This patch has locking, but it's the most reduced form of locking that is available for a non-invasive patch for 9.1 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Test case please. I don't understand the problem you're describing.
>>
>> S1: select * from foo;
>> S2: begin;
>> S2: alter table foo alter column a set storage plain;
>> S1: select * from foo;
>> <blocks>
>
> Er... yes, that's what locks do. Where is the bug?

I didn't say it was a bug. I said that the additional locking you added acted to remove much of the benefit of reducing the lock strength in the first place. If a brief exclusive lock (such as would be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't acceptable, a brief exclusive lock on a different lock tag that blocks most of the same operations isn't going to be any better. You might still see some improvement in the cases where some complicated operation like scanning or rewriting the table can be performed without holding the relation definition lock, and then only get the relation definition lock afterward. But for something like the above example (or the ALTER TABLE SET (n_distinct) feature you mentioned in your previous email) the benefit is going to be darned close to zero.

> This patch has locking, but it's the most reduced form of locking that is available for a non-invasive patch for 9.1

I don't like the extra lock manager traffic this adds to operations that aren't even performing DDL, I'm not convinced that it is safe, the additional locking negates a significant portion of the benefit of the original patch, and Tom's made a fairly convincing argument that pg_dump will still become significantly less reliable than heretofore even if we do apply it. In my view, what we need to do is revert to using AccessExclusiveLock until some of these other details are worked out. I wasn't entirely convinced that that was necessary when Tom first posted this email, but having thought through the issues and looked over your patch, it's now my position that that is the best way forward. The fact that your proposed patch appears not to be thinking very clearly about when locks need to be taken and released only adds to my feeling that we are in for a bad time if we go down this route. In short: -1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 24, 2011 at 10:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Test case please. I don't understand the problem you're describing.
>>>
>>> S1: select * from foo;
>>> S2: begin;
>>> S2: alter table foo alter column a set storage plain;
>>> S1: select * from foo;
>>> <blocks>
>>
>> Er... yes, that's what locks do. Where is the bug?
>
> I didn't say it was a bug. I said that the additional locking you added acted to remove much of the benefit of reducing the lock strength in the first place. If a brief exclusive lock (such as would be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't acceptable, a brief exclusive lock on a different lock tag that blocks most of the same operations isn't going to be any better. You might still see some improvement in the cases where some complicated operation like scanning or rewriting the table can be performed without holding the relation definition lock, and then only get the relation definition lock afterward. But for something like the above example (or the ALTER TABLE SET (n_distinct) feature you mentioned in your previous email) the benefit is going to be darned close to zero.
>
>> This patch has locking, but it's the most reduced form of locking that is available for a non-invasive patch for 9.1
>
> I don't like the extra lock manager traffic this adds to operations that aren't even performing DDL, I'm not convinced that it is safe, the additional locking negates a significant portion of the benefit of the original patch, and Tom's made a fairly convincing argument that pg_dump will still become significantly less reliable than heretofore even if we do apply it. In my view, what we need to do is revert to using AccessExclusiveLock until some of these other details are worked out. I wasn't entirely convinced that that was necessary when Tom first posted this email, but having thought through the issues and looked over your patch, it's now my position that that is the best way forward. The fact that your proposed patch appears not to be thinking very clearly about when locks need to be taken and released only adds to my feeling that we are in for a bad time if we go down this route.
>
> In short: -1 from me.

I've explained all of the above points to you already and you're wrong. I don't think you understand this patch at all, nor even the original feature. Locking the table is completely different from locking the definition of a table. Do you advocate that all ALTER TABLE operations use AccessExclusiveLock, or just the operations I added? If you see problems here, then you must also be willing to increase the lock strength of things such as INHERITS, and back patch them to where the same problems exist. You'll wriggle out of that, I'm sure. There are, regrettably, many bugs here and they can't be fixed in the simple manner you propose. It's not me you block, Robert; I'm not actually a user and I will sleep well whatever happens, happy that I tried to resolve this. Users watch and remember. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2011 at 5:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I've explained all of the above points to you already and you're wrong. We're going to have to agree to disagree on that point. > Do you advocate that all ALTER TABLE operations use > AccessExclusiveLock, or just the operations I added? If you see > problems here, then you must also be willing to increase the lock > strength of things such as INHERITS, and back patch them to where the > same problems exist. You'll wriggle out of that, I'm sure. There are > regrettably, many bugs here and they can't be fixed in the simple > manner you propose. I think there is quite a lot of difference between realizing that we can't fix every problem, and deciding to put out a release that adds a whole lot more of them that we have no plans to fix. > It's not me you block Robert, I'm not actually a user and I will sleep > well whatever happens, happy that I tried to resolve this. Users watch > and remember. If you are proposing that I should worry about a posse of angry PostgreSQL users hunting me down (or abandoning the product) because I agreed with Tom Lane on the necessity of reverting one of your patches, then I'm willing to take that chance. For one thing, there's a pretty good chance they'll go after Tom first. For two things, there's at least an outside chance I might be rescued by an alternative posse who supports our tradition of putting out high quality releases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 24, 2011 at 10:43 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> It's not me you block Robert, I'm not actually a user and I will sleep >> well whatever happens, happy that I tried to resolve this. Users watch >> and remember. > > If you are proposing that I should worry about a posse of angry > PostgreSQL users hunting me down (or abandoning the product) because I > agreed with Tom Lane on the necessity of reverting one of your > patches, then I'm willing to take that chance. For one thing, there's > a pretty good chance they'll go after Tom first. For two things, > there's at least an outside chance I might be rescued by an > alternative posse who supports our tradition of putting out high > quality releases. Not sure where anger and violence entered the discussion, but a posse does sound amusing. If such groups ever met, I'm sure both would support the tradition of high quality, but I suspect they might define it differently and that would be the source of the problem. Much like Life of Brian. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 8:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Department of second thoughts: I think I see a problem. > > Um, yeah, so that doesn't really work any better than my idea. > > On further reflection, there's a problem at a higher level than this > anyway. Even if we can get a single SnapshotNow scan to produce > guaranteed-self-consistent results, that doesn't ensure consistency > between the results of scans occurring serially. An example here is > ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only > writers. However, suppose that a concurrent relcache load fetches the > pg_attribute row, notes that it has atthasdef = true, and then the ALTER > commits before we start to scan pg_attrdef. The consistency checks in > AttrDefaultFetch() will complain about a missing pg_attrdef entry, and > rightly so. We could lobotomize those checks, but it doesn't feel right > to do so; and anyway there may be other cases that are harder to kluge up. Locking the whole definition is at least one way of solving this problem. My locking fix does that. > So really we need consistency across *at least* one entire relcache load > cycle. We could maybe arrange to take an MVCC snap (or some lighter > weight version of that) at the start, and use that for all the resulting > scans, but I think that would be notationally messy. It's not clear > that it'd solve everything anyhow. There are parts of a relcache entry > that we fetch only on-demand, so they are typically loaded later than > the core items, and probably couldn't use the same snapshot. Worse, > there are lots of places where we assume that use of catcache entries or > direct examination of the catalogs will yield results consistent with > the relcache. > > I suspect these latter problems will impact Simon's idea as well. I think you're probably right, or at least, the suspicion is not something I can address quickly enough to be safe. I will revert to the AccessExclusiveLocks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 6:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I believe that this is fundamentally unavoidable so long as we use >>> SnapshotNow to read catalogs --- which is something we've talked about >>> changing, but it will require a pretty major R&D effort to make it >>> happen. > >> Ouch. > >> I wonder if we could avoid this anomaly by taking a throwaway MVCC >> snapshot at the beginning of each system catalog scan and using it >> just for the duration of that scan. If nothing that has touched the >> catalog commits while the scan is open, then this is logically >> equivalent to SnapshotNow. If something does commit in mid-scan, then >> we might not get the latest version of the row, but we should end up >> with exactly one. If it's not the latest one, we'll do the rebuild >> again upon seeing the next sinval message; in the meantime, the >> version we're using mustn't be too intolerably bad or it was an error >> not to use AccessExclusiveLock in the first place. > > Yeah, this seems like a possibly workable direction to explore. I like > this better than what Simon is proposing, because it would fix the > generic issue for all types of catalog SnapshotNow scans. > >> IIUC, the problem with this approach is not correctness but >> performance. Taking snapshots is (currently) expensive. > > Yeah. After mulling it for awhile, what about this idea: we could > redefine SnapshotNow as a snapshot type that includes a list of > transactions-in-progress, somewhat like an MVCC snapshot, but we don't > fill that list from the PGPROC array. Instead, while running a scan > with SnapshotNow, anytime we determine that a particular XID is > still-in-progress, we add that XID to the snapshot's list. > Subsequently, the SnapshotNow code assumes that XID to be > still-in-progress without consulting its actual state. We reset the XID > list to empty when starting a new SnapshotNow scan. (We might be able > to do so less often than that, like only when we do > AcceptInvalidationMessages, but it's not clear to me that there's any > real benefit in hanging onto the state longer.) > > This costs no performance; if anything it should be faster than now, > because we'll be replacing expensive transaction state probes with > relatively-cheap searches of an XID array that should almost always > be quite short. > > With this approach, we would have no serialization anomalies from single > transactions committing while a scan is in progress. There could be > anomalies resulting from considering an earlier XID to be in-progress > while a later XID is considered committed (because we didn't observe > it until later). So far as I can see offhand, the impact of that would > be that there might be multiple versions of a tuple that are considered > good, but never that there would be no version considered good (so long > as the other XIDs simply updated the tuple and didn't delete it). I > think this would be all right, since the scan would just seize on the > first good version it finds. As you argue above, if that's not good > enough for our purposes then the updater(s) should have taken a stronger > lock. I liked this idea, so began to prototype the code. My rough hack is attached, for the record. One thing that occurs to me about this is that SnapshotNow with or without these changes returns the latest committed row and ignores in-progress changes. 
Accepting an older version of the definition will always be potentially dangerous. I can't see a way of doing this that doesn't require locking - for changes such as new constraints we need to wait until in-progress changes are complete. So maybe this idea is worth doing, but I don't think it helps us much reduce lock levels for DDL. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
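[The attached prototype isn't reproduced in this archive, but Tom's XID-list idea sketches out roughly as below. The array bound, function names, and placement are guesses for illustration, not the patch's contents; TransactionIdEquals and TransactionIdIsInProgress are the real primitives.]

    /* Hypothetical: XIDs already observed as in-progress during the
     * current SnapshotNow scan.  They stay "in progress" for the rest of
     * the scan even if they commit meanwhile. */
    #define MAX_SCAN_XIDS 64            /* invented bound */
    static TransactionId snapnow_xids[MAX_SCAN_XIDS];
    static int  snapnow_nxids = 0;

    void
    ResetSnapshotNowXids(void)          /* call at the start of each scan */
    {
        snapnow_nxids = 0;
    }

    bool
    SnapshotNowXidInProgress(TransactionId xid)
    {
        int     i;

        /* cheap local search first */
        for (i = 0; i < snapnow_nxids; i++)
            if (TransactionIdEquals(snapnow_xids[i], xid))
                return true;

        /* expensive shared-state probe; remember a positive answer so
         * later probes in this scan give the same result */
        if (TransactionIdIsInProgress(xid))
        {
            if (snapnow_nxids < MAX_SCAN_XIDS)
                snapnow_xids[snapnow_nxids++] = xid;
            return true;
        }
        return false;
    }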
Simon Riggs wrote: > > This costs no performance; if anything it should be faster than now, > > because we'll be replacing expensive transaction state probes with > > relatively-cheap searches of an XID array that should almost always > > be quite short. > > > > With this approach, we would have no serialization anomalies from single > > transactions committing while a scan is in progress. There could be > > anomalies resulting from considering an earlier XID to be in-progress > > while a later XID is considered committed (because we didn't observe > > it until later). So far as I can see offhand, the impact of that would > > be that there might be multiple versions of a tuple that are considered > > good, but never that there would be no version considered good (so long > > as the other XIDs simply updated the tuple and didn't delete it). I > > think this would be all right, since the scan would just seize on the > > first good version it finds. As you argue above, if that's not good > > enough for our purposes then the updater(s) should have taken a stronger > > lock. > I liked this idea, so began to prototype the code. My rough hack is > attached, for the record. > One thing that occurs to me about this is that SnapshotNow with or > without these changes returns the latest committed row and ignores > in-progress changes. > Accepting an older version of the definition will always be > potentially dangerous. I can't see a way of doing this that doesn't > require locking - for changes such as new constraints we need to wait > until in-progress changes are complete. > So maybe this idea is worth doing, but I don't think it helps us much > reduce lock levels for DDL. Just confirming we decided not to pursue this. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <bruce@momjian.us> wrote: > Just confirming we decided not to pursue this. Doesn't sound like it. I've been thinking a lot about the more general problem here - namely, that allowing catalog changes without an access exclusive lock is unsafe - and trying to come up with a good solution, because I'd really like to see us make this work. It strikes me that there are really two separate problems here.

1. If you are scanning a system catalog using SnapshotNow, and a commit happens at just the right moment, you might see two versions of the row or zero rather than one. You'll see two if you scan the old row version, then the concurrent transaction commits, and then you scan the new one. You'll see zero if you scan the new row (ignoring it, since it isn't committed yet) and then the concurrent transaction commits, and then you scan the old row. On a related note, if you scan several interrelated catalogs (e.g. when building a relcache entry) and a transaction that has made matching modifications to multiple catalogs commits midway through, you might see the old version of the data from one table and the new version of the data from some other table. Using AccessExclusiveLock fixes the problem because we guarantee that anybody who wants to read those rows first takes some kind of lock, which they won't be able to do while the updater holds AccessExclusiveLock.

2. Other backends may have data in the relcache or catcaches which won't get invalidated until they do AcceptInvalidationMessages(). That will always happen no later than the next time they lock the relation, so if you are holding AccessExclusiveLock then you should be OK: no one else holds any lock, and they'll have to go get one before doing anything interesting. But if you are holding any weaker lock, there's nothing to force AcceptInvalidationMessages to happen before you reuse those cache entries.

In both cases, name lookups are an exception: we don't know what OID to lock until we've done the name lookup. Random ideas for solving the first problem:

1-A. Take a fresh MVCC snapshot for each catalog scan. I think we've rejected this before because of the cost involved, but maybe with Pavan's patch and some of the other improvements that are on the way we could make this cheap enough to be acceptable.

1-B. Don't allow transactions that have modified system catalog X to commit while we're scanning system catalog X; make the scans take some kind of shared lock and commits an exclusive lock. This seems awfully bad for concurrency, though, not to mention the overhead of taking and releasing all those locks even when nothing conflicts.

1-C. Wrap a retry loop around every place where we scan a system catalog. If a transaction that has written rows in catalog X commits while we're scanning catalog X, discard the results of the first scan and declare a do-over (or maybe something more coarse-grained, or at the other extreme even more fine-grained). If we're doing several interrelated scans then the retry loop must surround the entire unit, and enforce a do-over of the whole operation if commits occur in any of the catalogs before the scan completes.

Unfortunately, any of these seem likely to require a Lot of Work, probably more than can be done in time for 9.2. However, perhaps it would be feasible to home in on a limited subset of the cases (e.g.
name lookups, which are mainly done in only a handful of places, and represent an existing bug) and implement the necessary code changes just for those cases. Then we could generalize that pattern to other cases as time and energy permit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
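[Idea 1-C might be shaped like the loop below. The per-catalog commit counter is invented; no such counter exists today, and a real design would also need it bumped at commit by every catalog writer. The point here is only the shape of the do-over loop.]

    /* Hypothetical: retry the catalog scan(s) if a writer committed
     * mid-scan.  GetCatalogWriteCommitCount() and
     * BuildCacheEntryFromCatalog() are invented names. */
    for (;;)
    {
        uint64  before = GetCatalogWriteCommitCount(catalogId);

        BuildCacheEntryFromCatalog(catalogId);

        if (GetCatalogWriteCommitCount(catalogId) == before)
            break;      /* nothing committed under us: result is consistent */
        /* else: discard the partial result and do over */
    }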
On Wed, Nov 30, 2011 at 4:42 AM, Robert Haas <robertmhaas@gmail.com> wrote: > It strikes me that there are really two separate problems here. > > 1. If you are scanning a system catalog using SnapshotNow, and a > commit happens at just the right moment, you might see two versions of > the row or zero rather than one. You'll see two if you scan the old > row version, then the concurrent transaction commits, and then you > scan the new one. You'll see zero if you scan the new row (ignoring > it, since it isn't committed yet) and then the concurrent transaction > commits, and then you scan the old row. That is a bug and one we should fix. I supplied a patch for that, written to Tom's idea for how to solve it. I will apply that, unless there are objections. > 2. Other backends may have data in the relcache or catcaches which > won't get invalidated until they do AcceptInvalidationMessages(). > That will always happen no later than the next time they lock the > relation, so if you are holding AccessExclusiveLock then you should be > OK: no one else holds any lock, and they'll have to go get one before > doing anything interesting. But if you are holding any weaker lock, > there's nothing to force AcceptInvalidationMessages to happen before > you reuse those cache entries. Yes, inconsistent cache entries will occur if we allow catalog updates while the table is already locked by others. The question is whether that is a problem in all cases. With these ALTER TABLE subcommands, I don't see any problem with different backends seeing different values.

/*
 * These subcommands affect general strategies for performance
 * and maintenance, though don't change the semantic results
 * from normal data reads and writes. Delaying an ALTER TABLE
 * behind currently active writes only delays the point where
 * the new strategy begins to take effect, so there is no
 * benefit in waiting. In this case the minimum restriction
 * applies: we don't currently allow concurrent catalog
 * updates.
 */
case AT_SetStatistics:      // only used by ANALYZE, which is shut out by the ShareUpdateExclusiveLock
case AT_ClusterOn:
case AT_DropCluster:        // only used by CLUSTER, which is shut out because it needs AccessExclusiveLock
case AT_SetRelOptions:
case AT_ResetRelOptions:
case AT_SetOptions:
case AT_ResetOptions:
case AT_SetStorage:         // not critical
case AT_ValidateConstraint: // not a problem if some people think it's invalid when it is valid

So ISTM that we can allow reduced lock levels for the above commands if we fix SnapshotNow. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > That is a bug and one we should fix. I supplied a patch for that, > written to Tom's idea for how to solve it. > > I will apply that, unless there are objections. I remember several attempts at that, but I don't remember any that didn't meet with objections. Do you have a link? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Dec 16, 2011 at 1:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> That is a bug and one we should fix. I supplied a patch for that, >> written to Tom's idea for how to solve it. >> >> I will apply that, unless there are objections. > > I remember several attempts at that, but I don't remember any that > didn't meet with objections. Do you have a link? My patch to implement SnapshotNow correctly, from Jun 27 on this thread, was never reviewed or commented upon by anybody. That was probably because it only fixes one of the problems, not all of them. But it does fix a current bug and that's why I'm asking now if there are objections to committing it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Dec 16, 2011 at 8:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, Dec 16, 2011 at 1:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> That is a bug and one we should fix. I supplied a patch for that, >>> written to Tom's idea for how to solve it. >>> >>> I will apply that, unless there are objections. >> >> I remember several attempts at that, but I don't remember any that >> didn't meet with objections. Do you have a link? > > My patch to implement SnapshotNow correctly, from Jun 27 on this > thread was never reviewed or commented upon by anybody. That was > probably because it only fixes one of the problems, not all of them. Well, I think it was mostly because you didn't sound terribly optimistic about the approach: "So maybe this idea is worth doing, but I don't think it helps us much reduce lock levels for DDL." And also because you described the patch as a "rough hack", and not something you thought ready to commit. I am also not entirely sure I believe that this is plugging all the failure cases. I think that it may just be making the failure cases more obscure, rather than really getting rid of them. Consider something like the following:

T1: Update row version A, creating new row version B.
T2: Begin scanning the catalog in question. We happen to encounter row version B first. We remember T1's XID as in progress, but don't see the row since T1 hasn't committed.
T1: Rollback.
T3: Update row version A, creating new row version C.
T3: Commit.
T2: Scan now comes to row version A; we don't see that version either, since T3 is committed.

I don't think there's any guarantee that T2's scan will see tuples inserted after the start of the scan. If I'm correct about that, and I'm pretty sure it's true for sequential scans anyway, then T2's scan might end without seeing C either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Dec 17, 2011 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I am also not entirely sure I believe that this is plugging all the > failure cases. I think that it may just be making the failure cases > more obscure, rather than really getting rid of them. Consider > something like the following:
>
> T1: Update row version A, creating new row version B.
> T2: Begin scanning the catalog in question. We happen to encounter row version B first. We remember T1's XID as in progress, but don't see the row since T1 hasn't committed.
> T1: Rollback.
> T3: Update row version A, creating new row version C.
> T3: Commit.
> T2: Scan now comes to row version A; we don't see that version either, since T3 is committed.
>
> I don't think there's any guarantee that T2's scan will see tuples inserted after the start of the scan. If I'm correct about that, and I'm pretty sure it's true for sequential scans anyway, then T2's scan might end without seeing C either.

A simpler example shows it better. Tom's idea prevents 2 rows being returned, but doesn't prevent zero rows. That's a useful improvement, but not the only thing needed. ISTM we can run a scan as we do now, without locking. If the scan returns zero rows we scan again, but take a definition lock first. ALTER TABLE holds the definition lock while running, which won't be a problem because we would only use that on shorter AT statements. Definition lock being the type of lock described earlier on this thread, that we already have code for. So we have code for all the parts we need to make this work. So that means we'd be able to plug the gaps noted, without reducing performance on the common code paths. I don't see that any objections made so far remain valid with that approach. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
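[In outline, the fallback Simon proposes might read like this. ScanCatalogForRow and the definition-lock calls are hypothetical stand-ins (see the LockRelationDefinition sketch earlier in the thread); HeapTupleIsValid and elog are real.]

    /* Optimistic first scan with no extra locking; fall back to a
     * retry under the definition lock only on the zero-rows anomaly. */
    HeapTuple   tuple = ScanCatalogForRow(relid, key);  /* hypothetical */

    if (!HeapTupleIsValid(tuple))
    {
        LockRelationDefinition(relid, ShareLock);       /* hypothetical */
        tuple = ScanCatalogForRow(relid, key);
        UnlockRelationDefinition(relid, ShareLock);     /* hypothetical */

        if (!HeapTupleIsValid(tuple))
            elog(ERROR, "cache lookup failed for relation %u", relid);
    }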
On Sat, Dec 17, 2011 at 6:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > A simpler example shows it better. Tom's idea prevents 2 rows being > returned, but doesn't prevent zero rows. I don't think it prevents either one. Suppose we read and return a tuple, and then someone HOT-updates it and commits. SnapshotNow is very happy to also return the updated version of that same tuple on the next iteration of the scan. See commit c0f03aae0469e758964faac0fb741685170c39a5. > That's a useful improvement, but not the only thing needed. > > ISTM we can run a scan as we do now, without locking. If the scan returns > zero rows we scan again, but take a definition lock first. ALTER TABLE > holds the definition lock while running, which won't be a problem > because we would only use that on shorter AT statements. > > Definition lock being the type of lock described earlier on this > thread, that we already have code for. So we have code for all the > parts we need to make this work. > > So that means we'd be able to plug the gaps noted, without reducing > performance on the common code paths. > > I don't see that any objections made so far remain valid with that approach. I think a retry loop is a possibly useful tool for solving this problem, but I'm very skeptical about the definition locks, unless we can arrange to have them taken just before commit (regardless of when during the transaction ALTER TABLE was executed) and released immediately afterwards. What I think might be even better is something along the lines of what Noah Misch did with RangeVarGetRelid -- don't try to lock out concurrent activity, just detect it and redo the operation if anything bad happens while we're in process. In this case, that would mean having a mechanism to know whether any concurrent transaction had updated the catalog we're scanning during the scan. Yet another option, which I wonder whether we're dismissing too lightly, is to just call GetSnapshotData() and do the scan using a plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, but is it expensive enough to worry about? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
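[The "plain old MVCC snapshot" option is easy to express with existing snapmgr calls. GetLatestSnapshot(), RegisterSnapshot()/UnregisterSnapshot(), and the systable functions all exist in the tree; wiring them up per catalog scan like this is the proposal under discussion, not current behaviour.]

    /* Use a throwaway MVCC snapshot for one catalog scan instead of
     * SnapshotNow. */
    Snapshot    snap = RegisterSnapshot(GetLatestSnapshot());
    SysScanDesc scan = systable_beginscan(rel, indexId, true,
                                          snap, nkeys, key);
    HeapTuple   tup;

    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        /* ... build the cache entry from tup ... */
    }
    systable_endscan(scan);
    UnregisterSnapshot(snap);

[Whether the extra GetSnapshotData() call per scan is cheap enough is exactly the question the benchmarks below try to answer.]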
On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote: > Yet another option, which I wonder whether we're dismissing too > lightly, is to just call GetSnapshotData() and do the scan using a > plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, > but is it expensive enough to worry about? Good point. For the most part, we already regard a catalog scan as too expensive for bulk use, hence relcache and catcache. That's not license to slow them down recklessly, but it's worth discovering how much of a hit we'd actually face. I created a function that does this in a loop:

    HeapTuple t;

    CatalogCacheFlushCatalog(ProcedureRelationId);
    t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
    if (!HeapTupleIsValid(t))
        elog(ERROR, "cache lookup failed for function 42");
    ReleaseSysCache(t);

Then, I had pgbench call the function once per client with various numbers of clients and a loop iteration count such that the total number of scans per run was always 19200000. Results for master and for a copy patched to use MVCC snapshots in catcache.c only:

 2 clients, master: 4:30.66 elapsed
 4 clients, master: 4:26.82 elapsed
32 clients, master: 4:25.30 elapsed
 2 clients, master: 4:25.67 elapsed
 4 clients, master: 4:26.58 elapsed
32 clients, master: 4:26.40 elapsed
 2 clients, master: 4:27.54 elapsed
 4 clients, master: 4:26.60 elapsed
32 clients, master: 4:27.20 elapsed
 2 clients, mvcc-catcache: 4:35.13 elapsed
 4 clients, mvcc-catcache: 4:30.40 elapsed
32 clients, mvcc-catcache: 4:37.91 elapsed
 2 clients, mvcc-catcache: 4:28.13 elapsed
 4 clients, mvcc-catcache: 4:27.06 elapsed
32 clients, mvcc-catcache: 4:32.84 elapsed
 2 clients, mvcc-catcache: 4:32.47 elapsed
 4 clients, mvcc-catcache: 4:24.35 elapsed
32 clients, mvcc-catcache: 4:31.54 elapsed

I see roughly a 2% performance regression. However, I'd expect any bulk losses to come from increased LWLock contention, which just doesn't materialize in a big way on this 2-core box. If anyone would like to rerun this on a larger machine, I can package it up for reuse.
Noah Misch <noah@leadboat.com> writes: > On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote: >> Yet another option, which I wonder whether we're dismissing too >> lightly, is to just call GetSnapshotData() and do the scan using a >> plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, >> but is it expensive enough to worry about? That might actually be workable ...

> I created a function that does this in a loop:
>
>     HeapTuple t;
>
>     CatalogCacheFlushCatalog(ProcedureRelationId);
>     t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
>     if (!HeapTupleIsValid(t))
>         elog(ERROR, "cache lookup failed for function 42");
>     ReleaseSysCache(t);

... but this performance test seems to me to be entirely misguided, because it's testing a situation that isn't going to occur much in the field, precisely because the syscache should prevent constant reloads of the same syscache entry. Poking around a bit, it looks to me like one of the bigger users of non-cache-fronted SnapshotNow scans is dependency.c. So maybe testing the speed of large cascaded drops would be a more relevant test case. For instance, create a schema containing a few thousand functions, and see how long it takes to drop the schema. Another thing that would be useful to know is what effect such a change would have on the time to run the regression tests with CLOBBER_CACHE_ALWAYS. That has nothing to do with any real-world performance concerns, but it would be good to see if we're going to cause a problem for the long-suffering buildfarm member that does that. regards, tom lane
On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > I created a function that does this in a loop: > > > HeapTuple t; > > > CatalogCacheFlushCatalog(ProcedureRelationId); > > t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */); > > if (!HeapTupleIsValid(t)) > > elog(ERROR, "cache lookup failed for function 42"); > > ReleaseSysCache(t); > > ... but this performance test seems to me to be entirely misguided, > because it's testing a situation that isn't going to occur much in the > field, precisely because the syscache should prevent constant reloads of > the same syscache entry. > [ideas for more-realistic tests] Granted, but I don't hope to reliably measure a change in a macro-benchmark after seeing a rickety 2% change in a micro-benchmark.
Noah Misch <noah@leadboat.com> writes: > On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote: >> ... but this performance test seems to me to be entirely misguided, >> because it's testing a situation that isn't going to occur much in the >> field, precisely because the syscache should prevent constant reloads of >> the same syscache entry. >> [ideas for more-realistic tests] > Granted, but I don't hope to reliably measure a change in a macro-benchmark > after seeing a rickety 2% change in a micro-benchmark. No, I'm not sure about that at all. In particular I think that CatalogCacheFlushCatalog is pretty expensive and so the snapshot costs could be a larger part of a more-realistic test. regards, tom lane
On Tue, Dec 20, 2011 at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Noah Misch <noah@leadboat.com> writes: >> On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote: >>> ... but this performance test seems to me to be entirely misguided, >>> because it's testing a situation that isn't going to occur much in the >>> field, precisely because the syscache should prevent constant reloads of >>> the same syscache entry. > >>> [ideas for more-realistic tests] > >> Granted, but I don't hope to reliably measure a change in a macro-benchmark >> after seeing a rickety 2% change in a micro-benchmark. > > No, I'm not sure about that at all. In particular I think that > CatalogCacheFlushCatalog is pretty expensive and so the snapshot costs > could be a larger part of a more-realistic test. Attached patch makes SnapshotNow into an MVCC snapshot, initialised at the start of each scan iff SnapshotNow is passed as the scan's snapshot. It's fairly brief but seems to do the trick. Assuming that is the right approach, some timings

10,000 functions individually
  Create: 15.7s 14.3s 14.8s 14.8s
  Drop:   11.9s 11.7s 11.6s 12.0s

10,000 functions in a schema
  Create: ... same ...
  Drop Cascade: 2.5s

That seems OK to me. Any feedback? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
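[The attachment isn't reproduced here. Judging from the description above and from the InitSnapshotNowIfNeeded frame visible in a stack trace later in the thread, its core is presumably something like the sketch below; the function body is a guess at the patch, not its text. GetSnapshotData() really does fill in a caller-supplied snapshot, and SnapshotNow is the usual pointer to the shared SnapshotNowData.]

    /* Guessed sketch: if a scan is starting with SnapshotNow, refresh it
     * as a genuine MVCC snapshot before the scan begins. */
    void
    InitSnapshotNowIfNeeded(Snapshot snap)
    {
        if (snap == SnapshotNow)
            (void) GetSnapshotData(snap);
    }

[Because SnapshotNowData is a single shared variable, a nested scan that also starts with SnapshotNow would refresh it under the outer scan's feet, which is exactly the mid-stream problem Noah raises next.]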
Simon Riggs <simon@2ndQuadrant.com> writes: > Assuming that is the right approach, some timings Um ... timings of what? regards, tom lane
On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> Assuming that is the right approach, some timings > > Um ... timings of what? Apologies for being terse, no problem to give a full explanation. You suggested earlier that putting calls to GetSnapshotData() in place of using SnapshotNow might increase the time taken to do catalog scans. That matters most in places where stuff isn't cached; one such place you suggested might be the dependency check code, so a test of dropping thousands of functions might show up any bad regressions. From what I can see, there are no bad performance regressions from making SnapshotNow use MVCC. We're benefiting in 9.2 from good reductions in GetSnapshotData() contention and improved performance anyway, so it looks like a reasonable balance. I'm proposing that we apply the above patch (v4) to allow SnapshotNow scans to return consistent, trustable results. That fixes some of the corner case bugs we've seen and paves the way for me to re-enable the lock reduction code. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > the start of each scan iff SnapshotNow is passed as the scan's > snapshot. It's fairly brief but seems to do the trick. That's a neat trick. However, if you start a new SnapshotNow scan while one is ongoing, the primordial scan's snapshot will change mid-stream.
On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at >> the start of each scan iff SnapshotNow is passed as the scan's >> snapshot. It's fairly brief but seems to do the trick. > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > ongoing, the primordial scan's snapshot will change mid-stream. Do we ever do that? (and if so, Why?!? or perhaps just Where?) We can use more complex code if required, but we'll be adding complexity and code into the main path that I'd like to avoid. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>> Attached patch makes SnapshotNow into an MVCC snapshot, >> That's a neat trick. However, if you start a new SnapshotNow scan while one is >> ongoing, the primordial scan's snapshot will change mid-stream. > Do we ever do that? Almost certainly yes. For example, a catcache load may invoke catcache or relcache reload operations on its way to opening the table or index needed to fetch the desired row. I think you can only safely do this if each caller has its own snapshot variable, a la SnapshotDirty, and that's going to be hugely more invasive. regards, tom lane
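[For readers not familiar with it, the "own snapshot variable, a la SnapshotDirty" pattern Tom means is the existing dirty-snapshot idiom; a SnapshotNow replacement would need the same shape. InitDirtySnapshot() and systable_beginscan() are real; the fragment assumes the usual backend headers.]

    /* Existing idiom: each caller owns a SnapshotData on its own stack,
     * so nested scans cannot clobber each other's snapshot state. */
    SnapshotData DirtySnapshot;
    SysScanDesc  scan;

    InitDirtySnapshot(DirtySnapshot);
    scan = systable_beginscan(rel, indexId, true,
                              &DirtySnapshot, nkeys, key);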
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Um ... timings of what? > Apologies for being terse, no problem to give a full explanation. But you still didn't. I wanted to know what those numbers were and how they show that there's not a performance regression. Presumably you meant that some were "before" and some "after", but they were not so labeled. regards, tom lane
On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote: > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > >> the start of each scan iff SnapshotNow is passed as the scan's > >> snapshot. It's fairly brief but seems to do the trick. > > > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > > ongoing, the primordial scan's snapshot will change mid-stream. > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) I hacked up your patch a bit, as attached, to emit a WARNING for any nested use of SnapshotNow. This made 97/127 test files fail. As one example, RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its SnapshotNow scan. That may need a detoast, which itself runs a scan. > We can use more complex code if required, but we'll be adding > complexity and code into the main path that I'd like to avoid. Agreed.
Excerpts from Noah Misch's message of Mon Jan 02 16:25:25 -0300 2012: > On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote: > > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > > >> the start of each scan iff SnapshotNow is passed as the scan's > > >> snapshot. It's fairly brief but seems to do the trick. > > > > > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > > > ongoing, the primordial scan's snapshot will change mid-stream. > > > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) > > I hacked up your patch a bit, as attached, to emit a WARNING for any nested > use of SnapshotNow. This made 97/127 test files fail. As one example, > RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its > SnapshotNow scan. That may need a detoast, which itself runs a scan. Uh, I thought detoasting had its own visibility test function .. I mean, otherwise, what is HeapTupleSatisfiesToast for? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote: > > Excerpts from Noah Misch's message of Mon Jan 02 16:25:25 -0300 2012: > > On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote: > > > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > > > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > > > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > > > >> the start of each scan iff SnapshotNow is passed as the scan's > > > >> snapshot. It's fairly brief but seems to do the trick. > > > > > > > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > > > > ongoing, the primordial scan's snapshot will change mid-stream. > > > > > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) > > > > I hacked up your patch a bit, as attached, to emit a WARNING for any nested > > use of SnapshotNow. This made 97/127 test files fail. As one example, > > RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its > > SnapshotNow scan. That may need a detoast, which itself runs a scan. > > Uh, I thought detoasting had its own visibility test function .. I mean, > otherwise, what is HeapTupleSatisfiesToast for? The SnapshotNow scan was actually to build the relcache entry for the toast table before scanning the toast table itself. Stack trace:
#0  InitSnapshotNowIfNeeded (snap=0xb7e8c0) at snapmgr.c:216
#1  0x000000000048f773 in index_beginscan_internal (indexRelation=0x7f9a091e5cc8, nkeys=2, norderbys=0, snapshot=0xb7e8c0) at indexam.c:295
#2  0x000000000048f7e9 in index_beginscan (heapRelation=0x7f9a091a6c60, indexRelation=0xb7e8c0, snapshot=0xb7e8c0, nkeys=152984776, norderbys=0) at indexam.c:238
#3  0x000000000048e1de in systable_beginscan (heapRelation=0x7f9a091a6c60, indexId=<value optimized out>, indexOK=<value optimized out>, snapshot=0xb7e8c0, nkeys=2, key=0x7fffe211b610) at genam.c:289
#4  0x00000000007287db in RelationBuildTupleDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:455
#5  RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:880
#6  0x000000000072a050 in RelationIdGetRelation (relationId=2838) at relcache.c:1567
#7  0x00000000004872c0 in relation_open (relationId=2838, lockmode=1) at heapam.c:910
#8  0x0000000000487328 in heap_open (relationId=12052672, lockmode=152984776) at heapam.c:1052
#9  0x000000000048a737 in toast_fetch_datum (attr=<value optimized out>) at tuptoaster.c:1586
#10 0x000000000048bf74 in heap_tuple_untoast_attr (attr=0xb7e8c0) at tuptoaster.c:135
#11 0x0000000000737d77 in pg_detoast_datum_packed (datum=0xb7e8c0) at fmgr.c:2266
#12 0x00000000006f0fb0 in text_to_cstring (t=0xb7e8c0) at varlena.c:135
#13 0x0000000000727083 in RelationBuildRuleLock (relation=0x7f9a091b3818) at relcache.c:673
#14 0x000000000072909e in RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:886
#15 0x000000000072a050 in RelationIdGetRelation (relationId=11119) at relcache.c:1567
#16 0x00000000004872c0 in relation_open (relationId=11119, lockmode=0) at heapam.c:910
#17 0x0000000000487483 in relation_openrv_extended (relation=<value optimized out>, lockmode=<value optimized out>, missing_ok=<value optimized out>) at heapam.c:1011
#18 0x000000000048749c in heap_openrv_extended (relation=0xb7e8c0, lockmode=152984776, missing_ok=5 '\005') at heapam.c:1110
#19 0x000000000053570c in parserOpenTable (pstate=0xc885f8, relation=0xc88258, lockmode=1) at parse_relation.c:835
#20 0x000000000053594d in addRangeTableEntry (pstate=0xc885f8, relation=0xc88258, alias=0x0, inh=1 '\001', inFromCl=1 '\001') at parse_relation.c:901
#21 0x0000000000524dbf in transformTableEntry (pstate=0xc885f8, n=0xc88258, top_rte=0x7fffe211bd78, top_rti=0x7fffe211bd84, relnamespace=0x7fffe211bd70, containedRels=0x7fffe211bd68) at parse_clause.c:445
#22 transformFromClauseItem (pstate=0xc885f8, n=0xc88258, top_rte=0x7fffe211bd78, top_rti=0x7fffe211bd84, relnamespace=0x7fffe211bd70, containedRels=0x7fffe211bd68) at parse_clause.c:683
#23 0x0000000000526124 in transformFromClause (pstate=0xc885f8, frmList=<value optimized out>) at parse_clause.c:129
#24 0x000000000050c321 in transformSelectStmt (pstate=0xb7e8c0, parseTree=0xc88340) at analyze.c:896
#25 transformStmt (pstate=0xb7e8c0, parseTree=0xc88340) at analyze.c:186
#26 0x000000000050d583 in parse_analyze (parseTree=0xc88340, sourceText=0xc87828 "table pg_stat_user_tables;", paramTypes=0x0, numParams=0) at analyze.c:94
#27 0x00000000006789c5 in pg_analyze_and_rewrite (parsetree=0xc88340, query_string=0xc87828 "table pg_stat_user_tables;", paramTypes=0x0, numParams=0) at postgres.c:583
#28 0x0000000000678c77 in exec_simple_query (query_string=0xc87828 "table pg_stat_user_tables;") at postgres.c:941
#29 0x0000000000679997 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=0xbe03c0 "nm") at postgres.c:3881
#30 0x0000000000633d41 in BackendRun () at postmaster.c:3587
#31 BackendStartup () at postmaster.c:3272
#32 ServerLoop () at postmaster.c:1350
#33 0x0000000000635330 in PostmasterMain (argc=1, argv=0xbdc170) at postmaster.c:1110
#34 0x00000000005d206b in main (argc=1, argv=0xbdc170) at main.c:199
On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>>> Attached patch makes SnapshotNow into an MVCC snapshot, > >>> That's a neat trick. However, if you start a new SnapshotNow scan while one is >>> ongoing, the primordial scan's snapshot will change mid-stream. > >> Do we ever do that? > > Almost certainly yes. For example, a catcache load may invoke catcache > or relcache reload operations on its way to opening the table or index > needed to fetch the desired row. Ah, of course. I was thinking they would be rare by design. > I think you can only safely do this if each caller has its own snapshot > variable, a la SnapshotDirty, and that's going to be hugely more > invasive. OK, will look into it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Um ... timings of what? > >> Apologies for being terse; happy to give a full explanation. > > But you still didn't. I wanted to know what those numbers were and how > they show that there's not a performance regression. Presumably you > meant that some were "before" and some "after", but they were not so > labeled. All timings were "after" applying the patch. Since all of the tests had very acceptable absolute values I didn't test without-patch. Anyway, looks like we need to bin that and retest with the new patch when it comes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Excerpts from Noah Misch's message of Mon Jan 02 16:39:09 -0300 2012: > > On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote: > > Uh, I thought detoasting had its own visibility test function .. I mean, > > otherwise, what is HeapTupleSatisfiesToast for? > > The SnapshotNow scan was actually to build the relcache entry for the toast > table before scanning the toast table itself. Stack trace: Oh, right, that makes sense. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jan 2, 2012 at 7:49 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > Excerpts from Noah Misch's message of lun ene 02 16:39:09 -0300 2012: >> >> On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote: > >> > Uh, I thought detoasting had its own visibility test function .. I mean, >> > otherwise, what is HeapTupleSatisfiesToast for? >> >> The SnapshotNow scan was actually to build the relcache entry for the toast >> table before scanning the toast table itself. Stack trace: > > Oh, right, that makes sense. Certainly does. Thanks Noah, much appreciated. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But you still didn't. I wanted to know what those numbers were and how >> they show that there's not a performance regression. Presumably you >> meant that some were "before" and some "after", but they were not so >> labeled. > All timings were "after" applying the patch. Since all of the tests > had very acceptable absolute values I didn't test without-patch. What is a "very acceptable absolute value", and how do you know it's acceptable if you don't know what the previous performance was? This reasoning makes no sense to me at all. regards, tom lane
On Mon, Jan 2, 2012 at 8:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> But you still didn't. I wanted to know what those numbers were and how >>> they show that there's not a performance regression. Presumably you >>> meant that some were "before" and some "after", but they were not so >>> labeled. > >> All timings were "after" applying the patch. Since all of the tests >> had very acceptable absolute values I didn't test without-patch. > > What is a "very acceptable absolute value", and how do you know it's > acceptable if you don't know what the previous performance was? This > reasoning makes no sense to me at all. I don't need to do things twice before deciding I enjoyed it the first time. A low value showed, to me, that there were no problems. If you want to see more or discuss, that's cool, no problem. But no need to beat me up for not guessing correctly the level of rigour that would be acceptable to you. Now that I have a first statement of your requirements, I will be able to judge further test results against the required standard. As I've said, this is all invalid now anyway. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>>> Attached patch makes SnapshotNow into an MVCC snapshot, > >>> That's a neat trick. However, if you start a new SnapshotNow scan while one is >>> ongoing, the primordial scan's snapshot will change mid-stream. > >> Do we ever do that? > > Almost certainly yes. For example, a catcache load may invoke catcache > or relcache reload operations on its way to opening the table or index > needed to fetch the desired row. > > I think you can only safely do this if each caller has its own snapshot > variable, a la SnapshotDirty, and that's going to be hugely more > invasive. I think I can avoid doing things that way and keep it non-invasive. If we #define SnapshotNow (GetSnapshotNow()) and make GetSnapshotNow() reuse the static SnapshotNowData if possible. If not available we can allocate a new snapshot in TopTransactionContext. We can free the snapshot at the end of scan. That hides most of the complexity without causing any problems, ISTM. Working on this now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
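A minimal sketch of that proposal, assuming names (GetSnapshotNow, an in-use flag for the static snapshot) that do not exist in the tree:

#include "postgres.h"
#include "storage/procarray.h"
#include "utils/memutils.h"
#include "utils/tqual.h"

/*
 * Sketch of the proposal: hand back the static SnapshotNowData when it is
 * free, else a per-scan copy in TopTransactionContext.  This only
 * illustrates the proposed control flow.
 */
static bool SnapshotNowDataInUse = false;

Snapshot
GetSnapshotNow(void)
{
    Snapshot    snap;

    if (!SnapshotNowDataInUse)
    {
        snap = &SnapshotNowData;    /* fast path: reuse the static copy */
        SnapshotNowDataInUse = true;
    }
    else
    {
        /* a nested scan already owns the static copy: allocate another */
        snap = (Snapshot) MemoryContextAllocZero(TopTransactionContext,
                                                 sizeof(SnapshotData));
    }

    snap->satisfies = HeapTupleSatisfiesMVCC;   /* MVCC visibility test */
    return GetSnapshotData(snap);               /* fill xmin/xmax/xip */
}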
On Mon, Jan 2, 2012 at 9:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Simon Riggs <simon@2ndQuadrant.com> writes: >>> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >>>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>>>> Attached patch makes SnapshotNow into an MVCC snapshot, >> >>>> That's a neat trick. However, if you start a new SnapshotNow scan while one is >>>> ongoing, the primordial scan's snapshot will change mid-stream. >> >>> Do we ever do that? >> >> Almost certainly yes. For example, a catcache load may invoke catcache >> or relcache reload operations on its way to opening the table or index >> needed to fetch the desired row. >> >> I think you can only safely do this if each caller has its own snapshot >> variable, a la SnapshotDirty, and that's going to be hugely more >> invasive. > > > I think I can avoid doing things that way and keep it non-invasive. > > If we > #define SnapshotNow (GetSnapshotNow()) > > and make GetSnapshotNow() reuse the static SnapshotNowData if possible. > If not available we can allocate a new snapshot in TopTransactionContext. > > We can free the snapshot at the end of scan. > > That hides most of the complexity without causing any problems, ISTM. > > Working on this now. Options for implementing this are:
1) Add FreeSnapshotNowIfNeeded() to every heap_endscan and index_endscan. Small code footprint, but touches every scan.
2) Change all heap_scans that use SnapshotNow into a systable_scan and add FreeSnapshotNowIfNeeded() to systable_endscan. 35 call points; touches only system table scans.
3) Code specific calls to create a snapshot for each SnapshotNow call, similar to InitDirtySnapshot. 185 call points; very fine-grained control, but probably overkill.
(2) seems the right way to do this, with selective use of (3), and has the side benefit of a reasonable code rationalisation. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
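Option (2) could take roughly the following shape, where each sysscan owns its snapshot and hands it back at endscan, with no stack discipline assumed. The struct and helper names here are invented for illustration; only RegisterSnapshot, UnregisterSnapshot and GetLatestSnapshot are real:

#include "postgres.h"
#include "utils/snapmgr.h"
#include "utils/tqual.h"

typedef struct SketchSysScan
{
    Snapshot    snapshot;       /* non-NULL if this scan owns it */
} SketchSysScan;

static void
sketch_sysscan_begin(SketchSysScan *scan, Snapshot requested)
{
    if (requested == SnapshotNow)
    {
        /* take and register a private MVCC snapshot for this scan */
        scan->snapshot = RegisterSnapshot(GetLatestSnapshot());
    }
    else
        scan->snapshot = NULL;  /* caller-supplied; not ours to free */
}

static void
sketch_sysscan_end(SketchSysScan *scan)
{
    if (scan->snapshot != NULL)
    {
        UnregisterSnapshot(scan->snapshot);
        scan->snapshot = NULL;
    }
}

Because the snapshot's ownership travels with the scan descriptor, scans can end in any order, which matches the non-FIFO concern raised below.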
On Mon, Jan 2, 2012 at 6:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at >>> the start of each scan iff SnapshotNow is passed as the scan's >>> snapshot. It's fairly brief but seems to do the trick. >> >> That's a neat trick. However, if you start a new SnapshotNow scan while one is >> ongoing, the primordial scan's snapshot will change mid-stream. > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) Just for the record, yes we do run multiple catalog scans in some parts of the code. So I can see how we might trigger 4 nested scans, using cache replacement while scanning, so best assume more, with no guarantee of them being neatly stacked for pop/push type access. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote: > Just for the record, yes we do run multiple catalog scans in some > parts of the code. > > So I can see how we might trigger 4 nested scans, using cache > replacement while scanning, so best assume more, with no guarantee of > them being neatly stacked for pop/push type access. Yeah, I wouldn't want to commit to a nesting limit. However, I _would_ have expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot() is adequate for a great deal of the backend, after all. In what sort of situation do catalog scans not strictly nest? Thanks, nm
On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch <noah@leadboat.com> wrote: > On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote: >> Just for the record, yes we do run multiple catalog scans in some >> parts of the code. >> >> So I can see how we might trigger 4 nested scans, using cache >> replacement while scanning, so best assume more, with no guarantee of >> them being neatly stacked for pop/push type access. > > Yeah, I wouldn't want to commit to a nesting limit. Agreed > However, I _would_ have > expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot() > is adequate for a great deal of the backend, after all. In what sort of > situation do catalog scans not strictly nest? It's possible. Making assumptions about what is possible bit me before, as you showed. I've seen code where we are explicitly running two concurrent scans. I don't want to put unnecessary and subtle assumptions into catalog scan code so I'm working on the assumption that endscan may not be called in strict FIFO order. The difference is fairly minor and doesn't restrict us in other ways. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 3, 2012 at 5:47 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch <noah@leadboat.com> wrote: >> On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote: >>> Just for the record, yes we do run multiple catalog scans in some >>> parts of the code. >>> >>> So I can see how we might trigger 4 nested scans, using cache >>> replacement while scanning, so best assume more, with no guarantee of >>> them being neatly stacked for pop/push type access. >> >> Yeah, I wouldn't want to commit to a nesting limit. > > Agreed > >> However, I _would_ have >> expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot() >> is adequate for a great deal of the backend, after all. In what sort of >> situation do catalog scans not strictly nest? > > It's possible. Making assumptions about what is possible bit me > before, as you showed. > > I've seen code where we are explicitly running two concurrent scans. > > I don't want to put unnecessary and subtle assumptions into catalog > scan code so I'm working on the assumption that endscan may not be > called in strict FIFO order. The difference is fairly minor and > doesn't restrict us in other ways. I feel like the first thing we should be doing here is some benchmarking. If we change just the scans in dependency.c and then try the test case Tom suggested (dropping a schema containing a large number of functions), we can compare the patched code with master and get an idea of whether the performance is acceptable. If it is, changing everything else is a mostly mechanical process that we can simply grind through. If it's not, I'd rather learn that before we start grinding. I started to do this before Christmas, but then I ... went on vacation. Here's a Perl script that can be used to generate an SQL script to create as many functions as you'd like to try dropping at once, in case it's helpful. The idea is to run the resulting SQL script through psql and then test the speed of "DROP SCHEMA lots_of_functions CASCADE;". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I feel like the first thing we should be doing here is some > benchmarking. If we change just the scans in dependency.c and then > try the test case Tom suggested (dropping a schema containing a large > number of functions), we can compare the patched code with master and > get an idea of whether the performance is acceptable. Yes, I've done this and it takes 2.5s to drop 10,000 functions using an MVCC snapshot. That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. We can do a lot of tests, but at the end it's a human judgement. Are 100% correct results from catalog accesses worth having even when the real-world speed is not as good? (Whether it's x1000000 times slower or not is not relevant if it is still fast enough). Anybody with that many DDL statements probably cares about doing various operations with lower lock levels. Fastpath locking will slow down DDL, but we didn't measure the performance slowdown there. We understood the benefit and were willing to pay the price. So I'll proceed for now with the patch, which isn't as simple as you think. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 3, 2012 at 11:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I feel like the first thing we should be doing here is some >> benchmarking. If we change just the scans in dependency.c and then >> try the test case Tom suggested (dropping a schema containing a large >> number of functions), we can compare the patched code with master and >> get an idea of whether the performance is acceptable. > > Yes, I've done this and it takes 2.5s to drop 10,000 functions using > an MVCC snapshot. > > That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. > > We can do a lot of tests, but at the end it's a human judgement. Are 100% > correct results from catalog accesses worth having even when the real-world > speed is not as good? (Whether it's x1000000 > times slower or not is not relevant if it is still fast enough). Sure. But I don't see why that means it wouldn't be nice to know whether or not it is in fact a million times slower. If Tom's artificial worst case is a million times slower, then probably there are some cases we care about more that are going to be measurably impacted, and we're going to want to think about what to do about that. We can wait until you've finished the patch before we do that testing, or we can do it now and maybe get some idea whether the approach is likely to be viable or whether it's going to require some adjustment before we actually go trawl through all that code. On my laptop, dropping a schema with 10,000 functions using commit d5448c7d31b5af66a809e6580bae9bd31448bfa7 takes 400-500 ms. If my laptop is the same speed as your laptop, that would mean a 5-6x slowdown, but of course that's comparing apples and oranges... in any event, if the real number is anywhere in that ballpark, it's probably a surmountable obstacle, but I'd like to know rather than guessing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes: > That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. > We can do a lot of tests, but at the end it's a human judgement. Are 100% > correct results from catalog accesses worth having even when the real-world > speed is not as good? (Whether it's x1000000 > times slower or not is not relevant if it is still fast enough). That argument is just nonsense AFAICT. Yes, 2.5 s to drop 10000 functions is probably fine, but that is an artificial test case whose only real interest is to benchmark what a change in SnapshotNow scans might cost us. In the real world it's hard to guess what fraction of a real workload might consist of such scans, but I suspect there are some where a significant increase in that cost might hurt. So in my mind the point of the exercise is to find out how much the cost increased, and I'm just baffled as to why you won't use a benchmark case to, um, benchmark. Another point that requires some thought is that switching SnapshotNow to be MVCC-based will presumably result in a noticeable increase in each backend's rate of wanting to acquire snapshots. Hence, more contention in GetSnapshotData can be expected. A single-threaded test case doesn't prove anything at all about what that might cost under load. regards, tom lane
On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. > >> We can do a lot of tests, but at the end it's a human judgement. Are 100% >> correct results from catalog accesses worth having even when the real-world >> speed is not as good? (Whether it's x1000000 >> times slower or not is not relevant if it is still fast enough). > > That argument is just nonsense AFAICT. Yes, 2.5 s to drop 10000 > functions is probably fine, but that is an artificial test case whose > only real interest is to benchmark what a change in SnapshotNow scans > might cost us. In the real world it's hard to guess what fraction of a > real workload might consist of such scans, but I suspect there are some > where a significant increase in that cost might hurt. So in my mind the > point of the exercise is to find out how much the cost increased, and > I'm just baffled as to why you won't use a benchmark case to, um, > benchmark. Ditto. > Another point that requires some thought is that switching SnapshotNow > to be MVCC-based will presumably result in a noticeable increase in each > backend's rate of wanting to acquire snapshots. Hence, more contention > in GetSnapshotData can be expected. A single-threaded test case doesn't > prove anything at all about what that might cost under load. This is obviously true at some level, but I'm not sure that it really matters. It's not that difficult to construct a test case where we have lots of people concurrently reading a table, or reading many tables, or writing a table, or writing many tables, but what kind of realistic test case involves enough DDL for any of this to matter? If you're creating or dropping tables, for example, the filesystem costs are likely to be a much bigger problem than GetSnapshotData(), to the point where you probably can't get enough concurrency for GetSnapshotData() to matter. Maybe you could find a problem case involving creating or dropping lots and lots of functions concurrently, or something like that, but who does that? You'd have to be performing operations on hundreds of non-table SQL objects per second, and it is hard for me to imagine why anyone would be doing that. Maybe I'm not imaginative enough? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Tue Jan 03 12:24:52 -0300 2012: > I feel like the first thing we should be doing here is some > benchmarking. If we change just the scans in dependency.c and then > try the test case Tom suggested (dropping a schema containing a large > number of functions), we can compare the patched code with master and > get an idea of whether the performance is acceptable. If it is, > changing everything else is a mostly mechanical process that we can > simply grind through. If it's not, I'd rather learn that before we > start grinding. If there are many call sites, maybe it'd be a good idea to use a semantic patcher tool such as Coccinelle instead of doing it one by one. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Another point that requires some thought is that switching SnapshotNow >> to be MVCC-based will presumably result in a noticeable increase in each >> backend's rate of wanting to acquire snapshots. Hence, more contention >> in GetSnapshotData can be expected. A single-threaded test case doesn't >> prove anything at all about what that might cost under load. > This is obviously true at some level, but I'm not sure that it really > matters. It's not that difficult to construct a test case where we > have lots of people concurrently reading a table, or reading many > tables, or writing a table, or writing many tables, but what kind of > realistic test case involves enough DDL for any of this to matter? Um ... you're supposing that only DDL uses SnapshotNow, which is wrong. I refer you to the parser, the planner, execution functions for arrays, records, enums, any sort of relcache reload, etc etc etc. Yes, some of that is masked by backend-internal caching, some of the time, but it's folly to just assume that there are no SnapshotNow scans during normal queries. None of this is necessarily grounds to reject a patch along the proposed lines. I'm just asking for some benchmarking effort to establish what the costs might be, rather than naively hoping they are negligible. regards, tom lane
On Tue, Jan 3, 2012 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Um ... you're supposing that only DDL uses SnapshotNow, which is wrong. > I refer you to the parser, the planner, execution functions for arrays, > records, enums, any sort of relcache reload, etc etc etc. Yes, some > of that is masked by backend-internal caching, some of the time, but > it's folly to just assume that there are no SnapshotNow scans during > normal queries. Hmm. That's unfortunate, because it seems difficult to construct a test case that will exercise every feature in the system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Um ... you're supposing that only DDL uses SnapshotNow, which is >> wrong. I refer you to the parser, the planner, execution >> functions for arrays, records, enums, any sort of relcache >> reload, etc etc etc. Yes, some of that is masked by >> backend-internal caching, some of the time, but it's folly to >> just assume that there are no SnapshotNow scans during normal >> queries. > > Hmm. That's unfortunate, because it seems difficult to construct > a test case that will exercise every feature in the system. Would the result of IsMVCCSnapshot() change for these snapshots? If so, it might require work in the SSI code to avoid a performance hit there. In early profiling and stepping through execution I noticed that we had overhead in serializable transactions for the planner checks for the actual values at the beginning or end of an index. This went away when we avoided SSI work for reads using a non-MVCC snapshot. If we're going to start using MVCC snapshots for such things, we need some other way to avoid unnecessary work in this area (and possibly others). At a minimum, some comparative benchmarks at the serializable isolation level would be in order when considering a patch like this. -Kevin
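The cheap exit being referred to looks roughly like this, modelled loosely on the guards in predicate.c rather than copied from them; the function name is an assumption for illustration:

#include "postgres.h"
#include "access/xact.h"
#include "utils/tqual.h"

/*
 * Reads through non-MVCC snapshots skip SSI bookkeeping entirely.  If
 * SnapshotNow scans start reporting IsMVCCSnapshot() == true, this cheap
 * exit no longer fires for them.
 */
static bool
sketch_serialization_needed_for_read(Snapshot snapshot)
{
    if (!IsolationIsSerializable())
        return false;           /* SSI work only at SERIALIZABLE */
    if (!IsMVCCSnapshot(snapshot))
        return false;           /* the exit catalog scans rely on today */
    return true;
}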
I wrote: >> Another point that requires some thought is that switching SnapshotNow >> to be MVCC-based will presumably result in a noticeable increase in each >> backend's rate of wanting to acquire snapshots. BTW, I wonder if this couldn't be ameliorated by establishing some ground rules about how up-to-date a snapshot really needs to be. Arguably, it should be okay for successive SnapshotNow scans to use the same snapshot as long as we have not acquired a new lock in between. If not, reusing an old snap doesn't introduce any race condition that wasn't there already. regards, tom lane
On Tue, Jan 3, 2012 at 1:42 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Um ... you're supposing that only DDL uses SnapshotNow, which is >>> wrong. I refer you to the parser, the planner, execution >>> functions for arrays, records, enums, any sort of relcache >>> reload, etc etc etc. Yes, some of that is masked by >>> backend-internal caching, some of the time, but it's folly to >>> just assume that there are no SnapshotNow scans during normal >>> queries. >> >> Hmm. That's unfortunate, because it seems difficult to construct >> a test case that will exercise every feature in the system. > > Would the result of IsMVCCSnapshot() change for these snapshots? If > so, it might require work in the SSI code to avoid a performance hit > there. In early profiling and stepping through execution I noticed > that we had overhead in serializable transactions for the planner > checks for the actual values at the beginning or end of an index. > This went away when we avoided SSI work for reads using a non-MVCC > snapshot. If we're going to start using MVCC snapshots for such > things, we need some other way to avoid unnecessary work in this > area (and possibly others). > > At a minimum, some comparative benchmarks at the serializable > isolation level would be in order when considering a patch like > this. Ugh. Yeah, that sounds like a problem. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >>> Another point that requires some thought is that switching SnapshotNow >>> to be MVCC-based will presumably result in a noticeable increase in each >>> backend's rate of wanting to acquire snapshots. > > BTW, I wonder if this couldn't be ameliorated by establishing some > ground rules about how up-to-date a snapshot really needs to be. > Arguably, it should be okay for successive SnapshotNow scans to use the > same snapshot as long as we have not acquired a new lock in between. > If not, reusing an old snap doesn't introduce any race condition that > wasn't there already. Is that likely to help much? I think our usual pattern is to lock the catalog, scan it, and then release the lock, so there will normally be an AcceptInvalidationMessages() just before the scan. Or at least, I think there will. Another thought is that it should always be safe to reuse an old snapshot if no transactions have committed or aborted since it was taken (possibly we could narrow that to "no transactions have committed since it was taken", but I think that might cause some headaches with RecentGlobalXmin). I wonder if we could come up with a cheap, hopefully lock-free method of determining whether or not that's the case. For example, suppose we store the last XID to commit or abort in shared memory. In GetSnapshotData(), we check whether that value has changed (probably, after enforcing a memory barrier of some kind) before acquiring the lock. If it has, we proceed normally, but if not, we just reuse the results from the previous GetSnapshotData(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
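A sketch of that control flow, using latestCompletedXid as a stand-in for the proposed "last XID to commit or abort" counter; since latestCompletedXid only ever advances to a higher XID it is an imperfect proxy, and the unlocked read plus barrier pairing is exactly the unproven part:

#include "postgres.h"
#include "access/transam.h"
#include "storage/barrier.h"
#include "storage/procarray.h"

Snapshot
GetSnapshotDataReuse(Snapshot snapshot)
{
    static TransactionId prev_completed = InvalidTransactionId;
    TransactionId latest;

    pg_read_barrier();          /* would pair with a commit-side barrier */
    latest = ShmemVariableCache->latestCompletedXid;

    if (TransactionIdIsValid(prev_completed) &&
        TransactionIdEquals(latest, prev_completed))
        return snapshot;        /* nothing ended: reuse previous contents */

    prev_completed = latest;
    return GetSnapshotData(snapshot);   /* slow path: take the lock */
}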
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, I wonder if this couldn't be ameliorated by establishing some >> ground rules about how up-to-date a snapshot really needs to be. >> Arguably, it should be okay for successive SnapshotNow scans to use the >> same snapshot as long as we have not acquired a new lock in between. >> If not, reusing an old snap doesn't introduce any race condition that >> wasn't there already. > Is that likely to help much? I think our usual pattern is to lock the > catalog, scan it, and then release the lock, so there will normally be > an AcceptInvalidationMessages() just before the scan. Or at least, I > think there will. Um, good point. Those locks aren't meant to avoid race conditions, but the mechanism doesn't know that. > Another thought is that it should always be safe to reuse an old > snapshot if no transactions have committed or aborted since it was > taken Yeah, that might work better. And it'd be a win for all MVCC snaps, not just the ones coming from promoted SnapshotNow ... regards, tom lane
On Tue, Jan 3, 2012 at 6:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Another point that requires some thought is that switching SnapshotNow >>> to be MVCC-based will presumably result in a noticeable increase in each >>> backend's rate of wanting to acquire snapshots. Hence, more contention >>> in GetSnapshotData can be expected. A single-threaded test case doesn't >>> prove anything at all about what that might cost under load. > >> This is obviously true at some level, but I'm not sure that it really >> matters. It's not that difficult to construct a test case where we >> have lots of people concurrently reading a table, or reading many >> tables, or writing a table, or writing many tables, but what kind of >> realistic test case involves enough DDL for any of this to matter? > > Um ... you're supposing that only DDL uses SnapshotNow, which is wrong. > I refer you to the parser, the planner, execution functions for arrays, > records, enums, any sort of relcache reload, etc etc etc. Yes, some > of that is masked by backend-internal caching, some of the time, but > it's folly to just assume that there are no SnapshotNow scans during > normal queries. > > None of this is necessarily grounds to reject a patch along the proposed > lines. I'm just asking for some benchmarking effort to establish what > the costs might be, rather than naively hoping they are negligible. All of which is reasonable doubt. So far, all I'm saying is that on a couple of heavy duty cases that I've run, I haven't noticed a problem. That is sufficient for me to push forwards with a patch that does very close to what we want, so we can judge acceptability with wider tests. I'm not saying it will be acceptable a priori, only that it is worth trying because the benefits are high. The number of call points are also high. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > If there are many call sites, maybe it'd be a good idea to use a > semantic patcher tool such as Coccinelle instead of doing it one by one. Thanks for the suggestion; regrettably I've already made those changes. After examining the call sites, I identified 35 that might need changing. Of those, about 30 were changed to use systable_beginscan, while a few others use declared snapshots instead. So not a great effort, and worth doing the by-hand inspection. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Excerpts from Simon Riggs's message of Tue Jan 03 17:57:56 -0300 2012: > > On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: > > > If there are many call sites, maybe it'd be a good idea to use a > > semantic patcher tool such as Coccinelle instead of doing it one by one. > > Thanks for the suggestion; regrettably I've already made those changes. > > After examining the call sites, I identified 35 that might need > changing. Of those, about 30 were changed to use systable_beginscan, > while a few others use declared snapshots instead. So not a great > effort, and worth doing the by-hand inspection. Yeah, for 35 changes I wouldn't have gone to the lengths required to learn the new tool either. If they had been 200, OTOH ... -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Was this resolved? (Sorry to be bugging everyone.) --------------------------------------------------------------------------- On Tue, Nov 29, 2011 at 11:42:10PM -0500, Robert Haas wrote: > On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <bruce@momjian.us> wrote: > > Just confirming we decided not to pursue this. > > Doesn't sound like it. > > I've been thinking a lot about the more general problem here - namely, > that allowing catalog changes without an access exclusive lock is > unsafe - and trying to come up with a good solution, because I'd > really like to see us make this work. It strikes me that there are > really two separate problems here. > > 1. If you are scanning a system catalog using SnapshotNow, and a > commit happens at just the right moment, you might see two versions of > the row or zero rather than one. You'll see two if you scan the old > row version, then the concurrent transaction commits, and then you > scan the new one. You'll see zero if you scan the new row (ignoring > it, since it isn't committed yet) and then the concurrent transaction > commits, and then you scan the old row. On a related note, if you > scan several interrelated catalogs (e.g. when building a relcache > entry) and a transaction that has made matching modifications to > multiple catalogs commits midway through, you might see the old > version of the data from one table and the new version of the data > from some other table. Using AccessExclusiveLock fixes the problem > because we guarantee that anybody who wants to read those rows first > takes some kind of lock, which they won't be able to do while the > updater holds AccessExclusiveLock. > > 2. Other backends may have data in the relcache or catcaches which > won't get invalidated until they do AcceptInvalidationMessages(). > That will always happen no later than the next time they lock the > relation, so if you are holding AccessExclusiveLock then you should be > OK: no one else holds any lock, and they'll have to go get one before > doing anything interesting. But if you are holding any weaker lock, > there's nothing to force AcceptInvalidationMessages to happen before > you reuse those cache entries. > > In both cases, name lookups are an exception: we don't know what OID > to lock until we've done the name lookup. > > Random ideas for solving the first problem: > > 1-A. Take a fresh MVCC snapshot for each catalog scan. I think we've > rejected this before because of the cost involved, but maybe with > Pavan's patch and some of the other improvements that are on the way > we could make this cheap enough to be acceptable. > 1-B. Don't allow transactions that have modified system catalog X to > commit while we're scanning system catalog X; make the scans take some > kind of shared lock and commits an exclusive lock. This seems awfully > bad for concurrency, though, not to mention the overhead of taking and > releasing all those locks even when nothing conflicts. > 1-C. Wrap a retry loop around every place where we scan a system > catalog. If a transaction that has written rows in catalog X commits > while we're scanning catalog X, discard the results of the first scan > and declare a do-over (or maybe something more coarse-grained, or at > the other extreme even more fine-grained). If we're doing several > interrelated scans then the retry loop must surround the entire unit, > and enforce a do-over of the whole operation if commits occur in any > of the catalogs before the scan completes. 
> > Unfortunately, any of these seem likely to require a Lot of Work, > probably more than can be done in time for 9.2. However, perhaps it > would be feasible to home in on a limited subset of the cases (e.g. > name lookups, which are mainly done in only a handful of places, and > represent an existing bug) and implement the necessary code changes > just for those cases. Then we could generalize that pattern to other > cases as time and energy permit. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
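Option 1-C in the quoted mail amounts to a retry loop around each catalog scan. In control-flow terms, with every function below an assumed placeholder rather than real code:

#include "postgres.h"

typedef struct CatalogScanResult CatalogScanResult;

extern uint64 GetCatalogCommitCount(Oid catalogId);             /* assumed */
extern CatalogScanResult *RunCatalogScan(Oid catalogId);        /* assumed */
extern void DiscardCatalogScanResult(CatalogScanResult *res);   /* assumed */

CatalogScanResult *
ScanCatalogWithRetry(Oid catalogId)
{
    for (;;)
    {
        uint64      count_before = GetCatalogCommitCount(catalogId);
        CatalogScanResult *res = RunCatalogScan(catalogId);

        /* no writer to this catalog committed mid-scan: result is good */
        if (GetCatalogCommitCount(catalogId) == count_before)
            return res;

        DiscardCatalogScanResult(res);  /* declare a do-over */
    }
}

For interrelated scans, the loop would have to surround the whole unit, exactly as the quoted mail says.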
On Thu, Aug 16, 2012 at 9:11 PM, Bruce Momjian <bruce@momjian.us> wrote: > Was this resolved? (Sorry to be bugging everyone.) Nope. I've got some ideas, but not enough round tuits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 January 2012 18:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >>> Another point that requires some thought is that switching SnapshotNow >>> to be MVCC-based will presumably result in a noticeable increase in each >>> backend's rate of wanting to acquire snapshots. > > BTW, I wonder if this couldn't be ameliorated by establishing some > ground rules about how up-to-date a snapshot really needs to be. > Arguably, it should be okay for successive SnapshotNow scans to use the > same snapshot as long as we have not acquired a new lock in between. > If not, reusing an old snap doesn't introduce any race condition that > wasn't there already. Now that has been implemented using the above design, we can resubmit the lock level reduction patch, with thanks to Robert. Submitted patch passes original complaint/benchmark. Changes:
* various forms of ALTER TABLE, notably ADD constraint and VALIDATE
* CREATE TRIGGER
One minor correction to earlier thinking with respect to toast tables; that might be later relaxed. Full tests including proof of lock level reductions, plus docs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jul 7, 2013 at 9:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 3 January 2012 18:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wrote: >>>> Another point that requires some thought is that switching SnapshotNow >>>> to be MVCC-based will presumably result in a noticeable increase in each >>>> backend's rate of wanting to acquire snapshots. >> >> BTW, I wonder if this couldn't be ameliorated by establishing some >> ground rules about how up-to-date a snapshot really needs to be. >> Arguably, it should be okay for successive SnapshotNow scans to use the >> same snapshot as long as we have not acquired a new lock in between. >> If not, reusing an old snap doesn't introduce any race condition that >> wasn't there already. > > Now that has been implemented using the above design, we can resubmit > the lock level reduction patch, with thanks to Robert. > > Submitted patch passes original complaint/benchmark. > > Changes > * various forms of ALTER TABLE, notably ADD constraint and VALIDATE > * CREATE TRIGGER > > One minor coirrections to earlier thinking with respect to toast > tables. That might be later relaxed. > > Full tests including proof of lock level reductions, plus docs. I'm quite glad to see this patch back again for another round. I haven't reviewed it in detail yet, so the purpose of this email is just to lay out some general areas of concern and food for thought rather than to critique anything in the patch too specifically. Generally, the question on the table is: to what extent do MVCC catalog scans make the world safe for concurrent DDL, or put negatively, what hazards remain? Noah discovered an interesting one recently: apparently, the relcache machinery has some logic in it that depends on the use of AccessExclusiveLock in subtle ways. I'm going to attempt to explain it, and maybe he can jump in and explain it better. Essentially, the problem is that when a relcache reload occurs, certain data structures (like the tuple descriptor, but there are others) are compared against the old version of the same data structure. If there are no changes, we do nothing; else, we free the old one and install the new one. The reason why we don't free the old one and install the new one unconditionally is because other parts of the backend might have pointers to the old data structure, so just replacing it all the time would result in crashes. Since DDL requires AccessExclusiveLock + CheckTableNotInUse(), any actual change to the data structure will happen when we haven't got any pointers anyway. A second thing I'm concerned about is that, although MVCC catalog scans guarantee that we won't miss a concurrently-updated row entirely, or see a duplicate, they don't guarantee that the rows we see during a scan of catalog A will be consistent with the rows we see during a scan of catalog B moments later. For example, hilarity will ensue if relnatts doesn't match what we see in pg_attribute. Now I don't think this particular example matters very much because I think there are probably lots of other things that would also break if we try to add a column without a full table lock, so we're probably doomed there anyway. But there might be other instances of this problem that are more subtle. I'll try to find some time to look at this in more detail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-07-15 10:06:31 -0400, Robert Haas wrote: > Noah discovered an interesting one recently: apparently, the relcache > machinery has some logic in it that depends on the use of > AccessExclusiveLock in subtle ways. I'm going to attempt to explain > it, and maybe he can jump in and explain it better. Essentially, the > problem is that when a relcache reload occurs, certain data structures > (like the tuple descriptor, but there are others) are compared against > the old version of the same data structure. If there are no changes, > we do nothing; else, we free the old one and install the new one. The > reason why we don't free the old one and install the new one > unconditionally is because other parts of the backend might have > pointers to the old data structure, so just replacing it all the time > would result in crashes. Since DDL requires AccessExclusiveLock + > CheckTableNotInUse(), any actual change to the data structure will > happen when we haven't got any pointers anyway. Aren't we swapping in the new data on a data level for that reason? See RelationClearRelation(). > A second thing I'm concerned about is that, although MVCC catalog > scans guarantee that we won't miss a concurrently-updated row > entirely, or see a duplicate, they don't guarantee that the rows we > see during a scan of catalog A will be consistent with the rows we see > during a scan of catalog B moments later. For example, hilarity will > ensue if relnatts doesn't match what we see in pg_attribute. Now I > don't think this particular example matters very much because I think > there are probably lots of other things that would also break if we > try to add a column without a full table lock, so we're probably > doomed there anyway. But there might be other instances of this > problem that are more subtle. Hm. Other transactions basically should be protected against this because they can't see uncommitted data anyway, right? ISTM that our own session basically already has to be safe against hilarity of this kind, right? I am concerned about stuff like violating constraints because we haven't yet seen the new constraint definition and the like... Or generating wrong plans because we haven't seen that somebody has dropped a constraint and inserted data violating the old one. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 15, 2013 at 10:32 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-07-15 10:06:31 -0400, Robert Haas wrote: >> Noah discovered an interesting one recently: apparently, the relcache >> machinery has some logic in it that depends on the use of >> AccessExclusiveLock in subtle ways. I'm going to attempt to explain >> it, and maybe he can jump in and explain it better. Essentially, the >> problem is that when a relcache reload occurs, certain data structures >> (like the tuple descriptor, but there are others) are compared against >> the old version of the same data structure. If there are no changes, >> we do nothing; else, we free the old one and install the new one. The >> reason why we don't free the old one and install the new one >> unconditionally is because other parts of the backend might have >> pointers to the old data structure, so just replacing it all the time >> would result in crashes. Since DDL requires AccessExclusiveLock + >> CheckTableNotInUse(), any actual change to the data structure will >> happen when we haven't got any pointers anyway. > > Aren't we swapping in the new data on a data level for that reason? See > RelationClearRelation(). Look at the keep_tupdesc and keep_rules variables. >> A second thing I'm concerned about is that, although MVCC catalog >> scans guarantee that we won't miss a concurrently-updated row >> entirely, or see a duplicate, they don't guarantee that the rows we >> see during a scan of catalog A will be consistent with the rows we see >> during a scan of catalog B moments later. For example, hilarity will >> ensue if relnatts doesn't match what we see in pg_attribute. Now I >> don't think this particular example matters very much because I think >> there are probably lots of other things that would also break if we >> try to add a column without a full table lock, so we're probably >> doomed there anyway. But there might be other instances of this >> problem that are more subtle. > > Hm. Other transactions basically should be protected against this > because they can't see uncommitted data anyway, right? ISTM that our own > session basically already has to be safe against hilarity of this kind, > right? Other transactions are protected only within a single scan. If they do two or more separate scans one after the another, some other transaction can commit in the middle of things. Any commits that happen after starting the first scan and before starting the second scan will be reflected in the second scan, but not in the first. That's what I'm concerned about. Our own session can't have a problem of this kind, because we'll never commit a subtransaction (or, heaven forbid, a top-level transaction) while in the middle of a system catalog scan. > I am concerned about stuff like violating constraints because we haven't > yet seen the new constraint definition and the like... Or generating > wrong plans because we haven't seen that somebody has dropped a > constraint and inserted data violating the old one. Yes, we need to carefully audit the commands for dependencies of that type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
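Stripped to the one field at issue, the logic being described behaves like this (heavily simplified from RelationClearRelation, which swaps many more fields than the tuple descriptor):

#include "postgres.h"
#include "access/tupdesc.h"
#include "utils/rel.h"

/*
 * Keep the *old* TupleDesc pointer when nothing changed, so callers
 * holding that pointer stay valid; otherwise free it, which is only safe
 * if no one can still be pointing at it.
 */
static void
sketch_swap_tupdesc(Relation relation, Relation newrel)
{
    if (equalTupleDescs(relation->rd_att, newrel->rd_att))
    {
        /* logically identical: preserve the old pointer */
        FreeTupleDesc(newrel->rd_att);
        newrel->rd_att = relation->rd_att;
    }
    else
    {
        TupleDesc   olddesc = relation->rd_att;

        relation->rd_att = newrel->rd_att;
        FreeTupleDesc(olddesc);     /* any outside pointer now dangles */
    }
}

Under AccessExclusiveLock the "else" branch can only be reached when no one holds such a pointer; with reduced lock levels, that guarantee is exactly what goes away.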
On 15 July 2013 15:06, Robert Haas <robertmhaas@gmail.com> wrote: > Generally, the question on the table is: to what extent do MVCC > catalog scans make the world safe for concurrent DDL, or put > negatively, what hazards remain? On Tom's test I've been unable to find a single problem. > Noah discovered an interesting one recently: apparently, the relcache > machinery has some logic in it that depends on the use of > AccessExclusiveLock in subtle ways. I'm going to attempt to explain > it, and maybe he can jump in and explain it better. Essentially, the > problem is that when a relcache reload occurs, certain data structures > (like the tuple descriptor, but there are others) are compared against > the old version of the same data structure. If there are no changes, > we do nothing; else, we free the old one and install the new one. The > reason why we don't free the old one and install the new one > unconditionally is because other parts of the backend might have > pointers to the old data structure, so just replacing it all the time > would result in crashes. Since DDL requires AccessExclusiveLock + > CheckTableNotInUse(), any actual change to the data structure will > happen when we haven't got any pointers anyway. > > A second thing I'm concerned about is that, although MVCC catalog > scans guarantee that we won't miss a concurrently-updated row > entirely, or see a duplicate, they don't guarantee that the rows we > see during a scan of catalog A will be consistent with the rows we see > during a scan of catalog B moments later. For example, hilarity will > ensue if relnatts doesn't match what we see in pg_attribute. Now I > don't think this particular example matters very much because I think > there are probably lots of other things that would also break if we > try to add a column without a full table lock, so we're probably > doomed there anyway. But there might be other instances of this > problem that are more subtle. If you look at this as a generalised problem you probably can find some issues, but that doesn't mean they apply in the specific cases we're addressing. The lock reductions we are discussing in all cases have nothing at all to do with structure and only relate to various options. Except in the case of constraints, though even there I see no issues as yet. I've no problem waiting awhile to apply while others try to find loopholes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 15, 2013 at 05:50:40PM +0100, Simon Riggs wrote: > On 15 July 2013 15:06, Robert Haas <robertmhaas@gmail.com> wrote: > > Generally, the question on the table is: to what extent do MVCC > > catalog scans make the world safe for concurrent DDL, or put > > negatively, what hazards remain? > > On Tom's test I've been unable to find a single problem. > > > Noah discovered an interesting one recently: apparently, the relcache > > machinery has some logic in it that depends on the use of > > AccessExclusiveLock in subtle ways. I'm going to attempt to explain > > it, and maybe he can jump in and explain it better. Essentially, the > > problem is that when a relcache reload occurs, certain data structures > > (like the tuple descriptor, but there are others) are compared against > > the old version of the same data structure. If there are no changes, > > we do nothing; else, we free the old one and install the new one. The > > reason why we don't free the old one and install the new one > > unconditionally is because other parts of the backend might have > > pointers to the old data structure, so just replacing it all the time > > would result in crashes. Since DDL requires AccessExclusiveLock + > > CheckTableNotInUse(), any actual change to the data structure will > > happen when we haven't got any pointers anyway. > If you look at this as a generalised problem you probably can find > some issues, but that doesn't mean they apply in the specific cases > we're addressing. > > The lock reductions we are discussing in all cases have nothing at all > to do with structure and only relate to various options. Except in the > case of constraints, though even there I see no issues as yet. I was able to distill the above hypothesis into an actual crash with reduce_lock_levels.v13.patch. Test recipe:

1. Build with --enable-cassert and with -DCATCACHE_FORCE_RELEASE=1. An AcceptInvalidationMessages() will then happen at nearly every syscache lookup, making it far easier to hit a problem of this sort.

2. Run these commands as setup:
create table root (c int);
create table part (check (c > 0), check (c > 0)) inherits (root);

3. Attach a debugger to your session and set a breakpoint at plancat.c:660 (as of commit 16f38f72ab2b8a3b2d45ba727d213bb31111cea4).

4. Run this in your session; the breakpoint will trip:
select * from root where c = -1;

5. Start another session and run:
alter table part add check (c > 0);

6. Exit the debugger to release the first session. It will crash.

plancat.c:657 stashes a pointer to memory belonging to the rd_att of a relcache entry. It then proceeds to call eval_const_expressions(), which performs a syscache lookup in its simplify_function() subroutine. Under CATCACHE_FORCE_RELEASE, the syscache lookup reliably prompts an AcceptInvalidationMessages(). The ensuing RelationClearRelation() against "part" notices the new constraint, decides keep_tupdesc = false, and frees the existing tupdesc. plancat.c is now left holding a pointer into freed memory. The next loop iteration will crash when it dereferences a pointer stored within that freed memory at plancat.c:671.

A remediation strategy that seemed attractive when I last contemplated this problem is to repoint rd_att immediately but arrange to free the obsolete TupleDesc in AtEOXact_RelationCache(). -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 1 August 2013 01:53, Noah Misch <noah@leadboat.com> wrote: > I was able to distill the above hypothesis into an actual crash with > reduce_lock_levels.v13.patch. Test recipe: Thank you for the report; thank you again for reporting in sufficient time to allow me to investigate and fix by the next CF. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1 August 2013 01:53, Noah Misch <noah@leadboat.com> wrote: > A remediation strategy that seemed attractive when I last contemplated this > problem is to repoint rd_att immediately but arrange to free the obsolete > TupleDesc in AtEOXact_RelationCache(). I agree that the best way to resolve this is to retain a copy of the TupleDesc, so that copied pointers to it remain valid. EOXact is actually longer than strictly necessary in some cases, but trying to work out a more minimal approach seems hard and possibly inefficient. Comments in relcache.c indicate that the Relation swapping concept might be replaced by a refcounting approach. I can't see how that differs from your suggested route. Which means I can't see any other way of doing this other than the way you suggest. Will implement. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 7 July 2013 14:24, Simon Riggs <simon@2ndquadrant.com> wrote: > On 3 January 2012 18:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wrote: >>>> Another point that requires some thought is that switching SnapshotNow >>>> to be MVCC-based will presumably result in a noticeable increase in each >>>> backend's rate of wanting to acquire snapshots. >> >> BTW, I wonder if this couldn't be ameliorated by establishing some >> ground rules about how up-to-date a snapshot really needs to be. >> Arguably, it should be okay for successive SnapshotNow scans to use the >> same snapshot as long as we have not acquired a new lock in between. >> If not, reusing an old snap doesn't introduce any race condition that >> wasn't there already. > > Now that has been implemented using the above design, we can resubmit > the lock level reduction patch, with thanks to Robert. > > Submitted patch passes original complaint/benchmark. > > Changes > * various forms of ALTER TABLE, notably ADD constraint and VALIDATE > > * CREATE TRIGGER > > One minor correction to earlier thinking with respect to toast > tables. That might be later relaxed. > > Full tests including proof of lock level reductions, plus docs. Rebased to v14 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 1 August 2013 01:53, Noah Misch <noah@leadboat.com> wrote: > On Mon, Jul 15, 2013 at 05:50:40PM +0100, Simon Riggs wrote: >> On 15 July 2013 15:06, Robert Haas <robertmhaas@gmail.com> wrote: >> > Generally, the question on the table is: to what extent do MVCC >> > catalog scans make the world safe for concurrent DDL, or put >> > negatively, what hazards remain? >> >> On Tom's test I've been unable to find a single problem. >> >> > Noah discovered an interesting one recently: apparently, the relcache >> > machinery has some logic in it that depends on the use of >> > AccessExclusiveLock in subtle ways. I'm going to attempt to explain >> > it, and maybe he can jump in and explain it better. Essentially, the >> > problem is that when a relcache reload occurs, certain data structures >> > (like the tuple descriptor, but there are others) are compared against >> > the old version of the same data structure. If there are no changes, >> > we do nothing; else, we free the old one and install the new one. The >> > reason why we don't free the old one and install the new one >> > unconditionally is because other parts of the backend might have >> > pointers to the old data structure, so just replacing it all the time >> > would result in crashes. Since DDL requires AccessExclusiveLock + >> > CheckTableNotInUse(), any actual change to the data structure will >> > happen when we haven't got any pointers anyway. > >> If you look at this as a generalised problem you probably can find >> some issues, but that doesn't mean they apply in the specific cases >> we're addressing. >> >> The lock reductions we are discussing in all cases have nothing at all >> to do with structure and only relate to various options. Except in the >> case of constraints, though even there I see no issues as yet. > > I was able to distill the above hypothesis into an actual crash with > reduce_lock_levels.v13.patch. Test recipe: > > 1. Build with --enable-cassert and with -DCATCACHE_FORCE_RELEASE=1. An > AcceptInvalidationMessages() will then happen at nearly every syscache lookup, > making it far easier to hit a problem of this sort. > > 2. Run these commands as setup: > create table root (c int); > create table part (check (c > 0), check (c > 0)) inherits (root); > > 3. Attach a debugger to your session and set a breakpoint at plancat.c:660 (as > of commit 16f38f72ab2b8a3b2d45ba727d213bb31111cea4). > > 4. Run this in your session; the breakpoint will trip: > select * from root where c = -1; > > 5. Start another session and run: > alter table part add check (c > 0); > > 6. Exit the debugger to release the first session. It will crash. > > plancat.c:657 stashes a pointer to memory belonging to the rd_att of a > relcache entry. It then proceeds to call eval_const_expressions(), which > performs a syscache lookup in its simplify_function() subroutine. Under > CATCACHE_FORCE_RELEASE, the syscache lookup reliably prompts an > AcceptInvalidationMessages(). The ensuing RelationClearRelation() against > "part" notices the new constraint, decides keep_tupdesc = false, and frees the > existing tupdesc. plancat.c is now left holding a pointer into freed > memory. The next loop iteration will crash when it dereferences a pointer > stored within that freed memory at plancat.c:671. > > > A remediation strategy that seemed attractive when I last contemplated this > problem is to repoint rd_att immediately but arrange to free the obsolete > TupleDesc in AtEOXact_RelationCache(). v15 to fix the above problem.
Looking at other potential problems around plancache but nothing found as yet. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > v15 to fix the above problem. Minor quibble: I get a compiler warning with the patch applied. "relcache.c: In function ‘RememberToFreeTupleDescAtEOX’: relcache.c:2317:3: warning: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]". -- Peter Geoghegan
On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> v15 to fix the above problem. > > Minor quibble: I get a compiler warning with the patch applied. > "relcache.c: In function 'RememberToFreeTupleDescAtEOX': > relcache.c:2317:3: warning: ISO C90 forbids mixed declarations and > code [-Werror=declaration-after-statement]". Thanks, that was a wart, now fixed. v16 attached -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 24 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote: > On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: >> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> v15 to fix the above problem. >> > v16 attached v17 attached This version adds a GUC called ddl_exclusive_locks which allows us to keep the 9.3 behaviour if we wish it. Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so. I'll commit this on Thurs 30 Jan unless I hear objections. Thanks -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
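A sketch of how the proposed setting would presumably be used (ddl_exclusive_locks exists only in the v17 patch and was never part of a released PostgreSQL; whether it is settable with SET rather than only in postgresql.conf is an assumption here):

-- hypothetical: requires the v17 patch; restores the 9.3-style locking
SET ddl_exclusive_locks = on;
ALTER TABLE pgbench_accounts SET (fillfactor = 100);  -- blocks concurrent SELECTs again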
On 27 January 2014 17:58, Simon Riggs <simon@2ndquadrant.com> wrote: > On 24 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: >>> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> v15 to fix the above problem. >>> >> v16 attached > > v17 attached Frostbite... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 27, 2014 at 12:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 24 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: >>> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> v15 to fix the above problem. >>> >> v16 attached > > v17 attached > > This version adds a GUC called ddl_exclusive_locks which allows us to > keep the 9.3 behaviour if we wish it. Some people may be surprised > that their programs don't wait in the same places they used to. We > hope that is a positive and useful behaviour, but it may not always be > so. > > I'll commit this on Thurs 30 Jan unless I hear objections. I haven't reviewed the patch, but -1 for adding a GUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes: > On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I haven't reviewed the patch, but -1 for adding a GUC. > I'm pretty surprised that it's been suggested that some people might > prefer AccessExclusiveLocks. Why would anyone prefer that? For one thing, so they can back this out if it proves to be broken, as the last committed version was. Given that this patch was marked (by its author) as Ready for Committer without any review in the current CF, I can't say that I have any faith in it. regards, tom lane
On 27 January 2014 20:35, Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I haven't reviewed the patch, but -1 for adding a GUC. > > I'm pretty surprised that it's been suggested that some people might > prefer AccessExclusiveLocks. Why would anyone prefer that? Nobody has said "prefer". I said... > Some people may be surprised > that their programs don't wait in the same places they used to. We > hope that is a positive and useful behaviour, but it may not always be > so. We get the new behaviour by default and I expect we'll be very happy with it. A second thought is that if we have problems of some kind in the field as a result of the new lock modes then we will be able to turn them off. I'm happy to fix any problems that occur, but that doesn't mean there won't be any. If everybody is confident that we've foreseen every bug, then no problem, lets remove it. I recall being asked to add hot_standby = on | off for similar reasons. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 27 January 2014 20:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@heroku.com> writes: >> On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> I haven't reviewed the patch, but -1 for adding a GUC. > >> I'm pretty surprised that it's been suggested that some people might >> prefer AccessExclusiveLocks. Why would anyone prefer that? > > For one thing, so they can back this out if it proves to be broken, > as the last committed version was. Agreed > Given that this patch was marked > (by its author) as Ready for Committer without any review in the current > CF True. The main review happened in a previous commitfest and there was a small addition for this CF. It was my understanding that you wanted us to indicate that to allow you to review. I am happy to set status differently, as you wish, presumably back to needs review. >I can't say that I have any faith in it. That's a shame. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 27, 2014 at 08:57:26PM +0000, Simon Riggs wrote: > We get the new behaviour by default and I expect we'll be very happy with it. > > A second thought is that if we have problems of some kind in the field > as a result of the new lock modes then we will be able to turn them > off. I'm happy to fix any problems that occur, but that doesn't mean > there won't be any. If everybody is confident that we've foreseen > every bug, then no problem, lets remove it. I recall being asked to > add hot_standby = on | off for similar reasons. Addressing a larger issue, I have no problem with systematically adding GUCs to turn off features we add in each major release if we can also systematically remove those GUCs in the next major release. This would require putting all these settings in the compatibility section of postgresql.conf. However, I don't think it makes sense to do this in a one-off manner. It is also possible that there are enough cases where we _can't_ turn the feature off with a GUC that this would be unworkable. So, if we can't do it systematically, that means we will have enough breakage cases that we just need to rush out new versions to fix major breakage and one-off GUCs just don't buy us much, and add confusion. Does that make sense? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 28 January 2014 14:55, Bruce Momjian <bruce@momjian.us> wrote: > On Mon, Jan 27, 2014 at 08:57:26PM +0000, Simon Riggs wrote: >> We get the new behaviour by default and I expect we'll be very happy with it. >> >> A second thought is that if we have problems of some kind in the field >> as a result of the new lock modes then we will be able to turn them >> off. I'm happy to fix any problems that occur, but that doesn't mean >> there won't be any. If everybody is confident that we've foreseen >> every bug, then no problem, lets remove it. I recall being asked to >> add hot_standby = on | off for similar reasons. > > Addressing a larger issue, I have no problem with systematically adding > GUCs to turn off features we add in each major release if we can also > systematically remove those GUCs in the next major release. Agreed. I propose 2 releases since the time between release of 9.4 and the feature freeze of 9.5 is only 4 months, not usually enough for subtle bugs to be discovered. > This would require putting all these settings in the compatibility > section of postgresql.conf. Agreed, that is where I have added the parameter. > However, I don't think it makes sense to do this in a one-off manner. > It is also possible that there are enough cases where we _can't_ turn > the feature off with a GUC that this would be unworkable. > > So, if we can't do it systematically, that means we will have enough > breakage cases that we just need to rush out new versions to fix major > breakage and one-off GUCs just don't buy us much, and add confusion. > > Does that make sense? For me, reducing the strength of DDL locking is a major change in RDBMS behaviour that could both delight and surprise our users. Maybe a few actually depend upon the locking behaviour, maybe. After some years of various people looking at this, I think we've got it right. Experience tells me that while I think this is the outcome, we are well advised to protect against the possibility that it is not correct and that if we have corner case issues, it would be good to easily disable this in the field. In the current case, a simple parameter works very well to disable the feature; in other cases, not. Summary: This is an atypical case. I do not normally propose such things - this is the third time in 10 years, IIRC. I have no problem removing the parameter if required to. In that case, I would like to leave the parameter in until mid beta, to allow greater certainty. In any case, I would wish to retain as a minimum an extern bool variable allowing it to be turned off by C function if desired. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 28, 2014 at 04:36:39PM +0000, Simon Riggs wrote: > For me, reducing the strength of DDL locking is a major change in > RDBMS behaviour that could both delight and surprise our users. Maybe > a few actually depend upon the locking behaviour, maybe. After some > years of various people looking at this, I think we've got it right. > Experience tells me that while I think this is the outcome, we are > well advised to protect against the possibility that it is not correct > and that if we have corner case issues, it would be good to easily > disable this in the field. In the current case, a simple parameter > works very well to disable the feature; in other cases, not. > > Summary: This is an atypical case. I do not normally propose such > things - this is the third time in 10 years, IIRC. Uh, in my memory, you are the person who is most likely to suggest a GUC of all our developers. > I have no problem removing the parameter if required to. In that case, > I would like to leave the parameter in until mid beta, to allow > greater certainty. In any case, I would wish to retain as a minimum an > extern bool variable allowing it to be turned off by C function if > desired. Anything changed to postgresql.conf during beta is going to require an initdb. Also, lots of backward-compatibility infrastructure, as you are suggesting above, add complexity to the system. I am basically against a GUC on this. We have far larger problems with 9.3 than backward compatibility, and limited resources. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 01/28/2014 07:15 PM, Bruce Momjian wrote: > On Tue, Jan 28, 2014 at 04:36:39PM +0000, Simon Riggs wrote: >> For me, reducing the strength of DDL locking is a major change in >> RDBMS behaviour that could both delight and surprise our users. Maybe >> a few actually depend upon the locking behaviour, maybe. After some >> years of various people looking at this, I think we've got it right. >> Experience tells me that while I think this is the outcome, we are >> well advised to protect against the possibility that it is not correct >> and that if we have corner case issues, it would be good to easily >> disable this in the field. In the current case, a simple parameter >> works very well to disable the feature; in other cases, not. I don't understand why anyone would want to turn this feature off, ie. require stronger locks than necessary for a DDL change. If we're not confident that the patch is correct, then it should not be applied. If we are confident enough to commit it, and a bug crops up later, we'll fix the bug as usual. A user savvy enough to fiddle with such a GUC can also do their DDL with:

BEGIN;
LOCK TABLE <table>;
<DDL>
COMMIT;

>> I have no problem removing the parameter if required to. In that case, >> I would like to leave the parameter in until mid beta, to allow >> greater certainty. In any case, I would wish to retain as a minimum an >> extern bool variable allowing it to be turned off by C function if >> desired. > > Anything changed to postgresql.conf during beta is going to require an > initdb. Huh? Surely not. - Heikki
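With the example table from upthread, that workaround looks like this; the explicit LOCK TABLE restores the old blocking behaviour regardless of what lock ALTER TABLE itself takes:

BEGIN;
LOCK TABLE pgbench_accounts IN ACCESS EXCLUSIVE MODE;
ALTER TABLE pgbench_accounts SET (fillfactor = 100);
COMMIT;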
On Tue, Jan 28, 2014 at 07:21:50PM +0200, Heikki Linnakangas wrote: > >>I have no problem removing the parameter if required to. In that case, > >>I would like to leave the parameter in until mid beta, to allow > >>greater certainty. In any case, I would wish to retain as a minimum an > >>extern bool variable allowing it to be turned off by C function if > >>desired. > > > >Anything changed to postgresql.conf during beta is going to require an > >initdb. > > Huh? Surely not. Uh, if we ship beta1 with a GUC in postgresql.conf, and then we remove support for the GUC in beta2, anyone starting a server initdb-ed with beta1 is going to get an error and the server is not going to start:

LOG: unrecognized configuration parameter "xxx" in file "/u/pgsql/data/postgresql.conf" line 1
FATAL: configuration file "/u/pgsql/data/postgresql.conf" contains errors

so, yeah, it isn't going to require an initdb, but it is going to require everyone to edit their postgresql.conf. My guess is a lot of people are going to assume the old postgresql.conf is not compatible and are going to initdb and reload. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 01/28/2014 07:26 PM, Bruce Momjian wrote: > On Tue, Jan 28, 2014 at 07:21:50PM +0200, Heikki Linnakangas wrote: >>>> I have no problem removing the parameter if required to. In that case, >>>> I would like to leave the parameter in until mid beta, to allow >>>> greater certainty. In any case, I would wish to retain as a minimum an >>>> extern bool variable allowing it to be turned off by C function if >>>> desired. >>> >>> Anything changed to postgresql.conf during beta is going to require an >>> initdb. >> >> Huh? Surely not. > > Uh, if we ship beta1 with a GUC in postgresql.conf, and then we remove > support for the GUC in beta2, anyone starting a server initdb-ed with > beta1 is going to get an error and the server is not going to start: > > LOG: unrecognized configuration parameter "xxx" in file "/u/pgsql/data/postgresql.conf" line 1 > FATAL: configuration file "/u/pgsql/data/postgresql.conf" contains errors > > so, yeah, it isn't going to require an initdb, but it is going to > require everyone to edit their postgresql.conf. Only if you uncommented the value in the first place. - Heikki
Bruce Momjian escribió: > > I have no problem removing the parameter if required to. In that case, > > I would like to leave the parameter in until mid beta, to allow > > greater certainty. Uhm. If we remove a GUC during beta we don't need to force an initdb. At worst we will need to keep a no-op GUC variable that is removed in the next devel cycle. That said, if we're going to have a GUC that's going to disappear later, I think it's better to wait for 2 releases as proposed, not remove mid-beta. > > In any case, I would wish to retain as a minimum an extern bool > > variable allowing it to be turned off by C function if desired. I think this amounts to having an undocumented GUC that is hard to change. I prefer the GUC, myself. > Anything changed to postgresql.conf during beta is going to require an > initdb. > Also, lots of backward-compatibility infrastructure, as you are > suggesting above, add complexity to the system. > > I am basically against a GUC on this. We have far larger problems with > 9.3 than backward compatibility, and limited resources. If we have a clear plan on removing the parameter, it's easy enough to follow through. I don't think lack of resources is a good argument, because at that point there will be little to discuss and it's an easy change to make. And I think the plan is clear: if no bug is found the parameter is removed. If a bug is found, it is fixed and the parameter is removed anyway. Honestly, I would prefer that we push a patch that has been thoroughly reviewed and in which we have more confidence, so that we can push without a GUC. This means mark in CF as needs-review, then some other developer reviews it and marks it as ready-for-committer. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 28, 2014 at 07:30:47PM +0200, Heikki Linnakangas wrote: > On 01/28/2014 07:26 PM, Bruce Momjian wrote: > >On Tue, Jan 28, 2014 at 07:21:50PM +0200, Heikki Linnakangas wrote: > >>>>I have no problem removing the parameter if required to. In that case, > >>>>I would like to leave the parameter in until mid beta, to allow > >>>>greater certainty. In any case, I would wish to retain as a minimum an > >>>>extern bool variable allowing it to be turned off by C function if > >>>>desired. > >>> > >>>Anything changed to postgresql.conf during beta is going to require an > >>>initdb. > >> > >>Huh? Surely not. > > > >Uh, if we ship beta1 with a GUC in postgresql.conf, and then we remove > >support for the GUC in beta2, anyone starting a server initdb-ed with > >beta1 is going to get an error and the server is not going to start: > > > > LOG: unrecognized configuration parameter "xxx" in file "/u/pgsql/data/postgresql.conf" line 1 > > FATAL: configuration file "/u/pgsql/data/postgresql.conf" contains errors > > > >so, yeah, it isn't going to require an initdb, but it is going to > >require everyone to edit their postgresql.conf. > > Only if you uncommented the value in the first place. Oh, I had forgotten that. Right. It would still be odd to have a commented-out line in postgresql.conf that was not supported. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Honestly, I would prefer that we push a patch that has been thoroughly > reviewed and in which we have more confidence, so that we can push > without a GUC. This means mark in CF as needs-review, then some other > developer reviews it and marks it as ready-for-committer. FWIW, that was the point I was trying to make as well ;-) regards, tom lane
On 28 January 2014 17:15, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jan 28, 2014 at 04:36:39PM +0000, Simon Riggs wrote: >> For me, reducing the strength of DDL locking is a major change in >> RDBMS behaviour that could both delight and surprise our users. Maybe >> a few actually depend upon the locking behaviour, maybe. After some >> years of various people looking at this, I think we've got it right. >> Experience tells me that while I think this is the outcome, we are >> well advised to protect against the possibility that it is not correct >> and that if we have corner case issues, it would be good to easily >> disable this in the field. In the current case, a simple parameter >> works very well to disable the feature; in other cases, not. >> >> Summary: This is an atypical case. I do not normally propose such >> things - this is the third time in 10 years, IIRC. > > Uh, in my memory, you are the person who is most likely to suggest a > GUC of all our developers. (smiles) I have suggested parameters for many features, mostly in the early days of my developments before I saw the light of autotuning. But those were control parameters for tuning features. So yes, I have probably introduced more parameters than most, amongst the many things I've done. I'm guessing you don't recall how much trouble I went to in order to allow sync rep to have only 1 parameter, c'est la vie. What I'm discussing here is a compatibility parameter to allow new features to be disabled. This would be the third time in 10 years I suggested a parameter for that reason, i.e. one that the user would hardly ever wish to set. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-01-27 15:25:22 -0500, Robert Haas wrote: > On Mon, Jan 27, 2014 at 12:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > This version adds a GUC called ddl_exclusive_locks which allows us to > > keep the 9.3 behaviour if we wish it. Some people may be surprised > > that their programs don't wait in the same places they used to. We > > hope that is a positive and useful behaviour, but it may not always be > > so. > > > > I'll commit this on Thurs 30 Jan unless I hear objections. > > I haven't reviewed the patch, but -1 for adding a GUC. Ditto. I don't think the patch is in bad shape otherwise. I'd very quickly looked at a previous version and it did look rather sane, and several other people had looked at earlier versions as well. IIRC Noah had a fairly extensive look at some intricate race conditions... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 28 January 2014 17:21, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > I don't understand why anyone would want to turn this feature off, ie. > require stronger locks than necessary for a DDL change. Nobody would *want* to turn it off. They might need to, as explained. > If we're not confident that the patch is correct, then it should not be > applied. If we are confident enough to commit it, and a bug crops up later, > we'll fix the bug as usual. I'd like to point out here that my own customers are already well covered by the support services we offer. They will receive any such fix very quickly. My proposal was of assistance only to those without such contracts in place, as are many of my proposals. It doesn't bother me at all if you insist it should not be added. Just choose v16 of the patch for review rather than v17. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 28, 2014 at 12:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Bruce Momjian escribió: >> > I have no problem removing the parameter if required to. In that case, >> > I would like to leave the parameter in until mid beta, to allow >> > greater certainty. > > Uhm. If we remove a GUC during beta we don't need to force an initdb. > At worst we will need to keep a no-op GUC variable that is removed in > the next devel cycle. That said, if we're going to have a GUC that's > going to disappear later, I think it's better to wait for 2 releases as > proposed, not remove mid-beta. > >> > In any case, I would wish to retain as a minimum an extern bool >> > variable allowing it to be turned off by C function if desired. > > I think this amounts to having an undocumented GUC that is hard to > change. I prefer the GUC, myself. > >> Anything changed to postgresql.conf during beta is going to require an >> initdb. >> Also, lots of backward-compatibility infrastructure, as you are >> suggesting above, add complexity to the system. >> >> I am basically against a GUC on this. We have far larger problems with >> 9.3 than backward compatibility, and limited resources. > > If we have a clear plan on removing the parameter, it's easy enough to > follow through. I don't think lack of resources is a good argument, > because at that point there will be little to discuss and it's an easy > change to make. And I think the plan is clear: if no bug is found the > parameter is removed. If a bug is found, it is fixed and the parameter > is removed anyway. > > Honestly, I would prefer that we push a patch that has been thoroughly > reviewed and in which we have more confidence, so that we can push > without a GUC. This means mark in CF as needs-review, then some other > developer reviews it and marks it as ready-for-committer. I also believe that would be the correct procedure. Personally, I think it would be great if Noah can review this, as he's very good at finding the kind of problems that are likely to crop up here, and has examined previous versions. I also have some interest in looking at it myself. But I doubt that either of us (or any other senior hacker) can do that by Thursday. I think all such people are hip-deep in other patches at the moment; I certainly am. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 28, 2014 at 12:36 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> Honestly, I would prefer that we push a patch that has been thoroughly >> reviewed and in which we have more confidence, so that we can push >> without a GUC. This means mark in CF as needs-review, then some other >> developer reviews it and marks it as ready-for-committer. > I also believe that would be the correct procedure. Personally, I > think it would be great if Noah can review this, as he's very good at > finding the kind of problems that are likely to crop up here, and has > examined previous versions. I also have some interest in looking at > it myself. But I doubt that either of us (or any other senior hacker) > can do that by Thursday. I think all such people are hip-deep in > other patches at the moment; I certainly am. Yeah. I actually have little doubt that the patch is sane on its own terms of discussion, that is that Simon has chosen locking levels that are mutually consistent in an abstract sense. What sank the previous iteration was that he'd failed to consider an implementation detail, namely possible inconsistencies in SnapshotNow-based catalog fetches. I'm afraid that there may be more such problems lurking under the surface. Noah's pretty good at finding such things, but really two or three of us need to sit down and think about it for awhile before I'd have much confidence in such a fundamental change. And I sure don't have time for this patch right now myself. regards, tom lane
On 29 January 2014 05:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 28, 2014 at 12:36 PM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> Honestly, I would prefer that we push a patch that has been thoroughly >>> reviewed and in which we have more confidence, so that we can push >>> without a GUC. This means mark in CF as needs-review, then some other >>> developer reviews it and marks it as ready-for-committer. > >> I also believe that would be the correct procedure. Personally, I >> think it would be great if Noah can review this, as he's very good at >> finding the kind of problems that are likely to crop up here, and has >> examined previous versions. I also have some interest in looking at >> it myself. But I doubt that either of us (or any other senior hacker) >> can do that by Thursday. I think all such people are hip-deep in >> other patches at the moment; I certainly am. > > Yeah. I actually have little doubt that the patch is sane on its own > terms of discussion, that is that Simon has chosen locking levels that > are mutually consistent in an abstract sense. What sank the previous > iteration was that he'd failed to consider an implementation detail, > namely possible inconsistencies in SnapshotNow-based catalog fetches. > I'm afraid that there may be more such problems lurking under the > surface. Agreed > Noah's pretty good at finding such things, but really two > or three of us need to sit down and think about it for awhile before > I'd have much confidence in such a fundamental change. And I sure don't > have time for this patch right now myself. I've reviewed Noah's earlier comments, fixed things and also further reviewed for any similar relcache related issues. I've also reviewed Hot Standby to see if any locking issues arise, since the ALTER TABLE now won't generate an AccessExclusiveLock as it used to do for certain operations. I can't see any problems there. While doing those reviews, I'd remind everybody that this only affects roughly half of ALTER TABLE subcommands and definitely nothing that affects SELECTs. So the risk profile is much less than it sounds at first glance. If anybody else has any hints or clues about where to look, please mention them and I will investigate. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi, I took a quick peek at this, and noticed the following things:

* I am pretty sure this patch doesn't compile anymore after the latest set of releases.
* This definitely should include isolationtester tests actually performing concurrent ALTER TABLEs. All that's currently there is tests that the locklevel isn't too high, but not that it actually works.
* So far no DDL uses ShareRowExclusiveLocks, why do so now? Not sure if there aren't relevant cases with foreign key checks (which afair *do* acquire SRE locks).
* Why is AddConstraint "so complicated" when it all uses SRE locks?
* Are you sure AlterConstraint is generally safe without an AEL? It should be safe to change whether something is deferred, but not necessarily whether it's deferrable?
* Why does ChangeOwner need AEL?
* You changed several places to take out lower locks, which in itself is fine, but doesn't that lead to a lock upgrade hazard if a later step acquires a stronger lock? Or do we take out a strong enough lock from the beginning.
* There's no explanation why the EOXact TupleDesc stuff is needed?

That's it for now, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 24 February 2014 16:01, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > I took a quick peek at this, and noticed the following things: > * I am pretty sure this patch doesn't compile anymore after the latest > set of releases. Refreshed to v18, will post shortly. > * This definitely should include isolationtester tests actually > performing concurrent ALTER TABLEs. All that's currently there is > tests that the locklevel isn't too high, but not that it actually works. There is no concurrent behaviour here, hence no code that would be exercised by concurrent tests. Lock levels are proven in regression tests, so no further tests needed. > * So far no DDL uses ShareRowExclusiveLocks, why do so now? Not sure if > there aren't relevant cases with foreign key checks (which afair > *do* acquire SRE locks). That was discussed long ago. We can relax it more in the future, if that is considered safe. > * Why is AddConstraint "so complicated" when it all uses SRE locks? To ensure each case was considered, rather than just assume all cases are the same. > * Are you sure AlterConstraint is generally safe without an AEL? It > should be safe to change whether something is deferred, but not > necessarily whether it's deferrable? We change the lock levels for individual commands. This patch provides some initial settings and infrastructure. It is possible you are correct that changing the deferrability is not safe without an AEL. That was enabled for the first time in this release in a patch added by me after this patch was written. I will think on that and change, if required. > * Why does ChangeOwner need AEL? Ownership affects privileges, which includes SELECTs, hence AEL. This is not a critical aspect of the patch. > * You changed several places to take out lower locks, which in itself is > fine, but doesn't that lead to a lock upgrade hazard if a later step > acquires a stronger lock? Or do we take out a strong enough lock from > the beginning. We assess the lock needed at parse time, then use it consistently later. There is no lock upgrade hazard since that has been specifically considered and removed. > * There's no explanation why the EOXact TupleDesc stuff is needed? I will update comments. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
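For what it's worth, a lock level can be spot-checked from SQL along these lines (a sketch, not the patch's actual regression test; the expected mode assumes the patch gives SET (fillfactor) ShareUpdateExclusiveLock, per the thread's intent):

BEGIN;
ALTER TABLE pgbench_accounts SET (fillfactor = 100);
SELECT mode FROM pg_locks
 WHERE relation = 'pgbench_accounts'::regclass AND granted;
-- expect ShareUpdateExclusiveLock rather than AccessExclusiveLock
COMMIT;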
Hi, On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: > > * This definitely should include isolationtester tests actually > > performing concurrent ALTER TABLEs. All that's currently there is > > tests that the locklevel isn't too high, but not that it actually works. > > There is no concurrent behaviour here, hence no code that would be > exercised by concurrent tests. Huh? There's most definitely new concurrent behaviour. Previously no other backends could have a relation open (and locked) while it got altered (which then sends out relcache invalidations). That's something that should be tested. > > * Why does ChangeOwner need AEL? > > Ownership affects privileges, which includes SELECTs, hence AEL. So? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >> > * This definitely should include isolationtester tests actually >> > performing concurrent ALTER TABLEs. All that's currently there is >> > tests that the locklevel isn't too high, but not that it actually works. >> >> There is no concurrent behaviour here, hence no code that would be >> exercised by concurrent tests. > > Huh? There's most definitely new concurrent behaviour. Previously no > other backends could have a relation open (and locked) while it got > altered (which then sends out relcache invalidations). That's something > that should be tested. It has been. High volume concurrent testing has been performed, per Tom's original discussion upthread, but that's not part of the test suite. For other tests I have no guide as to how to write a set of automated regression tests. Anything could cause a failure, so I'd need to write an infinite set of tests to prove there is no bug *somewhere*. How many tests are required? 0, 1, 3, 30? >> > * Why does ChangeOwner need AEL? >> >> Ownership affects privileges, which includes SELECTs, hence AEL. > > So? That reply could be added to any post. Please explain your concern. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: > On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: > > Hi, > > > > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: > >> > * This definitely should include isolationtester tests actually > >> > performing concurrent ALTER TABLEs. All that's currently there is > >> > tests that the locklevel isn't too high, but not that it actually works. > >> > >> There is no concurrent behaviour here, hence no code that would be > >> exercised by concurrent tests. > > > > Huh? There's most definitely new concurrent behaviour. Previously no > > other backends could have a relation open (and locked) while it got > > altered (which then sends out relcache invalidations). That's something > > that should be tested. > > It has been. High volume concurrent testing has been performed, per > Tom's original discussion upthread, but that's not part of the test > suite. Yea, that's not what I am looking for. > For other tests I have no guide as to how to write a set of automated > regression tests. Anything could cause a failure, so I'd need to write > an infinite set of tests to prove there is no bug *somewhere*. How > many tests are required? 0, 1, 3, 30? I think some isolationtester tests for the most important changes in lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, ... while a query is in progress in another session. > >> > * Why does ChangeOwner need AEL? > >> > >> Ownership affects privileges, which includes SELECTs, hence AEL. > > > > So? > > That reply could be added to any post. Please explain your concern. I don't understand why that means it needs an AEL. After all, e.g. changes in table inheritance do *not* require an AEL. I think it's perfectly ok to not go for the minimally required locklevel for all subcommands, but then it should be commented as such and not with "change visible to SELECT" where other operations that do so as well require lower locklevels. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
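Sketched as plain SQL rather than an isolationtester spec, one interleaving of the kind being asked for, reusing Noah's root/part tables from the crash recipe above:

-- session 1
BEGIN;
SELECT * FROM root WHERE c = -1;  -- takes AccessShareLock on root and part

-- session 2, while session 1's transaction remains open; with the patch,
-- this no longer blocks, since ADD CONSTRAINT takes ShareRowExclusiveLock
ALTER TABLE part ADD CHECK (c > 0);

-- session 1
SELECT * FROM root WHERE c = -1;  -- must still behave sanely after the invalidation
COMMIT;

An isolationtester spec would encode the same steps as named permutations.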
On 26 February 2014 07:32, Simon Riggs <simon@2ndquadrant.com> wrote: >> * Are you sure AlterConstraint is generally safe without an AEL? It >> should be safe to change whether something is deferred, but not >> necessarily whether it's deferrable? > > We change the lock levels for individual commands. This patch provides > some initial settings and infrastructure. > > It is possible you are correct that changing the deferrability is not > safe without an AEL. That was enabled for the first time in this > release in a patch added by me after this patch was written. I will > think on that and change, if required. Thoughts... It would be a problem to change a DEFERRABLE constraint into a NOT DEFERRABLE constraint concurrently with a transaction that was currently deferring its constraint checks. It would not be a problem to go in the other direction, relaxing the constraint attributes. The patch uses ShareRowExclusive for AlterConstraint, so no writes are possible concurrently with the table being ALTERed, hence the problem situation cannot arise. So in my understanding, no issue is possible. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
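In SQL terms, the two directions being distinguished look like this (table and constraint names are hypothetical; ALTER CONSTRAINT applies to foreign key constraints in this release):

-- relaxing the attributes: safe per the reasoning above
ALTER TABLE orders ALTER CONSTRAINT orders_customer_fk
    DEFERRABLE INITIALLY DEFERRED;

-- tightening: the direction that would be unsafe if a concurrent
-- transaction could still be deferring its checks on this constraint
ALTER TABLE orders ALTER CONSTRAINT orders_customer_fk NOT DEFERRABLE;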
On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >> >> > * Why does ChangeOwner need AEL? >> >> >> >> Ownership affects privileges, which includes SELECTs, hence AEL. >> > >> > So? >> >> That reply could be added to any post. Please explain your concern. > > I don't understand why that means it needs an AEL. After all, > e.g. changes in table inheritance do *not* require an AEL. I think it's > perfectly ok to not go for the minimally required locklevel for all > subcommands, but then it should be commented as such and not with > "change visible to SELECT" where other operations that do so as well > require lower locklevels. Those are two separate cases, with separate lock levels, so that argument doesn't hold. My understanding of the argument as to why Inheritance doesn't need AEL is that adding/removing children is akin to inserting or deleting rows from the parent. Removing SELECT privilege while running a SELECT would be a different matter. This is all a matter of definition; we can make up any rules we like. Doing so is IMHO a separate patch and not something to hold up the main patch. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >> > Hi, >> > >> > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >> >> > * This definitely should include isolationtester tests actually >> >> > performing concurrent ALTER TABLEs. All that's currently there is >> >> > tests that the locklevel isn't too high, but not that it actually works. >> >> >> >> There is no concurrent behaviour here, hence no code that would be >> >> exercised by concurrent tests. >> > >> > Huh? There's most definitely new concurrent behaviour. Previously no >> > other backends could have a relation open (and locked) while it got >> > altered (which then sends out relcache invalidations). That's something >> > that should be tested. >> >> It has been. High volume concurrent testing has been performed, per >> Tom's original discussion upthread, but that's not part of the test >> suite. > > Yea, that's not what I am looking for. > >> For other tests I have no guide as to how to write a set of automated >> regression tests. Anything could cause a failure, so I'd need to write >> an infinite set of tests to prove there is no bug *somewhere*. How >> many tests are required? 0, 1, 3, 30? > > I think some isolationtester tests for the most important changes in > lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, > ... while a query is in progress in a nother session. OK, I'll work on some tests. v18 attached, with v19 coming soon -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 27 February 2014 08:48, Simon Riggs <simon@2ndquadrant.com> wrote: > On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >> On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >>> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >>> > Hi, >>> > >>> > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >>> >> > * This definitely should include isolationtester tests actually >>> >> > performing concurrent ALTER TABLEs. All that's currently there is >>> >> > tests that the locklevel isn't too high, but not that it actually works. >>> >> >>> >> There is no concurrent behaviour here, hence no code that would be >>> >> exercised by concurrent tests. >>> > >>> > Huh? There's most definitely new concurrent behaviour. Previously no >>> > other backends could have a relation open (and locked) while it got >>> > altered (which then sends out relcache invalidations). That's something >>> > that should be tested. >>> >>> It has been. High volume concurrent testing has been performed, per >>> Tom's original discussion upthread, but that's not part of the test >>> suite. >> >> Yea, that's not what I am looking for. >> >>> For other tests I have no guide as to how to write a set of automated >>> regression tests. Anything could cause a failure, so I'd need to write >>> an infinite set of tests to prove there is no bug *somewhere*. How >>> many tests are required? 0, 1, 3, 30? >> >> I think some isolationtester tests for the most important changes in >> lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, >> ... while a query is in progress in a nother session. > > OK, I'll work on some tests. > > v18 attached, with v19 coming soon v19 complete apart from requested comment additions -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 03/01/2014 12:06 PM, Simon Riggs wrote: > On 27 February 2014 08:48, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >>> On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >>>> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >>>>> Hi, >>>>> >>>>> On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >>>>>>> * This definitely should include isolationtester tests actually >>>>>>> performing concurrent ALTER TABLEs. All that's currently there is >>>>>>> tests that the locklevel isn't too high, but not that it actually works. >>>>>> There is no concurrent behaviour here, hence no code that would be >>>>>> exercised by concurrent tests. >>>>> Huh? There's most definitely new concurrent behaviour. Previously no >>>>> other backends could have a relation open (and locked) while it got >>>>> altered (which then sends out relcache invalidations). That's something >>>>> that should be tested. >>>> It has been. High volume concurrent testing has been performed, per >>>> Tom's original discussion upthread, but that's not part of the test >>>> suite. >>> Yea, that's not what I am looking for. >>> >>>> For other tests I have no guide as to how to write a set of automated >>>> regression tests. Anything could cause a failure, so I'd need to write >>>> an infinite set of tests to prove there is no bug *somewhere*. How >>>> many tests are required? 0, 1, 3, 30? >>> I think some isolationtester tests for the most important changes in >>> lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, >>> ... while a query is in progress in a nother session. >> OK, I'll work on some tests. >> >> v18 attached, with v19 coming soon > v19 complete apart from requested comment additions I've started to look at this patch and re-read the thread. The first thing I noticed is what seems like an automated replace error. The docs say "This form requires only an SHARE x EXCLUSIVE lock." where the "an" was not changed to "a". Attached is a patch-on-patch to fix this. A more complete review will come later. -- Vik
On 1 March 2014 21:25, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 03/01/2014 12:06 PM, Simon Riggs wrote: >> On 27 February 2014 08:48, Simon Riggs <simon@2ndquadrant.com> wrote: >>> On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >>>> On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >>>>> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >>>>>> Hi, >>>>>> >>>>>> On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >>>>>>>> * This definitely should include isolationtester tests actually >>>>>>>> performing concurrent ALTER TABLEs. All that's currently there is >>>>>>>> tests that the locklevel isn't too high, but not that it actually works. >>>>>>> There is no concurrent behaviour here, hence no code that would be >>>>>>> exercised by concurrent tests. >>>>>> Huh? There's most definitely new concurrent behaviour. Previously no >>>>>> other backends could have a relation open (and locked) while it got >>>>>> altered (which then sends out relcache invalidations). That's something >>>>>> that should be tested. >>>>> It has been. High volume concurrent testing has been performed, per >>>>> Tom's original discussion upthread, but that's not part of the test >>>>> suite. >>>> Yea, that's not what I am looking for. >>>> >>>>> For other tests I have no guide as to how to write a set of automated >>>>> regression tests. Anything could cause a failure, so I'd need to write >>>>> an infinite set of tests to prove there is no bug *somewhere*. How >>>>> many tests are required? 0, 1, 3, 30? >>>> I think some isolationtester tests for the most important changes in >>>> lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, >>>> ... while a query is in progress in a nother session. >>> OK, I'll work on some tests. >>> >>> v18 attached, with v19 coming soon >> v19 complete apart from requested comment additions > > I've started to look at this patch and re-read the thread. The first > thing I noticed is what seems like an automated replace error. The docs > say "This form requires only an SHARE x EXCLUSIVE lock." where the "an" > was not changed to "a". > > Attached is a patch-on-patch to fix this. A more complete review will > come later. v20 includes slightly re-ordered checks in GetLockLevel, plus more detailed comments on each group of subcommands. Also corrects grammar as noted by Vik. Plus adds an example of usage to the docs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Removing SELECT privilege while running a SELECT would be a different > matter. This is all a matter of definition; we can make up any rules > we like. Doing so is IMHO a separate patch and not something to hold > up the main patch. So I think this is an interesting point. There are various things that could go wrong as a result of using the wrong lock level. Worst would be that the server crashes or corrupts data. A little less bad would be that sessions error out with inexplicable error conditions, as in SnapshotNow days. Alternatively, we could just have arguably wrong behavior, like basing query results on the old version of the table's metadata even after it's been changed. I don't really care about that second category of behavior. If somebody changes some property of a table and existing sessions continue to use the old value until eoxact, well, we can argue about that, but at least until we have concrete reports of really undesirable behavior, I don't think it's the primary issue. What I'm really concerned about is whether there are other things like the SnapshotNow issues that can cause stuff to halt and catch fire. I don't know whether there are or are not, but that's my concern. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 March 2014 15:19, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > What I'm > really concerned about is whether there are other things like the > SnapshotNow issues that can cause stuff to halt and catch fire. I > don't know whether there are or are not, but that's my concern. Of course its a concern, I feel it also. But that's why we have beta period to handle the unknowns. The question is are there any specific areas of concern here? If not, then we commit because we've done a lot of work on it and at the moment the balance is high benefit to users against a non-specific feeling of risk. @Noah - Last call... -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 3 March 2014 15:19, Robert Haas <robertmhaas@gmail.com> wrote: >> What I'm >> really concerned about is whether there are other things like the >> SnapshotNow issues that can cause stuff to halt and catch fire. I >> don't know whether there are or are not, but that's my concern. > Of course its a concern, I feel it also. But that's why we have beta > period to handle the unknowns. I have exactly zero faith that beta testing would catch low-probability problems in this area. What's needed, and hasn't happened AFAIK, is detailed study of the patch by assorted senior hackers. > The question is are there any specific areas of concern here? If not, > then we commit because we've done a lot of work on it and at the > moment the balance is high benefit to users against a non-specific > feeling of risk. This is backwards. The default decision around here has never been to commit when in doubt. regards, tom lane
On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > v20 includes slightly re-ordered checks in GetLockLevel, plus more > detailed comments on each group of subcommands. > > Also corrects grammar as noted by Vik. > > Plus adds an example of usage to the docs. This patch contains a one line change to src/bin/pg_dump/pg_backup_archiver.c which seems not to belong. This hunk in ATRewriteCatalogs() looks scary: + /* + * If we think we might need to add/re-add toast tables then + * we currently need to hold an AccessExclusiveLock. + */ + if (lockmode < AccessExclusiveLock) + return; It would make sense to me to add an Assert() or elog() check inside the subsequent loop to verify that the lock level is adequate ... but just returning silently seems like a bad idea. I have my doubts about whether it's safe to do AT_AddInherit, AT_DropInherit, AT_AddOf, or AT_DropOf without a full lock. All of those can change the tuple descriptor, and we discussed, back when we did this the first time, the fact that the executor may get *very* unhappy if the tuple descriptor changes in mid-execution. I strongly suspect these are unsafe with less than a full AccessExclusiveLock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 March 2014 15:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 3 March 2014 15:19, Robert Haas <robertmhaas@gmail.com> wrote: >>> What I'm >>> really concerned about is whether there are other things like the >>> SnapshotNow issues that can cause stuff to halt and catch fire. I >>> don't know whether there are or are not, but that's my concern. > >> Of course its a concern, I feel it also. But that's why we have beta >> period to handle the unknowns. > > I have exactly zero faith that beta testing would catch low-probability > problems in this area. What's needed, and hasn't happened AFAIK, is > detailed study of the patch by assorted senior hackers. > >> The question is are there any specific areas of concern here? If not, >> then we commit because we've done a lot of work on it and at the >> moment the balance is high benefit to users against a non-specific >> feeling of risk. > > This is backwards. The default decision around here has never been > to commit when in doubt. I'm not in doubt. If anybody can give me some more pointers of things to look at, I will. But I've looked and I can't see anything more. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 3 March 2014 16:06, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> v20 includes slightly re-ordered checks in GetLockLevel, plus more >> detailed comments on each group of subcommands. >> >> Also corrects grammar as noted by Vik. >> >> Plus adds an example of usage to the docs. > > This patch contains a one line change to > src/bin/pg_dump/pg_backup_archiver.c which seems not to belong. > > This hunk in ATRewriteCatalogs() looks scary: > > + /* > + * If we think we might need to add/re-add toast tables then > + * we currently need to hold an AccessExclusiveLock. > + */ > + if (lockmode < AccessExclusiveLock) > + return; > > It would make sense to me to add an Assert() or elog() check inside > the subsequent loop to verify that the lock level is adequate ... but > just returning silently seems like a bad idea. OK, I will check elog. > I have my doubts about whether it's safe to do AT_AddInherit, > AT_DropInherit, AT_AddOf, or AT_DropOf with a full lock. All of those > can change the tuple descriptor, and we discussed, back when we did > this the first time, the fact that the executor may get *very* unhappy > if the tuple descriptor changes in mid-execution. I strongly suspect > these are unsafe with less than a full AccessExclusiveLock. I'm happy to change those if you feel there is insufficient evidence. I don't personally feel that it would matter to usability to keep locks for those at AccessExclusiveLock, especially since they are otherwise fast. Some others might be kept higher also. I'm merely trying to balance between requests to reduce to minimal theoretical level and fears that anything less than AccessExclusiveLock is a problem. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 3, 2014 at 11:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 3 March 2014 16:06, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> v20 includes slightly re-ordered checks in GetLockLevel, plus more >>> detailed comments on each group of subcommands. >>> >>> Also corrects grammar as noted by Vik. >>> >>> Plus adds an example of usage to the docs. >> >> This patch contains a one line change to >> src/bin/pg_dump/pg_backup_archiver.c which seems not to belong. >> >> This hunk in ATRewriteCatalogs() looks scary: >> >> + /* >> + * If we think we might need to add/re-add toast tables then >> + * we currently need to hold an AccessExclusiveLock. >> + */ >> + if (lockmode < AccessExclusiveLock) >> + return; >> >> It would make sense to me to add an Assert() or elog() check inside >> the subsequent loop to verify that the lock level is adequate ... but >> just returning silently seems like a bad idea. > > OK, I will check elog. > >> I have my doubts about whether it's safe to do AT_AddInherit, >> AT_DropInherit, AT_AddOf, or AT_DropOf with a full lock. All of those >> can change the tuple descriptor, and we discussed, back when we did >> this the first time, the fact that the executor may get *very* unhappy >> if the tuple descriptor changes in mid-execution. I strongly suspect >> these are unsafe with less than a full AccessExclusiveLock. > > I'm happy to change those if you feel there is insufficient evidence. Not sure what that means, but yes, I think the lock levels on those need to be increased. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote: > On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Removing SELECT privilege while running a SELECT would be a different > > matter. This is all a matter of definition; we can make up any rules > > we like. Doing so is IMHO a separate patch and not something to hold > > up the main patch. > > So I think this is an interesting point. There are various things > that could go wrong as a result of using the wrong lock level. Worst > would be that the server crashes or corrupts data. A little less bad > would be that sessions error out with inexplicable error conditions, > as in SnapshotNow days. Alternatively, we could just have arguably > wrong behavior, like basing query results on the old version of the > table's metadata even after it's been changed. I would order the concerns like this: 1. Data corruption 2. Transient, clearly-wrong answers without an error 3. Server crash 4. Catalog logical inconsistency 5. Inexplicable, transient errors 6. Valid behavior capable of surprising more than zero upgraders > I don't really care about that second category of behavior. If > somebody changes some property of a table and existing sessions > continue to use the old value until eoxact, well, we can argue about > that, but at least until we have concrete reports of really > undesirable behavior, I don't think it's the primary issue. What I'm > really concerned about is whether there are other things like the > SnapshotNow issues that can cause stuff to halt and catch fire. I > don't know whether there are or are not, but that's my concern. Since we can't know whether something qualifies as (2) or (6) without analyzing it, I don't find waiting for user complaints to be a good strategy here. An ownership change not immediately affecting ACL checks does fall under (6), for me. (However, changing ownership without AccessExclusiveLock might also create hazards in category (4) for concurrent DDL that performs owner checks.) -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On Mon, Mar 03, 2014 at 03:43:46PM +0000, Simon Riggs wrote: > The question is are there any specific areas of concern here? If not, > then we commit because we've done a lot of work on it and at the > moment the balance is high benefit to users against a non-specific > feeling of risk. > > @Noah - Last call... I am not specifically aware of any outstanding problems. I have planned to give this a close look, but it will be at least two weeks before I dig out far enough to do so. If that makes it a post-commit review, so be it. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 3 March 2014 18:57, Noah Misch <noah@leadboat.com> wrote: > On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote: >> On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > Removing SELECT privilege while running a SELECT would be a different >> > matter. This is all a matter of definition; we can make up any rules >> > we like. Doing so is IMHO a separate patch and not something to hold >> > up the main patch. >> >> So I think this is an interesting point. There are various things >> that could go wrong as a result of using the wrong lock level. Worst >> would be that the server crashes or corrupts data. A little less bad >> would be that sessions error out with inexplicable error conditions, >> as in SnapshotNow days. Alternatively, we could just have arguably >> wrong behavior, like basing query results on the old version of the >> table's metadata even after it's been changed. > > I would order the concerns like this: > > 1. Data corruption > 2. Transient, clearly-wrong answers without an error > 3. Server crash > 4. Catalog logical inconsistency > 5. Inexplicable, transient errors > 6. Valid behavior capable of surprising more than zero upgraders I like your model for risk assessment. How can we apply it in detail in a way that helps us decide? Or do we just go on gut feel? My experience with mentioning such topics is that without structure it results in an assessment of "unacceptable risk" just simply because somebody has mentioned some scary words. >> I don't really care about that second category of behavior. If >> somebody changes some property of a table and existing sessions >> continue to use the old value until eoxact, well, we can argue about >> that, but at least until we have concrete reports of really >> undesirable behavior, I don't think it's the primary issue. What I'm >> really concerned about is whether there are other things like the >> SnapshotNow issues that can cause stuff to halt and catch fire. I >> don't know whether there are or are not, but that's my concern. > > Since we can't know whether something qualifies as (2) or (6) without > analyzing it, I don't find waiting for user complaints to be a good strategy > here. An ownership change not immediately affecting ACL checks does fall > under (6), for me. (However, changing ownership without AccessExclusiveLock > might also create hazards in category (4) for concurrent DDL that performs > owner checks.) err, guys, you do realise that changing ownership is staying at AccessExclusiveLock in this patch? (and I haven't ever suggested lowering that). -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 3 March 2014 16:36, Robert Haas <robertmhaas@gmail.com> wrote: >>> This hunk in ATRewriteCatalogs() looks scary: >>> >>> + /* >>> + * If we think we might need to add/re-add toast tables then >>> + * we currently need to hold an AccessExclusiveLock. >>> + */ >>> + if (lockmode < AccessExclusiveLock) >>> + return; >>> >>> It would make sense to me to add an Assert() or elog() check inside >>> the subsequent loop to verify that the lock level is adequate ... but >>> just returning silently seems like a bad idea. >> >> OK, I will check elog. >> >>> I have my doubts about whether it's safe to do AT_AddInherit, >>> AT_DropInherit, AT_AddOf, or AT_DropOf with a full lock. All of those >>> can change the tuple descriptor, and we discussed, back when we did >>> this the first time, the fact that the executor may get *very* unhappy >>> if the tuple descriptor changes in mid-execution. I strongly suspect >>> these are unsafe with less than a full AccessExclusiveLock. >> >> I'm happy to change those if you feel there is insufficient evidence. > > Not sure what that means, but yes, I think the lock levels on those > need to be increased. v21 with all requested changes, comments and cleanup -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Mon, Mar 03, 2014 at 07:19:45PM +0000, Simon Riggs wrote: > On 3 March 2014 18:57, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote: > >> On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> > Removing SELECT privilege while running a SELECT would be a different > >> > matter. This is all a matter of definition; we can make up any rules > >> > we like. Doing so is IMHO a separate patch and not something to hold > >> > up the main patch. > >> > >> So I think this is an interesting point. There are various things > >> that could go wrong as a result of using the wrong lock level. Worst > >> would be that the server crashes or corrupts data. A little less bad > >> would be that sessions error out with inexplicable error conditions, > >> as in SnapshotNow days. Alternatively, we could just have arguably > >> wrong behavior, like basing query results on the old version of the > >> table's metadata even after it's been changed. > > > > I would order the concerns like this: > > > > 1. Data corruption > > 2. Transient, clearly-wrong answers without an error > > 3. Server crash > > 4. Catalog logical inconsistency > > 5. Inexplicable, transient errors > > 6. Valid behavior capable of surprising more than zero upgraders > > I like your model for risk assessment. How can we apply it in detail > in a way that helps us decide? Or do we just go on gut feel? > > My experience with mentioning such topics is that without structure it > results in an assessment of "unacceptable risk" just simply because > somebody has mentioned some scary words. True; it's tough to make use of such a prioritization. By the time you can confidently assign something to a category, you can probably just fix it. (Or, in the case of category (6), document/release-note it.) Just to be clear, that list is not a commentary on the particular patch at hand. Those are merely the kinds of regressions to look for in a patch affecting this area of the code. > err, guys, you do realise that changing ownership is staying at > AccessExclusiveLock in this patch? > (and I haven't ever suggested lowering that). Yep. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Noah Misch <noah@leadboat.com> writes: > Just to be clear, that list is not a commentary on the particular patch at > hand. Those are merely the kinds of regressions to look for in a patch > affecting this area of the code. A complaint on pgsql-bugs just now reminded me of a specific area that needs to be looked at hard: how bad are the implications for pg_dump? Up to now, pg_dump could be reasonably confident that once it had AccessShareLock on every table it intended to dump, there would be no schema changes happening on those tables until it got done. This greatly ameliorates the snapshot-skew problems that arise from its habit of doing some things for itself and other things via backend-internal functions (which historically used SnapshotNow and now use a fresh MVCC snapshot, either way potentially quite newer than the transaction snapshot pg_dump's own queries will use). I suspect that lowering the lock requirements for anything that affects what pg_dump can see is going to make things a whole lot worse in terms of consistency of pg_dump output in the face of concurrent DDL. Admittedly, we're not perfect in that area now, but I'm not sure that's an adequate excuse for making it worse. regards, tom lane
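The skew Tom describes can be sketched as follows (object names are illustrative): pg_dump's own queries run under its transaction snapshot, while server-side helpers consult current catalog state:

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    -- pg_dump's own catalog queries see the transaction snapshot:
    SELECT relname FROM pg_class WHERE relname = 'accounts';
    -- ... concurrent DDL commits here ...
    -- but a server-side helper such as pg_get_indexdef() reads the
    -- catalogs with a current snapshot, so it can reflect DDL committed
    -- after the transaction snapshot was taken:
    SELECT pg_get_indexdef(indexrelid) FROM pg_index
    WHERE indrelid = 'accounts'::regclass;
    COMMIT;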
On 2014-03-03 19:15:27 -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > Just to be clear, that list is not a commentary on the particular patch at > > hand. Those are merely the kinds of regressions to look for in a patch > > affecting this area of the code. > > A complaint on pgsql-bugs just now reminded me of a specific area that > needs to be looked at hard: how bad are the implications for pg_dump? > > Up to now, pg_dump could be reasonably confident that once it had > AccessShareLock on every table it intended to dump, there would be no > schema changes happening on those tables until it got done. The guarantee wasn't actually that strong. It already was quite possible that indexes got created/dropped during that time, which probably is the by far most frequent DDL run in production. > This greatly > ameliorates the snapshot-skew problems that arise from its habit of doing > some things for itself and other things via backend-internal functions > (which historically used SnapshotNow and now use a fresh MVCC snapshot, > either way potentially quite newer than the transaction snapshot pg_dump's > own queries will use). Yea, I wonder if we shouldn't start to make them use a different snapshot. It's the pg_get_*def() functions, or is there something else? Afair (I really haven't rechecked) all the actions that have a changed locklevels affect things that pg_dump recreates clientside, using a repeatable read snapshot, so there shouldn't be much change there? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-03 19:15:27 -0500, Tom Lane wrote: >> This greatly >> ameliorates the snapshot-skew problems that arise from its habit of doing >> some things for itself and other things via backend-internal functions >> (which historically used SnapshotNow and now use a fresh MVCC snapshot, >> either way potentially quite newer than the transaction snapshot pg_dump's >> own queries will use). > Yea, I wonder if we shouldn't start to make them use a different > snapshot. It's the pg_get_*def() functions, or is there something else? See past discussions. By the time you trace all of ruleutils.c's dependencies, a dauntingly large fraction of the backend's basic catalog access support is implicated. For example, you'd need some way of making the catcaches return data that they know to be outdated. And I'm pretty sure pg_dump is making free use of stuff that isn't even in ruleutils. I would like to see pg_dump doing something much more bulletproof, but I'm afraid that there isn't any nice simple fix available. One relatively narrow fix that would probably make it a lot better *in the current state of affairs* is to provide a way whereby, once pg_dump has locks on everything it wants to dump, it can advance its transaction snapshot to current time. Then at least both its own queries and the backend's lookups will see post-locking state. However, if AccessShareLock isn't enough to block DDL on the tables then we're still hosed. > Afair (I really haven't rechecked) all the actions that have a changed > locklevels affect things that pg_dump recreates clientside, using a > repeatable read snapshot, so there shouldn't be much change there? You're missing the point entirely if you think pg_dump recreates everything client-side. It's never done that 100%, and we've migrated more and more stuff to the server side over time to avoid duplicate implementations of things like index expression decompilation. So while a theoretical answer would be to remove all of pg_dump's dependency on server-side functions, I am pretty sure that's never going to happen. regards, tom lane
On 2014-03-03 20:32:13 -0500, Tom Lane wrote: > > Afair (I really haven't rechecked) all the actions that have a changed > > locklevels affect things that pg_dump recreates clientside, using a > > repeatable read snapshot, so there shouldn't be much change there? > > You're missing the point entirely if you think pg_dump recreates > everything client-side. No, I am obviously not thinking that. What I mean is that the things that actually change their locking requirement in the proposed patch primarily influence things that are reconstructed clientside by pg_dump. E.g. ALTER TABLE ... CLUSTER ON, SET(...), ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-03 20:32:13 -0500, Tom Lane wrote: >> You're missing the point entirely if you think pg_dump recreates >> everything client-side. > No, I am obviously not thinking that. What I mean is that the things > that actually change their locking requirement in the proposed patch > primarily influence things that are reconstructed clientside by > pg_dump. E.g. ALTER TABLE ... CLUSTER ON, SET(...), ... [ raised eyebrow... ] I'm pretty sure that no such constraint was part of the design discussion to start with. Even if it accidentally happens to be the case now, it sounds utterly fragile. regards, tom lane
On 4 March 2014 01:07, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-03-03 19:15:27 -0500, Tom Lane wrote: >> Noah Misch <noah@leadboat.com> writes: >> > Just to be clear, that list is not a commentary on the particular patch at >> > hand. Those are merely the kinds of regressions to look for in a patch >> > affecting this area of the code. >> >> A complaint on pgsql-bugs just now reminded me of a specific area that >> needs to be looked at hard: how bad are the implications for pg_dump? >> >> Up to now, pg_dump could be reasonably confident that once it had >> AccessShareLock on every table it intended to dump, there would be no >> schema changes happening on those tables until it got done. > > The guarantee wasn't actually that strong. It already was quite possible > that indexes got created/dropped during that time, which probably is the > by far most frequent DDL run in production. Good points. In most cases, DDL is applied manually after careful thought, so people seldom dump at the same time they upgrade the database. This is especially true for pg_dump since it captures the logical definition of tables. So most people will be happy with the default locking, but we could make the lock level optional. Currently we use AccessShareLock. Locking out all DDL, even with this patch applied would only require ShareUpdateExclusiveLock. Looking at the code, it will take about an hour to add an option to pg_dump that specifies the lock level used when dumping. I would be happy to include that as part of this patch. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
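Per dumped table, the proposed pg_dump option would boil down to roughly this (a sketch; the option itself is not yet written):

    BEGIN;
    -- instead of the usual ACCESS SHARE:
    LOCK TABLE pgbench_accounts IN SHARE UPDATE EXCLUSIVE MODE;
    -- this conflicts with the reduced-lock ALTER TABLE forms (which take
    -- ShareUpdateExclusiveLock under the patch) for the life of the dump,
    -- while normal reads and writes continue unimpeded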
> Good points.
>
> In most cases, DDL is applied manually after careful thought, so
> people seldom dump at the same time they upgrade the database. This is
> especially true for pg_dump since it captures the logical definition
> of tables. So most people will be happy with the default locking, but
> we could make the lock level optional.
>
> Currently we use AccessShareLock. Locking out all DDL, even with this
> patch applied would only require ShareUpdateExclusiveLock.
>
> Looking at the code, it will take about an hour to add an option to
> pg_dump that specifies the lock level used when dumping. I would be
> happy to include that as part of this patch.

I think the use case for specifying multiple locks is pretty slim given
that a ShareUpdateExclusiveLock is good enough mostly for everybody.

If its not the case, the user should be more careful about when he is
scheduling backups to so that they dont conflict with DDL changes.

I am not too comfortable with exposing the locking type to the user.
That may be just me though.

Regards,
Atri

--
Regards,
Atri
l'apprenant
On 4 March 2014 08:39, Atri Sharma <atri.jiit@gmail.com> wrote: > >> >> Good points. >> >> In most cases, DDL is applied manually after careful thought, so >> people seldom dump at the same time they upgrade the database. This is >> especially true for pg_dump since it captures the logical definition >> of tables. So most people will be happy with the default locking, but >> we could make the lock level optional. >> >> Currently we use AccessShareLock. Locking out all DDL, even with this >> patch applied would only require ShareUpdateExclusiveLock. >> >> Looking at the code, it will take about an hour to add an option to >> pg_dump that specifies the lock level used when dumping. I would be >> happy to include that as part of this patch. > > > > I think the use case for specifying multiple locks is pretty slim given that > a ShareUpdateExclusiveLock is good enough mostly for everybody. Increasing the lock strength would be a change in behaviour that might adversely affect existing users. > If its not the case, the user should be more careful about when he is > scheduling backups to so that they dont conflict with DDL changes. That is most certainly the wise choice. > I am not too comfortable with exposing the locking type to the user. That > may be just me though. Why would that be a problem? Hard reasons, please. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
>> If its not the case, the user should be more careful about when he is
>> scheduling backups to so that they dont conflict with DDL changes.
>
> That is most certainly the wise choice.
>
>> I am not too comfortable with exposing the locking type to the user. That
>> may be just me though.
>
> Why would that be a problem? Hard reasons, please.

Should we genuinely depend on the user's good judgement to decide the
locking types?

Regards,
Atri
l'apprenant
On 4 March 2014 09:31, Simon Riggs <simon@2ndquadrant.com> wrote: > On 4 March 2014 08:39, Atri Sharma <atri.jiit@gmail.com> wrote: >> >>> >>> Good points. >>> >>> In most cases, DDL is applied manually after careful thought, so >>> people seldom dump at the same time they upgrade the database. This is >>> especially true for pg_dump since it captures the logical definition >>> of tables. So most people will be happy with the default locking, but >>> we could make the lock level optional. >>> >>> Currently we use AccessShareLock. Locking out all DDL, even with this >>> patch applied would only require ShareUpdateExclusiveLock. >>> >>> Looking at the code, it will take about an hour to add an option to >>> pg_dump that specifies the lock level used when dumping. I would be >>> happy to include that as part of this patch. >> >> >> >> I think the use case for specifying multiple locks is pretty slim given that >> a ShareUpdateExclusiveLock is good enough mostly for everybody. > > Increasing the lock strength would be a change in behaviour that might > adversely affect existing users. The main impact I see is that this would block VACUUM while pg_dump runs. But then, while pg_dump runs VACUUM is ineffective anyway so perhaps that is no bad thing. Autovacuum requests VACOPT_NOWAIT so would skip the relations being dumped rather than waiting. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 4, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The main impact I see is that this would block VACUUM while pg_dump runs. > > But then, while pg_dump runs VACUUM is ineffective anyway so perhaps > that is no bad thing. Well, a vacuum that's already running when pg_dump starts up may be doing a lot of good, so it would be a shame to see pg_dump kill them all off. Also, this would put us in the surprising situation that you can't run two simultaneous dumps of overlapping sets of tables, which doesn't strike me as a great thing. I'd really like to see us find a way to apply some version of this patch. I was in favor of the concept 3 years ago when we did this the first time, and I've subsequently done quite a bit of work (viz., MVCC catalog snapshots) to eliminate the main objection that was raised at that time. But it's really hard to reason about what might happen with lowered lock levels, and convince yourself that there's absolutely nothing that can ever go wrong. I don't know what to do about that tension, but I think even modest improvements in this area stand to benefit an awful lot of users. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I'd really like to see us find a way to apply some version of this
> patch. I was in favor of the concept 3 years ago when we did this the
> first time, and I've subsequently done quite a bit of work (viz., MVCC
> catalog snapshots) to eliminate the main objection that was raised at
> that time. But it's really hard to reason about what might happen
> with lowered lock levels, and convince yourself that there's
> absolutely nothing that can ever go wrong. I don't know what to do
> about that tension, but I think even modest improvements in this area
> stand to benefit an awful lot of users.

Wouldn't MVCC's strictness rules pose harder restrictions on pg_dump
instead of relaxing them?

Regards,
Atri

--
Regards,
Atri
l'apprenant
On 4 March 2014 12:18, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 4, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The main impact I see is that this would block VACUUM while pg_dump runs. >> >> But then, while pg_dump runs VACUUM is ineffective anyway so perhaps >> that is no bad thing. > > Well, a vacuum that's already running when pg_dump starts up may be > doing a lot of good, so it would be a shame to see pg_dump kill them > all off. > > Also, this would put us in the surprising situation that you can't run > two simultaneous dumps of overlapping sets of tables, which doesn't > strike me as a great thing. These changes in concurrency are the most serious objections and a definite change in previous behaviour. So we cannot pick a single lock level that suits all goals the user may have. > I'd really like to see us find a way to apply some version of this > patch. I was in favor of the concept 3 years ago when we did this the > first time, and I've subsequently done quite a bit of work (viz., MVCC > catalog snapshots) to eliminate the main objection that was raised at > that time. But it's really hard to reason about what might happen > with lowered lock levels, and convince yourself that there's > absolutely nothing that can ever go wrong. I don't know what to do > about that tension, but I think even modest improvements in this area > stand to benefit an awful lot of users. Agreed. The question is, which subset? The issue just raised would affect whichever subset we choose, so reducing the scope of the patch does nothing to the impact of the pg_dump issue. I will add the option to change lock level for pg_dump. It's simple to use, clear as to why it would be needed and effective at removing this as an obstacle. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Atri Sharma (atri.jiit@gmail.com) wrote: > If its not the case, the user should be more careful about when he is > scheduling backups to so that they dont conflict with DDL changes. I'm not following this as closely as I'd like to, but I wanted to voice my opinion that this is just not acceptable as a general answer. There are a good many applications out there which do DDL as part of ongoing activity (part of ETL, or something else) and still need to be able to get a pg_dump done. It's not a design I'd recommend, but I don't think we get to just write it off either. Thanks, Stephen
On Tue, Mar 4, 2014 at 8:19 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Atri Sharma (atri.jiit@gmail.com) wrote:
>> If its not the case, the user should be more careful about when he is
>> scheduling backups to so that they dont conflict with DDL changes.
>
> I'm not following this as closely as I'd like to, but I wanted to voice
> my opinion that this is just not acceptable as a general answer. There
> are a good many applications out there which do DDL as part of ongoing
> activity (part of ETL, or something else) and still need to be able to
> get a pg_dump done. It's not a design I'd recommend, but I don't think
> we get to just write it off either.

Well, that will require something like MVCC or stricter locking in
general. That is not in line with the aim of this patch, hence I raised
this point.

Regards,
Atri
l'apprenant
Stephen Frost escribió: > * Atri Sharma (atri.jiit@gmail.com) wrote: > > If its not the case, the user should be more careful about when he is > > scheduling backups to so that they dont conflict with DDL changes. > > I'm not following this as closely as I'd like to, but I wanted to voice > my opinion that this is just not acceptable as a general answer. There > are a good many applications out there which do DDL as part of ongoing > activity (part of ETL, or something else) and still need to be able to > get a pg_dump done. It's not a design I'd recommend, but I don't think > we get to just write it off either. Agreed -- "user caution" is a recipe for trouble, because these things cannot always be planned in minute detail (or such planning creates an excessive cost.) One concern is schema changes that make a dump unrestorable, for instance if there's a foreign key relationship between tables A and B, such that pg_dump dumps the FK for table A but by the time it dumps table B the unique index has gone and thus restoring the FK fails. If this is a realistic failure scenario, then we need some mechanism to avoid it. One possible idea would be to create a new lock level which conflicts with DDL changes but not with regular operation including dumps; so it wouldn't self-conflict but it would conflict with ShareUpdateExclusive. pg_dump would acquire a lock of that level instead of AccessShare; thus two pg_dumps would be able to run on the same table simultaneously, but it would block and be blocked by DDL changes that grab SUE. The big hole in this is that pg_dump would still block vacuum, which is a problem. I hesitate to suggest two extra levels, one for dumps (which wouldn't conflict with SUE) and one for non-exclusive DDL changes (which would.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
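Alvaro's unrestorable-dump scenario, made concrete (table names are illustrative):

    -- Schema when the dump starts:
    CREATE TABLE a (id int PRIMARY KEY);
    CREATE TABLE b (a_id int REFERENCES a (id));

    -- pg_dump emits b's foreign key early on.  If concurrent DDL then
    -- drops the constraint and its underlying unique index before the
    -- dump reaches the corresponding definitions for table a, the dump
    -- contains a FOREIGN KEY clause with no unique index to back it,
    -- and the restore fails.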
On Tue, Mar 4, 2014 at 10:17 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > One possible idea would be to create a new lock level which conflicts > with DDL changes but not with regular operation including dumps; so it > wouldn't self-conflict but it would conflict with ShareUpdateExclusive. > pg_dump would acquire a lock of that level instead of AccessShare; thus > two pg_dumps would be able to run on the same table simultaneously, but > it would block and be blocked by DDL changes that grab SUE. The big > hole in this is that pg_dump would still block vacuum, which is a > problem. I hesitate to suggest two extra levels, one for dumps (which > wouldn't conflict with SUE) and one for non-exclusive DDL changes (which > would.) AFAIK, the only reason why vacuum takes ShareUpdateExclusive lock is because it can't run at the same time as another vacuum. I tend to think (and have thought for some time) that we really ought to have vacuum take AccessShareLock on the relation plus some other lock that is specific to vacuum (say, a "relation vacuum" lock, just as we have "relation extension" locks). Your idea of a lock strong enough to conflict with DDL but not self-conflicting is interesting, too, but I can't claim to have thought through it all that carefully just yet. I think this is all too late for 9.4, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
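The self-conflict Robert refers to is easy to see (table name taken from the earlier pgbench example):

    -- Session 1
    VACUUM pgbench_accounts;          -- takes ShareUpdateExclusiveLock

    -- Session 2, concurrently
    ALTER TABLE pgbench_accounts SET (fillfactor = 90);
    -- under the patch this also wants ShareUpdateExclusiveLock, which is
    -- self-conflicting, so it waits for the VACUUM (and vice versa)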
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > One concern is schema changes that make a dump unrestorable, for > instance if there's a foreign key relationship between tables A and B, Yeah. Ideally, what pg_dump would produce would be a consistent snapshot of the database state as of its transaction snapshot time. We have always had that guarantee so far as user data was concerned, but it's been shaky (and getting worse) so far as the database schema is concerned. What bothers me about the current patch is that it's going to make it a whole lot more worse. Also, I don't have any love at all for proposals that we increase the lock level that pg_dump holds. pg_dump tends to run for a long time. I've not been paying all that much attention to the logical-decoding patches, but wasn't there something in there about being able to see the catalog state as it was at some point in the past? If so, maybe we could leverage that to allow a backend to enter a "pg_dump state" wherein its view of the catalogs was frozen at its transaction start snapshot. We'd have to restrict it to read-only operation for safety, but that's surely no problem for pg_dump. If we had that, then this whole problem of server-side computations producing inconsistent results would go away. There might still be a window wherein tables visible at transaction start could be dropped before AccessShareLock could be acquired, but I think we could let pg_dump error out in that case. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Yeah. Ideally, what pg_dump would produce would be a consistent snapshot > of the database state as of its transaction snapshot time. We have always > had that guarantee so far as user data was concerned, but it's been shaky > (and getting worse) so far as the database schema is concerned. What > bothers me about the current patch is that it's going to make it a whole > lot more worse. > Also, I don't have any love at all for proposals that we increase the lock > level that pg_dump holds. pg_dump tends to run for a long time. Agreed. > I've not been paying all that much attention to the logical-decoding > patches, but wasn't there something in there about being able to see > the catalog state as it was at some point in the past? If so, maybe > we could leverage that to allow a backend to enter a "pg_dump state" > wherein its view of the catalogs was frozen at its transaction start > snapshot. We'd have to restrict it to read-only operation for safety, > but that's surely no problem for pg_dump. If we had that, then this > whole problem of server-side computations producing inconsistent > results would go away. That certainly sounds like a tempting idea. > There might still be a window wherein tables visible at transaction start > could be dropped before AccessShareLock could be acquired, but I think > we could let pg_dump error out in that case. I don't have too much of an issue with the above, but I would like to have us figure out a solution to the deadlock problem with parallel pg_dump. The issue arises when pg_dump gets an AccessShareLock and then another process attempts to acquire an AccessExclusiveLock, which then blocks, and then the pg_dump worker process tries to get its AccessShareLock- we end up not being able to make any progress on anything at that point. One suggestion that was discussed at PGConf.EU was having processes which share the same snapshot (the pg_dump master and worker processes) able to either share the same locks or at least be able to "jump" the lock queue (that is, the worker process wouldn't have to wait being the AEL to get an ASL, since the ASL was already aquired for the snapshot which was exported and shared with it). Thanks, Stephen
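The deadlock Stephen describes comes down to lock-queue ordering; a sketch with an arbitrary dumped table t (names illustrative):

    -- pg_dump master:
    BEGIN;
    LOCK TABLE t IN ACCESS SHARE MODE;   -- granted

    -- unrelated session:
    ALTER TABLE t ADD COLUMN c int;      -- wants AccessExclusiveLock,
                                         -- queues behind the master's lock

    -- pg_dump worker, on its own connection:
    LOCK TABLE t IN ACCESS SHARE MODE;   -- queues behind the waiting
                                         -- AccessExclusiveLock request, so
                                         -- the dump cannot proceed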
Robert Haas <robertmhaas@gmail.com> writes: > I think this is all too late for 9.4, though. I agree with the feeling that a meaningful fix for pg_dump isn't going to get done for 9.4. So that leaves us with the alternatives of (1) put off the lock-strength-reduction patch for another year; (2) push it anyway and accept a reduction in pg_dump reliability. I don't care for (2). I'd like to have lock strength reduction as much as anybody, but it can't come at the price of reduction of reliability. The bigger picture here is that it seems like anytime I've thought for more than five minutes about the lock strength reduction patch, I've come up with some fundamental problem. That doesn't leave me with a warm feeling that we're getting close to having something committable. regards, tom lane
Tom Lane escribió: > I'd like to have lock strength reduction as much as anybody, but it > can't come at the price of reduction of reliability. Can we have at least a cut-down version of it? If we can just reduce the lock level required for ALTER TABLE / VALIDATE, that would be an enormous improvement already. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
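The VALIDATE case Alvaro singles out is the two-step pattern (constraint name and predicate here are hypothetical):

    -- Step 1: quick, does not scan the table
    ALTER TABLE pgbench_accounts
      ADD CONSTRAINT abalance_nonneg CHECK (abalance >= 0) NOT VALID;

    -- Step 2: scans existing rows; under the patch this takes only
    -- ShareUpdateExclusiveLock, so reads and writes continue while it runs
    ALTER TABLE pgbench_accounts VALIDATE CONSTRAINT abalance_nonneg;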
Stephen Frost <sfrost@snowman.net> writes: > I don't have too much of an issue with the above, but I would like to > have us figure out a solution to the deadlock problem with parallel > pg_dump. The issue arises when pg_dump gets an AccessShareLock and then > another process attempts to acquire an AccessExclusiveLock, which then > blocks, and then the pg_dump worker process tries to get its > AccessShareLock- we end up not being able to make any progress on > anything at that point. The original ASL was acquired by the pg_dump master, right? > One suggestion that was discussed at PGConf.EU was having processes > which share the same snapshot (the pg_dump master and worker processes) > able to either share the same locks or at least be able to "jump" the > lock queue (that is, the worker process wouldn't have to wait being the > AEL to get an ASL, since the ASL was already aquired for the snapshot > which was exported and shared with it). Yeah, it seems like we need lock export not only snapshot export to make this work nicely. But that sounds orthogonal to the issues being discussed in this thread. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I don't have too much of an issue with the above, but I would like to > > have us figure out a solution to the deadlock problem with parallel > > pg_dump. The issue arises when pg_dump gets an AccessShareLock and then > > another process attempts to acquire an AccessExclusiveLock, which then > > blocks, and then the pg_dump worker process tries to get its > > AccessShareLock- we end up not being able to make any progress on > > anything at that point. > > The original ASL was acquired by the pg_dump master, right? Yup. It goes through and gets ASLs on everything first. > > One suggestion that was discussed at PGConf.EU was having processes > > which share the same snapshot (the pg_dump master and worker processes) > > able to either share the same locks or at least be able to "jump" the > > lock queue (that is, the worker process wouldn't have to wait being the > > AEL to get an ASL, since the ASL was already aquired for the snapshot > > which was exported and shared with it). > > Yeah, it seems like we need lock export not only snapshot export to make > this work nicely. But that sounds orthogonal to the issues being > discussed in this thread. Indeed, just figured I'd mention it since we're talking about pg_dump-related locking. Thanks, Stephen
On Tue, Mar 4, 2014 at 10:20 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> I don't have too much of an issue with the above, but I would like to
>>> have us figure out a solution to the deadlock problem with parallel
>>> pg_dump. The issue arises when pg_dump gets an AccessShareLock and then
>>> another process attempts to acquire an AccessExclusiveLock, which then
>>> blocks, and then the pg_dump worker process tries to get its
>>> AccessShareLock- we end up not being able to make any progress on
>>> anything at that point.
>>
>> The original ASL was acquired by the pg_dump master, right?
>
> Yup. It goes through and gets ASLs on everything first.
>
>>> One suggestion that was discussed at PGConf.EU was having processes
>>> which share the same snapshot (the pg_dump master and worker processes)
>>> able to either share the same locks or at least be able to "jump" the
>>> lock queue (that is, the worker process wouldn't have to wait being the
>>> AEL to get an ASL, since the ASL was already aquired for the snapshot
>>> which was exported and shared with it).
>>
>> Yeah, it seems like we need lock export not only snapshot export to make
>> this work nicely. But that sounds orthogonal to the issues being
>> discussed in this thread.
>
> Indeed, just figured I'd mention it since we're talking about
> pg_dump-related locking.

What happens for foreign key constraints? For pg_dump, do we lock the
tables referenced by the table which is being dumped right now? If that
is the case, wouldn't an MVCC-based approach be the best way for this?

Please ignore this if I said anything silly. I am just trying to
understand how it works here.

Regards,
Atri
On 2014-03-04 11:40:10 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think this is all too late for 9.4, though.
>
> I agree with the feeling that a meaningful fix for pg_dump isn't going
> to get done for 9.4. So that leaves us with the alternatives of (1)
> put off the lock-strength-reduction patch for another year; (2) push
> it anyway and accept a reduction in pg_dump reliability.
>
> I don't care for (2). I'd like to have lock strength reduction as
> much as anybody, but it can't come at the price of reduction of
> reliability.

I am sorry, but I think this is vastly overstating the scope of the
pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the
amount of problems that has caused in the past is surprisingly low. If
such a frequently used command doesn't cause problems, why are you
assuming other commands to be that problematic? And I think it's hard
to argue that the proposed changes are more likely to cause problems.

Let's try to go at this a bit more methodically. The commands that -
afaics - change their locklevel due to the latest patch (v21) are:

01) ALTER TABLE .. ADD CONSTRAINT
02) ALTER TABLE .. ADD CONSTRAINT ... USING
03) ALTER TABLE .. ENABLE | DISABLE [ REPLICA | ALWAYS ] TRIGGER [ ALL ]
04) ALTER TABLE .. ALTER CONSTRAINT
05) ALTER TABLE .. REPLICA IDENTITY
06) ALTER TABLE .. ALTER COLUMN .. SET NOT NULL (*not* DROP NOT NULL)
    => cmd_lockmode = ShareRowExclusiveLock;

07) ALTER TABLE ... ALTER COLUMN ... SET STATISTICS
08) ALTER TABLE ... CLUSTER ON ...
09) ALTER TABLE ... SET WITHOUT CLUSTER
10) ALTER TABLE ... SET (...)
11) ALTER TABLE ... RESET (...)
12) ALTER TABLE ... ALTER COLUMN ... SET (...)
13) ALTER TABLE ... ALTER COLUMN ... RESET (...)
14) ALTER TABLE ... VALIDATE CONSTRAINT constraint_name
    => cmd_lockmode = ShareUpdateExclusiveLock;

I have my reservations about ADD CONSTRAINT (including SET NOT NULL)
being unproblematic (mostly because I haven't thought through possible
consequences for the planner making different choices with added
constraints). From the perspective of pg_dump consistency, except ADD
CONSTRAINT, none of those seem to have graver possible consequences
than CREATE INDEX (and DROP INDEX CONCURRENTLY) already being unsafe.
In fact all of those should actually end up being *safer*, even ending
up always being dumped consistently, since they are all reconstructed
clientside by pg_dump.

You argue elsewhere that that's a fragile coincidence. But so what:
even if it changes, the consequences are still going to be *far* less
significant than missing various index, trigger, and whatnot commands.

I think the set of problems you mention is going to be really important
when we someday get around to making stuff like ALTER TABLE ... ADD/DROP
COLUMN require lower lock levels, but that's not what's proposed.

Greetings,

Andres Freund

--
Andres Freund                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
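Any entry on this list can be spot-checked from psql without reading the code (the storage parameter is just an example):

    -- Session 1: run the command and hold its transaction open
    BEGIN;
    ALTER TABLE pgbench_accounts SET (fillfactor = 90);

    -- Session 2: inspect the lock actually taken
    SELECT mode, granted FROM pg_locks
    WHERE relation = 'pgbench_accounts'::regclass;
    -- with the patch applied one expects ShareUpdateExclusiveLock here
    -- rather than AccessExclusiveLock

    -- Session 1:
    COMMIT;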
On 4 March 2014 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> One concern is schema changes that make a dump unrestorable, for >> instance if there's a foreign key relationship between tables A and B, > > Yeah. Ideally, what pg_dump would produce would be a consistent snapshot > of the database state as of its transaction snapshot time. We have always > had that guarantee so far as user data was concerned, but it's been shaky > (and getting worse) so far as the database schema is concerned. What > bothers me about the current patch is that it's going to make it a whole > lot more worse. While thinking this through it occurs to me that there is no problem at all. Your earlier claim that the dump is inconsistent just isn't accurate. We now have MVCC catalogs, so any dump is going to see a perfectly consistent set of data plus DDL. OK the catalogs may change AFTER the snapshot was taken for the dump, but then so can the data change - that's just MVCC. I was going to add an option to increase lock level, but I can't see why you'd want it even. The dumps are consistent... -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On March 4, 2014 8:39:55 PM CET, Simon Riggs <simon@2ndQuadrant.com> wrote: >On 4 March 2014 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> One concern is schema changes that make a dump unrestorable, for >>> instance if there's a foreign key relationship between tables A and B, >> >> Yeah. Ideally, what pg_dump would produce would be a consistent snapshot >> of the database state as of its transaction snapshot time. We have always >> had that guarantee so far as user data was concerned, but it's been shaky >> (and getting worse) so far as the database schema is concerned. What >> bothers me about the current patch is that it's going to make it a whole >> lot more worse. > >While thinking this through it occurs to me that there is no problem at >all. > >Your earlier claim that the dump is inconsistent just isn't accurate. >We now have MVCC catalogs, so any dump is going to see a perfectly >consistent set of data plus DDL. OK the catalogs may change AFTER the >snapshot was taken for the dump, but then so can the data change - >that's just MVCC. > >I was going to add an option to increase lock level, but I can't see >why you'd want it even. The dumps are consistent... Mvcc scans only guarantee that individual scans are consistent, not that separate scans are. Each individual scan takes a new snapshot if there's been ddl. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-04 11:40:10 -0500, Tom Lane wrote: >> I don't care for (2). I'd like to have lock strength reduction as >> much as anybody, but it can't come at the price of reduction of >> reliability. > I am sorry, but I think this is vastly overstating the scope of the > pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the > amount of problems that has caused in the past is surprisingly low. CREATE INDEX happens to be okay because pg_dump won't try to dump indexes it doesn't see in its snapshot, ie the list of indexes to dump is created client-side. CREATE INDEX CONCURRENTLY, otoh, already did break pg_dump, and we had to hack things to fix it; see commit 683abc73dff549e94555d4020dae8d02f32ed78b. regards, tom lane
On Tue, Mar 4, 2014 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 4 March 2014 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> One concern is schema changes that make a dump unrestorable, for >>> instance if there's a foreign key relationship between tables A and B, >> >> Yeah. Ideally, what pg_dump would produce would be a consistent snapshot >> of the database state as of its transaction snapshot time. We have always >> had that guarantee so far as user data was concerned, but it's been shaky >> (and getting worse) so far as the database schema is concerned. What >> bothers me about the current patch is that it's going to make it a whole >> lot more worse. > > While thinking this through it occurs to me that there is no problem at all. > > Your earlier claim that the dump is inconsistent just isn't accurate. > We now have MVCC catalogs, so any dump is going to see a perfectly > consistent set of data plus DDL. OK the catalogs may change AFTER the > snapshot was taken for the dump, but then so can the data change - > that's just MVCC. Unfortunately, this isn't correct. The MVCC snapshots taken for catalog scans are "instantaneous"; that is, we take a new, current snapshot for each catalog scan. If all of the ruleutils.c stuff were using the transaction snapshot rather than instantaneous snapshots, this would be right. But as has been previously discussed, that's not the case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 4, 2014 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Your earlier claim that the dump is inconsistent just isn't accurate. >> We now have MVCC catalogs, so any dump is going to see a perfectly >> consistent set of data plus DDL. OK the catalogs may change AFTER the >> snapshot was taken for the dump, but then so can the data change - >> that's just MVCC. > Unfortunately, this isn't correct. The MVCC snapshots taken for > catalog scans are "instantaneous"; that is, we take a new, current > snapshot for each catalog scan. If all of the ruleutils.c stuff were > using the transaction snapshot rather than instantaneous snapshots, > this would be right. But as has been previously discussed, that's not > the case. Yeah. And that's *necessary* for catalog lookups in a normally functioning backend, because we have to see latest data (eg, it wouldn't do for a backend to fail to enforce a just-added CHECK constraint because it was committed after the backend's transaction started). However, it seems possible that we could have a mode in which a read-only session did all its catalog fetches according to the transaction snapshot. That would get us to a situation where the backend-internal lookups that ruleutils relies on would give the same answers as queries done by pg_dump. Robert's work on getting rid of SnapshotNow has probably moved that much closer than it was before, but it's still not exactly a trivial patch. Meanwhile, Andres claimed upthread that none of the currently-proposed reduced-lock ALTER commands affect data that pg_dump is using ruleutils to fetch. If that's the case, then maybe this is a problem that we can punt till later. I've not gone through the list to verify it though. regards, tom lane
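Tom's CHECK-constraint point, spelled out as two sessions (table and constraint are illustrative):

    -- Session 1
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM t;     -- establishes the transaction snapshot

    -- Session 2 (with the reduced lock levels this no longer waits for
    -- session 1's AccessShareLock)
    ALTER TABLE t ADD CHECK (a > 0);

    -- Session 1
    INSERT INTO t VALUES (-1);  -- must still fail: constraints are
                                -- enforced from current catalog state,
                                -- not from the transaction snapshot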
On 03/04/2014 11:43 AM, Andres Freund wrote: > On March 4, 2014 8:39:55 PM CET, Simon Riggs <simon@2ndQuadrant.com> wrote: >> I was going to add an option to increase lock level, but I can't see >> why you'd want it even. The dumps are consistent... > > Mvcc scans only guarantee that individual scans are consistent, not that separate scans are. Each individual scan takes a new snapshot if there's been ddl. > > Andres > I thought that we were sharing the same snapshot, for parallel dump? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-03-04 14:29:31 -0800, Josh Berkus wrote: > On 03/04/2014 11:43 AM, Andres Freund wrote: > > On March 4, 2014 8:39:55 PM CET, Simon Riggs <simon@2ndQuadrant.com> wrote: > >> I was going to add an option to increase lock level, but I can't see > >> why you'd want it even. The dumps are consistent... > > > > Mvcc scans only guarantee that individual scans are consistent, not that separate scans are. Each individual scan takes a new snapshot if there's been ddl. > I thought that we were sharing the same snapshot, for parallel dump? That snapshot is about data, not the catalog. And no, we can't easily reuse one for the other; see elsewhere in this thread for some of the reasons. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
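For reference, the snapshot parallel dump shares is the data snapshot, handed around via the exported-snapshot machinery (9.2 and later), roughly:

  -- leader
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SELECT pg_export_snapshot();            -- returns an id such as '00000003-1'

  -- each worker
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '00000003-1';  -- id copied from the leader

That keeps the *data* reads aligned; the per-scan catalog snapshots Andres describes are taken independently of it.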
On Tue, Mar 4, 2014 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > CREATE INDEX CONCURRENTLY, otoh, already did break pg_dump, > and we had to hack things to fix it; see commit > 683abc73dff549e94555d4020dae8d02f32ed78b. Well pg_dump was only broken in that there was a new catalog state to deal with. But the commit you linked to was fixing pg_upgrade which was broken because the on-disk schema was then out of sync with what pg_dump would generate. But that should only matter for creating or deleting whole relations. -- greg
On 2014-03-04 16:37:48 -0500, Tom Lane wrote: > However, it seems possible that we could have a mode in which a read-only > session did all its catalog fetches according to the transaction snapshot. > That would get us to a situation where the backend-internal lookups that > ruleutils relies on would give the same answers as queries done by > pg_dump. Robert's work on getting rid of SnapshotNow has probably moved > that much closer than it was before, but it's still not exactly a trivial > patch. The most interesting bit seems to be the cache invalidation handling. It would need to be something like PushCatalogSnapshot() which disables applying invals, including catchup interrupts and all. If the sinval queue hasn't overflowed while that snapshot was up, everything is fine; we just need to apply all pending invalidations. If it has, we need to do an InvalidateSystemCaches(). > Meanwhile, Andres claimed upthread that none of the currently-proposed > reduced-lock ALTER commands affect data that pg_dump is using ruleutils > to fetch. If that's the case, then maybe this is a problem that we can > punt till later. I've not gone through the list to verify it though. I think it's true for all but T_AddConstraint, but I am wary about that one anyway. But somebody else should definitely check the list. I wonder if AddConstraintUsing would possibly be safe... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Greg Stark <stark@mit.edu> writes: > On Tue, Mar 4, 2014 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> CREATE INDEX CONCURRENTLY, otoh, already did break pg_dump, >> and we had to hack things to fix it; see commit >> 683abc73dff549e94555d4020dae8d02f32ed78b. > Well pg_dump was only broken in that there was a new catalog state to > deal with. But the commit you linked to was fixing pg_upgrade which > was broken because the on-disk schema was then out of sync with what > pg_dump would generate. No, it was fixing cases that would cause problems with or without pg_upgrade. Arguably that patch made it worse for pg_upgrade, which needed a followon patch (203d8ae2d). regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-04 16:37:48 -0500, Tom Lane wrote: >> However, it seems possible that we could have a mode in which a read-only >> session did all its catalog fetches according to the transaction snapshot. >> That would get us to a situation where the backend-internal lookups that >> ruleutils relies on would give the same answers as queries done by >> pg_dump. Robert's work on getting rid of SnapshotNow has probably moved >> that much closer than it was before, but it's still not exactly a trivial >> patch. > The most interesting bit seems to be the cache invalidation handling. It > would need to be something like PushCatalogSnapshot() which disables > applying invals, including catchup interrupts and all. If the sinval > queue hasn't overflowed while that snapshot was up, everything is fine; we > just need to apply all pending invalidations. If it has, we need to do an > InvalidateSystemCaches(). Yeah, at least within the transaction we would simply ignore invals. To avoid causing sinval queue overrun (which would hurt everyone system-wide), my inclination would be to just drop invals on the floor all the time when in this mode, and forcibly do InvalidateSystemCaches at transaction end. For pg_dump, at least, there is no value in working any harder than that, since it's going to quit at transaction end anyhow. regards, tom lane
On 4 March 2014 21:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Mar 4, 2014 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Your earlier claim that the dump is inconsistent just isn't accurate. >>> We now have MVCC catalogs, so any dump is going to see a perfectly >>> consistent set of data plus DDL. OK the catalogs may change AFTER the >>> snapshot was taken for the dump, but then so can the data change - >>> that's just MVCC. > >> Unfortunately, this isn't correct. The MVCC snapshots taken for >> catalog scans are "instantaneous"; that is, we take a new, current >> snapshot for each catalog scan. If all of the ruleutils.c stuff were >> using the transaction snapshot rather than instantaneous snapshots, >> this would be right. But as has been previously discussed, that's not >> the case. > > Yeah. And that's *necessary* for catalog lookups in a normally > functioning backend, because we have to see latest data (eg, it wouldn't > do for a backend to fail to enforce a just-added CHECK constraint because > it was committed after the backend's transaction started). OK, thanks for explaining. A valuable point to note for us all. > However, it seems possible that we could have a mode in which a read-only > session did all its catalog fetches according to the transaction snapshot. > That would get us to a situation where the backend-internal lookups that > ruleutils relies on would give the same answers as queries done by > pg_dump. Robert's work on getting rid of SnapshotNow has probably moved > that much closer than it was before, but it's still not exactly a trivial > patch. > > Meanwhile, Andres claimed upthread that none of the currently-proposed > reduced-lock ALTER commands affect data that pg_dump is using ruleutils > to fetch. If that's the case, then maybe this is a problem that we can > punt till later. I've not gone through the list to verify it though. So that returns us to solving the catalog consistency problem in pg_dump and similar applications. We could (1) change the lock taken by pg_dump to be ShareUpdateExclusive. As discussed, this would be optional. (Trivial implementation) The catalog accesses are all in a rather isolated piece of code in pg_dump and run for a short period. That allows us to consider locking *always* at ShareUpdateExclusive but only for the period of catalog access and then release the higher level lock before transaction end. Since pg_dump is a client program, any action we take to resolve this would need to be done in a user accessible way. That is acceptable since there may be other user programs that wish/need to read a consistent view of the definition of a table. This can be implemented in a few ways: (2) Implement a server function that allows you to lock a table for a short duration. e.g. pg_lock_catalog(Oid) and pg_unlock_catalog(Oid). We can already do this in server-side code, so this is simply a matter of exposing that same functionality for users. (3) A new variant of the LOCK command: LOCK CATALOG FOR tablename IN lockmode MODE NOWAIT, which then would have a matching UNLOCK CATALOG FOR tablename command. This is just a sugar-coated version of (2). (4) Implement functions to suspend invalidation message handling for a short period. That's basically the same as (2) in profile. My feeling is that sounds rather dangerous and not something I'd want to go near now or in the future.
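For illustration, option (2) as seen from pg_dump might look like the following; pg_lock_catalog() and pg_unlock_catalog() are proposed names only, not an existing API:

  BEGIN;
  SELECT pg_lock_catalog('pgbench_accounts'::regclass);    -- hypothetical
  -- ... read the table's definition consistently here ...
  SELECT pg_unlock_catalog('pgbench_accounts'::regclass);  -- hypothetical: the lock
                                                           -- is released before commit
  COMMIT;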
We tried to avoid locking the catalog some years back, which is how we went off down this MVCC catalog access path, which now seems to have been something of a red-shifted herring. ISTM that the user would need to specifically request a "consistent catalog". Using (2) in pg_dump is pretty easy - patch attached. So we can solve this problem completely in about another hour of work, so I suggest we implement (2) and be done. Happy to document this in a new subsection of docs to describe how to dump a consistent view of a database object from a user application. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 5 March 2014 09:28, Simon Riggs <simon@2ndquadrant.com> wrote: > So that returns us to solving the catalog consistency problem in > pg_dump and similar applications. No answer, guys, and time is ticking away here. I'd like to get a communal solution to this rather than just punting the whole patch. If we have to strip it down to the bare essentials, so be it. For me, the biggest need here is to make VALIDATE CONSTRAINT take only a ShareUpdateExclusiveLock while it runs. Almost everything else needs a full AccessExclusiveLock anyway, or doesn't run for very long so isn't a critical problem. (Perhaps we can then wrap ADD CONSTRAINT ... NOT VALID and VALIDATE into a single command using the CONCURRENTLY keyword so it runs two transactions to complete the task). Validating FKs on big tables can take hours and it really isn't acceptable for us to lock out access while we do that. FKs are *supposed* to be a major reason people use RDBMS, so keeping them in a state where they are effectively unusable is a major debilitating point against adoption of PostgreSQL. If there are issues with pg_dump we can just document them. Guide me with your thoughts. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 04, 2014 at 06:50:17PM +0100, Andres Freund wrote: > On 2014-03-04 11:40:10 -0500, Tom Lane wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > I think this is all too > > late for 9.4, though. > > > > I agree with the feeling that a meaningful fix for pg_dump isn't going > > to get done for 9.4. So that leaves us with the alternatives of (1) > > put off the lock-strength-reduction patch for another year; (2) push > > it anyway and accept a reduction in pg_dump reliability. > > > > I don't care for (2). I'd like to have lock strength reduction as > > much as anybody, but it can't come at the price of reduction of > > reliability. > > I am sorry, but I think this is vastly overstating the scope of the > pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the > amount of problems that has caused in the past is surprisingly low. If > such a frequently used command doesn't cause problems, why are you > assuming other commands to be that problematic? And I think it's hard to > argue that the proposed changes are more likely to cause problems. > > Let's try to go at this a bit more methodically. The commands that - > afaics - change their locklevel due to latest patch (v21) are: [snip] Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c deparse worker functions. As a cross-check to your study, I looked briefly at the use of those functions in pg_dump and how this patch might affect them: -- pg_get_constraintdef() pg_dump reads the constraint OID with its transaction snapshot, so we will never see a too-new constraint. Dropping a constraint still requires AccessExclusiveLock. Concerning VALIDATE CONSTRAINT, pg_dump reads convalidated with its transaction snapshot and uses that to decide whether to dump the CHECK constraint as part of the CREATE TABLE or as a separate ALTER TABLE ADD CONSTRAINT following the data load. However, pg_get_constraintdef() reads the latest convalidated to decide whether to emit NOT VALID. Consequently, one can get a dump in which the dumped table data did not yet conform to the constraint, and the ALTER TABLE ADD CONSTRAINT (w/o NOT VALID) fails. (Suppose you deleted the last invalid rows just before executing the VALIDATE CONSTRAINT. I tested this by committing the DELETE + VALIDATE CONSTRAINT with pg_dump stopped at getTableAttrs().) One hacky, but maintainable and effective, solution to the VALIDATE CONSTRAINT problem is to have pg_dump tack on a NOT VALID if pg_get_constraintdef() did not do so. It's, conveniently, the last part of the definition. I would tend to choose this. We could also just decide this isn't bad enough to worry about. The consequence is that an ALTER TABLE ADD CONSTRAINT fails. Assuming no --single-transaction for the original restoral, you just add NOT VALID to the command and rerun. Like most of the potential new pg_dump problems, this can already happen today if the relevant database changes happen between taking the pg_dump transaction snapshot and locking the tables. -- pg_get_expr() for default expressions pg_dump reads pg_attrdef.adbin using its transaction snapshot, so it will never see a too-new default. This does allow us to read a dropped default. That's not a problem directly. However, suppose the default references a function dropped at the same time as the default. pg_dump could fail in pg_get_expr(). -- pg_get_indexdef() As you explained elsewhere, new indexes are no problem. DROP INDEX still requires AccessExclusiveLock. Overall, no problems here. 
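The constraintdef race described above under pg_get_constraintdef(), as a two-session sketch (t, c, and cc are assumed names):

  -- session A, pg_dump-like
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT convalidated FROM pg_constraint WHERE conname = 'cc';  -- f under our
      -- snapshot, so the constraint will be dumped as a separate ALTER TABLE

  -- session B
  DELETE FROM t WHERE c <= 0;             -- remove the last invalid rows
  ALTER TABLE t VALIDATE CONSTRAINT cc;   -- ShareUpdateExclusiveLock under the
                                          -- patch, so A's locks don't stop it

  -- session A
  SELECT pg_get_constraintdef(oid) FROM pg_constraint WHERE conname = 'cc';
  -- emits CHECK (c > 0) with no NOT VALID, per the latest convalidated, even
  -- though the data dumped under A's snapshot still contains violating rows
  COMMIT;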
-- pg_get_ruledef() The patch changes lock requirements for enabling and disabling of rules, but that is all separate from the rule expression handling. No problems. -- pg_get_triggerdef() The patch reduces CREATE TRIGGER and DROP TRIGGER to ShareUpdateExclusiveLock. The implications for pg_dump are similar to those for pg_get_expr(). -- pg_get_viewdef() Untamed: pg_dump does not lock views at all. One thing not to forget is that you can always get the old mutual exclusion back by issuing LOCK TABLE just before a DDL operation. If some unlucky user regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a workaround. There's no comparable way for someone who would not experience that problem to weaken the now-hardcoded AccessExclusiveLock. Many consequences of insufficient locking are too severe for that workaround to bring comfort, but the pg_dump failure scenarios around pg_get_expr() and pg_get_triggerdef() seem mild enough. Restore-time failures are more serious, hence my recommendation to put a hack in pg_dump around VALIDATE CONSTRAINT. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
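The workaround from that last paragraph, spelled out with assumed names:

  BEGIN;
  LOCK TABLE t IN ACCESS EXCLUSIVE MODE;  -- reinstate the old mutual exclusion
  DROP TRIGGER trg ON t;
  COMMIT;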
On 6 March 2014 22:43, Noah Misch <noah@leadboat.com> wrote: > Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c > deparse worker functions. Ah, good. We're thinking along the same lines. I was already working on this analysis also. I'll complete mine and then compare notes. > One thing not to forget is that you can always get the old mutual exclusion > back by issuing LOCK TABLE just before a DDL operation. If some unlucky user > regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a > workaround. There's no comparable way for someone who would not experience > that problem to weaken the now-hardcoded AccessExclusiveLock. Good point. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 6 March 2014 22:43, Noah Misch <noah@leadboat.com> wrote: > On Tue, Mar 04, 2014 at 06:50:17PM +0100, Andres Freund wrote: >> On 2014-03-04 11:40:10 -0500, Tom Lane wrote: >> > Robert Haas <robertmhaas@gmail.com> writes: > I think this is all too >> > late for 9.4, though. >> > >> > I agree with the feeling that a meaningful fix for pg_dump isn't going >> > to get done for 9.4. So that leaves us with the alternatives of (1) >> > put off the lock-strength-reduction patch for another year; (2) push >> > it anyway and accept a reduction in pg_dump reliability. >> > >> > I don't care for (2). I'd like to have lock strength reduction as >> > much as anybody, but it can't come at the price of reduction of >> > reliability. >> >> I am sorry, but I think this is vastly overstating the scope of the >> pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the >> amount of problems that has caused in the past is surprisingly low. If >> such a frequently used command doesn't cause problems, why are you >> assuming other commands to be that problematic? And I think it's hard to >> argue that the proposed changes are more likely to cause problems. >> >> Let's try to go at this a bit more methodically. The commands that - >> afaics - change their locklevel due to latest patch (v21) are: > [snip] > > Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c > deparse worker functions. As a cross-check to your study, I looked briefly at > the use of those functions in pg_dump and how this patch might affect them: > > -- pg_get_constraintdef() > > pg_dump reads the constraint OID with its transaction snapshot, so we will > never see a too-new constraint. Dropping a constraint still requires > AccessExclusiveLock. Agreed > Concerning VALIDATE CONSTRAINT, pg_dump reads convalidated with its > transaction snapshot and uses that to decide whether to dump the CHECK > constraint as part of the CREATE TABLE or as a separate ALTER TABLE ADD > CONSTRAINT following the data load. However, pg_get_constraintdef() reads the > latest convalidated to decide whether to emit NOT VALID. Consequently, one > can get a dump in which the dumped table data did not yet conform to the > constraint, and the ALTER TABLE ADD CONSTRAINT (w/o NOT VALID) fails. > (Suppose you deleted the last invalid rows just before executing the VALIDATE > CONSTRAINT. I tested this by committing the DELETE + VALIDATE CONSTRAINT with > pg_dump stopped at getTableAttrs().) > > One hacky, but maintainable and effective, solution to the VALIDATE CONSTRAINT > problem is to have pg_dump tack on a NOT VALID if pg_get_constraintdef() did > not do so. It's, conveniently, the last part of the definition. I would tend > to choose this. We could also just decide this isn't bad enough to worry > about. The consequence is that an ALTER TABLE ADD CONSTRAINT fails. Assuming > no --single-transaction for the original restoral, you just add NOT VALID to > the command and rerun. Like most of the potential new pg_dump problems, this > can already happen today if the relevant database changes happen between > taking the pg_dump transaction snapshot and locking the tables. Too hacky for me, but some good thinking. My proposed solution is below. > -- pg_get_expr() for default expressions > > pg_dump reads pg_attrdef.adbin using its transaction snapshot, so it will > never see a too-new default. This does allow us to read a dropped default. > That's not a problem directly. 
However, suppose the default references a > function dropped at the same time as the default. pg_dump could fail in > pg_get_expr(). > > -- pg_get_indexdef() > > As you explained elsewhere, new indexes are no problem. DROP INDEX still > requires AccessExclusiveLock. Overall, no problems here. > > -- pg_get_ruledef() > > The patch changes lock requirements for enabling and disabling of rules, but > that is all separate from the rule expression handling. No problems. > > -- pg_get_triggerdef() > > The patch reduces CREATE TRIGGER and DROP TRIGGER to ShareUpdateExclusiveLock. > The implications for pg_dump are similar to those for pg_get_expr(). These are certainly concerning. What surprises me the most is that pg_dump has been so happy to randomly mix SQL using the transaction snapshot with sys cache access code using a different snapshot. If that was intentional, there is no documentation in the code or in the docs to explain it. > -- pg_get_viewdef() > > Untamed: pg_dump does not lock views at all. OMG, it's really a wonder pg_dump works at all. > One thing not to forget is that you can always get the old mutual exclusion > back by issuing LOCK TABLE just before a DDL operation. If some unlucky user > regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a > workaround. There's no comparable way for someone who would not experience > that problem to weaken the now-hardcoded AccessExclusiveLock. Many > consequences of insufficient locking are too severe for that workaround to > bring comfort, but the pg_dump failure scenarios around pg_get_expr() and > pg_get_triggerdef() seem mild enough. Restore-time failures are more serious, > hence my recommendation to put a hack in pg_dump around VALIDATE CONSTRAINT. The right thing to do here is to not push to the extremes. If we mess too much with the ruleutil stuff it will just be buggy. A more considered analysis in a later release is required for a full and complete approach. As I indicated earlier, an 80/20 solution is better for this release. Slimming down the patch, I've removed changes to lock levels for almost all variants. The only lock levels now reduced are those for VALIDATE, plus setting of relation and attribute level options. VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a slightly modified variant of pg_get_constraintdef that uses the transaction snapshot. I propose this rather than Noah's solution solely because this will allow any user to request the MVCC data, rather than implement a hack that only works for pg_dump. I will post the patch later today. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
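Usage would presumably look like this; pg_get_constraintdef_mvcc() is the name proposed in this mail, not a shipped function:

  SELECT conname, pg_get_constraintdef_mvcc(oid)
    FROM pg_constraint
   WHERE conrelid = 'distributors'::regclass;  -- table name assumed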
On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: > The right thing to do here is to not push to the extremes. If we mess > too much with the ruleutil stuff it will just be buggy. A more > considered analysis in a later release is required for a full and > complete approach. As I indicated earlier, an 80/20 solution is better > for this release. > > Slimming down the patch, I've removed changes to lock levels for > almost all variants. The only lock levels now reduced are those for > VALIDATE, plus setting of relation and attribute level options. > > VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a > slightly modified variant of pg_get_constraintdef that uses the > transaction snapshot. I propose this rather than Noah's solution > solely because this will allow any user to request the MVCC data, > rather than implement a hack that only works for pg_dump. I will post > the patch later today. Implemented in attached patch, v22 The following commands (only) are allowed with ShareUpdateExclusiveLock, patch includes doc changes. ALTER TABLE ... VALIDATE CONSTRAINT constraint_name covered by isolation test, plus verified manually with pg_dump ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ALTER TABLE ... ALTER COLUMN ... SET (...) ALTER TABLE ... ALTER COLUMN ... RESET (...) ALTER TABLE ... CLUSTER ON ... ALTER TABLE ... SET WITHOUT CLUSTER ALTER TABLE ... SET (...) covered by isolation test ALTER TABLE ... RESET (...) ALTER INDEX ... SET (...) ALTER INDEX ... RESET (...) All other ALTER commands take AccessExclusiveLock I commend this patch to you for final review; I would like to commit this in a few days. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 03/06/2014 10:47 AM, Simon Riggs wrote: > On 5 March 2014 09:28, Simon Riggs <simon@2ndquadrant.com> wrote: > >> So that returns us to solving the catalog consistency problem in >> pg_dump and similar applications. > No answer, guys, and time is ticking away here. Sorry, I've accumulated several days of backlog (and it'll only get worse in the coming week) so I haven't had time to look at all this in the detail I wanted to. > I'd like to get a communal solution to this rather than just punting > the whole patch. > > If we have to strip it down to the bare essentials, so be it. For me, > the biggest need here is to make VALIDATE CONSTRAINT take only a > ShareUpdateExclusiveLock while it runs. Almost everything else needs a > full AccessExclusiveLock anyway, or doesn't run for very long so isn't > a critical problem. (Perhaps we can then wrap ADD CONSTRAINT ... NOT > VALID and VALIDATE into a single command using the CONCURRENTLY > keyword so it runs two transactions to complete the task). > > Validating FKs on big tables can take hours and it really isn't > acceptable for us to lock out access while we do that. FKs are > *supposed* to be a major reason people use RDBMS, so keeping them in a > state where they are effectively unusable is a major debilitating > point against adoption of PostgreSQL. > > If there are issues with pg_dump we can just document them. > > Guide me with your thoughts. I think committing VALIDATE CONSTRAINT is essential for 9.4; the rest can be delayed until 9.5. None of the discussion in this thread has been about that subcommand, and I don't personally see a problem with it. I don't care much about ADD CONSTRAINT CONCURRENTLY. If it's there, fine. If it's not, that's fine, too. My personal use case for this, and I even started writing a patch before I realized I was re-writing this one, is adding a CHECK constraint NOT VALID so that future commits respect it, then UPDATEing the existing rows to "fix" them, and then VALIDATE CONSTRAINTing it. There is zero need for an AccessExclusiveLock on that last step. My original idea was to concurrently create a partial index on the "bad" rows, and then validate the constraint using that index. The AEL would only be held long enough to check if the index is empty or not. Obviously, reducing the lock level is a cleaner solution, so I'd like to see that happen. -- Vik
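Vik's workflow as SQL, a sketch with assumed names:

  ALTER TABLE t ADD CONSTRAINT c_pos CHECK (c > 0) NOT VALID;  -- new writes obey it
  UPDATE t SET c = 1 WHERE c <= 0;                             -- "fix" existing rows
  ALTER TABLE t VALIDATE CONSTRAINT c_pos;  -- with the patch, the scan runs
                                            -- without an AccessExclusiveLock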
On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: > On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: > >> The right thing to do here is to not push to the extremes. If we mess >> too much with the ruleutil stuff it will just be buggy. A more >> considered analysis in a later release is required for a full and >> complete approach. As I indicated earlier, an 80/20 solution is better >> for this release. >> >> Slimming down the patch, I've removed changes to lock levels for >> almost all variants. The only lock levels now reduced are those for >> VALIDATE, plus setting of relation and attribute level options. >> >> VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a >> slightly modified variant of pg_get_constraintdef that uses the >> transaction snapshot. I propose this rather than Noah's solution >> solely because this will allow any user to request the MVCC data, >> rather than implement a hack that only works for pg_dump. I will post >> the patch later today. > > Implemented in attached patch, v22 > > The following commands (only) are allowed with > ShareUpdateExclusiveLock, patch includes doc changes. > > ALTER TABLE ... VALIDATE CONSTRAINT constraint_name > covered by isolation test, plus verified manually with pg_dump > > ALTER TABLE ... ALTER COLUMN ... SET STATISTICS > ALTER TABLE ... ALTER COLUMN ... SET (...) > ALTER TABLE ... ALTER COLUMN ... RESET (...) > > ALTER TABLE ... CLUSTER ON ... > ALTER TABLE ... SET WITHOUT CLUSTER > ALTER TABLE ... SET (...) > covered by isolation test > > ALTER TABLE ... RESET (...) > > ALTER INDEX ... SET (...) > ALTER INDEX ... RESET (...) > > All other ALTER commands take AccessExclusiveLock > > I commend this patch to you for final review; I would like to commit > this in a few days. I'm planning to commit this today at 1500UTC barring objections or negative reviews. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 18, 2014 at 10:39:03AM +0000, Simon Riggs wrote: > On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: > > Implemented in attached patch, v22 > > I commend this patch to you for final review; I would like to commit > > this in a few days. > > I'm planning to commit this today at 1500UTC barring objections or > negative reviews. Not an objection, but FYI, I'm currently in the midst of reviewing this. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 18 March 2014 21:21, Noah Misch <noah@leadboat.com> wrote: > On Tue, Mar 18, 2014 at 10:39:03AM +0000, Simon Riggs wrote: >> On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: >> > Implemented in attached patch, v22 > >> > I commend this patch to you for final review; I would like to commit >> > this in a few days. >> >> I'm planning to commit this today at 1500UTC barring objections or >> negative reviews. > > Not an objection, but FYI, I'm currently in the midst of reviewing this. Thanks, I'll hold off until I hear from you. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03/18/2014 11:39 AM, Simon Riggs wrote: > On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: >> >>> The right thing to do here is to not push to the extremes. If we mess >>> too much with the ruleutil stuff it will just be buggy. A more >>> considered analysis in a later release is required for a full and >>> complete approach. As I indicated earlier, an 80/20 solution is better >>> for this release. >>> >>> Slimming down the patch, I've removed changes to lock levels for >>> almost all variants. The only lock levels now reduced are those for >>> VALIDATE, plus setting of relation and attribute level options. >>> >>> VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a >>> slightly modified variant of pg_get_constraintdef that uses the >>> transaction snapshot. I propose this rather than Noah's solution >>> solely because this will allow any user to request the MVCC data, >>> rather than implement a hack that only works for pg_dump. I will post >>> the patch later today. >> Implemented in attached patch, v22 >> >> The following commands (only) are allowed with >> ShareUpdateExclusiveLock, patch includes doc changes. >> >> ALTER TABLE ... VALIDATE CONSTRAINT constraint_name >> covered by isolation test, plus verified manually with pg_dump >> >> ALTER TABLE ... ALTER COLUMN ... SET STATISTICS >> ALTER TABLE ... ALTER COLUMN ... SET (...) >> ALTER TABLE ... ALTER COLUMN ... RESET (...) >> >> ALTER TABLE ... CLUSTER ON ... >> ALTER TABLE ... SET WITHOUT CLUSTER >> ALTER TABLE ... SET (...) >> covered by isolation test >> >> ALTER TABLE ... RESET (...) >> >> ALTER INDEX ... SET (...) >> ALTER INDEX ... RESET (...) >> >> All other ALTER commands take AccessExclusiveLock >> >> I commend this patch to you for final review; I would like to commit >> this in a few days. > I'm planning to commit this today at 1500UTC barring objections or > negative reviews. At my current level of competence, this patch looks good to me. I'm looking forward to reading Noah's review to see what I may have missed. The attached patch fixes two typos in the code comments. -- Vik
On Sat, Mar 08, 2014 at 11:14:30AM +0000, Simon Riggs wrote: > On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: > > The right thing to do here is to not push to the extremes. If we mess > > too much with the ruleutil stuff it will just be buggy. A more > > considered analysis in a later release is required for a full and > > complete approach. As I indicated earlier, an 80/20 solution is better > > for this release. > > > > Slimming down the patch, I've removed changes to lock levels for > > almost all variants. The only lock levels now reduced are those for > > VALIDATE, plus setting of relation and attribute level options. Good call. > The following commands (only) are allowed with > ShareUpdateExclusiveLock, patch includes doc changes. > > ALTER TABLE ... VALIDATE CONSTRAINT constraint_name > covered by isolation test, plus verified manually with pg_dump I found a pre-existing bug aggravated by this, which I will shortly report on a new thread. > ALTER TABLE ... ALTER COLUMN ... SET STATISTICS > ALTER TABLE ... ALTER COLUMN ... SET (...) > ALTER TABLE ... ALTER COLUMN ... RESET (...) > > ALTER TABLE ... CLUSTER ON ... > ALTER TABLE ... SET WITHOUT CLUSTER A comment at check_index_is_clusterable() still mentions "exclusive lock". > ALTER TABLE ... SET (...) > covered by isolation test > > ALTER TABLE ... RESET (...) > > ALTER INDEX ... SET (...) > ALTER INDEX ... RESET (...) See discussion below. > All other ALTER commands take AccessExclusiveLock > --- a/doc/src/sgml/mvcc.sgml > +++ b/doc/src/sgml/mvcc.sgml > @@ -865,7 +865,8 @@ ERROR: could not serialize access due to read/write dependencies among transact > <para> > Acquired by <command>VACUUM</command> (without <option>FULL</option>), > <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and > - some forms of <command>ALTER TABLE</command>. > + <command>ALTER TABLE VALIDATE</command> and other > + <command>ALTER TABLE</command> variants that set options. ALTER TABLE's documentation covers the details, so the old text sufficed for here. I find "variants that set options" too vague, considering the nuances of the actual list of affected forms. > </para> > </listitem> > </varlistentry> > @@ -951,7 +952,7 @@ ERROR: could not serialize access due to read/write dependencies among transact > </para> > > <para> > - Acquired by the <command>ALTER TABLE</>, <command>DROP TABLE</>, > + Acquired by the <command>ALTER TABLE</> for rewriting, <command>DROP TABLE</>, > <command>TRUNCATE</command>, <command>REINDEX</command>, > <command>CLUSTER</command>, and <command>VACUUM FULL</command> > commands. Forms that rewrite the table are just one class of examples. I would write "Acquired by DROP TABLE, ..., VACUUM FULL, and some forms of ALTER TABLE." > --- a/doc/src/sgml/ref/alter_table.sgml > +++ b/doc/src/sgml/ref/alter_table.sgml > @@ -420,6 +439,9 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> > index specification from the table. This affects > future cluster operations that don't specify an index. > </para> > + <para> > + Changing cluster options uses a <literal>SHARE UPDATE EXCLUSIVE</literal> lock. > + </para> > </listitem> > </varlistentry> > > @@ -467,6 +489,10 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> > FULL</>, <xref linkend="SQL-CLUSTER"> or one of the forms > of <command>ALTER TABLE</> that forces a table rewrite. > </para> > + <para> > + Changing storage parameters requires only a > + <literal>SHARE UPDATE EXCLUSIVE</literal> lock. 
> + </para> Some places say "requires only", while others say "uses". Please adopt one of those consistently. I somewhat prefer the latter. > @@ -1075,6 +1105,14 @@ ALTER TABLE distributors ADD CONSTRAINT distfk FOREIGN KEY (address) REFERENCES > </para> > > <para> > + To add a foreign key constraint to a table minimising impact on other work: Our documentation has used the "minimize" spelling exclusively. > --- a/src/backend/catalog/toasting.c > +++ b/src/backend/catalog/toasting.c > @@ -52,7 +53,7 @@ static bool needs_toast_table(Relation rel); > * to end with CommandCounterIncrement if it makes any changes. > */ > void > -AlterTableCreateToastTable(Oid relOid, Datum reloptions) > +AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode) > { > Relation rel; > > @@ -63,10 +64,10 @@ AlterTableCreateToastTable(Oid relOid, Datum reloptions) > * concurrent readers of the pg_class tuple won't have visibility issues, > * so let's be safe. > */ The comment ending right here is falsified by the change. > - rel = heap_open(relOid, AccessExclusiveLock); > + rel = heap_open(relOid, lockmode); We now request whatever lock our caller has already taken. If that is AccessExclusiveLock, create_toast_table() could actually add a toast table. Otherwise, it will either deduce that no change is required or raise an error. > @@ -161,6 +164,14 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio > return false; > > /* > + * We need to create a toast table. We shouldn't > + * have got this far if we're in ALTER TABLE unless > + * we are holding AccessExclusiveLock. > + */ > + if (lockmode < AccessExclusiveLock) > + elog(ERROR, "AccessExclusiveLock required to add toast tables."); Our model of lock levels has not regarded them as being ordered, so I recommend "lockmode != AccessExclusiveLock". (I don't object to using incidental order in AlterTableGetLockLevel(), where all the relevant levels originate in the same function.) > --- a/src/backend/commands/tablecmds.c > +++ b/src/backend/commands/tablecmds.c > @@ -2687,8 +2687,7 @@ AlterTableLookupRelation(AlterTableStmt *stmt, LOCKMODE lockmode) > * The caller must lock the relation, with an appropriate lock level > * for the subcommands requested. Any subcommand that needs to rewrite > * tuples in the table forces the whole command to be executed with > - * AccessExclusiveLock (actually, that is currently required always, but > - * we hope to relax it at some point). We pass the lock level down > + * AccessExclusiveLock. We pass the lock level down The set of commands needing AccessExclusiveLock is much broader than those rewriting the table. Consider replacing the quoted paragraph with something like: * The caller has locked the relation with "lockmode", which must be at least* as strong as AlterTableGetLockLevel(stmt->cmds). We pass the lock level* down so ... > @@ -2750,35 +2772,29 @@ AlterTableInternal(Oid relid, List *cmds, bool recurse) > * ALTER TABLE RENAME which is treated as a different statement type T_RenameStmt > * and does not travel through this section of code and cannot be combined with > * any of the subcommands given here. > + * > + * Note that Hot Standby only knows about AccessExclusiveLocks on the master > + * so any changes that might affect SELECTs running on standbys need to use > + * AccessExclusiveLocks even if you think a lesser lock would do, unless you > + * have a solution for that also. 
Out of curiosity, do SELECTs on hot standbys impose known challenges in this area not shared with local SELECTs? > + * > + * Also note that pg_dump uses only an AccessShareLock, meaning that anything > + * that takes a lock less than AccessExclusiveLock can change object definitions > + * while pg_dump is running. Be careful to check that the appropriate data is > + * derived by pg_dump using an MVCC snapshot, rather than syscache lookups, > + * otherwise we might end up with an inconsistent dump that can't restore. > + * > + * Be careful to ensure this function is called for Tables and Indexes only. > + * It is not currently safe to be called for Views because security_barrier > + * is listed as an option and so would be allowed to be set at a level lower > + * than AccessExclusiveLock, which would not be correct. This statement is accepted and takes only ShareUpdateExclusiveLock: alter table information_schema.triggers set (security_barrier = true); I suggest adding a LOCKMODE field to relopt_gen and adding a reloptions_locklevel() function to determine the required level from an options list. That puts control of the lock level near the code that understands the implications for each option. You can then revert the addition of AlterViewInternal() and some of the utility.c changes. If I had to pick a reloption most likely to benefit from AccessExclusiveLock, it would be user_catalog_table rather than security_barrier. I wonder if output plugins would want a clear moment in the WAL stream -- the commit of the transaction that sets the reloption -- after which they can assume that all future records pertaining to the table in question have the needed bits. If so, that implies AccessExclusiveLock for this reloption, because heap_page_prune_opt() issues affected WAL records under AccessShareLock. AT_ReplaceRelOptions could require the highest lock level required by any option. Its usage seems to be limited to CREATE OR REPLACE VIEW, so I would give it AccessExclusiveLock and be done. > - * 2. Relcache needs to be internally consistent, so unless we lock the > - * definition during reads we have no way to guarantee that. I looked for hazards like this, but I found none in the ALTER forms covered by this patch. None of them modify multiple catalog rows affecting the same relcache entry. However, thinking about that did lead me to ponder another class of hazards. When backends can use one or more relations concurrently with a DDL operation affecting those relations, those backends can find themselves running with a subset of the catalog changes made within a particular DDL operation. Consider VALIDATE CONSTRAINT against an inherited constraint of an inheritance parent. It validates child table constraints, modifying one catalog row per table. At COMMIT time, we queue sinval messages for all affected tables. We add to the queue in atomic groups of WRITE_QUANTUM (64) messages. Between two such groups joining the queue, another backend may process the first group of messages. If the original DDL used AccessExclusiveLock, this is always harmless. The DDL-issuing backend still holds its lock, which means the inval-accepting backend must not have the relation open. If the inval-accepting backend later opens the affected relation, it will first acquire some lock and process the rest of the invalidations from the DDL operation. When doing DDL under a weaker lock, the inval-accepting backend might apply half the invalidations and immediately use them in the context of an open relation.
For VALIDATE CONSTRAINT, this means a backend might briefly recognize only a subset of the inheritance tree becoming valid. (I did not actually build a test case to confirm this.) Considering that constraint exclusion is the sole consumer of convalidated/ccvalid that can run in parallel with VALIDATE CONSTRAINT, I think this is harmless. I did not find problems of this nature in any ALTER TABLE forms affected by the patch. Let's just keep it in mind during future lock level changes. > --- a/src/backend/utils/adt/ruleutils.c > +++ b/src/backend/utils/adt/ruleutils.c > static char * > pg_get_constraintdef_worker(Oid constraintId, bool fullCommand, > - int prettyFlags) > + int prettyFlags, bool useSyscache) > { > HeapTuple tup; > Form_pg_constraint conForm; > StringInfoData buf; > + SysScanDesc scandesc; > + ScanKeyData scankey[1]; > + Relation relation; gcc 4.2.4 gives me these warnings: ruleutils.c: In function `pg_get_constraintdef_worker': ruleutils.c:1318: warning: `relation' may be used uninitialized in this function ruleutils.c:1316: warning: `scandesc' may be used uninitialized in this function > + > + if (useSyscache) > + tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constraintId)); > + else > + { > + Snapshot snapshot = RegisterSnapshot(GetTransactionSnapshot()); > + PushActiveSnapshot(snapshot); There's no need to push the snapshot; registration is enough. > + > + relation = heap_open(ConstraintRelationId, AccessShareLock); > + > + ScanKeyInit(&scankey[0], > + ObjectIdAttributeNumber, > + BTEqualStrategyNumber, F_OIDEQ, > + ObjectIdGetDatum(constraintId)); > + > + scandesc = systable_beginscan(relation, > + ConstraintOidIndexId, > + true, > + snapshot, > + 1, > + scankey); > + tup = systable_getnext(scandesc); > + > + PopActiveSnapshot(); > + UnregisterSnapshot(snapshot); > + } Later code uses SysCacheGetAttr() regardless of how we acquired the tuple. That works, but it deserves a brief comment mention. pg_get_constraintdef_mvcc() still does syscache lookups by way of decompile_column_index_array(), get_constraint_index(), and deparse_expression_pretty(). It uses MVCC for things that matter for pg_dump vs. reduced lock levels, but not comprehensively. I recommend not adding a new function and instead changing pg_get_constraintdef() to use the transaction snapshot unconditionally. It's hard to imagine an application author making a reasoned choice between the two. Our in-tree callers, psql and pg_dump, have no cause to prefer the less-MVCC behavior at any time. > --- a/src/backend/utils/cache/relcache.c > +++ b/src/backend/utils/cache/relcache.c > @@ -161,6 +161,14 @@ static bool eoxact_list_overflowed = false; > eoxact_list_overflowed = true; \ > } while (0) > > +/* > + * EOXactTupleDescArray stores *TupleDescs that (might) need AtEOXact Stray asterisk. 
> @@ -2312,6 +2334,37 @@ RelationCloseSmgrByOid(Oid relationId) > RelationCloseSmgr(relation); > } > > +void > +RememberToFreeTupleDescAtEOX(TupleDesc td) > +{ > + if (EOXactTupleDescArray == NULL) > + { > + MemoryContext oldcxt; > + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); > + > + EOXactTupleDescArray = (TupleDesc *) palloc(16 * sizeof(TupleDesc)); > + EOXactTupleDescArrayLen = 16; > + NextEOXactTupleDescNum = 0; > + MemoryContextSwitchTo(oldcxt); > + } > + else if (NextEOXactTupleDescNum >= EOXactTupleDescArrayLen) > + { > + MemoryContext oldcxt; > + int32 newlen = EOXactTupleDescArrayLen * 2; > + > + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); > + > + Assert(EOXactTupleDescArrayLen > 0); > + > + EOXactTupleDescArray = (TupleDesc *) repalloc(EOXactTupleDescArray, > + newlen * sizeof(TupleDesc)); > + EOXactTupleDescArrayLen = newlen; > + MemoryContextSwitchTo(oldcxt); No need to switch contexts for repalloc(). > --- /dev/null > +++ b/src/test/isolation/specs/alter-table-1.spec > @@ -0,0 +1,34 @@ > +# ALTER TABLE - Add and Validate constraint with concurrent writes > +# > +# ADD FOREIGN KEY allows a minimum of ShareRowExclusiveLock, > +# so we mix writes with it to see what works or waits. > +# VALIDATE allows a minimum of ShareUpdateExclusiveLock > +# so we mix reads with it to see what works or waits This comment is obsolete regarding the ADD FOREIGN KEY lock type. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 21 March 2014 03:45, Noah Misch <noah@leadboat.com> wrote: > On Sat, Mar 08, 2014 at 11:14:30AM +0000, Simon Riggs wrote: >> On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: >> > The right thing to do here is to not push to the extremes. If we mess >> > too much with the ruleutil stuff it will just be buggy. A more >> > considered analysis in a later release is required for a full and >> > complete approach. As I indicated earlier, an 80/20 solution is better >> > for this release. >> > >> > Slimming down the patch, I've removed changes to lock levels for >> > almost all variants. The only lock levels now reduced are those for >> > VALIDATE, plus setting of relation and attribute level options. > > Good call. Thanks for the review. I'll respond to each point in a later email, but it looks like nothing major, apart from the point raised on a separate thread. >> + * Be careful to ensure this function is called for Tables and Indexes only. >> + * It is not currently safe to be called for Views because security_barrier >> + * is listed as an option and so would be allowed to be set at a level lower >> + * than AccessExclusiveLock, which would not be correct. > > This statement is accepted and takes only ShareUpdateExclusiveLock: > > alter table information_schema.triggers set (security_barrier = true); I find it hard to justify why we accept such a statement. Surely it's a bug when the named table turns out to be a view? Presumably ALTER SEQUENCE and ALTER <other stuff> have checks for the correct object type? OMG. > I suggest adding a LOCKMODE field to relopt_gen and adding a > reloptions_locklevel() function to determine the required level from an > options list. That puts control of the lock level near the code that > understands the implications for each option. You can then revert the > addition of AlterViewInternal() and some of the utility.c changes. Sure, that's how we code it, but I'm not sure we should introduce that feature. The above weirdness is not itself justification. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
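One way to see the lock such a statement actually takes, on a server with the patch applied:

  BEGIN;
  ALTER TABLE information_schema.triggers SET (security_barrier = true);
  SELECT mode FROM pg_locks
   WHERE relation = 'information_schema.triggers'::regclass;
  -- reports ShareUpdateExclusiveLock, matching Noah's observation
  ROLLBACK;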
On Fri, Mar 21, 2014 at 04:11:12PM +0000, Simon Riggs wrote: > On 21 March 2014 03:45, Noah Misch <noah@leadboat.com> wrote: > > On Sat, Mar 08, 2014 at 11:14:30AM +0000, Simon Riggs wrote: > Thanks for the review. I'll respond to each point in a later email, but > it looks like nothing major, apart from the point raised on a separate > thread. Yep. > >> + * Be careful to ensure this function is called for Tables and Indexes only. > >> + * It is not currently safe to be called for Views because security_barrier > >> + * is listed as an option and so would be allowed to be set at a level lower > >> + * than AccessExclusiveLock, which would not be correct. > > > > This statement is accepted and takes only ShareUpdateExclusiveLock: > > > > alter table information_schema.triggers set (security_barrier = true); > > I find it hard to justify why we accept such a statement. Surely it's a > bug when the named table turns out to be a view? Presumably ALTER > SEQUENCE and ALTER <other stuff> have checks for the correct object > type? OMG. We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move toward stricter checks, ALTER TABLE refused to operate on foreign tables in 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER TABLE is nearly a union of the relation ALTER possibilities. That choice is well-entrenched. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 21 March 2014 17:49, Noah Misch <noah@leadboat.com> wrote: >> >> + * Be careful to ensure this function is called for Tables and Indexes only. >> >> + * It is not currently safe to be called for Views because security_barrier >> >> + * is listed as an option and so would be allowed to be set at a level lower >> >> + * than AccessExclusiveLock, which would not be correct. >> > >> > This statement is accepted and takes only ShareUpdateExclusiveLock: >> > >> > alter table information_schema.triggers set (security_barrier = true); >> >> I find it hard to justify why we accept such a statement. Surely it's a >> bug when the named table turns out to be a view? Presumably ALTER >> SEQUENCE and ALTER <other stuff> have checks for the correct object >> type? OMG. > > We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move > toward stricter checks, ALTER TABLE refused to operate on foreign tables in > 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER > TABLE is nearly a union of the relation ALTER possibilities. That choice is > well-entrenched. By "well entrenched", I think you mean undocumented, untested, unintentional? Do we think anyone *relies* on being able to say the word TABLE when in fact they mean VIEW or SEQUENCE? How is that artefact anything but a bug? i.e. is anyone going to stop me fixing it? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 21 March 2014 03:45, Noah Misch <noah@leadboat.com> wrote: >> + * Note that Hot Standby only knows about AccessExclusiveLocks on the master >> + * so any changes that might affect SELECTs running on standbys need to use >> + * AccessExclusiveLocks even if you think a lesser lock would do, unless you >> + * have a solution for that also. > > Out of curiosity, do SELECTs on hot standbys impose known challenges in this > area not shared with local SELECTs? No, but locks less than AccessExclusiveLock won't happen at all, so it's a difference that, if improperly handled, could cause a bug. Plus I wanted to indicate I'd thought about it. >> - * 2. Relcache needs to be internally consistent, so unless we lock the >> - * definition during reads we have no way to guarantee that. > > I looked for hazards like this, but I found none in the ALTER forms covered by > this patch. None of them modify multiple catalog rows affecting the same > relcache entry. However, thinking about that did lead me to ponder another > class of hazards. When backends can use one or more relations concurrently > with a DDL operation affecting those relations, those backends can find > themselves running with a subset of the catalog changes made within a > particular DDL operation. Consider VALIDATE CONSTRAINT against an inherited > constraint of an inheritance parent. It validates child table constraints, > modifying one catalog row per table. At COMMIT time, we queue sinval messages > for all affected tables. We add to the queue in atomic groups of > WRITE_QUANTUM (64) messages. Between two such groups joining the queue, > another backend may process the first group of messages. If the original DDL > used AccessExclusiveLock, this is always harmless. The DDL-issuing backend > still holds its lock, which means the inval-accepting backend must not have > the relation open. If the inval-accepting backend later opens the affected > relation, it will first acquire some lock and process the rest of the > invalidations from the DDL operation. When doing DDL under a weaker lock, the > inval-accepting backend might apply half the invalidations and immediately use > them in the context of an open relation. For VALIDATE CONSTRAINT, this means > a backend might briefly recognize only a subset of the inheritance tree > becoming valid. (I did not actually build a test case to confirm this.) > > Considering that constraint exclusion is the sole consumer of > convalidated/ccvalid that can run in parallel with VALIDATE CONSTRAINT, I > think this is harmless. I did not find problems of this nature in any ALTER > TABLE forms affected by the patch. Let's just keep it in mind during future > lock level changes. I'll document that. > pg_get_constraintdef_mvcc() still does syscache lookups by way of > decompile_column_index_array(), get_constraint_index(), and > deparse_expression_pretty(). It uses MVCC for things that matter for pg_dump > vs. reduced lock levels, but not comprehensively. I recommend not adding a > new function and instead changing pg_get_constraintdef() to use the > transaction snapshot unconditionally. OK -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Mar 21, 2014 at 06:53:27PM +0000, Simon Riggs wrote: > On 21 March 2014 17:49, Noah Misch <noah@leadboat.com> wrote: > > >> > alter table information_schema.triggers set (security_barrier = true); > >> > >> I find it hard to justify why we accept such a statement. Surely it's a > >> bug when the named table turns out to be a view? Presumably ALTER > >> SEQUENCE and ALTER <other stuff> have checks for the correct object > >> type? OMG. > > > > We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move > > toward stricter checks, ALTER TABLE refused to operate on foreign tables in > > 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER > > TABLE is nearly a union of the relation ALTER possibilities. That choice is > > well-entrenched. > > By "well entrenched", I think you mean undocumented, untested, unintentional? It's deliberate; a -hackers discussion revisits it perhaps once a year. The ALTER VIEW documentation says: For historical reasons, ALTER TABLE can be used with views too; but the only variants of ALTER TABLE that are allowed with views are equivalent to the ones shown above. ALTER INDEX and ALTER SEQUENCE say something similar. > Do we think anyone *relies* on being able to say the word TABLE when > in fact they mean VIEW or SEQUENCE? pg_dump emits statements that exercise it: psql -c 'create view v as select 1 as c; alter view v alter c set default 0;' pg_dump --table v | grep ALTER > How is that artefact anything but a bug? i.e. is anyone going to stop > me fixing it? It's not the behavior I would choose for a new product, but I can't see benefits sufficient to overturn previous decisions to keep it. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
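The grep in that recipe turns up pg_dump itself issuing ALTER TABLE against the view, something like:

  ALTER TABLE ONLY v ALTER COLUMN c SET DEFAULT 0;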
On 21 March 2014 20:58, Noah Misch <noah@leadboat.com> wrote: > On Fri, Mar 21, 2014 at 06:53:27PM +0000, Simon Riggs wrote: >> On 21 March 2014 17:49, Noah Misch <noah@leadboat.com> wrote: >> >> >> > alter table information_schema.triggers set (security_barrier = true); >> >> >> >> I find it hard to justify why we accept such a statement. Surely its a >> >> bug when the named table turns out to be a view? Presumably ALTER >> >> SEQUENCE and ALTER <other stuff> has checks for the correct object >> >> type? OMG. >> > >> > We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move >> > toward stricter checks, ALTER TABLE refused to operate on foreign tables in >> > 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER >> > TABLE is nearly a union of the relation ALTER possibilities. That choice is >> > well-entrenched. >> >> By "well entrenched", I think you mean undocumented, untested, unintentional? > > It's deliberate; a -hackers discussion revisits it perhaps once a year. The > ALTER VIEW documentation says: > > For historical reasons, ALTER TABLE can be used with views too; but the only > variants of ALTER TABLE that are allowed with views are equivalent to the > ones shown above. > > ALTER INDEX and ALTER SEQUENCE say something similar. > >> Do we think anyone *relies* on being able to say the word TABLE when >> in fact they mean VIEW or SEQUENCE? > > pg_dump emits statements that exercise it: > > psql -c 'create view v as select 1 as c; alter view v alter c set default 0;' > pg_dump --table v | grep ALTER > >> How is that artefact anything but a bug? i.e. is anyone going to stop >> me fixing it? > > It's not the behavior I would choose for a new product, but I can't see > benefits sufficient to overturn previous decisions to keep it. Speechless -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 21 March 2014 20:58, Noah Misch <noah@leadboat.com> wrote:
>> It's not the behavior I would choose for a new product, but I can't see
>> benefits sufficient to overturn previous decisions to keep it.

> Speechless

The key argument for not "fixing" this is that it would break existing
pg_dump files.  That's a pretty hard argument to overcome, unfortunately,
even if you're willing to blow off the possibility that client
applications might contain similar shortcuts.  We still do our best to
read dump files from the 7.0 era (see ConvertTriggerToFK() for one example
of going above and beyond for that); and every so often we do hear of
people trying to get data out of such ancient servers.  So even if you
went and fixed pg_dump tomorrow, it'd probably be ten or fifteen years
before people would let you stop reading dumps from existing versions.

			regards, tom lane
On 21 March 2014 23:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 21 March 2014 20:58, Noah Misch <noah@leadboat.com> wrote:
>>> It's not the behavior I would choose for a new product, but I can't see
>>> benefits sufficient to overturn previous decisions to keep it.
>
>> Speechless
>
> The key argument for not "fixing" this is that it would break existing
> pg_dump files.  That's a pretty hard argument to overcome, unfortunately,
> even if you're willing to blow off the possibility that client
> applications might contain similar shortcuts.  We still do our best to
> read dump files from the 7.0 era (see ConvertTriggerToFK() for one example
> of going above and beyond for that); and every so often we do hear of
> people trying to get data out of such ancient servers.  So even if you
> went and fixed pg_dump tomorrow, it'd probably be ten or fifteen years
> before people would let you stop reading dumps from existing versions.

Noah had already convinced me, but thank you for explaining the reasoning
behind it.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 21 March 2014 16:11, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> + * Be careful to ensure this function is called for Tables and Indexes only.
>>> + * It is not currently safe to be called for Views because security_barrier
>>> + * is listed as an option and so would be allowed to be set at a level lower
>>> + * than AccessExclusiveLock, which would not be correct.
>>
>> This statement is accepted and takes only ShareUpdateExclusiveLock:
>>
>>     alter table information_schema.triggers set (security_barrier = true);
>>
>> I suggest adding a LOCKMODE field to relopt_gen and adding a
>> reloptions_locklevel() function to determine the required level from an
>> options list.  That puts control of the lock level near the code that
>> understands the implications for each option.  You can then revert the
>> addition of AlterViewInternal() and some of the utility.c changes.
>
> Sure, that's how we code it, but I'm not sure we should introduce that
> feature.  The above weirdness is not itself justification.

OK, will follow this path.  It's a good idea since it encourages authors
of new "options" to consider properly the lock level that will be required.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
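For concreteness, the proposal amounts to something like the sketch below.
The reloptions_locklevel() name and the LOCKMODE field come from Noah's
mail; everything else (field placement, the matching loop, the starting
lock level) is illustrative rather than a finished patch.

    /*
     * Sketch of the proposal (illustrative, not a committed patch): every
     * reloption records the strongest lock needed to change it, and the
     * ALTER TABLE path takes the maximum over the options actually named
     * in the command.
     */
    typedef struct relopt_gen
    {
        const char *name;
        const char *desc;
        bits32      kinds;
        LOCKMODE    lockmode;   /* proposed: lock required to SET/RESET */
        int         namelen;
        relopt_type type;
    } relopt_gen;

    static LOCKMODE
    reloptions_locklevel(List *defList)
    {
        /* assume the weakest level any SET (...) variant may use */
        LOCKMODE    lockmode = ShareUpdateExclusiveLock;
        ListCell   *cell;

        foreach(cell, defList)
        {
            DefElem    *def = (DefElem *) lfirst(cell);
            int         i;

            /* relOpts is reloptions.c's table of all known options */
            for (i = 0; relOpts[i]; i++)
            {
                if (pg_strcasecmp(relOpts[i]->name, def->defname) == 0)
                    lockmode = Max(lockmode, relOpts[i]->lockmode);
            }
        }
        return lockmode;
    }

With that in place, security_barrier can simply declare AccessExclusiveLock
while fillfactor stays at ShareUpdateExclusiveLock, so the
information_schema.triggers statement above would automatically take the
stronger lock, with no special-casing of views in utility.c.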
On 2/26/14, 9:15 AM, Simon Riggs wrote:
> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote:
>> Hi,
>>
>> On 2014-02-26 07:32:45 +0000, Simon Riggs wrote:
>>>> * This definitely should include isolationtester tests actually
>>>>   performing concurrent ALTER TABLEs. All that's currently there is
>>>>   tests that the locklevel isn't too high, but not that it actually works.
>>>
>>> There is no concurrent behaviour here, hence no code that would be
>>> exercised by concurrent tests.
>>
>> Huh? There's most definitely new concurrent behaviour. Previously no
>> other backends could have a relation open (and locked) while it got
>> altered (which then sends out relcache invalidations). That's something
>> that should be tested.
>
> It has been. High-volume concurrent testing has been performed, per
> Tom's original discussion upthread, but that's not part of the test
> suite.
> For other tests I have no guide as to how to write a set of automated
> regression tests. Anything could cause a failure, so I'd need to write
> an infinite set of tests to prove there is no bug *somewhere*. How
> many tests are required? 0, 1, 3, 30?

ISTM that we don't want hand-written tests here, but rather generated
tests that actually hit all potential cases.  Obviously we'd never run
that as part of normal regression, but buildfarm animals certainly could.

--
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net