Thread: Patch for dependency traversal during DROP

From: Tom Lane
Date:
The attached patch rewrites DROP recursion according to my sketch here:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00301.php

It gets rid of a lot of unwanted NOTICE messages during drop, specifically
by ensuring that AUTO/INTERNAL drops are silent even when they cascade
from other cascaded objects.  (As an unexpected side benefit, the messages
seem to come out in a more logical order, too.)  See the regression test
diffs for examples.  Also, it takes locks on objects before searching for
their dependencies, which fixes all of the concurrent-deletion problem
cases that I linked to in the above message.

I have not done anything about folding cascaded-drop NOTICEs into a
single long message, as was discussed a couple days ago.  That would be
easy to do from here, but it seems like it should be a separate patch.

I was afraid while writing the patch that it might be too slow due to
reliance on simple linear list searching in ObjectAddresses lists ---
that means that deleting N objects is O(N^2), albeit with a constant
factor that's pretty tiny compared to the catalog-update work involved.
While this could be fixed by replacing ObjectAddresses with a more complex
data structure, I didn't want to have to do that in a first-generation
patch either.  So I was pleased to find out that it actually seems to
be faster than CVS HEAD.  I tested by timing "DROP SCHEMA public"
after running the regression tests.  In CVS HEAD this involves dropping
1177 distinguishable objects (ie, there are that many entries in the
targetObjects list at completion of the scan phase).  I get these timings
on a pretty-slow HPPA machine:

CVS HEAD:

$ time psql -c 'drop schema public restrict' regression 2>/dev/null

real    0m2.53s
user    0m0.04s
sys     0m0.03s
$ time psql -c 'drop schema public cascade' regression 2>/dev/null
DROP SCHEMA

real    0m8.06s
user    0m0.05s
sys     0m0.03s

With patch:

$ time psql -c 'drop schema public restrict' regression 2>/dev/null

real    0m0.74s
user    0m0.03s
sys     0m0.02s
$ time psql -c 'drop schema public cascade' regression 2>/dev/null
DROP SCHEMA

real    0m6.83s
user    0m0.03s
sys     0m0.02s

The speedup in RESTRICT timing was expected, but not so much CASCADE.
(BTW, I wonder why the RESTRICT and CASCADE timings aren't about the same
in CVS HEAD?  The old implementation does all the same work in both cases,
and only fails at the end...)

It might be that with many thousand objects to be dropped, we'd
start to hit the O(N^2) behavior, but I suspect that CVS HEAD is
none too pleasant in such a case either.  Anyone want to run some
experiments?
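To make the O(N^2) concern concrete: ObjectAddresses is essentially a flat
array that gets searched linearly before each insertion.  A toy version
(the struct layout and function names below are simplified illustrations,
not the actual PostgreSQL definitions) shows where the quadratic cost
comes from:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for PostgreSQL's ObjectAddress/ObjectAddresses. */
typedef struct { unsigned classId, objectId, objectSubId; } ObjAddr;

typedef struct {
    ObjAddr *refs;
    int      numrefs;
    int      maxrefs;
} ObjAddrs;

/* Linear scan: O(numrefs) per call, so collecting N unique target
 * objects with a presence check before each add is O(N^2) overall. */
static int obj_present(const ObjAddrs *addrs, ObjAddr obj)
{
    for (int i = 0; i < addrs->numrefs; i++)
        if (addrs->refs[i].classId == obj.classId &&
            addrs->refs[i].objectId == obj.objectId &&
            addrs->refs[i].objectSubId == obj.objectSubId)
            return 1;
    return 0;
}

static void obj_add(ObjAddrs *addrs, ObjAddr obj)
{
    if (addrs->numrefs >= addrs->maxrefs)
    {
        addrs->maxrefs = addrs->maxrefs ? addrs->maxrefs * 2 : 32;
        addrs->refs = realloc(addrs->refs,
                              addrs->maxrefs * sizeof(ObjAddr));
    }
    addrs->refs[addrs->numrefs++] = obj;
}
```

The constant factor per probe is just a few integer compares, which is
why the quadratic term stays buried under the catalog-update work until
N gets quite large.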

Another possible objection to this patch is that it takes an awful lot
of locks on things that we never locked before; a large DROP operation
could easily run out of locktable shared memory when it did not before.
That's fixable by increasing max_locks_per_transaction, but I wonder
whether there will be any push-back about it.

A couple of stylistic issues:

* After obtaining lock on an object-to-be-deleted, we need to check
whether it's already been deleted.  This could have been done with
a pile of object-type-specific SearchSysCache tests, but I chose to
do it by checking to see if the pg_depend tuple we are traversing
has been marked dead; aside from being object-type-independent,
that should be a lot cheaper.  This required sticking a little bit
of a wart into genam.c, since only at that level do we know which
buffer the tuple is in.  This seemed cleaner than the alternative,
but I wonder if anyone has a better idea?

* The DROP code now requires more per-target-object state than just
the object's identity.  However the ObjectAddresses struct it was
using is also shared with various dependency-adding functionality
that doesn't need any more than that.  I chose to deal with that
by having the ObjectAddresses support routines handle ObjectAddresses
with or without a secondary "extra data" array.  This is pretty ugly
IMHO, but duplicating most of that support code seemed no better.
Anybody have a strong aversion to that, or a better idea?  (The
whole area might have to be revisited anyway, if we decide we need
something smarter than a linear list.)
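One way to picture the "optional extra data" arrangement is a secondary
array allocated in parallel with the primary one, only when the caller
asks for it, and grown in lockstep thereafter.  This sketch uses invented
names and a simplified identity type purely for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-target-object state a drop needs
 * (flags, dependency type, etc.); not the real PostgreSQL struct. */
typedef struct { int flags; } ObjExtra;

typedef struct {
    int      *objs;     /* primary identities (simplified to ints) */
    ObjExtra *extras;   /* NULL unless the caller asked for extras */
    int       num, max;
} AddrList;

static AddrList *addrlist_create(int with_extras)
{
    AddrList *l = calloc(1, sizeof(AddrList));
    l->max = 32;
    l->objs = malloc(l->max * sizeof(int));
    /* The same support code serves both kinds of caller: the
     * dependency-adding paths pass with_extras = 0 and pay nothing. */
    l->extras = with_extras ? malloc(l->max * sizeof(ObjExtra)) : NULL;
    return l;
}

static void addrlist_add(AddrList *l, int obj, int flags)
{
    if (l->num >= l->max)
    {
        l->max *= 2;
        l->objs = realloc(l->objs, l->max * sizeof(int));
        if (l->extras)   /* keep the secondary array in step */
            l->extras = realloc(l->extras, l->max * sizeof(ObjExtra));
    }
    l->objs[l->num] = obj;
    if (l->extras)
        l->extras[l->num].flags = flags;
    l->num++;
}
```

The ugliness is that every support routine needs the "if (l->extras)"
branch, but the alternative is duplicating the growth and search logic
wholesale.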

If there are no objections I'll apply this in a day or two.

            regards, tom lane



Re: Patch for dependency traversal during DROP

From: Tom Lane
Date:
I wrote:
> The attached patch rewrites DROP recursion according to my sketch here:
> http://archives.postgresql.org/pgsql-hackers/2008-05/msg00301.php

> I was afraid while writing the patch that it might be too slow due to
> reliance on simple linear list searching in ObjectAddresses lists ---
> that means that deleting N objects is O(N^2),

I did some investigation with gprof.  In a test case involving dropping
about 13000 objects (basically, replicating the regression "public"
schema 10 times with a common owner, and then doing DROP OWNED BY),
I found that object_address_present_add_flags() accounted for about 4
seconds of CPU out of a total elapsed runtime of 60 seconds.
Extrapolating, that function would be accounting for a third of the
runtime at 100K objects and 85% of the runtime at 1M objects.  So unless
anyone foresees people routinely dropping millions of objects at a time,
it seems we don't need to bother improving the ObjectAddresses search
algorithm.
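The extrapolation follows if you treat the 4 seconds in
object_address_present_add_flags() as the component growing with N^2 and
the remaining 56 seconds as scaling linearly with the object count.  A
back-of-the-envelope check of that model (a model, not a measurement):

```c
#include <assert.h>

/* Fraction of runtime spent in the O(N^2) list search, assuming the
 * measured 4s (of 60s total at N0 = 13000 objects) grows as N^2 and
 * the remaining 56s of catalog-update work grows as N. */
static double quadratic_fraction(double n)
{
    double scale = n / 13000.0;
    double quad  = 4.0 * scale * scale;  /* list-search cost */
    double lin   = 56.0 * scale;         /* everything else */
    return quad / (quad + lin);
}
```

Plugging in 100K objects gives roughly 35% of the runtime, and 1M
objects roughly 85%, matching the figures above.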

> Another possible objection to this patch is that it takes an awful lot
> of locks on things that we never locked before; a large DROP operation
> could easily run out of locktable shared memory when it did not before.
> That's fixable by increasing max_locks_per_transaction, but I wonder
> whether there will be any push-back about it.

This, on the other hand, might be a real problem --- I had to double the
default value of max_locks_per_transaction to run the 13K-objects
example.
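For anyone wanting to reproduce the 13K-object test, the knob lives in
postgresql.conf; the value below is just the doubling described above,
not a tuning recommendation:

```
# postgresql.conf -- default is 64; doubling it was enough for the
# 13K-object DROP OWNED BY test
max_locks_per_transaction = 128
```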

I'm toying with the idea of not taking a deletion lock when traversing
an INTERNAL dependency, on the grounds that no one could be deleting
the dependent object anyway unless they have lock on its owner object ---
which we already have.  This would make for a noticeable reduction in
the number of new locks taken as a result of this patch; for instance
we'd not bother locking the rowtype of a relation when dropping the
relation.  Can anyone think of a case where this would go wrong?

            regards, tom lane