Patch for dependency traversal during DROP - Mailing list pgsql-patches
From | Tom Lane |
---|---|
Subject | Patch for dependency traversal during DROP |
Date | |
Msg-id | 6826.1212794740@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Patch for dependency traversal during DROP
|
List | pgsql-patches |
The attached patch rewrites DROP recursion according to my sketch here: http://archives.postgresql.org/pgsql-hackers/2008-05/msg00301.php It gets rid of a lot of unwanted NOTICE messages during drop, specifically by ensuring that AUTO/INTERNAL drops are silent even when they cascade from other cascaded objects. (In an unexpected side benefit, the messages seem to come out in a more logical order, too.) See the regression test diffs for examples. Also, it takes locks on objects before searching for their dependencies, which fixes all of the concurrent-deletion problem cases that I linked to in the above message. I have not done anything about folding cascaded-drop NOTICEs into a single long message, as was discussed a couple days ago. That would be easy to do from here, but it seems like it should be a separate patch. I was afraid while writing the patch that it might be too slow due to reliance on simple linear list searching in ObjectAddresses lists --- that means that deleting N objects is O(N^2), albeit with a constant factor that's pretty tiny compared to the catalog-update work involved. While this could be fixed by replacing ObjectAddresses with a more complex data structure, I didn't want to have to do that in a first-generation patch either. So I was pleased to find out that it actually seems to be faster than CVS HEAD. I tested by timing "DROP SCHEMA public" after running the regression tests. In CVS HEAD this involves dropping 1177 distinguishable objects (ie, there are that many entries in the targetObjects list at completion of the scan phase). I get these timings on a pretty-slow HPPA machine: CVS HEAD: $ time psql -c 'drop schema public restrict' regression 2>/dev/null real 0m2.53s user 0m0.04s sys 0m0.03s $ time psql -c 'drop schema public cascade' regression 2>/dev/null DROP SCHEMA real 0m8.06s user 0m0.05s sys 0m0.03s With patch: $ time psql -c 'drop schema public restrict' regression 2>/dev/null real 0m0.74s user 0m0.03s sys 0m0.02s $ time psql -c 'drop schema public cascade' regression 2>/dev/null DROP SCHEMA real 0m6.83s user 0m0.03s sys 0m0.02s The speedup in RESTRICT timing was expected, but not so much CASCADE. (BTW, I wonder why aren't the RESTRICT and CASCADE timings about the same in CVS HEAD? The old implementation does all the same work in both cases, and only fails at the end...) It might be that with many thousand objects to be dropped, we'd start to hit the O(N^2) behavior, but I suspect that CVS HEAD is none too pleasant in such a case either. Anyone want to run some experiments? Another possible objection to this patch is that it takes an awful lot of locks on things that we never locked before; a large DROP operation could easily run out of locktable shared memory when it did not before. That's fixable by increasing max_locks_per_transaction, but I wonder whether there will be any push-back about it. A couple of stylistic issues: * After obtaining lock on an object-to-be-deleted, we need to check whether it's already been deleted. This could have been done with a pile of object-type-specific SearchSysCache tests, but I chose to do it by checking to see if the pg_depend tuple we are traversing has been marked dead; aside from being object-type-independent, that should be a lot cheaper. This required sticking a little bit of a wart into genam.c, since only at that level do we know which buffer the tuple is in. This seemed cleaner than the alternative, but I wonder if anyone has a better idea? * The DROP code now requires more per-target-object state than just the object's identity. However the ObjectAddresses struct it was using is also shared with various dependency-adding functionality that doesn't need any more than that. I chose to deal with that by having the ObjectAddresses support routines handle ObjectAddresses with or without a secondary "extra data" array. This is pretty ugly IMHO, but duplicating most of that support code seemed no better. Anybody have a strong aversion to that, or a better idea? (The whole area might have to be revisited anyway, if we decide we need something smarter than a linear list.) If there are no objections I'll apply this in a day or two. regards, tom lane
Attachment
pgsql-patches by date: