Thread: Fix dropped object handling in pg_event_trigger_ddl_commands
Hello, when creating an event trigger for ddl_command_end that calls pg_event_trigger_ddl_commands certain statements will cause the trigger to fail with a cache lookup error. The error happens on master, 13 and 12 I didnt test any previous versions. trg=# ALTER TABLE t ALTER COLUMN f1 SET DATA TYPE bigint, ALTER COLUMN f1 DROP IDENTITY; ERROR: XX000: cache lookup failed for relation 13476892 CONTEXT: PL/pgSQL function ddl_end() line 5 at FOR over SELECT rows LOCATION: getRelationTypeDescription, objectaddress.c:4178 For the ALTER DATA TYPE we create a command to adjust the sequence which gets recorded in the event trigger commandlist, which leads to the described failure when the sequence is dropped as part of another ALTER TABLE subcommand and information about the sequence can no longer be looked up. To reproduce: CREATE OR REPLACE FUNCTION ddl_end() RETURNS event_trigger AS $$ DECLARE r RECORD; BEGIN FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP RAISE NOTICE 'ddl_end: % %', r.command_tag, r.object_type; END LOOP; END; $$ LANGUAGE plpgsql; CREATE EVENT TRIGGER ddl_end ON ddl_command_end EXECUTE PROCEDURE ddl_end(); CREATE TABLE t(f1 int NOT NULL GENERATED ALWAYS AS IDENTITY); ALTER TABLE t ALTER COLUMN f1 DROP IDENTITY, ALTER COLUMN f1 SET DATA TYPE bigint; I tried really hard to look for a different way to detect this error earlier but since the subcommands are processed independently i couldnt come up with a non-invasive version. Someone more familiar with this code might have an idea for a better solution. Any thoughts? https://www.postgresql.org/message-id/CAMCrgp39V7JQA_Gc+JaEZV3ALOU1ZG=Pwyk3oDpTq7F6Z0JSmg@mail.gmail.com -- Regards, Sven Klemm
Attachment
On Sun, Apr 18, 2021 at 2:12 PM Sven Klemm <sven@timescale.com> wrote: > when creating an event trigger for ddl_command_end that calls > pg_event_trigger_ddl_commands certain statements will cause the > trigger to fail with a cache lookup error. The error happens on > master, 13 and 12 I didnt test any previous versions. > > trg=# ALTER TABLE t ALTER COLUMN f1 SET DATA TYPE bigint, ALTER COLUMN > f1 DROP IDENTITY; > ERROR: XX000: cache lookup failed for relation 13476892 > CONTEXT: PL/pgSQL function ddl_end() line 5 at FOR over SELECT rows > LOCATION: getRelationTypeDescription, objectaddress.c:4178 Any opinions on the patch? Is this not worth the effort to fix or is there a better way to fix this? https://www.postgresql.org/message-id/CAMCrgp2R1cEXU53iYKtW6yVEp2_yKUz+z=3-CTrYpPP+xryRtg@mail.gmail.com -- Regards, Sven Klemm
On 2021-Apr-25, Sven Klemm wrote: > On Sun, Apr 18, 2021 at 2:12 PM Sven Klemm <sven@timescale.com> wrote: > > when creating an event trigger for ddl_command_end that calls > > pg_event_trigger_ddl_commands certain statements will cause the > > trigger to fail with a cache lookup error. The error happens on > > master, 13 and 12 I didnt test any previous versions. > > > > trg=# ALTER TABLE t ALTER COLUMN f1 SET DATA TYPE bigint, ALTER COLUMN > > f1 DROP IDENTITY; > > ERROR: XX000: cache lookup failed for relation 13476892 > > CONTEXT: PL/pgSQL function ddl_end() line 5 at FOR over SELECT rows > > LOCATION: getRelationTypeDescription, objectaddress.c:4178 > > Any opinions on the patch? Is this not worth the effort to fix or is > there a better way to fix this? Hello, I haven't looked at this but judging from the general shape of function and error message, it seems clearly a bug that needs to be fixed somehow. I'll try to make time to look at it sometime soon, but I have other bugs to investigate and fix, so it may be some time. I fear your proposal of ignoring the object may be the best we can do, but I don't like it much. -- Álvaro Herrera 39°49'30"S 73°17'W "La verdad no siempre es bonita, pero el hambre de ella sí"
Hi hackers, > Any opinions on the patch? Is this not worth the effort to fix or is > there a better way to fix this? I confirm that the bug still exists in master (be57f216). The patch fixes it and looks good to me. I changed the wording a little and added a regression test. The updated patch is in the attachment. I added this thread to the CF and updated the status to "Ready for Committer". -- Best regards, Aleksander Alekseev
Attachment
On Mon, Jun 07, 2021 at 12:44:42PM +0300, Aleksander Alekseev wrote: > I confirm that the bug still exists in master (be57f216). The patch > fixes it and looks good to me. I changed the wording a little and > added a regression test. The updated patch is in the attachment. I > added this thread to the CF and updated the status to "Ready for > Committer". FWIW, that looks rather natural to me to me to just ignore the object if it has already been dropped here. The same kind of rules apply to tables dropped with DROP TABLE which would not show up as of pg_event_trigger_ddl_commands(), but one can get a list as of pg_event_trigger_dropped_objects(). Alvaro, were you planning to look at that? I have not looked at the patch in details. missing_ok is available in getObjectIdentity() only since v14, so this cannot be backpatched :/ -- Michael
Attachment
On 2021-Jun-09, Michael Paquier wrote: > On Mon, Jun 07, 2021 at 12:44:42PM +0300, Aleksander Alekseev wrote: > > I confirm that the bug still exists in master (be57f216). The patch > > fixes it and looks good to me. I changed the wording a little and > > added a regression test. The updated patch is in the attachment. I > > added this thread to the CF and updated the status to "Ready for > > Committer". > > FWIW, that looks rather natural to me to me to just ignore the object > if it has already been dropped here. The same kind of rules apply to > tables dropped with DROP TABLE which would not show up as of > pg_event_trigger_ddl_commands(), but one can get a list as of > pg_event_trigger_dropped_objects(). Oh, that parallel didn't occur to me. I agree it seems a useful precedent. > Alvaro, were you planning to look at that? I have not looked at the > patch in details. I have it on my list of things to look at, but it isn't priority. If you to mess with it, please be my guest. > missing_ok is available in getObjectIdentity() only > since v14, so this cannot be backpatched :/ Ooh, yeah, I forgot about that. And that one was pretty invasive ... I'm not sure if we can reasonably implement a fix for older releases. I mean, it's a relatively easy test: do a syscache search for the object or a catalog indexscan (easy to do with get_object_property_data-based API), and if the object is gone, skip getObjectTypeDescription and getObjectIdentity. But maybe this is too much code to add to stable releases ... -- Álvaro Herrera Valdivia, Chile
On Wed, Jun 09, 2021 at 09:55:08AM -0400, Alvaro Herrera wrote: > I'm not sure if we can reasonably implement a fix for older releases. > I mean, it's a relatively easy test: do a syscache search for the object > or a catalog indexscan (easy to do with get_object_property_data-based > API), and if the object is gone, skip getObjectTypeDescription and > getObjectIdentity. But maybe this is too much code to add to stable > releases ... Except that these syscache lookups need to be done on an object-type basis, which is basically what getObjectDescription() & friends now do where the logic makes sure that we have a correct objectId <-> cacheId mapping for the syscache lookups. So that would be roughly copying into event_trigger.c what objectaddress.c does now, but for the back branches. It would be better to just backport the changes to support missing_ok in objectaddress.c if we go down this road, but the invasiveness makes that much more complicated. Another thing is that ATExecDropIdentity() does performDeletion() by using PERFORM_DELETION_INTERNAL, which does not feed the dropped sequence to pg_event_trigger_dropped_objects(). That feels inconsistent with CREATE TABLE GENERATED that would feed to event triggers the CREATE SEQUENCE and ALTER SEQUENCE commands, as well as ALTER TABLE SET DATA TYPE on a generated column that feeds an internal ALTER SEQUENCE. An extra idea we could consider, as the drop events are logged before the other ALTER TABLE subcommands, is to look at the event triggers registered when the generated sequence is dropped and to erase from existence any previous events associated to it, but that would make the logic weak as hell. In all this stuff, simply ignoring the objects still sounds like a fair and simple course of action to take if they are gone at the time an event trigger checks after them within pg_event_trigger_ddl_commands(). Now, this approach makes my spider sense tingle. -- Michael
Attachment
On Thu, Jun 10, 2021 at 05:07:28PM +0900, Michael Paquier wrote: > Except that these syscache lookups need to be done on an object-type > basis, which is basically what getObjectDescription() & friends now do > where the logic makes sure that we have a correct objectId <-> cacheId > mapping for the syscache lookups. So that would be roughly copying > into event_trigger.c what objectaddress.c does now, but for the back > branches. It would be better to just backport the changes to support > missing_ok in objectaddress.c if we go down this road, but the > invasiveness makes that much more complicated. I have been looking at that more this morning, and I have convinced myself that skipping objects should work fine. The test added at the bottom of event_trigger.sql was making the file a bit messier though, and there are already tests for relations when it comes to dropped objects. So let's do a bit of consolidation while on it with an extra event trigger on ddl_command_end and relations on the schema evttrig. This one already included some cases for serial columns, so that's natural to me to extend the area for identity columns. I have also added a case for a serial column dropped, while on it. The last thing is the addition of r.object_identity from pg_event_trigger_ddl_commands() in the data generated for the output messages, so as the output is as complete as possible. Regarding the back-branches, I am tempted to do nothing. The APIs are just not here to do the job. On top of being an invasive change, it took 4 years for somebody to complain on this matter, as this exists since 10. That's not worth the risk/cost. -- Michael
Attachment
Hi Michael, > /* The type can never be NULL */ > type = getObjectTypeDescription(&addr, true); The last argument should be `false` then. -- Best regards, Aleksander Alekseev
Attachment
On Fri, Jun 11, 2021 at 11:00:40AM +0300, Aleksander Alekseev wrote: > The last argument should be `false` then. Hm, nope. I think that we had better pass true as argument here. First, this is more consistent with the identity lookup (OK, it does not matter as we would have discarded the object after the identity lookup anyway, but any future shuffling of this code may not be that wise). Second, now that I look at it, getObjectTypeDescription() can never be NULL as we have fallback names for relations, routines and constraints for all object types so the buffer will be filled with some data. Let's replace the bottom of getObjectTypeDescription() that returns now NULL by Assert(buffer.len > 0). This code is new as of v14, so it is better to adjust that sooner than later. -- Michael
Attachment
On Fri, Jun 11, 2021 at 09:36:57PM +0900, Michael Paquier wrote: > Hm, nope. I think that we had better pass true as argument here. The main patch has been applied as of 2d689ba. > First, this is more consistent with the identity lookup (OK, it does > not matter as we would have discarded the object after the identity > lookup anyway, but any future shuffling of this code may not be that > wise). Second, now that I look at it, getObjectTypeDescription() can > never be NULL as we have fallback names for relations, routines and > constraints for all object types so the buffer will be filled with > some data. Let's replace the bottom of getObjectTypeDescription() > that returns now NULL by Assert(buffer.len > 0). This code is new as > of v14, so it is better to adjust that sooner than later. And this has been simplified with b56b83a. -- Michael