Thread: another failover testing question
Something that we end up doing sometimes in our failover testing is removing slony replication from an "active" (data provider) server. Because this involves removing triggers from tables, we end up with currently connected clients getting a bunch of "OID 123 not found" errors, where the OID is that of the recently removed trigger.
Is there any way short of cycling all client connections to have the server processes clean that information out of their cache when an object disappears like this from the database?
(I'm posting here rather than the slony list because it seems like a general question....)
Thanks.
- DAP
----------------------------------------------------------------------------------
David Parker Tazz Networks (401) 709-5130
"David Parker" <dparker@tazznetworks.com> writes: > Something that we end up doing sometimes in our failover testing is > removing slony replication from an "active" (data provider) server. > Because this involves removing triggers from tables, we end up with > currently connected clients getting a bunch of "OID 123 not found" > errors, where the OID is that of the recently removed trigger. > Is there any way short of cycling all client connections to have the > server processes clean that information out of their cache when an > object disappears like this from the database? AFAICS, there already *is* adequate interlocking for this. What PG version are you testing, and can you provide a self-contained test case? regards, tom lane
Sorry, neglected the version yet again: 7.4.5.

What happens is that we have active connections accessing tables that are being replicated by slony. Then somebody does an uninstall of slony, which removes the slony trigger from those tables. Then we start getting the OID error.

If this should indeed not be an issue in 7.4.5, I will try to come up with a test case independent of a slony install.

Thanks.

- DAP

>-----Original Message-----
>From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>Sent: Thursday, May 26, 2005 4:30 PM
>To: David Parker
>Cc: postgres general
>Subject: Re: [GENERAL] another failover testing question
>
>"David Parker" <dparker@tazznetworks.com> writes:
>> Something that we end up doing sometimes in our failover testing is
>> removing slony replication from an "active" (data provider) server.
>> Because this involves removing triggers from tables, we end up with
>> currently connected clients getting a bunch of "OID 123 not found"
>> errors, where the OID is that of the recently removed trigger.
>
>> Is there any way short of cycling all client connections to have the
>> server processes clean that information out of their cache when an
>> object disappears like this from the database?
>
>AFAICS, there already *is* adequate interlocking for this.
>What PG version are you testing, and can you provide a
>self-contained test case?
>
> regards, tom lane

"David Parker" <dparker@tazznetworks.com> writes: > Sorry, neglected the version yet again: 7.4.5. What happens is that we > have active connections accessing tables that are being replicated by > slony. Then somebody does an uninstall of slony, which removes the slony > trigger from those tables. Then we start getting the OID error. > If this should indeed not be an issue in 7.4.5, I will try to come up > with a test case independent of a slony install. It should not be ... at least, assuming that Slony is using the standard DROP TRIGGER operation, rather than playing directly with the system catalogs ... regards, tom lane
>It should not be ... at least, assuming that Slony is using
>the standard DROP TRIGGER operation, rather than playing
>directly with the system catalogs ...

AFAICS, the slony uninstall command is not doing anything exotic, though it DOES do a little bit of fiddling with pg_catalog to RESTORE previously disabled triggers. Otherwise it is using plain vanilla drop trigger.

I found a slony list thread from a few months ago that discussed this issue:

http://archives.postgresql.org/pgsql-general/2005-02/msg00813.php

The discussion there centered around cached plans causing the "no relation with OID" problem. The area of our code that experiences these problems is calling libpq - we have a wrapper for it that plugs into our Tcl environment - but it is not using prepared statements, and the commands it is executing are not calls to stored procedures, etc.

I cannot repro this problem simply using psql, so it must have something to do with the way we are using libpq, but I have no idea what object(s) we are holding onto that reference slony OIDs.

- DAP

I know better what is happening now. I had the scenario slightly wrong.

Slony creates a trigger on all replicated tables that calls into a shared library. The _Slony_I_logTrigger method in this library establishes a saved plan for inserts into its transaction log table sl_log_1.

I can create the missing OID error with:

1) configure replication
2) establish a client connection, perform operations on replicated tables
3) remove replication (drops sl_log_1 table)
4) operations on replicated tables on client connection are still fine
5) re-configure replication (re-creates sl_log_1 table)
6) now the OID error appears in the client connection. The OID refers to the previous version of the sl_log_1 table

I was pawing through our code to figure out where we might be saving a prepared statement, and was forgetting that the slony1_funcs library does this. This saved plan is executed with SPI_execp, and the documentation states:

"If one of the objects (a table, function, etc.) referenced by the prepared plan is dropped during the session then the results of SPI_execp for this plan will be unpredictable."

I'm pretty sure I understand the problem now (corrections appreciated), but I'm left with the operational question of how I get around this issue. Is there any way short of PQreset to get a postgres process to refresh its saved plans? I can generally avoid the drop-replication/re-configure replication thing happening in our procedures, but I can't prevent it completely....

- DAP

>> Sorry, neglected the version yet again: 7.4.5. What happens is that we
>> have active connections accessing tables that are being replicated by
>> slony. Then somebody does an uninstall of slony, which removes the
>> slony trigger from those tables. Then we start getting the OID error.
>> If this should indeed not be an issue in 7.4.5, I will try to come up
>> with a test case independent of a slony install.
>
>It should not be ... at least, assuming that Slony is using
>the standard DROP TRIGGER operation, rather than playing
>directly with the system catalogs ...
>
> regards, tom lane

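[Editor's sketch] The six-step reproduction in the message above can be modelled without PostgreSQL or Slony at all. The Python below is only a toy: `Catalog`, `Session`, `write_row`, the starting OID 16384, and the error text are all hypothetical names and values chosen for illustration, not anything from Slony or the server. It assumes a session that resolves the OID of sl_log_1 once, when its saved plan is first built, and never looks it up again.

```python
import itertools


class Catalog:
    """Hypothetical stand-in for the system catalogs: table name -> OID."""

    def __init__(self):
        self._next_oid = itertools.count(16384)  # arbitrary starting OID
        self.tables = {}

    def create_table(self, name):
        # Every CREATE hands out a fresh OID, even for a reused name.
        self.tables[name] = next(self._next_oid)
        return self.tables[name]

    def drop_table(self, name):
        del self.tables[name]


class Session:
    """Hypothetical client session that saves a plan on first trigger call."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.saved_plan_oid = None  # OID captured when the plan was prepared

    def write_row(self, trigger_installed):
        if not trigger_installed:
            return "ok"  # no log trigger fires, so the saved plan is unused
        if self.saved_plan_oid is None:
            self.saved_plan_oid = self.catalog.tables["sl_log_1"]
        if self.saved_plan_oid not in self.catalog.tables.values():
            raise LookupError(f"OID {self.saved_plan_oid} not found")
        return "ok"


def run_scenario():
    catalog = Catalog()
    results = []
    catalog.create_table("sl_log_1")              # 1) configure replication
    session = Session(catalog)                    # 2) client connects, writes
    results.append(session.write_row(True))
    catalog.drop_table("sl_log_1")                # 3) remove replication
    results.append(session.write_row(False))      # 4) writes are still fine
    catalog.create_table("sl_log_1")              # 5) re-configure replication
    try:                                          # 6) stale OID surfaces
        results.append(session.write_row(True))
    except LookupError as exc:
        results.append(str(exc))
    return results
```

In this model step 4 succeeds only because the trigger is gone along with the log table, so nothing touches the saved plan; the failure waits until step 5 brings the trigger back while the session still holds the old OID.
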
"David Parker" <dparker@tazznetworks.com> writes: > I can create the missing OID error with: > 1) configure replication > 2) establish a client connection, perform operations on replicated > tables > 3) remove replication (drops sl_log_1 table) > 4) operations on replicated tables on client connection are still fine > 5) re-configure replication (re-creates sl_log_1 table) > 6) now the OID error appears in the client connection. The OID refers > to the previous version of the sl_log_1 table > I was pawing through our code to figure out where we might be saving a > prepared statement, and was forgetting that the slony1_funcs library > does this. I think this is essentially a bug in the Slony library --- it ought to provide a way to flush its internally cached plan(s). In the longer term there may be infrastructure for automatic rebuilding of invalidated plans, but I wouldn't hold my breath waiting for this. (Even if it existed now, most likely the Slony code would have to change to take advantage of it ...) regards, tom lane