Thread: Slony uninstall info/warning
Hopefully this will prevent data loss or problems for others using slony 1.0.5 and pg 7.4.6...

We just got bitten by something we didn't foresee when completely uninstalling a slony replication cluster from the master and slave...

MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle all your open connections after the event!

The connections appear to "remember" or refer to objects which are removed by the uninstall node script. And you get lots of errors as a result (and possible data loss??)...

Question: Why do our database objects still refer to removed slony objects after they are removed?

John Sidney-Woollett

More info...

Our system is a web application which runs against a postgres 7.4.6 database. Tomcat is restarted at 5am each day.

Last Friday afternoon we uninstalled the slony cluster (1 master + 1 slave) so that we could add a new second slave. (I find it easier to uninstall the cluster and then reapply a new setup/subscribe script rather than trying to add the single node.) The cluster was successfully removed, and then rebuilt with 1 master and 2 slave nodes.

However, we didn't stop and start our web application, which uses a Tomcat connection pool and continued to run against the master (during the uninstall and rebuild). Everything appeared fine.

Only today, while checking something else, did I notice lots of

  ERROR: relation with OID 1036711 does not exist

errors in the postgres server log. In our tomcat application we also saw lots of errors like

  org.postgresql.util.PSQLException: ERROR: relation with OID 1036711 does not exist
      at org.postgresql.util.PSQLException.parseServerError(PSQLException.java:139)
      at org.postgresql.core.QueryExecutor.executeV3(QueryExecutor.java:154)
      at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:101)
      at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:43)
      ...

Basically these errors started when the cluster was uninstalled and continued until the web application was restarted (at 5am the following morning).

I'm not sure what object had OID 1036711 - maybe the slony replication schema?? Is there a way to find out after the event? I do have daily full backups of the master going back 1 month - I could load one into another database and look up the OID, if someone could explain how to do that. And if that would be useful?

But the net effect is that we got bizarre failures in our application, and large gaps in primary key values (using sequences) in certain tables within the database. On the whole the app seemed to be behaving fine, with the odd error that we put down to user error... (a bad assumption, I guess, in hindsight)

At the moment I'm trying to figure out exactly what kind of damage our data has suffered. If anyone has any ideas or suggestions as to what went wrong or what to check for, I'd appreciate hearing.

Thanks

John
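[Note: a minimal sketch of the OID lookup asked about above, assuming a pre-uninstall backup has been restored into a scratch database. The "relation with OID ..." error refers to a pg_class OID, so a catalog query names the object; only the OID itself comes from the original post.]

  -- Run against a scratch database restored from a backup taken before the
  -- uninstall; pg_class has one row per table/index/sequence/view, keyed by OID.
  SELECT c.oid, n.nspname AS schema, c.relname, c.relkind
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.oid = 1036711;
  -- relkind: r = table, i = index, S = sequence, v = view

If the row comes back in the Slony cluster schema (one of the sl_* objects), that would confirm the stale references were to replication objects rather than to your own tables.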
John Sidney-Woollett wrote:
> Hopefully this will prevent data loss or problems for others using slony
> 1.0.5 and pg 7.4.6...
>
> We just got bitten by something we didn't foresee when completely
> uninstalling a slony replication cluster from the master and slave...
>
> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER DATABASE
> WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle all your
> open connections after the event!
>
> The connections appear to "remember" or refer to objects which are
> removed by the uninstall node script. And you get lots of errors as a
> result (and possible data loss??)...
>
> Question: Why do our database objects still refer to removed slony
> objects after they are removed?

Well, there are two areas I know cache plans/OIDs:
1. Prepared statements
2. plpgsql functions

At a guess, since you mention Java the first almost certainly applies to you.

This isn't a slony issue so much as a cached plan issue. I'm guessing the same problems would occur if you were manually changing the database schema.

Don't think you could get data loss (unless the application ignores errors). You will however get to see a wide range of OID related errors.

--
Richard Huxton
Archonet Ltd
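[Note: a minimal sketch of the plpgsql side of this; the table, function and data below are invented, not taken from the thread. On 7.4/8.0, plpgsql saves the plan for each statement in a function the first time it runs in a session, and the plan refers to tables by OID, so dropping and recreating a table (roughly what uninstalling and reinstalling the Slony objects amounts to) leaves already-open sessions holding plans that point at the old OID.]

  -- Session 1: create a table and a function that inserts into it.
  CREATE TABLE audit_log (id serial PRIMARY KEY, msg text);

  CREATE FUNCTION log_msg(text) RETURNS integer AS '
  BEGIN
      -- The plan for this INSERT is cached on first execution,
      -- referencing audit_log by OID.
      INSERT INTO audit_log (msg) VALUES ($1);
      RETURN 1;
  END;
  ' LANGUAGE plpgsql;

  SELECT log_msg('first call');   -- works, and caches the plan

  -- Session 2 (or a schema-change script): drop and recreate the table,
  -- giving it a brand-new OID.
  DROP TABLE audit_log;
  CREATE TABLE audit_log (id serial PRIMARY KEY, msg text);

  -- Back in session 1, which is still open:
  SELECT log_msg('second call');
  -- ERROR:  relation with OID nnnnn does not exist
  -- A freshly opened connection runs the same call without error.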
Thanks for the info, Richard.

I didn't think that it was a slony issue per se, but that a note should be added to the slony docs warning to recycle connections after making substantive changes to the schema.

You're right, we use both (java) prepared statements and pl/pgsql functions.

The data loss aspect is not so clear cut (for us). We definitely got records that failed to insert (missing sequence numbers in tables) while others did insert correctly into the same tables (using the same pl/pgsql functions). So we can't figure out what the pattern is.

Although we do have our connection pool recycle idle connections - it could be that things worked when a newly recycled connection was used by the web app, and failed when one of the "old/original" connections was used. This is probably what masked the errors for us...

It would be great if postgres could "recompile" pl/pgsql functions whenever it found a missing object referenced within the function - chances are that it would compile cleanly (in this case) and could then be executed without error.

Something along the lines of:

  execute function
    OK -> return result
    ERROR - OID -> recompile function, and re-execute
      OK -> return result
      ERROR - OID -> report error

This would help get rid of the temp-table-in-functions workaround of having to use an execute statement.

Thanks for your help and feedback.

John Sidney-Woollett

Richard Huxton wrote:
> John Sidney-Woollett wrote:
>
>> Hopefully this will prevent data loss or problems for others using
>> slony 1.0.5 and pg 7.4.6...
>>
>> We just got bitten by something we didn't foresee when completely
>> uninstalling a slony replication cluster from the master and slave...
>>
>> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER
>> DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle
>> all your open connections after the event!
>>
>> The connections appear to "remember" or refer to objects which are
>> removed by the uninstall node script. And you get lots of errors as a
>> result (and possible data loss??)...
>>
>> Question: Why do our database objects still refer to removed slony
>> objects after they are removed?
>
> Well, there are two areas I know cache plans/OIDs:
> 1. Prepared statements
> 2. plpgsql functions
>
> At a guess, since you mention Java the first almost certainly applies to
> you.
>
> This isn't a slony issue so much as a cached plan issue. I'm guessing
> the same problems would occur if you were manually changing the database
> schema.
>
> Don't think you could get data loss (unless the application ignores
> errors). You will however get to see a wide range of OID related errors.
>
> --
> Richard Huxton
> Archonet Ltd
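[Note: the "temp table in functions" workaround mentioned above looks roughly like the following; a hedged sketch with invented names, not code from the thread. Because plpgsql caches plans by OID, a function that recreates a temp table on every call has to issue its statements through EXECUTE, which plans the query text afresh each time.]

  CREATE FUNCTION summarise() RETURNS integer AS '
  DECLARE
      r record;
      total integer;
  BEGIN
      EXECUTE ''CREATE TEMP TABLE work_tmp (n integer)'';
      EXECUTE ''INSERT INTO work_tmp SELECT 1'';
      -- Even the SELECT has to go through EXECUTE (FOR ... IN EXECUTE),
      -- otherwise a plan referencing this incarnation of the temp table
      -- would be saved and break on the next call.
      FOR r IN EXECUTE ''SELECT sum(n) AS s FROM work_tmp'' LOOP
          total := r.s;
      END LOOP;
      EXECUTE ''DROP TABLE work_tmp'';
      RETURN total;
  END;
  ' LANGUAGE plpgsql;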
John Sidney-Woollett wrote:
> Thanks for the info, Richard.
>
> I didn't think that it was a slony issue per se, but that a note should
> be added to the slony docs warning to recycle connections after making
> substantive changes to the schema.
>
> You're right, we use both (java) prepared statements and pl/pgsql
> functions.
>
> The data loss aspect is not so clear cut (for us). We definitely got
> records that failed to insert (missing sequence numbers in tables) while
> others did insert correctly into the same tables (using the same
> pl/pgsql functions). So we can't figure out what the pattern is.
>
> Although we do have our connection pool recycle idle connections - it
> could be that things worked when a newly recycled connection was used by
> the web app, and failed when one of the "old/original" connections was
> used. This is probably what masked the errors for us...

That's almost certainly the case. I'm lucky in that I've always been able to have some downtime for schema changes.

> It would be great if postgres could "recompile" pl/pgsql functions
> whenever it found a missing object referenced within the function -
> chances are that it would compile cleanly (in this case) and could then
> be executed without error.
>
> Something along the lines of:
>
>   execute function
>     OK -> return result
>     ERROR - OID -> recompile function, and re-execute
>       OK -> return result
>       ERROR - OID -> report error
>
> This would help get rid of the temp-table-in-functions workaround of
> having to use an execute statement.

Hmm - not sure you could do this without a savepoint to catch the error. However, it might be possible to track dependencies for the function (as with views). Then you'd have to issue a CASCADE to alter the table.

--
Richard Huxton
Archonet Ltd
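[Note: a rough sketch of how that catch-and-retry could be approximated from the client side, heavily hedged and not from the thread. Savepoints need 8.0, the names reuse the hypothetical log_msg/audit_log example above, and the "recompile" step relies on re-issuing CREATE OR REPLACE FUNCTION, which, as far as I know, makes plpgsql discard its cached version of the function on the next call.]

  BEGIN;

  SAVEPOINT try_call;
  SELECT log_msg('new order');        -- fails: relation with OID ... does not exist

  ROLLBACK TO SAVEPOINT try_call;     -- undo only the failed call

  -- Re-create the function with its original body to force a recompile;
  -- the stale plan inside it is discarded on the next invocation.
  CREATE OR REPLACE FUNCTION log_msg(text) RETURNS integer AS '
  BEGIN
      INSERT INTO audit_log (msg) VALUES ($1);
      RETURN 1;
  END;
  ' LANGUAGE plpgsql;

  SELECT log_msg('new order');        -- retried, now succeeds
  COMMIT;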
Hi all,

It's only marginally relevant to the issue at hand, but in our code we drop the connection on any error we cannot map to an expected condition. This will eventually recycle all connections after such an unexpected problem, at the cost of just one error per connection.

Of course, if the error surfaces as a constraint violation, which is a recognized condition, this won't help either. And if the problem is persistent, the connections will keep recycling, which defeats the purpose of the pooling; in that case the pooling code could also alert an admin to take a look...

HTH,

Csaba.

On Tue, 2005-02-15 at 13:03, John Sidney-Woollett wrote:
> Thanks for the info, Richard.
>
> I didn't think that it was a slony issue per se, but that a note should
> be added to the slony docs warning to recycle connections after making
> substantive changes to the schema.
>
> You're right, we use both (java) prepared statements and pl/pgsql functions.
>
> The data loss aspect is not so clear cut (for us). We definitely got
> records that failed to insert (missing sequence numbers in tables) while
> others did insert correctly into the same tables (using the same
> pl/pgsql functions). So we can't figure out what the pattern is.
>
> Although we do have our connection pool recycle idle connections - it
> could be that things worked when a newly recycled connection was used by
> the web app, and failed when one of the "old/original" connections was
> used. This is probably what masked the errors for us...
>
> It would be great if postgres could "recompile" pl/pgsql functions
> whenever it found a missing object referenced within the function -
> chances are that it would compile cleanly (in this case) and could then
> be executed without error.
>
> Something along the lines of:
>
>   execute function
>     OK -> return result
>     ERROR - OID -> recompile function, and re-execute
>       OK -> return result
>       ERROR - OID -> report error
>
> This would help get rid of the temp-table-in-functions workaround of
> having to use an execute statement.
>
> Thanks for your help and feedback.
>
> John Sidney-Woollett
>
> Richard Huxton wrote:
>
> > John Sidney-Woollett wrote:
> >
> >> Hopefully this will prevent data loss or problems for others using
> >> slony 1.0.5 and pg 7.4.6...
> >>
> >> We just got bitten by something we didn't foresee when completely
> >> uninstalling a slony replication cluster from the master and slave...
> >>
> >> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER
> >> DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle
> >> all your open connections after the event!
> >>
> >> The connections appear to "remember" or refer to objects which are
> >> removed by the uninstall node script. And you get lots of errors as a
> >> result (and possible data loss??)...
> >>
> >> Question: Why do our database objects still refer to removed slony
> >> objects after they are removed?
> >
> >
> > Well, there are two areas I know cache plans/OIDs:
> > 1. Prepared statements
> > 2. plpgsql functions
> >
> > At a guess, since you mention Java the first almost certainly applies to
> > you.
> >
> > This isn't a slony issue so much as a cached plan issue. I'm guessing
> > the same problems would occur if you were manually changing the database
> > schema.
> >
> > Don't think you could get data loss (unless the application ignores
> > errors). You will however get to see a wide range of OID related errors.
> >
> > --
> > Richard Huxton
> > Archonet Ltd
Richard Huxton wrote:
> Hmm - not sure you could do this without a savepoint to catch the
> error. However, it might be possible to track dependencies for the
> function (as with views). Then you'd have to issue a CASCADE to alter
> the table.

If you use Oracle and drop and recreate a table (or any other object) referred to by a function or procedure, and then execute the procedure/function, it will work fine. Oracle will figure out that it needs recompiling (including dependent objects) and will do it prior to execution.

You don't get bitten by caches or OID issues... :(

Maybe someone will get to this for 8.1?

John Sidney-Woollett

PS: That having been said, I still think postgres is very good, and we're only using 7.4 at the moment!...
In an attempt to throw the authorities off his trail, johnsw@wardbrook.com (John Sidney-Woollett) transmitted:
> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER
> DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle
> all your open connections after the event!
>
> The connections appear to "remember" or refer to objects which are
> removed by the uninstall node script. And you get lots of errors as a
> result (and possible data loss??)...

As has been noted by others, this is an artifact of the JDBC driver holding onto a stored query plan. And it's somewhat worth noting that this is not unique to Slony-I...

I have added warnings to relevant documentation in the Slony-I CVS tree.

This may also be worthy of a FAQ note for PostgreSQL itself, and is almost certainly worthy of note in the JDBC documentation...

--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://linuxdatabases.info/info/slony.html
"We should start referring to processes which run in the background by their correct technical name:... paenguins." -- Kevin M Bealer
On Tue, Feb 15, 2005 at 02:00:20PM -0500, Christopher Browne wrote:
> As has been noted by others, this is an artifact of the JDBC driver
> holding onto a stored query plan. And it's somewhat worth noting that
> this is not unique to Slony-I...

(Yes, I'm behind on reading mail lists.)

Actually, I don't think this _is_ a JDBC issue. AIUI, the problem bites you with Slony because its triggers use SPI. So the reference to the OID is to the trigger's OID, which you won't lose until the backends are all recycled.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly
what you told them to. That actually seems sort of quaint now.
    --J.D. Baldwin