Thread: Slony uninstall info/warning

Slony uninstall info/warning

From

John Sidney-Woollett

Date:

15 February 2005, 09:55:57

Hopefully this will prevent data loss or problems for others using slony
1.0.5 and pg 7.4.6...

We just got bitten by something we didn't foresee when completely
uninstalling a slony replication cluster from the master and slave...

MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER DATABASE
WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle all your
open connections after the event!

The connections appear to "remember" or refer to objects which are
removed by the uninstall node script. And you get lots of errors as a
result (and possible data loss??)...

Question: Why do our database objects still refer to removed slony
objects after they are removed?

John Sidney-Woollett

More info...

Our system is a web application which runs against a postgres 7.4.6
database. Tomcat is restarted at 5am each day.

Last Friday afternoon we uninstalled the slony cluster (1 master + 1
slave) so that we could add a new second slave. (I find it easier to
uninstall the cluster and then reapply a new setup, subscribe script
rather than trying to add the single node.)

The cluster was successfully removed, and then rebuilt with 1 master and
2 slave nodes.

However, we didn't stop and start our web application which uses Tomcat
connection pool and continued to run against the master (during the
uninstall and rebuild). Everything appeared fine.

Only today while checking something else did I notice lots of
ERROR:  relation with OID 1036711 does not exist
errors in the postgres serverlog

In our tomcat application we also saw lots of errors like
org.postgresql.util.PSQLException: ERROR: relation with OID 1036711 does
not exist
at
org.postgresql.util.PSQLException.parseServerError(PSQLException.java:139)
at org.postgresql.core.QueryExecutor.executeV3(QueryExecutor.java:154)
at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:101)
at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:43)
...

Basically these errors started when the cluster was uninstalled and
continued until the web application was restarted (at 5am the following
morning).

I'm not sure what object had OID 1036711 - maybe the slony replication
schema?? Is there a way to find out after the event?

I do have daily full backups of the master each day going back 1 month -
I could load one into another database and lookup the OID, if someone
could explain how to do that. And if that would be useful?

But the net effect is that we got bizarre failures in our application,
and large gaps in primary key values (using sequences) in certain tables
within the database. On the whole the app seemed to be behaving fine,
with the odd error that we put down to user error... (bad assumption, I
guess in hindsight)

At the moment I'm trying to figure out exactly what kind of damage our
data has suffered.

If anyone has any ideas or suggestions as to what went wrong or what to
check for, I'd appreciate hearing.

Thanks

John

Re: Slony uninstall info/warning

From

Richard Huxton

Date:

15 February 2005, 11:43:25

John Sidney-Woollett wrote:
> Hopefully this will prevent data loss or problems for others using slony
> 1.0.5 and pg 7.4.6...
>
> We just got bitten by something we didn't foresee when completely
> uninstalling a slony replication cluster from the master and slave...
>
> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER DATABASE
> WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle all your
> open connections after the event!
>
> The connections appear to "remember" or refer to objects which are
> removed by the uninstall node script. And you get lots of errors as a
> result (and possible data loss??)...
>
> Question: Why do our database objects still refer to removed slony
> objects after they are removed?

Well, there are two areas I know cache plans/OIDs:
  1. Prepared statements
  2. plpgsql functions

At a guess, since you mention Java the first almost certainly applies to
you.

This isn't a slony issue so much as a cached plan issue. I'm guessing
the same problems would occur if you were manually changing the database
schema.

Don't think you could get data loss (unless the application ignores
errors). You will however get to see a wide range of OID related errors.

--
   Richard Huxton
   Archonet Ltd

Re: Slony uninstall info/warning

From

John Sidney-Woollett

Date:

15 February 2005, 12:00:52

Thanks for the info, Richard.

I didn't think that it was a slony issue per se, but that a note should
be added to the slony docs warning to recycle connections after making
substantive changes to the schema.

You're right, we use both (java) prepared statements and pl/pgsql functions.

The data loss aspect is not so clear cut (for us). We definitely got
records that failed to insert (missing sequence numbers in tables) while
others did insert correctly into the same tables (using the same
pl/pgsql functions). So we can't figure out what the pattern is.

Although we do have our connection pool recycle idle connections - it
could be that things worked when a new recycled connection was used by
the web app, and failed when one of the "old/original" connections was
used. This is probably what masked the errors for us...

It would be great if postgres could "recompile" pl/pgsql functions
whenever it found a missing object referenced within the function -
chances are that it would compile cleanly (in this case) and then could
be executed without error.

something along the lines of

execute function
OK -> return result
ERROR - OID -> recompile function, and re-execute
   OK -> return result
   ERROR - OID -> report error

This would help get rid of the temp table in functions work around
having to use an execute statement.

Thanks for your help and feedback.

John Sidney-Woollett

Richard Huxton wrote:

> John Sidney-Woollett wrote:
>
>> Hopefully this will prevent data loss or problems for others using
>> slony 1.0.5 and pg 7.4.6...
>>
>> We just got bitten by something we didn't foresee when completely
>> uninstalling a slony replication cluster from the master and slave...
>>
>> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER
>> DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle
>> all your open connections after the event!
>>
>> The connections appear to "remember" or refer to objects which are
>> removed by the uninstall node script. And you get lots of errors as a
>> result (and possible data loss??)...
>>
>> Question: Why do our database objects still refer to removed slony
>> objects after they are removed?
>
>
> Well, there are two areas I know cache plans/OIDs:
>  1. Prepared statements
>  2. plpgsql functions
>
> At a guess, since you mention Java the first almost certainly applies to
> you.
>
> This isn't a slony issue so much as a cached plan issue. I'm guessing
> the same problems would occur if you were manually changing the database
> schema.
>
> Don't think you could get data loss (unless the application ignores
> errors). You will however get to see a wide range of OID related errors.
>
> --
>   Richard Huxton
>   Archonet Ltd

Re: Slony uninstall info/warning

From

Richard Huxton

Date:

15 February 2005, 12:16:42

John Sidney-Woollett wrote:
> Thanks for the info, Richard.
>
> I didn't think that it was a slony issue per se, but that a note should
> be added to the slony docs warning to recycle connections after making
> substantive changes to the schema.
>
> You're right, we use both (java) prepared statements and pl/pgsql
> functions.
>
> The data loss aspect is not so clear cut (for us). We definitely got
> records that failed to insert (missing sequence numbers in tables) while
> others did insert correctly into the same tables (using the same
> pl/pgsql functions). So we can't figure out what the pattern is.
>
> Although we do have our connection pool recycle idle connections - it
> could be that things worked when a new recycled connection was used by
> the web app, and failed when one of the "old/original" connections was
> used. This is probably what masked the errors for us...

That's almost certainly the case. I'm luck in that I've always been able
to have some downtime for schema changes.

> It would be great if postgres could "recompile" pl/pgsql functions
> whenever it found a missing object referenced within the function -
> chances are that it would compile cleanly (in this case) and then could
> be executed without error.
>
> something along the lines of
>
> execute function
> OK -> return result
> ERROR - OID -> recompile function, and re-execute
>   OK -> return result
>   ERROR - OID -> report error
>
> This would help get rid of the temp table in functions work around
> having to use an execute statement.

Hmm - not sure you could do this without a savepoint to catch the error.
However, it might be possible to add track dependencies with the
function (as with views). Then you'd have to issue a CASCADE to alter
the table.

--
   Richard Huxton
   Archonet Ltd

Re: Slony uninstall info/warning

From

Csaba Nagy

Date:

15 February 2005, 12:17:37

Hi all,

It's just marginally relevant to the issue at hand: in our code we drop
the connection on any error we cannot map to an expected condition. This
would eventually recycle all connections on such unexpected problems
after just one error per connection. Of course if the error surfaces as
a constraint violation which is a recognized condition, this won't help
either, and if the problem is persistent, the connections will keep
recycling which will drop the effect of the pooling, in the latter case
the pooling code could also announce an admin to take a look...

HTH,
Csaba.

On Tue, 2005-02-15 at 13:03, John Sidney-Woollett wrote:
> Thanks for the info, Richard.
>
> I didn't think that it was a slony issue per se, but that a note should
> be added to the slony docs warning to recycle connections after making
> substantive changes to the schema.
>
> You're right, we use both (java) prepared statements and pl/pgsql functions.
>
> The data loss aspect is not so clear cut (for us). We definitely got
> records that failed to insert (missing sequence numbers in tables) while
> others did insert correctly into the same tables (using the same
> pl/pgsql functions). So we can't figure out what the pattern is.
>
> Although we do have our connection pool recycle idle connections - it
> could be that things worked when a new recycled connection was used by
> the web app, and failed when one of the "old/original" connections was
> used. This is probably what masked the errors for us...
>
> It would be great if postgres could "recompile" pl/pgsql functions
> whenever it found a missing object referenced within the function -
> chances are that it would compile cleanly (in this case) and then could
> be executed without error.
>
> something along the lines of
>
> execute function
> OK -> return result
> ERROR - OID -> recompile function, and re-execute
>    OK -> return result
>    ERROR - OID -> report error
>
> This would help get rid of the temp table in functions work around
> having to use an execute statement.
>
> Thanks for your help and feedback.
>
> John Sidney-Woollett
>
> Richard Huxton wrote:
>
> > John Sidney-Woollett wrote:
> >
> >> Hopefully this will prevent data loss or problems for others using
> >> slony 1.0.5 and pg 7.4.6...
> >>
> >> We just got bitten by something we didn't foresee when completely
> >> uninstalling a slony replication cluster from the master and slave...
> >>
> >> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER
> >> DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle
> >> all your open connections after the event!
> >>
> >> The connections appear to "remember" or refer to objects which are
> >> removed by the uninstall node script. And you get lots of errors as a
> >> result (and possible data loss??)...
> >>
> >> Question: Why do our database objects still refer to removed slony
> >> objects after they are removed?
> >
> >
> > Well, there are two areas I know cache plans/OIDs:
> >  1. Prepared statements
> >  2. plpgsql functions
> >
> > At a guess, since you mention Java the first almost certainly applies to
> > you.
> >
> > This isn't a slony issue so much as a cached plan issue. I'm guessing
> > the same problems would occur if you were manually changing the database
> > schema.
> >
> > Don't think you could get data loss (unless the application ignores
> > errors). You will however get to see a wide range of OID related errors.
> >
> > --
> >   Richard Huxton
> >   Archonet Ltd
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: Slony uninstall info/warning

From

John Sidney-Woollett

Date:

15 February 2005, 12:30:33

Richard Huxton wrote:
 > Hmm - not sure you could do this without a savepoint to catch the
 > error.
 > However, it might be possible to add track dependencies with the
 > function (as with views). Then you'd have to issue a CASCADE to alter
 > the table.

If you use Oracle and drop and recreate a table (or any other object)
referred to by a function or procedure, and then execute the
procedure/function, it will work fine. Oracle will figure out that it
needs recompiling (including dependent objects) and will do it prior to
execution.

You don't get bitten by caches or OID issues... :(

Maybe someone will get to this for 8.1?

John Sidney-Woollett

ps That having been said I still think postgres is very good, and we're
only using 7.4 at the moment!...

Re: Slony uninstall info/warning

From

Christopher Browne

Date:

15 February 2005, 20:16:43

In an attempt to throw the authorities off his trail, johnsw@wardbrook.com (John Sidney-Woollett) transmitted:
> MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER
> DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle
> all your open connections after the event!
>
> The connections appear to "remember" or refer to objects which are
> removed by the uninstall node script. And you get lots of errors as a
> result (and possible data loss??)...

As has been noted by others, this is an artifact of the JDBC driver
holding onto a stored query plan.  And it's somewhat worth noting that
this is not unique to Slony-I...

I have added warnings to relevant documentation in the Slony-I CVS
tree.  This may also be worthy of a FAQ note for PostgreSQL itself,
and is almost certainly worthy of note in the JDBC documentation...
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://linuxdatabases.info/info/slony.html
"We should start referring to processes which run in the background by
their correct technical name:... paenguins."  -- Kevin M Bealer

Re: Slony uninstall info/warning

From

Andrew Sullivan

Date:

11 March 2005, 18:09:09

On Tue, Feb 15, 2005 at 02:00:20PM -0500, Christopher Browne wrote:
> As has been noted by others, this is an artifact of the JDBC driver
> holding onto a stored query plan.  And it's somewhat worth noting that
> this is not unique to Slony-I...

(Yes, I'm behind on reading mail lists.)

Actually, I don't think this _is_ a JDBC issue.  AIUI, The problem
bites you with Slony because its triggers use SPI.  So the reference
to the oid is to the trigger's oid, which you won't lose until the
back ends are all recyled.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to.  That actually seems sort of quaint now.
        --J.D. Baldwin