Thread: Drop set problem - pgAdmin / Slony I

Drop set problem - pgAdmin / Slony I

From
Glyn Astill
Date:
Hi chaps,

I've noticed a problem, described below, with adding replication
sets from within pgAdmin. I originally posted on the Slony-I general
list, and Christopher Browne suggested I ask you guys.

I set up a replication cluster using the pgAdmin scripts, and added a
replication set with some tables and sequences in it (table /sequence
Ids 1 - 4). I've kept the Id numbers the same as the table Ids.

Then I added another set (Id 2), and added a table and a sequence
with Id 5.

In my slony log I get the error:

"remoteWorkerThread_1: node -1 not found in runtime configuration"

Does anyone know what causes the error above? I've read that it
indicates slony had a problem at a point whilst subscribing so it
flipped the node number to -1 to stop any further steps happening,
however this doesn't help me track down what caused it.


No "drop set" option seems to exist from within pgAdmin (although I'm
pretty sure I saw it once - are there criteria that make it appear?),
so I did a "DROP SET ( ID=2, ORIGIN=1 );" using slonik on the origin,
and the set was removed from the origin, but it's still visible
(using pgAdmin) on the subscriber (I have restarted the slons).
I notice this error occurs periodically now in the slony
logs, so it seems the subscriber is trying to subscribe to it?  E.g.

------------------------------------------------------

2008-01-15_144529 GMT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 38491
2008-01-15_144533 GMT DEBUG1 copy_set 2
2008-01-15_144533 GMT ERROR  remoteWorkerThread_1: node -1 not found in runtime configuration
2008-01-15_144533 GMT WARN   remoteWorkerThread_1: data copy for set 2 failed - sleep 60 seconds
WARNING:  there is no transaction in progress

------------------------------------------------------
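
For anyone wanting to confirm that the subscriber really is stuck in a retry loop, here is a minimal sketch that counts the "data copy for set N failed" warnings per set in a slon log. The message pattern is inferred only from the excerpt in this post, so other Slony-I versions may word it differently; the function name and sample text are purely illustrative.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt in this thread (an assumption,
# not taken from the Slony-I source).
COPY_FAILED = re.compile(
    r"remoteWorkerThread_(?P<node>\d+): data copy for set\s+(?P<set_id>\d+) failed"
)


def count_copy_failures(log_text: str) -> Counter:
    """Return a Counter keyed by (remote_node, set_id) of failed copy attempts."""
    # Log lines may have been hard-wrapped (for example by a mail client),
    # so collapse all whitespace runs to single spaces before matching.
    flattened = " ".join(log_text.split())
    return Counter(
        (int(m.group("node")), int(m.group("set_id")))
        for m in COPY_FAILED.finditer(flattened)
    )


sample = """\
2008-01-15_144533 GMT DEBUG1 copy_set 2
2008-01-15_144533 GMT ERROR  remoteWorkerThread_1: node -1 not found
in runtime configuration
2008-01-15_144533 GMT WARN   remoteWorkerThread_1: data copy for set
2 failed - sleep 60 seconds
"""
print(count_copy_failures(sample))
```

A count that keeps growing for the same set id at roughly 60-second intervals matches the behaviour described above.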

How do I remove this set from the subscriber?

I've had this happen before. Removing the cluster and setting it up
again resolves the problem, however once we are in a production
environment I can't go dropping the whole cluster and replicating all
the tables from scratch when it happens.

Any pointers would be greatly appreciated
Glyn




Re: Drop set problem - pgAdmin / Slony I

From
Magnus Hagander
Date:
Glyn Astill wrote:
> Hi chaps,
> 
> I've noticed a problem, described below, with adding replication
> sets from within pgAdmin. I originally posted on the Slony-I general
> list, and Christopher Browne suggested I ask you guys.
> 
> I set up a replication cluster using the pgAdmin scripts, and added a
> replication set with some tables and sequences in it (table /sequence
> Ids 1 - 4). I've kept the Id numbers the same as the table Ids.
> 
> Then I added another set (Id 2), and added a table and a sequence
> with Id 5.
> 
> In my slony log I get the error:
> 
> "remoteWorkerThread_1: node -1 not found in runtime configuration"
> 
> Does anyone know what causes the error above? I've read that it
> indicates slony had a problem at a point whilst subscribing so it
> flipped the node number to -1 to stop any further steps happening,
> however this doesn't help me track down what caused it.

Hm. Never seen that one.


> No "drop set" option seems to exist from within pgAdmin (although I'm
> pretty sure I saw it once - are there criteria that make it appear?),

You can only drop a set if it has no subscribers. You have to drop the 
subscriber first. Could that be your problem here?



> so I did a "DROP SET ( ID=2, ORIGIN=1 );" using slonik on the origin,
> and the set was removed from the origin, but it's still visible
> (using pgAdmin) on the subscriber (I have restarted the slons).
> I notice this error occurs periodically now in the slony
> logs, so it seems the subscriber is trying to subscribe to it?  E.g.
> 
> ------------------------------------------------------
> 
> 2008-01-15_144529 GMT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 38491
> 2008-01-15_144533 GMT DEBUG1 copy_set 2
> 2008-01-15_144533 GMT ERROR  remoteWorkerThread_1: node -1 not found in runtime configuration
> 2008-01-15_144533 GMT WARN   remoteWorkerThread_1: data copy for set 2 failed - sleep 60 seconds
> WARNING:  there is no transaction in progress
> 
> ------------------------------------------------------
> 
> How do I remove this set from the subscriber?

I think it's somehow queued up behind whatever it is that's looking for
node -1. If you can get rid of that one from sl_listen, I bet the set
drop will go through.


> I've had this happen before. Removing the cluster and setting it up
> again resolves the problem, however once we are in a production
> environment I can't go dropping the whole cluster and replicating all
> the tables from scratch when it happens.

Can you give us an exact step-by-step on how to make it happen?

//Magnus


Re: Drop set problem - pgAdmin / Slony I

From
Glyn Astill
Date:
Hi Magnus,

> > How do I remove this set from the subscriber?
> 
> I think it's somehow queued up behind whatever it is that's looking
> for node -1. If you can get rid of that one from sl_listen, I bet
> the set drop will go through.
> 
> 

There doesn't seem to be a -1 node in the sl_listen on the origin or
the subscriber.

I've removed the second set that was still visible on the subscriber
from the sl_set table, however I'm still getting an event in the log
where it tries to "copy_set 2" and comes back with the error "set 2
not found in runtime configuration".

Any idea on how I remove / stop this?

> > I've had this happen before. Removing the cluster and setting it
> > up again resolves the problem, however once we are in a production
> > environment I can't go dropping the whole cluster and replicating
> > all the tables from scratch when it happens.
> 
> Can you give us an exact step-by-step on how to make it happen?
> 

I tried to subscribe a new set with a new table in it and for some
reason it failed with the error mentioned. The slony logs show that
slony was periodically trying to subscribe on the subscriber.

Seeing as I couldn't remove it from pgAdmin, I went in and ran a
slonik script to drop the set, however I didn't try to unsubscribe it
with slonik first. So is there a chance it's come about from me not
unsubscribing first?

What's the law on mixing the use of pgAdmin and slonik to administer
slony?

Thanks
Glyn





Re: Drop set problem - pgAdmin / Slony I

From
Glyn Astill
Date:
Does anyone have any idea about this?

> > How do I remove this set from the subscriber?
> 
> I think it's somehow queued up behind whatever it is that's looking
> for node -1. If you can get rid of that one from sl_listen, I bet
> the set drop will go through.

There doesn't seem to be a -1 node in the sl_listen on the origin
or the subscriber.

I've removed the second set that was still visible on the subscriber
from the sl_set table, however I'm still getting an event in the log
where it tries to "copy_set 2" and comes back with the error "set 2
not found in runtime configuration".

Any idea on how I remove / stop this?

> > I've had this happen before. Removing the cluster and setting it
> > up again resolves the problem, however once we are in a production
> > environment I can't go dropping the whole cluster and replicating
> > all the tables from scratch when it happens.
> 
> Can you give us an exact step-by-step on how to make it happen?

I tried to subscribe a new set with a new table in it and for some
reason it failed with the error mentioned. The slony logs show that
slony was periodically trying to subscribe on the subscriber.

Seeing as I couldn't remove it from pgAdmin, I went in and ran a
slonik script to drop the set, however I didn't try to unsubscribe it
with slonik first. So is there a chance it's come about from me not
unsubscribing first?

What's the law on mixing the use of pgAdmin and slonik to
administer slony?





Re: Drop set problem - pgAdmin / Slony I

From
Magnus Hagander
Date:
On Wed, Jan 16, 2008 at 04:55:12AM -0800, Glyn Astill wrote:
> Hi Magnus,
> 
> > > How do I remove this set from the subscriber?
> > 
> > I think it's somehow queued up behind whatever it is that's looking
> > for node -1. If you can get rid of that one from sl_listen, I bet
> > the set drop will go through.
> > 
> > 
> 
> There doesn't seem to be a -1 node in the sl_listen on the origin or
> the subscriber.
> 
> I've removed the second set that was still visible on the subscriber
> from the sl_set table, however I'm still getting an event in the log
> where it tries to "copy_set 2" and comes back with the error "set 2
> not found in runtime configuration".
> 
> Any idea on how I remove / stop this?

I'll have to defer to the Slony folks on that one.


> > > I've had this happen before. Removing the cluster and setting it
> > > up again resolves the problem, however once we are in a production
> > > environment I can't go dropping the whole cluster and replicating
> > > all the tables from scratch when it happens.
> > 
> > Can you give us an exact step-by-step on how to make it happen?
> > 
> 
> I tried to subscribe a new set with a new table in it and for some
> reason it failed with the error mentioned. The slony logs show that
> slony was periodically trying to subscribe on the subscriber.

I mean a complete step-by-step, as in exactly what you click and enter. I've
done what is at least on the surface the same thing you have, with no
problems. So there must be something in the details.


> Seeing as I couldn't remove it from pgAdmin, I went in and ran a
> slonik script to drop the set, however I didn't try to unsubscribe it
> with slonik first. So is there a chance it's come about from me not
> unsubscribing first?

Part of it may - I don't recall offhand if the must-unsubscribe-first is a
requirement of Slonik or just of pgAdmin.


> What's the law on mixing the use of pgAdmin and slonik to administer
> slony?

It should be ok. pgAdmin currently has the concept of an "admin node" which
isn't compatible with the way core slony/slonik does it, but it shouldn't
actually be causing you any problems. Some very minor functionality (some
discovery) will not work in pgAdmin if you set the cluster up originally from
slonik, but managing the servers that you've got shouldn't be a problem at
all.

And it's certainly the design goal - pgadmin shouldn't be doing anything
that breaks "the core slony way".

//Magnus


Re: Drop set problem - pgAdmin / Slony I

From
Magnus Hagander
Date:
On Fri, Jan 18, 2008 at 11:07:27AM -0500, Christopher Browne wrote:
> >> > > I've had this happen before. Removing the cluster and setting it
> >> > > up again resolves the problem, however once we are in a production
> >> > > environment I can't go dropping the whole cluster and replicating
> >> > > all the tables from scratch when it happens.
> >> > 
> >> > Can you give us an exact step-by-step on how to make it happen?
> >> > 
> >> 
> >> I tried to subscribe a new set with a new table in it and for some
> >> reason it failed with the error mentioned. The slony logs show that
> >> slony was periodically trying to subscribe on the subscriber.
> >
> > I mean a complete step-by-step, as in exactly what you click and enter. I've
> > done what is at least on the surface the same thing you have, with no
> > problems. So there must be something in the details.
> >
> >> Seeing as I couldn't remove it from pgAdmin, I went in and ran a
> >> slonik script to drop the set, however I didn't try to unsubscribe it
> >> with slonik first. So is there a chance it's come about from me not
> >> unsubscribing first?
> >
> > Part of it may - I don't recall offhand if the must-unsubscribe-first is a
> > requirement of Slonik or just of pgAdmin.
> 
> You don't need to unsubscribe first.
> 
> The stored procedure dropSet() is perfectly happy to drop the set at
> any time.  When the event propagates, it'll happily clear out all
> traces of the set.

Ok. Good. Then I think the reason it's not supported in pgAdmin is simply
because it made the GUI simpler to require that ;-)


//Magnus


Re: Drop set problem - pgAdmin / Slony I

From
Christopher Browne
Date:
Magnus Hagander <magnus@hagander.net> writes:
>> There doesn't seem to be a -1 node in the sl_listen on the origin or
>> the subscriber.
>> 
>> I've removed the second set that was still visible on the subscriber
>> from the sl_set table, however I'm still getting an event in the log
>> where it tries to "copy_set 2" and comes back with the error "set 2
>> not found in runtime configuration".
>> 
>> Any idea on how I remove / stop this?
>
> I'll have to defer to the Slony folks on that one.

Hmm.  Sounds like an event is outstanding that depends on having that
set around, on the provider, but the set was already removed...

Sounds like a case where "event surgery" would be mandated, that is,
to delete offending entries from sl_event.

Surgery is only safe when you have a very good idea of what you're
doing, and why.  If you don't have that certainty, patients frequently
don't survive :-(.

Unfortunately, this situation is leaving a lot of uncertainty in
place, thereby meaning there's rather a lot of risk to the cluster in
doing such surgery :-(.

I don't know where that "-1" node is coming from, which is a big whack
of uncertainty.
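
Before any surgery, it may help to simply look at what is queued. The following is a minimal, read-only sketch that builds a SELECT listing sl_event rows whose first data field equals a given set id. It assumes the usual Slony-I conventions as I recall them (schema named "_<clustername>", columns ev_origin, ev_seqno, ev_type, ev_data1, with the set id held as text in ev_data1 for set-related events); please verify these against your own installation. The cluster name "mycluster" and the function name are hypothetical.

```python
def pending_set_events_sql(cluster: str, set_id: int) -> str:
    """Build a read-only query listing sl_event rows that mention a set id.

    Assumptions (not confirmed in this thread): the Slony-I schema is named
    "_<cluster>", and set-related events store the set id as text in ev_data1.
    Other event types also use ev_data1 (for example for node ids), so the
    ev_type column must be read before drawing any conclusion.
    """
    # Only allow plain identifiers, since the name is spliced into the SQL text.
    if not cluster.replace("_", "").isalnum():
        raise ValueError(f"unexpected cluster name: {cluster!r}")
    return (
        "SELECT ev_origin, ev_seqno, ev_type, ev_data1\n"
        f'FROM "_{cluster}".sl_event\n'
        f"WHERE ev_data1 = '{int(set_id)}'\n"
        "ORDER BY ev_origin, ev_seqno;"
    )


# "mycluster" is a placeholder cluster name.
print(pending_set_events_sql("mycluster", 2))
```

The query only reads; it deliberately stops short of any DELETE, given the risk described above.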

>> > > I've had this happen before. Removing the cluster and setting it
>> > > up again resolves the problem, however once we are in a production
>> > > environment I can't go dropping the whole cluster and replicating
>> > > all the tables from scratch when it happens.
>> > 
>> > Can you give us an exact step-by-step on how to make it happen?
>> > 
>> 
>> I tried to subscribe a new set with a new table in it and for some
>> reason it failed with the error mentioned. The slony logs show that
>> slony was periodically trying to subscribe on the subscriber.
>
> I mean a complete step-by-step, as in exactly what you click and enter. I've
> done what is at least on the surface the same thing you have, with no
> problems. So there must be something in the details.
>
>> Seeing as I couldn't remove it from pgAdmin, I went in and ran a
>> slonik script to drop the set, however I didn't try to unsubscribe it
>> with slonik first. So is there a chance it's come about from me not
>> unsubscribing first?
>
> Part of it may - I don't recall offhand if the must-unsubscribe-first is a
> requirement of Slonik or just of pgAdmin.

You don't need to unsubscribe first.

The stored procedure dropSet() is perfectly happy to drop the set at
any time.  When the event propagates, it'll happily clear out all
traces of the set.

>> What's the law on mixing the use of pgAdmin and slonik to administer
>> slony?
>
> It should be ok. pgAdmin currently has the concept of an "admin node" which
> isn't compatible with the way core slony/slonik does it, but it shouldn't
> actually be causing you any problems. Some very minor functionality (some
> discovery) will not work in pgAdmin if you set the cluster up originally from
> slonik, but managing the servers that you've got shouldn't be a problem at
> all.
>
> And it's certainly the design goal - pgadmin shouldn't be doing anything
> that breaks "the core slony way".

They are using the same underlying stored procedures, so it *ought* to be OK.
-- 
"cbbrowne","@","linuxdatabases.info"
http://cbbrowne.com/info/wp.html
">WindowsNT will not accept fecal matter in its diet... it's that simple.

I suppose that is a good ward against cannibalism." -- Nick Manka


Re: Drop set problem - pgAdmin / Slony I

From
Glyn Astill
Date:
Hi chaps,

I've just gone over what I was doing, however this time using slonik
rather than pgAdmin. I should have done this to start with, as I
really had no idea what was really going off behind the scenes.

What I was doing was creating a blank table on my subscriber, where
the origin had a table with 15 million records. Then I was creating a
set and subscribing the subscriber to it, then trying to merge the
set straight away.

So I might have been trying to merge the set before it was subscribed
properly, either down to there being some latency between clicking
subscribe and then merge, or something to do with the size of the
data to be copied over.

Does this sound possible? Or would pgAdmin stop me from doing a
merge set before the sets were properly subscribed?

As the slonik docs say "Do not be too quick to merge sets".
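
For illustration, here is a rough sketch of the ordering I believe the slonik docs suggest: subscribe the new set, force a SYNC, wait for the subscriber to confirm it, and only then merge. The Python helper just assembles the slonik statements as text; the function name and the node/set ids are made up for the example, the cluster name / admin conninfo preamble is left out, and whether this fully closes the race on a given Slony-I version is something I'd want checked.

```python
def subscribe_then_merge(origin: int, receiver: int, base_set: int, new_set: int) -> str:
    """Return slonik statements that subscribe new_set, wait, then merge it.

    The "cluster name" and "node ... admin conninfo" preamble is deliberately
    omitted; all ids passed in are placeholders for your own cluster's values.
    """
    steps = [
        f"subscribe set (id = {new_set}, provider = {origin}, "
        f"receiver = {receiver}, forward = no);",
        # Generate a SYNC on the origin after the subscribe request ...
        f"sync (id = {origin});",
        # ... and block until the receiver has confirmed that event.
        f"wait for event (origin = {origin}, confirmed = {receiver}, wait on = {origin});",
        # Only then fold the new set into the existing one.
        f"merge set (id = {base_set}, add id = {new_set}, origin = {origin});",
    ]
    return "\n".join(steps)


print(subscribe_then_merge(origin=1, receiver=2, base_set=1, new_set=2))
```

With a 15 million row table the wait could take a long while, which would be consistent with an immediate merge going wrong.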


--- Magnus Hagander <magnus@hagander.net> wrote:

> On Fri, Jan 18, 2008 at 11:07:27AM -0500, Christopher Browne wrote:
> > >> > > I've had this happen before. Removing the cluster and setting it
> > >> > > up again resolves the problem, however once we are in a production
> > >> > > environment I can't go dropping the whole cluster and replicating
> > >> > > all the tables from scratch when it happens.
> > >> > 
> > >> > Can you give us an exact step-by-step on how to make it happen?
> > >> > 
> > >> 
> > >> I tried to subscribe a new set with a new table in it and for some
> > >> reason it failed with the error mentioned. The slony logs show that
> > >> slony was periodically trying to subscribe on the subscriber.
> > >
> > > I mean a complete step-by-step, as in exactly what you click and enter. I've
> > > done what is at least on the surface the same thing you have, with no
> > > problems. So there must be something in the details.
> > >
> > >> Seeing as I couldn't remove it from pgAdmin, I went in and ran a
> > >> slonik script to drop the set, however I didn't try to unsubscribe it
> > >> with slonik first. So is there a chance it's come about from me not
> > >> unsubscribing first?
> > >
> > > Part of it may - I don't recall offhand if the must-unsubscribe-first is a
> > > requirement of Slonik or just of pgAdmin.
> > 
> > You don't need to unsubscribe first.
> > 
> > The stored procedure dropSet() is perfectly happy to drop the set at
> > any time.  When the event propagates, it'll happily clear out all
> > traces of the set.
> 
> Ok. Good. Then I think the reason it's not supported in pgAdmin is simply
> because it made the GUI simpler to require that ;-)
> 
> 
> //Magnus
> 

