Thread: Need Force flag for pg_drop_replication_slot()
Hackers,

pg_drop_replication_slot() can be a time-critical function when the
master is running out of disk space because the replica is falling
behind. So I was a little startled by this:

cio=# select pg_drop_replication_slot('bdr_24577_6147720645156311471_1_25383__');
ERROR:  replication slot "bdr_24577_6147720645156311471_1_25383__" is already active

You have to first terminate the replication connection before you can
delete the slot ... and do it fast enough that the replica doesn't
reconnect before you drop the slot.

While I'm just doing this during testing, it could be a critical fail in
production. I think the simplest way to resolve this would be to add a
boolean flag to pg_drop_replication_slot(), which would terminate the
replication connection and delete the slot as a single operation.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
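For concreteness, the proposal amounts to an overload along these lines (the boolean second argument is hypothetical; it does not exist in any released version):

```sql
-- Hypothetical "force" variant: terminate any walsender currently
-- holding the slot, then drop the slot, as one server-side operation.
SELECT pg_drop_replication_slot('bdr_24577_6147720645156311471_1_25383__', true);
```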
Hi,

On 2015-05-29 10:15:56 -0700, Josh Berkus wrote:
> pg_drop_replication_slot() can be a time-critical function when the
> master is running out of disk space because the replica is falling
> behind.

I don't buy this argument. The same is true for DROP TABLE, TRUNCATE,
DROP DATABASE etc. I mean, I agree it'd be convenient, but I can't see
it as "critical".

> While I'm just doing this during testing, it could be a critical fail in
> production. I think the simplest way to resolve this would be to add a
> boolean flag to pg_drop_replication_slot(), which would terminate the
> replication connection and delete the slot as a single operation.

There's no "single operation" for terminating a backend *and* doing
something...

Greetings,

Andres Freund
Andres,

* Andres Freund (andres@anarazel.de) wrote:
> On 2015-05-29 10:15:56 -0700, Josh Berkus wrote:
> > pg_drop_replication_slot() can be a time-critical function when the
> > master is running out of disk space because the replica is falling
> > behind.
>
> I don't buy this argument. The same is true for DROP TABLE, TRUNCATE,
> DROP DATABASE etc.

I disagree about that being the same.

> I mean, I agree it'd be convenient, but I can't see it as "critical".

Just a random thought- do we check the LOGIN attribute for replication
connections? If so, you could tweak that, but that may be an issue if
you have multiple replicas using the same role.

I'm not sure that it's *critical*, but I could see an argument for
adding this post-feature-freeze, which I'm guessing is what Josh was
getting at.

> > While I'm just doing this during testing, it could be a critical fail in
> > production. I think the simplest way to resolve this would be to add a
> > boolean flag to pg_drop_replication_slot(), which would terminate the
> > replication connection and delete the slot as a single operation.
>
> There's no "single operation" for terminating a backend *and* doing
> something...

That's a good point, we'd need to figure out how to make this actually
work reliably in the face of a very fast reconnecting process, if we're
going to do it.

Thanks!

Stephen
On 05/29/2015 10:45 AM, Stephen Frost wrote:
> Andres,
>
> * Andres Freund (andres@anarazel.de) wrote:
>> On 2015-05-29 10:15:56 -0700, Josh Berkus wrote:
>>> pg_drop_replication_slot() can be a time-critical function when the
>>> master is running out of disk space because the replica is falling
>>> behind.
>>
>> I don't buy this argument. The same is true for DROP TABLE, TRUNCATE,
>> DROP DATABASE etc.
>
> I disagree about that being the same.
>
>> I mean, I agree it'd be convenient, but I can't see it as "critical".

So, here's the scenario:

1. You're almost out of disk space due to a replica falling behind, like
down to 16MB left. Or maybe you are out of disk space.

2. You need to drop the laggy replication slots in a hurry to get your
master working again.

3. Now you have to do this timing-sensitive two-stage drop to make it
work.

When our users are having production emergencies, I don't think it's
helpful for us to make the process of getting out of those situations
more complicated than it absolutely has to be.

> Just a random thought- do we check the LOGIN attribute for replication
> connections? If so, you could tweak that, but that may be an issue if
> you have multiple replicas using the same role.
>
> I'm not sure that it's *critical*, but I could see an argument for
> adding this post-feature-freeze, which I'm guessing is what Josh was
> getting at.

Well, I'll let others decide that. If we could come up with a script
which would reliably do the terminate-then-drop, it would be fine for
9.5. I'm not sure that's possible though, because I don't see any way
to infallibly relate the pg_stat_replication entry with the
pg_replication_slots entry. Imagine having 3 slots and 6 replicas, and
only one slot is behind; how do you figure out what to terminate?

>>> While I'm just doing this during testing, it could be a critical fail in
>>> production. I think the simplest way to resolve this would be to add a
>>> boolean flag to pg_drop_replication_slot(), which would terminate the
>>> replication connection and delete the slot as a single operation.
>>
>> There's no "single operation" for terminating a backend *and* doing
>> something...
>
> That's a good point, we'd need to figure out how to make this actually
> work reliably in the face of a very fast reconnecting process, if we're
> going to do it.

Yeah, which means that this is probably something for 9.6. Although if
we can at least come up with something for the documentation for 9.5, it
would be really helpful.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
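A best-effort version of the terminate-then-drop script being discussed might look like the DO block below. It is only a sketch: the application_name filter assumes the client sets a recognizable name (BDR does; other clients may not), the slot name is taken from the example earlier in this thread, the loop can still lose the race to a very fast reconnect, and it assumes the "already active" error is reported as SQLSTATE 55006 (object_in_use).

```sql
DO $$
BEGIN
  FOR i IN 1..10 LOOP
    -- Terminate whatever walsender currently holds the slot.  Matching on
    -- application_name only works if the client sets one.
    PERFORM pg_terminate_backend(pid)
       FROM pg_stat_replication
      WHERE application_name LIKE 'bdr %';

    BEGIN
      PERFORM pg_drop_replication_slot('bdr_24577_6147720645156311471_1_26507__');
      RETURN;  -- success: the slot is gone
    EXCEPTION WHEN object_in_use THEN
      NULL;    -- client reconnected before we could drop; go around again
    END;
  END LOOP;
  RAISE EXCEPTION 'could not drop slot: client keeps reconnecting';
END;
$$;
```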
* Josh Berkus (josh@agliodbs.com) wrote:
> 1. you're almost out of disk space due to a replica falling behind, like
> down to 16mb left. Or maybe you are out of disk space.

This right here is a real issue. What I'd personally like to see is an
option which says "you have X GB of disk space. Once it's gone, forget
about all replicas or failing archive commands or whatever, and just
stop holding on to ancient WAL that you no longer need to operate."

Perhaps there would be a warning threshold there too, where you start
getting complaints in the log if things are falling too far behind.
Ideally, you'd have a monitoring system which is checking for that, but
it'd be trivial to include and could be useful for environments that
don't have proper monitoring yet.

Having this work on the replicas would be nice too. I realize we have
time-based constraints there which say "kill off queries which are
blocking us from moving forward after X time", but it'd be awful nice to
have a size-based way too, to avoid having PG crash when it runs out of
space.

I have to admit that I'm getting quite tired of the ways in which PG can
crash due to out of memory (yes, I know, it's the OOM killer because of
a misconfigured Linux box, but still), out of disk space on the master,
out of space on the replica, etc, etc.

Thanks!

Stephen
On 05/29/2015 11:01 AM, Stephen Frost wrote:
> * Josh Berkus (josh@agliodbs.com) wrote:
>> 1. you're almost out of disk space due to a replica falling behind, like
>> down to 16mb left. Or maybe you are out of disk space.
>
> This right here is a real issue. What I'd personally like to see is an
> option which says "you have X GB of disk space. Once it's gone, forget
> about all replicas or failing archive commands or whatever, and just
> stop holding on to ancient WAL that you no longer need to operate."

The substantial challenge here is how do we determine that you're
"almost out of disk space"?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2015-05-29 10:53:30 -0700, Josh Berkus wrote:
> So, here's the scenario:
>
> 1. you're almost out of disk space due to a replica falling behind, like
> down to 16mb left. Or maybe you are out of disk space.
>
> 2. You need to drop the laggy replication slots in a hurry to get your
> master working again.
>
> 3. Now you have to do this timing-sensitive two-stage drop to make it work.

How is this measurably worse than trying to truncate a log table that
has grown too large? That's often harder to fight actually, because
there's dozens of other processes that might be using the relation. In
one case you don't have wait ordering, but only one locker; in the other
case you have multiple waiters, and to benefit from wait ordering you
need multiple sessions.

Again, I'm not against improving either situation, it's just that the
urgency argument doesn't seem worth its weight.

Note that all of this is 9.4 code, not 9.5.
* Josh Berkus (josh@agliodbs.com) wrote:
> On 05/29/2015 11:01 AM, Stephen Frost wrote:
> > * Josh Berkus (josh@agliodbs.com) wrote:
> >> 1. you're almost out of disk space due to a replica falling behind, like
> >> down to 16mb left. Or maybe you are out of disk space.
> >
> > This right here is a real issue. What I'd personally like to see is an
> > option which says "you have X GB of disk space. Once it's gone, forget
> > about all replicas or failing archive commands or whatever, and just
> > stop holding on to ancient WAL that you no longer need to operate."
>
> The substantial challenge here is how do we determine that you're
> "almost out of disk space"?

Eh? That "X GB" above was intended to be the value of a GUC. I know
how big my WAL partition is. Let me tell PG how big it is and to not do
anything that'll end up going over that amount, and we'll never see a
crash due to out of disk space for WAL again.

Thanks!

Stephen
On 05/29/2015 11:07 AM, Andres Freund wrote:
> On 2015-05-29 10:53:30 -0700, Josh Berkus wrote:
>> So, here's the scenario:
>>
>> 1. you're almost out of disk space due to a replica falling behind, like
>> down to 16mb left. Or maybe you are out of disk space.
>>
>> 2. You need to drop the laggy replication slots in a hurry to get your
>> master working again.
>>
>> 3. Now you have to do this timing-sensitive two-stage drop to make it work.
>
> How is this measurably worse than trying to truncate a log table that
> has grown too large? That's often harder to fight actually, because
> there's dozens of other processes that might be using the relation. In
> one case you don't have wait ordering, but only one locker; in the other
> case you have multiple waiters, and to benefit from wait ordering you
> need multiple sessions.
>
> Again, I'm not against improving either situation, it's just that the
> urgency argument doesn't seem worth its weight.

Well, I wouldn't mind a solution for drop table and drop database,
either. I'm pretty sure that's on our TODO list.

Oh, I see the confusion. When I say "time-critical", I was referring to
the situation where someone is running out of disk space, not coming up
with a patch. AFAIK, hardly anyone is using replication slots, still.

> Note that all of this is 9.4 code, not 9.5.

Yes, but I'm not suggesting backporting it, just maybe a backported doc
patch.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
* Andres Freund (andres@anarazel.de) wrote:
> How is this measurably worse than trying to truncate a log table that
> has grown too large? That's often harder to fight actually, because
> there's dozens of other processes that might be using the relation. In
> one case you don't have wait ordering, but only one locker; in the other
> case you have multiple waiters, and to benefit from wait ordering you
> need multiple sessions.

Because we don't fall over if we can't extend a relation.

We do fall over if we can't write WAL.

Thanks!

Stephen
So, here's an example of why it's hard to give our users a workaround.

cio=# select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 bdr_24577_6147720645156311471_1_26507__ | bdr    | logical   |  24577 | cio      | t      |      |          906 | 0/1C4F410
(1 row)

cio=# select pg_drop_replication_slot('bdr_24577_6147720645156311471_1_26507__');
ERROR:  replication slot "bdr_24577_6147720645156311471_1_26507__" is already active

cio=# select * from pg_stat_replication;
-[ RECORD 1 ]----+-------------------------------------------
pid              | 28481
usesysid         | 10
usename          | postgres
application_name | bdr (6147720645156311471,1,26507,):receive
client_addr      | 172.17.0.11
client_hostname  |
client_port      | 44583
backend_start    | 2015-05-29 18:10:34.601796+00
backend_xmin     |
state            | streaming
sent_location    | 0/1C4F448
write_location   | 0/1C4F448
flush_location   | 0/1C4F448
replay_location  | 0/1C4F448
sync_priority    | 0
sync_state       | async

Now, BDR is good because it sets an application_name which lets me
figure out what's using the replication slot. But that's by no means
required; other LC plug-ins, I expect, do not do so. So there's no way
for the user to figure out which replication connection relates to which
slots, as far as I can tell.

In this test, it's easy because there's only one replication connection
and one slot. But imagine the case of 14 replication connections with
their own slots. How could you possibly figure out which one was the
laggy one?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 05/29/2015 12:08 PM, Josh Berkus wrote:
> Now, BDR is good because it sets an application_name which lets me
> figure out what's using the replication slot. But that's by no means
> required; other LC plug-ins, I expect, do not do so. So there's no way
> for the user to figure out which replication connection relates to which
> slots, as far as I can tell.
>
> In this test, it's easy because there's only one replication connection
> and one slot. But imagine the case of 14 replication connections with
> their own slots. How could you possibly figure out which one was the
> laggy one?

The client_addr?

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
On 2015-05-29 14:39:02 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > How is this measurably worse than trying to truncate a log table that
> > has grown too large? That's often harder to fight actually, because
> > there's dozens of other processes that might be using the relation. In
> > one case you don't have wait ordering, but only one locker; in the other
> > case you have multiple waiters, and to benefit from wait ordering you
> > need multiple sessions.
>
> Because we don't fall over if we can't extend a relation.
>
> We do fall over if we can't write WAL.

As nearly everybody uses the same filesystem for pg_xlog and the actual
databases, that distinction isn't worth much. You'll still fail when
writing the WAL, even if the disk space has been used by a relation
instead of WAL.
On 2015-05-29 12:08:24 -0700, Josh Berkus wrote:
> Now, BDR is good because it sets an application_name which lets me
> figure out what's using the replication slot. But that's by no means
> required; other LC plug-ins, I expect, do not do so. So there's no way
> for the user to figure out which replication connection relates to which
> slots, as far as I can tell.
>
> In this test, it's easy because there's only one replication connection
> and one slot. But imagine the case of 14 replication connections with
> their own slots. How could you possibly figure out which one was the
> laggy one?

9.5 shows the pid.
On 05/29/2015 12:27 PM, Andres Freund wrote:
> On 2015-05-29 12:08:24 -0700, Josh Berkus wrote:
>> In this test, it's easy because there's only one replication connection
>> and one slot. But imagine the case of 14 replication connections with
>> their own slots. How could you possibly figure out which one was the
>> laggy one?
>
> 9.5 shows the pid.

OK, will test, thanks.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
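The 9.5 column Andres is referring to is pg_replication_slots.active_pid, which makes relating a slot to the walsender holding it a plain join; a sketch (9.5 and later only):

```sql
-- 9.5+: join each active slot to the backend currently holding it, so
-- you know which pid to pg_terminate_backend() for a given laggy slot.
SELECT s.slot_name, s.restart_lsn, r.pid, r.client_addr, r.application_name
  FROM pg_replication_slots s
  JOIN pg_stat_replication r ON r.pid = s.active_pid
 WHERE s.active;
```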
On 05/29/2015 11:30 AM, Stephen Frost wrote:
> I know how big my WAL partition is. Let me tell PG how big it is and to
> not do anything that'll end up going over that amount, and we'll never
> see a crash due to out of disk space for WAL again.

Hmmmm. Do we have a clear idea anywhere in server memory how many WAL
segments there are?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote:
> On 05/29/2015 11:30 AM, Stephen Frost wrote:
> > I know how big my WAL partition is. Let me tell PG how big it is and to
> > not do anything that'll end up going over that amount, and we'll never
> > see a crash due to out of disk space for WAL again.
>
> Hmmmm. Do we have a clear idea anywhere in server memory how many WAL
> segments there are?

Why does it need to be in shared memory..? Clearly, when we're looking
at cleaning up the WAL files, we know if the archive command is failing
and what file we're trying to archive, or if we're not able to recycle a
given file because we have logical replication slots that want it, etc.
We certainly know where we're currently at in the WAL stream and we know
how big each WAL file is.

We just need a knob to be able to say "alright, this WAL file might
still be desired by something, but we're running out of room for *new*
WAL and, therefore, that's just too bad for those processes that want
it" and recycle it anyway. There are probably error conditions we have
to consider for replication slots when that happens, etc, but I don't
think we lack the info to make the decision, except for what value to
set the knob to, which is clearly system-dependent.

Thanks!

Stephen
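Stephen's knob might be sketched as a postgresql.conf entry like the following; the name and semantics are entirely hypothetical, as no such GUC existed at the time of this discussion:

```
# Hypothetical setting: once total retained WAL exceeds this, recycle the
# oldest segments even if a replication slot or a failing archive_command
# still wants them, rather than running the WAL partition out of space.
max_retained_wal_size = 16GB    # hypothetical, not a real GUC
```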
On 29 May 2015 at 18:15, Josh Berkus <josh@agliodbs.com> wrote:

> pg_drop_replication_slot() can be a time-critical function when the
> master is running out of disk space because the replica is falling
> behind. So I was a little startled by this:
>
> cio=# select
> pg_drop_replication_slot('bdr_24577_6147720645156311471_1_25383__');
> ERROR:  replication slot "bdr_24577_6147720645156311471_1_25383__" is
> already active
>
> You have to first terminate the replication connection before you can
> delete the slot ... and do it fast enough that the replica doesn't
> reconnect before you drop the slot.

Why would you not stop the receiver first, then drop the slot?

Dropping the slot destroys any chance you have of recovering the
downstream server, so it should not be done lightly. That sounds like a
critical fail to me, so making it easier to do doesn't sound cool. I
oppose this suggestion.

> While I'm just doing this during testing, it could be a critical fail in
> production. I think the simplest way to resolve this would be to add a
> boolean flag to pg_drop_replication_slot(), which would terminate the
> replication connection and delete the slot as a single operation.

If you really want it you can write a function to do that for private use.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 29 May 2015 at 18:15, Josh Berkus <josh@agliodbs.com> wrote:

> While I'm just doing this during testing

That part is good. I'm sure you will find something in need of improvement.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services