On 03/12/2014 12:03 PM, Andres Freund wrote:
> Hi,
>
> On 2014-03-12 12:00:25 -0700, Josh Berkus wrote:
>> I was just reading Michael's explanation of replication slots
>> (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/)
>> and realized there was something which had completely escaped me in the
>> pre-commit discussion:
>>
>> select pg_drop_replication_slot('slot_1');
>> ERROR: 55006: replication slot "slot_1" is already active
>> LOCATION: ReplicationSlotAcquire, slot.c:339
>>
>> What defines an "active" slot?
>
> One with a connected walsender.
In a world of network proxies, a walsender could be "connected" for
hours after the replica has ceased to exist. Fortunately,
wal_sender_timeout can be changed with a reload. The timeout is based
on actual standby feedback, yes?
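To see which slots the master currently considers active, and which
walsenders are still connected, something like the following should
work. This is a sketch that assumes the 9.4 views pg_replication_slots
and pg_stat_replication. The column lists are trimmed, and column names
vary between releases.

```sql
-- Slots, and whether a walsender is currently attached to each one.
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;

-- Connected walsenders. A dead replica behind a proxy can keep
-- showing up here until wal_sender_timeout fires.
SELECT pid, application_name, client_addr, state
FROM pg_stat_replication;
```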
>
>> It seems like there's no way for a DBA to drop slots from the master if
>> it's rapidly running out of disk WAL space without doing a restart, and
>> there's no way to drop the slot for a replica which the DBA knows is
>> permanently offline but was connected earlier. Am I missing something?
>
> It's sufficient to terminate the walsender and then drop the slot. That
> seems ok for now?
We have no safe way to terminate the walsender that I know of; last I
checked, pg_terminate_backend() doesn't cover walsenders.
So the procedure for this would be:
1) set wal_sender_timeout to some low value (e.g. 1);
2) reload the configuration;
3) once the stale walsender has timed out, call
   pg_drop_replication_slot('slotname').
Clumsy, but it will do for a first pass; we can make it better (for
example, by adding a "force" boolean to pg_drop_replication_slot) in 9.5.
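In SQL, the steps above might look roughly like the following. This is
a sketch only. 'slotname' is a placeholder, and the '1s' timeout and
the 5-second pg_sleep() are arbitrary illustrative values. ALTER SYSTEM
assumes 9.4 or later and cannot run inside a transaction block; on
older releases, edit postgresql.conf instead. The lowered timeout
applies to every walsender, not only the dead one.

```sql
-- 1) Lower the timeout cluster-wide ('1s' is an illustrative value).
ALTER SYSTEM SET wal_sender_timeout = '1s';

-- 2) Reload so the new value takes effect without a restart.
SELECT pg_reload_conf();

-- Give the stale walsender time to hit the timeout and exit, then
-- confirm that the slot is no longer marked active.
SELECT pg_sleep(5);
SELECT slot_name, active
FROM pg_replication_slots
WHERE slot_name = 'slotname';

-- 3) Drop the now-inactive slot.
SELECT pg_drop_replication_slot('slotname');

-- Afterwards, restore the timeout to its configured default.
ALTER SYSTEM SET wal_sender_timeout TO DEFAULT;
SELECT pg_reload_conf();
```

If the drop still fails with "is already active", the walsender has not
exited yet. Waiting longer and repeating the check should be enough.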
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com