Thread: logical replication: could not create file "state.tmp": File exists
Hello! One of my colleagues encountered an out of space condition, which broke his logical replication setup. It`s manifested with the following errors: ERROR: could not receive data from WAL stream: ERROR: could not create file "pg_replslot/some_sub/state.tmp": File exists I`ve digged a bit into this problem, and it`s turned out that in SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT | O_EXCL' flags, which makes this routine as not very reentrant. Since an exclusive lock is taken before temp file creation, I think it should be safe to replace O_EXCL with O_TRUNC. Script to reproduce and patch are attached. -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote: > I`ve digged a bit into this problem, and it`s turned out that in > SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT | > O_EXCL' flags, which makes this routine as not very reentrant. What did you see as I/O problem before facing the actual error reported here? Was it just ENOSPC, a fsync failure, or just a failure in closing the fd? The first pattern is mostly what I guess happened, still a fsync failure would not trigger a PANIC here (actually we really should do that!), but I am raising a different thread about that issue. > Since an exclusive lock is taken before temp file creation, I think it > should be safe to replace O_EXCL with O_TRUNC. > Script to reproduce and patch are attached. Agreed. I prefer the O_TRUNC option because that's less code churn. Also, as it can still be useful to have a look at the temporary state file after a crash or a failure, doing unlink() in the error code paths is no good option IMO. Have others thoughts or objections to share? -- Michael
Attachment
On 12/2/19 7:35 AM, Michael Paquier wrote: > On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote: >> I`ve digged a bit into this problem, and it`s turned out that in >> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT | >> O_EXCL' flags, which makes this routine as not very reentrant. > What did you see as I/O problem before facing the actual error > reported here? Was it just ENOSPC, a fsync failure, or just a failure > in closing the fd? The first pattern is mostly what I guess happened, > still a fsync failure would not trigger a PANIC here (actually we > really should do that!), but I am raising a different thread about > that issue. Hello! I didn`t see the very first error that left behind the temp file. I`ve requested it just now, but it will take some time to get it (there are several terabytes of text log). But I assume that it was out of space error, which, by the look of the code, should produce ERROR and leave temp file behind, just as it happened in aforementioned case. > >> Since an exclusive lock is taken before temp file creation, I think it >> should be safe to replace O_EXCL with O_TRUNC. >> Script to reproduce and patch are attached. > Agreed. I prefer the O_TRUNC option because that's less code churn. > Also, as it can still be useful to have a look at the temporary state > file after a crash or a failure, doing unlink() in the error code > paths is no good option IMO. I`m sorry, but it was an production system, so, as I understand it, stale temp file was hastily deleted without long considerations. Thank you for your interest in this topic. > > Have others thoughts or objections to share? > -- > Michael -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote: > One of my colleagues encountered an out of space condition, which broke his > logical replication setup. > It`s manifested with the following errors: > > ERROR: could not receive data from WAL stream: ERROR: could not create > file "pg_replslot/some_sub/state.tmp": File exists Hm. What was the log output leading to this state? Some cases of this would end up in a PANIC, which'd remove the .tmp file during recovery. But there's some where we won't - it seems the right fix for this would be to unlink the tmp file in that case? > I`ve digged a bit into this problem, and it`s turned out that in > SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT | > O_EXCL' flags, which makes this routine as not very reentrant. > > Since an exclusive lock is taken before temp file creation, I think it > should be safe to replace O_EXCL with O_TRUNC. I'm very doubtful about this. I think it's a good safety measure to ensure that there's no previous state file that we're somehow overwriting. > Script to reproduce and patch are attached. Well: > # Imitate out_of_space/write_operation_error > touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp Doesn't really replicate how we got into this state... Greetings, Andres Freund
On 12/2/19 7:12 PM, Andres Freund wrote: > Hi, > > On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote: >> One of my colleagues encountered an out of space condition, which broke his >> logical replication setup. >> It`s manifested with the following errors: >> >> ERROR: could not receive data from WAL stream: ERROR: could not create >> file "pg_replslot/some_sub/state.tmp": File exists > Hm. What was the log output leading to this state? Some cases of this > would end up in a PANIC, which'd remove the .tmp file during > recovery. But there's some where we won't - it seems the right fix for > this would be to unlink the tmp file in that case? > > >> I`ve digged a bit into this problem, and it`s turned out that in >> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT | >> O_EXCL' flags, which makes this routine as not very reentrant. >> >> Since an exclusive lock is taken before temp file creation, I think it >> should be safe to replace O_EXCL with O_TRUNC. > I'm very doubtful about this. I think it's a good safety measure to > ensure that there's no previous state file that we're somehow > overwriting. Is it possible with exclusive lock taken before that? > > >> Script to reproduce and patch are attached. > Well: > >> # Imitate out_of_space/write_operation_error >> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp > Doesn't really replicate how we got into this state... But it replicate the exactly the same state we would get, if write() to temp file would have failed with out of space. > > Greetings, > > Andres Freund > > -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Here's what happened from the publisher and subscriber point of view:
publisher: (some query) ERROR: could not write to tuplestore temporary file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply worker for subscription "<name> _sub" has started
publisher: LOG: received replication command: IDENTIFY_SYSTEM
publisher: LOG: received replication command: START_REPLICATION SLOT "<name> _sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names '"<name>_pub"')
publisher: ERROR: could not create file "pg_replslot /<name>_sub/state.tmp": File exists
I think some publisher logs may not be available due to out of space problem.
publisher: (some query) ERROR: could not write to tuplestore temporary file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply worker for subscription "<name> _sub" has started
publisher: LOG: received replication command: IDENTIFY_SYSTEM
publisher: LOG: received replication command: START_REPLICATION SLOT "<name> _sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names '"<name>_pub"')
publisher: ERROR: could not create file "pg_replslot /<name>_sub/state.tmp": File exists
I think some publisher logs may not be available due to out of space problem.
On Mon, Dec 2, 2019 at 7:54 PM Grigory Smolkin <g.smolkin@postgrespro.ru> wrote:
On 12/2/19 7:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It`s manifested with the following errors:
>>
>> ERROR: could not receive data from WAL stream: ERROR: could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there's some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
Is it possible with exclusive lock taken before that?
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...
But it replicate the exactly the same state we would get, if write() to
temp file would have failed with out of space.
>
> Greetings,
>
> Andres Freund
>
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 12/2/19 7:12 PM, Andres Freund wrote: > Hi, > > On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote: >> One of my colleagues encountered an out of space condition, which broke his >> logical replication setup. >> It`s manifested with the following errors: >> >> ERROR: could not receive data from WAL stream: ERROR: could not create >> file "pg_replslot/some_sub/state.tmp": File exists > Hm. What was the log output leading to this state? Some cases of this > would end up in a PANIC, which'd remove the .tmp file during > recovery. But there's some where we won't - it seems the right fix for > this would be to unlink the tmp file in that case? > > >> I`ve digged a bit into this problem, and it`s turned out that in >> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT | >> O_EXCL' flags, which makes this routine as not very reentrant. >> >> Since an exclusive lock is taken before temp file creation, I think it >> should be safe to replace O_EXCL with O_TRUNC. > I'm very doubtful about this. I think it's a good safety measure to > ensure that there's no previous state file that we're somehow > overwriting. > > >> Script to reproduce and patch are attached. > Well: > >> # Imitate out_of_space/write_operation_error >> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp > Doesn't really replicate how we got into this state... I`ve managed to reproduce the issue using the attached script: 2019-12-02 20:32:12.547 MSK [4180] ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device 2019-12-02 20:32:12.548 MSK [4178] ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device 2019-12-02 20:32:12.557 MSK [4645] LOG: logical replication apply worker for subscription "mysub" has started 2019-12-02 20:32:12.562 MSK [4091] LOG: background worker "logical replication worker" (PID 4178) exited with exit code 1 2019-12-02 20:32:12.569 MSK [4648] LOG: starting logical decoding for slot "mysub" 2019-12-02 20:32:12.569 MSK [4648] DETAIL: Streaming transactions committing after 0/160E9A28, reading WAL from 0/1566A2D8. 2019-12-02 20:32:12.569 MSK [4648] LOG: logical decoding found consistent point at 0/1566A2D8 2019-12-02 20:32:12.569 MSK [4648] DETAIL: Logical decoding will begin using saved snapshot. 2019-12-02 20:32:12.672 MSK [4648] ERROR: could not write to file "pg_replslot/mysub/state.tmp": No space left on device 2019-12-02 20:32:12.672 MSK [4645] ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_replslot/mysub/state.tmp": No space left on device 2019-12-02 20:32:12.672 MSK [4091] LOG: background worker "logical replication worker" (PID 4645) exited with exit code 1 2019-12-02 20:32:17.684 MSK [4653] LOG: logical replication apply worker for subscription "mysub" has started 2019-12-02 20:32:17.693 MSK [4654] LOG: starting logical decoding for slot "mysub" 2019-12-02 20:32:17.693 MSK [4654] DETAIL: Streaming transactions committing after 0/160E9A28, reading WAL from 0/1566A2D8. 2019-12-02 20:32:17.693 MSK [4654] LOG: logical decoding found consistent point at 0/1566A2D8 2019-12-02 20:32:17.693 MSK [4654] DETAIL: Logical decoding will begin using saved snapshot. 2019-12-02 20:32:17.847 MSK [4654] ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device 2019-12-02 20:32:17.847 MSK [4653] ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device 2019-12-02 20:32:17.848 MSK [4091] LOG: background worker "logical replication worker" (PID 4653) exited with exit code 1 2019-12-02 20:32:22.859 MSK [4658] LOG: logical replication apply worker for subscription "mysub" has started 2019-12-02 20:32:22.868 MSK [4659] LOG: starting logical decoding for slot "mysub" 2019-12-02 20:32:22.868 MSK [4659] DETAIL: Streaming transactions committing after 0/160E9A28, reading WAL from 0/1566A2D8. 2019-12-02 20:32:22.868 MSK [4659] LOG: logical decoding found consistent point at 0/1566A2D8 2019-12-02 20:32:22.868 MSK [4659] DETAIL: Logical decoding will begin using saved snapshot. 2019-12-02 20:32:22.868 MSK [4659] ERROR: could not create file "pg_replslot/mysub/state.tmp": File exists 2019-12-02 20:32:22.869 MSK [4658] ERROR: could not receive data from WAL stream: ERROR: could not create file "pg_replslot/mysub/state.tmp": File exists > > Greetings, > > Andres Freund -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote: > I'm very doubtful about this. I think it's a good safety measure to > ensure that there's no previous state file that we're somehow > overwriting. During the checkpoint of replication slots, SaveSlotToPath() would just *LOG* any failure while leaving around the state.tmp of a slot, and then any follow-up attempt to create state.tmp would just fail because of that, preventing the slot state file from being flushed continuously. I think that's wrong. Concurrency is not a concern either here as the slot's LWLock to track an I/O in progress is taken in exclusive lock. -- Michael
Attachment
Hi, On 2019-12-11 14:27:45 +0900, Michael Paquier wrote: > On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote: > > I'm very doubtful about this. I think it's a good safety measure to > > ensure that there's no previous state file that we're somehow > > overwriting. > > During the checkpoint of replication slots, SaveSlotToPath() would > just *LOG* any failure while leaving around the state.tmp of a slot, > and then any follow-up attempt to create state.tmp would just fail > because of that, preventing the slot state file from being flushed > continuously. I think that's wrong. Concurrency is not a concern > either here as the slot's LWLock to track an I/O in progress is taken > in exclusive lock. I'm not clear on what you're saying? I don't see how I'm arguing for the type of behaviour you seem to be describing? Greetings, Andres Freund
On 2019-Nov-30, Grigory Smolkin wrote: > One of my colleagues encountered an out of space condition, which broke his > logical replication setup. > It`s manifested with the following errors: > > ERROR: could not receive data from WAL stream: ERROR: could not create > file "pg_replslot/some_sub/state.tmp": File exists We haven't addressed this problem, have we? At least, I see that this routine still looks like this: > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c > index 21ae8531b3..ef415b44c8 100644 > --- a/src/backend/replication/slot.c > +++ b/src/backend/replication/slot.c > @@ -1253,7 +1253,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel) > sprintf(tmppath, "%s/state.tmp", dir); > sprintf(path, "%s/state", dir); > > - fd = OpenTransientFile(tmppath, O_CREAT | O_EXCL | O_WRONLY | PG_BINARY); -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, May 25, 2020 at 04:35:41PM -0400, Alvaro Herrera wrote: > We haven't addressed this problem, have we? At least, I see that this > routine still looks like this: Yeah, I think that we still need to do something here. I got this issue on my stack for some time but I have not been able to get around it.. -- Michael