Thread: logical replication: could not create file "state.tmp": File exists

logical replication: could not create file "state.tmp": File exists

From
Grigory Smolkin
Date:
Hello!

One of my colleagues encountered an out of space condition, which broke 
his logical replication setup.
It`s manifested with the following errors:

ERROR:  could not receive data from WAL stream: ERROR:  could not create 
file "pg_replslot/some_sub/state.tmp": File exists

I`ve digged a bit into this problem, and it`s turned out that in 
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT 
| O_EXCL' flags, which makes this routine as not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it 
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: logical replication: could not create file "state.tmp": Fileexists

From
Michael Paquier
Date:
On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
> I`ve digged a bit into this problem, and it`s turned out that in
> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
> O_EXCL' flags, which makes this routine as not very reentrant.

What did you see as I/O problem before facing the actual error
reported here?  Was it just ENOSPC, a fsync failure, or just a failure
in closing the fd?  The first pattern is mostly what I guess happened,
still a fsync failure would not trigger a PANIC here (actually we
really should do that!), but I am raising a different thread about
that issue.

> Since an exclusive lock is taken before temp file creation, I think it
> should be safe to replace O_EXCL with O_TRUNC.
> Script to reproduce and patch are attached.

Agreed.  I prefer the O_TRUNC option because that's less code churn.
Also, as it can still be useful to have a look at the temporary state
file after a crash or a failure, doing unlink() in the error code
paths is no good option IMO.

Have others thoughts or objections to share?
--
Michael

Attachment

Re: logical replication: could not create file "state.tmp": Fileexists

From
Grigory Smolkin
Date:
On 12/2/19 7:35 AM, Michael Paquier wrote:
> On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
> What did you see as I/O problem before facing the actual error
> reported here?  Was it just ENOSPC, a fsync failure, or just a failure
> in closing the fd?  The first pattern is mostly what I guess happened,
> still a fsync failure would not trigger a PANIC here (actually we
> really should do that!), but I am raising a different thread about
> that issue.

Hello!

I didn`t see the very first error that left behind the temp file.
I`ve requested it just now, but it will take some time to get it (there 
are several terabytes of text log).
But I assume that it was out of space error, which, by the look of the 
code, should produce ERROR and leave temp file behind, just as it 
happened in aforementioned  case.


>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
>> Script to reproduce and patch are attached.
> Agreed.  I prefer the O_TRUNC option because that's less code churn.
> Also, as it can still be useful to have a look at the temporary state
> file after a crash or a failure, doing unlink() in the error code
> paths is no good option IMO.

I`m sorry, but it was an production system, so, as I understand it, 
stale temp file was hastily deleted without long considerations.

Thank you for your interest in this topic.

>
> Have others thoughts or objections to share?
> --
> Michael

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: logical replication: could not create file "state.tmp": Fileexists

From
Andres Freund
Date:
Hi,

On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
> One of my colleagues encountered an out of space condition, which broke his
> logical replication setup.
> It`s manifested with the following errors:
> 
> ERROR:  could not receive data from WAL stream: ERROR:  could not create
> file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?


> I`ve digged a bit into this problem, and it`s turned out that in
> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
> O_EXCL' flags, which makes this routine as not very reentrant.
> 
> Since an exclusive lock is taken before temp file creation, I think it
> should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.


> Script to reproduce and patch are attached.

Well:

> # Imitate out_of_space/write_operation_error
> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...

Greetings,

Andres Freund



Re: logical replication: could not create file "state.tmp": Fileexists

From
Grigory Smolkin
Date:
On 12/2/19 7:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It`s manifested with the following errors:
>>
>> ERROR:  could not receive data from WAL stream: ERROR:  could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there's some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
Is it possible with exclusive lock taken before that?
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...

But it replicate the exactly the same state we would get, if write() to 
temp file would have failed with out of space.


>
> Greetings,
>
> Andres Freund
>
>
-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: logical replication: could not create file "state.tmp": File exists

From
Dmitry Vasiliev
Date:
Here's what happened from the publisher and subscriber point of view:

publisher:  (some query) ERROR: could not write to tuplestore temporary file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply worker for subscription "<name> _sub" has started
publisher:  LOG: received replication command: IDENTIFY_SYSTEM
publisher:  LOG: received replication command: START_REPLICATION SLOT "<name> _sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names '"<name>_pub"')
publisher:  ERROR: could not create file "pg_replslot /<name>_sub/state.tmp": File exists

I think some publisher logs may not be available due to out of space problem.

On Mon, Dec 2, 2019 at 7:54 PM Grigory Smolkin <g.smolkin@postgrespro.ru> wrote:

On 12/2/19 7:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It`s manifested with the following errors:
>>
>> ERROR:  could not receive data from WAL stream: ERROR:  could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there's some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
Is it possible with exclusive lock taken before that?
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...

But it replicate the exactly the same state we would get, if write() to
temp file would have failed with out of space.


>
> Greetings,
>
> Andres Freund
>
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: logical replication: could not create file "state.tmp": Fileexists

From
Grigory Smolkin
Date:
On 12/2/19 7:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It`s manifested with the following errors:
>>
>> ERROR:  could not receive data from WAL stream: ERROR:  could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there's some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I`ve digged a bit into this problem, and it`s turned out that in
>> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
>> O_EXCL' flags, which makes this routine as not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...


I`ve managed to reproduce the issue using the attached script:

2019-12-02 20:32:12.547 MSK [4180] ERROR:  could not write to file 
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.548 MSK [4178] ERROR:  could not receive data from 
WAL stream: ERROR:  could not write to file 
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.557 MSK [4645] LOG:  logical replication apply 
worker for subscription "mysub" has started
2019-12-02 20:32:12.562 MSK [4091] LOG:  background worker "logical 
replication worker" (PID 4178) exited with exit code 1
2019-12-02 20:32:12.569 MSK [4648] LOG:  starting logical decoding for 
slot "mysub"
2019-12-02 20:32:12.569 MSK [4648] DETAIL:  Streaming transactions 
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:12.569 MSK [4648] LOG:  logical decoding found 
consistent point at 0/1566A2D8
2019-12-02 20:32:12.569 MSK [4648] DETAIL:  Logical decoding will begin 
using saved snapshot.
2019-12-02 20:32:12.672 MSK [4648] ERROR:  could not write to file 
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4645] ERROR:  could not receive data from 
WAL stream: ERROR:  could not write to file 
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4091] LOG:  background worker "logical 
replication worker" (PID 4645) exited with exit code 1
2019-12-02 20:32:17.684 MSK [4653] LOG:  logical replication apply 
worker for subscription "mysub" has started
2019-12-02 20:32:17.693 MSK [4654] LOG:  starting logical decoding for 
slot "mysub"
2019-12-02 20:32:17.693 MSK [4654] DETAIL:  Streaming transactions 
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:17.693 MSK [4654] LOG:  logical decoding found 
consistent point at 0/1566A2D8
2019-12-02 20:32:17.693 MSK [4654] DETAIL:  Logical decoding will begin 
using saved snapshot.
2019-12-02 20:32:17.847 MSK [4654] ERROR:  could not write to file 
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.847 MSK [4653] ERROR:  could not receive data from 
WAL stream: ERROR:  could not write to file 
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.848 MSK [4091] LOG:  background worker "logical 
replication worker" (PID 4653) exited with exit code 1
2019-12-02 20:32:22.859 MSK [4658] LOG:  logical replication apply 
worker for subscription "mysub" has started
2019-12-02 20:32:22.868 MSK [4659] LOG:  starting logical decoding for 
slot "mysub"
2019-12-02 20:32:22.868 MSK [4659] DETAIL:  Streaming transactions 
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:22.868 MSK [4659] LOG:  logical decoding found 
consistent point at 0/1566A2D8
2019-12-02 20:32:22.868 MSK [4659] DETAIL:  Logical decoding will begin 
using saved snapshot.
2019-12-02 20:32:22.868 MSK [4659] ERROR:  could not create file 
"pg_replslot/mysub/state.tmp": File exists
2019-12-02 20:32:22.869 MSK [4658] ERROR:  could not receive data from 
WAL stream: ERROR:  could not create file "pg_replslot/mysub/state.tmp": 
File exists



>
> Greetings,
>
> Andres Freund

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: logical replication: could not create file "state.tmp": Fileexists

From
Michael Paquier
Date:
On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.

During the checkpoint of replication slots, SaveSlotToPath() would
just *LOG* any failure while leaving around the state.tmp of a slot,
and then any follow-up attempt to create state.tmp would just fail
because of that, preventing the slot state file from being flushed
continuously.  I think that's wrong.  Concurrency is not a concern
either here as the slot's LWLock to track an I/O in progress is taken
in exclusive lock.
--
Michael

Attachment

Re: logical replication: could not create file "state.tmp": Fileexists

From
Andres Freund
Date:
Hi,

On 2019-12-11 14:27:45 +0900, Michael Paquier wrote:
> On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:
> > I'm very doubtful about this. I think it's a good safety measure to
> > ensure that there's no previous state file that we're somehow
> > overwriting.
>
> During the checkpoint of replication slots, SaveSlotToPath() would
> just *LOG* any failure while leaving around the state.tmp of a slot,
> and then any follow-up attempt to create state.tmp would just fail
> because of that, preventing the slot state file from being flushed
> continuously.  I think that's wrong.  Concurrency is not a concern
> either here as the slot's LWLock to track an I/O in progress is taken
> in exclusive lock.

I'm not clear on what you're saying? I don't see how I'm arguing for the
type of behaviour you seem to be describing?

Greetings,

Andres Freund



Re: logical replication: could not create file "state.tmp": Fileexists

From
Alvaro Herrera
Date:
On 2019-Nov-30, Grigory Smolkin wrote:

> One of my colleagues encountered an out of space condition, which broke his
> logical replication setup.
> It`s manifested with the following errors:
> 
> ERROR:  could not receive data from WAL stream: ERROR:  could not create
> file "pg_replslot/some_sub/state.tmp": File exists

We haven't addressed this problem, have we?  At least, I see that this
routine still looks like this:

> diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> index 21ae8531b3..ef415b44c8 100644
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -1253,7 +1253,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
>      sprintf(tmppath, "%s/state.tmp", dir);
>      sprintf(path, "%s/state", dir);
>  
> -    fd = OpenTransientFile(tmppath, O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical replication: could not create file "state.tmp": Fileexists

From
Michael Paquier
Date:
On Mon, May 25, 2020 at 04:35:41PM -0400, Alvaro Herrera wrote:
> We haven't addressed this problem, have we?  At least, I see that this
> routine still looks like this:

Yeah, I think that we still need to do something here.  I got this
issue on my stack for some time but I have not been able to get around
it..
--
Michael

Attachment