Thread: why pg_walfile_name() cannot be executed during recovery?

why pg_walfile_name() cannot be executed during recovery?

From
SATYANARAYANA NARLAPURAM
Date:
Hello Hackers,

Why pg_walfile_name() can't be executed under recovery? What is the best way for me to get the current timeline and/or the file being recovering on the standby using a postgres query? I know I can get it via process title but don't want to go that route.

Thanks,
Satya

Re: why pg_walfile_name() cannot be executed during recovery?

From
Robert Haas
Date:
On Fri, Apr 2, 2021 at 4:23 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
> Why pg_walfile_name() can't be executed under recovery?

I believe the issue is that the backend executing the function might
not have an accurate idea about which TLI to use. But I don't
understand why we can't find some solution to that problem.

> What is the best way for me to get the current timeline and/or the file being recovering on the standby using a
postgresquery? I know I can get it via process title but don't want to go that route.
 

pg_stat_wal_receiver has LSN and TLI information, but probably won't
help except when WAL receiver is actually active.
pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() will give the
LSN at any point during recovery, but not the TLI. We might have some
gaps in this area...

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: why pg_walfile_name() cannot be executed during recovery?

From
Jehan-Guillaume de Rorthais
Date:
On Fri, 2 Apr 2021 08:22:09 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Apr 2, 2021 at 4:23 AM SATYANARAYANA NARLAPURAM
> <satyanarlapuram@gmail.com> wrote:
> > Why pg_walfile_name() can't be executed under recovery?  
> 
> I believe the issue is that the backend executing the function might
> not have an accurate idea about which TLI to use. But I don't
> understand why we can't find some solution to that problem.
> 
> > What is the best way for me to get the current timeline and/or the file
> > being recovering on the standby using a postgres query? I know I can get it
> > via process title but don't want to go that route.  
> 
> pg_stat_wal_receiver has LSN and TLI information, but probably won't
> help except when WAL receiver is actually active.
> pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() will give the
> LSN at any point during recovery, but not the TLI. We might have some
> gaps in this area...

Yep, see previous discussion:
https://www.postgresql.org/message-id/flat/20190723180518.635ac554%40firost

The status by the time was to consider a new view eg. pg_stat_recovery, to
report various recovery stats.

But maybe the best place now would be to include it in the new pg_stat_wal view?

As I'm interesting with this feature as well, I volunteer to work on it as
author or reviewer.

Regards,



Re: why pg_walfile_name() cannot be executed during recovery?

From
Bharath Rupireddy
Date:
On Fri, Apr 2, 2021 at 5:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 2, 2021 at 4:23 AM SATYANARAYANA NARLAPURAM
> <satyanarlapuram@gmail.com> wrote:
> > Why pg_walfile_name() can't be executed under recovery?
>
> I believe the issue is that the backend executing the function might
> not have an accurate idea about which TLI to use. But I don't
> understand why we can't find some solution to that problem.
>
> > What is the best way for me to get the current timeline and/or the file being recovering on the standby using a
postgresquery? I know I can get it via process title but don't want to go that route.
 
>
> pg_stat_wal_receiver has LSN and TLI information, but probably won't
> help except when WAL receiver is actually active.
> pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() will give the
> LSN at any point during recovery, but not the TLI. We might have some
> gaps in this area...

I spent some time today to allow pg_walfile_{name, name_offset} run in
recovery. Timeline ID is computed while in recovery as follows - WAL
receiver's last received and flushed WAL record's TLI if it's
streaming, otherwise the last replayed WAL record's TLI. This way,
these functions can be used on standby or PITR server or even in crash
recovery if the server opens up for read-only connections.

Please have a look at the attached patch.

If the approach looks okay, I can add notes in the documentation.

Regards,
Bharath Rupireddy.

Attachment

Re: why pg_walfile_name() cannot be executed during recovery?

From
Robert Haas
Date:
On Thu, Apr 7, 2022 at 9:32 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I spent some time today to allow pg_walfile_{name, name_offset} run in
> recovery. Timeline ID is computed while in recovery as follows - WAL
> receiver's last received and flushed WAL record's TLI if it's
> streaming, otherwise the last replayed WAL record's TLI. This way,
> these functions can be used on standby or PITR server or even in crash
> recovery if the server opens up for read-only connections.

I don't think this is a good definition. Suppose I ask for
pg_walfile_name() using an older LSN. With this approach, we're going
to get a filename based on the idea that the TLI that was in effect
back then is the same one as the TLI that is in effect now, which
might not be true. For example, suppose that the current TLI is 2 and
it branched off of timeline 1 at 10/0. If I ask for
pg_walfile_name('F/0'), it's going to give me the name of a WAL file
that has never existed. That seems bad.

It's also worth noting that there's a bit of a definitional problem
here. If in the same situation, I ask for pg_walfile_name('11/0'),
it's going to give me a filename based on TLI 2, but there's also a
WAL file for that LSN with TLI 1. How do we know which one the user
wants? Perhaps one idea would be to say that the relevant TLI is the
one which was in effect at the time that LSN was replayed. If we do
that, what about future LSNs? We could assume that for future LSNs,
the TLI should be the same as the current TLI, but maybe that's also
misleading, because recovery_target_timeline could be set.

I think it's really important to start by being precise about the
question that we think pg_walfile_name() ought to be answering. If we
don't know that, then we really can't say what TLI it should be using.
It's not hard to make the function return SOME answer using SOME TLI,
but then it's not clear that the answer is the right one for any
particular purpose. And in that case the function is more dangerous
than useful, because people will write code that uses it to do stuff,
and then that stuff won't actually work correctly under all
circumstances.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: why pg_walfile_name() cannot be executed during recovery?

From
Michael Paquier
Date:
On Thu, Apr 07, 2022 at 11:37:15AM -0400, Robert Haas wrote:
> It's also worth noting that there's a bit of a definitional problem
> here. If in the same situation, I ask for pg_walfile_name('11/0'),
> it's going to give me a filename based on TLI 2, but there's also a
> WAL file for that LSN with TLI 1. How do we know which one the user
> wants? Perhaps one idea would be to say that the relevant TLI is the
> one which was in effect at the time that LSN was replayed. If we do
> that, what about future LSNs? We could assume that for future LSNs,
> the TLI should be the same as the current TLI, but maybe that's also
> misleading, because recovery_target_timeline could be set.

FWIW, for future positions, I'd be rather on board with the concept of
using the TLI currently being replayed, but as you say that comes down
to the definition borders we want to use.  Another possibility would
be to return an error and kick the can down the road if we are unsure
of what the right behavior is.  For past positions, this should go
through a lookup of the timeline history file (the patch does not do
that at quick glance).

> I think it's really important to start by being precise about the
> question that we think pg_walfile_name() ought to be answering. If we
> don't know that, then we really can't say what TLI it should be using.
> It's not hard to make the function return SOME answer using SOME TLI,
> but then it's not clear that the answer is the right one for any
> particular purpose. And in that case the function is more dangerous
> than useful, because people will write code that uses it to do stuff,
> and then that stuff won't actually work correctly under all
> circumstances.

Agreed.
--
Michael

Attachment

Re: why pg_walfile_name() cannot be executed during recovery?

From
Bharath Rupireddy
Date:
On Thu, Apr 7, 2022 at 9:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 7, 2022 at 9:32 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > I spent some time today to allow pg_walfile_{name, name_offset} run in
> > recovery. Timeline ID is computed while in recovery as follows - WAL
> > receiver's last received and flushed WAL record's TLI if it's
> > streaming, otherwise the last replayed WAL record's TLI. This way,
> > these functions can be used on standby or PITR server or even in crash
> > recovery if the server opens up for read-only connections.
>
> I don't think this is a good definition. Suppose I ask for
> pg_walfile_name() using an older LSN. With this approach, we're going
> to get a filename based on the idea that the TLI that was in effect
> back then is the same one as the TLI that is in effect now, which
> might not be true. For example, suppose that the current TLI is 2 and
> it branched off of timeline 1 at 10/0. If I ask for
> pg_walfile_name('F/0'), it's going to give me the name of a WAL file
> that has never existed. That seems bad.
>
> It's also worth noting that there's a bit of a definitional problem
> here. If in the same situation, I ask for pg_walfile_name('11/0'),
> it's going to give me a filename based on TLI 2, but there's also a
> WAL file for that LSN with TLI 1. How do we know which one the user
> wants? Perhaps one idea would be to say that the relevant TLI is the
> one which was in effect at the time that LSN was replayed. If we do
> that, what about future LSNs? We could assume that for future LSNs,
> the TLI should be the same as the current TLI, but maybe that's also
> misleading, because recovery_target_timeline could be set.

Fundamental question - should the pg_walfile_{name, name_offset} check
whether the file with the computed WAL file name exists on the server
right now or ever existed earlier? Right now, they don't do that, see
[1].

I think we can make the functions more robust:
pg_walfile_{name, name_offset}(lsn, check_if_file_exists = false, tli
= invalid_timelineid) - when check_if_file_exists is true checks for
the computed WAL file existence and when a valid tli is provided uses
it in computing the WAL file name. When tli isn't provided, it
continues to use insert tli for primary, and in recovery it uses tli
as proposed in my patch. Perhaps, it can also do (as Michael
suggested) this - if check_if_file_exists is true and tli isn't
provided and there's timeline history, then it can go look at all the
timelines and whether the file exists with the computed name with
history tli.

> I think it's really important to start by being precise about the
> question that we think pg_walfile_name() ought to be answering. If we
> don't know that, then we really can't say what TLI it should be using.
> It's not hard to make the function return SOME answer using SOME TLI,
> but then it's not clear that the answer is the right one for any
> particular purpose. And in that case the function is more dangerous
> than useful, because people will write code that uses it to do stuff,
> and then that stuff won't actually work correctly under all
> circumstances.

Yes, once we agree on the semantics of these functions, having better
documentation will help.

Thoughts?

[1]
postgres=# select * from pg_walfile_name('50000/dfdf');
     pg_walfile_name
--------------------------
 000000010005000000000000
(1 row)
postgres=# select * from pg_walfile_name_offset('50000/dfdf');
        file_name         | file_offset
--------------------------+-------------
 000000010005000000000000 |       57311
(1 row)

Regards,
Bharath Rupireddy.



Re: why pg_walfile_name() cannot be executed during recovery?

From
Robert Haas
Date:
On Fri, Apr 8, 2022 at 9:31 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Fundamental question - should the pg_walfile_{name, name_offset} check
> whether the file with the computed WAL file name exists on the server
> right now or ever existed earlier? Right now, they don't do that, see
> [1].

I don't think that checking whether the file exists is the right
approach. However, I do think that it's important to be precise about
which TLI is going to be used. I think it would be reasonable to
redefine this function (on both the primary and the standby) so that
the TLI that is used is the one that was in effect at the time record
at the given LSN was either written or replayed. Then, you could
potentially use this function to figure out whether you still have the
WAL files that are needed to replay up to some previous point in the
WAL stream. However, what about the segments where we switched from
one TLI to the next in the middle of the segment? There, you probably
need both the old and the new segments, or maybe if you're trying to
stream them you only need the new one because we have some weird
special case that will send the segment from the new timeline when the
segment from the old timeline is requested. So you couldn't just call
this function on one LSN per segment and call it good, and it wouldn't
necessarily be the case that the filenames you got back were exactly
the ones you needed.

So I'm not entirely sure this proposal is good enough, but it at least
would have the advantage of meaning that the filename you get back is
one that existed at some point in time and somebody used it for
something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: why pg_walfile_name() cannot be executed during recovery?

From
Bharath Rupireddy
Date:
On Fri, Apr 8, 2022 at 7:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 8, 2022 at 9:31 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Fundamental question - should the pg_walfile_{name, name_offset} check
> > whether the file with the computed WAL file name exists on the server
> > right now or ever existed earlier? Right now, they don't do that, see
> > [1].
>
> I don't think that checking whether the file exists is the right
> approach. However, I do think that it's important to be precise about
> which TLI is going to be used. I think it would be reasonable to
> redefine this function (on both the primary and the standby) so that
> the TLI that is used is the one that was in effect at the time record
> at the given LSN was either written or replayed. Then, you could
> potentially use this function to figure out whether you still have the
> WAL files that are needed to replay up to some previous point in the
> WAL stream. However, what about the segments where we switched from
> one TLI to the next in the middle of the segment? There, you probably
> need both the old and the new segments, or maybe if you're trying to
> stream them you only need the new one because we have some weird
> special case that will send the segment from the new timeline when the
> segment from the old timeline is requested. So you couldn't just call
> this function on one LSN per segment and call it good, and it wouldn't
> necessarily be the case that the filenames you got back were exactly
> the ones you needed.
>
> So I'm not entirely sure this proposal is good enough, but it at least
> would have the advantage of meaning that the filename you get back is
> one that existed at some point in time and somebody used it for
> something.

Using insert tli when not in recovery and using tli of the last WAL
replayed record in crash/archive/standby recovery, seems a reasonable
choice to me. I've also added a note in the docs.

Attaching v2 with the above change. Please review it further.

Regards,
Bharath Rupireddy.

Attachment

Re: why pg_walfile_name() cannot be executed during recovery?

From
Andrey Borodin
Date:

> On 9 Apr 2022, at 18:30, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Using insert tli when not in recovery and using tli of the last WAL
> replayed record in crash/archive/standby recovery, seems a reasonable
> choice to me.

Please excuse me if I'm not attentive enough. I've read this thread. And I could not find what is the problem that you
aresolving. What is the purpose of the WAL file name you want to obtain? 

pg_walfile_name() - is a formatting function. With TLI as an hidden argument. If we want it to work on Standby we
shouldjust convert it to pure formatting function without access the the DB state, pass TLI as an argument. 
Making implicit TLI computation with certain expectations is not a good idea IMV.

pg_walfile_name() could just read .history file, determine which TLI contains given LSN and format the name. And still
there'sa tricky segments during TLI switch. 

Either way we can rename the function to pg_walfile_name_as_if_on_timeline_of_last_wal_replayed().

Thanks!

Best regards, Andrey Borodin.


Re: why pg_walfile_name() cannot be executed during recovery?

From
Robert Haas
Date:
On Sat, Apr 9, 2022 at 12:25 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Please excuse me if I'm not attentive enough. I've read this thread. And I could not find what is the problem that
youare solving. What is the purpose of the WAL file name you want to obtain?
 

Yeah, I'd also like to know this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: why pg_walfile_name() cannot be executed during recovery?

From
Bharath Rupireddy
Date:
On Sat, Apr 9, 2022 at 10:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Apr 9, 2022 at 12:25 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > Please excuse me if I'm not attentive enough. I've read this thread. And I could not find what is the problem that
youare solving. What is the purpose of the WAL file name you want to obtain?
 
>
> Yeah, I'd also like to know this.

IMO, uses of pg_walfile_{name, name_offset} are plenty. Say, I have
LSNs (say, flush, insert, replayed or WAL receiver latest received)
and I would like to know the WAL file name and offset in an app
connecting to postgres or a control plane either for doing some
reporting or figuring out whether a WAL file exists given an LSN or
for some other reason. With these functions restricted when the server
is in recovery mode, the apps or control plane code can't use them and
they have to do if (!pg_is_in_recovery()) {select * from
pg_walfile_{name, name_offset}.

Am I missing any other important use-cases?

Regards,
Bharath Rupireddy.



Re: why pg_walfile_name() cannot be executed during recovery?

From
Andrey Borodin
Date:

> 22 апр. 2022 г., в 19:15, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> написал(а):
>
> On Sat, Apr 9, 2022 at 10:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Sat, Apr 9, 2022 at 12:25 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>> Please excuse me if I'm not attentive enough. I've read this thread. And I could not find what is the problem that
youare solving. What is the purpose of the WAL file name you want to obtain? 
>>
>> Yeah, I'd also like to know this.
>
> IMO, uses of pg_walfile_{name, name_offset} are plenty. Say, I have
> LSNs (say, flush, insert, replayed or WAL receiver latest received)
AFAIK flush, receive and replay LSNs may be on 3 different timelines rendering two names incorrect. Actually, this
provesthat pg_wal_filename() should not be called on Standby with a present function prototype. 

> and I would like to know the WAL file name and offset in an app
> connecting to postgres or a control plane either for doing some
> reporting
What kind of reporting?

> or figuring out whether a WAL file exists given an LSN or
> for some other reason.
There might me many WAL files on the same LSN. Please, specify more detailed scenario to use WAL file name.

> With these functions restricted when the server
> is in recovery mode, the apps or control plane code can't use them and
> they have to do if (!pg_is_in_recovery()) {select * from
> pg_walfile_{name, name_offset}.
>
> Am I missing any other important use-cases?

I do not see correct use-case among these. You justify necessity to run pg_wal_filename() on Standby by having a LSN
(nota problem), by doing some kind of reporting (to broad problem) and checking existence of some WAL file (more
detailsneeded). What is the problem leading to checking the existence of the file? 

Thanks!

Best regards, Andrey Borodin.