Thread: Forcing current WAL file to be archived

Forcing current WAL file to be archived

From
Simon Riggs
Date:
Patch included to implement xlog switching, using an xlog record
"processing instruction" and forcibly moving xlog pointers.

1. Happens automatically on pg_stop_backup()

2. Can happen manually via pg_switch_xlog()

3. Implement range of utility functions:
    pg_current_wal_offset()
    pg_current_xlogfile()
    pg_current_xlogfile_offset() - for Hannu
    pg_xlogfile_from_wal_offset() for interpreting output from
pg_switch_xlog, pg_start/stop_backup()

Passes make check, applies cleanly to HEAD, includes doc patches with
clean SGML builds.

Design as clean as possible given and has implementation of
archive_timeout in mind also. Happy to work further on any code cleanups
requested.

I've done a variety of testing on it, doing concurrent pg_regress and
pg_switch_xlog(). All known issues resolved. Main test cases and sample
outputs are in switchtest.sh

Wide variety of cases need testing, so I'm expecting some further issues
to be reported.

I'm now working on completing the restartable recovery patch, which will
include further tests of PITR recoveries on the xswitch.patch.

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com

Attachment

Re: Forcing current WAL file to be archived

From
"Albe Laurenz"
Date:
Simon Riggs wrote:
> Patch included to implement xlog switching, using an xlog record
> "processing instruction" and forcibly moving xlog pointers.
>
> 1. Happens automatically on pg_stop_backup()

Oh - so it will not be possible to do an online backup
_without_ forcing a WAL switch any more?

Laurenz Albe

Re: Forcing current WAL file to be archived

From
Tim Allen
Date:
Albe Laurenz wrote:
> Simon Riggs wrote:
>
>>Patch included to implement xlog switching, using an xlog record
>>"processing instruction" and forcibly moving xlog pointers.
>>
>>1. Happens automatically on pg_stop_backup()
>
>
> Oh - so it will not be possible to do an online backup
> _without_ forcing a WAL switch any more?

Well, previously, you would have always had to simulate a wal switch, by
working out which is the current wal file and copying that. Otherwise
your online backup wouldn't be complete.

What Simon is describing sounds like a big step forward from that
situation. It should let me delete half the code in my pitr
backup/failover scripts. Definitely a Good Thing.

> Laurenz Albe

Tim


Attachment

Re: Forcing current WAL file to be archived

From
"Albe Laurenz"
Date:
Tim Allen wrote:
>>>Patch included to implement xlog switching, using an xlog record
>>>"processing instruction" and forcibly moving xlog pointers.
>>>
>>>1. Happens automatically on pg_stop_backup()
>>
>>
>> Oh - so it will not be possible to do an online backup
>> _without_ forcing a WAL switch any more?
>
> Well, previously, you would have always had to simulate a wal
> switch, by
> working out which is the current wal file and copying that. Otherwise
> your online backup wouldn't be complete.
>
> What Simon is describing sounds like a big step forward from that
> situation. It should let me delete half the code in my pitr
> backup/failover scripts. Definitely a Good Thing.

Certainly a Good Thing, and it should be on by default.

But couldn't there be situations where you'd like to do an
online backup without a WAL switch? To avoid generating an
archive WAL every day on a database with few changes, e.g.?

Maybe not, I'm just wondering.

Laurenz Albe

Re: [HACKERS] Forcing current WAL file to be archived

From
"Florian G. Pflug"
Date:
Albe Laurenz wrote:
> Tim Allen wrote:
>>>> Patch included to implement xlog switching, using an xlog record
>>>> "processing instruction" and forcibly moving xlog pointers.
>>>>
>>>> 1. Happens automatically on pg_stop_backup()
>>>
>>> Oh - so it will not be possible to do an online backup
>>> _without_ forcing a WAL switch any more?
>> Well, previously, you would have always had to simulate a wal
>> switch, by
>> working out which is the current wal file and copying that. Otherwise
>> your online backup wouldn't be complete.
>>
>> What Simon is describing sounds like a big step forward from that
>> situation. It should let me delete half the code in my pitr
>> backup/failover scripts. Definitely a Good Thing.
>
> Certainly a Good Thing, and it should be on by default.
>
> But couldn't there be situations where you'd like to do an
> online backup without a WAL switch? To avoid generating an
> archive WAL every day on a database with few changes, e.g.?

But the online backup would be impossible to restore, if you don't
have enough wal archived to recover past the point where you called
pg_stop_backup().

So, doing a wal switch when pg_stop_backup() is called greatly reduces
the risk of a user error that leads to broken backups.

greetings, Florian Pflug




Re: Forcing current WAL file to be archived

From
Bruce Momjian
Date:
Albe Laurenz wrote:
> Tim Allen wrote:
> >>>Patch included to implement xlog switching, using an xlog record
> >>>"processing instruction" and forcibly moving xlog pointers.
> >>>
> >>>1. Happens automatically on pg_stop_backup()
> >>
> >>
> >> Oh - so it will not be possible to do an online backup
> >> _without_ forcing a WAL switch any more?
> >
> > Well, previously, you would have always had to simulate a wal
> > switch, by
> > working out which is the current wal file and copying that. Otherwise
> > your online backup wouldn't be complete.
> >
> > What Simon is describing sounds like a big step forward from that
> > situation. It should let me delete half the code in my pitr
> > backup/failover scripts. Definitely a Good Thing.
>
> Certainly a Good Thing, and it should be on by default.
>
> But couldn't there be situations where you'd like to do an
> online backup without a WAL switch? To avoid generating an
> archive WAL every day on a database with few changes, e.g.?
>
> Maybe not, I'm just wondering.

Considering the I/O caused by the backup, a new WAL file seems
insignificant, and until a log switch, the backup isn't useful.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Forcing current WAL file to be archived

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Patch included to implement xlog switching, using an xlog record
> "processing instruction" and forcibly moving xlog pointers.

Applied with revisions.  I didn't like the extra state you added to
track whether an xlog switch had occurred --- the more bits of
interdependent state the more chance for bugs, IMHO, and it seemed
unnecessary since it's easy enough to test whether we are at a
segment boundary.  I also made the new user-level functions a bit
more orthogonal, so that filenames could be extracted from the
existing functions like pg_stop_backup.

            regards, tom lane

Re: Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Sat, 2006-08-05 at 23:57 -0400, Tom Lane wrote:

> I also made the new user-level functions a bit
> more orthogonal, so that filenames could be extracted from the
> existing functions like pg_stop_backup.

Something Hannu wrote has just reminded me that
pg_current_xlog_location() returns the current Insert pointer rather
than the current Write pointer.

That would not be useful for streaming xlog records would it?

Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.

Any objections if I correct that?

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com


Re: Forcing current WAL file to be archived

From
Hannu Krosing
Date:
Ühel kenal päeval, K, 2006-08-09 kell 12:56, kirjutas Simon Riggs:
> On Sat, 2006-08-05 at 23:57 -0400, Tom Lane wrote:
>
> > I also made the new user-level functions a bit
> > more orthogonal, so that filenames could be extracted from the
> > existing functions like pg_stop_backup.
>
> Something Hannu wrote has just reminded me that
> pg_current_xlog_location() returns the current Insert pointer rather
> than the current Write pointer.
>
> That would not be useful for streaming xlog records would it?
>
> Methinks it should be the Write pointer all of the time, since I can't
> think of a valid reason for wanting to know where the Insert pointer is
> *before* we've written to the xlog file. Having it be the Insert pointer
> could lead to some errors.
>
> Any objections if I correct that?

What is the difference ?

I'd expect it to point either to last byte written or to the next byte
that will be written, and I want to know which one it is :)


And another question:

is is possible that under some circumstances the last few bytes of a WAL
file will not be written to ?

or is the writing done as if all the wal files together form one huge
tape, without any gaps between ?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: Forcing current WAL file to be archived

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Something Hannu wrote has just reminded me that
> pg_current_xlog_location() returns the current Insert pointer rather
> than the current Write pointer.
> That would not be useful for streaming xlog records would it?

Good point.

> Methinks it should be the Write pointer all of the time, since I can't
> think of a valid reason for wanting to know where the Insert pointer is
> *before* we've written to the xlog file. Having it be the Insert pointer
> could lead to some errors.

However the start/stop_backup functions return the Insert pointer.
I can see scripts getting confused if pg_current_xlog_location reports
something less than what they just got from pg_stop_backup.

Is there value in exposing both pointers?  (Maybe not, it'll just cause
confusion probably.)

Another option is to have pg_current_xlog_location force a write (but
not fsync) as far as the Insert pointer it's about to return.  This
would eliminate any issues about inconsistency between results, but
perhaps there's too much performance penalty.

I'm not necessarily against your suggestion, just trying to be sure
we've thought about all the options.

            regards, tom lane

Re: Forcing current WAL file to be archived

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> Ühel kenal päeval, K, 2006-08-09 kell 12:56, kirjutas Simon Riggs:
>> Methinks it should be the Write pointer all of the time, since I can't
>> think of a valid reason for wanting to know where the Insert pointer is
>> *before* we've written to the xlog file. Having it be the Insert pointer
>> could lead to some errors.

> What is the difference ?

Insert points to the next byte to be written within the internal WAL
buffers.  The byte(s) preceding it haven't necessarily gotten out of
those buffers yet.  Write points to the end of what we've actually
written to the kernel, and there's also a Flush pointer that points
to the end of what we believe is down on disk.

Simon's point is that if you're going to use pg_current_xlog_location()
to control partial shipping of xlog files, you probably want to know
about the Write location, because that indicates the limit of what
is visible to an external process.

            regards, tom lane

Re: Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Wed, 2006-08-09 at 10:04 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Something Hannu wrote has just reminded me that
> > pg_current_xlog_location() returns the current Insert pointer rather
> > than the current Write pointer.
> > That would not be useful for streaming xlog records would it?
>
> Good point.
>
> > Methinks it should be the Write pointer all of the time, since I can't
> > think of a valid reason for wanting to know where the Insert pointer is
> > *before* we've written to the xlog file. Having it be the Insert pointer
> > could lead to some errors.
>
> However the start/stop_backup functions return the Insert pointer.
> I can see scripts getting confused if pg_current_xlog_location reports
> something less than what they just got from pg_stop_backup.
>
> Is there value in exposing both pointers?  (Maybe not, it'll just cause
> confusion probably.)
>
> Another option is to have pg_current_xlog_location force a write (but
> not fsync) as far as the Insert pointer it's about to return.  This
> would eliminate any issues about inconsistency between results, but
> perhaps there's too much performance penalty.
>
> I'm not necessarily against your suggestion, just trying to be sure
> we've thought about all the options.

Hannu's Use Case is to have the function called regularly from an
external polling agent. If we don't do the write it could be possible in
some circumstances for an XLogWrite to be delayed for some time if we
have both large wal_buffers and period of few commits, whereas we might
want to have the writes be fairly regular to support regular streaming.
So I do see there is a reasonable case for performing a write.

The way we have XLogWrite now, I would want that write to be a
"flexible" write, which would leave us in the position that calling
pg_current_xlog_location() would return something logically between the
Insert pointer and the immediately prior Write pointer (even though very
often there would be no difference at all).

I'm inclined to say we should add a flexible write (i.e.
XLogWrite(WriteRqst, true)) to pg_xlogfile_name_offset() and ignore the
esoteric difference between the return value and the Insert pointer.

I'm not clear how pg_xlogfile_name_offset() would ever return a
different answer to pg_stop_backup() - surely we just forcibly moved the
Insert and the Write pointer forwards together, so you'll get the same
answer from each.

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com


Re: Forcing current WAL file to be archived

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Wed, 2006-08-09 at 10:04 -0400, Tom Lane wrote:
>> Another option is to have pg_current_xlog_location force a write (but
>> not fsync) as far as the Insert pointer it's about to return.  This
>> would eliminate any issues about inconsistency between results, but
>> perhaps there's too much performance penalty.

> Hannu's Use Case is to have the function called regularly from an
> external polling agent. If we don't do the write it could be possible in
> some circumstances for an XLogWrite to be delayed for some time if we
> have both large wal_buffers and period of few commits, whereas we might
> want to have the writes be fairly regular to support regular streaming.
> So I do see there is a reasonable case for performing a write.

Now the other side of that coin is that any commit forces a write anyway.
So any data that would hypothetically be written by pg_current_xlog_location
would be uncommitted data, which is maybe not so important to write yet?
Anyway, it's easy enough for a polling program to force a write if it
wants to.

> The way we have XLogWrite now, I would want that write to be a
> "flexible" write, which would leave us in the position that calling
> pg_current_xlog_location() would return something logically between the
> Insert pointer and the immediately prior Write pointer (even though very
> often there would be no difference at all).

I really don't want that; it makes it impossible to define what the
function is actually giving you.  It's not an "esoteric difference".

> I'm not clear how pg_xlogfile_name_offset() would ever return a
> different answer to pg_stop_backup() - surely we just forcibly moved the
> Insert and the Write pointer forwards together, so you'll get the same
> answer from each.

Hmm ... I guess given the just-added behavior of forcing an xlog switch,
that's probably true now, but it wasn't before.

Anyway, after further thought I've concluded that we really should
supply something that returns the Insert pointer, as this would be
useful for debugging and system-monitoring purposes.  It's clear however
that we also need something that returns the Write pointer, as that's
what's needed for partial log-shipping.  So my vote is for two
functions, both read-only (and hence not superuser-only).  Not sure
what to name them exactly.

            regards, tom lane

Re: Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Thu, 2006-08-10 at 08:57 -0400, Tom Lane wrote:

> Anyway, after further thought I've concluded that we really should
> supply something that returns the Insert pointer, as this would be
> useful for debugging and system-monitoring purposes.  It's clear however
> that we also need something that returns the Write pointer, as that's
> what's needed for partial log-shipping.

> So my vote is for two
> functions, both read-only (and hence not superuser-only).

Thats probably the most important consideration.

> Not sure
> what to name them exactly.

pg_current_xlog_location() - gives the write pointer i.e. the offset up
to which you can read() the xlog file and trust what it tells you

pg_current_wal_insert_pointer() - gives the insert pointer :-)

Named sufficiently differently that there is no confusion between them.

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com


Re: [HACKERS] Forcing current WAL file to be archived

From
Jim Nasby
Date:
On Aug 10, 2006, at 7:57 AM, Tom Lane wrote:
> Anyway, after further thought I've concluded that we really should
> supply something that returns the Insert pointer, as this would be
> useful for debugging and system-monitoring purposes.  It's clear
> however
> that we also need something that returns the Write pointer, as that's
> what's needed for partial log-shipping.  So my vote is for two
> functions, both read-only (and hence not superuser-only).  Not sure
> what to name them exactly.

Dumb question... is there any need to be able to get those values in
sync (I'm assuming that in the time taken to call two separate
functions the value on the second function called could change from
what it was when the first function was called)? Should there be a
SRF that returns both values?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461



Re: Forcing current WAL file to be archived

From
Hannu Krosing
Date:
Ühel kenal päeval, K, 2006-08-09 kell 10:57, kirjutas Tom Lane:
> Hannu Krosing <hannu@skype.net> writes:
> > Ühel kenal päeval, K, 2006-08-09 kell 12:56, kirjutas Simon Riggs:
> >> Methinks it should be the Write pointer all of the time, since I can't
> >> think of a valid reason for wanting to know where the Insert pointer is
> >> *before* we've written to the xlog file. Having it be the Insert pointer
> >> could lead to some errors.
>
> > What is the difference ?
>
> Insert points to the next byte to be written within the internal WAL
> buffers.  The byte(s) preceding it haven't necessarily gotten out of
> those buffers yet.  Write points to the end of what we've actually
> written to the kernel,

I assume that it also points to the byte after what is written to
kernel, or is it tha last byte written ?

>  and there's also a Flush pointer that points
> to the end of what we believe is down on disk.
>
> Simon's point is that if you're going to use pg_current_xlog_location()
> to control partial shipping of xlog files, you probably want to know
> about the Write location, because that indicates the limit of what
> is visible to an external process.

Yes, that is what I need

>             regards, tom lane
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: Forcing current WAL file to be archived

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> Ühel kenal päeval, K, 2006-08-09 kell 10:57, kirjutas Tom Lane:
>> Insert points to the next byte to be written within the internal WAL
>> buffers.  The byte(s) preceding it haven't necessarily gotten out of
>> those buffers yet.  Write points to the end of what we've actually
>> written to the kernel,

> I assume that it also points to the byte after what is written to
> kernel, or is it tha last byte written ?

Right, it's really first-unwritten-byte for all three pointers.
The two newly added functions to convert WAL locations to filenames
use XLByteToPrevSeg(), so they should do the right thing here
(see comments in src/include/access/xlog_internal.h).

            regards, tom lane

Re: Forcing current WAL file to be archived

From
Hannu Krosing
Date:
Ühel kenal päeval, L, 2006-08-12 kell 10:59, kirjutas Tom Lane:
> Hannu Krosing <hannu@skype.net> writes:
> > Ühel kenal päeval, K, 2006-08-09 kell 10:57, kirjutas Tom Lane:
> >> Insert points to the next byte to be written within the internal WAL
> >> buffers.  The byte(s) preceding it haven't necessarily gotten out of
> >> those buffers yet.  Write points to the end of what we've actually
> >> written to the kernel,
>
> > I assume that it also points to the byte after what is written to
> > kernel, or is it tha last byte written ?
>
> Right, it's really first-unwritten-byte for all three pointers.
> The two newly added functions to convert WAL locations to filenames
> use XLByteToPrevSeg(), so they should do the right thing here
> (see comments in src/include/access/xlog_internal.h).

How do they behave exactly at the file boundary ?

That is will it point 1 byte past end of old file, or byte 0 of the new
one ?

>             regards, tom lane
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: Forcing current WAL file to be archived

From
Bruce Momjian
Date:
This issue is closed, right?

---------------------------------------------------------------------------

Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Something Hannu wrote has just reminded me that
> > pg_current_xlog_location() returns the current Insert pointer rather
> > than the current Write pointer.
> > That would not be useful for streaming xlog records would it?
>
> Good point.
>
> > Methinks it should be the Write pointer all of the time, since I can't
> > think of a valid reason for wanting to know where the Insert pointer is
> > *before* we've written to the xlog file. Having it be the Insert pointer
> > could lead to some errors.
>
> However the start/stop_backup functions return the Insert pointer.
> I can see scripts getting confused if pg_current_xlog_location reports
> something less than what they just got from pg_stop_backup.
>
> Is there value in exposing both pointers?  (Maybe not, it'll just cause
> confusion probably.)
>
> Another option is to have pg_current_xlog_location force a write (but
> not fsync) as far as the Insert pointer it's about to return.  This
> would eliminate any issues about inconsistency between results, but
> perhaps there's too much performance penalty.
>
> I'm not necessarily against your suggestion, just trying to be sure
> we've thought about all the options.
>
>             regards, tom lane

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Forcing current WAL file to be archived

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> This issue is closed, right?

We've agreed we need two functions, but it's not done yet.  Seems pretty
trivial though ...

            regards, tom lane

Re: [HACKERS] Forcing current WAL file to be archived

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > This issue is closed, right?
>
> We've agreed we need two functions, but it's not done yet.  Seems pretty
> trivial though ...

OK, that's what I was unclear about.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Sun, 2006-08-13 at 22:50 -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > This issue is closed, right?
>
> We've agreed we need two functions, but it's not done yet.  Seems pretty
> trivial though ...

Just back from India. I'll work on this tonight.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Fri, 2006-08-11 at 08:04 +0100, Simon Riggs wrote:
> On Thu, 2006-08-10 at 08:57 -0400, Tom Lane wrote:
>
> > Anyway, after further thought I've concluded that we really should
> > supply something that returns the Insert pointer, as this would be
> > useful for debugging and system-monitoring purposes.  It's clear however
> > that we also need something that returns the Write pointer, as that's
> > what's needed for partial log-shipping.
>
> > So my vote is for two
> > functions, both read-only (and hence not superuser-only).
>
> Thats probably the most important consideration.
>
> > Not sure
> > what to name them exactly.
>
> pg_current_xlog_location() - gives the write pointer i.e. the offset up
> to which you can read() the xlog file and trust what it tells you
>
> pg_current_wal_insert_pointer() - gives the insert pointer :-)
>
> Named sufficiently differently that there is no confusion between them.

Patch implementing the above attached.

Sample execution, with commentary at bottom.

postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A0824                      | 0/3A0824
(1 row)

postgres=# begin;insert into blah values (1);
BEGIN
INSERT 0 1

postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A085C                      | 0/3A0824
(1 row)

postgres=# insert into blah values (1);
INSERT 0 1

postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A0894                      | 0/3A0824
(1 row)

postgres=# commit;
COMMIT

postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A08BC                      | 0/3A08BC
(1 row)

postgres=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/3A091C
(1 row)

postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
 pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
 0/1000020                     | 0/1000000
(1 row)

postgres=# select pg_xlogfile_name_offset(pg_current_xlog_location());
      pg_xlogfile_name_offset
-----------------------------------
 000000010000000000000000 16777216
(1 row)


The above shows that the Insert pointer is always ahead of or the same
as the Write pointer. After a log switch the current location is shown
as being in the next file, though the current filename still shows as
the previous filename (since there has been no write activity yet on the
new file) with an offset of 1 beyond EOF, to indicate that the whole
file may now be read.

pg_switch_xlog() shows the next-to-be written byte in the file that we
have just switched out of, or the current location if we just performed
a log switch. So the following sequence does *not* show there is an
error in the returned pointer values.

postgres=# insert into blah values (1);
INSERT 0 1
postgres=# select pg_xlogfile_name_offset(pg_current_xlog_location());
   pg_xlogfile_name_offset
------------------------------
 000000010000000000000001 372
(1 row)

postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
   pg_xlogfile_name_offset
------------------------------
 000000010000000000000001 400
(1 row)

...a log switch was performed

postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
      pg_xlogfile_name_offset
-----------------------------------
 000000010000000000000001 16777216
(1 row)

...a log switch was *not* performed, since we're already at EOF

I've not taken up Jim Nasby's suggestion to make this an SRF with
multiple return rows/columns since that much complexity isn't justified
IMHO.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Forcing current WAL file to be archived

From
Alvaro Herrera
Date:
Simon Riggs wrote:



> postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
>       pg_xlogfile_name_offset
> -----------------------------------
>  000000010000000000000001 16777216
> (1 row)

> I've not taken up Jim Nasby's suggestion to make this an SRF with
> multiple return rows/columns since that much complexity isn't justified
> IMHO.

Hum, but two columns here seem warranted, don't they?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Tue, 2006-08-15 at 11:10 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
>
>
> > postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
> >       pg_xlogfile_name_offset
> > -----------------------------------
> >  000000010000000000000001 16777216
> > (1 row)
>
> > I've not taken up Jim Nasby's suggestion to make this an SRF with
> > multiple return rows/columns since that much complexity isn't justified
> > IMHO.
>
> Hum, but two columns here seem warranted, don't they?

Maybe. People can write any function they like though, so I'm loathe to
agonize over this too much.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Forcing current WAL file to be archived

From
"Jim C. Nasby"
Date:
On Tue, Aug 15, 2006 at 06:07:12PM +0100, Simon Riggs wrote:
> On Tue, 2006-08-15 at 11:10 -0400, Alvaro Herrera wrote:
> > Simon Riggs wrote:
> >
> >
> >
> > > postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
> > >       pg_xlogfile_name_offset
> > > -----------------------------------
> > >  000000010000000000000001 16777216
> > > (1 row)
> >
> > > I've not taken up Jim Nasby's suggestion to make this an SRF with
> > > multiple return rows/columns since that much complexity isn't justified
> > > IMHO.
> >
> > Hum, but two columns here seem warranted, don't they?
>
> Maybe. People can write any function they like though, so I'm loathe to
> agonize over this too much.

True, but making people parse the output of a function to seperate the
two fields seems pretty silly. Is there some reason why
pg_xlogfile_name_offset shouldn't be a SRF, or use two out parameters?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: [HACKERS] Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Tue, 2006-08-15 at 12:13 -0500, Jim C. Nasby wrote:
> On Tue, Aug 15, 2006 at 06:07:12PM +0100, Simon Riggs wrote:
> > On Tue, 2006-08-15 at 11:10 -0400, Alvaro Herrera wrote:
> > > Simon Riggs wrote:
> > >
> > >
> > >
> > > > postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
> > > >       pg_xlogfile_name_offset
> > > > -----------------------------------
> > > >  000000010000000000000001 16777216
> > > > (1 row)
> > >
> > > > I've not taken up Jim Nasby's suggestion to make this an SRF with
> > > > multiple return rows/columns since that much complexity isn't justified
> > > > IMHO.
> > >
> > > Hum, but two columns here seem warranted, don't they?
> >
> > Maybe. People can write any function they like though, so I'm loathe to
> > agonize over this too much.
>
> True, but making people parse the output of a function to seperate the
> two fields seems pretty silly. Is there some reason why
> pg_xlogfile_name_offset shouldn't be a SRF, or use two out parameters?

If this makes a difference, then I'll do it. Does it make a difference?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Forcing current WAL file to be archived

From
"Jim C. Nasby"
Date:
On Tue, Aug 15, 2006 at 07:11:24PM +0100, Simon Riggs wrote:
> On Tue, 2006-08-15 at 12:13 -0500, Jim C. Nasby wrote:
> > On Tue, Aug 15, 2006 at 06:07:12PM +0100, Simon Riggs wrote:
> > > On Tue, 2006-08-15 at 11:10 -0400, Alvaro Herrera wrote:
> > > > Simon Riggs wrote:
> > > >
> > > >
> > > >
> > > > > postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
> > > > >       pg_xlogfile_name_offset
> > > > > -----------------------------------
> > > > >  000000010000000000000001 16777216
> > > > > (1 row)
> > > >
> > > > > I've not taken up Jim Nasby's suggestion to make this an SRF with
> > > > > multiple return rows/columns since that much complexity isn't justified
> > > > > IMHO.
> > > >
> > > > Hum, but two columns here seem warranted, don't they?
> > >
> > > Maybe. People can write any function they like though, so I'm loathe to
> > > agonize over this too much.
> >
> > True, but making people parse the output of a function to seperate the
> > two fields seems pretty silly. Is there some reason why
> > pg_xlogfile_name_offset shouldn't be a SRF, or use two out parameters?
>
> If this makes a difference, then I'll do it. Does it make a difference?

Well, many languages make it easier to grab data from seperate fields
than to parse out the contents of the field, and even on ones that don't
it's not like it's hard to combine the two fields together like
pg_xlogfile_name_offset() does right now. But more to the point, I can't
see any use case for combining them together... if you want both pieces
of info, you want them for different reasons, so cramming them together
doesn't make any sense to me.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: [HACKERS] Forcing current WAL file to be archived

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> True, but making people parse the output of a function to seperate the
> two fields seems pretty silly. Is there some reason why
> pg_xlogfile_name_offset shouldn't be a SRF, or use two out parameters?

It'd definitely be nicer that way, but given the current limitations of
bootstrap mode I see no non-kluge way to make a built-in function have
OUT parameters.  (Hint: array_in doesn't work in bootstrap mode.)
And the other alternatives like a predefined complex type seem even
more painful.  If you can think of a way to do this that has pain
not out of proportion to the gain, then I'm all for it ...

            regards, tom lane

Re: [HACKERS] Forcing current WAL file to be archived

From
Tom Lane
Date:
I wrote:
> It'd definitely be nicer that way, but given the current limitations of
> bootstrap mode I see no non-kluge way to make a built-in function have
> OUT parameters.  (Hint: array_in doesn't work in bootstrap mode.)

Actually, that turns out not to be so hard to fix as I thought.
array_in only needs to work for the array types used in the core system
tables, and bootstrap.c already has a hard-wired table of that info ...
we only have to make it available to array_in.   Which I just did.

So let's fix pg_xlogfile_name_offset() to have two OUT parameters
instead of returning a smushed-together string.

The reason I knew about the array_in problem was I'd tried to make some
other built-in function have OUT parameters ... I think it was probably
one of the ones that we currently have underneath system views.  It
might be worthwhile converting some or all of these to use OUT
parameters and not need the crutch of an AS clause in the view:

 pg_show_all_settings
 pg_lock_status
 pg_prepared_xact
 pg_stat_file
 pg_prepared_statement
 pg_cursor
 pg_timezonenames

            regards, tom lane

Re: [HACKERS] Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Tue, 2006-08-15 at 18:42 -0400, Tom Lane wrote:
> I wrote:
> > It'd definitely be nicer that way, but given the current limitations of
> > bootstrap mode I see no non-kluge way to make a built-in function have
> > OUT parameters.  (Hint: array_in doesn't work in bootstrap mode.)
>
> Actually, that turns out not to be so hard to fix as I thought.
> array_in only needs to work for the array types used in the core system
> tables, and bootstrap.c already has a hard-wired table of that info ...
> we only have to make it available to array_in.   Which I just did.

Cool; I'd noticed that this would have been the first such function.

> So let's fix pg_xlogfile_name_offset() to have two OUT parameters
> instead of returning a smushed-together string.

I'll do this, but I'm conscious that this is a cosmetic change.

I'm going on vacation very soon now, so test reports of the major
functionality would be greatly appreciated.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Forcing current WAL file to be archived

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2006-08-15 at 18:42 -0400, Tom Lane wrote:
>> So let's fix pg_xlogfile_name_offset() to have two OUT parameters
>> instead of returning a smushed-together string.

> I'll do this, but I'm conscious that this is a cosmetic change.

Well, it's cosmetic, but it's also an API change, which means that this
is our only opportunity to get it right.  Once these functions are in a
release it will be too hard to change them.

            regards, tom lane

Re: [HACKERS] Forcing current WAL file to be archived

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> We want a single row output, with two columns, yes?
> Presumably:
>     xlogfilename    TEXT
>     offset        INTEGER

Sounds right to me.  int4 should be wide enough for practical xlog
segment sizes.

            regards, tom lane

Re: [HACKERS] Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Wed, 2006-08-16 at 08:51 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2006-08-15 at 18:42 -0400, Tom Lane wrote:
> >> So let's fix pg_xlogfile_name_offset() to have two OUT parameters
> >> instead of returning a smushed-together string.
>
> > I'll do this, but I'm conscious that this is a cosmetic change.
>
> Well, it's cosmetic, but it's also an API change, which means that this
> is our only opportunity to get it right.  Once these functions are in a
> release it will be too hard to change them.

I've just started working this part, now I have the rest complete.

We want a single row output, with two columns, yes?
Presumably:
    xlogfilename    TEXT
    offset        INTEGER

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Wed, 2006-08-16 at 11:45 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > We want a single row output, with two columns, yes?
> > Presumably:
> >     xlogfilename    TEXT
> >     offset        INTEGER
>
> Sounds right to me.  int4 should be wide enough for practical xlog
> segment sizes.

Wise one: what should my pg_proc look like?

I'm the lucky man to break the "_null_ _null_ _null_" rule...

I've tried

DATA(insert OID = 2850 ( pg_xlogfile_name_offset    PGNSP PGUID 12 f f t f
i 1 2249 "25" "25 25 23" "i o o" _null_ pg_xlogfile_name_offset -
_null_ ));

but my initdb fails with

selecting default shared_buffers/max_fsm_pages ... 20000kB/1000000
creating configuration files ... ok
creating template1 database in a/base/1 ... FATAL:  cache lookup failed
for type 26
child process exited with exit code 1
initdb: removing data directory "a"

Thinking this might be an 0-referenced array issue, I also tried "24 24
22" in the above, but that bombs with the same error.

Currently, if I just leave it as it is, then initdb runs but then
hangs/bombs when you invokle the function (as you might expect).

As far as I can tell, the function isn't ever called correctly without
this... copied here for info.

/*
 * Compute an xlog file name and decimal byte offset given a WAL
location,
 * such as is returned by pg_stop_backup() or pg_xlog_switch().
 *
 * Note that a location exactly at a segment boundary is taken to be in
 * the previous segment.  This is usually the right thing, since the
 * expected usage is to determine which xlog file(s) are ready to
archive.
 */
Datum
pg_xlogfile_name_offset(PG_FUNCTION_ARGS)
{
    text       *location = PG_GETARG_TEXT_P(0);
    char       *locationstr;
    unsigned int uxlogid;
    unsigned int uxrecoff;
    uint32        xlogid;
    uint32        xlogseg;
    uint32        xrecoff;
    XLogRecPtr    locationpoint;
    char        xlogfilename[MAXFNAMELEN];
    TupleDesc   returnTupleDesc;
    Datum       values[2];
    bool        isnull[2];
    HeapTuple   returnHeapTuple;
    Datum        result;

    /*
     * Read input and parse
     */
    locationstr = DatumGetCString(DirectFunctionCall1(textout,
                                                PointerGetDatum(location)));

    if (sscanf(locationstr, "%X/%X", &uxlogid, &uxrecoff) != 2)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("could not parse xlog location \"%s\"",
                        locationstr)));

    locationpoint.xlogid = uxlogid;
    locationpoint.xrecoff = uxrecoff;

    /* Construct a tuple descriptor for the result rows. */
    returnTupleDesc = CreateTemplateTupleDesc(2, false);
    TupleDescInitEntry(returnTupleDesc, (AttrNumber) 1, "xlogfilename",
                       TEXTOID, -1, 0);
    TupleDescInitEntry(returnTupleDesc, (AttrNumber) 2, "offset",
                       INT4OID, -1, 0);

    returnTupleDesc = BlessTupleDesc(returnTupleDesc);

    /*
     * xlogfilename
     */
    XLByteToPrevSeg(locationpoint, xlogid, xlogseg);

    XLogFileName(xlogfilename, ThisTimeLineID, xlogid, xlogseg);

    values[0] = PointerGetDatum(xlogfilename);
    isnull[0] = false;

    /*
     * offset
     */
    xrecoff = locationpoint.xrecoff - xlogseg * XLogSegSize;

    values[1] = UInt32GetDatum(xrecoff);
    isnull[1] = false;

    /*
     * Tuple jam: Having first prepared your Datums, then squash
together
     */
    returnHeapTuple = heap_form_tuple(returnTupleDesc, values, isnull);

    result = HeapTupleGetDatum(returnHeapTuple);

    PG_RETURN_DATUM(result);
}

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Forcing current WAL file to be archived

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> but my initdb fails with

> creating template1 database in a/base/1 ... FATAL:  cache lookup failed
> for type 26

Um ... when did you last "cvs update"?  That was the behavior up till I
fixed array_in for bootstrap mode, yesterday afternoon ...

            regards, tom lane

Re: [HACKERS] Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Wed, 2006-08-16 at 16:51 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > but my initdb fails with
>
> > creating template1 database in a/base/1 ... FATAL:  cache lookup failed
> > for type 26
>
> Um ... when did you last "cvs update"?  That was the behavior up till I
> fixed array_in for bootstrap mode, yesterday afternoon ...

Sounds like it must be so.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Forcing current WAL file to be archived

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Wise one: what should my pg_proc look like?

> DATA(insert OID = 2850 ( pg_xlogfile_name_offset    PGNSP PGUID 12 f f t f
> i 1 2249 "25" "25 25 23" "i o o" _null_ pg_xlogfile_name_offset -
> _null_ ));

Oh, as far as that goes, the array columns need to look like something
array_in will eat; and you should provide parameter names so that
"select * from" will produce useful headings.  So probably more like

DATA(insert OID = 2850 ( pg_xlogfile_name_offset    PGNSP PGUID 12 f f t f i 1 2249 "25" "{25,25,23}" "{i,o,o}"
"{wal_offset,filename,offset}"pg_xlogfile_name_offset - _null_ )); 

I think you can get away without inner quotes (ie, not "{'i','o','o'}")
as long as you aren't using anything weird like spaces in a parameter name.

            regards, tom lane

Re: [HACKERS] Forcing current WAL file to be archived

From
Simon Riggs
Date:
On Wed, 2006-08-16 at 17:09 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Wise one: what should my pg_proc look like?
>
> > DATA(insert OID = 2850 ( pg_xlogfile_name_offset    PGNSP PGUID 12 f f t f
> > i 1 2249 "25" "25 25 23" "i o o" _null_ pg_xlogfile_name_offset -
> > _null_ ));
>
> Oh, as far as that goes, the array columns need to look like something
> array_in will eat; and you should provide parameter names so that
> "select * from" will produce useful headings.  So probably more like
>
> DATA(insert OID = 2850 ( pg_xlogfile_name_offset    PGNSP PGUID 12 f f t f i 1 2249 "25" "{25,25,23}" "{i,o,o}"
"{wal_offset,filename,offset}"pg_xlogfile_name_offset - _null_ )); 
>
> I think you can get away without inner quotes (ie, not "{'i','o','o'}")
> as long as you aren't using anything weird like spaces in a parameter name.

archive_timeout++.patch re-submitted on other thread, now including
these changes also.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com