Thread: Documentation for initdb option --waldir
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/17/app-initdb.html Description: Hi all, During troubleshooting with a customer we found that it might help to know that initdb will create a symbolic link from data/pg_wal to the custom directory with the option --waldir. https://www.postgresql.org/docs/current/app-initdb.html For instance: This option specifies the directory where the write-ahead log should be stored. A symbolic link is created from pg_wal in the main data directory to the specified location. (Like in this documentation https://pgpedia.info/p/pg_wal.html, but there is a typo or missing word in the initdb section: "A symbolic link from pg_wal in the main data directory to the specified location.") And maybe also here a hint, that moving the pg_wal directory to another location and creating a symbolic is what initdb --waldir does. https://www.postgresql.org/docs/current/wal-internals.html For instance: It is advantageous if the WAL is located on a different disk from the main database files. This can be achieved by moving the pg_wal directory to another location (while the server is shut down, of course) and creating a symbolic link from the original location in the main data directory to the new location. This is also what the --waldir initdb option does. Thank you very much and kind regards Theo
On Wed, Mar 26, 2025 at 2:35 AM PG Doc comments form <noreply@postgresql.org> wrote:
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/17/app-initdb.html
Description:
Hi all,
During troubleshooting with a customer we found that it might help to know
that initdb will create a symbolic link from data/pg_wal to the custom
directory with the option --waldir.
Thank you for the observation. This indeed seems like something we should cover in a bit more detail. How does the proposed patch (attached) look?
The main initdb change, added to Notes, is:
The pg_wal subdirectory will always exist within the data directory. This is where PostgreSQL sends its write-ahead log (WAL) files. Specifying the --waldir option turns this subdirectory entry into a symbolic link. In general, this is only useful if the remote location is on a different physical device. An existing directory must be empty and should not be a mount point. The directory will be created (including missing parents) if necessary.
David J.
Attachment
On Wed, 2025-03-26 at 17:34 -0700, David G. Johnston wrote: > + <para> > + The <filename>pg_wal</filename> subdirectory will always exist within the > + data directory. This is where <productname>PostgreSQL</productname> > + sends its write-ahead log (<acronym>WAL</acronym>) files. > + Specifying the <option>--waldir</option> option turns this subdirectory entry > + into a symbolic link. In general, this is only useful if the remote location > + is on a different physical device. An existing directory must be empty and > + should not be a mount point. The directory will be created > + (including missing parents) if necessary. > + </para> I think that it is very valuable to have WAL on a different file system on the same storage device. The idea is that growing data files cannot exhaust the space available for WAL. How about this: There is always a <filename>pg_wal</filename> within the data directory. By default, it is a directory where <productname>PostgreSQL</productname> places its write-ahead log (<acronym>WAL</acronym>) segment files. If you create the <acronym>WAL</acronym> location somewhere else using the option <option>--waldir</option>, <filename>pg_wal</filename> will be created as a symbolic link pointing to that <acronym>WAL</acronym> location. If the directory already exists, it must be empty and should not be a mount point. The directory will be created (including missing parents) if necessary. Yours, Laurenz Albe
Both look great to me! I thought the advantage of having different devices is also to easy I/O load. But I guess you haveto decide for yourself what is the best strategy. For me it was important to know what initdb does with ---waldir. Andthis is nicely explained here. One more little thing, I stumbled across " should not be a mount point", sounds thirst like it was not good to use the mountpoint. Maybe: " should not be the mount point itself but a directory in the mount point". It might be only me, you diced. Thank you very much! Regards, Theo -----Ursprüngliche Nachricht----- Von: Laurenz Albe <laurenz.albe@cybertec.at> Gesendet: Donnerstag, 27. März 2025 11:32 An: David G. Johnston <david.g.johnston@gmail.com>; Theodor Herlo <t.herlo@proventa.de>; pgsql-docs@lists.postgresql.org Betreff: Re: Documentation for initdb option --waldir [Sie erhalten nicht häufig E-Mails von laurenz.albe@cybertec.at. Weitere Informationen, warum dies wichtig ist, finden Sieunter https://aka.ms/LearnAboutSenderIdentification ] On Wed, 2025-03-26 at 17:34 -0700, David G. Johnston wrote: > + <para> > + The <filename>pg_wal</filename> subdirectory will always exist within the > + data directory. This is where <productname>PostgreSQL</productname> > + sends its write-ahead log (<acronym>WAL</acronym>) files. > + Specifying the <option>--waldir</option> option turns this subdirectory entry > + into a symbolic link. In general, this is only useful if the remote location > + is on a different physical device. An existing directory must be empty and > + should not be a mount point. The directory will be created > + (including missing parents) if necessary. > + </para> I think that it is very valuable to have WAL on a different file system on the same storage device. The idea is that growingdata files cannot exhaust the space available for WAL. How about this: There is always a <filename>pg_wal</filename> within the data directory. By default, it is a directory where <productname>PostgreSQL</productname> places its write-ahead log (<acronym>WAL</acronym>) segment files. If you create the <acronym>WAL</acronym> location somewhere else using the option <option>--waldir</option>, <filename>pg_wal</filename> will be created as a symbolic link pointing to that <acronym>WAL</acronym> location. If the directory already exists, it must be empty and should not be a mount point. The directory will be created (including missing parents) if necessary. Yours, Laurenz Albe
On Thu, Mar 27, 2025, at 8:40 AM, Theodor Herlo wrote:
Both look great to me! I thought the advantage of having different devices is also to easy I/O load. But I guess you have to decide for yourself what is the best strategy. For me it was important to know what initdb does with ---waldir. And this is nicely explained here.One more little thing, I stumbled across " should not be a mount point", sounds thirst like it was not good to use the mount point. Maybe: " should not be the mount point itself but a directory in the mount point". It might be only me, you diced.
Laurenz suggestion looks good to me with some suggestions. I agree with
Theodor's addition but I would use the same sentence from the source code that
says 'subdirectory under the mount point'. I would also consider and/or modify
pg_basebackup documentation that has a similar option. My suggestion is that
waldir option in both tools has similar sentences. The term 'WAL location' is
not appropriate because this terminology is already used to indicate the LSN. I
would replace it with WAL directory.
--
Euler Taveira
Version 2 Attached
There being two places now (plus doing it manually) I decided to write this material in the WAL chapter as opposed to application-specific Notes. A new section seemed warranted.
I expanded upon the material regarding using different file systems and disks.
I would like to add a similar "why" to the mount point recommendation but don't know what that would be. Suggestions welcomed.
David J.
Attachment
On Thu, Mar 27, 2025 at 12:06 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > Version 2 Attached > > - This option specifies the directory where the write-ahead log > - should be stored. > + This option specifies the directory in which to store write-ahead log files. > + See <xref linkend="wal-internals-relocation"/> for more information. To be honest, I think the proposed text is a little verbose, and we could cut it right here. As an example, we don't go into how the directory must be empty or chnmod settings for --pgdata. I'd also edit the pg_basebackup changes to match if it were me. > + the <option>--waldir</option> option. On an existing, but not running, > + cluster use operating system commands to move the contents of the > + <filename>pg_wal</filename> directory to the new location, remove the > + empty directory, and create the symbolic link named <filename>pg_wal</filename> > + pointing to the new location. I found this to be awkward to read, and not as significant a warning as the existing text... perhaps "For an existing cluster, once the cluster has been shut down, you can use operating system commands... " > There being two places now (plus doing it manually) I decided to write this material in the WAL chapter as opposed to application-specificNotes. A new section seemed warranted. > +1 > I expanded upon the material regarding using different file systems and disks. > > I would like to add a similar "why" to the mount point recommendation but don't know what that would be. Suggestions welcomed. > I'm not sure I follow what you are asking for... but a non-performance reason to use a seperate mount point for pg_wal, even if the underlying storage is the same, would be for something like using filesystem snapshots to grab contents of the data directory without grabbing wal (which can be handled separately). Robert Treat https://xzilla.net
On Thu, Mar 27, 2025 at 2:28 PM Robert Treat <rob@xzilla.net> wrote:
On Thu, Mar 27, 2025 at 12:06 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> Version 2 Attached
>
> - This option specifies the directory where the write-ahead log
> - should be stored.
> + This option specifies the directory in which to store write-ahead log files.
> + See <xref linkend="wal-internals-relocation"/> for more information.
To be honest, I think the proposed text is a little verbose, and we
could cut it right here. As an example, we don't go into how the
directory must be empty or chnmod settings for --pgdata. I'd also edit
the pg_basebackup changes to match if it were me.
Don't really see how completeness hurts here, but also not married to it.
> + the <option>--waldir</option> option. On an existing, but not running,
> + cluster use operating system commands to move the contents of the
> + <filename>pg_wal</filename> directory to the new location, remove the
> + empty directory, and create the symbolic link named <filename>pg_wal</filename>
> + pointing to the new location.
I found this to be awkward to read, and not as significant a warning
as the existing text... perhaps "For an existing cluster, once the
cluster has been shut down, you can use operating system commands... "
I'll probably put it back into parentheses to get the desired emphasis. That did nag at me too as I wrote it.
> There being two places now (plus doing it manually) I decided to write this material in the WAL chapter as opposed to application-specific Notes. A new section seemed warranted.
>
+1
> I expanded upon the material regarding using different file systems and disks.
>
> I would like to add a similar "why" to the mount point recommendation but don't know what that would be. Suggestions welcomed.
>
I'm not sure I follow what you are asking for... but a non-performance
reason to use a seperate mount point for pg_wal, even if the
underlying storage is the same, would be for something like using
filesystem snapshots to grab contents of the data directory without
grabbing wal (which can be handled separately).
If I mount the filesystem on disk2 to: /mnt/disk2
Why do I need to create "/mnt/disk2/wal_files/" and point there instead of: "/mnt/disk2/"?
David J.
On Thu, Mar 27, 2025 at 5:57 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > On Thu, Mar 27, 2025 at 2:28 PM Robert Treat <rob@xzilla.net> wrote: >> On Thu, Mar 27, 2025 at 12:06 PM David G. Johnston >> <david.g.johnston@gmail.com> wrote: >> > I expanded upon the material regarding using different file systems and disks. >> > >> > I would like to add a similar "why" to the mount point recommendation but don't know what that would be. Suggestionswelcomed. >> > >> >> I'm not sure I follow what you are asking for... but a non-performance >> reason to use a seperate mount point for pg_wal, even if the >> underlying storage is the same, would be for something like using >> filesystem snapshots to grab contents of the data directory without >> grabbing wal (which can be handled separately). >> > > If I mount the filesystem on disk2 to: /mnt/disk2 > Why do I need to create "/mnt/disk2/wal_files/" and point there instead of: "/mnt/disk2/"? > My immediate response to this was "because Postgres won't let you" which seemed unhelpful, and that I couldn't remember why was pretty unsatisfying, so I dug around in the source which was unhelpful but eventually came across this from https://www.postgresql.org/docs/devel/creating-cluster.html#CREATING-CLUSTER-MOUNT-POINTS "Best practice is to create a directory within the mount-point directory that is owned by the PostgreSQL user, and then create the data directory within that. This avoids permissions problems,..." Which I do remember having tried to do it directly and the OS complaining that my mount point wasn't owned by root and/or Postgres complaining that the xlog dir wasn't owned by Postgres, so I think this advice probably still holds. Robert Treat https://xzilla.net
On Saturday, March 29, 2025, Robert Treat <rob@xzilla.net> wrote:
On Thu, Mar 27, 2025 at 5:57 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Thu, Mar 27, 2025 at 2:28 PM Robert Treat <rob@xzilla.net> wrote:
>> On Thu, Mar 27, 2025 at 12:06 PM David G. Johnston
>> <david.g.johnston@gmail.com> wrote:
>> > I expanded upon the material regarding using different file systems and disks.
>> >
>> > I would like to add a similar "why" to the mount point recommendation but don't know what that would be. Suggestions welcomed.
>> >
>>
>
> If I mount the filesystem on disk2 to: /mnt/disk2
> Why do I need to create "/mnt/disk2/wal_files/" and point there instead of: "/mnt/disk2/"?
>
eventually came across this from
https://www.postgresql.org/docs/devel/creating-cluster. html#CREATING-CLUSTER-MOUNT- POINTS
"Best practice is to create a directory within the mount-point
directory that is owned by the PostgreSQL user, and then create the
data directory within that. This avoids permissions problems,..."
Which I do remember having tried to do it directly and the OS
complaining that my mount point wasn't owned by root and/or Postgres
complaining that the xlog dir wasn't owned by Postgres, so I think
this advice probably still holds.
Thank you, I can definitely work that in and it makes sense.
On the topic of verbosity, I found the wording for —pgdata in pg_basebackup to support the more complete description.
I believe that at least documenting external side-effects should be required. I’m less convinced that pre-conditions that will be checked by the application need to be listed. But for now I’m going to copy pg_basebackup as my example and at some point might get to doing a survey and proposing a new standard wording for —waldir and —pgdata descriptions regarding creation.
The comment regarding absolute paths will remain unwritten.
David J.
On Sat, Mar 29, 2025 at 10:58 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Saturday, March 29, 2025, Robert Treat <rob@xzilla.net> wrote: >> >> On Thu, Mar 27, 2025 at 5:57 PM David G. Johnston >> <david.g.johnston@gmail.com> wrote: >> > On Thu, Mar 27, 2025 at 2:28 PM Robert Treat <rob@xzilla.net> wrote: >> >> On Thu, Mar 27, 2025 at 12:06 PM David G. Johnston >> >> <david.g.johnston@gmail.com> wrote: >> >> > I expanded upon the material regarding using different file systems and disks. >> >> > >> >> > I would like to add a similar "why" to the mount point recommendation but don't know what that would be. Suggestionswelcomed. >> >> > >> >> >> > >> > If I mount the filesystem on disk2 to: /mnt/disk2 >> > Why do I need to create "/mnt/disk2/wal_files/" and point there instead of: "/mnt/disk2/"? >> > >> >> eventually came across this from >> https://www.postgresql.org/docs/devel/creating-cluster.html#CREATING-CLUSTER-MOUNT-POINTS >> >> "Best practice is to create a directory within the mount-point >> directory that is owned by the PostgreSQL user, and then create the >> data directory within that. This avoids permissions problems,..." >> >> Which I do remember having tried to do it directly and the OS >> complaining that my mount point wasn't owned by root and/or Postgres >> complaining that the xlog dir wasn't owned by Postgres, so I think >> this advice probably still holds. >> > > Thank you, I can definitely work that in and it makes sense. > > On the topic of verbosity, I found the wording for —pgdata in pg_basebackup to support the more complete description. > > https://www.postgresql.org/docs/current/app-pgbasebackup.html > > I believe that at least documenting external side-effects should be required. I’m less convinced that pre-conditions thatwill be checked by the application need to be listed. But for now I’m going to copy pg_basebackup as my example andat some point might get to doing a survey and proposing a new standard wording for —waldir and —pgdata descriptions regardingcreation. > > The comment regarding absolute paths will remain unwritten. > Works for me. Robert Treat https://xzilla.net
On Sat Mar 29, 2025 at 7:26 AM PDT, Robert Treat wrote: > On Thu, Mar 27, 2025 at 5:57 PM David G. Johnston > <david.g.johnston@gmail.com> wrote: >> On Thu, Mar 27, 2025 at 2:28 PM Robert Treat <rob@xzilla.net> wrote: >>> On Thu, Mar 27, 2025 at 12:06 PM David G. Johnston >>> <david.g.johnston@gmail.com> wrote: >>> > I expanded upon the material regarding using different file systems and disks. >>> > >>> > I would like to add a similar "why" to the mount point recommendation but don't know what that would be. Suggestionswelcomed. >>> > >>> >>> I'm not sure I follow what you are asking for... but a non-performance >>> reason to use a seperate mount point for pg_wal, even if the >>> underlying storage is the same, would be for something like using >>> filesystem snapshots to grab contents of the data directory without >>> grabbing wal (which can be handled separately). >>> >> >> If I mount the filesystem on disk2 to: /mnt/disk2 >> Why do I need to create "/mnt/disk2/wal_files/" and point there instead of: "/mnt/disk2/"? >> > > My immediate response to this was "because Postgres won't let you" > which seemed unhelpful, and that I couldn't remember why was pretty > unsatisfying, so I dug around in the source which was unhelpful but > eventually came across this from > https://www.postgresql.org/docs/devel/creating-cluster.html#CREATING-CLUSTER-MOUNT-POINTS > > "Best practice is to create a directory within the mount-point > directory that is owned by the PostgreSQL user, and then create the > data directory within that. This avoids permissions problems,..." > > Which I do remember having tried to do it directly and the OS > complaining that my mount point wasn't owned by root and/or Postgres > complaining that the xlog dir wasn't owned by Postgres, so I think > this advice probably still holds. I remembered an old advice related to lost+found being placed on mount points, but couldn't remember the specific, so searched and found Tom's advice for data directory. I feel that the advice would apply to WAL directory, as well. From https://www.postgresql.org/message-id/12168.1312921709%40sss.pgh.pa.us A couple of reasons for [not placing data diretory at a mount point] are: 1. Mount-point directories should be owned by root, never by an unprivileged account such as postgres. IIRC there are good security reasons for this practice, though I don't recall all the details right now. 2. Keeping the data directory one level down ensures a clean failure if the disk is for some reason not mounted when Postgres starts, or goes offline later. Otherwise, particularly if you're using a start script that will automatically try an initdb, you might end up with some data files on the / volume underneath where the mount point should have been. This is sure to lead to serious problems when the disk does come back online. There's at least one horror story in our archives from someone who had an auto-initdb startup script and one day his NFS disk was a few seconds slow to mount... Best regards, Gurjeet http://Gurje.et
On Sat, 2025-03-29 at 10:26 -0400, Robert Treat wrote: > "Best practice is to create a directory within the mount-point > directory that is owned by the PostgreSQL user, and then create the > data directory within that. This avoids permissions problems,..." > > Which I do remember having tried to do it directly and the OS > complaining that my mount point wasn't owned by root and/or Postgres > complaining that the xlog dir wasn't owned by Postgres, so I think > this advice probably still holds. The root directory of a file system, which will be mounted at the mount point, should be owned by root. As far as I know, the reason is that it contains a "lost+found" directory, which is used by file system checks to put orphaned files. If a non-root user owned the mount point, the user could remove that directory, which would be a bad idea. On the other hand, PostgreSQL will protest if the directory isn't empty... Yours, Laurenz Albe
On Sat Mar 29, 2025 at 9:26 AM PDT, Laurenz Albe wrote: > On the other hand, PostgreSQL will protest if the directory isn't > empty... Specifically, initdb will complain if the directory it's trying to initialize is not empty. Best regards, Gurjeet http://Gurje.et
On Saturday, March 29, 2025, Gurjeet Singh <gurjeet@singh.im> wrote:
On Sat Mar 29, 2025 at 9:26 AM PDT, Laurenz Albe wrote:
> On the other hand, PostgreSQL will protest if the directory isn't
> empty...
Specifically, initdb will complain if the directory it's trying to
initialize is not empty.
For this patch I intend to just point the reader to the “secondary file systems” section mentioned above for more considerations. We can improve that as desired separately.
David J.