Thread: What does "[backends] should seldom or never need to wait for a write to occur" mean?

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/13/runtime-config-resource.html
Description:

https://www.postgresql.org/docs/13/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-BACKGROUND-WRITER

says:

"There is a separate server process called the background writer, whose
function is to issue writes of “dirty” (new or modified) shared buffers. It
writes shared buffers so server processes handling user queries seldom or
never need to wait for a write to occur."

It's not clear what "wait for a write to occur" means: a write() syscall or
an fsync() syscall?

Re: What does "[backends] should seldom or never need to wait for a write to occur" mean?

From
"David G. Johnston"
Date:
On Thu, Oct 29, 2020 at 3:24 PM PG Doc comments form <noreply@postgresql.org> wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/13/runtime-config-resource.html
> Description:
>
> https://www.postgresql.org/docs/13/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-BACKGROUND-WRITER
>
> says:
>
> "There is a separate server process called the background writer, whose
> function is to issue writes of “dirty” (new or modified) shared buffers. It
> writes shared buffers so server processes handling user queries seldom or
> never need to wait for a write to occur."
>
> It's not clear what "wait for a write to occur" means: a write() syscall or
> an fsync() syscall?

Probably neither...think more abstract/general.

David J.

On Fri, Oct 30, 2020 at 11:24 AM PG Doc comments form
<noreply@postgresql.org> wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/13/runtime-config-resource.html
> Description:
>
> https://www.postgresql.org/docs/13/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-BACKGROUND-WRITER
>
> says:
>
> "There is a separate server process called the background writer, whose
> function is to issue writes of “dirty” (new or modified) shared buffers. It
> writes shared buffers so server processes handling user queries seldom or
> never need to wait for a write to occur."
>
> It's not clear what "wait for a write to occur" means: a write() syscall or
> an fsync() syscall?

It means pwrite().  That could block if your kernel cache is swamped,
but hopefully it just copies the data into the kernel and returns.
There is an fsync() call, but it's usually queued up for handling by
the checkpointer process some time later.
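
To make the distinction above concrete, here is a minimal sketch of the two steps as separate calls. It is only an illustration, not PostgreSQL's code: the file name, the helper names and the 8192-byte block size are assumptions made for the example. The point is that pwrite() normally returns once the data has been copied into the kernel, while fsync() is the separate, later step that waits for stable storage.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size, chosen only for this illustration


def write_block(fd: int, block_no: int, data: bytes) -> int:
    """Copy one block into the kernel page cache at its file offset.

    pwrite() normally returns as soon as the kernel has the data; it does
    not wait for the data to reach stable storage.
    """
    return os.pwrite(fd, data, block_no * BLOCK_SIZE)


def flush_file(fd: int) -> None:
    """Ask the kernel to push everything written so far to stable storage."""
    os.fsync(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "relation_segment")  # hypothetical file name
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Step 1: the write itself, usually cheap.
        written = write_block(fd, 2, b"x" * BLOCK_SIZE)
        # Step 2: the flush, which the thread says is usually queued for
        # the checkpointer rather than done by the writing process.
        flush_file(fd)
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Here the two steps run back to back in one process only to keep the example short; per the reply above, the flush is usually handled by a different process some time later.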



Hi all,

Thanks Thomas.

When the bgwriter flushes (cleans) a dirty Postgres buffer, it issues a write() syscall of its own, which I think must increase the number of dirty cache buffers in the Linux kernel (temporarily, until the kernel actually flushes those cache buffers to disk). It therefore temporarily increases the risk of a write stall (in any process, not just Postgres backends). Is that correct?

I suppose that if dirty buffers are being cleaned regularly, then it reduces the risk that (1) a Postgres backend which is writing (dirtying buffers) suddenly needs an empty buffer when there are no clean buffers to evict, so it needs to flush a dirty one, and (2) the resulting write() syscall takes the kernel over its background dirty limit, so the kernel must flush it immediately and make the backend wait. By that mechanism I can see that it might reduce the chance of backends having to wait, but by writing more in general (as above) it could also increase it.

So when it says "It writes shared buffers so server processes handling user queries seldom or never need to wait for a write to occur", is that really justified, or is that sentence incorrect and we should remove it? Or have I missed something?

Thanks, Chris.

Hi all,

I did some more research and found this explanation in a presentation by 2ndQuadrant:

When a process wants a buffer, it asks BufferAlloc for the file/block. If the block is already cached, it gets pinned and then returned. Otherwise, a new buffer must be found to hold this data. If there are no buffers free (there usually aren’t) BufferAlloc selects a buffer to evict to make space for the new one. If that page is dirty, it is written out to disk. This can cause the backend trying to allocate that buffer to block as it waits for that write I/O to complete.

So it seems that both reads and writes can potentially have to wait for I/O. And the bgwriter reduces the risk of hitting a dirty page and needing to write it before evicting.

So perhaps the documentation should say:

"There is a separate server process called the background writer, whose function is to issue writes of “dirty” (new or modified) shared buffers. This reduces the chances that a backend needing an empty buffer must write a dirty one back to disk before evicting it."

Thanks, Chris.
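
The eviction path described in the quoted presentation can be sketched with a toy model. This is not PostgreSQL's algorithm: the class and method names are hypothetical, victims are chosen round-robin rather than by the real replacement strategy, and a "write" is just a counter. It only illustrates why cleaning buffers ahead of the victim pointer means a backend seldom has to write a dirty buffer itself.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    block: int
    dirty: bool = False


class ToyBufferPool:
    """Hypothetical, heavily simplified pool; eviction is plain round-robin."""

    def __init__(self, size: int) -> None:
        self.buffers = [Buffer(block=-1) for _ in range(size)]
        self.next_victim = 0
        self.backend_writes = 0   # writes a backend had to do itself
        self.bgwriter_writes = 0  # writes done ahead of time

    def alloc(self, block: int, will_modify: bool) -> None:
        """Backend path: pick a victim and write it out first if it is dirty."""
        victim = self.buffers[self.next_victim]
        self.next_victim = (self.next_victim + 1) % len(self.buffers)
        if victim.dirty:
            self.backend_writes += 1  # the backend waits for this write
        victim.block = block
        victim.dirty = will_modify

    def bgwriter_round(self, max_pages: int) -> None:
        """Background path: clean up to max_pages dirty buffers,
        scanning forward from the next eviction candidate."""
        cleaned = 0
        for i in range(len(self.buffers)):
            if cleaned >= max_pages:
                break
            buf = self.buffers[(self.next_victim + i) % len(self.buffers)]
            if buf.dirty:
                buf.dirty = False
                self.bgwriter_writes += 1
                cleaned += 1


def run(with_bgwriter: bool) -> ToyBufferPool:
    """Dirty 64 distinct blocks through an 8-buffer pool."""
    pool = ToyBufferPool(size=8)
    for block in range(64):
        pool.alloc(block, will_modify=True)
        if with_bgwriter:
            pool.bgwriter_round(max_pages=2)
    return pool


without = run(with_bgwriter=False)
with_bg = run(with_bgwriter=True)
print(without.backend_writes, with_bg.backend_writes, with_bg.bgwriter_writes)
```

In this toy run, every allocation after the pool first fills hits a dirty victim when nothing cleans in the background, while with the background rounds the backend never does. Real workloads will fall somewhere in between.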

On Tue, Nov  3, 2020 at 06:11:21PM +0000, Chris Wilson wrote:
> Hi all,
> 
> I did some more research and found this explanation in a presentation by
> 2ndQuadrant:
> 
> 
>     When a process wants a buffer, it asks BufferAlloc for the file/block. If
>     the block is already cached, it gets pinned and then returned. Otherwise, a
>     new buffer must be found to hold this data. If there are no buffers free
>     (there usually aren’t) BufferAlloc selects a buffer to evict to make space
>     for the new one. If that page is dirty, it is written out to disk. This can
>     cause the backend trying to allocate that buffer to block as it waits for
>     that write I/O to complete.
> 
> 
> So it seems that both reads and writes can potentially have to wait for I/O.
> And the bgwriter reduces the risk of hitting a dirty page and needing to write
> it before evicting.
> 
> So perhaps the documentation should say:
> 
> "There is a separate server process called the background writer, whose
> function is to issue writes of “dirty” (new or modified) shared buffers.
> This reduces the chances that a backend needing an empty buffer must write a
> dirty one back to disk before evicting it."

I think this would be a step backward.  The point is to say that writes
rarely happen in the foreground, not to explain when writes do happen. 
With your wording, there could be other cases where writes happen in the
foreground, and the point is they rarely happen.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




On Mon, Nov  9, 2020 at 08:36:32PM -0500, Bruce Momjian wrote:
> On Tue, Nov  3, 2020 at 06:11:21PM +0000, Chris Wilson wrote:
> > Hi all,
> > 
> > I did some more research and found this explanation in a presentation by
> > 2ndQuadrant:
> > 
> > 
> >     When a process wants a buffer, it asks BufferAlloc for the file/block. If
> >     the block is already cached, it gets pinned and then returned. Otherwise, a
> >     new buffer must be found to hold this data. If there are no buffers free
> >     (there usually aren’t) BufferAlloc selects a buffer to evict to make space
> >     for the new one. If that page is dirty, it is written out to disk. This can
> >     cause the backend trying to allocate that buffer to block as it waits for
> >     that write I/O to complete.
> > 
> > 
> > So it seems that both reads and writes can potentially have to wait for I/O.
> > And the bgwriter reduces the risk of hitting a dirty page and needing to write
> > it before evicting.
> > 
> > So perhaps the documentation should say:
> > 
> > "There is a separate server process called the background writer, whose
> > function is to issue writes of “dirty” (new or modified) shared buffers.
> > This reduces the chances that a backend needing an empty buffer must write a
> > dirty one back to disk before evicting it."
> 
> I think this would be a step backward.  The point is to say that writes
> rarely happen in the foreground, not to explain when writes do happen. 
> With your wording, there could be other cases where writes happen in the
> foreground, and the point is they rarely happen.

I thought some more about this, and it seems the problem really is that
"wait for a write" is unclear, as you said.  This patch fixes it by
referencing "wait for such writes".

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment
Hi Bruce,

Thanks, but I think it's more ambiguous than that. I was trying to discover how the bgwriter works in order to tune it successfully (to identify the correct tuning objectives). It's not documented anywhere else in the official docs that I can find, so this is the canonical place to learn about it. Quoting again for context:

"There is a separate server process called the background writer, whose function is to issue writes of “dirty” (new or modified) shared buffers. It writes shared buffers so server processes handling user queries seldom or never need to wait for a write to occur."

These sentences are (I think) supposed to explain why we have a bgwriter at all, and how it works (why it does what it does) but they fail miserably due to being unclear and lacking vital information.

The sentence as it stands is ambiguous because it says "need to wait for a write to occur". The ambiguities are:

  • "need to wait", i.e. not just that a write will occur, but that it will be slow.
  • This could also be interpreted conditionally, as in "if the backend needs to write, then it will be slow."
  • "write to occur": who will do the writing? Does the backend need to wait for the bgwriter or someone else to write back the page?

So there are at least four possible readings of this (of what will happen if the bgwriter is not working well), only one of which is correct:

  • backends must do the write() themselves (increasing buffers_backend; I think this is the correct interpretation).
  • backends must do the fsync() themselves (i.e. wait for the bytes to hit the disk, increasing buffers_backend_fsync).
  • if backends must write, then the writes will be slow (we know that this can happen, because the next sentence says that the bgwriter increases net overall I/O load, but we don't measure write stalls in Postgres itself).
  • backends must wait for another process to do the write (this doesn't actually happen, so of course there are no stats for it in Postgres).

This is without even saying that the write in question (by the backend) is to clean a dirty buffer. One could perhaps guess that from the context, but one could also make incorrect assumptions (as listed above). I think the official documentation should be clear and plain and helpful (explanatory), and it wouldn't take much to achieve that, just a few words.
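
As a rough way to check the first two readings against a running server, the counters named above can be compared with each other. The sketch below assumes a snapshot of the pg_stat_bgwriter view as it exists in PostgreSQL 13 (buffers_checkpoint and buffers_clean are columns of that view that this thread does not mention). The helper name and the sample numbers are invented for illustration, and nothing here suggests what a "good" ratio is.

```python
def write_shares(stats: dict) -> dict:
    """Return the fraction of buffer writes done by each kind of process.

    `stats` is assumed to hold one row of pg_stat_bgwriter (PostgreSQL 13
    column names) as a plain dict of integers.
    """
    total = (
        stats["buffers_checkpoint"]
        + stats["buffers_clean"]
        + stats["buffers_backend"]
    )
    if total == 0:
        return {"checkpointer": 0.0, "bgwriter": 0.0, "backend": 0.0}
    return {
        "checkpointer": stats["buffers_checkpoint"] / total,
        "bgwriter": stats["buffers_clean"] / total,
        "backend": stats["buffers_backend"] / total,
    }


# Invented sample values, not measurements from any real system.
sample = {
    "buffers_checkpoint": 600,
    "buffers_clean": 300,
    "buffers_backend": 100,
    "buffers_backend_fsync": 0,
}

shares = write_shares(sample)
backend_had_to_fsync = sample["buffers_backend_fsync"] > 0
print(shares, backend_had_to_fsync)
```

A rising "backend" share would correspond to the first reading (backends doing the write() themselves), and a non-zero buffers_backend_fsync to the second.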

I don't understand why you say that "The point is to say that writes rarely happen in the foreground. With your wording, there could be other cases where writes happen in the foreground, and the point is they rarely happen." We are clearly in the context of explaining what the bgwriter does and why (or rather trying to explain, and failing). Although backends could of course write in other circumstances, the bgwriter is not expected to have any direct effect on that (and might even slow them down by increasing the overall I/O load).

Also, I think "the point is they rarely happen" only if the bgwriter is configured correctly, and determining whether it is (doing its job properly) is exactly what brought me to this part of the docs.

I think your proposed patch improves the documentation very slightly, by making it slightly clearer that the write is to clean a dirty buffer, but does not address the rest of the ambiguity in the statement.

I still believe that my original proposed change, to "This reduces the chances that a backend needing an empty buffer must [itself] write a dirty one back to disk before evicting it" (with one extra word added), resolves the ambiguity and also more clearly and directly focuses it on what the bgwriter does and why, making it better documentation. It might be incorrect if my understanding is incorrect - is it?

Thanks, Chris.

On Wed, Nov 11, 2020 at 11:29:09AM +0000, Chris Wilson wrote:
> I still believe that my original proposed change, to "This reduces the chances
> that a backend needing an empty buffer must [itself] write a dirty one back to
> disk before evicting it" (with one extra word added), resolves the ambiguity
> and also more clearly and directly focuses it on what the bgwriter does and
> why, making it better documentation. It might be incorrect if my understanding
> is incorrect - is it?

You make some very good points.  Here is an updated patch.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment
Hi Bruce,

Thanks, yes I agree that that is much clearer. However when you say:

When the percentage of dirty shared buffers is high, the background writer writes some of them to the file system...

I haven't seen anything about a minimum percentage before the bgwriter kicks in, is that really the case? How is it configured?

Thanks, Chris.

On Thu, Nov 12, 2020 at 02:40:04PM +0000, Chris Wilson wrote:
> Hi Bruce,
> 
> Thanks, yes I agree that that is much clearer. However when you say:
> 
> 
>     When the percentage of dirty shared buffers is high, the background writer
>     writes some of them to the file system...
> 
> 
> I haven't seen anything about a minimum percentage before the bgwriter kicks
> in, is that really the case? How is it configured?

Yes, I see your point.  My language was not accurate, and it didn't
match the actual background writer tuning parameters below this text. 
Here is an updated doc patch.

I agree this text should be as clear as possible because there is no way
to properly tune the background writer parameters unless we explain how
it works.   It is good you noticed this.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment
Hi Bruce,

Thanks, I absolutely agree that this documentation needs to explain properly how the bgwriter works. Your latest patch looks good, it significantly improves this section of the manual. I would just suggest changing "non-dirty" to "clean" in "When the number of non-dirty shared buffers appears to be insufficient", as this makes the language simpler and avoids introducing another new term (non-dirty, which means the same as clean).

Thanks again, Chris.

On Thu, Nov 12, 2020 at 05:25:30PM +0000, Chris Wilson wrote:
> Hi Bruce,
> 
> Thanks, I absolutely agree that this documentation needs to explain properly
> how the bgwriter works. Your latest patch looks good, it significantly improves
> this section of the manual. I would just suggest changing "non-dirty" to
> "clean" in "When the number of non-dirty shared buffers appears to be
> insufficient", as this makes the language simpler and avoids introducing
> another new term (non-dirty, which means the same as clean).

OK, done.  I wasn't sure 'clean' would be assumed to be non-dirty, but you
are right the language is clearer with 'clean'.  (I was afraid 'clean'
would be assumed to be 'empty'.)

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment
On Thu, Nov 12, 2020 at 12:37:30PM -0500, Bruce Momjian wrote:
> On Thu, Nov 12, 2020 at 05:25:30PM +0000, Chris Wilson wrote:
> > Hi Bruce,
> > 
> > Thanks, I absolutely agree that this documentation needs to explain properly
> > how the bgwriter works. Your latest patch looks good, it significantly improves
> > this section of the manual. I would just suggest changing "non-dirty" to
> > "clean" in "When the number of non-dirty shared buffers appears to be
> > insufficient", as this makes the language simpler and avoids introducing
> > another new term (non-dirty, which means the same as clean).
> 
> OK, done.  I wasn't sure 'clean' would be assumed to be non-dirty, but you
> are right the language is clearer with 'clean'.  (I was afraid 'clean'
> would be assumed to be 'empty'.)

Patch applied to all supported versions.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee