Hi,
On 2023-04-24 18:36:24 -0400, Melanie Plageman wrote:
> On Mon, Apr 24, 2023 at 6:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > Also, it seems like this (given the current code) is only reachable for
> > > permanent relations (i.e. not for IO object temp relation). If other backend
> > > types than checkpointer may call smgrwriteback(), we likely have to consider
> > > the IO context.
> >
> > I think we should take it into account - it'd e.g. be interesting to see
> > whether a COPY is bottlenecked on smgrwriteback() rather than just writing
> > the data.
> >
>
> With the quick and dirty attached patch and using your example but with a
> pgbench -T200 on my rather fast local NVMe SSD, you can still see quite
> a difference.
Quite a difference between what?
What scale of pgbench did you use?
-T200 is likely not a good idea, because a timed checkpoint might "interfere",
unless you use a non-default checkpoint_timeout. A timed checkpoint won't show
the issue as easily, because the checkpointer spends most of its time sleeping.
> This is with a stats reset before the checkpoint.
>
> unpatched:
>
>    backend_type    |  object  | context | writes | write_time | fsyncs | fsync_time
> -------------------+----------+---------+--------+------------+--------+------------
>  background writer | relation | normal  |    443 |      1.408 |      0 |          0
>  checkpointer      | relation | normal  | 187804 |    396.829 |     47 |    254.226
>
> patched:
>
>    backend_type    |  object  | context | writes |     write_time     | fsyncs | fsync_time
> -------------------+----------+---------+--------+--------------------+--------+------------
>  background writer | relation | normal  |    917 | 4.4670000000000005 |      0 |          0
>  checkpointer      | relation | normal  | 375798 |            977.354 |     48 |    202.514
>
> I did compare client backend stats before and after pgbench and it made
> basically no difference. I'll do a COPY example like you mentioned.
> Patch needs cleanup/comments and a bit more work, but I could do with
> a sanity check review on the approach.
I was thinking we'd track writeback separately from the write, rather than
attributing the writeback to "write". Otherwise it looks good, based on a
quick skim.
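
To illustrate the "track it separately" idea, here is a rough standalone
sketch. All names in it (SketchIOOp, SketchIOStats, sketch_count_io) are
made up for illustration and are not the identifiers used by the patch or by
pgstat; it only shows writeback getting its own count and time slot instead
of being added to the write slot.

```c
#include <assert.h>

/*
 * Hypothetical operation kinds. Writeback has its own entry rather than
 * being folded into SKETCH_IOOP_WRITE.
 */
typedef enum SketchIOOp
{
	SKETCH_IOOP_WRITE,
	SKETCH_IOOP_WRITEBACK,
	SKETCH_IOOP_FSYNC,
	SKETCH_IOOP_NUM				/* number of operation kinds, not an op */
} SketchIOOp;

/* Hypothetical per-backend-type counters, one slot per operation kind. */
typedef struct SketchIOStats
{
	long		counts[SKETCH_IOOP_NUM];
	double		time_ms[SKETCH_IOOP_NUM];
} SketchIOStats;

/*
 * Record one completed operation of kind 'op' that took 'elapsed_ms'.
 * The caller decides which kind it is, so time spent in a writeback
 * request never ends up in the write column.
 */
void
sketch_count_io(SketchIOStats *stats, SketchIOOp op, double elapsed_ms)
{
	assert(op >= 0 && op < SKETCH_IOOP_NUM);

	stats->counts[op]++;
	stats->time_ms[op] += elapsed_ms;
}
```

With something along these lines, the write_time numbers in the two tables
above would stay comparable between patched and unpatched, and the writeback
cost would show up in columns of its own.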
Greetings,
Andres Freund