Re: Performance degradation on concurrent COPY into a single relation in PG16. - Mailing list pgsql-hackers
From: Jakub Wartak
Subject: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Date:
Msg-id: CAKZiRmyQ76T83FCsQxNDxq_mf8fcwE4O=yZk8re0GVfJDS1mhg@mail.gmail.com
In response to: Re: Performance degradation on concurrent COPY into a single relation in PG16. (Andres Freund <andres@anarazel.de>)
Responses: Re: Performance degradation on concurrent COPY into a single relation in PG16.
List: pgsql-hackers
On Mon, Jul 10, 2023 at 6:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-07-03 11:53:56 +0200, Jakub Wartak wrote:
> > Out of curiosity I've tried and it is reproducible as you have stated: XFS
> > @ 4.18.0-425.10.1.el8_7.x86_64:
> > ...
> > According to iostat and blktrace -d /dev/sda -o - | blkparse -i - output,
> > XFS issues sync writes while ext4 does not; xfs looks like a constant
> > loop of sync writes (D) by kworker/2:1H-kblockd:
>
> That clearly won't go well. It's not reproducible on newer systems,
> unfortunately :(. Or well, fortunately maybe.
>
> I wonder if a trick to avoid this could be to memorialize the fact that we
> bulk extended before and extend by that much going forward? That'd avoid the
> swapping back and forth.

I haven't seen the thread [1] "Question on slow fallocate" from the XFS mailing list mentioned here (it was started by Masahiko), but I feel it contains very important hints, even challenging the whole idea of zeroing out files (or posix_fallocate()). Please see especially Dave's reply. He also argues that posix_fallocate() != fallocate(). What's interesting is that it's by design, and newer kernel versions should not be expected to prevent such behaviour; see my testing results below. All I can add is that those kernel versions (4.18.0) seem to be very popular across customers (RHEL, Rocky) right now, and that I've tested on the most recent available one (4.18.0-477.15.1.el8_8.x86_64) using Masahiko's test.c and still got 6-7x slower times when using XFS on that kernel.
After installing kernel-ml (6.4.2) the test.c result seems to be the same (note that it occurs only when first allocating space; it doesn't occur if the same file is rewritten/"reallocated"):

[root@rockyora ~]# uname -r
6.4.2-1.el8.elrepo.x86_64
[root@rockyora ~]# time ./test test.0 0
total 200000 fallocate 0 filewrite 200000

real    0m0.405s
user    0m0.006s
sys     0m0.391s
[root@rockyora ~]# time ./test test.0 1
total 200000 fallocate 200000 filewrite 0

real    0m0.137s
user    0m0.005s
sys     0m0.132s
[root@rockyora ~]# time ./test test.1 1
total 200000 fallocate 200000 filewrite 0

real    0m0.968s
user    0m0.020s
sys     0m0.928s
[root@rockyora ~]# time ./test test.2 2
total 200000 fallocate 100000 filewrite 100000

real    0m6.059s
user    0m0.000s
sys     0m0.788s
[root@rockyora ~]# time ./test test.2 2
total 200000 fallocate 100000 filewrite 100000

real    0m0.598s
user    0m0.003s
sys     0m0.225s

iostat -x reported the following during the first "time ./test test.2 2" (as you can see, w_await is not that high, but it accumulates):

Device  r/s   w/s       rMB/s  wMB/s   rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sda     0.00  15394.00  0.00   122.02  0.00    13.00   0.00   0.08   0.00     0.05     0.75    0.00      8.12      0.06   100.00
dm-0    0.00  15407.00  0.00   122.02  0.00    0.00    0.00   0.00   0.00     0.06     0.98    0.00      8.11      0.06   100.00

So maybe that's just a hint that you should try on slower storage instead? (I think that on NVMe this issue would be hardly noticeable due to low I/O latency, unlike here.)

-J.

[1] - https://www.spinics.net/lists/linux-xfs/msg73035.html