Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: standby recovery fails (tablespace related) (tentative patch and discussion) |
Date | |
Msg-id | CA+hUKGKU3p1=eSE7gQHaH=GxPLsVYCRnDdUaHcu_aV2r4GwdPA@mail.gmail.com Whole thread Raw |
In response to | Re: standby recovery fails (tablespace related) (tentative patch and discussion) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
|
List | pgsql-hackers |
On Sun, Jul 31, 2022 at 4:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, quite aside from stability, is it really necessary for this test to > be so freakin' slow? florican for instance reports > > [12:43:38] t/025_stuck_on_old_timeline.pl ....... ok 49010 ms ( 0.00 usr 0.00 sys + 3.64 cusr 2.49 csys = 6.13 CPU) > [12:44:12] t/026_overwrite_contrecord.pl ........ ok 34751 ms ( 0.01 usr 0.00 sys + 3.14 cusr 1.76 csys = 4.91 CPU) > [12:49:00] t/027_stream_regress.pl .............. ok 287278 ms ( 0.00 usr 0.00 sys + 9.66 cusr 6.95 csys = 16.60 CPU) > [12:50:04] t/028_pitr_timelines.pl .............. ok 64543 ms ( 0.00 usr 0.00 sys + 3.59 cusr 3.20 csys = 6.78 CPU) > [12:50:17] t/029_stats_restart.pl ............... ok 12505 ms ( 0.02 usr 0.00 sys + 3.16 cusr 1.40 csys = 4.57 CPU) > [12:50:51] t/030_stats_cleanup_replica.pl ....... ok 33933 ms ( 0.01 usr 0.01 sys + 3.55 cusr 2.46 csys = 6.03 CPU) > [12:51:25] t/031_recovery_conflict.pl ........... ok 34249 ms ( 0.00 usr 0.00 sys + 3.37 cusr 2.20 csys = 5.57 CPU) > [12:52:09] t/032_relfilenode_reuse.pl ........... ok 44274 ms ( 0.01 usr 0.00 sys + 3.21 cusr 2.05 csys = 5.27 CPU) > [12:54:07] t/033_replay_tsp_drops.pl ............ ok 117840 ms ( 0.01 usr 0.00 sys + 8.72 cusr 5.41 csys = 14.14 CPU) > > 027 is so bloated because it runs the core regression tests YA time, > which I'm not very happy about either; but that's no excuse for > every new test to contribute an additional couple of minutes. Complaints about 027 noted, I'm thinking about what we could do about that. As for 033, I worried that it might be the new ProcSignalBarrier stuff around tablespaces, but thankfully the DEBUG logging I added there recently shows those all completing in single digit milliseconds. I also confirmed there are no unexpected fsync'd being produced here. That is quite a lot of CPU, but it's a huge amount of total runtime. It runs in 5-8 seconds on various modern systems, 19 seconds on my Linux RPi4, and 50 seconds on my Celeron-powered NAS box with spinning disks. I noticed this is a 32 bit FBSD system. Is it running on UFS, perhaps on slow storage? Are soft updates enabled (visible as options in output of "mount")? Without soft updates, a lot more file system ops perform synchronous I/O, which really slows down our tests. In general, UFS isn't as good as modern file systems at avoiding I/O for short-lived files, and we set up and tear down a lot of them in our testing. Another thing that makes a difference is to use a filesystem with 8KB block size. This has been a subject of investigation for speeding up CI (see src/tools/ci/gcp_freebsd_repartition.sh), but several mysteries remain unsolved...
pgsql-hackers by date: