Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc? |
Date | |
Msg-id | 20230823192239.jxew5s3sjru63lio@awork3.anarazel.de Whole thread Raw |
In response to | Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc? (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?
|
List | pgsql-hackers |
Hi, On 2023-08-23 14:48:26 +0200, Daniel Gustafsson wrote: > > On 23 Aug 2023, at 08:58, Andres Freund <andres@anarazel.de> wrote: > > > I'm hoping to push this fairly soon, as I'll be on vacation the last week of > > August. I'll be online intermittently though, if there are issues, I can react > > (very limited connectivity for middday Aug 29th - midday Aug 31th though). I'd > > appreciate a quick review or two. > > I've been reading over these and the thread, and while not within my area of > expertise, nothing really sticks out. Thanks! > I'll do another pass, but below are a few small comments so far. > > I don't know Windows to know the implications, but should the below file have > some sort of warning about not doing that for production/shared systems, only > for dedicated test instances? Ah, I should have explained that: I'm not planning to apply - regress: Check for postgres startup completion more often - ci: windows: Disabling write cache flushing during test right now. Compared to the other patches the wins are much smaller and/or more work is needed to make them good. I think it might be worth going for - ci: switch tasks to debugoptimized build because that provides a fair bit of gain. But it might be more hurtful than helpful due to costing more when ccache doesn't work... > +++ b/src/tools/ci/windows_write_cache.ps1 > @@ -0,0 +1,20 @@ > +# Define the write cache to be power protected. This reduces the rate of cache > +# flushes, which seems to help metadata heavy workloads on NTFS. We're just > +# testing here anyway, so ... > +# > +# Let's do so for all disks, this could be useful beyond cirrus-ci. > > One thing in 0010 caught my eye, and while not introduced in this patchset it > might be of interest here. In the below hunks we loop X ticks around > system(psql), with the loop assuming the server can come up really quickly and > sleeping if it doesn't. On my systems I always reach the pg_usleep after > failing the check, but if I reverse the check such it first sleeps and then > checks I only need to check once instead of twice. I think there's more effective ways to make this cheaper. The basic thing would be to use libpq instead of forking of psql to make a connection check. Medium term, I think we should invent a way for pg_ctl and other tooling (including pg_regress) to wait for the service to come up. E.g. having a named pipe that postmaster opens once the server is up, which should allow multiple clients to use select/epoll/... to wait for it without looping. ISTM making pg_regress use libpq w/ PQping() should be a pretty simple patch? The non-polling approach obviously is even better, but also requires more thought (and documentation and ...). > @@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[], > else > wait_seconds = 60; > > - for (i = 0; i < wait_seconds; i++) > + for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++) > { > /* Done if psql succeeds */ > fflush(NULL); > @@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[], > outputdir); > } > > - pg_usleep(1000000L); > + pg_usleep(1000000L / WAITS_PER_SEC); > } > if (i >= wait_seconds) > { > > It's a micro-optimization, but if we're changing things here to chase cycles it > might perhaps be worth doing? I wouldn't quite call not waiting for 1s for the server to start, when it does so within a few ms, chasing cycles ;). For short tests it's a substantial fraction of the overall runtime... Greetings, Andres Freund
pgsql-hackers by date: