Thread: Query execution in Perl TAP tests needs work
Hi, Every time we run a SQL query, we fork a new psql process and a new cold backend process. It's not free on Unix, and quite a lot worse on Windows, at around 70ms per query. Take amcheck/001_verify_heapam for example. It runs 272 subtests firing off a stream of queries, and completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on CI. Here are some timestamps I captured from CI by instrumenting various Perl and C bits: 0.000s: IPC::Run starts 0.023s: postmaster socket sees connection 0.025s: postmaster has created child process 0.033s: backend starts running main() 0.039s: backend has reattached to shared memory 0.043s: backend connection authorized message 0.046s: backend has executed and logged query 0.070s: IPC::Run returns I expected process creation to be slow on that OS, but it seems like something happening at the end is even slower. CI shows Windows consuming 4 CPUs at 100% for a full 10 minutes to run a test suite that finishes in 2-3 minutes everywhere else with the same number of CPUs. Could there be an event handling snafu in IPC::Run or elsewhere nearby? It seems like there must be either a busy loop or a busted sleep/wakeup... somewhere? But even if there's a weird bug here waiting to be discovered and fixed, I guess it'll always be too slow at ~10ms per process spawned, with two processes to spawn, and it's bad enough on Unix. As an experiment, I hacked up a not-good-enough-to-share experiment where $node->safe_psql() would automatically cache a BackgroundPsql object and reuse it, and the times for that test dropped ~51 -> ~9s on Windows, and ~7 -> ~2s on the Unixen. But even that seems non-ideal (well it's certainly non-ideal the way I hacked it up anyway...). I suppose there are quite a few ways we could do better: 1. Don't fork anything at all: open (and cache) a connection directly from Perl. 1a. Write xsub or ffi bindings for libpq. Or vendor (parts) of the popular Perl xsub library? 1b. Write our own mini pure-perl pq client module. Or vendor (parts) of some existing one. 2. Use long-lived psql sessions. 2a. Something building on BackgroundPsql. 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe protocol that is more fun to talk to from Perl/machines? In some other languages one can do FFI pretty easily so we could use the in-tree libpq without extra dependencies: >>> import ctypes >>> libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so") >>> libpq.PQlibVersion() 170000 ... but it seems you can't do either static C bindings or runtime FFI from Perl without adding a new library/package dependency. I'm not much of a Perl hacker so I don't have any particular feeling. What would be best? This message brought to you by the Lorax.
Hi, Every time we run a SQL query, we fork a new psql process and a new cold backend process. It's not free on Unix, and quite a lot worse on Windows, at around 70ms per query. Take amcheck/001_verify_heapam for example. It runs 272 subtests firing off a stream of queries, and completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on CI. Here are some timestamps I captured from CI by instrumenting various Perl and C bits: 0.000s: IPC::Run starts 0.023s: postmaster socket sees connection 0.025s: postmaster has created child process 0.033s: backend starts running main() 0.039s: backend has reattached to shared memory 0.043s: backend connection authorized message 0.046s: backend has executed and logged query 0.070s: IPC::Run returns I expected process creation to be slow on that OS, but it seems like something happening at the end is even slower. CI shows Windows consuming 4 CPUs at 100% for a full 10 minutes to run a test suite that finishes in 2-3 minutes everywhere else with the same number of CPUs. Could there be an event handling snafu in IPC::Run or elsewhere nearby? It seems like there must be either a busy loop or a busted sleep/wakeup... somewhere? But even if there's a weird bug here waiting to be discovered and fixed, I guess it'll always be too slow at ~10ms per process spawned, with two processes to spawn, and it's bad enough on Unix. As an experiment, I hacked up a not-good-enough-to-share experiment where $node->safe_psql() would automatically cache a BackgroundPsql object and reuse it, and the times for that test dropped ~51 -> ~9s on Windows, and ~7 -> ~2s on the Unixen. But even that seems non-ideal (well it's certainly non-ideal the way I hacked it up anyway...). I suppose there are quite a few ways we could do better: 1. Don't fork anything at all: open (and cache) a connection directly from Perl. 1a. Write xsub or ffi bindings for libpq. Or vendor (parts) of the popular Perl xsub library? 1b. Write our own mini pure-perl pq client module. Or vendor (parts) of some existing one. 2. Use long-lived psql sessions. 2a. Something building on BackgroundPsql. 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe protocol that is more fun to talk to from Perl/machines? In some other languages one can do FFI pretty easily so we could use the in-tree libpq without extra dependencies:import ctypes libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so") libpq.PQlibVersion()170000 ... but it seems you can't do either static C bindings or runtime FFI from Perl without adding a new library/package dependency. I'm not much of a Perl hacker so I don't have any particular feeling. What would be best? This message brought to you by the Lorax.
Thanks for raising this. Windows test times have bothered me for ages.
The standard perl DBI library has a connect_cached method. Of course we don't want to be dependent on it, especially if we might have changed libpq in what we're testing, and it would place a substantial new burden on testers like buildfarm owners.
I like the idea of using a pure perl pq implementation, not least because it could expand our ability to test things at the protocol level. Not sure how much work it would be. I'm willing to help if we want to go that way.
Yes you need an external library to use FFI in perl, but there's one that's pretty tiny. See <https://metacpan.org/pod/FFI::Library>. There is also FFI::Platypus, but it involves building a library. OTOH, that's the one that's available standard on my Fedora and Ubuntu systems. I haven't tried using either Maybe we could use some logic that would use the FFI interface if it's available, and fall back on current usage.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Mon, Aug 28, 2023 at 05:29:56PM +1200, Thomas Munro wrote: > CI shows Windows > consuming 4 CPUs at 100% for a full 10 minutes to run a test suite > that finishes in 2-3 minutes everywhere else with the same number of > CPUs. Could there be an event handling snafu in IPC::Run or elsewhere > nearby? It seems like there must be either a busy loop or a busted > sleep/wakeup... somewhere? That smells like this one: https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929 > As an experiment, I hacked up a not-good-enough-to-share experiment > where $node->safe_psql() would automatically cache a BackgroundPsql > object and reuse it, and the times for that test dropped ~51 -> ~9s on > Windows, and ~7 -> ~2s on the Unixen. Nice! > suppose there are quite a few ways we could do better: > > 1. Don't fork anything at all: open (and cache) a connection directly > from Perl. > 1a. Write xsub or ffi bindings for libpq. Or vendor (parts) of the > popular Perl xsub library? > 1b. Write our own mini pure-perl pq client module. Or vendor (parts) > of some existing one. > 2. Use long-lived psql sessions. > 2a. Something building on BackgroundPsql. > 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe > protocol that is more fun to talk to from Perl/machines? (2a) seems adequate and easiest to achieve. If DBD::Pg were under a compatible license, I'd say use it as the vendor for (1a). Maybe we can get it relicensed? Building a separate way of connecting from Perl would be sad.
On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote: > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929 Interesting. But that shows a case with no pipes connected, using select() as a dumb sleep and ignoring SIGCHLD. In our usage we have pipes connected, and I think select() should return when the child's output pipes become readable due to EOF. I guess something about that might be b0rked on Windows? I see there is an extra helper process doing socket<->pipe conversion (hah, that explains an extra ~10ms at the start in my timestamps)... I can't really follow that code, but perhaps the parent forgot to close the far end of the socket pair, so there is no EOF?
On Tue, Aug 29, 2023 at 1:23 AM Andrew Dunstan <andrew@dunslane.net> wrote: > I like the idea of using a pure perl pq implementation, not least because it could expand our ability to test things atthe protocol level. Not sure how much work it would be. I'm willing to help if we want to go that way. Cool. Let's see what others think. And assuming we can pick *something* vaguely efficient and find a Perl hacker to implement it, a related question is how to expose it to our test suites. Should we figure out how to leave all our tests unchanged, by teaching $node->psql() et al to do caching implicitly? Or should we make it explicit, with $conn = $node->connect(), and $conn->do_stuff()? And if the latter, should do_stuff be DBI style or something that suits us better?
On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote: > On Mon, Aug 28, 2023 at 05:29:56PM +1200, Thomas Munro wrote: > > 1. Don't fork anything at all: open (and cache) a connection directly > > from Perl. > > 1a. Write xsub or ffi bindings for libpq. Or vendor (parts) of the > > popular Perl xsub library? > > 1b. Write our own mini pure-perl pq client module. Or vendor (parts) > > of some existing one. > > 2. Use long-lived psql sessions. > > 2a. Something building on BackgroundPsql. > > 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe > > protocol that is more fun to talk to from Perl/machines? > > (2a) seems adequate and easiest to achieve. If DBD::Pg were under a > compatible license, I'd say use it as the vendor for (1a). Maybe we can get > it relicensed? Building a separate way of connecting from Perl would be sad. Here's my minimal POC of 2a. It only changes ->safe_psql() and not the various other things like ->psql() and ->poll_query_until(). Hence use of amcheck/001_verify_heapam as an example: it runs a lot of safe_psql() queries. It fails in all kinds of ways if enabled generally, which would take some investigation (some tests require there to be no connections at various times, so we'd probably need to insert disable/re-enable calls at various places).
Attachment
On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote: > On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote: > > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929 > > Interesting. But that shows a case with no pipes connected, using > select() as a dumb sleep and ignoring SIGCHLD. In our usage we have > pipes connected, and I think select() should return when the child's > output pipes become readable due to EOF. I guess something about that > might be b0rked on Windows? I see there is an extra helper process > doing socket<->pipe conversion (hah, that explains an extra ~10ms at > the start in my timestamps)... In that case, let's assume it's not the same issue.
On Wed, Aug 30, 2023 at 1:49 AM Noah Misch <noah@leadboat.com> wrote: > On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote: > > On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote: > > > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929 > > > > Interesting. But that shows a case with no pipes connected, using > > select() as a dumb sleep and ignoring SIGCHLD. In our usage we have > > pipes connected, and I think select() should return when the child's > > output pipes become readable due to EOF. I guess something about that > > might be b0rked on Windows? I see there is an extra helper process > > doing socket<->pipe conversion (hah, that explains an extra ~10ms at > > the start in my timestamps)... > > In that case, let's assume it's not the same issue. Yeah, I think it amounts to the same thing, if EOF never arrives. I suspect that we could get ->safe_psql() down to about ~25ms baseline if someone could fix the posited IPC::Run EOF bug and change its internal helper process to a helper thread. Even if we fix tests to reuse backends, I expect that'd help CI measurably. (The native way would be to use pipes directly, changing select() to WaitForMultipleObjects(), but I suspect IPC::Run might have painted itself into a corner by exposing the psuedo-pipes and documenting that they can be used with select(). Oh well.)
Hi, On 2023-08-28 17:29:56 +1200, Thomas Munro wrote: > Every time we run a SQL query, we fork a new psql process and a new > cold backend process. It's not free on Unix, and quite a lot worse on > Windows, at around 70ms per query. Take amcheck/001_verify_heapam for > example. It runs 272 subtests firing off a stream of queries, and > completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on > CI. Whoa. > Here are some timestamps I captured from CI by instrumenting various > Perl and C bits: > > 0.000s: IPC::Run starts > 0.023s: postmaster socket sees connection > 0.025s: postmaster has created child process > 0.033s: backend starts running main() > 0.039s: backend has reattached to shared memory > 0.043s: backend connection authorized message > 0.046s: backend has executed and logged query > 0.070s: IPC::Run returns > > I expected process creation to be slow on that OS, but it seems like > something happening at the end is even slower. CI shows Windows > consuming 4 CPUs at 100% for a full 10 minutes to run a test suite > that finishes in 2-3 minutes everywhere else with the same number of > CPUs. It finishes in that time on linux, even with sanitizers enabled... > As an experiment, I hacked up a not-good-enough-to-share experiment > where $node->safe_psql() would automatically cache a BackgroundPsql > object and reuse it, and the times for that test dropped ~51 -> ~9s on > Windows, and ~7 -> ~2s on the Unixen. But even that seems non-ideal > (well it's certainly non-ideal the way I hacked it up anyway...). I > suppose there are quite a few ways we could do better: That's a really impressive win. Even if we "just" converted some of the safe_psql() cases and converted poll_query_until() to this, we'd win a lot. > 1. Don't fork anything at all: open (and cache) a connection directly > from Perl. One advantage of that is that the socket is entirely controlled by perl, so waiting for IO should be easy... > 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe > protocol that is more fun to talk to from Perl/machines? That does also seem promising - a good chunk of the complexity around some of the IPC::Run uses is that we end up parsing psql input/output... Greetings, Andres Freund
On 2023-08-28 Mo 01:29, Thomas Munro wrote:Hi, Every time we run a SQL query, we fork a new psql process and a new cold backend process. It's not free on Unix, and quite a lot worse on Windows, at around 70ms per query. Take amcheck/001_verify_heapam for example. It runs 272 subtests firing off a stream of queries, and completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on CI. Here are some timestamps I captured from CI by instrumenting various Perl and C bits: 0.000s: IPC::Run starts 0.023s: postmaster socket sees connection 0.025s: postmaster has created child process 0.033s: backend starts running main() 0.039s: backend has reattached to shared memory 0.043s: backend connection authorized message 0.046s: backend has executed and logged query 0.070s: IPC::Run returns I expected process creation to be slow on that OS, but it seems like something happening at the end is even slower. CI shows Windows consuming 4 CPUs at 100% for a full 10 minutes to run a test suite that finishes in 2-3 minutes everywhere else with the same number of CPUs. Could there be an event handling snafu in IPC::Run or elsewhere nearby? It seems like there must be either a busy loop or a busted sleep/wakeup... somewhere? But even if there's a weird bug here waiting to be discovered and fixed, I guess it'll always be too slow at ~10ms per process spawned, with two processes to spawn, and it's bad enough on Unix. As an experiment, I hacked up a not-good-enough-to-share experiment where $node->safe_psql() would automatically cache a BackgroundPsql object and reuse it, and the times for that test dropped ~51 -> ~9s on Windows, and ~7 -> ~2s on the Unixen. But even that seems non-ideal (well it's certainly non-ideal the way I hacked it up anyway...). I suppose there are quite a few ways we could do better: 1. Don't fork anything at all: open (and cache) a connection directly from Perl. 1a. Write xsub or ffi bindings for libpq. Or vendor (parts) of the popular Perl xsub library? 1b. Write our own mini pure-perl pq client module. Or vendor (parts) of some existing one. 2. Use long-lived psql sessions. 2a. Something building on BackgroundPsql. 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe protocol that is more fun to talk to from Perl/machines? In some other languages one can do FFI pretty easily so we could use the in-tree libpq without extra dependencies:import ctypes libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so") libpq.PQlibVersion()170000 ... but it seems you can't do either static C bindings or runtime FFI from Perl without adding a new library/package dependency. I'm not much of a Perl hacker so I don't have any particular feeling. What would be best? This message brought to you by the Lorax.Thanks for raising this. Windows test times have bothered me for ages.
The standard perl DBI library has a connect_cached method. Of course we don't want to be dependent on it, especially if we might have changed libpq in what we're testing, and it would place a substantial new burden on testers like buildfarm owners.
I like the idea of using a pure perl pq implementation, not least because it could expand our ability to test things at the protocol level. Not sure how much work it would be. I'm willing to help if we want to go that way.
Yes you need an external library to use FFI in perl, but there's one that's pretty tiny. See <https://metacpan.org/pod/FFI::Library>. There is also FFI::Platypus, but it involves building a library. OTOH, that's the one that's available standard on my Fedora and Ubuntu systems. I haven't tried using either Maybe we could use some logic that would use the FFI interface if it's available, and fall back on current usage.
I had a brief play with this. Here's how easy it was to wrap libpq in perl:
#!/usr/bin/perl
use strict; use warnings;
use FFI::Platypus;
my $ffi = FFI::Platypus->new(api=>1);
$ffi->lib("inst/lib/libpq.so");
$ffi->type('opaque' => 'PGconn');
$ffi->attach(PQconnectdb => [ 'string' ] => 'PGconn');
$ffi->attach(PQfinish => [ 'PGconn' ] => 'void');
$ffi->type('opaque' => 'PGresult');
$ffi->attach(PQexec => [ 'PGconn', 'string' ] => 'PGresult');
$ffi->attach(PQgetvalue => [ 'PGresult', 'int', 'int' ] => 'string');
my $pgconn = PQconnectdb("dbname=postgres host=/tmp");
my $res = PQexec($pgconn, "select count(*) from pg_class");
my $count = PQgetvalue( $res, 0, 0);
print "count: $count\n";
PQfinish($pgconn);
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, Aug 31, 2023 at 10:32 AM Andrew Dunstan <andrew@dunslane.net> wrote: > #!/usr/bin/perl > > use strict; use warnings; > > use FFI::Platypus; > > my $ffi = FFI::Platypus->new(api=>1); > $ffi->lib("inst/lib/libpq.so"); > > > $ffi->type('opaque' => 'PGconn'); > $ffi->attach(PQconnectdb => [ 'string' ] => 'PGconn'); > $ffi->attach(PQfinish => [ 'PGconn' ] => 'void'); > > $ffi->type('opaque' => 'PGresult'); > $ffi->attach(PQexec => [ 'PGconn', 'string' ] => 'PGresult'); > $ffi->attach(PQgetvalue => [ 'PGresult', 'int', 'int' ] => 'string'); > > my $pgconn = PQconnectdb("dbname=postgres host=/tmp"); > my $res = PQexec($pgconn, "select count(*) from pg_class"); > my $count = PQgetvalue( $res, 0, 0); > > print "count: $count\n"; > > PQfinish($pgconn); It looks very promising so far. How hard would it be for us to add this dependency? Mostly pinging build farm owners? I'm still on the fence, but the more I know about IPC::Run, the better the various let's-connect-directly-from-Perl options sound...
On Thu, Aug 31, 2023 at 10:32 AM Andrew Dunstan <andrew@dunslane.net> wrote:#!/usr/bin/perl use strict; use warnings; use FFI::Platypus; my $ffi = FFI::Platypus->new(api=>1); $ffi->lib("inst/lib/libpq.so"); $ffi->type('opaque' => 'PGconn'); $ffi->attach(PQconnectdb => [ 'string' ] => 'PGconn'); $ffi->attach(PQfinish => [ 'PGconn' ] => 'void'); $ffi->type('opaque' => 'PGresult'); $ffi->attach(PQexec => [ 'PGconn', 'string' ] => 'PGresult'); $ffi->attach(PQgetvalue => [ 'PGresult', 'int', 'int' ] => 'string'); my $pgconn = PQconnectdb("dbname=postgres host=/tmp"); my $res = PQexec($pgconn, "select count(*) from pg_class"); my $count = PQgetvalue( $res, 0, 0); print "count: $count\n"; PQfinish($pgconn);It looks very promising so far. How hard would it be for us to add this dependency? Mostly pinging build farm owners? I'm still on the fence, but the more I know about IPC::Run, the better the various let's-connect-directly-from-Perl options sound...
Here's some progress. I have put it all in a perl module, which I have tested on Windows (both mingw and MSVC) as well as Ubuntu. I think this is probably something worth having in itself. I wrapped a substantial portion of libpq, but left out things to do with large objects, async processing, pipelining, SSL and some other things. We can fill in the gaps in due course.
The test program now looks like this:
use strict;
use warnings;
use lib ".";
use PqFFI;
PqFFI::setup("inst/lib");
my $conn = PQconnectdb("dbname=postgres host=/tmp");
my $res = PQexec($conn, 'select count(*) from pg_class');
my $count = PQgetvalue($res,0,0);
print "$count rows in pg_class\n";
PQfinish($conn);
I guess the next thing would be to test it on a few more platforms and also to see if we need to expand the coverage of libpq for the intended uses.
I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of fallback in case this isn't supported on some platform, either due to technological barriers or buildfarm owner reluctance.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Attachment
On Sun, Sep 3, 2023 at 6:42 AM Andrew Dunstan <andrew@dunslane.net> wrote: > I guess the next thing would be to test it on a few more platforms and also to see if we need to expand the coverage oflibpq for the intended uses. Nice. It works fine on my FreeBSD battlestation after "sudo pkg install p5-FFI-Platypus" and adjusting that lib path. I wonder if there is a nice way to extract those constants from our headers... It's using https://sourceware.org/libffi/ under the covers (like most other scripting language FFI things), and that knows calling conventions for everything we care about including weird OSes and architectures. It might be a slight pain to build it on systems that have no package manager, if cpan can't do it for you? I guess AIX would be the most painful? (Huh, while contemplating trying that, I just noticed that the GCC build farm's AIX 7.2 system seems to have given up the ghost a few weeks ago. I wonder if it'll come back online with the current release, or if that's the end... There is still the overloaded-to-the-point-of-being-hard-to-interact-with AIX 7.1 (=EOL) machine.) > I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of fallbackin case this isn't supported on some platform, either due to technological barriers or buildfarm owner reluctance. I guess you're thinking that it could be done in such a way that it is automatically used for $node->safe_psql() and various other things if Platypus is detected, but otherwise forking psql as now, for a transition period? Then we could nag build farm owners, and eventually turn off the fallback stuff after N months. After that it would begin to be possible to use this in more interesting and advanced ways.
On Sun, Sep 3, 2023 at 6:42 AM Andrew Dunstan <andrew@dunslane.net> wrote:I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of fallback in case this isn't supported on some platform, either due to technological barriers or buildfarm owner reluctance.I guess you're thinking that it could be done in such a way that it is automatically used for $node->safe_psql() and various other things if Platypus is detected, but otherwise forking psql as now, for a transition period? Then we could nag build farm owners, and eventually turn off the fallback stuff after N months. After that it would begin to be possible to use this in more interesting and advanced ways.
Yeah, that would be ideal. I'll prep a version of the module that doesn't fail if FFI::Platypus isn't available, but instead sets a flag we can test.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Sun, Sep 3, 2023 at 12:17 PM Thomas Munro <thomas.munro@gmail.com> wrote: > (Huh, while contemplating trying that, I just noticed that the GCC > build farm's AIX 7.2 system seems to have given up the ghost a few > weeks ago. I wonder if it'll come back online with the current > release, or if that's the end... There is still the > overloaded-to-the-point-of-being-hard-to-interact-with AIX 7.1 (=EOL) > machine.) FTR it (gcc119) appears to have come back online, now upgraded to AIX 7.3. No reports from "hoverfly" (I think it was on that host?). It probably needs some attention to start working again after the upgrade.
Hi, On 2023-08-28 17:29:56 +1200, Thomas Munro wrote: > As an experiment, I hacked up a not-good-enough-to-share experiment > where $node->safe_psql() would automatically cache a BackgroundPsql > object and reuse it, and the times for that test dropped ~51 -> ~9s on > Windows, and ~7 -> ~2s on the Unixen. But even that seems non-ideal > (well it's certainly non-ideal the way I hacked it up anyway...). I > suppose there are quite a few ways we could do better: > > 1. Don't fork anything at all: open (and cache) a connection directly > from Perl. > 1a. Write xsub or ffi bindings for libpq. Or vendor (parts) of the > popular Perl xsub library? > 1b. Write our own mini pure-perl pq client module. Or vendor (parts) > of some existing one. > 2. Use long-lived psql sessions. > 2a. Something building on BackgroundPsql. > 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe > protocol that is more fun to talk to from Perl/machines? While we can't easily use plain long-lived psql everywhere, due to tests depending on no additional connections being present, we could at least partially address that by adding a \disconnect to psql. Based on your numbers the subprocess that IPC::Run wraps around psql on windows is a substantial part of the overhead. Even if we default to reconnecting after every ->psql(), just saving the fork of the wrapper process and psql should reduce costs substantially. Famous last words, but it seems like that it should be quite doable to add that to psql and use it in Cluster->{psql,safe_psql,poll_query_until}? There might be a few cases that expect the full connection error message, but I can't imagine that to be too many? Greetings, Andres Freund
On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan <andrew@dunslane.net> wrote: > I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of fallbackin case this isn't supported on some platform, either due to technological barriers or buildfarm owner reluctance. How much burden is it? Would anyone actually mind? I definitely don't want to put ourselves in a situation where we add a bunch of annoying dependencies that are required to be able to run the tests, not just because it will inconvenience buildfarm owners, but also because it will potentially inconvenience developers, and in particular, me. At the same time, fallbacks can be a problem too, because then you can end up with things that work one way on one developer's machine (or BF machine) and another way on another developer's machine (or BF machine) and it's not obvious that the reason for the difference is that one machine is using a fallback and the other is not. I feel like this tends to create so much aggravation in practice that it's best not to have fallbacks in this kind of situation - my vote is that we either stick with the current method and live with the performance characteristics thereof, or we put in place something that is faster and better and that new thing becomes a hard dependency for anyone who wants to be able to run the TAP tests. In terms of what that faster and better thing might be, AFAICS, there are two main options. First, we could proceed with the approach you've tried here, namely requiring everybody to get Platypus::FFI. I find that it's included in MacPorts on my machine, which is at least somewhat of a good sign that perhaps this is fairly widely available. That might need more investigation, though. Second, we could write a pure-Perl implementation, as you proposed earlier. That would be more work to write and maintain, but would avoid needing FFI. Personally, I feel like either an FFI-based approach or a pure-Perl approach would be pretty reasonable, as long as Platypus::FFI is widely available/usable. If we go with pure Perl, the hard part might be managing the authentication methods, but as Thomas pointed out to me yesterday, we now have UNIX sockets on Windows, and thus everywhere, so maybe we could get to a point where the pure-Perl implementation wouldn't need to do any non-trivial authentication. Another thing, also already mentioned, that we can do is cache psql connections instead of continuously respawing psql. That doesn't require any fundamentally new mechanism, and in some sense it's independent of the approaches above, because they could be implemented without caching connections, but they would benefit from caching connections, as the currently psql-based approach also does. I think it would be good to introduce new syntax for this, e.g.: $conn_primary = $node_primary->connect(); $conn_primary->simple_query('whatever'); $conn_primary->simple_query('whatever 2'); $conn_primary->disconnect(); Something like this would require a fairly large amount of mechanical work to implement across all of our TAP test cases, but I think it would be effort well spent. If we try to introduce connection caching "transparently," I think it will turn into another foot-gun that people keep getting wrong because they don't realize there is magic under the hood, or forget how it works. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan <andrew@dunslane.net> wrote: >>> How much burden is it? Would anyone actually mind? > ... At the same time, fallbacks can be a problem too, > because then you can end up with things that work one way on one > developer's machine (or BF machine) and another way on another > developer's machine (or BF machine) and it's not obvious that the > reason for the difference is that one machine is using a fallback and > the other is not. I agree with this worry. > In terms of what that faster and better thing might be, AFAICS, there > are two main options. First, we could proceed with the approach you've > tried here, namely requiring everybody to get Platypus::FFI. I find > that it's included in MacPorts on my machine, which is at least > somewhat of a good sign that perhaps this is fairly widely available. I did a bit of research on this on my favorite platforms, and did not like the results: RHEL8: does not seem to be packaged at all. Fedora 37: available, but the dependencies are a bit much: $ sudo yum install perl-FFI-Platypus Last metadata expiration check: 2:07:42 ago on Wed Oct 18 08:05:40 2023. Dependencies resolved. ================================================================================ Package Architecture Version Repository Size ================================================================================ Installing: perl-FFI-Platypus x86_64 2.08-1.fc37 updates 417 k Installing dependencies: libgfortran x86_64 12.3.1-1.fc37 updates 904 k libquadmath x86_64 12.3.1-1.fc37 updates 206 k libquadmath-devel x86_64 12.3.1-1.fc37 updates 48 k perl-FFI-CheckLib noarch 0.29-2.fc37 updates 29 k Installing weak dependencies: gcc-gfortran x86_64 12.3.1-1.fc37 updates 12 M Transaction Summary ================================================================================ Install 6 Packages Total download size: 14 M Installed size: 37 M Is this ok [y/N]: gfortran? Really?? I mean, I don't care about the disk space, but this is not promising for anyone who has to build it themselves. So I'm afraid that requiring Platypus::FFI might be a bridge too far for a lot of our older buildfarm machines. > Another thing, also already mentioned, that we can do is cache psql > connections instead of continuously respawing psql. This seems like it's worth thinking about. I agree with requiring the re-use to be explicit within a TAP test, else we might have mysterious behavioral changes (akin to connection-pooler-induced bugs). regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan <andrew@dunslane.net> wrote: >>>> How much burden is it? Would anyone actually mind? > >> ... At the same time, fallbacks can be a problem too, >> because then you can end up with things that work one way on one >> developer's machine (or BF machine) and another way on another >> developer's machine (or BF machine) and it's not obvious that the >> reason for the difference is that one machine is using a fallback and >> the other is not. > > I agree with this worry. > >> In terms of what that faster and better thing might be, AFAICS, there >> are two main options. First, we could proceed with the approach you've >> tried here, namely requiring everybody to get Platypus::FFI. I find >> that it's included in MacPorts on my machine, which is at least >> somewhat of a good sign that perhaps this is fairly widely available. > > I did a bit of research on this on my favorite platforms, and did > not like the results: > > RHEL8: does not seem to be packaged at all. > > Fedora 37: available, but the dependencies are a bit much: > > $ sudo yum install perl-FFI-Platypus > Last metadata expiration check: 2:07:42 ago on Wed Oct 18 08:05:40 2023. > Dependencies resolved. > ================================================================================ > Package Architecture Version Repository Size > ================================================================================ > Installing: > perl-FFI-Platypus x86_64 2.08-1.fc37 updates 417 k > Installing dependencies: > libgfortran x86_64 12.3.1-1.fc37 updates 904 k > libquadmath x86_64 12.3.1-1.fc37 updates 206 k > libquadmath-devel x86_64 12.3.1-1.fc37 updates 48 k > perl-FFI-CheckLib noarch 0.29-2.fc37 updates 29 k > Installing weak dependencies: > gcc-gfortran x86_64 12.3.1-1.fc37 updates 12 M > > Transaction Summary > ================================================================================ > Install 6 Packages > > Total download size: 14 M > Installed size: 37 M > Is this ok [y/N]: > > gfortran? Really?? I mean, I don't care about the disk space, > but this is not promising for anyone who has to build it themselves. The Fortran support for FFI::Platypus is in a separate CPAN distribution (FFI-Platypus-Lang-Fortran), so that must be some quirk of the Fedora packaging and not a problem for people building it themselves. They just need libffi and a working Perl/CPAN setup. On Debian the only things besides Perl and core perl modules it (build-)depends on are libffi, Capture::Tiny, FFI::Checklib (which depends on File::Which), Test2::Suite and pkg-config. - ilmari
On Wed, Oct 18, 2023 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I did a bit of research on this on my favorite platforms, and did > not like the results: Hmm. That's unfortunate. Is perl -MCPAN -e 'install Platypus::FFI' a viable alternative? -- Robert Haas EDB: http://www.enterprisedb.com
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> gfortran? Really?? I mean, I don't care about the disk space, >> but this is not promising for anyone who has to build it themselves. > The Fortran support for FFI::Platypus is in a separate CPAN distribution > (FFI-Platypus-Lang-Fortran), so that must be some quirk of the Fedora > packaging and not a problem for people building it themselves. Ah, they must've decided to combine FFI-Platypus-Lang-Fortran into the same RPM. Not quite as bad then, since clearly we don't need that. Another thing to worry about is whether it runs on the oldest perl versions we support. I tried it on a 5.14.0 installation, and it at least compiles and passes its self-test, so that's promising. "cpanm install FFI::Platypus" took about 5 minutes (on a 2012 mac mini, not the world's fastest machine). The list of dependencies it pulled in is still kinda long: Capture-Tiny-0.48 ExtUtils-ParseXS-3.51 Test-Simple-1.302195 constant-1.33 Scalar-List-Utils-1.63 Term-Table-0.017 Test2-Suite-0.000156 File-Which-1.27 FFI-CheckLib-0.31 Try-Tiny-0.31 Test-Fatal-0.017 Test-Needs-0.002010 Test-Warnings-0.032 URI-5.21 Algorithm-Diff-1.201 Text-Diff-1.45 Spiffy-0.46 Test-Base-0.89 Test-YAML-1.07 Test-Deep-1.204 YAML-1.30 File-chdir-0.1011 Path-Tiny-0.144 Alien-Build-2.80 Alien-Build-Plugin-Download-GitHub-0.10 Net-SSLeay-1.92 HTTP-Tiny-0.088 Mozilla-CA-20230821 IO-Socket-SSL-2.083 Mojo-DOM58-3.001 Alien-FFI-0.27 FFI-Platypus-2.08 A couple of these are things a buildfarm animal would need anyway, but on the whole it seems like this is pretty far up the food chain compared to our previous set of TAP dependencies (only three non-core-Perl modules). Still, writing our own equivalent is probably more work than it's worth, if this is a suitable solution in all other respects. Having said that ... I read the man page for FFI::Platypus, and I'm failing to draw a straight line between what it can do and what we need. Aren't we going to need a big chunk of new Perl code anyway? If so, why not write a big chunk of new Perl that doesn't have new external dependencies? regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Oct 18, 2023 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I did a bit of research on this on my favorite platforms, and did >> not like the results: > Hmm. That's unfortunate. Is perl -MCPAN -e 'install Platypus::FFI' a > viable alternative? Probably, see my followup. regards, tom lane
On Wed, Oct 18, 2023 at 11:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Having said that ... I read the man page for FFI::Platypus, > and I'm failing to draw a straight line between what it can do > and what we need. Aren't we going to need a big chunk of new > Perl code anyway? If so, why not write a big chunk of new Perl > that doesn't have new external dependencies? I think that the question here is what exactly the Perl code that we'd have to write would be doing. If we use FFI::Platypus, our Perl code can directly call libpq functions. The Perl code would just be a wrapper around those function calls, basically. I'm sure there's some work to be done there but a lot of it is probably boilerplate. Without FFI::Platypus, we have to write Perl code that can speak the wire protocol directly. Basically, we're writing our own PostgreSQL driver for Perl, though we might need only a subset of the things a real driver would need to handle, and we might add some extra things, like code that can send intentionally botched protocol messages. I think it's a judgement call which is better. Depending on FFI::Platypus is annoying, because nobody likes dependencies. But writing a new implementation of the wire protocol is probably more work, and once we wrote it, we'd also need to maintain it and debug it and stuff. We would probably be able to gain some test coverage of situations that libpq won't let you create, but we would also perhaps lose some test coverage for libpq itself. I feel like either way is a potentially viable way forward, and where we end up might end up depending on who is willing to do the work and what that person would prefer to do. -- Robert Haas EDB: http://www.enterprisedb.com
On 2023-Oct-18, Robert Haas wrote: > Without FFI::Platypus, we have to write Perl code that can speak the > wire protocol directly. Basically, we're writing our own PostgreSQL > driver for Perl, though we might need only a subset of the things a > real driver would need to handle, and we might add some extra things, > like code that can send intentionally botched protocol messages. We could revive the old src/interfaces/perl5 code, which was a libpq wrapper -- at least the subset of it that the tests need. It was moved to gborg by commit 9a0b4d7f8474 and a couple more versions were made there, which can be seen at https://www.postgresql.org/ftp/projects/gborg/pgperl/stable/, version 2.1.1 being apparently the latest. The complete driver was about 3000 lines, judging by the commit that removed it. Presumably we don't need the whole of that. Apparently the project was migrated from gborg to pgFoundry at some point, because this exists https://www.postgresql.org/ftp/projects/pgFoundry/pgperl/ and maybe they did some additional changes there, but at least our FTP site doesn't show anything. Perhaps there were never any releases, and we don't have the CVSROOT. But I doubt any changes at that point would have been critical. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On 2023-10-18 We 11:47, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Oct 18, 2023 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I did a bit of research on this on my favorite platforms, and did >>> not like the results: >> Hmm. That's unfortunate. Is perl -MCPAN -e 'install Platypus::FFI' a >> viable alternative? > Probably, see my followup. > > Interesting. OK, here's an attempt to push the cart a bit further down the road. The attached module wraps quite a lot of libpq, at least enough for most of the cases we would be interested in, I think. It also exports some constants such as connection status values, query status values, transaction status values and type Oids. It also makes the absence of FFI::Platypus not instantly fatal, but any attempt to use one of the wrapped functions will die with a message about the module being missing if it's not found. I guess the next step would be for someone to locate some of the hotspots in the TAP tests and try to convert them to using persistent connections with this gadget or similar and see how much faster we can make them. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Attachment
On Wed, 18 Oct 2023 18:25:01 +0200 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2023-Oct-18, Robert Haas wrote: > > > Without FFI::Platypus, we have to write Perl code that can speak the > > wire protocol directly. Basically, we're writing our own PostgreSQL > > driver for Perl, though we might need only a subset of the things a > > real driver would need to handle, and we might add some extra things, > > like code that can send intentionally botched protocol messages. > > We could revive the old src/interfaces/perl5 code, which was a libpq > wrapper -- at least the subset of it that the tests need. It was moved > to gborg by commit 9a0b4d7f8474 and a couple more versions were made > there, which can be seen at > https://www.postgresql.org/ftp/projects/gborg/pgperl/stable/, > version 2.1.1 being apparently the latest. The complete driver was > about 3000 lines, judging by the commit that removed it. Presumably we > don't need the whole of that. +1 to test this. I can give it some time to revive it and post results here if you agree and no one think of some show stopper.
On Wed, Aug 30, 2023 at 09:48:42AM +1200, Thomas Munro wrote: > On Wed, Aug 30, 2023 at 1:49 AM Noah Misch <noah@leadboat.com> wrote: > > On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote: > > > On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote: > > > > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929 > > > > > > Interesting. But that shows a case with no pipes connected, using > > > select() as a dumb sleep and ignoring SIGCHLD. In our usage we have > > > pipes connected, and I think select() should return when the child's > > > output pipes become readable due to EOF. I guess something about that > > > might be b0rked on Windows? I see there is an extra helper process > > > doing socket<->pipe conversion (hah, that explains an extra ~10ms at > > > the start in my timestamps)... > > > > In that case, let's assume it's not the same issue. > > Yeah, I think it amounts to the same thing, if EOF never arrives. > > I suspect that we could get ->safe_psql() down to about ~25ms baseline > if someone could fix the posited IPC::Run EOF bug I pushed optimizations in https://github.com/cpan-authors/IPC-Run/pull/172 that make the TAP portion of "make check-world" 7% faster on my GNU/Linux machine. I didn't confirm an EOF bug, but that change also reduces Windows idle time in simple tests. I didn't run Windows check-world with it. For non-Windows, we can get almost all the benefit from the attached one-liner. (The relative benefit is probably lower for parallel check-world, where idle threads matter less, and for slower machines.)