Thread: Query execution in Perl TAP tests needs work

Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

28 August 2023, 05:29:56

Hi,

Every time we run a SQL query, we fork a new psql process and a new
cold backend process.  It's not free on Unix, and quite a lot worse on
Windows, at around 70ms per query.  Take amcheck/001_verify_heapam for
example.  It runs 272 subtests firing off a stream of queries, and
completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on
CI.

Here are some timestamps I captured from CI by instrumenting various
Perl and C bits:

0.000s: IPC::Run starts
0.023s:   postmaster socket sees connection
0.025s:   postmaster has created child process
0.033s:     backend starts running main()
0.039s:     backend has reattached to shared memory
0.043s:     backend connection authorized message
0.046s:     backend has executed and logged query
0.070s: IPC::Run returns

I expected process creation to be slow on that OS, but it seems like
something happening at the end is even slower.  CI shows Windows
consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
that finishes in 2-3 minutes everywhere else with the same number of
CPUs.  Could there be an event handling snafu in IPC::Run or elsewhere
nearby?  It seems like there must be either a busy loop or a busted
sleep/wakeup... somewhere?  But even if there's a weird bug here
waiting to be discovered and fixed, I guess it'll always be too slow
at ~10ms per process spawned, with two processes to spawn, and it's
bad enough on Unix.

As an experiment, I hacked up a not-good-enough-to-share experiment
where $node->safe_psql() would automatically cache a BackgroundPsql
object and reuse it, and the times for that test dropped ~51 -> ~9s on
Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
(well it's certainly non-ideal the way I hacked it up anyway...).  I
suppose there are quite a few ways we could do better:

1.  Don't fork anything at all: open (and cache) a connection directly
from Perl.
1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
popular Perl xsub library?
1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
of some existing one.
2.  Use long-lived psql sessions.
2a.  Something building on BackgroundPsql.
2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
protocol that is more fun to talk to from Perl/machines?

In some other languages one can do FFI pretty easily so we could use
the in-tree libpq without extra dependencies:

>>> import ctypes
>>> libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so")
>>> libpq.PQlibVersion()
170000

... but it seems you can't do either static C bindings or runtime FFI
from Perl without adding a new library/package dependency.  I'm not
much of a Perl hacker so I don't have any particular feeling.  What
would be best?

This message brought to you by the Lorax.

Re: Query execution in Perl TAP tests needs work

From

Andrew Dunstan

Date:

28 August 2023, 13:23:16

On 2023-08-28 Mo 01:29, Thomas Munro wrote:

Hi,

Every time we run a SQL query, we fork a new psql process and a new
cold backend process.  It's not free on Unix, and quite a lot worse on
Windows, at around 70ms per query.  Take amcheck/001_verify_heapam for
example.  It runs 272 subtests firing off a stream of queries, and
completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on
CI.

Here are some timestamps I captured from CI by instrumenting various
Perl and C bits:

0.000s: IPC::Run starts
0.023s:   postmaster socket sees connection
0.025s:   postmaster has created child process
0.033s:     backend starts running main()
0.039s:     backend has reattached to shared memory
0.043s:     backend connection authorized message
0.046s:     backend has executed and logged query
0.070s: IPC::Run returns

I expected process creation to be slow on that OS, but it seems like
something happening at the end is even slower.  CI shows Windows
consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
that finishes in 2-3 minutes everywhere else with the same number of
CPUs.  Could there be an event handling snafu in IPC::Run or elsewhere
nearby?  It seems like there must be either a busy loop or a busted
sleep/wakeup... somewhere?  But even if there's a weird bug here
waiting to be discovered and fixed, I guess it'll always be too slow
at ~10ms per process spawned, with two processes to spawn, and it's
bad enough on Unix.

As an experiment, I hacked up a not-good-enough-to-share experiment
where $node->safe_psql() would automatically cache a BackgroundPsql
object and reuse it, and the times for that test dropped ~51 -> ~9s on
Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
(well it's certainly non-ideal the way I hacked it up anyway...).  I
suppose there are quite a few ways we could do better:

1.  Don't fork anything at all: open (and cache) a connection directly
from Perl.
1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
popular Perl xsub library?
1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
of some existing one.
2.  Use long-lived psql sessions.
2a.  Something building on BackgroundPsql.
2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
protocol that is more fun to talk to from Perl/machines?

In some other languages one can do FFI pretty easily so we could use
the in-tree libpq without extra dependencies:

import ctypes
libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so")
libpq.PQlibVersion()

170000

... but it seems you can't do either static C bindings or runtime FFI
from Perl without adding a new library/package dependency.  I'm not
much of a Perl hacker so I don't have any particular feeling.  What
would be best?

This message brought to you by the Lorax.

Thanks for raising this. Windows test times have bothered me for ages.

The standard perl DBI library has a connect_cached method. Of course we don't want to be dependent on it, especially if we might have changed libpq in what we're testing, and it would place a substantial new burden on testers like buildfarm owners.

I like the idea of using a pure perl pq implementation, not least because it could expand our ability to test things at the protocol level. Not sure how much work it would be. I'm willing to help if we want to go that way.

Yes you need an external library to use FFI in perl, but there's one that's pretty tiny. See <https://metacpan.org/pod/FFI::Library>. There is also FFI::Platypus, but it involves building a library. OTOH, that's the one that's available standard on my Fedora and Ubuntu systems. I haven't tried using either Maybe we could use some logic that would use the FFI interface if it's available, and fall back on current usage.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Query execution in Perl TAP tests needs work

From

Noah Misch

Date:

29 August 2023, 01:48:19

On Mon, Aug 28, 2023 at 05:29:56PM +1200, Thomas Munro wrote:
> CI shows Windows
> consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
> that finishes in 2-3 minutes everywhere else with the same number of
> CPUs.  Could there be an event handling snafu in IPC::Run or elsewhere
> nearby?  It seems like there must be either a busy loop or a busted
> sleep/wakeup... somewhere?

That smells like this one:
https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929

> As an experiment, I hacked up a not-good-enough-to-share experiment
> where $node->safe_psql() would automatically cache a BackgroundPsql
> object and reuse it, and the times for that test dropped ~51 -> ~9s on
> Windows, and ~7 -> ~2s on the Unixen.

Nice!

> suppose there are quite a few ways we could do better:
> 
> 1.  Don't fork anything at all: open (and cache) a connection directly
> from Perl.
> 1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
> popular Perl xsub library?
> 1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
> of some existing one.
> 2.  Use long-lived psql sessions.
> 2a.  Something building on BackgroundPsql.
> 2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> protocol that is more fun to talk to from Perl/machines?

(2a) seems adequate and easiest to achieve.  If DBD::Pg were under a
compatible license, I'd say use it as the vendor for (1a).  Maybe we can get
it relicensed?  Building a separate way of connecting from Perl would be sad.

Re: Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

29 August 2023, 04:25:24

On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote:
> https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929

Interesting.  But that shows a case with no pipes connected, using
select() as a dumb sleep and ignoring SIGCHLD.  In our usage we have
pipes connected, and I think select() should return when the child's
output pipes become readable due to EOF.  I guess something about that
might be b0rked on Windows?  I see there is an extra helper process
doing socket<->pipe conversion (hah, that explains an extra ~10ms at
the start in my timestamps)... I can't really follow that code, but
perhaps the parent forgot to close the far end of the socket pair, so
there is no EOF?

Re: Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

29 August 2023, 04:33:48

On Tue, Aug 29, 2023 at 1:23 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> I like the idea of using a pure perl pq implementation, not least because it could expand our ability to test things
atthe protocol level. Not sure how much work it would be. I'm willing to help if we want to go that way. 

Cool.  Let's see what others think.

And assuming we can pick *something* vaguely efficient and find a Perl
hacker to implement it, a related question is how to expose it to our
test suites.

Should we figure out how to leave all our tests unchanged, by teaching
$node->psql() et al to do caching implicitly?  Or should we make it
explicit, with $conn = $node->connect(), and $conn->do_stuff()?  And
if the latter, should do_stuff be DBI style or something that suits us
better?

Re: Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

29 August 2023, 06:41:59

On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote:
> On Mon, Aug 28, 2023 at 05:29:56PM +1200, Thomas Munro wrote:
> > 1.  Don't fork anything at all: open (and cache) a connection directly
> > from Perl.
> > 1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
> > popular Perl xsub library?
> > 1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
> > of some existing one.
> > 2.  Use long-lived psql sessions.
> > 2a.  Something building on BackgroundPsql.
> > 2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> > protocol that is more fun to talk to from Perl/machines?
>
> (2a) seems adequate and easiest to achieve.  If DBD::Pg were under a
> compatible license, I'd say use it as the vendor for (1a).  Maybe we can get
> it relicensed?  Building a separate way of connecting from Perl would be sad.

Here's my minimal POC of 2a.  It only changes ->safe_psql() and not
the various other things like ->psql() and ->poll_query_until().
Hence use of amcheck/001_verify_heapam as an example: it runs a lot of
safe_psql() queries.  It fails in all kinds of ways if enabled
generally, which would take some investigation (some tests require
there to be no connections at various times, so we'd probably need to
insert disable/re-enable calls at various places).

Attachment

0001-XXX-cache-psql-processes.patch

Re: Query execution in Perl TAP tests needs work

From

Noah Misch

Date:

29 August 2023, 13:49:26

On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote:
> On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote:
> > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929
> 
> Interesting.  But that shows a case with no pipes connected, using
> select() as a dumb sleep and ignoring SIGCHLD.  In our usage we have
> pipes connected, and I think select() should return when the child's
> output pipes become readable due to EOF.  I guess something about that
> might be b0rked on Windows?  I see there is an extra helper process
> doing socket<->pipe conversion (hah, that explains an extra ~10ms at
> the start in my timestamps)...

In that case, let's assume it's not the same issue.

Re: Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

29 August 2023, 21:48:42

On Wed, Aug 30, 2023 at 1:49 AM Noah Misch <noah@leadboat.com> wrote:
> On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote:
> > On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote:
> > > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929
> >
> > Interesting.  But that shows a case with no pipes connected, using
> > select() as a dumb sleep and ignoring SIGCHLD.  In our usage we have
> > pipes connected, and I think select() should return when the child's
> > output pipes become readable due to EOF.  I guess something about that
> > might be b0rked on Windows?  I see there is an extra helper process
> > doing socket<->pipe conversion (hah, that explains an extra ~10ms at
> > the start in my timestamps)...
>
> In that case, let's assume it's not the same issue.

Yeah, I think it amounts to the same thing, if EOF never arrives.

I suspect that we could get ->safe_psql() down to about ~25ms baseline
if someone could fix the posited IPC::Run EOF bug and change its
internal helper process to a helper thread.  Even if we fix tests to
reuse backends, I expect that'd help CI measurably.  (The native way
would be to use pipes directly, changing select() to
WaitForMultipleObjects(), but I suspect IPC::Run might have painted
itself into a corner by exposing the psuedo-pipes and documenting that
they can be used with select().  Oh well.)

Re: Query execution in Perl TAP tests needs work

From

Andres Freund

Date:

30 August 2023, 22:19:26

Hi,

On 2023-08-28 17:29:56 +1200, Thomas Munro wrote:
> Every time we run a SQL query, we fork a new psql process and a new
> cold backend process.  It's not free on Unix, and quite a lot worse on
> Windows, at around 70ms per query.  Take amcheck/001_verify_heapam for
> example.  It runs 272 subtests firing off a stream of queries, and
> completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on
> CI.

Whoa.

> Here are some timestamps I captured from CI by instrumenting various
> Perl and C bits:
> 
> 0.000s: IPC::Run starts
> 0.023s:   postmaster socket sees connection
> 0.025s:   postmaster has created child process
> 0.033s:     backend starts running main()
> 0.039s:     backend has reattached to shared memory
> 0.043s:     backend connection authorized message
> 0.046s:     backend has executed and logged query
> 0.070s: IPC::Run returns
> 
> I expected process creation to be slow on that OS, but it seems like
> something happening at the end is even slower.  CI shows Windows
> consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
> that finishes in 2-3 minutes everywhere else with the same number of
> CPUs.

It finishes in that time on linux, even with sanitizers enabled...


> As an experiment, I hacked up a not-good-enough-to-share experiment
> where $node->safe_psql() would automatically cache a BackgroundPsql
> object and reuse it, and the times for that test dropped ~51 -> ~9s on
> Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
> (well it's certainly non-ideal the way I hacked it up anyway...).  I
> suppose there are quite a few ways we could do better:

That's a really impressive win.

Even if we "just" converted some of the safe_psql() cases and converted
poll_query_until() to this, we'd win a lot.


> 1.  Don't fork anything at all: open (and cache) a connection directly
> from Perl.

One advantage of that is that the socket is entirely controlled by perl, so
waiting for IO should be easy...


> 2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> protocol that is more fun to talk to from Perl/machines?

That does also seem promising - a good chunk of the complexity around some of
the IPC::Run uses is that we end up parsing psql input/output...

Greetings,

Andres Freund

Re: Query execution in Perl TAP tests needs work

From

Andrew Dunstan

Date:

30 August 2023, 22:32:30

On 2023-08-28 Mo 09:23, Andrew Dunstan wrote:

On 2023-08-28 Mo 01:29, Thomas Munro wrote:

Hi,

Every time we run a SQL query, we fork a new psql process and a new
cold backend process.  It's not free on Unix, and quite a lot worse on
Windows, at around 70ms per query.  Take amcheck/001_verify_heapam for
example.  It runs 272 subtests firing off a stream of queries, and
completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on
CI.

Here are some timestamps I captured from CI by instrumenting various
Perl and C bits:

0.000s: IPC::Run starts
0.023s:   postmaster socket sees connection
0.025s:   postmaster has created child process
0.033s:     backend starts running main()
0.039s:     backend has reattached to shared memory
0.043s:     backend connection authorized message
0.046s:     backend has executed and logged query
0.070s: IPC::Run returns

I expected process creation to be slow on that OS, but it seems like
something happening at the end is even slower.  CI shows Windows
consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
that finishes in 2-3 minutes everywhere else with the same number of
CPUs.  Could there be an event handling snafu in IPC::Run or elsewhere
nearby?  It seems like there must be either a busy loop or a busted
sleep/wakeup... somewhere?  But even if there's a weird bug here
waiting to be discovered and fixed, I guess it'll always be too slow
at ~10ms per process spawned, with two processes to spawn, and it's
bad enough on Unix.

As an experiment, I hacked up a not-good-enough-to-share experiment
where $node->safe_psql() would automatically cache a BackgroundPsql
object and reuse it, and the times for that test dropped ~51 -> ~9s on
Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
(well it's certainly non-ideal the way I hacked it up anyway...).  I
suppose there are quite a few ways we could do better:

1.  Don't fork anything at all: open (and cache) a connection directly
from Perl.
1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
popular Perl xsub library?
1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
of some existing one.
2.  Use long-lived psql sessions.
2a.  Something building on BackgroundPsql.
2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
protocol that is more fun to talk to from Perl/machines?

In some other languages one can do FFI pretty easily so we could use
the in-tree libpq without extra dependencies:

import ctypes
libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so")
libpq.PQlibVersion()

170000

... but it seems you can't do either static C bindings or runtime FFI
from Perl without adding a new library/package dependency.  I'm not
much of a Perl hacker so I don't have any particular feeling.  What
would be best?

This message brought to you by the Lorax.

Thanks for raising this. Windows test times have bothered me for ages.

I had a brief play with this. Here's how easy it was to wrap libpq in perl:

#!/usr/bin/perl

use strict; use warnings;

use FFI::Platypus;

my $ffi = FFI::Platypus->new(api=>1);
$ffi->lib("inst/lib/libpq.so");

$ffi->type('opaque' => 'PGconn');
$ffi->attach(PQconnectdb => [ 'string' ] => 'PGconn');
$ffi->attach(PQfinish => [ 'PGconn' ] => 'void');

$ffi->type('opaque' => 'PGresult');
$ffi->attach(PQexec => [ 'PGconn', 'string' ] => 'PGresult');
$ffi->attach(PQgetvalue => [ 'PGresult', 'int', 'int' ] => 'string');

my $pgconn = PQconnectdb("dbname=postgres host=/tmp");
my $res = PQexec($pgconn, "select count(*) from pg_class");
my $count = PQgetvalue( $res, 0, 0);

print "count: $count\n";

PQfinish($pgconn);

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

31 August 2023, 01:29:37

On Thu, Aug 31, 2023 at 10:32 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> #!/usr/bin/perl
>
> use strict; use warnings;
>
> use FFI::Platypus;
>
> my $ffi = FFI::Platypus->new(api=>1);
> $ffi->lib("inst/lib/libpq.so");
>
>
> $ffi->type('opaque' => 'PGconn');
> $ffi->attach(PQconnectdb => [ 'string' ] => 'PGconn');
> $ffi->attach(PQfinish => [ 'PGconn' ] => 'void');
>
> $ffi->type('opaque' => 'PGresult');
> $ffi->attach(PQexec => [ 'PGconn', 'string' ] => 'PGresult');
> $ffi->attach(PQgetvalue => [ 'PGresult', 'int', 'int' ] => 'string');
>
> my $pgconn = PQconnectdb("dbname=postgres host=/tmp");
> my $res = PQexec($pgconn, "select count(*) from pg_class");
> my $count = PQgetvalue( $res, 0, 0);
>
> print "count: $count\n";
>
> PQfinish($pgconn);

It looks very promising so far.  How hard would it be for us to add
this dependency?  Mostly pinging build farm owners?

I'm still on the fence, but the more I know about IPC::Run, the better
the various let's-connect-directly-from-Perl options sound...

Re: Query execution in Perl TAP tests needs work

From

Andrew Dunstan

Date:

02 September 2023, 18:42:42

On 2023-08-30 We 21:29, Thomas Munro wrote:

On Thu, Aug 31, 2023 at 10:32 AM Andrew Dunstan <andrew@dunslane.net> wrote:

#!/usr/bin/perl

use strict; use warnings;

use FFI::Platypus;

my $ffi = FFI::Platypus->new(api=>1);
$ffi->lib("inst/lib/libpq.so");


$ffi->type('opaque' => 'PGconn');
$ffi->attach(PQconnectdb => [ 'string' ] => 'PGconn');
$ffi->attach(PQfinish => [ 'PGconn' ] => 'void');

$ffi->type('opaque' => 'PGresult');
$ffi->attach(PQexec => [ 'PGconn', 'string' ] => 'PGresult');
$ffi->attach(PQgetvalue => [ 'PGresult', 'int', 'int' ] => 'string');

my $pgconn = PQconnectdb("dbname=postgres host=/tmp");
my $res = PQexec($pgconn, "select count(*) from pg_class");
my $count = PQgetvalue( $res, 0, 0);

print "count: $count\n";

PQfinish($pgconn);

It looks very promising so far.  How hard would it be for us to add
this dependency?  Mostly pinging build farm owners?

I'm still on the fence, but the more I know about IPC::Run, the better
the various let's-connect-directly-from-Perl options sound...

Here's some progress. I have put it all in a perl module, which I have tested on Windows (both mingw and MSVC) as well as Ubuntu. I think this is probably something worth having in itself. I wrapped a substantial portion of libpq, but left out things to do with large objects, async processing, pipelining, SSL and some other things. We can fill in the gaps in due course.

The test program now looks like this:

use strict;
use warnings;

use lib ".";
use PqFFI;

PqFFI::setup("inst/lib");

my $conn = PQconnectdb("dbname=postgres host=/tmp");
my $res = PQexec($conn, 'select count(*) from pg_class');
my $count = PQgetvalue($res,0,0);
print "$count rows in pg_class\n";
PQfinish($conn);

I guess the next thing would be to test it on a few more platforms and also to see if we need to expand the coverage of libpq for the intended uses.

I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of fallback in case this isn't supported on some platform, either due to technological barriers or buildfarm owner reluctance.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

PqFFI.pm

Re: Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

03 September 2023, 00:17:56

On Sun, Sep 3, 2023 at 6:42 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> I guess the next thing would be to test it on a few more platforms and also to see if we need to expand the coverage
oflibpq for the intended uses. 

Nice.  It works fine on my FreeBSD battlestation after "sudo pkg
install p5-FFI-Platypus" and adjusting that lib path.  I wonder if
there is a nice way to extract those constants from our headers...

It's using https://sourceware.org/libffi/ under the covers (like most
other scripting language FFI things), and that knows calling
conventions for everything we care about including weird OSes and
architectures.  It might be a slight pain to build it on systems that
have no package manager, if cpan can't do it for you?  I guess AIX
would be the most painful?

(Huh, while contemplating trying that, I just noticed that the GCC
build farm's AIX 7.2 system seems to have given up the ghost a few
weeks ago.  I wonder if it'll come back online with the current
release, or if that's the end...  There is still the
overloaded-to-the-point-of-being-hard-to-interact-with AIX 7.1 (=EOL)
machine.)

> I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of
fallbackin case this isn't supported on some platform, either due to technological barriers or buildfarm owner
reluctance.

I guess you're thinking that it could be done in such a way that it is
automatically used for $node->safe_psql() and various other things if
Platypus is detected, but otherwise forking psql as now, for a
transition period?  Then we could nag build farm owners, and
eventually turn off the fallback stuff after N months.  After that it
would begin to be possible to use this in more interesting and
advanced ways.

Re: Query execution in Perl TAP tests needs work

From

Andrew Dunstan

Date:

04 September 2023, 15:25:39

On 2023-09-02 Sa 20:17, Thomas Munro wrote:

On Sun, Sep 3, 2023 at 6:42 AM Andrew Dunstan <andrew@dunslane.net> wrote:

I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of fallback in case this isn't supported on some platform, either due to technological barriers or buildfarm owner reluctance.

I guess you're thinking that it could be done in such a way that it is
automatically used for $node->safe_psql() and various other things if
Platypus is detected, but otherwise forking psql as now, for a
transition period?  Then we could nag build farm owners, and
eventually turn off the fallback stuff after N months.  After that it
would begin to be possible to use this in more interesting and
advanced ways.

Yeah, that would be ideal. I'll prep a version of the module that doesn't fail if FFI::Platypus isn't available, but instead sets a flag we can test.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Query execution in Perl TAP tests needs work

From

Thomas Munro

Date:

11 September 2023, 20:52:50

On Sun, Sep 3, 2023 at 12:17 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> (Huh, while contemplating trying that, I just noticed that the GCC
> build farm's AIX 7.2 system seems to have given up the ghost a few
> weeks ago.  I wonder if it'll come back online with the current
> release, or if that's the end...  There is still the
> overloaded-to-the-point-of-being-hard-to-interact-with AIX 7.1 (=EOL)
> machine.)

FTR it (gcc119) appears to have come back online, now upgraded to AIX
7.3.  No reports from "hoverfly" (I think it was on that host?).  It
probably needs some attention to start working again after the
upgrade.

Re: Query execution in Perl TAP tests needs work

From

Andres Freund

Date:

13 September 2023, 04:14:12

Hi,

On 2023-08-28 17:29:56 +1200, Thomas Munro wrote:
> As an experiment, I hacked up a not-good-enough-to-share experiment
> where $node->safe_psql() would automatically cache a BackgroundPsql
> object and reuse it, and the times for that test dropped ~51 -> ~9s on
> Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
> (well it's certainly non-ideal the way I hacked it up anyway...).  I
> suppose there are quite a few ways we could do better:
> 
> 1.  Don't fork anything at all: open (and cache) a connection directly
> from Perl.
> 1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
> popular Perl xsub library?
> 1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
> of some existing one.
> 2.  Use long-lived psql sessions.
> 2a.  Something building on BackgroundPsql.
> 2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> protocol that is more fun to talk to from Perl/machines?

While we can't easily use plain long-lived psql everywhere, due to tests
depending on no additional connections being present, we could at least
partially address that by adding a \disconnect to psql.  Based on your numbers
the subprocess that IPC::Run wraps around psql on windows is a substantial
part of the overhead.  Even if we default to reconnecting after every
->psql(), just saving the fork of the wrapper process and psql should reduce
costs substantially.

Famous last words, but it seems like that it should be quite doable to add
that to psql and use it in Cluster->{psql,safe_psql,poll_query_until}? There
might be a few cases that expect the full connection error message, but I
can't imagine that to be too many?

Greetings,

Andres Freund

Re: Query execution in Perl TAP tests needs work

From

Robert Haas

Date:

18 October 2023, 14:03:06

On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of
fallbackin case this isn't supported on some platform, either due to technological barriers or buildfarm owner
reluctance.

How much burden is it? Would anyone actually mind?

I definitely don't want to put ourselves in a situation where we add a
bunch of annoying dependencies that are required to be able to run the
tests, not just because it will inconvenience buildfarm owners, but
also because it will potentially inconvenience developers, and in
particular, me. At the same time, fallbacks can be a problem too,
because then you can end up with things that work one way on one
developer's machine (or BF machine) and another way on another
developer's machine (or BF machine) and it's not obvious that the
reason for the difference is that one machine is using a fallback and
the other is not. I feel like this tends to create so much aggravation
in practice that it's best not to have fallbacks in this kind of
situation - my vote is that we either stick with the current method
and live with the performance characteristics thereof, or we put in
place something that is faster and better and that new thing becomes a
hard dependency for anyone who wants to be able to run the TAP tests.

In terms of what that faster and better thing might be, AFAICS, there
are two main options. First, we could proceed with the approach you've
tried here, namely requiring everybody to get Platypus::FFI. I find
that it's included in MacPorts on my machine, which is at least
somewhat of a good sign that perhaps this is fairly widely available.
That might need more investigation, though. Second, we could write a
pure-Perl implementation, as you proposed earlier. That would be more
work to write and maintain, but would avoid needing FFI. Personally, I
feel like either an FFI-based approach or a pure-Perl approach would
be pretty reasonable, as long as Platypus::FFI is widely
available/usable. If we go with pure Perl, the hard part might be
managing the authentication methods, but as Thomas pointed out to me
yesterday, we now have UNIX sockets on Windows, and thus everywhere,
so maybe we could get to a point where the pure-Perl implementation
wouldn't need to do any non-trivial authentication.

Another thing, also already mentioned, that we can do is cache psql
connections instead of continuously respawing psql. That doesn't
require any fundamentally new mechanism, and in some sense it's
independent of the approaches above, because they could be implemented
without caching connections, but they would benefit from caching
connections, as the currently psql-based approach also does. I think
it would be good to introduce new syntax for this, e.g.:

$conn_primary = $node_primary->connect();
$conn_primary->simple_query('whatever');
$conn_primary->simple_query('whatever 2');
$conn_primary->disconnect();

Something like this would require a fairly large amount of mechanical
work to implement across all of our TAP test cases, but I think it
would be effort well spent. If we try to introduce connection caching
"transparently," I think it will turn into another foot-gun that
people keep getting wrong because they don't realize there is magic
under the hood, or forget how it works.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Query execution in Perl TAP tests needs work

From

Tom Lane

Date:

18 October 2023, 14:27:59

Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>>> How much burden is it? Would anyone actually mind?

> ... At the same time, fallbacks can be a problem too,
> because then you can end up with things that work one way on one
> developer's machine (or BF machine) and another way on another
> developer's machine (or BF machine) and it's not obvious that the
> reason for the difference is that one machine is using a fallback and
> the other is not.

I agree with this worry.

> In terms of what that faster and better thing might be, AFAICS, there
> are two main options. First, we could proceed with the approach you've
> tried here, namely requiring everybody to get Platypus::FFI. I find
> that it's included in MacPorts on my machine, which is at least
> somewhat of a good sign that perhaps this is fairly widely available.

I did a bit of research on this on my favorite platforms, and did
not like the results:

RHEL8: does not seem to be packaged at all.

Fedora 37: available, but the dependencies are a bit much:

$ sudo yum install perl-FFI-Platypus
Last metadata expiration check: 2:07:42 ago on Wed Oct 18 08:05:40 2023.
Dependencies resolved.
================================================================================
 Package                 Architecture Version               Repository     Size
================================================================================
Installing:
 perl-FFI-Platypus       x86_64       2.08-1.fc37           updates       417 k
Installing dependencies:
 libgfortran             x86_64       12.3.1-1.fc37         updates       904 k
 libquadmath             x86_64       12.3.1-1.fc37         updates       206 k
 libquadmath-devel       x86_64       12.3.1-1.fc37         updates        48 k
 perl-FFI-CheckLib       noarch       0.29-2.fc37           updates        29 k
Installing weak dependencies:
 gcc-gfortran            x86_64       12.3.1-1.fc37         updates        12 M

Transaction Summary
================================================================================
Install  6 Packages

Total download size: 14 M
Installed size: 37 M
Is this ok [y/N]:

gfortran?   Really??  I mean, I don't care about the disk space,
but this is not promising for anyone who has to build it themselves.

So I'm afraid that requiring Platypus::FFI might be a bridge too
far for a lot of our older buildfarm machines.

> Another thing, also already mentioned, that we can do is cache psql
> connections instead of continuously respawing psql.

This seems like it's worth thinking about.  I agree with requiring
the re-use to be explicit within a TAP test, else we might have
mysterious behavioral changes (akin to connection-pooler-induced
bugs).

            regards, tom lane

Re: Query execution in Perl TAP tests needs work

From

Dagfinn Ilmari Mannsåker

Date:

18 October 2023, 15:06:59

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>>>> How much burden is it? Would anyone actually mind?
>
>> ... At the same time, fallbacks can be a problem too,
>> because then you can end up with things that work one way on one
>> developer's machine (or BF machine) and another way on another
>> developer's machine (or BF machine) and it's not obvious that the
>> reason for the difference is that one machine is using a fallback and
>> the other is not.
>
> I agree with this worry.
>
>> In terms of what that faster and better thing might be, AFAICS, there
>> are two main options. First, we could proceed with the approach you've
>> tried here, namely requiring everybody to get Platypus::FFI. I find
>> that it's included in MacPorts on my machine, which is at least
>> somewhat of a good sign that perhaps this is fairly widely available.
>
> I did a bit of research on this on my favorite platforms, and did
> not like the results:
>
> RHEL8: does not seem to be packaged at all.
>
> Fedora 37: available, but the dependencies are a bit much:
>
> $ sudo yum install perl-FFI-Platypus
> Last metadata expiration check: 2:07:42 ago on Wed Oct 18 08:05:40 2023.
> Dependencies resolved.
> ================================================================================
>  Package                 Architecture Version               Repository     Size
> ================================================================================
> Installing:
>  perl-FFI-Platypus       x86_64       2.08-1.fc37           updates       417 k
> Installing dependencies:
>  libgfortran             x86_64       12.3.1-1.fc37         updates       904 k
>  libquadmath             x86_64       12.3.1-1.fc37         updates       206 k
>  libquadmath-devel       x86_64       12.3.1-1.fc37         updates        48 k
>  perl-FFI-CheckLib       noarch       0.29-2.fc37           updates        29 k
> Installing weak dependencies:
>  gcc-gfortran            x86_64       12.3.1-1.fc37         updates        12 M
>
> Transaction Summary
> ================================================================================
> Install  6 Packages
>
> Total download size: 14 M
> Installed size: 37 M
> Is this ok [y/N]: 
>
> gfortran?   Really??  I mean, I don't care about the disk space,
> but this is not promising for anyone who has to build it themselves.

The Fortran support for FFI::Platypus is in a separate CPAN distribution
(FFI-Platypus-Lang-Fortran), so that must be some quirk of the Fedora
packaging and not a problem for people building it themselves.  They
just need libffi and a working Perl/CPAN setup.

On Debian the only things besides Perl and core perl modules it
(build-)depends on are libffi, Capture::Tiny, FFI::Checklib (which
depends on File::Which), Test2::Suite and pkg-config.

- ilmari

Re: Query execution in Perl TAP tests needs work

From

Robert Haas

Date:

18 October 2023, 15:41:28

On Wed, Oct 18, 2023 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I did a bit of research on this on my favorite platforms, and did
> not like the results:

Hmm. That's unfortunate. Is perl -MCPAN -e 'install Platypus::FFI' a
viable alternative?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Query execution in Perl TAP tests needs work

From

Tom Lane

Date:

18 October 2023, 15:43:29

=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> gfortran?   Really??  I mean, I don't care about the disk space,
>> but this is not promising for anyone who has to build it themselves.

> The Fortran support for FFI::Platypus is in a separate CPAN distribution
> (FFI-Platypus-Lang-Fortran), so that must be some quirk of the Fedora
> packaging and not a problem for people building it themselves.

Ah, they must've decided to combine FFI-Platypus-Lang-Fortran into the
same RPM.  Not quite as bad then, since clearly we don't need that.

Another thing to worry about is whether it runs on the oldest
perl versions we support.  I tried it on a 5.14.0 installation,
and it at least compiles and passes its self-test, so that's
promising.  "cpanm install FFI::Platypus" took about 5 minutes
(on a 2012 mac mini, not the world's fastest machine).  The
list of dependencies it pulled in is still kinda long:

Capture-Tiny-0.48
ExtUtils-ParseXS-3.51
Test-Simple-1.302195
constant-1.33
Scalar-List-Utils-1.63
Term-Table-0.017
Test2-Suite-0.000156
File-Which-1.27
FFI-CheckLib-0.31
Try-Tiny-0.31
Test-Fatal-0.017
Test-Needs-0.002010
Test-Warnings-0.032
URI-5.21
Algorithm-Diff-1.201
Text-Diff-1.45
Spiffy-0.46
Test-Base-0.89
Test-YAML-1.07
Test-Deep-1.204
YAML-1.30
File-chdir-0.1011
Path-Tiny-0.144
Alien-Build-2.80
Alien-Build-Plugin-Download-GitHub-0.10
Net-SSLeay-1.92
HTTP-Tiny-0.088
Mozilla-CA-20230821
IO-Socket-SSL-2.083
Mojo-DOM58-3.001
Alien-FFI-0.27
FFI-Platypus-2.08

A couple of these are things a buildfarm animal would need anyway,
but on the whole it seems like this is pretty far up the food chain
compared to our previous set of TAP dependencies (only three
non-core-Perl modules).

Still, writing our own equivalent is probably more work than it's
worth, if this is a suitable solution in all other respects.

Having said that ... I read the man page for FFI::Platypus,
and I'm failing to draw a straight line between what it can do
and what we need.  Aren't we going to need a big chunk of new
Perl code anyway?  If so, why not write a big chunk of new Perl
that doesn't have new external dependencies?

            regards, tom lane

Re: Query execution in Perl TAP tests needs work

From

Tom Lane

Date:

18 October 2023, 15:47:17

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 18, 2023 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I did a bit of research on this on my favorite platforms, and did
>> not like the results:

> Hmm. That's unfortunate. Is perl -MCPAN -e 'install Platypus::FFI' a
> viable alternative?

Probably, see my followup.

            regards, tom lane

Re: Query execution in Perl TAP tests needs work

From

Robert Haas

Date:

18 October 2023, 15:53:21

On Wed, Oct 18, 2023 at 11:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Having said that ... I read the man page for FFI::Platypus,
> and I'm failing to draw a straight line between what it can do
> and what we need.  Aren't we going to need a big chunk of new
> Perl code anyway?  If so, why not write a big chunk of new Perl
> that doesn't have new external dependencies?

I think that the question here is what exactly the Perl code that we'd
have to write would be doing.

If we use FFI::Platypus, our Perl code can directly call libpq
functions. The Perl code would just be a wrapper around those function
calls, basically. I'm sure there's some work to be done there but a
lot of it is probably boilerplate.

Without FFI::Platypus, we have to write Perl code that can speak the
wire protocol directly. Basically, we're writing our own PostgreSQL
driver for Perl, though we might need only a subset of the things a
real driver would need to handle, and we might add some extra things,
like code that can send intentionally botched protocol messages.

I think it's a judgement call which is better. Depending on
FFI::Platypus is annoying, because nobody likes dependencies. But
writing a new implementation of the wire protocol is probably more
work, and once we wrote it, we'd also need to maintain it and debug it
and stuff. We would probably be able to gain some test coverage of
situations that libpq won't let you create, but we would also perhaps
lose some test coverage for libpq itself.

I feel like either way is a potentially viable way forward, and where
we end up might end up depending on who is willing to do the work and
what that person would prefer to do.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Query execution in Perl TAP tests needs work

From

Alvaro Herrera

Date:

18 October 2023, 16:25:01

On 2023-Oct-18, Robert Haas wrote:

> Without FFI::Platypus, we have to write Perl code that can speak the
> wire protocol directly. Basically, we're writing our own PostgreSQL
> driver for Perl, though we might need only a subset of the things a
> real driver would need to handle, and we might add some extra things,
> like code that can send intentionally botched protocol messages.

We could revive the old src/interfaces/perl5 code, which was a libpq
wrapper -- at least the subset of it that the tests need.  It was moved
to gborg by commit 9a0b4d7f8474 and a couple more versions were made
there, which can be seen at
https://www.postgresql.org/ftp/projects/gborg/pgperl/stable/,
version 2.1.1 being apparently the latest.  The complete driver was
about 3000 lines, judging by the commit that removed it.  Presumably we
don't need the whole of that.

Apparently the project was migrated from gborg to pgFoundry at some
point, because this exists
https://www.postgresql.org/ftp/projects/pgFoundry/pgperl/
and maybe they did some additional changes there, but at least
our FTP site doesn't show anything.  Perhaps there were never any
releases, and we don't have the CVSROOT.  But I doubt any changes at
that point would have been critical.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Re: Query execution in Perl TAP tests needs work

From

Andrew Dunstan

Date:

24 October 2023, 12:48:58

On 2023-10-18 We 11:47, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Oct 18, 2023 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I did a bit of research on this on my favorite platforms, and did
>>> not like the results:
>> Hmm. That's unfortunate. Is perl -MCPAN -e 'install Platypus::FFI' a
>> viable alternative?
> Probably, see my followup.
>
>             

Interesting. OK, here's an attempt to push the cart a bit further down 
the road. The attached module wraps quite a lot of libpq, at least 
enough for most of the cases we would be interested in, I think. It also 
exports some constants such as connection status values, query status 
values, transaction status values and type Oids. It also makes the 
absence of FFI::Platypus not instantly fatal, but any attempt to use one 
of the wrapped functions will die with a message about the module being 
missing if it's not found.

I guess the next step would be for someone to locate some of the 
hotspots in the TAP tests and try to convert them to using persistent 
connections with this gadget or similar and see how much faster we can 
make them.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

PqFFI.pm

Re: Query execution in Perl TAP tests needs work

From

Jehan-Guillaume de Rorthais

Date:

06 November 2023, 10:14:25

On Wed, 18 Oct 2023 18:25:01 +0200
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> On 2023-Oct-18, Robert Haas wrote:
> 
> > Without FFI::Platypus, we have to write Perl code that can speak the
> > wire protocol directly. Basically, we're writing our own PostgreSQL
> > driver for Perl, though we might need only a subset of the things a
> > real driver would need to handle, and we might add some extra things,
> > like code that can send intentionally botched protocol messages.  
> 
> We could revive the old src/interfaces/perl5 code, which was a libpq
> wrapper -- at least the subset of it that the tests need.  It was moved
> to gborg by commit 9a0b4d7f8474 and a couple more versions were made
> there, which can be seen at
> https://www.postgresql.org/ftp/projects/gborg/pgperl/stable/,
> version 2.1.1 being apparently the latest.  The complete driver was
> about 3000 lines, judging by the commit that removed it.  Presumably we
> don't need the whole of that.

+1 to test this. I can give it some time to revive it and post results here if
you agree and no one think of some show stopper.

Re: Query execution in Perl TAP tests needs work

From

Noah Misch

Date:

31 March 2024, 05:03:10

On Wed, Aug 30, 2023 at 09:48:42AM +1200, Thomas Munro wrote:
> On Wed, Aug 30, 2023 at 1:49 AM Noah Misch <noah@leadboat.com> wrote:
> > On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote:
> > > On Tue, Aug 29, 2023 at 1:48 PM Noah Misch <noah@leadboat.com> wrote:
> > > > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929
> > >
> > > Interesting.  But that shows a case with no pipes connected, using
> > > select() as a dumb sleep and ignoring SIGCHLD.  In our usage we have
> > > pipes connected, and I think select() should return when the child's
> > > output pipes become readable due to EOF.  I guess something about that
> > > might be b0rked on Windows?  I see there is an extra helper process
> > > doing socket<->pipe conversion (hah, that explains an extra ~10ms at
> > > the start in my timestamps)...
> >
> > In that case, let's assume it's not the same issue.
> 
> Yeah, I think it amounts to the same thing, if EOF never arrives.
> 
> I suspect that we could get ->safe_psql() down to about ~25ms baseline
> if someone could fix the posited IPC::Run EOF bug

I pushed optimizations in https://github.com/cpan-authors/IPC-Run/pull/172
that make the TAP portion of "make check-world" 7% faster on my GNU/Linux
machine.  I didn't confirm an EOF bug, but that change also reduces Windows
idle time in simple tests.  I didn't run Windows check-world with it.  For
non-Windows, we can get almost all the benefit from the attached one-liner.
(The relative benefit is probably lower for parallel check-world, where idle
threads matter less, and for slower machines.)

Attachment

tap-sigchld-optimize-v1.patch