Re: [PATCH] Hooks at XactCommand level - Mailing list pgsql-hackers

From Gilles Darold
Subject Re: [PATCH] Hooks at XactCommand level
Msg-id d3ca6093-156d-e8c3-fb92-bd892ca905ec@darold.net
In response to Re: [PATCH] Hooks at XactCommand level  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 13/08/2021 at 11:58, Andres Freund wrote:
> Hi,
>
> On 2021-08-10 10:12:26 +0200, Gilles Darold wrote:
>> Sorry for the delayed response. I have thought about adding this odd hook
>> to be able to implement this feature through an extension, because I don't
>> think this is something that should be implemented in core. There have also
>> been patch proposals for it, all of which were rejected.
>>
>> We usually implement the feature at the client side, which is imo enough
>> for the use cases. But the problem is that this is a catastrophe in terms
>> of performance. I have done a small benchmark to illustrate the problem.
>> This is a single-process client on the same host as the PG backend.
>>
>> For 10,000 tuples inserted with a 50% failure rate and statement-level
>> rollback handled at the client side:
>>
>>         Expected: 5001, Count:  5001
>>         DML insert took: 13 wallclock secs ( 0.53 usr +  0.94 sys =  1.47 CPU)
> Something seems off here. This suggests every insert took 2.6ms. That
> seems awfully long, unless your network latency is substantial. I did a
> quick test implementing this in the most naive way in pgbench, and I get
> better times - and there's *lots* of room for improvement.
>
> I used a pgbench script that sent the following:
> BEGIN;
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> RELEASE SAVEPOINT insert_success;
> {repeat 5 times}
> COMMIT;
>
> I.e. 5 failing and 5 succeeding insertions wrapped in one transaction. I
> get >2500 tps, i.e. > 25k rows/sec. And it's not hard to optimize that
> further - the {ROLLBACK TO,RELEASE} SAVEPOINT; SAVEPOINT; INSERT can be
> sent in one roundtrip. That gets me to somewhere around 40k rows/sec.
>
>
> BEGIN;
>
> \startpipeline
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> RELEASE SAVEPOINT insert_success;
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> {repeat last two blocks three times}
> COMMIT;
>
> Greetings,
>
> Andres Freund
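
As a side note, a pipelined script like the one above can be run as a
custom pgbench script, for example (the file name here is just an
illustration, and the \startpipeline/\endpipeline meta-commands require
pgbench from PostgreSQL 14 or later):

pgbench -n -f insert_pipeline.sql -c 1 -T 10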


I have written a Perl script that mimics what I found in an Oracle batch
script used to import data into a table. I had this exact use case in a
recent migration; the only difference is that the batch was written in Java.
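
Note that the script assumes t1 has a unique constraint on its first
column, so that every other row fails with a duplicate key error;
something like:

CREATE TABLE t1 (a integer PRIMARY KEY, b text);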


$dbh->do("BEGIN") or die "FATAL: " . $dbh->errstr . "\n";
my $start = new Benchmark;
my $sth = $dbh->prepare("INSERT INTO t1 VALUES (?, ?)");
exit 1 if (not defined $sth);
for (my $i = 0; $i <= 10000; $i++)
{
        $dbh->do("SAVEPOINT foo") or die "FATAL: " . $dbh->errstr . "\n";
        # Generate a duplicate key each two row inserted
        my $val = $i;
        $val = $i-1 if ($i % 2 != 0);
        unless ($sth->execute($val, 'insert '.$i)) {
                $dbh->do("ROLLBACK TO foo") or die "FATAL: " .
$dbh->errstr . "\n";
        } else {
                $dbh->do("RELEASE foo") or die "FATAL: " . $dbh->errstr
. "\n";
        }
}
$sth->finish();
my $end = new Benchmark;

$dbh->do("COMMIT;");

my $td = timediff($end, $start);
print "DML insert took: " . timestr($td) . "\n";


The timings reported are from my personal computer; there is no network
latency since everything runs on localhost. Anyway, the objective was not
to benchmark DML throughput but to measure the overhead of statement-level
rollback handled at the client side versus the server side. I guess you
would see the same kind of speed gain, around x3 to x5 or more depending
on the number of tuples?
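
For comparison, here is a minimal sketch of what the insert loop reduces
to when statement-level rollback is handled on the server side (for
example through the extension linked below). $dbh and $sth are set up as
in the script above; the per-row savepoint round trips simply disappear:

$dbh->do("BEGIN") or die "FATAL: " . $dbh->errstr . "\n";
for (my $i = 0; $i <= 10000; $i++)
{
        my $val = $i;
        $val = $i-1 if ($i % 2 != 0);
        # A failed INSERT is rolled back at statement level by the server
        # and the transaction stays usable, so we simply continue
        $sth->execute($val, 'insert '.$i);
}
$dbh->do("COMMIT;");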


The full script can be found here:
https://github.com/darold/pg_statement_rollbackv2/blob/main/test/batch_script_example.pl


Cheers,

--
Gilles Darold




