non-bulk inserts and tuple routing - Mailing list pgsql-hackers

From Amit Langote
Subject non-bulk inserts and tuple routing
Date
Msg-id 8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp
Whole thread Raw
Responses Re: non-bulk inserts and tuple routing  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: non-bulk inserts and tuple routing  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-hackers
Hi.

I have a patch that rearranges the code around partition tuple-routing,
such that allocation of per-partition objects (ResultRelInfo,
TupleConversionMap, etc.) is delayed until a given partition is actually
inserted into (i.e., a tuple is routed to it).  I can see good win for
non-bulk inserts with the patch and the patch is implemented such that it
doesn't affect the bulk-insert case much.

Performance numbers:

* Uses following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...


* Non-bulk insert uses the following code (insert 100,000 rows one-by-one):

do $$
begin
  for i in 1..100000 loop
    insert into t1 values (i, i+1);
  end loop;
end; $$;

* Times in milliseconds:

#parts           HEAD        Patched

     8       6216.300       4977.670
    16       9061.388       6360.093
    32      14081.656       8752.405
    64      24887.110      13919.384
   128      45926.251      24582.411
   256      88088.084      45490.894

As you can see the performance can be as much as 2x faster with the patch,
although time taken still increases as the number of partitions increases,
because we still lock *all* partitions at the beginning.

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

* Times in milliseconds:

#parts           HEAD        Patched

     8        458.301        450.875
    16        409.271        510.723
    32        500.960        612.003
    64        430.687        795.046
   128        449.314        565.786
   256        493.171        490.187

Not much harm here, although numbers are a bit noisy.

Patch is divided into 4, first 3 of which are refactoring patches.

I know this patch will conflict severely with [1] and [2], so it's fine if
we consider applying these later.  Will add this to next CF.

Thanks,
Amit

[1] https://commitfest.postgresql.org/16/1023/

[2] https://commitfest.postgresql.org/16/1184/

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Next
From: Ashutosh Bapat
Date:
Subject: Re: non-bulk inserts and tuple routing