Re: Multi Inserts in CREATE TABLE AS - revived patch - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Multi Inserts in CREATE TABLE AS - revived patch
Date
Msg-id CALj2ACXWVrJ0W-6AQTp=Z8Hhb=ymMWjNuYBT8F15reZmp8eGfA@mail.gmail.com
Whole thread Raw
In response to Multi Inserts in CREATE TABLE AS - revived patch  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Tue, Nov 3, 2020 at 4:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Use case 1- 100mn tuples, 2 integer columns, exec time in sec:
> HEAD: 131.507 when the select part is not parallel, 128.832 when the select part is parallel
> Patch: 98.925 when the select part is not parallel, 52.901 when the select part is parallel
>
> Use case 2- 10mn tuples, 4 integer and 6 text columns, exec time in sec:
> HEAD: 76.801 when the select part is not parallel, 66.074 when the select part is parallel
> Patch: 74.083 when the select part is not parallel, 57.739 when the select part is parallel
>

I did some more testing with v1 patch: execution time is in seconds, each test is run 2 times, with custom configuration [1].

Use case 3: 1 int and 1 text column. each row size 129 bytes, size of 1 text column 101 bytes, number of rows 100million, size of heap file 12.9GB.
HEAD: 253.227, 259.575
Patch: 177.921, 174.196

We get better performance 1.4X with the patch.

Use case 4: 1 int and 30 text columns. each row size 28108 bytes, size of 1 text column 932 bytes, number of rows 10000, size of heap file 281.08MB.
HEAD: 222.812, 218.837
Patch: 222.492, 222.295

We don't see much difference with and without patch. Each time only 2 tuples(2*28108 = 56216 bytes < MAX_MULTI_INSERT_BUFFERED_BYTES(65535 bytes)) are buffered and flushed.

Use case 5: 1 int and 75 text columns. each row size 70228 bytes, size of 1 text column 932 bytes, number of rows 10000, size of heap file 702.28MB.
HEAD: 554.709, 562.745
Patch: 553.378, 560.370

We don't see much difference with and without patch. Since each row size(70228 bytes) is bigger than the MAX_MULTI_INSERT_BUFFERED_BYTES(65535 bytes), multi insert code is not picked, each single row is inserted with table_tuple_insert() itself.

Use case 6: 1 int and 1 text column. each row size 9205 bytes, size of 1 text column 9173 bytes, number of rows 10000, size of heap file 92.05MB.
HEAD: 70.583, 70251
Patch: 72.633, 73.521

We see 2-3 seconds more with patch. When I intentionally made the computed tuple size to 0(sz =0) after GetTupleSize(), which means the single inserts happen, the results are 70.364, 70.406. Looks like this 2-3 seconds extra time is due to the multi insert code and happens for with this use case only. And I think this should not be a problem as the difference is not huge.

+    sz = GetTupleSize(slot, MAX_MULTI_INSERT_BUFFERED_BYTES);
+
+.  sz = 0;
+
+    /* In case the computed tuple size is 0, we go for single inserts. */
+    if (sz != 0)
+    {

[1] - The postgresql.conf used:
shared_buffers = 40GB
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Move OpenSSL random under USE_OPENSSL_RANDOM
Next
From: John Naylor
Date:
Subject: Re: Move catalog toast table and index declarations