Batch insert in CTAS/MatView code - Mailing list pgsql-hackers

From: Paul Guo
Subject: Batch insert in CTAS/MatView code
Msg-id: CAEET0ZHRWxbRUgwzUK_tOFDWx7VE2-P=xMBT6-N+gAa9WQ=xxA@mail.gmail.com
Responses: Re: Batch insert in CTAS/MatView code
List: pgsql-hackers
Hello, Postgres hackers,
The COPY code already uses batch insert, via heap_multi_insert(), to speed up data loading. It seems that CREATE TABLE AS and materialized view creation could leverage the same code to boost performance. Attached is a patch that implements this; it was written by Taylor (cc-ed) and me.
The patch also modifies heap_multi_insert() slightly as a further code-level optimization: it uses static memory instead of a memory context and dynamic allocation. For ModifyTable inserts there seem to be more restrictions on batch insert (triggers, etc.?), but it may still be possible to batch in the cases where it is safe.
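To illustrate the general buffer-and-flush pattern involved (this is a minimal self-contained sketch, not the patch itself; the names `receive_tuple`, `flush_batch`, and the fixed `BATCH_SIZE` are illustrative stand-ins for the real dest-receiver and heap_multi_insert() machinery):

```c
#include <assert.h>

#define BATCH_SIZE 1000            /* illustrative; the real patch tunes this */

/* Stand-in for a tuple; the real code buffers HeapTuple pointers. */
typedef struct { int payload; } Tuple;

static Tuple buffer[BATCH_SIZE];
static int   nbuffered = 0;
static long  flushed_tuples = 0;
static long  flush_calls = 0;

/* Stand-in for heap_multi_insert(): one call writes many tuples,
 * amortizing per-call overhead (e.g. one WAL record per batch). */
static void flush_batch(void)
{
    if (nbuffered == 0)
        return;
    flushed_tuples += nbuffered;
    flush_calls++;
    nbuffered = 0;
}

/* Called once per tuple produced by the CTAS/matview query. */
static void receive_tuple(Tuple t)
{
    buffer[nbuffered++] = t;
    if (nbuffered == BATCH_SIZE)
        flush_batch();
}
```

The point of the pattern is that per-tuple work is reduced to a copy into the buffer, while the expensive insert path runs once per batch; a final flush handles the partial last batch.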
By the way, while looking at the code I noticed that toast_insert_or_update() has 9 large local arrays, which looks like a stack-overflow risk. Maybe we should make them static or global.
Here are quick performance results on a mirrorless Postgres instance using the SQL below. The tests cover tables with small columns, large columns, and toasted data.
-- tuples with small size.
drop table if exists t1;
create table t1 (a int);
insert into t1 select * from generate_series(1, 10000000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing
-- tuples that are untoasted and data that is 1664 bytes wide
drop table if exists t1;
create table t1 (a name, b name, c name, d name, e name, f name, g name, h name, i name, j name, k name, l name, m name, n name, o name, p name, q name, r name, s name, t name, u name, v name, w name, x name, y name, z name);
insert into t1 select 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' from generate_series(1, 500000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing
-- tuples that are toastable.
drop table if exists t1;
create table t1 (a text, b text, c text, d text, e text, f text, g text, h text, i text, j text, k text, l text, m text, n text, o text, p text, q text, r text, s text, t text, u text, v text, w text, x text, y text, z text);
insert into t1 select i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i from (select repeat('123456789', 10000) from generate_series(1,2000)) i;
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing
Here are the timing results (in the order of the three tests above):

With the patch,
Time: 4728.142 ms (00:04.728)    -- small columns
Time: 14203.983 ms (00:14.204)   -- large columns
Time: 1008.669 ms (00:01.009)    -- toast

Baseline,
Time: 11096.146 ms (00:11.096)   -- small columns
Time: 13106.741 ms (00:13.107)   -- large columns
Time: 1100.174 ms (00:01.100)    -- toast
For the toast and large-column cases the difference is under 10% (toast improves slightly, large columns regress slightly), while for small columns the improvement is very good. Actually, if I hardcode the batch count as 4, all test cases improve, but the small-column improvement is smaller than with the current patch. The number 4 is quite case-specific, so I cannot hardcode it in the patch. Of course we could tune this further, but the current value seems to be a reasonable trade-off?
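One possible direction for that tuning (purely hypothetical; this heuristic is not in the patch) would be to derive the batch count from a fixed memory budget divided by the average tuple width, so that wide tuples, where large batches showed a small regression above, get a smaller batch than narrow ones:

```c
#include <assert.h>

/* Hypothetical heuristic: cap the batch by a memory budget so that
 * wide tuples get a smaller batch count than narrow ones.
 * All three constants are illustrative, not from the patch. */
#define BATCH_MEM_BUDGET  (64 * 1024)   /* bytes buffered per batch */
#define MAX_BATCH_TUPLES  1000
#define MIN_BATCH_TUPLES  4

static int choose_batch_size(int avg_tuple_width)
{
    int n = BATCH_MEM_BUDGET / avg_tuple_width;

    if (n > MAX_BATCH_TUPLES)
        n = MAX_BATCH_TUPLES;
    if (n < MIN_BATCH_TUPLES)
        n = MIN_BATCH_TUPLES;
    return n;
}
```

Under this sketch a 4-byte tuple would batch at the 1000-tuple cap, while the 1664-byte-wide rows from the second test would get a batch of a few dozen tuples.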
Thanks.