From: vignesh C
Subject: Re: Parallel copy
Msg-id: CALDaNm3r8cPsk0Vo_-6AXipTrVwd0o9U2S0nCmRdku1Dn-Tpqg@mail.gmail.com
In response to: Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
Responses: Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
List: pgsql-hackers

I have captured the execution time breakdown for a few scenarios, on both a normal disk and a RAM disk.

Execution breakdown on normal disk (all times in seconds):

Test                            Total time   File read   CopyReadLine   Remaining   Read line %
Test1 (3 indexes + 1 trigger)     2099.017       0.311         10.256    2088.450         0.49%
Test2 (2 indexes)                  657.994       0.303         10.171     647.520         1.55%
Test3 (no index, no trigger)       112.410       0.296         10.996     101.118         9.78%
Test4 (toast)                      360.028       1.430         46.556     312.042        12.93%

Execution breakdown on RAM disk (all times in seconds):

Test                            Total time   File read   CopyReadLine   Remaining   Read line %
Test1 (3 indexes + 1 trigger)     1571.558       0.259          6.986    1564.313         0.44%
Test2 (2 indexes)                  369.942       0.263          6.848     362.831         1.85%
Test3 (no index, no trigger)        54.077       0.239          6.805      47.033        12.58%
Test4 (toast)                       96.323       0.918         26.603      68.802        27.62%

Steps for the scenarios:
Test1(Table with 3 indexes and 1 trigger):
CREATE TABLE census2 (year int,age int,ethnic int,sex int,area text,count text);
CREATE TABLE census3(year int,age int,ethnic int,sex int,area text,count text);

CREATE INDEX idx1_census2 on census2(year);
CREATE INDEX idx2_census2 on census2(age);
CREATE INDEX idx3_census2 on census2(ethnic);

CREATE or REPLACE FUNCTION census2_afterinsert()
RETURNS TRIGGER
AS $$
BEGIN
  INSERT INTO census3  SELECT * FROM census2 limit 1;
  RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER census2_trigger AFTER INSERT  ON census2 FOR EACH ROW EXECUTE PROCEDURE census2_afterinsert();
COPY census2 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test2 (Table with 2 indexes):
CREATE TABLE census1 (year int,age int,ethnic int,sex int,area text,count text);
CREATE INDEX idx1_census1 on census1(year);
CREATE INDEX idx2_census1 on census1(age);
COPY census1 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test3 (Table without indexes/triggers):
CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count text);
COPY census FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

A random open data set from the web, about 800 MB of narrow-row CSV [1], was used in the above tests; this is the same data set that Ants Aasma had used.

Test4 (Toast table):
CREATE TABLE indtoasttest(descr text, cnt int DEFAULT 0, f1 text, f2 text);
alter table indtoasttest alter column f1 set storage external;
alter table indtoasttest alter column f2 set storage external;
-- inserted 262144 records (insert statements not shown)
copy indtoasttest to '/mnt/magnetic/vignesh.c/postgres/toast_data3.csv'  WITH (FORMAT 'csv', HEADER true);

CREATE TABLE indtoasttest1(descr text, cnt int DEFAULT 0, f1 text, f2 text);
alter table indtoasttest1 alter column f1 set storage external;
alter table indtoasttest1 alter column f2 set storage external;
copy indtoasttest1 from '/mnt/magnetic/vignesh.c/postgres/toast_data3.csv'  WITH (FORMAT 'csv', HEADER true);

From this we can infer that the read-line phase cannot be parallelized for CSV: if the data contains a quote character, we cannot tell whether it belongs to the previous record or starts the current record without having scanned sequentially from a known record boundary. The rest of the execution time can be parallelized. The read-line phase takes about 0.5%, 1.5%, 9.8% and 12.9% of the total time in the four tests on normal disk (and up to about 27.6% for the toast test on the RAM disk), so the remaining phases of COPY can be parallelized. The performance improvement will vary with the scenario and will be roughly proportional to the number of indexes and triggers. In text format (non-CSV), even the read-line phase can be parallelized, since the quote ambiguity does not arise. We feel parallel copy could give a significant improvement in many scenarios.
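To illustrate the quote problem, here is a small standalone sketch (plain C, not PostgreSQL code, using made-up two-row data; it ignores escaped quotes and configurable quote characters). Whether a given newline ends a record depends on the quote state accumulated from the start of the data, so a worker dropped at an arbitrary file offset cannot decide this locally:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if the byte at 'offset' lies inside a quoted CSV field.
 * The quote state at 'offset' is only known after scanning from the start,
 * which is exactly why line splitting cannot begin at an arbitrary offset. */
static bool
inside_quotes(const char *data, size_t offset)
{
    bool    in_quotes = false;
    size_t  i;

    for (i = 0; i < offset; i++)
    {
        if (data[i] == '"')
            in_quotes = !in_quotes;
    }
    return in_quotes;
}

int
main(void)
{
    /* Hypothetical CSV rows; the first record has a newline inside a quoted field,
     * so the first physical newline does not end a record. */
    const char *csv = "2018,0,1,1,\"Auckland\nRegion\",795\n"
                      "2018,0,2,1,Wellington,177\n";
    const char *nl = strchr(csv, '\n');

    printf("first newline (offset %ld) is %s a quoted field\n",
           (long) (nl - csv),
           inside_quotes(csv, (size_t) (nl - csv)) ? "inside" : "outside");
    return 0;
}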

The patch that was used to capture this execution time breakdown is attached for reference.

Thoughts?

Regards,
Vignesh

On Tue, Mar 3, 2020 at 11:44 AM vignesh C <vignesh21@gmail.com> wrote:
On Wed, Feb 26, 2020 at 8:47 PM Ants Aasma <ants@cybertec.at> wrote:
>
> On Tue, 25 Feb 2020 at 18:00, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> > so I still think we need to do some measurements first. I'm willing to
> > do that, but (a) I doubt I'll have time for that until after 2020-03,
> > and (b) it'd be good to agree on some set of typical CSV files.
>
> I agree that getting a nice varied dataset would be nice. Including
> things like narrow integer only tables, strings with newlines and
> escapes in them, extremely wide rows.
>
> I tried to capture a quick profile just to see what it looks like.
> Grabbed a random open data set from the web, about 800MB of narrow
> rows CSV [1].
>
> Script:
> CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count text);
> COPY census FROM '.../Data8277.csv' WITH (FORMAT 'csv', HEADER true);
>
> Profile:
> # Samples: 59K of event 'cycles:u'
> # Event count (approx.): 57644269486
> #
> # Overhead  Command   Shared Object       Symbol
> # ........  ........  ..................  .......................................
> #
>     18.24%  postgres  postgres            [.] CopyReadLine
>      9.23%  postgres  postgres            [.] NextCopyFrom
>      8.87%  postgres  postgres            [.] NextCopyFromRawFields
>      5.82%  postgres  postgres            [.] pg_verify_mbstr_len
>      5.45%  postgres  postgres            [.] pg_strtoint32
>      4.16%  postgres  postgres            [.] heap_fill_tuple
>      4.03%  postgres  postgres            [.] heap_compute_data_size
>      3.83%  postgres  postgres            [.] CopyFrom
>      3.78%  postgres  postgres            [.] AllocSetAlloc
>      3.53%  postgres  postgres            [.] heap_form_tuple
>      2.96%  postgres  postgres            [.] InputFunctionCall
>      2.89%  postgres  libc-2.30.so        [.] __memmove_avx_unaligned_erms
>      1.82%  postgres  libc-2.30.so        [.] __strlen_avx2
>      1.72%  postgres  postgres            [.] AllocSetReset
>      1.72%  postgres  postgres            [.] RelationPutHeapTuple
>      1.47%  postgres  postgres            [.] heap_prepare_insert
>      1.31%  postgres  postgres            [.] heap_multi_insert
>      1.25%  postgres  postgres            [.] textin
>      1.24%  postgres  postgres            [.] int4in
>      1.05%  postgres  postgres            [.] tts_buffer_heap_clear
>      0.85%  postgres  postgres            [.] pg_any_to_server
>      0.80%  postgres  postgres            [.] pg_comp_crc32c_sse42
>      0.77%  postgres  postgres            [.] cstring_to_text_with_len
>      0.69%  postgres  postgres            [.] AllocSetFree
>      0.60%  postgres  postgres            [.] appendBinaryStringInfo
>      0.55%  postgres  postgres            [.] tts_buffer_heap_materialize.part.0
>      0.54%  postgres  postgres            [.] palloc
>      0.54%  postgres  libc-2.30.so        [.] __memmove_avx_unaligned
>      0.51%  postgres  postgres            [.] palloc0
>      0.51%  postgres  postgres            [.] pg_encoding_max_length
>      0.48%  postgres  postgres            [.] enlargeStringInfo
>      0.47%  postgres  postgres            [.] ExecStoreVirtualTuple
>      0.45%  postgres  postgres            [.] PageAddItemExtended
>
> So that confirms that the parsing is a huge chunk of overhead with
> current splitting into lines being the largest portion. Amdahl's law
> says that splitting into tuples needs to be made fast before
> parallelizing makes any sense.
>

I took a perf report with the same test data that you had used and got the following results:
.....
+   99.61%     0.00%  postgres  postgres            [.] PortalRun
+   99.61%     0.00%  postgres  postgres            [.] PortalRunMulti
+   99.61%     0.00%  postgres  postgres            [.] PortalRunUtility
+   99.61%     0.00%  postgres  postgres            [.] ProcessUtility
+   99.61%     0.00%  postgres  postgres            [.] standard_ProcessUtility
+   99.61%     0.00%  postgres  postgres            [.] DoCopy
+   99.30%     0.94%  postgres  postgres            [.] CopyFrom
+   51.61%     7.76%  postgres  postgres            [.] NextCopyFrom
+   23.66%     0.01%  postgres  postgres            [.] CopyMultiInsertInfoFlush
+   23.61%     0.28%  postgres  postgres            [.] CopyMultiInsertBufferFlush
+   21.99%     1.02%  postgres  postgres            [.] NextCopyFromRawFields
+   19.79%     0.01%  postgres  postgres            [.] table_multi_insert
+   19.32%     3.00%  postgres  postgres            [.] heap_multi_insert
+   18.27%     2.44%  postgres  postgres            [.] InputFunctionCall
+   15.19%     0.89%  postgres  postgres            [.] CopyReadLine
+   13.05%     0.18%  postgres  postgres            [.] ExecMaterializeSlot
+   13.00%     0.55%  postgres  postgres            [.] tts_buffer_heap_materialize
+   12.31%     1.77%  postgres  postgres            [.] heap_form_tuple
+   10.43%     0.45%  postgres  postgres            [.] int4in
+   10.18%     8.92%  postgres  postgres            [.] CopyReadLineText 
......

In my results, table_multi_insert accounted for nearly 20% of the execution time. Also, I feel that once CopyReadLine has produced a few lines, the parallel workers should be able to start consuming and processing that data; we need not wait for the complete tokenisation to finish. Once a few lines are tokenised, the parallel workers can consume them while tokenisation continues simultaneously. If the copy is done in parallel this way, the total execution time should come down to roughly the CopyReadLine time plus a small delta of processing time.
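For illustration, here is a very rough sketch of that pipeline shape. It is not PostgreSQL code: it uses plain pthreads, a bounded in-memory queue and a naive fgets()-based splitter, whereas a real implementation would need background workers, shared memory and the CSV-aware CopyReadLine splitting discussed above. The point is only that the leader keeps producing lines while workers consume and process the lines that are already available:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_SIZE   1024
#define NUM_WORKERS  4

static char *queue[QUEUE_SIZE];
static int   q_head, q_tail, q_count;
static int   done;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  q_nonfull = PTHREAD_COND_INITIALIZER;

/* Leader side: hand one already-split line to the workers. */
static void
push_line(char *line)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == QUEUE_SIZE)
        pthread_cond_wait(&q_nonfull, &q_lock);
    queue[q_tail] = line;
    q_tail = (q_tail + 1) % QUEUE_SIZE;
    q_count++;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

/* Worker side: consume lines as soon as they are available. */
static void *
worker(void *arg)
{
    (void) arg;
    for (;;)
    {
        char *line;

        pthread_mutex_lock(&q_lock);
        while (q_count == 0 && !done)
            pthread_cond_wait(&q_nonempty, &q_lock);
        if (q_count == 0 && done)
        {
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        line = queue[q_head];
        q_head = (q_head + 1) % QUEUE_SIZE;
        q_count--;
        pthread_cond_signal(&q_nonfull);
        pthread_mutex_unlock(&q_lock);

        /* In COPY terms this is where the per-row work from the profile would
         * run: NextCopyFromRawFields, InputFunctionCall and the table insert. */
        free(line);
    }
}

int
main(void)
{
    pthread_t workers[NUM_WORKERS];
    char      buf[8192];
    int       i;

    for (i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);

    /* Leader loop: keep splitting input into lines while workers consume them. */
    while (fgets(buf, sizeof(buf), stdin) != NULL)
        push_line(strdup(buf));

    pthread_mutex_lock(&q_lock);
    done = 1;
    pthread_cond_broadcast(&q_nonempty);
    pthread_mutex_unlock(&q_lock);

    for (i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}

With this shape the leader's line splitting and the workers' per-row work overlap, which is what should bring the total time close to the CopyReadLine time plus a processing delta.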

Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
