Re: Parallel copy - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel copy
Date
Msg-id CALj2ACWrQz-=PWc0e5QOwetVNoBOaOTKvTWyz4=2y0=NVOcOcg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel copy  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 2. Do we have tests for toast tables? I think if you implement the
> > previous point some existing tests might cover it but I feel we should
> > have at least one or two tests for the same.
> >
> Toast table use case 1: 10000 tuples, 9.6GB data, 3 indexes 2 on integer columns, 1 on text column(not the toast column), csv file, each row is > 1320KB:
> (222.767, 0, 1X), (134.171, 1, 1.66X), (93.749, 2, 2.38X), (93.672, 4, 2.38X), (94.827, 8, 2.35X), (93.766, 16, 2.37X), (98.153, 20, 2.27X), (122.721, 30, 1.81X)
>
> Toast table use case 2: 100000 tuples, 96GB data, 3 indexes 2 on integer columns, 1 on text column(not the toast column), csv file, each row is > 1320KB:
> (2255.032, 0, 1X), (1358.628, 1, 1.66X), (901.170, 2, 2.5X), (912.743, 4, 2.47X), (988.718, 8, 2.28X), (938.000, 16, 2.4X), (997.556, 20, 2.26X), (1000.586, 30, 2.25X)
>
> Toast table use case3: 10000 tuples, 9.6GB, no indexes, binary file, each row is > 1320KB:
> (136.983, 0, 1X), (136.418, 1, 1X), (81.896, 2, 1.66X), (62.929, 4, 2.16X), (52.311, 8, 2.6X), (40.032, 16, 3.49X), (44.097, 20, 3.09X), (62.310, 30, 2.18X)
>
> In the case of a Toast table, we could achieve upto 2.5X for csv files, and 3.5X for binary files. We are analyzing this point and will post an update on our findings soon.
>

I analyzed the above point of getting only upto 2.5X performance improvement for csv files with a toast table with 3 indexers - 2 on integer columns and 1 on text column(not the toast column). Reason is that workers are fast enough to do the work and they are waiting for the leader to fill in the data blocks and in this case the leader is able to serve the workers at its maximum possible speed. Hence most of the time the workers are waiting not doing any beneficial work.

Having observed the above point, I tried to make workers perform more work to avoid waiting time. For this, I added a gist index on the toasted text column. The use and results are as follows.

Toast table use case4: 10000 tuples, 9.6GB, 4 indexes - 2 on integer columns, 1 on non-toasted text column and 1 gist index on toasted text column, csv file, each row is  ~ 12.2KB:

(1322.839, 0, 1X), (1261.176, 1, 1.05X), (632.296, 2, 2.09X), (321.941, 4, 4.11X), (181.796, 8, 7.27X), (105.750, 16, 12.51X), (107.099, 20, 12.35X), (123.262, 30, 10.73X)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: speed up unicode normalization quick check
Next
From: Peter Eisentraut
Date:
Subject: Re: dynamic result sets support in extended query protocol