Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From:           Bharath Rupireddy
Subject:        Re: Parallel Inserts in CREATE TABLE AS
Date:
Msg-id:         CALj2ACUyzFO_VPbrAjJeAf09pdAOmc1+ED33MQ3o4QfyfNzADw@mail.gmail.com
In response to: Re: Parallel Inserts in CREATE TABLE AS (Amit Kapila <amit.kapila16@gmail.com>)
Responses:      RE: Parallel Inserts in CREATE TABLE AS
List:           pgsql-hackers
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> >
> > I analyzed the performance of parallel inserts in CTAS for different
> > cases with tuple sizes of 32 bytes, 59 bytes, 241 bytes and 1064
> > bytes. We gain when the tuple sizes are smaller, but when the tuple
> > size is larger, i.e. 1064 bytes, there's a regression with parallel
> > inserts. Upon further analysis, it turned out that the parallel
> > workers require frequent extra-block additions while concurrently
> > extending the relation (in RelationAddExtraBlocks), and the majority
> > of the time is spent flushing those new empty pages/blocks to disk.
>
> How have you ensured that the cost is due to the flushing of pages?
> AFAICS, we don't flush the pages; rather, we just write them and then
> register them to be flushed by the checkpointer. Now, it is possible
> that the checkpointer's sync queue gets full and the backend has to
> write by itself, but have we checked that? I think we can check via
> wait events: if it is due to flushing, then we should see a lot of
> file sync (WAIT_EVENT_DATA_FILE_SYNC) wait events. The other
> possibility could be that the free pages added to the FSM by one
> worker are not being used by another worker for some reason. Can we
> debug and check whether the pages added by one worker are being used
> by another worker?

Thanks! I will work on the above points sometime later. One way to
sample the wait events is sketched below.
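Something like the following could be run from a second session while
the CTAS executes (this sampling query is only a sketch, not taken from
the thread; rerun it repeatedly, e.g. via \watch 1 in psql):

    -- sample the wait events of all backends and parallel workers;
    -- frequent 'DataFileSync' rows (the pg_stat_activity name for
    -- WAIT_EVENT_DATA_FILE_SYNC) would indicate backends syncing
    -- data files themselves rather than the checkpointer doing it
    SELECT pid, backend_type, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL;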
BTW, I forgot to mention one point earlier: we see a benefit even
without parallelism if multi inserts are used for CTAS instead of
single inserts. See [2] for more testing results. I used the "New Table
Access Methods for Multi and Single Inserts" patches from [1] for this
testing. I think it's a good idea to revisit that work.

[1] - https://www.postgresql.org/message-id/CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg%40mail.gmail.com

[2]
case 1 - 2 integer (4 bytes each) columns, tuple size 32 bytes, 100mn tuples
on master - 130 sec
on master with multi inserts - 105 sec, gain 1.23X
on parallel CTAS patch without multi inserts - (2 workers, 82 sec, 1.58X), (4 workers, 83 sec, 1.56X)
on parallel CTAS patch with multi inserts - (2 workers, 45 sec, 2.33X, overall gain vs. master 2.88X), (4 workers, 33 sec, 3.18X, overall gain vs. master 3.9X)

case 2 - 2 integer (4 bytes each) columns, 3 varchar(8) columns, tuple size 59 bytes, 100mn tuples
on master - 185 sec
on master with multi inserts - 121 sec, gain 1.52X
on parallel CTAS patch without multi inserts - (2 workers, 120 sec, 1.54X), (4 workers, 123 sec, 1.5X)
on parallel CTAS patch with multi inserts - (2 workers, 68 sec, 1.77X, overall gain vs. master 2.72X), (4 workers, 61 sec, 1.98X, overall gain vs. master 3.03X)

The above two cases, with tuple sizes of only a few dozen bytes, are
the best cases: parallel CTAS plus multi inserts gives up to 3.9X and
3.03X benefits over master.

case 3 - 2 bigint (8 bytes each) columns, 3 name (64 bytes each) columns, 1 varchar(8) column, tuple size 241 bytes, 100mn tuples
on master - 367 sec
on master with multi inserts - 291 sec, gain 1.26X
on parallel CTAS patch without multi inserts - (2 workers, 334 sec, 1.09X), (4 workers, 336 sec, 1.09X)
on parallel CTAS patch with multi inserts - (2 workers, 284 sec, 1.02X, overall gain vs. master 1.29X), (4 workers, 278 sec, 1.04X, overall gain vs. master 1.32X)

In the above case, with a tuple size of 241 bytes, we don't gain much.

case 4 - 2 bigint (8 bytes each) columns, 16 name (64 bytes each) columns, tuple size 1064 bytes, 10mn tuples
on master - 120 sec
on master with multi inserts - 115 sec, gain 1.04X
on parallel CTAS patch without multi inserts - (2 workers, 140 sec, 0.85X), (4 workers, 142 sec, 0.84X)
on parallel CTAS patch with multi inserts - (2 workers, 133 sec, 0.86X, overall loss vs. master 0.9X), (4 workers, 134 sec, 0.85X, overall loss vs. master 0.89X)

In the above case, with a tuple size of 1064 bytes, we gain very little
with multi inserts, and parallel inserts cause a regression.
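For concreteness, case 1 could be set up along these lines (the table
and column names and the generate_series-based load are illustrative
assumptions, not the exact scripts used for the numbers above):

    -- hypothetical setup for case 1: 2 int4 columns, 100mn tuples
    CREATE TABLE src(c1 int, c2 int);
    INSERT INTO src SELECT i, i FROM generate_series(1, 100000000) i;

    -- the statement under test; with the parallel-inserts patch
    -- applied, the workers of the underlying parallel scan also
    -- perform the inserts into the new table
    CREATE TABLE dst AS SELECT * FROM src;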
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com