RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | houzj.fnst@fujitsu.com |
---|---|
Subject | RE: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | OS0PR01MB5716B20F085CCA5104D0BA8E94259@OS0PR01MB5716.jpnprd01.prod.outlook.com |
In response to | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses | RE: Parallel Inserts in CREATE TABLE AS; RE: Parallel Inserts in CREATE TABLE AS; Re: Parallel Inserts in CREATE TABLE AS |
List | pgsql-hackers |
Hi Bharath-san,

From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Friday, May 21, 2021 6:49 PM
>
> On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >
> > > I analyzed the performance of parallel inserts in CTAS for different
> > > cases with tuple sizes of 32 bytes, 59 bytes, 241 bytes and 1064 bytes.
> > > We see gains when the tuple sizes are smaller, but with a larger tuple
> > > size, i.e. 1064 bytes, there's a regression with parallel inserts.
> > > Upon further analysis, it turned out that the parallel workers require
> > > frequent addition of extra blocks while concurrently extending the
> > > relation (in RelationAddExtraBlocks), and the majority of the time is
> > > spent flushing those new empty pages/blocks onto the disk.
> > >
> >
> > How have you ensured that the cost is due to the flushing of pages?
> > AFAICS, we don't flush the pages; we just write them and then register
> > them to be flushed by the checkpointer. It is possible that the
> > checkpointer sync queue gets full and the backend has to write by
> > itself, but have we checked that? I think we can check via wait events:
> > if it is due to flushing, then we should see a lot of file sync
> > (WAIT_EVENT_DATA_FILE_SYNC) wait events. The other possibility is that
> > the free pages added to the FSM by one worker are not being used by
> > another worker for some reason. Can we debug and check whether the
> > pages added by one worker are being used by another worker?
>
> Thanks! I will work on the above points sometime later.

I noticed one place that could be one of the causes of the performance
degradation:

```
+ /*
+ * We don't need to skip contacting FSM while inserting tuples for
+ * parallel mode, while extending the relations, workers instead of
+ * blocking on a page while another worker is inserting, can check the
+ * FSM for another page that can accommodate the tuples. This results
+ * in major benefit for parallel inserts.
+ */
+ myState->ti_options = 0;
```

I am not quite sure that disabling "SKIP FSM" in the parallel workers
brings a performance gain. In my test environment, if I change this code
to use the option "TABLE_INSERT_SKIP_FSM", there seems to be no
performance degradation. Could you please give it a try? (I tested with
the SQL you provided earlier [1].)

[1] https://www.postgresql.org/message-id/CALj2ACWFvNm4d_uqT2iECPqaXZjEd-O%2By8xbghvqXeMLj0pxGw%40mail.gmail.com

Best regards,
houzj
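For concreteness, the change I tested above amounts to a one-line swap in the patched CTAS receiver. This is a sketch only; in the unpatched tree intorel_startup in src/backend/commands/createas.c is where ti_options is set, and the exact location in the patch may differ:

```c
/*
 * Hypothetical experiment, not a proposal: keep skipping the FSM in the
 * parallel workers too, i.e. replace the patch's "myState->ti_options = 0"
 * with the option the regular (non-parallel) CTAS path already uses.
 */
myState->ti_options = TABLE_INSERT_SKIP_FSM;
```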
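For background on why this flag matters, the heap insert path chooses between reusing a page from the FSM and extending the relation roughly as sketched below. This is a simplified paraphrase of RelationGetBufferForTuple() in src/backend/access/heap/hio.c written from memory, not the actual source, so names and details may differ:

```c
/* Simplified paraphrase of the page-selection logic, not actual source. */
bool        use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
BlockNumber targetBlock = InvalidBlockNumber;

if (use_fsm)
{
    /* Ask the free space map for an existing page with enough room. */
    targetBlock = GetPageWithFreeSpace(relation,
                                       targetFreeSpace /* ~ tuple length plus fillfactor slack */);
}

if (targetBlock == InvalidBlockNumber)
{
    /*
     * No usable page found: extend the relation.  Under concurrent load
     * this may go through RelationAddExtraBlocks(), which adds a batch of
     * new empty pages and records them in the FSM so that other backends
     * (here, the other parallel workers) can pick them up instead of
     * contending for the extension lock.
     */
}
```

With TABLE_INSERT_SKIP_FSM set, each worker ignores the FSM and keeps filling the pages it extended itself; with ti_options = 0, workers consult the FSM first, which the patch comment expects to help but which, in my test above, appears to cost more than it saves.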