Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From: Bharath Rupireddy
Subject: Re: Parallel Inserts in CREATE TABLE AS
Msg-id: CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w@mail.gmail.com
In response to: RE: Parallel Inserts in CREATE TABLE AS ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List: pgsql-hackers
On Tue, May 25, 2021 at 12:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I noticed one place which could be one of the reasons that cause the
> performance degradation.
>
> +    /*
> +     * We don't need to skip contacting FSM while inserting tuples for
> +     * parallel mode, while extending the relations, workers instead of
> +     * blocking on a page while another worker is inserting, can check the
> +     * FSM for another page that can accommodate the tuples. This results
> +     * in major benefit for parallel inserts.
> +     */
> +    myState->ti_options = 0;
>
> I am not quite sure that disabling the "SKIP FSM" in the parallel worker
> will bring a performance gain. In my test environment, if I change this
> code to use the option "TABLE_INSERT_SKIP_FSM", then there seems to be no
> performance degradation. Could you please have a try on it?
> (I tested with the SQL you provided earlier [1])

Thanks for trying that out.

Please see the code around the use_fsm flag in RelationGetBufferForTuple
for a better understanding of the points below.

What happens if the FSM is skipped, i.e. myState->ti_options =
TABLE_INSERT_SKIP_FSM;?

1) The flag use_fsm will be false in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it until it
is full. When the block is full, the worker doesn't look in the FSM
(GetPageWithFreeSpace), as use_fsm is false. It goes directly for relation
extension and tries to acquire the relation extension lock with
LockRelationForExtension. Note that the bulk extension of blocks with
RelationAddExtraBlocks is not reached, as use_fsm is false.
3) After acquiring the relation extension lock, it adds one extra new block
with ReadBufferBI(relation, P_NEW, ...); see the comment "In addition to
whatever extension we performed above, we always add at least one block to
satisfy our own request." The tuple is inserted into this new block.
Basically, the workers can't look for empty pages among the pages added by
other workers; they keep doing the above steps in silos.

What happens if the FSM is not skipped, i.e. myState->ti_options = 0;?

1) The flag use_fsm will be true in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it until it
is full. When the block is full, the worker looks for a page with free
space in the FSM (GetPageWithFreeSpace), as use_fsm is true. If it can't
find any page with the required amount of free space, it goes for bulk
relation extension (RelationAddExtraBlocks) after acquiring the relation
extension lock with ConditionalLockRelationForExtension.
3) The worker then adds extraBlocks = Min(512, lockWaiters * 20); new
blocks in RelationAddExtraBlocks and immediately updates the bottom level
of the FSM for each block (see the comment around RecordPageWithFreeSpace
for why only the bottom level, not the entire FSM tree). After all the
blocks are added, it updates the entire FSM tree with
FreeSpaceMapVacuumRange.
4) After the bulk extension, the worker adds another block (see the comment
"In addition to whatever extension we performed above, we always add at
least one block to satisfy our own request.") and inserts the tuple into
this new block.

Basically, the workers can benefit from the bulk extension of the relation,
and they can always look for empty pages among the pages added by other
workers. There is a high chance that blocks will be available after a bulk
extension. Having said that, if the added extra blocks are consumed by the
workers very fast, i.e. if the tuple sizes are big and very few tuples fit
on a page, then even the bulk extension can't help much, and there will be
more contention on the relation extension lock. Well, one might think to
add more blocks at a time, say Min(1024, lockWaiters * 128/256/512),
instead of the current extraBlocks = Min(512, lockWaiters * 20);. This
will work (i.e.
we don't see any regression with the parallel inserts in the CTAS patches),
but it can't be a practical solution, because the total number of pages for
the relation will be higher, with many pages having more free space.
Furthermore, future sequential scans on that relation might take a lot more
time.

If myState->ti_options = TABLE_INSERT_SKIP_FSM; is set in only the one
place (within if (myState->is_parallel)), then it will be effective only
for the leader, i.e. the leader will not look in the FSM but all the
workers will, because within if (myState->is_parallel_worker) in
intorel_startup, myState->ti_options = 0; for the workers.

I ran tests with the configuration shown at [1] for case 4 (2 bigint (8
bytes each) columns, 16 name (64 bytes each) columns, tuple size 1064
bytes, 10mn tuples) with leader participation, where I'm seeing the
regression:

1) When myState->ti_options = TABLE_INSERT_SKIP_FSM; for both leader and
workers, my results are as follows:
0 workers - 116934.137, 2 workers - 209802.060, 4 workers - 248580.275
2) When myState->ti_options = 0; for both leader and workers, my results
are as follows:
0 workers - 1116184.718, 2 workers - 139798.055, 4 workers - 143022.409

I hope the above explanation and the test results clarify that skipping the
FSM doesn't solve the problem. Let me know if anything is not clear or if
I'm missing something.
[1] postgresql.conf parameters used:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
port = 5440

System Configuration:
RAM: 528GB
Disk Type: SSD
Disk Size: 1.5TB

lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 1064.000
CPU max MHz: 2129.0000
CPU min MHz: 1064.0000
BogoMIPS: 4266.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com