RE: Locking B-tree leafs immediately in exclusive mode
From: Imai, Yoshikazu
Subject: RE: Locking B-tree leafs immediately in exclusive mode
Date:
Msg-id: 0F97FA9ABBDBE54F91744A9B37151A51186E9D@g01jpexmbkw24
In response to: Re: Locking B-tree leafs immediately in exclusive mode (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List: pgsql-hackers
On Mon, June 11, 2018 at 4:31 PM, Alexander Korotkov wrote:
> On Mon, Jun 11, 2018 at 1:06 PM Simon Riggs <simon@2ndquadrant.com> wrote:
> > On 5 June 2018 at 14:45, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> > > Currently _bt_search() always locks the leaf buffer in shared mode (aka
> > > BT_READ), while the caller can relock it later. However, I don't see
> > > what prevents _bt_search() from locking the leaf immediately in
> > > exclusive mode (aka BT_WRITE) when required. When traversing the
> > > downlink from a non-leaf page of level 1 (the lowest level of non-leaf
> > > pages, just prior to the leaf pages), we know that the next page is
> > > going to be a leaf. In this case, we can immediately lock the next
> > > page in exclusive mode if required. For sure, it might happen that we
> > > didn't manage to exclusively lock the leaf in this way when
> > > _bt_getroot() points us to a leaf page, but this case could be handled
> > > in _bt_search() by a relock. Please find an implementation of this
> > > approach in the attached patch.
> >
> > It's a good idea. How does it perform with many duplicate entries?
>
> Our B-tree currently maintains duplicates unordered. So, during insertion
> we can traverse rightlinks in order to find a page which would fit the
> new index tuple. However, in this case we're traversing pages in
> exclusive lock mode, and that happens already after the re-lock;
> _bt_search() doesn't do anything with that. So, I assume there shouldn't
> be any degradation in the case of many duplicate entries.
>
> But this case is definitely worth testing, and I'm planning to do it.

Hi, I'm reviewing this.

First, I ran the same performance tests as you did, on a 72-core machine
(AWS c5.18xlarge).

> # postgresql.conf
> max_connections = 300
> shared_buffers = 32GB
> fsync = off
> synchronous_commit = off
>
> # DDL:
> CREATE UNLOGGED TABLE ordered (id serial primary key, value text not null);
> CREATE UNLOGGED TABLE unordered (i integer not null, value text not null);
>
> # script_ordered.sql
> INSERT INTO ordered (value) VALUES ('abcdefghijklmnoprsqtuvwxyz');
>
> # script_unordered.sql
> \set i random(1, 1000000)
> INSERT INTO unordered VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');
>
> # commands
> pgbench -T 60 -P 1 -M prepared -f script_ordered.sql -c 150 -j 150 postgres
> pgbench -T 60 -P 1 -M prepared -f script_unordered.sql -c 150 -j 150 postgres
>
> # results
> ordered, master: 157473 TPS
> ordered, patched: 231374 TPS
> unordered, master: 232372 TPS
> unordered, patched: 232535 TPS

# my results
ordered, master: 186045 TPS
ordered, patched: 265842 TPS
unordered, master: 313011 TPS
unordered, patched: 309636 TPS

I confirmed that this patch improves ordered insertion.

In addition to your tests, I ran the following test with many duplicate
entries:

# DDL
CREATE UNLOGGED TABLE duplicated (i integer not null, value text not null);

# script_duplicated.sql
INSERT INTO duplicated VALUES (1, 'abcdefghijklmnoprsqtuvwxyz');

# commands
pgbench -T 60 -P 1 -M prepared -f script_duplicated.sql -c 150 -j 150 postgres

# results
duplicated, master: 315450 TPS
duplicated, patched: 311584 TPS

It seems there is almost no performance degradation in the case of many
duplicated entries.

I'm planning to do a code review and will send it in the next mail.

Yoshikazu Imai
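For reference, below is a minimal sketch of the descent logic being
discussed, assuming the shape of the nbtree code around PostgreSQL 11.
_bt_getroot(), _bt_binsrch(), _bt_relandgetbuf(), BT_READ/BT_WRITE,
P_ISLEAF(), and BTreeInnerTupleGetDownLink() are real names from the
PostgreSQL source, but the function itself is illustrative only and is
not the actual patch; move-right handling, the stack maintenance needed
for page splits, and error paths are all omitted.

    #include "postgres.h"
    #include "access/nbtree.h"

    /*
     * Simplified sketch (not the actual patch): descend the B-tree, and
     * when the caller wants BT_WRITE and the current page is at level 1,
     * take the child (leaf) lock in exclusive mode immediately instead
     * of taking BT_READ and relocking the leaf later.
     */
    static Buffer
    bt_descend_sketch(Relation rel, int keysz, ScanKey scankey,
                      bool nextkey, int access)
    {
        Buffer      buf = _bt_getroot(rel, access);

        for (;;)
        {
            Page        page = BufferGetPage(buf);
            BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
            OffsetNumber offnum;
            BlockNumber blkno;
            int         page_access;

            if (P_ISLEAF(opaque))
                break;          /* reached the leaf level */

            /* find the downlink to follow (moveright handling omitted) */
            offnum = _bt_binsrch(rel, buf, keysz, scankey, nextkey);
            blkno = BTreeInnerTupleGetDownLink((IndexTuple)
                        PageGetItem(page, PageGetItemId(page, offnum)));

            /*
             * The key idea: the children of a level-1 page are leaves,
             * so if the caller asked for BT_WRITE we can lock the child
             * exclusively right away.
             */
            page_access = (access == BT_WRITE && opaque->btpo.level == 1)
                ? BT_WRITE : BT_READ;

            /* release the lock on the current page; lock and pin the child */
            buf = _bt_relandgetbuf(rel, buf, blkno, page_access);
        }

        /*
         * If _bt_getroot() handed us a leaf directly (the tree is only
         * one level deep), we may still hold only BT_READ here even
         * though BT_WRITE was requested; as described upthread, that
         * rare case is handled by a relock.
         */
        return buf;
    }

This also suggests why the duplicated-entries case shouldn't regress: the
rightlink traversal for duplicates happens after the leaf is already
locked exclusively, so taking BT_WRITE one level early only removes the
shared-lock/relock step on the leaf rather than adding any new work.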