RE: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers
From | Jakub Wartak |
---|---|
Subject | RE: Use simplehash.h instead of dynahash in SMgr |
Date | |
Msg-id | VI1PR0701MB696001353B6E41166E43BEC8F6599@VI1PR0701MB6960.eurprd07.prod.outlook.com Whole thread Raw |
In response to | Re: Use simplehash.h instead of dynahash in SMgr (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Use simplehash.h instead of dynahash in SMgr
|
List | pgsql-hackers |
Hey David, > I think you'd have to batch by filenode and transaction in that case. Each batch might be pretty small on a typical OLTPworkload, so it might not help much there, or it might hinder. True, it is very workload dependent (I was chasing mainly INSERTs multiValues, INSERT-SELECT) that often hit the same $block,certainly not OLTP. I would even say that INSERT-as-SELECT would be more suited for DWH-like processing. > But having said that, I don't think any of those possibilities should stop us speeding up smgropen(). Of course! I've tried a couple of much more smaller ideas, but without any big gains. I was able to squeeze like 300-400kfunction calls per second (WAL records/s), that was the point I think where I think smgropen() got abused. > > Another potential option that we've discussed is that the redo generation > itself is likely a brake of efficient recovery performance today (e.g. INSERT- > SELECT on table with indexes, generates interleaved WAL records that touch > often limited set of blocks that usually put Smgr into spotlight). > > I'm not quite sure if I understand what you mean here. Is this queuing up > WAL records up during transactions and flush them out to WAL every so > often after rearranging them into an order that's more optimal for replay? Why not both? 😉 We were very concentrated on standby side, but on primary side one could also change how WAL records aregenerated: 1) Minimalization of records towards same repeated $block eg. Heap2 table_multi_insert() API already does this and it mattersto generate more optimal stream for replay: postgres@test=# create table t (id bigint primary key); postgres@test=# insert into t select generate_series(1, 10); results in many calls due to interleave heap with btree records for the same block from Smgr perspective (this is especiallyvisible on highly indexed tables) => rmgr: Btree len (rec/tot): 64/ 64, tx: 17243284, lsn: 4/E7000108, prev 4/E70000A0, desc: INSERT_LEAF off 1,blkref #0: rel 1663/16384/32794 blk 1 rmgr: Heap len (rec/tot): 63/ 63, tx: 17243284, lsn: 4/E7000148, prev 4/E7000108, desc: INSERT off 2 flags0x00, blkref #0: rel 1663/16384/32791 blk 0 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243284, lsn: 4/E7000188, prev 4/E7000148, desc: INSERT_LEAF off 2,blkref #0: rel 1663/16384/32794 blk 1 rmgr: Heap len (rec/tot): 63/ 63, tx: 17243284, lsn: 4/E70001C8, prev 4/E7000188, desc: INSERT off 3 flags0x00, blkref #0: rel 1663/16384/32791 blk 0 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243284, lsn: 4/E7000208, prev 4/E70001C8, desc: INSERT_LEAF off 3,blkref #0: rel 1663/16384/32794 blk 1 rmgr: Heap len (rec/tot): 63/ 63, tx: 17243284, lsn: 4/E7000248, prev 4/E7000208, desc: INSERT off 4 flags0x00, blkref #0: rel 1663/16384/32791 blk 0 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243284, lsn: 4/E7000288, prev 4/E7000248, desc: INSERT_LEAF off 4,blkref #0: rel 1663/16384/32794 blk 1 rmgr: Heap len (rec/tot): 63/ 63, tx: 17243284, lsn: 4/E70002C8, prev 4/E7000288, desc: INSERT off 5 flags0x00, blkref #0: rel 1663/16384/32791 blk 0 [..] Similar stuff happens for UPDATE. It basically prevents recent-buffer optimization that avoid repeated calls to smgropen(). And here's already existing table_multi_inserts v2 API (Heap2) sample with obvious elimination of unnecessary individualcalls to smgopen() via one big MULTI_INSERT instead (for CTAS/COPY/REFRESH MV) : postgres@test=# create table t (id bigint primary key); postgres@test=# copy (select generate_series (1, 10)) to '/tmp/t'; postgres@test=# copy t from '/tmp/t'; => rmgr: Heap2 len (rec/tot): 210/ 210, tx: 17243290, lsn: 4/E9000028, prev 4/E8004410, desc: MULTI_INSERT+INIT10 tuples flags 0x02, blkref #0: rel 1663/16384/32801 blk 0 rmgr: Btree len (rec/tot): 102/ 102, tx: 17243290, lsn: 4/E9000100, prev 4/E9000028, desc: NEWROOT lev 0, blkref#0: rel 1663/16384/32804 blk 1, blkref #2: rel 1663/16384/32804 blk 0 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243290, lsn: 4/E9000168, prev 4/E9000100, desc: INSERT_LEAF off 1,blkref #0: rel 1663/16384/32804 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243290, lsn: 4/E90001A8, prev 4/E9000168, desc: INSERT_LEAF off 2,blkref #0: rel 1663/16384/32804 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243290, lsn: 4/E90001E8, prev 4/E90001A8, desc: INSERT_LEAF off 3,blkref #0: rel 1663/16384/32804 blk 1 [..] Here Btree it is very localized (at least when concurrent sessions are not generating WAL) and it enables Thomas's recent-bufferto kick in DELETE is much more simple (thanks to not chewing out those Btree records) and also thanks to Thomas's recent-buffer shouldtheoretically put much less stress on smgropen() already: rmgr: Heap len (rec/tot): 54/ 54, tx: 17243296, lsn: 4/ED000028, prev 4/EC002800, desc: DELETE off 1 flags0x00 KEYS_UPDATED , blkref #0: rel 1663/16384/32808 blk 0 rmgr: Heap len (rec/tot): 54/ 54, tx: 17243296, lsn: 4/ED000060, prev 4/ED000028, desc: DELETE off 2 flags0x00 KEYS_UPDATED , blkref #0: rel 1663/16384/32808 blk 0 rmgr: Heap len (rec/tot): 54/ 54, tx: 17243296, lsn: 4/ED000098, prev 4/ED000060, desc: DELETE off 3 flags0x00 KEYS_UPDATED , blkref #0: rel 1663/16384/32808 blk 0 rmgr: Heap len (rec/tot): 54/ 54, tx: 17243296, lsn: 4/ED0000D0, prev 4/ED000098, desc: DELETE off 4 flags0x00 KEYS_UPDATED , blkref #0: rel 1663/16384/32808 blk 0 [..] 2) So what's missing - I may be wrong on this one - something like "index_multi_inserts" Btree2 API to avoid repeatedly overwhelmingsmgropen() on recovery side for same index's $buffer. Not sure it is worth the effort, though especially recent-bufferfixes that: rmgr: Btree len (rec/tot): 64/ 64, tx: 17243290, lsn: 4/E9000168, prev 4/E9000100, desc: INSERT_LEAF off 1,blkref #0: rel 1663/16384/32804 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243290, lsn: 4/E90001A8, prev 4/E9000168, desc: INSERT_LEAF off 2,blkref #0: rel 1663/16384/32804 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 17243290, lsn: 4/E90001E8, prev 4/E90001A8, desc: INSERT_LEAF off 3,blkref #0: rel 1663/16384/32804 blk 1 right? 3) Concurrent DML sessions mixing WAL records: the buffering on backend's side of things (on private "thread" of WAL - inprivate memory - that would be simply "copied" into logwriter's main WAL buffer when committing/buffer full) - it wouldseem like an very interesting idea to limit interleaving concurrent sessions WAL records between each other and exploitthe recent-buffer enhancement to avoid repeating the same calls to Smgr, wouldn't it? (I'm just mentioning it as Isaw you were benchmarking it here and called out this idea). I could be wrong though with many of those simplifications, in any case please consult with Thomas as he knows much betterand is much more trusted source than me 😉 -J.
pgsql-hackers by date: