13.08.2019 18:45, Anastasia Lubennikova wrote:
I also added a nearby FIXME comment to
_bt_insertonpg_in_posting() -- I don't think think that the code for
splitting a posting list in two is currently crash-safe.
Good catch. It seems, that I need to rearrange the code.
I'll send updated patch this week.
Attached is v7.
In this version of the patch, I heavily refactored the code of insertion into
posting tuple. bt_split logic is quite complex, so I omitted a couple of
optimizations. They are mentioned in TODO comments.
Now the algorithm is the following:
- If bt_findinsertloc() found out that tuple belongs to existing posting tuple's
TID interval, it sets 'in_posting_offset' variable and passes it to
_bt_insertonpg()
- If 'in_posting_offset' is valid and origtup is valid,
merge our itup into origtup.
It can result in one tuple neworigtup, that must replace origtup; or two tuples:
neworigtup and newrighttup, if the result exceeds BTMaxItemSize,
- If two new tuple(s) fit into the old page, we're lucky.
call _bt_delete_and_insert(..., neworigtup, newrighttup, newitemoff) to
atomically replace oldtup with new tuple(s) and generate xlog record.
- In case page split is needed, pass both tuples to _bt_split().
_bt_findsplitloc() is now aware of upcoming replacement of origtup with
neworigtup, so it uses correct item size where needed.
It seems that now all replace operations are crash-safe. The new patch passes
all regression tests, so I think it's ready for review again.
In the meantime, I'll run more stress-tests.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company