Thread: Corrupted btree index on HEAD because of covering indexes
Hi all, I was just testing the VACUUM truncation logic, and bumped into what looks like a corrupted btree index. Here is a reproducer: create table aa (a int primary key, b bool); insert into aa values (generate_series(1,1000000), false); checkpoint; update aa set b = false where a > 500000; -- Dirties a set of shared buffers delete from aa where a > 750000; -- Delete a set of rows vacuum aa; delete from aa where a > 10; vacuum aa; -- error on btree with right sibling And here is the actual failure when the second vacuum: ERROR: XX000: right sibling 4132 of block 2128 is not next child 5396 of block 412 in index "aa_pkey" LOCATION: _bt_mark_page_halfdead, nbtpage.c:1564 This works on REL_10_STABLE, so I am adding an open item. I have not investigated the exact problem yet, but bisect is showing me covering indexes as the culprit (8224de4). Thanks, -- Michael
Attachment
Will see... Michael Paquier wrote: > Hi all, > > I was just testing the VACUUM truncation logic, and bumped into what > looks like a corrupted btree index. Here is a reproducer: > create table aa (a int primary key, b bool); > insert into aa values (generate_series(1,1000000), false); > checkpoint; > update aa set b = false where a > 500000; -- Dirties a set of shared > buffers > delete from aa where a > 750000; -- Delete a set of rows > vacuum aa; > delete from aa where a > 10; > vacuum aa; -- error on btree with right sibling > > And here is the actual failure when the second vacuum: > ERROR: XX000: right sibling 4132 of block 2128 is not next child 5396 of block 412 in index "aa_pkey" > LOCATION: _bt_mark_page_halfdead, nbtpage.c:1564 > > This works on REL_10_STABLE, so I am adding an open item. I have not > investigated the exact problem yet, but bisect is showing me covering > indexes as the culprit (8224de4). > > Thanks, > -- > Michael > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Wed, Apr 18, 2018 at 10:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Will see... I'll take a look tomorrow. -- Peter Geoghegan
> I'll take a look tomorrow. Interesting, contrib/amcheck doesn't find any error in index. Seems, it's subject for further improvement. Nevertheless, seems, I found. In _bt_mark_page_halfdead() we use truncated high key IndexTuple as a storage of blocknumber of top parent to remove. And sets BTreeTupleSetNAtts(&trunctuple, 0) - it's stored in ip_posid. But some later, in _bt_unlink_halfdead_page() we check ItemPointer high key with ItemPointerIsValid macro - and it returns false, because offset is actually InvalidOffsetNumber - i.e. 0 which was set by BTreeTupleSetNAtts. And some wrong decisions are follows, I didn't look at that. Trivial and naive fix is attached, but for me it looks a bit annoing that we store pointer (leafhikey) somewhere inside unlocked page. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On Thu, Apr 19, 2018 at 9:42 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Interesting, contrib/amcheck doesn't find any error in index. Seems, it's > subject for further improvement. I think that you're right that this should be detectable by bt_index_parent_check(). I have an idea about how we can make this happen with low overhead: Iff we're readonly (that is, called through bt_index_parent_check()), use a bloom filter to fingerprint downlink block numbers on each non-leaf level, starting from the root. On the next level down, check that we encountered a downlink at the parent level for non-P_IGNORE() pages. If we didn't, throw an error. This should only need a small Bloom filter to work well. I suppose we could only do it when "heapallindexed = true" was specified to bt_index_parent_check(), and size the Bloom filter based on the same principle as heapallindexed verification. This check is correct because the downlink in the parent is atomically removed when the page (that the downlink points to) is marked half-dead by VACUUM (and because there cannot be a concurrent VACUUM, since we have a ShareLock on the relation). If we find a non-P_IGNORE() page whose block number was not fingerprinted one level up, then something must be wrong. (We should also throw an error within bt_downlink_check() if the page turns out to be P_IGNORE(), to make our testing "symmetrical".) The really scary thing about corruption like this is not that it leads to wrong answers to queries. It's that it *doesn't* lead to wrong answers to queries (I mentioned this issue to Alexander a few weeks ago, actually). This is true because we'll move right one level down, maybe more than once, because it just looks like there was a concurrent page split to index scans. (There is actually another subtlety about this new bt_index_parent_check() check, which is legitimate incomplete page splits that we detect using P_INCOMPLETE_SPLIT(). The proposed check still seems totally doable, though.) I would like to go and implement this check now, and if all goes well I may propose that you commit it to the master branch for v11. I don't see this as a new feature. I see it as essential testing infrastructure. What do you think about adding this new check soon? > Nevertheless, seems, I found. In _bt_mark_page_halfdead() we use truncated > high key IndexTuple as a storage of blocknumber of top parent to remove. And > sets BTreeTupleSetNAtts(&trunctuple, 0) - it's stored in ip_posid. > > But some later, in _bt_unlink_halfdead_page() we check ItemPointer high key > with ItemPointerIsValid macro - and it returns false, because offset is > actually InvalidOffsetNumber - i.e. 0 which was set by BTreeTupleSetNAtts. > And some wrong decisions are follows, I didn't look at that. I'm not at all surprised. I strongly suspected that it was some simple issue with the representation, since the INCLUDE patch didn't actually change the page deletion algorithm in any way. > Trivial and naive fix is attached, but for me it looks a bit annoing that we > store pointer (leafhikey) somewhere inside unlocked page. Well, it has worked that way for a long time. No reason to change it now. I also think that we could have better conventional regression test coverage here. -- Peter Geoghegan
> I would like to go and implement this check now, and if all goes well > I may propose that you commit it to the master branch for v11. I don't > see this as a new feature. I see it as essential testing > infrastructure. What do you think about adding this new check soon? Agree. Hope, nobody found a way how to use amcheck module in production to serve user requests. But, it should be implemented before BETA stage, in opposite case we will get a lot of objections. > >> Nevertheless, seems, I found. In _bt_mark_page_halfdead() we use truncated >> high key IndexTuple as a storage of blocknumber of top parent to remove. And >> sets BTreeTupleSetNAtts(&trunctuple, 0) - it's stored in ip_posid. >> >> But some later, in _bt_unlink_halfdead_page() we check ItemPointer high key >> with ItemPointerIsValid macro - and it returns false, because offset is >> actually InvalidOffsetNumber - i.e. 0 which was set by BTreeTupleSetNAtts. >> And some wrong decisions are follows, I didn't look at that. > > I'm not at all surprised. I strongly suspected that it was some simple > issue with the representation, since the INCLUDE patch didn't actually > change the page deletion algorithm in any way. +1 > >> Trivial and naive fix is attached, but for me it looks a bit annoing that we >> store pointer (leafhikey) somewhere inside unlocked page. > > Well, it has worked that way for a long time. No reason to change it now. Agree, but at least this place needs a comment - why it's safe. > I also think that we could have better conventional regression test > coverage here. Will try to invent not so large test. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> I also think that we could have better conventional regression test > coverage here. I tried to minimize Michael's test case and add it to patch. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On Thu, Apr 19, 2018 at 10:47 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
I also think that we could have better conventional regression test
coverage here.
I tried to minimize Michael's test case and add it to patch.
- if (ItemPointerIsValid(leafhikey))
+ if (ItemPointerGetBlockNumberNoCheck(leafhikey) != InvalidBlockNumber)
Should we use BTreeInnerTupleGetDownLink() as soon as we use
BTreeInnerTupleSetDownLink() for setting this?
Or even invent BTreeInnerTupleDownLinkIsValid() macro?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Apr 19, 2018 at 2:00 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > - if (ItemPointerIsValid(leafhikey)) > + if (ItemPointerGetBlockNumberNoCheck(leafhikey) != InvalidBlockNumber) > > Should we use BTreeInnerTupleGetDownLink() as soon as we use > BTreeInnerTupleSetDownLink() for setting this? > Or even invent BTreeInnerTupleDownLinkIsValid() macro? +1 to at least using BTreeInnerTupleGetDownLink(). I think it's okay to use ItemPointerGetBlockNumberNoCheck() in amcheck, because it's inconvenient for amcheck to know if it's on the leaf level or not (it's rather low-context code). However, the nbtree code doesn't have that problem, and already has very few places that could possibly need ItemPointerGetBlockNumberNoCheck(). In fact, Teodor's fix touches the only remaining ItemPointerGetBlockNumberNoCheck() caller code within all of nbtree. In summary, I think that we should find a way to never use ItemPointerGetBlockNumberNoCheck() in nbtree, while still using it in amcheck. -- Peter Geoghegan
On Thu, Apr 19, 2018 at 11:59 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Agree. Hope, nobody found a way how to use amcheck module in production to > serve user requests. But, it should be implemented before BETA stage, in > opposite case we will get a lot of objections. It shouldn't take that long to get a patch together. I've already started. I think that any objections to pushing this amcheck enhancement after feature freeze are unlikely, for these reasons: * There is a precedent, since the integrity checking functions added to pg_visibility by commit e472ce96 were added well after feature freeze, and only 3 months before the release of 9.6. * Anyone paying attention to the ongoing "VM map freeze corruption" thread would have to agree that that was an extremely good idea. We're the ones that have to maintain this code, and I think that we're entitled to add a little extra C code to a contrib extension, in order to make testing easier. Including for beta testing. * The extra overhead is quite low, and will only affect bt_index_parent_check() (not bt_index_check()). * I deliberately created bt_index_parent_check() so that we'd have something that does very thorough testing, that's too slow and/or requires too heavy a relation-level lock to be useful to most users. That was the plan from day one. If you only care about hardware issues, then bt_index_check() is better in almost every way, and is really all you should be using in production unless you really don't care about the extra overhead, or about the ShareLock. The docs more or less say this. I have certainly heard of people using bt_index_parent_check() in production, but only when they already knew that their database was corrupt, and wanted to isolate the problem. I imagine that people that use bt_index_parent_check() in production do so because they want as much information as possible, and are willing to do whatever it takes to get more information. > Agree, but at least this place needs a comment - why it's safe. Good idea. >> I also think that we could have better conventional regression test >> coverage here. > > Will try to invent not so large test.oif it means they may get a little extra Your smaller test takes about 350ms to run on a debug build on my laptop, which seems worth it (honestly, this test should have been there already). Maybe we can add it to the amcheck regression tests instead, since those are run less often. This also allows us to add a test case specifically as part of the amcheck enhancement, which makes a bit more sense. -- Peter Geoghegan
On Thu, Apr 19, 2018 at 10:47:02PM +0300, Teodor Sigaev wrote: > I tried to minimize Michael's test case and add it to patch. I cannot comment on the actual fix as I lack background in the area, but having a test case and even more having pg_upgrade do some work on those pages are good things. This addresses the failure as far as I can see. -- Michael
Attachment
> I tried to minimize Michael's test case and add it to patch. > > > -if (ItemPointerIsValid(leafhikey)) > +if (ItemPointerGetBlockNumberNoCheck(leafhikey) != InvalidBlockNumber) > > Should we use BTreeInnerTupleGetDownLink() as soon as we use > BTreeInnerTupleSetDownLink() for setting this? > Or even invent BTreeInnerTupleDownLinkIsValid() macro? I am not sure. Here we actually store UP link - to top parent to remove. I'm afraid using BTreeInnerTupleGetDownLink/BTreeInnerTupleSetDownLink could cause a confusion, in other hand, introducing TreeInnerTupleGetUpLink/BTreeInnerTupleSetUpLink seems over-engineering -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> heard of people using bt_index_parent_check() in production, but only > when they already knew that their database was corrupt, and wanted to > isolate the problem. I imagine that people that use > bt_index_parent_check() in production do so because they want as much > information as possible, and are willing to do whatever it takes to > get more information. That why I think we need improve amcheck - ideally, it should not miss any mistakes in tree structure. >> Agree, but at least this place needs a comment - why it's safe. > > Good idea. Could you write it? I'm afraid I can't give good explanation why we believe that nobody update this page ant it's high key while page is unlocked but pinned. > >>> I also think that we could have better conventional regression test >>> coverage here. >> >> Will try to invent not so large test.oif it means they may get a little extra > > Your smaller test takes about 350ms to run on a debug build on my > laptop, which seems worth it (honestly, this test should have been > there already). Maybe we can add it to the amcheck regression tests > instead, since those are run less often. This also allows us to add a > test case specifically as part of the amcheck enhancement, which makes > a bit more sense. Hm, it seems to me, that 350ms is short enough to place it in both core and amcheck test. I think, basic functionality should be covered by core tests as we test insert/create. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
>> Should we use BTreeInnerTupleGetDownLink() as soon as we use >> BTreeInnerTupleSetDownLink() for setting this? >> Or even invent BTreeInnerTupleDownLinkIsValid() macro? > I am not sure. Here we actually store UP link - to top parent to remove. I'm > afraid using BTreeInnerTupleGetDownLink/BTreeInnerTupleSetDownLink could cause a > confusion, in other hand, introducing > TreeInnerTupleGetUpLink/BTreeInnerTupleSetUpLink seems over-engineering > After close look I change my opinion. To have a clean code it's much better to have new pair get/set macroses specialy to manage link to top pare during page deletion. This removes last naked usage of ItemPointer(SetInvalid/IsInvalid/GetBlockNumberNoCheck) and uses self-described macroses. Patch is attached. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On Thu, Apr 19, 2018 at 11:41 PM, Teodor Sigaev <teodor@sigaev.ru> wrote: >> heard of people using bt_index_parent_check() in production, but only >> when they already knew that their database was corrupt, and wanted to >> isolate the problem. I imagine that people that use >> bt_index_parent_check() in production do so because they want as much >> information as possible, and are willing to do whatever it takes to >> get more information. > > That why I think we need improve amcheck - ideally, it should not miss any > mistakes in tree structure. I have finished writing a prototype that works, and passes all tests. Without the fix, this is what the complaint from VACUUM looks like on my machine: ERROR: right sibling 265 of block 134 is not next child 395 of block 135 in index "delete_test_table_pkey" After the final VACUUM in your test case runs (the one that throws this error), my working copy of amcheck can detect the same issue with low overhead: pg@~[20995]=# select bt_index_parent_check('delete_test_table_pkey', true); ERROR: index block lacks downlink in index "delete_test_table_pkey" DETAIL: Block=265 level=1 page lsn=0/909F4B18. There is no complaint from amcheck if called before the final VACUUM in your test case, since that's when the trouble starts (and ends). It's easy to account for concurrent page splits within amcheck by checking for BTP_SPLIT_END on the child page that could lack a downlink. Don't confuse this with checking the BTP_INCOMPLETE_SPLIT flag, which is what we set on the left half/sibling of a split -- not the "new" right half/sibling. Of course, the right sibling is the type of block that can lack a downlink until after the second phase of a pagesplit (i.e. until it's fully finished), or after the first phase of page deletion. (As you know, page deletion is like a page split "in reverse".) I said concurrent page split, but I really meant incomplete page split, since a concurrent insert/split is impossible given that amcheck holds a ShareLock here. (See commit 40dae7ec5.) Of course, we don't see details about the next level up in the amcheck error, unlike the VACUUM error. That shouldn't matter, because we're already verifying a lot about the relationship between blocks at the next level up, and even between the two levels. This enhancement adds the one piece we were missing. It could in theory detect other problems that VACUUM cannot detect, too. Expect me to post a patch during the work day on Monday or Tuesday (Pacific time). I need to polish this patch some more. I particularly need to think about the use of memory for the Bloom filter (right now I just use work_mem). > Could you write it? I'm afraid I can't give good explanation why we believe > that nobody update this page ant it's high key while page is unlocked but > pinned. Okay. I'll come up with something soon. > Hm, it seems to me, that 350ms is short enough to place it in both core and > amcheck test. I think, basic functionality should be covered by core tests > as we test insert/create. I think you're right. There is really no excuse for not having thorough code coverage for a module like nbtree. -- Peter Geoghegan
On Fri, Apr 20, 2018 at 7:18 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: > After close look I change my opinion. To have a clean code it's much better > to have new pair get/set macroses specialy to manage link to top pare during > page deletion. This removes last naked usage of > ItemPointer(SetInvalid/IsInvalid/GetBlockNumberNoCheck) and uses > self-described macroses. Patch is attached. I see your point. Maybe don't have the newline between the get and set, though, to match the existing style. And, the note about the assertion seems unnecessary. I suggest putting something about what general area this deals with. Perhaps something like the following: "Get/set leaf page highkey's link. During the second phase of deletion, the target leaf page's high key may point to an ancestor page (at all other times, the leaf level high key's link is not used). See the nbtree README for full details." -- Peter Geoghegan
On Thu, Apr 19, 2018 at 9:42 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: > But some later, in _bt_unlink_halfdead_page() we check ItemPointer high key > with ItemPointerIsValid macro - and it returns false, because offset is > actually InvalidOffsetNumber - i.e. 0 which was set by BTreeTupleSetNAtts. > And some wrong decisions are follows, I didn't look at that. The problem here is that multi-level page deletion incorrectly thinks that it's safe to fully delete a half-dead leaf page. Of course, this is due to the issue with the representation of high keys that you identified. IOW, VACUUM incorrect believes that fully deleting the leaf page will leave no "dangling downlink" one level up, but, in fact, there is one such dangling downlink. This could eventually lead to wrong answers to queries, since the deleted page will eventually be fully recycled. I refined the amcheck enhancement quite a bit today. It will not just check that a downlink is not missing; It will also confirm that it wasn't a legitimately interrupted multi-level deletion, by descending to the leaf page to match the leaf high key pointer to the top most parent, which should be the target page (the page that lacked a downlink according to the new Bloom filter). We need to worry about multi-level deletions that are interrupted by an error or a hard crash, which presents a legitimate case where there'll be no downlink for an internal page in its parent. VACUUM is okay with that, so we must be too. I'll finish it off early next week as planned. It should comprehensively cover issues with multi-level page deletion, including cases where you have this kind of index corruption and page recycling later makes it much worse. So, it won't just cover the exact test case from Michael, which produced corruption that can be detected by throwing an error when a downlink leads to a fully deleted page (though only when amcheck has a ShareLock, of course). Note that the problem here is best understood as a problem around the presence of a dangling downlink, as oppose to a problem around the absence of a needed downlink. There is an absent downlink, but that's beside the point, and in any case missing a downlink is not *inherently* wrong (as I said, it's not inherently wrong because of the issue of interrupted multi-level page deletes). -- Peter Geoghegan
Thank you, pushed > I see your point. Maybe don't have the newline between the get and > set, though, to match the existing style. And, the note about the > assertion seems unnecessary. Removed newline and note > "Get/set leaf page highkey's link. During the second phase of > deletion, the target leaf page's high key may point to an ancestor > page (at all other times, the leaf level high key's link is not used). > See the nbtree README for full details." Changed as you suggested. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Sat, Apr 21, 2018 at 6:02 PM, Peter Geoghegan <pg@bowt.ie> wrote: > I refined the amcheck enhancement quite a bit today. It will not just > check that a downlink is not missing; It will also confirm that it > wasn't a legitimately interrupted multi-level deletion, by descending > to the leaf page to match the leaf high key pointer to the top most > parent, which should be the target page (the page that lacked a > downlink according to the new Bloom filter). We need to worry about > multi-level deletions that are interrupted by an error or a hard > crash, which presents a legitimate case where there'll be no downlink > for an internal page in its parent. VACUUM is okay with that, so we > must be too. Attached patch lets amcheck detect the issue when bt_index_parent_check() is called, though only when heapallindexed verification was requested (that is, only when bt_index_parent_check() is doing something with a Bloom filter already). The new checks will probably also detect any possible issue with multi-level page deletions. The patch tightens up our general expectations around half-dead and fully deleted pages, which seems necessary but also independently useful. I'm using work_mem to constrain the size of the second Bloom filter, whereas the heapallindexed Bloom filter is constrained by maintenance_work_mem. This seems fine to me, since we have always used an additional work_mem budget for spool2 when building a unique index within nbtsort.c. Besides, it will probably be very common for the downlink Bloom filter to be less than 1% the size of the first Bloom filter when we have adequate memory for both Bloom filters (i.e. very small). I thought about mentioning this work_mem allocation in the docs, but decided that there was no need, since the CREATE INDEX spool2 work_mem stuff isn't documented anywhere either. Note that the "c.relkind = 'i'" change in the docs is about not breaking the amcheck query when local partitioned indexes happen to be in use (when the user changed the sample SQL query to not just look at pg_catalog indexes). See the "Local partitioned indexes and pageinspect" thread I just started for full details. The new P_ISDELETED() test within bt_downlink_missing_check() is what actually detects the corruption that the test case causes, since the fully deleted leaf page still has a sane top parent block number left behind (that is, we don't get as far as testing "if (BTreeTupleGetTopParent(itup) == state->targetblock)"; that's not how the the leaf page can get corrupt in the test case that Michael posted). Note that there are also two new similar P_ISDELETED() tests added to two existing functions (bt_downlink_check() and bt_check_level_from_leftmost()), but those tests won't detect the corruption that we saw. They're really there to nail down how we think about fully deleted pages. -- Peter Geoghegan
Attachment
Perfect! I would like to commit it but have some suggestions: 1) TRUNCATE bttest_multi; INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i; SELECT bt_index_parent_check('bttest_multi_idx', true); to improve test stability it would be better to disable autovacuum: ALTER TABLE bttest_multi SET (autovacuum_enabled = false) 2) Pls, add to test DELETE as it done in 6db4b49986be3fe59a1f6ba6fabf9852864efc3e 3) It's not directly connected to this patch, but allocation of BtreeCheckState is not needed, it could be allocated on stack. 4) Nevertheless, I suggest to use palloc0 (or memset(0)) for BtreeCheckState. Now several fields of that structure could be not inited. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Tue, Apr 24, 2018 at 9:06 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Perfect! Thanks! > I would like to commit it but have some suggestions: I attach a revised version, which has changes based on your feedback. > to improve test stability it would be better to disable autovacuum: > ALTER TABLE bttest_multi SET (autovacuum_enabled = false) Done. > 2) Pls, add to test DELETE as it done in > 6db4b49986be3fe59a1f6ba6fabf9852864efc3e Done. I will leave it to you to decide whether or not the original create_index.sql test can now be removed. > 3) It's not directly connected to this patch, but allocation of > BtreeCheckState is not needed, it could be allocated on stack. > > 4) Nevertheless, I suggest to use palloc0 (or memset(0)) for > BtreeCheckState. Now several fields of that structure could be not inited. I like the idea of using palloc0(). Done that way. -- Peter Geoghegan
Attachment
Thank you, pushed. Peter Geoghegan wrote: > On Tue, Apr 24, 2018 at 9:06 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: >> Perfect! -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
>> 2) Pls, add to test DELETE as it done in >> 6db4b49986be3fe59a1f6ba6fabf9852864efc3e > > Done. I will leave it to you to decide whether or not the original > create_index.sql test can now be removed. Leave it in both places. In ideal world, in amcheck test we need to create a broken index, but I don't know a correct way to do it :) -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Wed, Apr 25, 2018 at 8:06 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Leave it in both places. In ideal world, in amcheck test we need to create a > broken index, but I don't know a correct way to do it :) There are many ways to create a broken index, but they're hard to make work with pg_regress. :-) It looks like you pushed v1, which didn't have the tests and other changes you asked for. Attached patch adds those back. Thanks -- Peter Geoghegan
Attachment
> It looks like you pushed v1, which didn't have the tests and other > changes you asked for. Attached patch adds those back. Oops, I missed, I don't know how... Thank you very much for your quick eye! -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Wed, Apr 25, 2018 at 12:06 PM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Oops, I missed, I don't know how... Thank you very much for your quick eye! Thanks Teodor. -- Peter Geoghegan
Teodor Sigaev <teodor@sigaev.ru> writes: >> It looks like you pushed v1, which didn't have the tests and other >> changes you asked for. Attached patch adds those back. > Oops, I missed, I don't know how... Thank you very much for your quick eye! Teodor, are you caught up to the point where it'd be okay to run pgindent, or are there still patches waiting? regards, tom lane
On Wed, Apr 25, 2018 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Teodor, are you caught up to the point where it'd be okay to run pgindent, > or are there still patches waiting? I can't speak for Teodor, but fwiw I am not aware of any more patches waiting. -- Peter Geoghegan
On Wed, Apr 25, 2018 at 11:48 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Apr 25, 2018 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Teodor, are you caught up to the point where it'd be okay to run pgindent,
> or are there still patches waiting?
I can't speak for Teodor, but fwiw I am not aware of any more patches waiting.
Great. Thank you very much for your efforts on this feature!
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Wed, Apr 25, 2018 at 2:09 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Great. Thank you very much for your efforts on this feature! You're welcome. I also appreciate the efforts of all Postgres Pro people. I especially appreciate Anastasia's work on this project. I hope that we see more B-Tree patches from Anastasia, and others. Getting this patch over the line was, it must be said, pretty painful for everyone. There were many reasons for that, some good, others not so good. However, I think that we've learned a few things about how to manage the risks around a patch like this one. This will hopefully make it significantly easier to do more work in this area. I think that it's badly needed. I think that its importance is generally underestimated, actually. The Gray and Reuter book says "B-Trees are by far the most important access path structure in database and file systems". I believe that that's just as true now, 25 years later. The core algorithms that are used by nbtree are generally very well implemented, and I don't think it's a good idea to work on features that change that. At least, I won't be working on something like merging of non-empty pages myself (or anything that touches concurrency, page deletion, etc). However, I do think that we can significantly improve nbtree for certain workloads by optimizing the representation at the page level, which I'm glad to see more interest in in the past few years. The major difficulty here is choosing an on-disk representation that lets us do everything that we'll want to do in the future. That requires strong cooperation. -- Peter Geoghegan
> Teodor, are you caught up to the point where it'd be okay to run pgindent, > or are there still patches waiting? There is a gin predicate lock patch from Heikki, I will work on it. But I have not objection to run pgindent, I believe gin patch will be easy to change. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/