On 2020-Jul-30, Anastasia Lubennikova wrote:
> While testing this fix, Alexander Lakhin spotted another problem. I
> simplified the test case to this:
Ah, good catch. I think a cleaner way to fix this problem is to just
consider the range as not summarized and return NULL from there, as in
the attached patch. Running your test case with a telltale WARNING
added at that point, it's clear that it's being hit.
By returning NULL, we're forcing the caller to scan the heap, which is
not great. But note that if you retry, and your VACUUM hasn't run yet
by the time we go through the loop again, the same thing would happen.
So it seems to me a good enough answer.
A much more troubling thought is what happens if the range is
desummarized, then the index item is used for the summary of a different
range. Then the index might end up returning corrupt results.
> At first, I tried to fix it by holding the lock on revmap->rm_currBuf until
> we locked the regular page, but it causes a deadlock with brinsummarize(),
> It can be easily reproduced with the same test as above.
> Is there any rule about the order of locking revmap and regular pages in
> brin? I haven't found anything in README.
Umm, I thought that stuff was in the README, but it seems I didn't add
it there. I think I had a .org file with my notes on that ... must be
in an older laptop disk, because it's not in my worktree for that. I'll
see if I can fish it out.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services