Re: GiST VACUUM - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: GiST VACUUM
Date
Msg-id EB87A69B-EE5E-4259-9EEB-DA9DC1F7E265@yandex-team.ru
Whole thread Raw
In response to Re: GiST VACUUM  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: GiST VACUUM  (Andrey Borodin <x4mmm@yandex-team.ru>)
Re: GiST VACUUM  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Cool, thanks!

> 2 янв. 2019 г., в 20:35, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>
> In patch #1, to do the vacuum scan in physical order:
> ...
> I think this is ready to be committed, except that I didn't do any testing. We discussed the need for testing
earlier.Did you write some test scripts for this, or how have you been testing? 
Please see test I used to check left jumps for v18:
0001-Physical-GiST-scan-in-VACUUM-v18-with-test-modificat.patch
0002-Test-left-jumps-v18.patch

To trigger FollowRight GiST sometimes forget to clear follow-right marker simulating crash of an insert. This fills
logswith "fixing incomplete split" messages. Search for "REMOVE THIS" to disable these ill-behavior triggers. 
To trigger NSN jump GiST allocate empty page after every real allocation.

In log output I see
2019-01-03 22:27:30.028 +05 [54596] WARNING:  RESCAN TRIGGERED BY NSN
WARNING:  RESCAN TRIGGERED BY NSN
2019-01-03 22:27:30.104 +05 [54596] WARNING:  RESCAN TRIGGERED BY FollowRight
This means that code paths were really executed (for v18).

>
> Patch #2:

>
> * Bitmapset stores 32 bit signed integers, but BlockNumber is unsigned. So this will fail with an index larger than
2^31blocks. That's perhaps academical, I don't think anyone will try to create a 16 TB GiST index any time soon. But it
feelswrong to introduce an arbitrary limitation like that. 
Looks like bitmapset is unsuitable for collecting block numbers due to the interface. Let's just create custom
containerfor this? 
Or there is one other option: for each block number throw away sign bit and consider page potentially internal and
potentiallyempty leaf if (blkno & 0x7FFFFFF) is in bitmapset. 
And then we will have to create at least one 17Tb GiST to check it actually works.

> * I was surprised that the bms_make_empty() function doesn't set the 'nwords' field to match the allocated size.
Perhapsthat was intentional, so that you don't need to scan the empty region at the end, when you scan through all
matchingbits? Still, seems surprising, and needs a comment at least. 
Explicitly set nwords to zero. Looking at the code now, I do not see this nwords==0 as a very good idea. Probably, it's
effective,but it's hacky, creates implicit expectations in code. 

> * We're now scanning all internal pages in the 2nd phase. Not only those internal pages that contained downlinks to
emptyleaf pages. That's probably OK, but the comments need updating on that. 
Adjusted comments. That if before loop
> if (vstate.emptyPages > 0)
seems superfluous. But I kept it until we solve the problem with 31-bit bitmapset.
> * In general, comments need to be fixed and added in several places. For example, there's no clear explanation on
whatthe "delete XID", stored in pd_prune_xid, means. (I know what it is, because I'm familiar with the same mechanism
inB-tree, but it's not clear from the code itself.) 
I've added comment to GistPageSetDeleteXid()

* In this check
> if (GistPageIsDeleted(page) && TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin))
I've switched using RecentGlobalDataXmin to RecentGlobalXmin, because we have done so in similar mechanics for GIN (for
uniformitywith B-tree). 


Thanks for working on this!


Best regards, Andrey Borodin.



Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pgsql: Remove WITH OIDS support, change oid catalog column visibility.
Next
From: Peter Eisentraut
Date:
Subject: Re: Unified logging system for command-line programs