Re: GiST VACUUM - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: GiST VACUUM
Date
Msg-id 844f3373-486e-a34d-d5a1-6a048e19355c@iki.fi
Whole thread Raw
In response to Re: GiST VACUUM  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: GiST VACUUM
List pgsql-hackers
On 29/07/18 14:47, Andrey Borodin wrote:
> Fixed both problems. PFA v14.

Thanks, took a really quick look at this.

The text being added to README is outdated for these latest changes.

> In second step I still use paloc's memory, but only to store two
> bitmaps: bitmap of internal pages and bitmap of empty leafs. Second
> physical scan only reads internal pages. I can omit that bitmap, if
> I'll scan everything. Also, I can replace emptyLeafs bitmap with
> array\list, but I do not really think it will be big.

On a typical GiST index, what's the ratio of leaf vs. internal pages? 
Perhaps an array would indeed be better. If you have a really large 
index, the bitmaps can take a fair amount of memory, on top of the 
memory used for tracking the dead TIDs. I.e. that memory will be in 
addition to maintenance_work_mem. That's not nice, but I think it's OK 
in practice, and not worth spending too much effort to eliminate. For a 
1 TB index with default 8k block size, the two bitmaps will take 32 MB 
of memory in total. If you're dealing with a database of that size, you 
ought to have some memory to spare. But if an array would use less 
memory, that'd be better.

If you go with bitmaps, please use the existing Bitmapset instead of 
rolling your own. Saves some code, and it provides more optimized 
routines for iterating through all the set bits, too 
(bms_next_member()). Another possibility would be to use Tidbitmap, in 
the "lossy" mode, i.e. add the pages with tbm_add_page(). That might 
save some memory, compared to Bitmapset, if the bitmap is very sparse. 
Not sure how it compares with a plain array.

A straightforward little optimization would be to skip scanning the 
internal pages, when the first scan didn't find any empty pages. And 
stop the scan of the internal pages as soon as all the empty pages have 
been recycled.

- Heikki


pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: Re: Explain buffers wrong counter with parallel plans
Next
From: "Bossart, Nathan"
Date:
Subject: Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack