Re: GiST VACUUM - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: GiST VACUUM
Date
Msg-id 4A1FF988-A69D-4607-8A6E-B3EC0FD6C2CF@yandex-team.ru
Whole thread Raw
In response to Re: GiST VACUUM  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: GiST VACUUM
List pgsql-hackers
Hi! Thanks for looking into the patch!

> 30 июля 2018 г., в 18:39, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>
> On 29/07/18 14:47, Andrey Borodin wrote:
>> Fixed both problems. PFA v14.
>
> Thanks, took a really quick look at this.
>
> The text being added to README is outdated for these latest changes.
Fixed.
>
>> In second step I still use paloc's memory, but only to store two
>> bitmaps: bitmap of internal pages and bitmap of empty leafs. Second
>> physical scan only reads internal pages. I can omit that bitmap, if
>> I'll scan everything. Also, I can replace emptyLeafs bitmap with
>> array\list, but I do not really think it will be big.
>
> On a typical GiST index, what's the ratio of leaf vs. internal pages? Perhaps an array would indeed be better.
Typical GiST has around 200 tuples per internal page. I've switched to List since it's more efficient than bitmap. Is
> If you have a really large index, the bitmaps can take a fair amount of memory, on top of the memory used for
trackingthe dead TIDs. I.e. that memory will be in addition to maintenance_work_mem. That's not nice, but I think it's
OKin practice, and not worth spending too much effort to eliminate. For a 1 TB index with default 8k block size, the
twobitmaps will take 32 MB of memory in total. If you're dealing with a database of that size, you ought to have some
memoryto spare. But if an array would use less memory, that'd be better. 

>
> If you go with bitmaps, please use the existing Bitmapset instead of rolling your own. Saves some code, and it
providesmore optimized routines for iterating through all the set bits, too (bms_next_member()). Another possibility
wouldbe to use Tidbitmap, in the "lossy" mode, i.e. add the pages with tbm_add_page(). That might save some memory,
comparedto Bitmapset, if the bitmap is very sparse. Not sure how it compares with a plain array. 
Yeah, I've stopped reinventing that bicycle. But I have to note that default growth strategy of Bitmap is not good: we
willbe repallocing byte by byte. 

>
> A straightforward little optimization would be to skip scanning the internal pages, when the first scan didn't find
anyempty pages. And stop the scan of the internal pages as soon as all the empty pages have been recycled. 
Done.

PFA v15.

Best regards, Andrey Borodin.

Attachment

pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Online enabling of checksums
Next
From: Andres Freund
Date:
Subject: Re: Should contrib modules install .h files?