Re: Relation extension scalability - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Relation extension scalability
Date
Msg-id CA+TgmoZotDBrYAPXjnFcy0AG0fqDFUFWNBvAVKk_MYN1+6Vx9g@mail.gmail.com
Whole thread Raw
In response to Re: Relation extension scalability  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Relation extension scalability
List pgsql-hackers
On Thu, Mar 24, 2016 at 7:17 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> Yet another possibility could be to call it as
>> GetPageWithFreeSpaceExtended and call it from GetPageWithFreeSpace with
>> value of oldPage as InvalidBlockNumber.
>
> Yes I like this.. Changed the same.

After thinking about this some more, I don't think this is the right
approach.  I finally understand what's going on here:
RecordPageWithFreeSpace updates the FSM lazily, only adjusting the
leaves and not the upper levels.  It relies on VACUUM to update the
upper levels.  This seems like it might be a bad policy in general,
because VACUUM on a very large relation may be quite infrequent, and
you could lose track of a lot of space for a long time, leading to a
lot of extra bloat.  However, it's a particularly bad policy for bulk
relation extension, because you're stuffing a large number of totally
free pages in there in a way that doesn't make them particularly easy
for anybody else to discover.  There are two ways we can fail here:

1. Callers who use GetPageWithFreeSpace() rather than
GetPageFreeSpaceExtended() will fail to find the new pages if the
upper map levels haven't been updated by VACUUM.

2. Even callers who use GetPageFreeSpaceExtended() may fail to find
the new pages.  This can happen in two separate ways, namely (a) the
lastValidBlock saved by RelationGetBufferForTuple() can be in the
middle of the relation someplace rather than near the end, or (b) the
bulk-extension performed by some other backend can have overflowed
onto some new FSM page that won't be searched even though a relatively
plausible lastValidBlock was passed.

It seems to me that since we're adding a whole bunch of empty pages at
once, it's worth the effort to update the upper levels of the FSM.
This isn't a case of discovering a single page with an extra few bytes
of storage available due to a HOT prune or something - this is a case
of putting at least 20 and plausibly hundreds of extra pages into the
FSM.   The extra effort to update the upper FSM pages is trivial by
comparison with the cost of extending the relation by many blocks.

So, I suggest adding a new function FreeSpaceMapBulkExtend(BlockNumber
first_block, BlockNumber last_block) which sets all the FSM entries
for pages between first_block and last_block to 255 and then bubbles
that up to the higher levels of the tree and all the way to the root.
Have the bulk extend code use that instead of repeatedly calling
RecordPageWithFreeSpace.  That should actually be much more efficient,
because it can call fsm_readbuf(), LockBuffer(), and
UnlockReleaseBuffer() just once per FSM page instead of once per FSM
page *per byte modified*.  Maybe that makes no difference in practice,
but it can't hurt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [BUGS] Re: BUG #13854: SSPI authentication failure: wrong realm name used
Next
From: Peter Geoghegan
Date:
Subject: Re: [WIP] Effective storage of duplicates in B-tree index.