Re: FSM rewrite: doc changes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: FSM rewrite: doc changes
Msg-id 11060.1222695968@sss.pgh.pa.us
In response to Re: FSM rewrite: doc changes  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: FSM rewrite: doc changes  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> In fsm_rebuild_page, surely we needn't check "if (lchild < NodesPerPage)".

> Yes, we do.

But the loop starting point is such that you must be visiting a parent
with at least one child, no?
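
To make the question concrete, here is a minimal sketch of a bottom-up rebuild of a max-tree stored in an array. It is not the actual fsm_rebuild_page code: the sizes NODES_PER_PAGE and NON_LEAF_NODES and the function name rebuild are hypothetical toy values chosen for illustration. When the inner-node count is fixed independently of the array length, some inner nodes can have no children, and then the bounds checks matter.

```c
#include <stdint.h>

/* Hypothetical toy sizes, not PostgreSQL's real constants. */
#define NODES_PER_PAGE 10   /* total slots in the array */
#define NON_LEAF_NODES 7    /* slots 0..6 are treated as inner nodes */

#define LEFTCHILD(i) (2 * (i) + 1)

/* Recompute every inner node as the max of its children, bottom-up. */
static void
rebuild(uint8_t nodes[NODES_PER_PAGE])
{
    for (int i = NON_LEAF_NODES - 1; i >= 0; i--)
    {
        int     lchild = LEFTCHILD(i);
        int     rchild = lchild + 1;
        uint8_t value = 0;

        /* An inner node may have two, one, or no children in the array. */
        if (lchild < NODES_PER_PAGE)
            value = nodes[lchild];
        if (rchild < NODES_PER_PAGE && nodes[rchild] > value)
            value = nodes[rchild];

        nodes[i] = value;
    }
}
```

With these toy sizes, nodes 5 and 6 have no children at all and node 4 has only a left child, so dropping the lchild check would read past the end of the array. If the loop instead started at the last node that is guaranteed to have a left child, only the rchild check would be needed.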

>> reveals a rather fundamental problem: it is clearly possible
>> for this test to fail on valid request sizes, because the page
>> header overhead is less than FSM_CAT_STEP (especially if BLCKSZ
>> is more than 8K).  I'm not sure about a really clean solution
>> here.

> Hmph. The other alternative is to use 2 bytes instead of one per page, 
> and track the free space exactly. But I'd rather not do that just to 
> deal with the very special case of huge requests.

Yeah, I thought about that too.  It's got another problem besides the
sheer space cost: it would result in a whole lot more update traffic for
upper levels of the tree.  The quantization of possible values in the
current design is good because it avoids updates of parents for
relatively small deltas of free space.
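
A rough sketch of why quantization cuts update traffic, assuming an 8K block and one byte per page (256 categories). The names BLOCK_SIZE, CAT_STEP, space_to_cat, and stored_value_changes are illustrative only and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8192                 /* assumed block size */
#define CAT_STEP   (BLOCK_SIZE / 256)   /* assumed: 256 categories per block */

/* Map a byte count to a one-byte category, rounding down. */
static uint8_t
space_to_cat(size_t avail)
{
    size_t cat = avail / CAT_STEP;

    return (uint8_t) (cat > 255 ? 255 : cat);
}

/*
 * Returns 1 if the stored category changes, which is the only case in
 * which a parent node could need to be updated at all.
 */
static int
stored_value_changes(size_t old_avail, size_t new_avail)
{
    return space_to_cat(old_avail) != space_to_cat(new_avail);
}
```

Under these assumptions, going from 4000 to 4010 free bytes leaves the category unchanged, so nothing above the leaf is touched. With exact two-byte tracking, any change in free space would alter the stored value.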

> Or we could just return -1 instead of throwing an error. Requests higher 
> than the limit would then always have to extend the heap. That's not 
> good, but I think we already have that problem for tuples of exactly 
> MaxHeapTupleSize bytes. Since PageGetFreeSpace subtracts the size of a 
> new line pointer, only newly extended pages that have never had any 
> tuples on them have enough space, as determined by PageGetFreeSpace, to
> fit a tuple of MaxHeapTupleSize bytes.

That seems like something we'll want to fix sometime, rather than
hardwiring into the FSM design.

I suppose an alternative possibility is to set MaxHeapTupleSize at
255/256ths of a block by definition, so that no request will ever exceed
what the FSM stuff can handle.  But I'm sure that'd make somebody
unhappy --- somewhere out there is a table with tuples wider than that.

Probably the least bad alternative here is to allow FSM's category
scaling to depend on MaxHeapTupleSize.
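
One way such scaling might look, as a sketch under stated assumptions rather than a proposal for the actual code: derive the category step from the largest possible request instead of from the block size, rounding up so that category 255 always covers it. The function names are hypothetical, and the 32-byte per-page overhead used in the example below is an assumed figure for illustration.

```c
#include <stddef.h>

#define CATEGORIES 256   /* one byte per page */

/* Step derived from the block size alone. */
static size_t
cat_step_block(size_t blcksz)
{
    return blcksz / CATEGORIES;
}

/* Largest request size that the top category can represent. */
static size_t
max_representable(size_t step)
{
    return (CATEGORIES - 1) * step;
}

/* Alternative: step derived from the largest request, rounded up. */
static size_t
cat_step_scaled(size_t max_request)
{
    return (max_request + CATEGORIES - 2) / (CATEGORIES - 1);
}
```

For a 32K block with the assumed 32 bytes of overhead, the largest request is 32736 bytes. The block-based step is 128, so the top category represents only 32640 bytes and the request cannot be expressed. The scaled step is 129, and 255 * 129 = 32895 covers it.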
        regards, tom lane

