Home > mailing lists

Re: Simple improvements to freespace allocation - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Simple improvements to freespace allocation
Date	January 8, 2014 08:17:22
Msg-id	CA+U5nMJSTrxfW-5+dFYG_KwB1Lsd3ytihM2Ng1E3jEr6HcwC3Q@mail.gmail.com Whole thread
In response to	Re: Simple improvements to freespace allocation (Heikki Linnakangas <hlinnakangas@vmware.com>)
List	pgsql-hackers

Tree view

On 8 January 2014 07:43, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 01/08/2014 08:56 AM, Simon Riggs wrote:
>>
>> Current freesapce code gives a new block insert target (NBT) from
>> anywhere in table. That isn't very useful with bigger tables and it
>> would be useful to be able to specify different algorithms for
>> producing NBTs.
>
>
> I've actually been surprised how little demand there has been for
> alternative algorithms. When I wrote the current FSM implementation, I
> expected people to start coming up with all kinds of wishes, but it didn't
> happen. There has been very few complaints, everyone seems to be satisfied
> with the way it works now. So I'm not convinced there's much need for this.

That would require someone to conduct detailed analysis on problems,
which few people are capable of doing at a level that we would accept.
No doubt I will soon be challenged to prove beyond doubt that anything
here is required, which becomes chicken and egg. For the vast majority
of cases, the general approach we have works well enough - this area
has had lots of very useful attention ove the years.

The problem is tables have multiple use cases and we support only one,
with no easy way for people to experiment with alternatives in
production.

Its been on my list for years... but its not been a top priority, for sure.

>> ALTER TABLE foo WITH (freespace = XXXX);
>>
>> Three simple and useful models come to mind
>>
>> * CONCURRENT
>> This is the standard/current model. Naming it likes this emphasises
>> why we pick NBTs in the way we do.
>>
>> * PACK
>> We want the table to be smaller, so rather than run a VACUUM FULL we
>> want to force the table to choose an NBT at start of table, even at
>> the expense of concurrency. By avoiding putting new data at the top of
>> the table we allow the possibility that VACUUM will shrink table size.
>> This is same as current except we always reset the FSM pointer to zero
>> and re-seek from there. This takes some time to have an effect, but is
>> much less invasive than VACUUM FULL.
>
>
> We already reset the FSM pointer to zero on vacuum. Would the above actually
> make any difference in practice?

The Pack algo would emphasise tight packing over assigning concurrent
blocks. It would be useful if that also included not doing HOT updates
in favour of migrating rows to an earlier block in the table. Emphasis
on avoiding VACUUM FULL in certain cases, not for general use.

>> * RECENT
>> For large tables that are append-mostly use case it would be easier to
>> prefer NBTs from the last two 1GB segments of a table, allowing them
>> to be more easily cached. This is same as current except when we wrap
>> we don't go to block 0 we go to first block of penultimate (max - 1)
>> segment. For tables <= 2 segments this is no change from existing
>> algorithm. For larger tables it would focus updates/inserts into a
>> much reduced and yet still large area and allow better cacheing.
>
>
> Umm, wouldn't that bloat the table with no limit? Putting my DBA/developer
> hat on, I don't understand when I would want to use that setting.

That would depend on the use case; no algo suggested works for all or
even general cases.

If you have a large table, allocating freespace in blocks unlikely to
be accessed by queries means we introduce additional cache pressure.
If we allocate NBTs from newer blocks we will more likely find them in
cache.

Allowing older data to become write-seldom allows us to consider
things like compressing particular segments or moving them onto
cheaper storage.

(Another suggestion might be to use the VM so that we tend to add data
to already dirty blocks.)

> There's one policy that I'd like to see: maintaining cluster order. When
> inserting a new tuple, try to place it close to other tuples with similar
> keys, to keep the table clustered.

Agreed. Do you have a particular algorithm in mind? I can think of a few.

> In practice, CLUSTER CONCURRENTLY might be more useful, though.

I think we want both wholesale and retail. Certainly in the absence of
the former, the latter seems good addition.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 08 January 2014, 07:44:01
Subject: Re: Simple improvements to freespace allocation

From: Simon Riggs
Date: 08 January 2014, 08:34:03
Subject: Turning off HOT/Cleanup sometimes

Re: Simple improvements to freespace allocation - Mailing list pgsql-hackers

Previous

Next