Re: Freeze avoidance of very large table. - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Freeze avoidance of very large table.
Msg-id 55390A01.3090200@BlueTreble.com
In response to Re: Freeze avoidance of very large table.  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 4/23/15 8:42 AM, Robert Haas wrote:
> On Thu, Apr 23, 2015 at 4:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> We were talking about having an incremental backup map also. Which sounds a
>> lot like the freeze map.
>
> Yeah, possibly.  I think we should try to set things up so that the
> backup map can be updated asynchronously by a background worker, so
> that we're not adding more work to the foreground path just for the
> benefit of maintenance operations.  That might make the logic for
> autovacuum to use it a little bit more complex, but it seems
> manageable.
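A minimal sketch of the asynchronous update Robert describes, under assumptions that go well beyond the thread: the foreground path only appends a block number to a queue, and a background worker later drains the queue into a one-bit-per-block "changed since last backup" map. `ChangeQueue`, `BackupMap`, `queue_push` and `worker_drain` are hypothetical names, the sizes are arbitrary, and the sketch is single-process with no locking. A real worker would need shared memory and synchronization.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define QUEUE_CAP  1024
#define MAP_BLOCKS 4096

typedef uint32_t BlockNumber;   /* mirrors PostgreSQL's 32-bit block numbers */

/* Ring buffer of block numbers filled by the foreground path. */
typedef struct {
    BlockNumber items[QUEUE_CAP];
    size_t head;    /* index of the next entry to read */
    size_t count;   /* number of queued entries */
} ChangeQueue;

/* One bit per heap block: set means "changed since the last backup". */
typedef struct {
    uint8_t bits[MAP_BLOCKS / 8];
} BackupMap;

/* Foreground path: O(1) enqueue, never touches the map.
 * Returns false when the queue is full (caller must fall back somehow). */
static bool queue_push(ChangeQueue *q, BlockNumber blk) {
    if (q->count == QUEUE_CAP)
        return false;
    q->items[(q->head + q->count) % QUEUE_CAP] = blk;
    q->count++;
    return true;
}

/* Background worker: drain the queue and mark each block in the map.
 * Returns the number of entries consumed. */
static size_t worker_drain(ChangeQueue *q, BackupMap *map) {
    size_t drained = 0;
    while (q->count > 0) {
        BlockNumber blk = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        if (blk < MAP_BLOCKS)
            map->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
        drained++;
    }
    return drained;
}

/* Consumer side (e.g. a backup tool): was this block marked as changed? */
static bool map_is_changed(const BackupMap *map, BlockNumber blk) {
    return blk < MAP_BLOCKS && ((map->bits[blk / 8] >> (blk % 8)) & 1u);
}
```

The design point is that the map is only ever written by the worker, so the map's maintenance cost moves off the foreground path; the price is that the map lags behind the heap until the queue is drained.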

I'm not sure an actual map makes sense... for incremental backups you 
need some kind of stream that tells you not only what changed but when 
it changed. A simple freeze map won't work for that because the 
operation of freezing itself writes data (and the same can be true for 
the VM). Though, if the backup utility were actually comparing live data 
to an actual backup, maybe this would work...
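To illustrate why "when" matters, here is a sketch assuming a hypothetical change stream of (block, LSN) records; `ChangeRecord`, `Lsn` and `blocks_changed_since` are illustrative names, not an existing PostgreSQL API. A bare bit can say that a block changed, but only the LSN lets a backup ask which blocks changed after the previous backup started. Note that a freeze of an old block shows up as a new record, which is exactly the problem described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef uint64_t Lsn;   /* hypothetical stand-in for a WAL position */

/* One entry per modification: which block, and at what LSN it changed. */
typedef struct {
    BlockNumber blkno;
    Lsn         lsn;
} ChangeRecord;

/*
 * Collect the distinct blocks modified strictly after since_lsn, in stream
 * order. Writes at most max_out block numbers to out; returns how many.
 */
static size_t blocks_changed_since(const ChangeRecord *recs, size_t nrecs,
                                   Lsn since_lsn,
                                   BlockNumber *out, size_t max_out) {
    size_t n = 0;
    for (size_t i = 0; i < nrecs; i++) {
        if (recs[i].lsn <= since_lsn)
            continue;               /* already covered by the last backup */
        bool seen = false;          /* quadratic de-duplication; fine for a sketch */
        for (size_t j = 0; j < n; j++) {
            if (out[j] == recs[i].blkno) {
                seen = true;
                break;
            }
        }
        if (!seen && n < max_out)
            out[n++] = recs[i].blkno;
    }
    return n;
}
```

For example, with records (1, 100), (2, 150), (3, 250), (1, 300) and a previous backup at LSN 200, the result is blocks 3 and 1; the later record for block 1 could be nothing more than a freeze.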

>> We only need a freeze/backup map for larger relations. So if we map 1000
>> blocks per map page, we skip having a map at all when size < 1000.
>
> Agreed.  We might also want to map multiple blocks per map slot - e.g.
> one slot per 32 blocks.  That would keep the map quite small even for
> very large relations, and would not compromise efficiency that much
> since reading 256kB sequentially probably takes only a little longer
> than reading 8kB.
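The sizing ideas quoted above can be sketched as follows, taking Simon's 1000-blocks threshold and Robert's 32 blocks per slot as example parameters. The function names are hypothetical, and this only shows the arithmetic, not how an actual map fork would be laid out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Example parameters from the discussion, not settled design values. */
#define BLOCKS_PER_MAP_PAGE 1000   /* below this, skip the map entirely */
#define BLOCKS_PER_SLOT     32     /* heap blocks covered by one map slot */
#define HEAP_BLOCK_SIZE     8192   /* default 8kB heap pages */

/* Small relations get no map at all. */
static bool relation_needs_map(BlockNumber nblocks) {
    return nblocks >= BLOCKS_PER_MAP_PAGE;
}

/* The map slot that covers a given heap block. */
static uint32_t slot_for_block(BlockNumber blkno) {
    return blkno / BLOCKS_PER_SLOT;
}

/* Slots needed for a relation of nblocks, rounded up; 64-bit math avoids
 * overflow near the maximum 32-bit block number. */
static uint32_t slots_needed(BlockNumber nblocks) {
    return (uint32_t)(((uint64_t)nblocks + BLOCKS_PER_SLOT - 1) / BLOCKS_PER_SLOT);
}

/* Heap bytes that must be read when one slot is not set:
 * 32 * 8192 = 262144 bytes, i.e. the 256kB mentioned above. */
static uint32_t bytes_per_slot(void) {
    return BLOCKS_PER_SLOT * HEAP_BLOCK_SIZE;
}
```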

The problem with mapping a range of pages per bit is dealing with 
locking when you set the bit. Currently that's easy because we're 
holding the cleanup lock on the page, but you can't do that if you have 
a range of pages. If each 'slot' weren't a simple binary value, though, 
we could have a third state indicating that we're in the process of 
marking that slot all-visible/frozen; anyone reading the slot would 
still have to treat that state as cleared.
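One way the three-state slot could work, sketched with C11 atomics rather than page locks; the state names and functions are hypothetical. Vacuum moves a slot from cleared to "setting", checks every page in the range, then tries to publish "set". Any writer touching a page in the range stores "cleared" unconditionally, which makes vacuum's final compare-and-swap fail. This assumes only one vacuum works on a relation at a time; otherwise a second vacuum re-entering "setting" could let a stale first vacuum succeed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states for a map slot covering a range of heap pages. */
enum { SLOT_CLEARED = 0, SLOT_SETTING = 1, SLOT_SET = 2 };

typedef struct {
    atomic_int state;
} MapSlot;

/* Readers may skip the range only when the slot is fully set;
 * SLOT_SETTING is treated exactly like SLOT_CLEARED. */
static bool slot_allows_skip(MapSlot *slot) {
    return atomic_load(&slot->state) == SLOT_SET;
}

/* Any backend modifying a page in the range clears the slot. */
static void slot_clear(MapSlot *slot) {
    atomic_store(&slot->state, SLOT_CLEARED);
}

/* Vacuum, step 1: announce that it is about to examine the range. */
static bool slot_begin_setting(MapSlot *slot) {
    int expected = SLOT_CLEARED;
    return atomic_compare_exchange_strong(&slot->state, &expected, SLOT_SETTING);
}

/* Vacuum, step 2: after verifying every page in the range, publish SET.
 * Fails if a writer cleared the slot in the meantime. */
static bool slot_finish_setting(MapSlot *slot) {
    int expected = SLOT_SETTING;
    return atomic_compare_exchange_strong(&slot->state, &expected, SLOT_SET);
}
```

If `slot_finish_setting` fails, vacuum simply leaves the slot cleared and tries again on a later pass.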

Honestly though, I think concerns about the size of the map are a bit 
overblown. Even if we double its size to two bits per heap page, it's 
still roughly 32,000 times smaller than the heap with 8k pages. I 
suspect that if you have tables large enough for this to matter, you'll 
also be using 32k pages, which makes it roughly 131,000 times smaller 
than the heap. I have a hard time believing that's going to be even a 
faint blip on the performance radar.
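The arithmetic behind those ratios, as a small sketch: a page of `page_size` bytes holds `page_size * 8` bits, so a map using `bits_per_page` bits per heap page is `page_size * 8 / bits_per_page` times smaller than the heap. The function names are illustrative; "doubling" is taken to mean going from the visibility map's one bit per heap page to two.

```c
#include <assert.h>
#include <stdint.h>

/* How many times smaller than the heap a per-page bitmap is. */
static uint64_t heap_to_map_ratio(uint64_t page_size, uint64_t bits_per_page) {
    assert(bits_per_page > 0);
    return page_size * 8 / bits_per_page;
}

/* Map size in bytes for a heap of heap_bytes, rounding pages and bits up. */
static uint64_t map_bytes(uint64_t heap_bytes, uint64_t page_size,
                          uint64_t bits_per_page) {
    assert(page_size > 0);
    uint64_t pages = (heap_bytes + page_size - 1) / page_size;
    return (pages * bits_per_page + 7) / 8;
}
```

With 8k pages and two bits per page the ratio is 32,768, so a 1 TiB heap needs a 32 MiB map; with 32k pages the ratio is 131,072 and the same heap needs 8 MiB.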
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
