Re: Storage Model for Partitioning - Mailing list pgsql-hackers
| From | Richard Huxton |
|---|---|
| Subject | Re: Storage Model for Partitioning |
| Date | |
| Msg-id | 47876E75.3040001@archonet.com |
| In response to | Re: Storage Model for Partitioning (Simon Riggs <simon@2ndquadrant.com>) |
| List | pgsql-hackers |
Simon Riggs wrote:
> On Fri, 2008-01-11 at 11:34 +0000, Richard Huxton wrote:
>
>> Is the following basically the same as option #3 (multiple
>> RelFileNodes)?
>>
>> 1. Make an on-disk "chunk" much smaller (e.g. 64MB). Each chunk is a
>> contiguous range of blocks.
>> 2. Make a table-partition (implied or explicit constraints) map to
>> multiple "chunks".
>>
>> That would reduce fragmentation (you'd have on average 32MB's worth
>> of blocks wasted per partition) and allow for stretchy partitions at
>> the cost of an extra layer of indirection.
>>
>> For the single-partition case you'd not need to split the file of
>> course, so it would end up looking much like the current arrangement.
>
> We need to think about the "data model" of the storage layer. Space
> itself isn't the issue; it's the assumptions that all of the other
> subsystems currently make about how a table is structured, indexed,
> accessed and manipulated.

Which was why I was thinking you'd want to keep indexes etc. thinking
in terms of a table being a contiguous set of blocks, with the mapping
to an actual on-disk block taking place below that level. (If I've
understood you.)

> Currently: Table 1:M Segments
>
> Option 1: Table 1:M Segments and *separately* Table 1:M Partitions,
> so partitions always have a maximum size. The size just changes the
> impact; it doesn't change the issues of holes, max sizes etc.
>
> e.g. empty table with 10 partitions would be
> a) 0 bytes in 1 file
> b) 0 bytes in 1 file, plus 9GB in 9 files all full of empty blocks

Well, presumably 0GB in 10 files, but 10GB-worth of block numbers
"pre-allocated".

> e.g. table with 10 partitions each of 1.5GB would be
> a) 15GB in 15 files

With the limitation that any given partition might contain a mix of
data ranges (e.g. 2005 lies half in partition 2 and half in
partition 3).

> b) hit max size limit of partition: ERROR

In the case of 1b, you could have a segment mapping to more than one
partition, avoiding the error. So 2004 data is in partition 1, 2005 is
in partitions 2 and 3 (where 3 is half empty), and 2006 is in
partition 4. However, this does mean you've got a lot of wasted block
numbers. If you were using explicit (fixed) partitioning and chose a
bad set of criteria, your maximum table size could be substantially
reduced.

> Option 2: Table 1:M Child Tables 1:M Segments
>
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files

Cross-table indexes and constraints would be useful outside of the
current scenario too.

> Option 3: Table 1:M Nodes 1:M Segments
>
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files

Ah, so this does seem to be roughly the same as what I was rambling
about. It would presumably mean that rather than (table, block #)
specifying the location of a row, you'd need (table, node #, block #).

> So 1b) seems definitely out.
>
> The implications of 2 and 3 are what I'm worried about, which is why
> the shortcomings of 1a) seem acceptable currently.

--
  Richard Huxton
  Archonet Ltd
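To make the indirection in the thread concrete: below is a minimal C sketch of the chunk-based block map being discussed, where a table's logical block space is divided into fixed-size chunks and each chunk maps to a contiguous block range inside some storage node. All names here (ChunkMapping, BlockMap, blockmap_lookup) and the 64MB chunk size are hypothetical illustrations, not existing PostgreSQL code.

```c
/*
 * Hypothetical sketch (not PostgreSQL code): resolve a table-wide
 * logical block number to a (node #, physical block #) pair through
 * fixed-size chunks, as in option #3 / Richard's chunk proposal.
 */
#include <stdint.h>

#define BLCKSZ        8192                          /* PostgreSQL block size */
#define CHUNK_BLOCKS  (64 * 1024 * 1024 / BLCKSZ)   /* 64MB chunk = 8192 blocks */

typedef uint32_t BlockNumber;

typedef struct ChunkMapping
{
    uint32_t    node_id;        /* which storage node holds this chunk */
    BlockNumber node_offset;    /* first block of the chunk within that node */
} ChunkMapping;

typedef struct BlockMap
{
    uint32_t      nchunks;      /* number of mapped chunks */
    ChunkMapping *chunks;       /* one entry per 64MB chunk of the table */
} BlockMap;

/*
 * Look up a logical block number.  Returns 0 on success and fills in
 * (node_id, phys_blkno); returns -1 if the block lies beyond the
 * mapped space.
 */
static int
blockmap_lookup(const BlockMap *map, BlockNumber logical_blkno,
                uint32_t *node_id, BlockNumber *phys_blkno)
{
    uint32_t chunk = logical_blkno / CHUNK_BLOCKS;

    if (chunk >= map->nchunks)
        return -1;

    *node_id = map->chunks[chunk].node_id;
    *phys_blkno = map->chunks[chunk].node_offset +
                  (logical_blkno % CHUNK_BLOCKS);
    return 0;
}
```

The property this scheme would preserve, and the one Richard argues for above, is that indexes, scans and the other subsystems keep addressing rows by logical (table, block #); only a lookup at the storage layer changes, and partitions can grow ("stretchy") by appending chunk entries rather than by reserving fixed block ranges.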