Re: Storage Model for Partitioning - Mailing list pgsql-hackers
| From | Richard Huxton |
|---|---|
| Subject | Re: Storage Model for Partitioning |
| Date | |
| Msg-id | 47876E75.3040001@archonet.com |
| In response to | Re: Storage Model for Partitioning (Simon Riggs <simon@2ndquadrant.com>) |
| List | pgsql-hackers |
Simon Riggs wrote:
> On Fri, 2008-01-11 at 11:34 +0000, Richard Huxton wrote:
>
>> Is the following basically the same as option #3 (multiple
>> RelFileNodes)?
>>
>> 1. Make an on-disk "chunk" much smaller (e.g. 64MB). Each chunk is a
>> contiguous range of blocks.
>> 2. Make a table-partition (implied or explicit constraints) map to
>> multiple "chunks".
>>
>> That would reduce fragmentation (you'd have on average 32MB's worth
>> of blocks wasted per partition) and allow for stretchy partitions at
>> the cost of an extra layer of indirection.
>>
>> For the single-partition case you'd not need to split the file of
>> course, so it would end up looking much like the current arrangement.
>
> We need to think about the "data model" of the storage layer. Space
> itself isn't the issue; it's the assumptions that all of the other
> subsystems currently make about how a table is structured, indexed,
> accessed and manipulated.

Which was why I was thinking you'd want to keep indexes etc. thinking
in terms of a table being a contiguous set of blocks, with the mapping
to an actual on-disk block taking place below that level. (If I've
understood you.)

> Currently: Table 1:M Segments
>
> Option 1: Table 1:M Segments and *separately* Table 1:M Partitions,
> so partitions always have a maximum size. The size just changes the
> impact; it doesn't change the issues of holes, max sizes etc.
>
> e.g. empty table with 10 partitions would be
> a) 0 bytes in 1 file
> b) 0 bytes in 1 file, plus 9GB in 9 files all full of empty blocks

Well, presumably 0GB in 10 files, but 10GB-worth of block numbers
"pre-allocated".

> e.g. table with 10 partitions each of 1.5GB would be
> a) 15GB in 15 files

With the limitation that any given partition might contain a mix of
data ranges (e.g. 2005 lies half in partition 2 and half in
partition 3).

> b) hit max size limit of partition: ERROR

In the case of 1b, you could have a segment mapping to more than one
partition, avoiding the error. So 2004 data is in partition 1, 2005 is
in partitions 2 and 3 (where 3 is half empty), and 2006 is in
partition 4. However, this does mean you've got a lot of wasted block
numbers. If you were using explicit (fixed) partitioning and chose a
bad set of criteria, your maximum table size could be substantially
reduced.

> Option 2: Table 1:M Child Tables 1:M Segments
>
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files

Cross-table indexes and constraints would be useful outside of the
current scenario too.

> Option 3: Table 1:M Nodes 1:M Segments
>
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files

Ah, so this does seem to be roughly the same as what I was rambling
about. It would presumably mean that rather than (table, block #)
specifying the location of a row, you'd need (table, node #, block #).

> So 1b) seems definitely out.
>
> The implications of 2 and 3 are what I'm worried about, which is why
> the shortcomings of 1a) seem acceptable currently.

--
  Richard Huxton
  Archonet Ltd
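To make the indirection in the thread concrete: below is a minimal C sketch of the chunk-based block map being discussed, where a table's logical block space is divided into fixed-size chunks and each chunk maps to a contiguous block range inside some storage node. All names here (ChunkMapping, BlockMap, blockmap_lookup) and the 64MB chunk size are hypothetical illustrations, not existing PostgreSQL code.

```c
/*
 * Hypothetical sketch (not PostgreSQL code): resolve a table-wide
 * logical block number to a (node #, physical block #) pair through
 * fixed-size chunks, as in option #3 / Richard's chunk proposal.
 */
#include <stdint.h>

#define BLCKSZ        8192                          /* PostgreSQL block size */
#define CHUNK_BLOCKS  (64 * 1024 * 1024 / BLCKSZ)   /* 64MB chunk = 8192 blocks */

typedef uint32_t BlockNumber;

typedef struct ChunkMapping
{
    uint32_t    node_id;        /* which storage node holds this chunk */
    BlockNumber node_offset;    /* first block of the chunk within that node */
} ChunkMapping;

typedef struct BlockMap
{
    uint32_t      nchunks;      /* number of mapped chunks */
    ChunkMapping *chunks;       /* one entry per 64MB chunk of the table */
} BlockMap;

/*
 * Look up a logical block number.  Returns 0 on success and fills in
 * (node_id, phys_blkno); returns -1 if the block lies beyond the
 * mapped space.
 */
static int
blockmap_lookup(const BlockMap *map, BlockNumber logical_blkno,
                uint32_t *node_id, BlockNumber *phys_blkno)
{
    uint32_t chunk = logical_blkno / CHUNK_BLOCKS;

    if (chunk >= map->nchunks)
        return -1;

    *node_id = map->chunks[chunk].node_id;
    *phys_blkno = map->chunks[chunk].node_offset +
                  (logical_blkno % CHUNK_BLOCKS);
    return 0;
}
```

The property this scheme would preserve, and the one Richard argues for above, is that indexes, scans and the other subsystems keep addressing rows by logical (table, block #); only a lookup at the storage layer changes, and partitions can grow ("stretchy") by appending chunk entries rather than by reserving fixed block ranges.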