Simon Riggs wrote:
> On Fri, 2008-01-04 at 10:22 +0000, Richard Huxton wrote:
>> Simon Riggs wrote:
>>> We would keep a dynamic visibility map at *segment* level, showing which
>>> segments have all rows as 100% visible. No freespace map data would be
>>> held at this level.
>> Small dumb-user question.
>>
>> I take it you've considered some more flexible consecutive-run-of-blocks
>> unit of flagging rather than file-segments. That obviously complicates
>> the tracking but means you can cope with infrequent updates as well as
>> mark most of the "most recent" segment for log-style tables.
>
> I'm writing the code to abstract that away, so yes.
>
> Now you mention it, it does seem straightforward to have a table storage
> parameter for partition size, which defaults to 1GB. The partition size
> is simply a number of consecutive blocks, as you say.
>
> The smaller the partition size the greater the overhead of managing it.
Oh, obviously, but with smaller partition sizes this also becomes useful
for low-end systems as well as high-end ones. Skipping 80% of a seq-scan
on a date-range query is a win for even small (by your standards)
tables. I shouldn't be surprised if the sensible-number-of-partitions
remained more-or-less constant as you scaled the hardware, but the
partition size grew.
> Also I've been looking at read-only tables and compression, as you may
> know. My idea was that in the future we could mark segments as either
> - read-only
> - compressed
> - able to be shipped off to hierarchical storage
>
> Those ideas work best if the partitioning is based around the physical
> file sizes we use for segments.
I can see why you've chosen file segments. It certainly makes things easier.
Hmm - thinking about the date-range scenario above, it occurs to me that for seq-scan purposes the correct partition
sizedepends upon the
data value you are interested in. What I want to know is what blocks Jan
07 covers (or rather what blocks it doesn't) rather than knowing blocks
1-9999999 cover 2005-04-12 to 2007-10-13. Of course that means that
you'd eventually want different partition sizes tracking visibility for
different columns (e.g. id, timestamp).
I suspect the same would be true for read-only/compressed/archived
flags, but I can see how they are tightly linked to physical files
(particularly the last two).
-- Richard Huxton Archonet Ltd