Re: Dynamic Partitioning using Segment Visibility Maps - Mailing list pgsql-hackers

From Markus Schiltknecht
Subject Re: Dynamic Partitioning using Segment Visibility Maps
Date
Msg-id 4780B213.7070303@bluegap.ch
Whole thread Raw
In response to Re: Dynamic Partitioning using Segment Visibility Maps  (Robert Treat <xzilla@users.sourceforge.net>)
Responses Re: Dynamic Partitioning using Segment Visibility Maps
List pgsql-hackers
Hi,

Robert Treat wrote:
> On Saturday 05 January 2008 14:02, Markus Schiltknecht wrote:
>>>> To satisfy all the different requirements of partitioning with segments
>>>> based partitioning, we'd have to allow a table to span multiple table
>>>> spaces. I'm not very keen on going that way.
>>> Why?
>> Uh.. if a table (RELKIND_RELATION) can only span one table space, as it
>> is now, all of its segments are in the same table space. I don't quite
>> call that partitioning. Well, sure, you could call it so, but then, each
>> and every Postgres table is already partitioned in 1G segments.
>>
>> It all depends on the definitions, but in my world, horizontal
>> partitioning for databases involves multiple table spaces (and is quite
>> useless without that). Calling anything else partitioning is confusing,
>> IMO.
> 
> I'm not following this.  If we can work out a scheme, I see no reason not to 
> allow a single table to span multiple tablespaces.  Do you see a problem with 
> that?

Uhm... well, no. I was just pointing out that it's a requirement. It 
depends on how you define things, but I'm seeing it that way:

table -- 1:n -- partition -- 1:1 -- table space -- 1:n -- segments

What I'm advocating is making partitions available to the DBA as some 
kind of a relation, she can query separately and move around between 
table spaces.

>> Why should that not be possible with other schemes? Moving the split
>> point between two partitions involves moving tuples around, no matter if
>> you are going to move them between segments or between relations
>> building the partitions.
> 
> The difference is that, if I currently have a table split by month, I 
> can "re-partition" it into weekly segments, and only shuffle one months data 
> at a time minimize impact on the system while I shuffle it. This can even be 
> used to do dynamic management, where data from the current month is archived 
> by day, data from the past year by week, and data beyond that done monthly.

This should be possible for both schemes, I see no connection to what 
we've discussed. SE doesn't magically give you this level of control you 
are requesting here. Quite the opposite: referring to CLUSTERing to 
makes me wonder, if that's not going to shuffle way too many tuples around.

What I'm saying is, that SE doesn't partition the segments into 
different table spaces. Thus I don't consider it "database partitioning" 
in the first place. As I currently understand it, it's:

table -- 1:1 -- table space -- 1:n -- partitions -- 1:n -- segments

> On many other databases, if you change the partition scheme, it requires 
> exclusive locks and a shuffleing of all of the data, even data whose 
> partitions arent being redefined.  Even worse are systems like mysql, where 
> you need to rewrite the indexes as well.  To me, these requirements always 
> seem like show stoppers; I generally can't afford to lock a table while the 
> database rewrites a billion rows of data. 

I fully agree here. How do you plan to solve that problem on top of SE?

> In a more general sense, a global index is a an index that spans multiple 
> partitions, as opposed to a local index, which is an index on specific 
> partitions; postgresql current supports the latter, not the former.
> 
> In any case, my thinking is if we had the segment exclusion technique, I could 
> convert that partitioned table into a regular table again,

... on a single table space ...

> use segment 
> exclusion to handle what is currently handled by partitions,

... except, that there is no partitioning (!?!) (between table spaces)

> and create 
> a "global index" across all the other data for that other, currently killer, 
> query. 

I thought the table you are referring to is bigger than your fastest 
table space? That would even make it impossible.

See where I'm coming from? And why I'm stating that SE is an 
optimization (for seq scans), but not partitioning?

Regards

Markus



pgsql-hackers by date:

Previous
From: Markus Schiltknecht
Date:
Subject: Re: Dynamic Partitioning using Segment Visibility Maps
Next
From: "Gokulakannan Somasundaram"
Date:
Subject: Re: Dynamic Partitioning using Segment Visibility Maps