Re: Dynamic Partitioning using Segment Visibility Maps - Mailing list pgsql-hackers

From Chris Browne
Subject Re: Dynamic Partitioning using Segment Visibility Maps
Msg-id 60abnfymho.fsf@dba2.int.libertyrms.com
In response to Dynamic Partitioning using Segment Visibility Maps  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Dynamic Partitioning using Segment Visibility Maps  (Gavin Sherry <swm@alcove.com.au>)
Re: Dynamic Partitioning using Segment Visibility Maps  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
List pgsql-hackers
simon@2ndquadrant.com (Simon Riggs) writes:
> I think we have an opportunity to bypass the legacy-of-thought that
> Oracle has left us and implement something more usable.

This seems like a *very* good thing to me, from a couple of
perspectives.

1.  I think you're right on in terms of the issue of the cost of
    "running all that DDL" in managing partitioning schemes.

    When I was working as DBA, I was decidedly *NOT* interested in
    doing a lot of low level partition management work, and those that
    are in that role now would, I'm quite sure, agree that they are
    not keen on spending a lot of their time trying to figure out what
    tablespace to shift a particular table into, or what tablespace
    filesystem to get sysadmins to set up.

2.  Blindly following what Oracle does has always been a dangerous
    sort of thing to do.

    There are two typical risks:

    a) There's always the worry that they may have patented some
       part of how they implement things, and if you follow too
       closely, There Be Dragons...

    b) They have enough billion$ of development dollar$ and
       development re$ource$ that they can follow strategies that
       are too expensive for us to even try to follow.

3.  If, rather than blindly following, we create something at least
    quasi-new, there is the chance of doing fundamentally better.

    This very thing happened when it was discovered that IBM had a
    patent on the ARC caching scheme; the "clock" system that emerged
    was a lot better than ARC ever was.

> One major advantage of the dynamic approach is that it can work on
> multiple dimensions simultaneously, which isn't possible with
> declarative partitioning. For example if you have a table of Orders then
> you will be able to benefit from Segment Exclusion on all of these
> columns, rather than just one of them: OrderId, OrderDate,
> RequiredByDate, LastModifiedDate. This will result in some "sloppiness"
> in the partitioning, e.g. if we fill 1 partition a day of Orders, then
> the OrderId and OrderDate columns will start out perfectly arranged. Any
> particular RequiredByDate will probably be spread out over 7 partitions,
> but thats way better than being spread out over 365+ partitions.

I think it's worth observing both the advantages and demerits of this
together.

In effect, with the dynamic approach, Segment Exclusion provides its
benefits as an emergent property of the patterns of how INSERTs get
drawn into segments.

The tendency, correspondingly, will be that Segment Exclusion can
provide useful constraints for those patterns that naturally emerge
from the INSERTs.

We can therefore expect useful constraints for attributes that are
assigned in some kind of more or less chronological order.  Such
attributes will include:
- Object ID, if set by a sequence
- Processing dates

There may be a bit of sloppiness, but the constraints may still be
useful enough to exclude enough segments to improve efficiency.
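
To make the "emergent property" concrete, here is a toy sketch in
Python. It is not how the proposal would be implemented; it merely
assumes that each segment remembers the min/max of every column for
the rows that happened to land in it. The segment size, the column
names, and the customer_id column are all made up for illustration.

```python
from datetime import date, timedelta

SEGMENT_ROWS = 100          # hypothetical: one segment fills per day
START = date(2008, 1, 1)    # arbitrary starting date for the toy data


def build_segments(rows, segment_rows=SEGMENT_ROWS):
    """Fill segments in insertion order, keeping (min, max) per column."""
    segments = []
    for start in range(0, len(rows), segment_rows):
        chunk = rows[start:start + segment_rows]
        bounds = {
            col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
            for col in chunk[0]
        }
        segments.append(bounds)
    return segments


def segments_to_scan(segments, col, value):
    """Return indexes of segments whose (min, max) range could hold value."""
    return [i for i, b in enumerate(segments)
            if b[col][0] <= value <= b[col][1]]


# 1000 orders, 100 per day, inserted in chronological order.
rows = []
for i in range(1000):
    order_date = START + timedelta(days=i // 100)
    rows.append({
        "order_id": i + 1,                                  # sequence-assigned
        "order_date": order_date,                           # chronological
        "required_by": order_date + timedelta(days=i % 7),  # a bit sloppy
        "customer_id": (i * 37) % 50,                       # not chronological
    })

segments = build_segments(rows)

by_id = segments_to_scan(segments, "order_id", 450)
by_required = segments_to_scan(segments, "required_by",
                               START + timedelta(days=8))
by_customer = segments_to_scan(segments, "customer_id", 17)

print(len(segments), len(by_id), len(by_required), len(by_customer))
```

With this toy data, a lookup on order_id touches a single segment, a
lookup on required_by touches a handful of neighbouring segments, and
a lookup on customer_id cannot exclude anything at all.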

_On The Other Hand_, there will be attributes that are *NOT* set in a
more-or-less chronological order, and Segment Exclusion will be pretty
useless for these attributes.  

In order to do any sort of "Exclusion" for non-"chronological"
attributes, it will be necessary to use some mechanism other than the
patterns that fall out of "natural chronological insertions."  If you
want exclusion on such attributes, then there needs to be some sort of
rule system to spread such items across additional partitions.  Mind
you, if you do such, that will weaken the usefulness of Segment
Exclusion.  For instance, suppose you have 4 regions, and scatter
insertions by region.  In that case, there will be more segments that
overlap any given chronological range.
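
The region example can be sketched the same way. Again, this is only
a toy model under assumed numbers (400 rows, 50 rows per segment, 4
regions assigned round-robin); the helper names are hypothetical. It
compares plain chronological filling against scattering rows into
per-region segments.

```python
SEGMENT_ROWS = 50   # hypothetical segment size
REGIONS = 4         # the 4 regions from the example above
COLS = ("order_id", "region")


def chunked(rows, n):
    """Split rows into consecutive chunks of at most n rows."""
    return [rows[i:i + n] for i in range(0, len(rows), n)]


def summarize(chunks):
    """Record (min, max) per column for each chunk, i.e. each segment."""
    return [
        {c: (min(r[c] for r in chunk), max(r[c] for r in chunk))
         for c in COLS}
        for chunk in chunks
    ]


def overlapping(segments, col, lo, hi):
    """Indexes of segments whose (min, max) range intersects [lo, hi]."""
    return [i for i, s in enumerate(segments)
            if s[col][0] <= hi and s[col][1] >= lo]


rows = [{"order_id": i + 1, "region": i % REGIONS} for i in range(400)]

# Case 1: rows land in segments purely in insertion order.
plain = summarize(chunked(rows, SEGMENT_ROWS))

# Case 2: a rule scatters rows so that each region fills its own segments.
scattered_chunks = []
for region in range(REGIONS):
    region_rows = [r for r in rows if r["region"] == region]
    scattered_chunks.extend(chunked(region_rows, SEGMENT_ROWS))
scattered = summarize(scattered_chunks)

for name, segs in (("plain", plain), ("scattered", scattered)):
    print(name,
          "id range hits", len(overlapping(segs, "order_id", 101, 150)),
          "region hits", len(overlapping(segs, "region", 2, 2)))
```

In this model, scattering lets a query on one region skip most
segments, but the same order_id range that used to hit one segment
now hits one segment per region.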

> When we look at the data in the partition we can look at any number of
> columns. When we declaratively partition, you get only one connected set
> of columns, which is one of the reasons you want multi-dimensional
> partitioning in the first place.

Upside: Yes, you get to exclude based on examining any number of
columns.

Downside: You only get the exclusions that are "emergent properties"
of the data...

The more I'm looking at the dynamic approach, the more I'm liking
it...
-- 
"cbbrowne","@","cbbrowne.com"
http://linuxfinances.info/info/linuxxian.html
"Feel free  to contribute build  files.  Or work on  your motivational
skills, and maybe someone somewhere will write them for you..."
-- "Fredrik Lundh" <effbot@telia.com>

