Thread: Table Partitioning, Part 1

Table Partitioning, Part 1

From
Simon Riggs
Date:
A more in-depth consideration of the major design options and trade-offs
available to us... this is an internals related discussion.

Comments?

1. Embellish inheritance or separate infrastructure?

Inheritance is a somewhat strange PostgreSQL feature, with a few
gotchas. The "big one" is the lack of a multi-relation index that can be
used against all of the inherited relations also.
The full complexity of multiple inheritance, possible multi-level
inheritance doesn't seem to solve any problems for us. The main argument
in favour of improving inheritance as a route to partitioning is that
many people have already developed applications using it to provide some
of the desirable aspects of partitioning. The amount of code used to
provide some of these features is actually quite low, and the code
itself is already specialised. 
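
For reference, a minimal sketch of the kind of hand-rolled setup meant
here (all names are hypothetical):

  CREATE TABLE sales (
      sale_id   integer,
      sale_date date,
      amount    numeric
  );

  CREATE TABLE sales_2005_01 (
      CHECK (sale_date >= '2005-01-01' AND sale_date < '2005-02-01')
  ) INHERITS (sales);

  CREATE TABLE sales_2005_02 (
      CHECK (sale_date >= '2005-02-01' AND sale_date < '2005-03-01')
  ) INHERITS (sales);

  -- no multi-relation index exists, so each child gets its own
  CREATE INDEX sales_2005_01_date ON sales_2005_01 (sale_date);
  CREATE INDEX sales_2005_02_date ON sales_2005_02 (sale_date);

A query against sales scans every child, and dropping a partition is
just a DROP TABLE on the child.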

It seems prudent to avoid building on that foundation, even though we
may decide to use some similar approaches.

2. Individual Relations explicitly in the plan or MultiRelation plan
nodes? (i.e. is a SeqScan of a Partitioned Table one Node or many
nodes?)

Currently, Inheritance produces a plan for a SeqScan that looks like
this...

Append
  SeqScan
  SeqScan
  SeqScan

ISTM fairly straightforward to produce a similar "static" plan along the
same lines, using Result nodes to implement Partition Elimination.

Append
  Result
    SeqScan
  Result
    SeqScan
  Result
    SeqScan

Some may think that looks a little clunky, especially since the plan
length would increase as the number of Partitions increased. An
alternative would be to have a single Scan node that performs a
Multi-Partition Scan, which looks like this...

PartitionSeqScan

There'd always be one node, no matter how many partitions. Again, easy to
implement, since we just add a new type of Scan access method, or alter
the existing ones iff we access partitioned tables. We'd be able to use
physical tlists, and all the complexity is hidden.

Both solve the basic case of Direct restriction of the Partitioning Key
(PPUC1).

The multi-scan node stops us from doing a mixed Scan plan:

Append
  Result
    IndexScan
  Result
    SeqScan
  Result
    SeqScan

Why? The node acts at execution time, though the planner is the usual
place to decide between scan types. 

But why do we need that? Why not just make all partitions the same size,
give them all the same statistics, etc.? Well, there are Use Cases that
strongly suggest that partition size could vary considerably (PPUC1) and
also that there is some incentive to have indexes on the most recent
partitions yet none on the older partitions (PPUC5.2).

My current thinking is to implement explicit partition plans.

But what about join plans? There are Use Cases that depend almost
entirely upon joins for Partition Elimination rather than direct
restriction of the PPK. (PPUC2)

How could a join plan be performed with the explicit plan shown above?
The only option would seem to be a Sort/Merge, which could be a disaster
if we only wanted a few rows returned from a very large table.

If all partitions in the query had identical indexes on them, then we
have another option. In that case, each index could be thought to form
part of a larger index ordered initially on the Partitioning Key (PPK).
If the first column was actually the PPK, then the set of indexes would
be exactly equivalent to a Global Index. We can call this a Pseudo
Global Index.

The Pseudo Global Index could also be used to enforce uniqueness. If all
of the composite indexes were defined unique and the index contained the
PPK as one of its columns, this would work. The index enforces
uniqueness within each partition and the PPK enforces uniqueness across
partitions because the same PPK value cannot be in two partitions.
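
Sticking with the hypothetical sales naming from the sketch above, the
set of indexes meant here could look like:

  CREATE UNIQUE INDEX sales_2005_01_uq
      ON sales_2005_01 (sale_date, sale_id);
  CREATE UNIQUE INDEX sales_2005_02_uq
      ON sales_2005_02 (sale_date, sale_id);

Each index enforces uniqueness of (sale_date, sale_id) within its own
partition, and the partition boundaries on sale_date (the PPK) ensure a
given sale_date can only land in one partition, so the pair is unique
across the whole set.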

We can also use the Pseudo Global Index as the object of a multi-index
scan node, since the PGI would be ordered on the PPK. The plan node
would use the partition boundary definitions to locate the starting
partition, then seek within the partition to the correct part of the
index then scan sideways, skipping to the next partition as required.

This sounds a great idea, but it is exactly the technique I argued
against only 4 paragraphs ago for use with SeqScans.

My current thinking is that we need a mix of the two approaches, though
I raise all of this here to gain everybody's input...

3. Partitions are relations? Or sub-relations?

If we ever want to implement explicit plans, as discussed above, ISTM
that we should consider partitions to be full relations.

If partitions become sub-relations then we complicate many parts of the
planner machinery, and many other things besides, since almost every
aspect of the code currently assumes each rel has a single relfilenode.

We might think about doing this as a way of making "Global Indexes". If
it's all one relation, we already have them! But that would only work for
indexes defined with the PPK as the leading column because otherwise we
would have no way of adding or dropping new partitions without ripping
the guts out of each index (and failing PPUC4.2 as a result). It turns
out that we can have roughly the same kind of Pseudo Global Index
whether we consider partitions to be relations or sub-relations.

It seems much simpler to have partitions as relations...

4. Should all partitions have the same indexes?

It also seems that we have somewhat opposing Use Cases: PPUC2 requires
joins to achieve Partition Elimination, PPUC 5.2 requires us to have no
indexes on older partitions (since they reside on tape).

Some people will say that having all indexes defined the same eases
maintenance. Others will say it gives us flexibility.

I would argue in favour of flexibility now and automation later.

5. Constraints or specific Partitioning syntax?

Partition Elimination relies upon being able to prove at execution time
that two clauses in a query restriction are always false when taken
together. 

Taken generically, extending the concept of Partitioning to the max, we
might imagine that we could take any number of constraints, AND them
together with an arbitrarily complex restriction clause, solve it, and
then eliminate partitions. That seems too much work to solve generically
and provably. I aim for an easier subset, with a more bounded solution
algorithm.

If we know which constraints are to be used for Partition Elimination,
and we make some restrictions on the types of restriction clauses that
can take part in Partition Elimination, then the task gets easier.
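
To illustrate with the hypothetical sales partitions sketched earlier:
the partition constraint

  CHECK (sale_date >= '2005-01-01' AND sale_date < '2005-02-01')

and the restriction in

  SELECT * FROM sales WHERE sale_date = '2005-02-14';

can never both be true, so sales_2005_01 can be eliminated. Restricting
the provable cases to simple range and equality clauses like these is
what keeps the proof bounded.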

So, I think this requires explicit syntax, even though I am very
attracted to minimal syntax implementations.



Anyway, that was a bit of a ramble, so thanks for reading this far.

Best regards, Simon Riggs





Re: Table Partitioning, Part 1

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> 1. Embellish inheritance or separate infrastructure?

> It seems prudent to avoid building on that foundation, even though we
> may decide to use some similar approaches.

I disagree.  The code is there, it could use work, and what you are
basically proposing is to duplicate both the existing work and much
of the improvement it needs.

> 2. Individual Relations explicitly in the plan or MultiRelation plan
> nodes? (i.e. is a SeqScan of a Partitioned Table one Node or many
> nodes?)

This one is so obvious it hardly requires any discussion.  You cannot
have intelligent planning if you fold everything into a single plan
node.  I just finished getting rid of a similarly bad decision in the
context of indexscan planning (ie, a hardwired approach to OR logic)
--- let's not make that mistake again.

> 5. Constraints or specific Partitioning syntax?

> Partition Elimination relies upon being able to prove at execution time

You mean plan time.

> that two clauses in a query restriction are always false when taken
> together. 

We already have a reasonably decent implementation of such proving for
partial-index predicate handling.  I see no reason not to use that.
So I don't agree with the idea of special-purpose syntax or logic.
        regards, tom lane


Re: Table Partitioning, Part 1

From
Simon Riggs
Date:
On Mon, 2005-05-09 at 18:53 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > 1. Embellish inheritance or separate infrastructure?
> 
> > It seems prudent to avoid building on that foundation, even though we
> > may decide to use some similar approaches.
> 
> I disagree.  The code is there, it could use work, and what you are
> basically proposing is to duplicate both the existing work and much
> of the improvement it needs.

Minefields need clearing someday, I suppose. 

Multiple inheritance isn't something I'll be spending time on though.

> > 2. Individual Relations explicitly in the plan or MultiRelation plan
> > nodes? (i.e. is a SeqScan of a Partitioned Table one Node or many
> > nodes?)
> 
> This one is so obvious it hardly requires any discussion.  You cannot
> have intelligent planning if you fold everything into a single plan
> node.  

That was my first thought, and my proposed approach. But many people
have asked for the discussion to be aired, so that's why it's mentioned.

It could allow me to complete something useful, though basic, for 8.1.

> I just finished getting rid of a similarly bad decision in the
> context of indexscan planning (ie, a hardwired approach to OR logic)
> --- let's not make that mistake again.

Well, yes, I saw.

...but my problem remains: how do we handle Partition Elimination via
joins? What do you think of my later discussion on the possibility of
nested loop joins via multi-index scans? I don't like that answer, but
it does give us something... can you see a different way?

> > 5. Constraints or specific Partitioning syntax?
> 
> > Partition Elimination relies upon being able to prove at execution time
> 
> You mean plan time.

No, definitely execution time. Estimated only at plan time.

We need to support Bound plans, with constants supplied as parameters.
We also need to support Partition Elimination via restrictions using
time functions, most of which are Stable functions, which again are not
evaluated until execution time.

Yeh, we might be able to prove it at Plan time, but an implementation
that offered Plan-time-only PE would be too restrictive.
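
Two hypothetical examples of what I mean, using the earlier sales
sketch:

  -- Bound plan: the value is only supplied at EXECUTE time
  PREPARE recent_sales (date) AS
      SELECT * FROM sales WHERE sale_date = $1;
  EXECUTE recent_sales ('2005-02-14');

  -- Stable function: not evaluated until the query runs
  SELECT * FROM sales WHERE sale_date > current_date - 7;

In both cases the planner can only estimate which partitions survive;
the definitive elimination has to happen at execution time.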

> > that two clauses in a query restriction are always false when taken
> > together. 
> 
> We already have a reasonably decent implementation of such proving for
> partial-index predicate handling.  I see no reason not to use that.
> So I don't agree with the idea of special-purpose syntax or logic.

Oh? Thanks. I wasn't looking forward to that bit...

I'd seen the partial CNF stuff for OR handling and the range clause
identification for clause selectivity.

Many thanks,

Best Regards, Simon Riggs



Re: Table Partitioning, Part 1

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > Partition Elimination relies upon being able to prove at execution time
> 
> You mean plan time.

Fwiw, both are possible.

In oracle there are (at least) three different cases:

1. For queries like "select * from tab" the plan shows a multiple partition
scan.

2. For queries like "select * from tab where partition_key = 1" the plan
shows the partition that the planner proved would suffice.

3. For queries like "select * from tab where partition_key = ?" the plan
shows a single partition scan of a partition to be determined at run-time.

This last case can be very important for joins as well. For queries like
"select * from tab1, tab2 where tab1.partition_key = tab2.fk" the planner
can prove that each nested loop iteration will only require a single
btree lookup. This is important since, if it can't prove this, the nested
loop has to do multiple index lookups, making it look very poor compared
to a merge join or hash join.

There are also cases where Oracle knows that the plan node will involve
reading multiple partitions but not all.

-- 
greg



Re: Table Partitioning, Part 1

From
Hannu Krosing
Date:
On E, 2005-05-09 at 23:30 +0100, Simon Riggs wrote:
> A more in-depth consideration of the major design options and trade-offs
> available to us... this is an internals related discussion.
> 
> Comments?
> 
> 1. Embellish inheritance or separate infrastructure?
> 
> Inheritance is a somewhat strange PostgreSQL feature, with a few
> gotchas. The "big one" is the lack of a multi-relation index that can be
> used against all of the inherited relations also.

I think this can be solved in two ways -

1. create a "merged index", which is similar to a partitioned table:
inserts go to the index corresponding to the underlying table, but
selects (including uniqueness checks) are done by merging data from
several indexes. This also solves the problem of quickly adding and
deleting partitions to/from the global index.

2. a variation of my old suggestion of using the fact that table storage
is already split into 1GB segments for implementing "partitioned tables":
just make a mapping from each 1GB (128k-page) range to a subtable/offset
and then build the index as usual. This has problems with maximum
partitioned table size (the same as our current maximum for ordinary
tables) and also slow adding of new partitions - deleting can be done
quickly by mapping the allocated range to 'removed'.

If 1 can be implemented reasonably efficiently for a large number of
merged indexes, then there is no point in investigating 2.

> The full complexity of multiple inheritance, possible multi-level
> inheritance doesn't seem to solve any problems for us. The main argument
> in favour of improving inheritance as a route to partitioning is that
> many people have already developed applications using it to provide some
> of the desirable aspects of partitioning. The amount of code used to
> provide some of these features is actually quite low, and the code
> itself is already specialised. 
> 
> It seems prudent to avoid building on that foundation, even though we
> may decide to use some similar approaches.

I would not mind if we give up multiple inheritance. 

OTOH multi-level inheritance (and thus multi-level partitioning) can be
quite useful sometimes. For example, I could have old data partitioned
by YEAR, but newer data also sub-partitioned by month or even day.

Then I could manually do PE for all previous years by simply running my
query against sales_2005 and let the automatic PE worry about doing the
right thing within this year.
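
Something like this, borrowing the hypothetical sales naming used
earlier in the thread:

  CREATE TABLE sales_2004    () INHERITS (sales);
  CREATE TABLE sales_2005    () INHERITS (sales);
  CREATE TABLE sales_2005_05 () INHERITS (sales_2005);

  -- manual PE of earlier years; automatic PE among the months below
  SELECT sum(amount) FROM sales_2005
   WHERE sale_date >= '2005-05-01' AND sale_date < '2005-06-01';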

Also, the multi-level nature of inheritance should be hidden from
planner/optimiser by moving the sub-sub-sub tables up the hierarchy into
a single flat append-like plan. 

At least until/if we start applying check constraints to a table and all
its child tables, in which case some of the PE could be done on
intermediate nodes before expansion.

> 2. Individual Relations explicitly in the plan or MultiRelation plan
> nodes? (i.e. is a SeqScan of a Partitioned Table one Node or many
> nodes?)
> 
> Currently, Inheritance produces a plan for a SeqScan that looks like
> this...
> 
> Append
>   SeqScan
>   SeqScan
>   SeqScan

Seems reasonable for a seqscan :)

I have an inheritance-partitioned table of about 50 tables, and when I
restrict it on some column that has an index, the plan does an index
scan on the bigger tables and a seqscan on some old tables which have
only a few thousand rows.

> ISTM fairly straightforward to produce a similar "static" plan along the
> same lines, using Result nodes to implement Partition Elimination.
> 
> Append
>   Result
>     SeqScan
>   Result
>     SeqScan
>   Result
>     SeqScan

So you mean another node that looks inside the SeqScan's restrictions
and then determines whether there is any chance that the seqscan will
return any rows?

Why can't this be done at planning time? And if it can't be done at
planning time, then how do you determine which plan is cheapest?

> Some may think that looks a little clunky, especially since the plan
> length would increase as the number of Partitions increased. An
> alternative, would be to have a single Scan node that performs a Multi-
> Partition Scan, which looks like this...
> 
> PartitionSeqScan
> 
> There'd always be one node, no matter how many partitions. Again, easy to
> implement, since we just add a new type of Scan access method, or alter
> the existing ones iff we access partitioned tables. We'd be able to use
> physical tlists, and all the complexity is hidden.

Perhaps we could have just one scan type which looks like this

Scan

And hide all the complexity within it ? ;p

Greater visibility of internals is good, both for the optimiser and for
a user trying to understand what is going on (explain analyse).

I don't think many nodes are a problem per se, only when they cause some
non-linear growth in some operations.

>
...
> 
> My current thinking is to implement explicit partition plans.

Agreed.

<OT: philosophical musing>

As a general guideline I think it is (almost) never a good idea to trade
CPU cycles for disk accesses.

The speed difference between CPU and disk seems destined to grow in the
foreseeable future, and unless some cheap RAM-like storage suddenly
emerges, this will not change.

</OT: philosophical musing>

> But what about join plans? There are Use Cases that depend almost
> entirely upon joins for Partition Elimination rather than direct
> restriction of the PPK. (PPUC2)
> 
> How could a join plan be performed with the explicit plan shown above?
> The only option would seem to be a Sort/Merge, which could be a disaster
> if we only wanted a few rows returned from a very large table.

I think the new bitmap scan code in 8.1 solves a similar problem, only
at a smaller scale (eliminating pages/tuples instead of partitions). 

Perhaps something similar can be used here for PE.

> If all partitions in the query had identical indexes on them, then we
> have another option. In that case, each index could be thought to form
> part of a larger index ordered initially on the Partitioning Key (PPK).
> If the first column was actually the PPK, then the set of indexes would
> be exactly equivalent to a Global Index. We can call this a Pseudo
> Global Index.
> 
> The Pseudo Global Index could also be used to enforce uniqueness. If all
> of the composite indexes were defined unique and the index contained the
> PPK as one of its columns, this would work. 
>
> The index enforces
> uniqueness within each partition and the PPK enforces uniqueness across
> partitions because the same PPK value cannot be in two partitions.

But only uniqueness of PPK, not any other columns.

> We can also use the Pseudo Global Index as the object of a multi-index
> scan node, since the PGI would be ordered on the PPK. The plan node
> would use the partition boundary definitions to locate the starting
> partition, then seek within the partition to the correct part of the
> index then scan sideways, skipping to the next partition as required.
> 
> This sounds a great idea, but it is exactly the technique I argued
> against only 4 paragraphs ago for use with SeqScans.

Also this works only for PPK and not other columns. 

> My current thinking is that we need a mix of the two approaches, though
> I raise all of this here to gain everybody's input...

I'm afraid that hiding the decisions inside the access methods will make
things harder for the optimiser.

Still, there may be cases where smarter access methods make sense as an
additional feature, though I can't come up with an example right now.

> 3. Partitions are relations? Or sub-relations?
> 
> If we ever want to implement explicit plans, as discussed above, ISTM
> that we should consider partitions to be full relations.
> 
...
> It seems much simpler to have partitions as relations...

Agreed

> 4. Should all partitions have the same indexes?
> 
> It also seems that we have somewhat opposing Use Cases: PPUC2 requires
> joins to achieve Partition Elimination, PPUC 5.2 requires us to have no
> indexes on older partitions (since they reside on tape).
> 
> Some people will say that having all indexes defined the same eases
> maintenance. Others will say it gives us flexibility.
> 
> I would argue in favour of flexibility now and automation later.

Yes.

> 5. Constraints or specific Partitioning syntax?
> 
> Partition Elimination relies upon being able to prove at execution time
> that two clauses in a query restriction are always false when taken
> together. 

As much as possible of PE should be done at planning time, as only then
can our cost-based optimiser make use of it. 

Execution-time PE (smart access methods) is mostly a black box for
optimiser, and thus may often produce pessimal plans.

> Taken generically, extending the concept of Partitioning to the max, we
> might imagine that we could take any number of constraints, then AND
> them together with an arbitrarily complex restriction clause, solve it
> and then eliminate partitions. That seems too much work to solve that
> generically and provably. I aim for an easier subset, with a more
> bounded solution algorithm.

This could be done by converting the "arbitrarily complex restriction
clause" into a list of ANDed simpler clauses (I guess we already have
most of the code needed for that) and then running over the "simple
enough" clauses. If we can prove that any 2 of them contradict, then the
partition can be eliminated.

There are 2 possibly expensive steps:

1. the conversion to an "AND'ed list of simple clauses" (unknown
complexity)

2. matching each of the "simple" clauses in the AND list against all the
others (should be N+(N-1)+(N-2)+..+1 ~= 2N complexity)

Fortunately 1 needs to be done only once, as this part stays constant
for the query, and all the constraints from the subtable/partition
(CHECKs and others) are appended to the AND list. The appended
constraints also do not need to be checked among themselves, as they are
by definition non-contradictory (unless the partition is empty ;)),
further decreasing the complexity.


This approach can be made equal to your proposal if the check for
simplicity of clauses returns true only for range and equality
comparisons on PPK. 

Then later, when the system is made to deal with more complicated pairs
of clauses, the "is simple clause" check will be expanded
correspondingly.

Of course the checks themselves can bail out with a "don't know"
response at any time, so the "is simple clause" test is just an
optimisation.

For a query with a WHERE clause expanding to a 4-part ANDed list and a
subtable with 3 CHECKs, we need to do only 4 x 3 = 12 PE comparisons per
partition if we compare each restriction from the WHERE clause with each
of the CHECKs.

If we have only 1 "simple enough" clause in the WHERE and 1 in the
CHECKs, then we need to do 4 "simple enough" checks per plan + 3 per
partition + one actual PE check per partition.

> If we know which constraints are to be used for Partition Elimination,
> and we make some restrictions on the types of restriction clauses that
> can take part in Partition Elimination, then the task gets easier.
> 
> So, I think this requires explicit syntax, even though I am very
> attracted to minimal syntax implementations.

I don't like explicit syntax. We have objected to adding optimiser hints
so far; why should we start now?

Maybe we could just consider all range and equality checks (IN expands
to a list of ORs and does not qualify here; NOT IN does, as it is a list
of ANDs) for a start and move on later if needed.

> Anyway, that was a bit of a ramble, so thanks for reading this far.

So were my comments :) Thanks to everybody who has read this far!

-- 
Hannu Krosing <hannu@skype.net>



Re: Table Partitioning, Part 1

From
Hannu Krosing
Date:
On T, 2005-05-10 at 16:31 +0300, Hannu Krosing wrote:
> On E, 2005-05-09 at 23:30 +0100, Simon Riggs wrote:

> There are 2 possibly expensive steps: 
> 
> 1. the conversion to "AND'ed list of simple clauses" (unknown
> complexity) 
> 
> 2. matching each of "simple" clauses in the and list with all others
> (should be N+(N-1)+(N-2)+..+(1) ~= 2N) complexity)

Actually not 2N but N * (N-1)/2; thus 3 clauses need 2+1=3 checks
and 11 clauses need 10+9+..+1 = 55 checks.

-- 
Hannu Krosing <hannu@skype.net>



Re: Table Partitioning, Part 1

From
"Jim C. Nasby"
Date:
On Tue, May 10, 2005 at 12:16:17AM +0100, Simon Riggs wrote:
> On Mon, 2005-05-09 at 18:53 -0400, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > 1. Embellish inheritance or separate infrastructure?
> > 
> > > It seems prudent to avoid building on that foundation, even though we
> > > may decide to use some similar approaches.
> > 
> > I disagree.  The code is there, it could use work, and what you are
> > basically proposing is to duplicate both the existing work and much
> > of the improvement it needs.
> 
> Minefields need clearing someday, I suppose. 
> 
> Multiple inheritance isn't something I'll be spending time on though.

I'm also not sure that inheritance would support all cases. For example,
in some situations PPUC3 leads to doing individual value partitioning,
where a partition is guaranteed to have only one value for part of the
PPK, meaning that there's no reason to store that part of the key in the
partition itself. Currently this is possible with partitions built out
of views but not out of inherited tables.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Table Partitioning, Part 1

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> On Tue, May 10, 2005 at 12:16:17AM +0100, Simon Riggs wrote:
>> On Mon, 2005-05-09 at 18:53 -0400, Tom Lane wrote:
>>> I disagree.  The code is there, it could use work, and what you are
>>> basically proposing is to duplicate both the existing work and much
>>> of the improvement it needs.
>> 
>> Minefields need clearing someday, I suppose. 
>> 
>> Multiple inheritance isn't something I'll be spending time on though.

> I'm also not sure that inheritance would support all cases.

My point seems to have been widely misunderstood ;-)

I was not suggesting that partitioning must be built on top of
inheritance, nor vice versa, nor that they need to support exactly
the same feature sets.  What I am saying is that if you adopt an
NIH attitude to the existing code, you are going to end up with a
lot of duplication.  There is a substantial amount of potentially
common infrastructure, as well as common problems that you might as
well solve for both cases at once.  (Remember the inventor's paradox:
the more general problem is often easier to solve.)  In particular,
the planning problems look essentially the same to me, as does the
indexing problem.
        regards, tom lane


Re: Table Partitioning, Part 1

From
Simon Riggs
Date:
On Tue, 2005-05-10 at 16:31 +0300, Hannu Krosing wrote:
> On E, 2005-05-09 at 23:30 +0100, Simon Riggs wrote:
> > ISTM fairly straightforward to produce a similar "static" plan along the
> > same lines, using Result nodes to implement Partition Elimination.
> > 
> > Append
> >   Result
> >     SeqScan
> >   Result
> >     SeqScan
> >   Result
> >     SeqScan
> 
> So you mean another node that looks inside seqscans restrictions and
> then determines if there is any chance that seqscan will return any
> rows ? 
> 
> Why can't this be done at planning time ? And if it can't be done at
> planning time, then how do you determine which plan is cheapest ?

It can if you have constants. If you have bind variables or Stable
functions then it can only be done for certain at execution time, though
it can be estimated at planning time. This is exactly what is done
currently in the planner for constant evaluation and simplification.

> > If all partitions in the query had identical indexes on them, then we
> > have another option. In that case, each index could be thought to form
> > part of a larger index ordered initially on the Partitioning Key (PPK).
> > If the first column was actually the PPK, then the set of indexes would
> > be exactly equivalent to a Global Index. We can call this a Pseudo
> > Global Index.
> > 
> > The Pseudo Global Index could also be used to enforce uniqueness. If all
> > of the composite indexes were defined unique and the index contained the
> > PPK as one of its columns, this would work. 
> >
> > The index enforces
> > uniqueness within each partition and the PPK enforces uniqueness across
> > partitions because the same PPK value cannot be in two partitions.
> 
> But only uniqueness of PPK, not any other columns.

No, it would work for *any* set of columns that included the PPK.

> > We can also use the Pseudo Global Index as the object of a multi-index
> > scan node, since the PGI would be ordered on the PPK. The plan node
> > would use the partition boundary definitions to locate the starting
> > partition, then seek within the partition to the correct part of the
> > index then scan sideways, skipping to the next partition as required.
> > 
> > This sounds a great idea, but it is exactly the technique I argued
> > against only 4 paragraphs ago for use with SeqScans.
> 
> Also this works only for PPK and not other columns. 

This would work for any set of columns, as long as the PPK were the
leftmost column in the set.

> > My current thinking is that we need a mix of the two approaches, though
> > I raise all of this here to gain everybody's input...
> 
> I'm afraid that hiding the decisions inside the access methods will make
> things harder for the optimiser .

Me too.

> Still there may be cases where smarter access methods make sense as an
> additional feture, though I cant come up with an example right now.

Look at PPUC2, join partition elimination, which is the classic Fact to
TimeDimension join.
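
For example, with hypothetical names:

  CREATE TABLE time_dim (
      date_key       date PRIMARY KEY,
      fiscal_quarter text
  );

  -- the fact table (sales, partitioned on sale_date) is restricted
  -- only through the join, so there is no direct clause on the PPK
  SELECT f.amount
    FROM sales f
    JOIN time_dim d ON d.date_key = f.sale_date
   WHERE d.fiscal_quarter = '2005-Q1';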


Hannu,

It's time to follow your thinking for the first implementation:
just implement Constraints as possible gating tests on each inherited
relation. No syntax changes, just some light but complex internals.

Once that works, we can discuss improving it to solve other required use
cases. Until we're at that stage, we should leave further discussion to
one side, now that we have the basic architecture agreed: inheritance.

Best Regards, Simon Riggs



Re: Table Partitioning, Part 1

From
Simon Riggs
Date:
On Tue, 2005-05-10 at 16:44 +0300, Hannu Krosing wrote:
> On T, 2005-05-10 at 16:31 +0300, Hannu Krosing wrote:
> > On E, 2005-05-09 at 23:30 +0100, Simon Riggs wrote:
> 
> > There are 2 possibly expensive steps: 
> > 
> > 1. the conversion to "AND'ed list of simple clauses" (unknown
> > complexity) 
> > 
> > 2. matching each of "simple" clauses in the and list with all others
> > (should be N+(N-1)+(N-2)+..+(1) ~= 2N) complexity)
> 
> actually not 2N but (N * ((N-1)/2) , thus 3 clauses need 2+1=3 checks
> and 11 clasues need (10+9+..+1) = 55 checks.

Well, it doesn't need to be quite that bad.

We can just check each of the table constraints against each of the
Restrict clauses. 

We don't need to test the constraints against each other, since the
table would be empty if any of them ever turned out to be mutually
exclusive, so that test wouldn't save us anything. Later, we can enforce
constraints to be mutually exclusive at the time they are created.

We don't need to test the Restrict clauses against each other either,
since we can assume that the user is smart enough to submit SQL that
returns some rows. Eval will pick up the stupid case if there is one.

Anyway, the existing code has an even simpler heuristic for avoiding the
full check...

Best Regards, Simon Riggs




Re: Table Partitioning, Part 1

From
Simon Riggs
Date:
On Tue, 2005-05-10 at 15:01 -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > On Tue, May 10, 2005 at 12:16:17AM +0100, Simon Riggs wrote:
> >> On Mon, 2005-05-09 at 18:53 -0400, Tom Lane wrote:
> >>> I disagree.  The code is there, it could use work, and what you are
> >>> basically proposing is to duplicate both the existing work and much
> >>> of the improvement it needs.
> >> 
> >> Minefields need clearing someday, I suppose. 
> >> 
> >> Multiple inheritance isn't something I'll be spending time on though.
> 
> > I'm also not sure that inheritance would support all cases.
> 
> My point seems to have been widely misunderstood ;-)
> 
> I was not suggesting that partitioning must be built on top of
> inheritance, nor vice versa, nor that they need to support exactly
> the same feature sets.  What I am saying is that if you adopt an
> NIH attitude to the existing code, you are going to end up with a
> lot of duplication.  There is a substantial amount of potentially
> common infrastructure, as well as common problems that you might as
> well solve for both cases at once.  (Remember the inventor's paradox:
> the more general problem is often easier to solve.)  In particular,
> the planning problems look essentially the same to me, as does the
> indexing problem.

Agreed - no worries.

I was already seeing that I can't solve all of the Bizgres use cases at
once, so it's time for me to pick one and make that work. If I can do at
least one in a way that solves a general case problem, then I'm happy to
do it that way. We may learn something during the journey about the best
approach for the other use cases.

...and at least we would have something to put into 8.1

Best Regards, Simon Riggs



Re: Table Partitioning, Part 1

From
Hannu Krosing
Date:
On T, 2005-05-10 at 23:09 +0100, Simon Riggs wrote:
> On Tue, 2005-05-10 at 16:31 +0300, Hannu Krosing wrote:

> > > If all partitions in the query had identical indexes on them, then we
> > > have another option. In that case, each index could be thought to form
> > > part of a larger index ordered initially on the Partitioning Key (PPK).
> > > If the first column was actually the PPK, then the set of indexes would
> > > be exactly equivalent to a Global Index. We can call this a Pseudo
> > > Global Index.
> > > 
> > > The Pseudo Global Index could also be used to enforce uniqueness. If all
> > > of the composite indexes were defined unique and the index contained the
> > > PPK as one of its columns, this would work. 
> > >
> > > The index enforces
> > > uniqueness within each partition and the PPK enforces uniqueness across
> > > partitions because the same PPK value cannot be in two partitions.
> > 
> > But only uniqueness of PPK, not any other columns.
> 
> No, it would work for *any* set of columns that included the PPK.

What I meant was that it would guarantee the uniqueness of the whole
column set only, not of any other column except the PPK; and if the PPK
was itself unique, the other columns wouldn't matter at all.

> > Still there may be cases where smarter access methods make sense as an
> > additional feture, though I cant come up with an example right now.
> 
> Look at PPUC 2 Join partition elimination, which is the classic Fact to
> TimeDimension join.

Maybe using a global index (plus a bitmap scan if the number of matching
rows is large and not clustered) here is enough for a stage 1
implementation.

As PPUC2 needs a partition elimination step for each and every value,
using a global index for this "kind of PE" could be the most efficient
way to do it for quite a large class of PPUC2 queries.

-- 
Hannu Krosing <hannu@skype.net>