Thread: Join optimization for inheritance tables

Join optimization for inheritance tables

From: Nedyalko Borisov

Hi all,

We are working with Aster for the summer and we would like to bounce around some ideas we have for possible
PostgreSQL extensions. In order to describe our ideas we will use the following example:
 

create table msg(  msg_id int,  msg text );

create table receiver(  msg_id int,  user_id int,  ts timestamp );


create table msg_100(  check ( 1 <= msg_id and msg_id < 100 )  ) inherits (msg);

create table msg_200(  check ( 100 <= msg_id and msg_id < 200 )  ) inherits (msg);

create table msg_300(  check ( 200 <= msg_id and msg_id < 300 )  ) inherits (msg);


create table receiver_100(  check ( 1 <= msg_id and msg_id < 100 )  ) inherits (receiver);

create table receiver_200(  check ( 100 <= msg_id and msg_id < 200 )  ) inherits (receiver);

create table receiver_300(  check ( 200 <= msg_id and msg_id < 300 )  ) inherits (receiver);
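
For the examples below, assume a few sample rows inserted directly into the matching children
(the parent tables stay empty); the values are purely illustrative:

insert into msg_100 values (50, 'hello'), (60, 'world');
insert into msg_200 values (150, 'foo');
insert into receiver_100 values (50, 1, now()), (60, 2, now());
insert into receiver_200 values (150, 3, now());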


When we are issuing queries on one of the parent tables, like,
   SELECT * FROM msg WHERE msg_id BETWEEN 50 AND 70;

PostgreSQL is smart enough to filter out child tables with check constraints that are refuted by the filter conditions.
In this example, the optimizer will pick a plan that only considers the parent table 'msg' and one of the child tables
'msg_100':
   Result
     ->  Append
           ->  Seq Scan on msg
                 Filter: ((msg_id >= 50) AND (msg_id <= 70))
           ->  Seq Scan on msg_100 msg
                 Filter: ((msg_id >= 50) AND (msg_id <= 70))

Plan costs are removed for simplicity of the presentation.
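
Note that this pruning relies on constraint exclusion being enabled for inheritance scans
(the 'partition' setting is the default in 8.4); a minimal way to reproduce the plan above:

set constraint_exclusion = partition;   -- 'on' also works; 'off' disables the pruning
explain select * from msg where msg_id between 50 and 70;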

Now, if we issue a join query between the two parent tables, like,
   SELECT * FROM msg m JOIN receiver r ON m.msg_id = r.msg_id;

the execution plan will be:
   Merge Join
     Merge Cond: (m.msg_id = r.msg_id)
     ->  Sort
           Sort Key: m.msg_id
           ->  Append
                 ->  Seq Scan on msg m
                 ->  Seq Scan on msg_100 m
                 ->  Seq Scan on msg_200 m
                 ->  Seq Scan on msg_300 m
     ->  Sort
           Sort Key: r.msg_id
           ->  Append
                 ->  Seq Scan on receiver r
                 ->  Seq Scan on receiver_100 r
                 ->  Seq Scan on receiver_200 r
                 ->  Seq Scan on receiver_300 r


During the planning phase, the optimizer treats an entire hierarchy as a single entity. Hence, it first considers the
most efficient way to create the append paths for the two hierarchies, and then the best way to join them. However,
there are some optimizations that are possible here, similar to the table filtering described above. In particular,
instead of joining the two appends, we could "push down" the join to the child relations - that is, create pairwise
joins between the children and then append the join results together.
 

Based on the check conditions of the children and the join predicate, it is possible to filter out joins that cannot
produce any results. For example, joining 'msg_100' with 'receiver_300' is redundant since the check constraints of
these two tables do not overlap. Tuples in 'msg_100' have 'msg_id' between 1 and 100, whereas tuples in 'receiver_300'
have 'msg_id' between 200 and 300. Therefore, no tuples can be produced by this join.
 

A plan with such optimizations could be:
 Result
   ->  Append
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg msg
               ->  Hash
                     ->  Seq Scan on receiver receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg msg
               ->  Hash
                     ->  Seq Scan on receiver_100 receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg msg
               ->  Hash
                     ->  Seq Scan on receiver_200 receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg msg
               ->  Hash
                     ->  Seq Scan on receiver_300 receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_100 msg
               ->  Hash
                     ->  Seq Scan on receiver receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_100 msg
               ->  Hash
                     ->  Seq Scan on receiver_100 receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_200 msg
               ->  Hash
                     ->  Seq Scan on receiver receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_200 msg
               ->  Hash
                     ->  Seq Scan on receiver_200 receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_300 msg
               ->  Hash
                     ->  Seq Scan on receiver receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_300 msg
               ->  Hash
                     ->  Seq Scan on receiver_300 receiver

The parent tables don't have any check constraints in the above example and hence will need to join with all the child
relations. However, based on usage and feedback among Aster's customers, we believe that it is common practice not to
store any data in the parent tables. Therefore, we were thinking of creating an "Empty Check Constraint". This
constraint will enforce that a particular table is empty and will prevent the insertion of any rows into it. If the
parent tables have the empty check constraint, then the optimizer will be able to further eliminate some of the child
joins.
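
For context, the usual way to keep a parent table empty today is an insert-redirection trigger,
which the planner cannot reason about; that is exactly the gap the proposed constraint would fill.
A minimal sketch for the 'msg' hierarchy (names are illustrative):

create or replace function msg_insert_redirect() returns trigger as $$
begin
    -- route each row to the child whose check constraint it satisfies
    if new.msg_id >= 1 and new.msg_id < 100 then
        insert into msg_100 values (new.*);
    elsif new.msg_id >= 100 and new.msg_id < 200 then
        insert into msg_200 values (new.*);
    elsif new.msg_id >= 200 and new.msg_id < 300 then
        insert into msg_300 values (new.*);
    else
        raise exception 'msg_id % is out of range', new.msg_id;
    end if;
    return null;  -- nothing is ever stored in the parent itself
end;
$$ language plpgsql;

create trigger msg_insert_trigger
    before insert on msg
    for each row execute procedure msg_insert_redirect();

Even with such a trigger in place, the planner has no way to know that 'msg' itself is empty,
so it still scans the parent and joins it with every child of 'receiver'.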

A plan with all optimizations and features suggested could be:
 Result
   ->  Append
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_100 msg
               ->  Hash
                     ->  Seq Scan on receiver_100 receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_200 msg
               ->  Hash
                     ->  Seq Scan on receiver_200 receiver
         ->  Hash Join
               Hash Cond: (msg.msg_id = receiver.msg_id)
               ->  Seq Scan on msg_300 msg
               ->  Hash
                     ->  Seq Scan on receiver_300 receiver
 


In summary, we are making two suggestions:
1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.
2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.

We've created a basic implementation, just to verify the benefits of these ideas. The initial results seem promising:
we observed decreased execution times, and for large data sets the query execution time could be several times faster
than before.
 
We would greatly appreciate any comments, thoughts, or suggestions you might have regarding these ideas.

Regards,
Nedyalko Borisov and Herodotos Herodotou





Re: Join optimization for inheritance tables

From: Tom Lane

Nedyalko Borisov <nedyalko@asterdata.com> writes:
> In summary, we are making two suggestions:
> 1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.

We already handle this for the case where the join is nestloop with
inner index scan, and I'm not convinced that there's any real gain to be
had for other join types.

> 2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.

The trouble with that is that a constraint that doesn't propagate to its
child tables is a weird beast that I'd just as soon not invent.

We are currently thinking about inventing an explicit notion of
partitioned tables.  If we had that, it would be reasonable to have
a special kind of "parent" table for a partitioned set and refuse to
allow any data in that relation.  But I'm not excited about contorting
the general constraint mechanism in the way that would be necessary to
express this as a constraint.
        regards, tom lane


Re: Join optimization for inheritance tables

From: Nedyalko Borisov

Tom Lane wrote:
> Nedyalko Borisov <nedyalko@asterdata.com> writes:
>   
>> In summary, we are making two suggestions:
>> 1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.
>>     
>
> We already handle this for the case where the join is nestloop with
> inner index scan, and I'm not convinced that there's any real gain to be
> had for other join types.
>

From an OLTP perspective this proposal won't introduce any benefits, since 
OLTP queries operate on small parts of the data, so we could add a flag 
to enable or disable the inherited joins.
However, OLAP queries process significant amounts of data, and to 
leverage this fact DB admins partition the data. We think that the 
optimizer should take advantage of this partitioning and consider plans 
where the joins are performed on small parts of the data.

For example, typical observed scenario is: optimizer chooses Merge Join 
for two partitioned tables like the plan below:
Merge Cond: (table1.id = table2.id)
   -> Sort
       Sort Key: id
       -> Append
           -> Seq Scan on table1_part1
           -> Seq Scan on table1_part2
           -> ....
           -> Seq Scan on table1_partN
   -> Sort
       Sort Key: id
       -> Append
           -> Seq Scan on table2_part1
           -> Seq Scan on table2_part2
           -> ....
           -> Seq Scan on table2_partM

This plan ignores the fact there are indexes on the table2 partitions 
and that the pairwise partitions joins (index nested loop or hash join) 
will be faster than scanning all the partitions and sorting them.

To see the effect of the pairwise joins we performed some experiments 
with the initial implementation. The experiments consisted of joining 
two partitioned tables, where each table has around 200 children and 
two int columns, id and count. We generated data of different sizes, 
measured the execution times, and got the following results:
0.5 million records -> regular plan 0.69s -> modified plan 0.51s
1 million records -> regular plan 2.9s -> modified plan 1s
2.5 million records -> regular plan 4.17s -> modified plan 2.28s
5 million records -> regular plan 11.25s -> modified plan 4.46s
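
A comparable setup can be generated with a short plpgsql helper like the one below; the
partition count and ranges here are illustrative assumptions, not the exact experimental script:

create table table1 (id int, count int);
create table table2 (id int, count int);

create function make_parts(parent text, nparts int, ids_per_part int)
returns void as $$
declare i int;
begin
    for i in 0 .. nparts - 1 loop
        -- each child covers one contiguous, non-overlapping id range
        execute 'create table ' || parent || '_part' || i + 1
             || ' (check (id >= ' || i * ids_per_part
             || ' and id < ' || (i + 1) * ids_per_part || '))'
             || ' inherits (' || parent || ')';
    end loop;
end;
$$ language plpgsql;

select make_parts('table1', 200, 5000);   -- 200 children, ids 0 .. 999999
select make_parts('table2', 200, 5000);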

Increasing the data size or adding more columns will increase the 
difference between the current plan that the database picks and the 
proposed modified plans. Thus, we think that it might be 
useful if the optimizer considers plans with inherited joins.

>> 2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.
>>     
>
> The trouble with that is that a constraint that doesn't propagate to its
> child tables is a weird beast that I'd just as soon not invent.
>
> We are currently thinking about inventing an explicit notion of
> partitioned tables.  If we had that, it would be reasonable to have
> a special kind of "parent" table for a partitioned set and refuse to
> allow any data in that relation.  But I'm not excited about contorting
> the general constraint mechanism in the way that would be necessary to
> express this as a constraint.
>
>   
OK, implementing a special "abstract"/"parent" table would make more 
sense. Along these lines, could you elaborate on the explicit 
notion of partitioned tables or give us some references?

Thanks,
Nedyalko Borisov and Herodotos Herodotou




Re: Join optimization for inheritance tables

From: Greg Stark

On Mon, Jul 6, 2009 at 10:23 PM, Nedyalko Borisov<nedyalko@asterdata.com> wrote:
> For example, typical observed scenario is: optimizer chooses Merge Join for
> two partitioned tables like the plan below:
> Merge Cond: (table1.id = table2.id)
>   -> Sort
>       Sort Key: id
>       -> Append
>           -> Seq Scan on table1_part1
>           -> Seq Scan on table1_part2
>       -> ....
>           -> Seq Scan on table1_partN
>      ->  Sort
>           Sort Key: id
>           -> Append
>              -> Seq Scan on table2_part1
>              -> Seq Scan on table2_part2
>              -> ....
>              -> Seq Scan on table2_partM
>
> This plan ignores the fact there are indexes on the table2 partitions and
> that the pairwise partitions joins (index nested loop or hash join) will be
> faster than scanning all the partitions and sorting them.

To some degree my merge-append patch would mitigate this case. It
would allow the use of indexes on some or all the partitions to avoid
the sorts.

However it would still force all the partitions to be appended on each
side and then merged. If we could match up all the partitions then I
think this plan would be faster with the Append on top and separate
merge joins for each pair of partitions.

Aside from skipping the cpu cost of the merge-append I think it would
win for a few other reasons as well. Each join would be able to come
up with much better statistics which would enable it to pick a better
join when one is available. Even if the planner still picks a merge
join it would be much more likely to finish early and skip the
remainder of a partition on one side or the other.

> OK, implementing a special "abstract"/"parent" table would make more sense.

I had in mind to tackle this in a bit of a roundabout way. If we mark
the parent table "read-only" and notice that all tuples (all 0 of
them) in the table are frozen, then we can discard that table from the
plans. Since changing the "read-only" attribute would have to be
treated as a DDL operation, which would invalidate any cached plans, we
can trust that it won't change as long as the plan lives, so no new
tuples can be inserted.

The reason I wanted to take such a roundabout route instead of having
an "abstract" or "empty" property is that I wanted to generalize this.
Once we know a table is read-only then there are lots of properties we
could find useful in planning aside from emptiness. We could have
statistics like the minimum and maximum value for a column, which the
planner would be able to trust and use to exclude partitions without
having to explicitly declare constraints on every column.
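
For reference, per-column min/max information of this sort already exists in the sampled
statistics (e.g. pg_stats.histogram_bounds); the planner just can't trust it for exclusion
today, because nothing stops new rows from falling outside the sampled range:

-- sampled per-partition bounds the planner could trust if the table were known read-only
select tablename, attname, histogram_bounds
from pg_stats
where tablename like 'msg_%' and attname = 'msg_id';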

This is all just my musings, not any kind of consensus. Does it make
sense to others or is it too baroque when a simple "abstract" flag
would do?

--
greg
http://mit.edu/~gsstark/resume.pdf


Re: Join optimization for inheritance tables

From: Herodotos Herodotou

This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

Short description: when the optimizer considers join paths between two tables with child tables, it only creates join
paths over two append paths. In this patch we extend the optimizer to consider joins between the child tables from the
two parent relations, based on the child table constraints. The idea is that only child tables with overlapping
constraints need to be joined, and not the entire hierarchies with each other. This optimization can provide significant
benefits in terms of execution time. A short motivation example as well as the overall algorithm description could be
found in the attached presentation.

Herodotos & Nedyalko



Attachment

Re: Join optimization for inheritance tables

From: Euler Taveira de Oliveira

Herodotos Herodotou wrote:
> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.
> 
> Short description: when the optimizer considers join paths between two tables with child tables, it only creates join
> paths over two append paths. In this patch we extend the optimizer to consider joins between the child tables from the
> two parent relations, based on the child table constraints. The idea is that only child tables with overlapping
> constraints need to be joined, and not the entire hierarchies with each other. This optimization can provide significant
> benefits in terms of execution time. A short motivation example as well as the overall algorithm description could be
> found in the attached presentation.
> 
I added it to the next commitfest [1].


[1] https://commitfest.postgresql.org/action/commitfest_view?id=3


--
Euler Taveira de Oliveira
http://www.timbira.com/


Re: Join optimization for inheritance tables

From: Tom Lane

Herodotos Herodotou <Herodotos.Herodotou@asterdata.com> writes:
> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

I looked over this patch a bit.  I am of the opinion that this is a lot
of work towards a dead-end direction :-(.  The patch contains a large
amount of rather inefficient code that attempts to reverse-engineer
knowledge about whether an inheritance tree forms a range-partitioned
table.  We already know that reasoning from first principles using a set
of arbitrary per-table constraints doesn't scale very well, and this
would make it a lot worse.  pgsql-hackers have been talking for some
time about implementing an explicit partitioning feature, which would
give the planner direct knowledge about this type of table structure
without expending nearly so much work (and then expending it all over
again for the next query).  I think the right way to go at it is to
implement that first, and then see about teaching the planner to plan
like this.

We could possibly extract a subset of the patch that just deals with
pushing joins down through appends, without the constraint-exclusion
logic that tries to eliminate provably empty join pairs.  But I'm
dubious that that would be worthwhile by itself.  I think you need the
constraint exclusion to eliminate a good deal of work before this is
actually better than doing a single join above the append.

There are a number of other things I don't like, such as the restrictive
and not-obviously-necessary requirement that all the join clauses be
simple "Var op Var" clauses.  But the main thing is that a large
fraction of the patch is doing something I don't think we should try
to do.
        regards, tom lane


Re: Join optimization for inheritance tables

From: Emmanuel Cecchet

Tom,

If the partitioning implementation does not make progress (and does not 
make it for 8.5), don't you think that this could be an interesting 
contribution to consider for this release?
I have put on the wiki 
(http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables) 
the results obtained so far and the use case where it is most used.

As I understand it, partitioning will certainly lead to some significant 
changes/enhancements to the planner. Do you think it is realistic to get 
that for 8.5?

manu



-- 
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com



Re: Join optimization for inheritance tables

From: Jeff Davis

On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:
> If the partitioning implementation does not make progress (and does not 
> make it for 8.5), don't you think that this could be an interesting 
> contribution to consider for this release?
> I have put on the wiki 
> (http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables) 
> the results obtained so far and the use case where it is most used.

I think you mean that the planning time is in milliseconds, not seconds.
Also, you use "data" in a few places where you meant "date".

The results seem good, and trading planning time for execution time
seems like a reasonable idea in the case of partitioned tables. We
already work harder planning when constraint_exclusion='partition', so
there is some precedent (I don't know if that's a good precedent to
follow or not).

How does it compare to using merge-append?

I haven't looked at the actual patch yet.

Regards,
Jeff Davis



Re: Join optimization for inheritance tables

From: Josh Berkus

> As I understand it, partitioning will certainly lead to some significant
> changes/enhancements to the planner. Do you think it is realistic to get
> that for 8.5?

I don't think that waiting for our plans for a more robust partitioning
implementation is a reason to put off improvements to the implementation
we have.  If Simon or Pavel or someone was going all-out on getting the
new partitioning ready, that would be one thing.  But to date, nobody
has volunteered to work on it; we just know we need it.

Of course, that completely leaves aside Tom's critique of the
implementation, which sounds like it needs some work.  Trying to fit the
target table into a range partitioning mold would break with a lot of
real partitionings; for example I have several client DBs which are
partitioned active/inactive-by-date.

What about simply eliminating joins between partitioned tables by
checking which columns' constraints match exactly or are subsets?  A lot
of partitioned DBs partition everything by month, and joining two tables
which were partitioned by month would be useful by itself.
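
For concreteness, such a matching-constraints case looks like this (schema invented purely
for illustration): two hierarchies partitioned over the same months, where simple equality
of the CHECK expressions identifies the valid child pairs without any general theorem proving.

create table clicks_daily      (day date, site_id int, clicks bigint);
create table impressions_daily (day date, site_id int, impressions bigint);

create table clicks_daily_2009_08      (check (day >= '2009-08-01' and day < '2009-09-01'))
    inherits (clicks_daily);
create table clicks_daily_2009_09      (check (day >= '2009-09-01' and day < '2009-10-01'))
    inherits (clicks_daily);
create table impressions_daily_2009_08 (check (day >= '2009-08-01' and day < '2009-09-01'))
    inherits (impressions_daily);
create table impressions_daily_2009_09 (check (day >= '2009-09-01' and day < '2009-10-01'))
    inherits (impressions_daily);

-- the CHECK expressions on the two hierarchies match exactly, so only the same-month
-- child pairs can produce rows for this join on the partitioning column:
select c.day, c.site_id, c.clicks, i.impressions
from clicks_daily c
join impressions_daily i on i.day = c.day and i.site_id = c.site_id;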

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Join optimization for inheritance tables

From: Tom Lane

Josh Berkus <josh@agliodbs.com> writes:
> I don't think that waiting for our plans for a more robust partitioning
> implementation is a reason to put off improvements to the implementation
> we have.

The complaint I had was that this patch consisted largely of code that
we'd want to throw away once a better partitioning implementation is in
place.  I don't object to partial patches that move us forward on the
agreed-on path, but this one seemed to be going sideways.

> What about simply eliminating joins between partitioned tables by
> checking which columns' constraints match exactly or are subsets?

If it could be done with a <emphasis>small</> amount of throwaway
code, maybe ...
        regards, tom lane


Re: Join optimization for inheritance tables

From: Josh Berkus

> If it could be done with a <emphasis>small</> amount of throwaway
> code, maybe ...

Well, that's a reasonable goal ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Join optimization for inheritance tables

From: Herodotos Herodotou

Hi Jeff,

On Tue, Sep 22, 2009 at 8:06 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> I think you mean that the planning time is in milliseconds, not seconds.
>

The planning time is actually in seconds. Even without our feature,
planning takes a few seconds since the optimizer deals with hundreds
or even thousands of child tables. With our feature, planning time
increases by 2-3X but then again, running time improves by 6-10X. I
have added a paragraph under performance evaluation in the wiki page (
http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables
) in order to provide better insight into what's happening under the
covers.

> The results seem good, and trading planning time for execution time
> seems like a reasonable idea in the case of partitioned tables. We
> already work harder planning when constraint_exclusion='partition', so
> there is some precedent (I don't know if that's a good precedent to
> follow or not).
>

With constraint_exclusion=partition, individual child tables that cannot
produce tuples (based on their check constraints and filter
conditions) are filtered out. Our patch goes one step further by
identifying which joins of child tables can or cannot produce tuples
(based on their check constraints and join conditions).

> How does it compare to using merge-append?
>

Merge-append is essentially a special case of our patch. Greg Stark
compared it with our patch and made some good comments in a previous
email thread ( message id =
407d949e0907061514i5f1a044r691d1d74eaefb067@mail.gmail.com )

Thank you,

~Hero

-- 
Herodotos Herodotou
Graduate Student
Department of Computer Science, Duke University
Email: hero@cs.duke.edu


Re: Join optimization for inheritance tables

From: Simon Riggs

On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:

> If the partitioning implementation does not make progress (and does not 
> make it for 8.5)

Manu, not sure if you are referring to Kedar's patch I reviewed earlier
in July, but that patch didn't implement anything like the right
internals for this to work.

As Tom says, what we need is a single internal data structure that
describes all the partitions. That can then be used to automatically
route inserts, perform the push-down merge joins, and speed up constraint
exclusion. We should build this once and store it in the relcache for
the parent relation, and so we would also need some changes in relcache
invalidation to make sure child DDL changes invalidate that entry.

--
Simon Riggs           www.2ndQuadrant.com



Re: Join optimization for inheritance tables

From: Tom Lane

Herodotos Herodotou <hero@cs.duke.edu> writes:
> On Tue, Sep 22, 2009 at 8:06 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> I think you mean that the planning time is in milliseconds, not seconds.

> The planning time is actually in seconds.

This is exactly why I think this is a dead-end approach.  Trying to do
theorem proving from an unstructured collection of constraints simply
cannot scale to hundreds of partitions, at least not if you want
reasonable planning performance.  There are other aspects of our current
partitioning approach that don't scale either, eg the complexity of the
insert redirection triggers.  We could handle standard cases where
there's a simple partitioning rule with far less overhead than this,
if we had an explicit model of the partitioning rule inside the system.
        regards, tom lane


Re: Join optimization for inheritance tables

From: Emmanuel Cecchet

Hi Simon,

Thanks for the insight. I might take that on as a long-term project. I have 
to discuss it with my colleagues at Aster. It would certainly help to 
put together a wiki page with the key insights on the design of such an 
implementation, to be able to better scope the project and agree on 
what the right approach is.

manu



-- 
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com



Re: Join optimization for inheritance tables

From: Josh Berkus

On 9/26/09 3:00 PM, Emmanuel Cecchet wrote:
> Hi Simon,
> 
> Thanks for the insight. I might take that as a long term project. I have
> to discuss that with my colleagues at Aster. It would certainly help to
> put together a wiki page with the key insights on the design of such
> implementation to be able to better scope the project and agree on what
> is the right approach for this.

In the meantime, can you look at the possibility I suggested?  If you
could implement the most common and least complex case of partition
joining, where the constraints on two partitions match exactly (or are
mathematically provable subsets), with a small patch, it would probably
work for 8.5.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com