Thread: Indexing UNIONs

Indexing UNIONs

From
Josh Berkus
Date:
Folks,

I have two tables which are often browsed together through a UNION view, like:

CREATE VIEW two_tables AS
SELECT t1.id, t1.name, t1.abbreviation, t1.juris_id
FROM t1
UNION ALL
SELECT t2.id, t2.name, NULL, t2.juris_id
FROM t2;

This works fine as a view, since I have made the id's unique between the two
tables (using a sequence).   However, as t1 has 100,000 records, it is
vitally important that queries against this view use an index.

As it is a Union view, though, they ignore any indexes:

jwnet=> explain select * from two_tables where id = 101072;
NOTICE:  QUERY PLAN:

Subquery Scan two_tables  (cost=0.00..3340.82 rows=99182 width=55) ->  Append  (cost=0.00..3340.82 rows=99182 width=55)
     ->  Subquery Scan *SELECT* 1  (cost=0.00..3339.81 rows=99181 width=55)             ->  Seq Scan on t1
(cost=0.00..3339.81rows=99181 width=55)       ->  Subquery Scan *SELECT* 2  (cost=0.00..1.01 rows=1 width=28)
 ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=28) 

EXPLAIN
jwnet=> explain select * from t1 where id = 101072;
NOTICE:  QUERY PLAN:

Index Scan using t1_pkey on cases  (cost=0.00..5.99 rows=1 width=150)


How can I make this happen?  Ideas, suggestions?   And no, putting the data
from both tables into one is not an option for various schema reasons.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Indexing UNIONs

From
Stephan Szabo
Date:
On Mon, 15 Jul 2002, Josh Berkus wrote:

> Folks,
>
> I have two tables which are often browsed together through a UNION view, like:
>
> CREATE VIEW two_tables AS
> SELECT t1.id, t1.name, t1.abbreviation, t1.juris_id
> FROM t1
> UNION ALL
> SELECT t2.id, t2.name, NULL, t2.juris_id
> FROM t2;
>
> This works fine as a view, since I have made the id's unique between the two
> tables (using a sequence).   However, as t1 has 100,000 records, it is
> vitally important that queries against this view use an index.

We had a discussion recently on -general about this.  Right now the
planner won't push the conditions down into the arms of the union because
noone's been sure under what conditions the optimization is safe.



Re: Indexing UNIONs

From
Josh Berkus
Date:
Stephan,

> We had a discussion recently on -general about this.  Right now the
> planner won't push the conditions down into the arms of the union because
> noone's been sure under what conditions the optimization is safe.

So, if performance is horrible with the view, I should use a dummy table to
hold the Unioned data and index that instead?

I can understand the difficultyof optimization.  However, the example I
supplied is the simplest of unions, and the two Seq Scans do seem to be
proceeding against each table seperately.   I think for very simple Unions
(i.e. no grouping, no filtering within subqueries, etc.) that index usage
would be reasonable to implement.

However, I can't program it myself, so I'll have to just stick to whining and
pitiful pleading <blink puppy-dog eyes, sniffle>

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Indexing UNIONs

From
Stephan Szabo
Date:
On Mon, 15 Jul 2002, Josh Berkus wrote:

> Stephan,
>
> > We had a discussion recently on -general about this.  Right now the
> > planner won't push the conditions down into the arms of the union because
> > noone's been sure under what conditions the optimization is safe.
>
> So, if performance is horrible with the view, I should use a dummy table to
> hold the Unioned data and index that instead?

Possibly.

> I can understand the difficultyof optimization.  However, the example I
> supplied is the simplest of unions, and the two Seq Scans do seem to be
> proceeding against each table seperately.   I think for very simple Unions
> (i.e. no grouping, no filtering within subqueries, etc.) that index usage
> would be reasonable to implement.

I don't think it's a difficulty of implementation thing.  I'd guess that
alot of the current stuff for shoving down conditions would apply (I
haven't looked, though).  It's more a case of making sure the optimization
cannot be a false one that changes the results.  What we need is someone
to sit down and analyze the cases in a serious way.

I think that for union all, conditions other than non-stable ones can be
pushed down.  For union, I think there might be issues due to the removal
of duplicates in certain cases where the results will change, but that the
results may not be deterministic in such cases anyway (like a case where
two values are not exactly the same but aren't distinct due to collation
or some such and so the system picks an arbitrary one and that arbitrary
one affects the output of query).  I have no good idea for EXCEPT and
INTERSECT.



Re: Indexing UNIONs

From
Bruno Wolff III
Date:
On Mon, Jul 15, 2002 at 17:31:24 -0700, Josh Berkus <josh@agliodbs.com> wrote:
> Stephan,
> 
> > We had a discussion recently on -general about this.  Right now the
> > planner won't push the conditions down into the arms of the union because
> > noone's been sure under what conditions the optimization is safe.
> 
> So, if performance is horrible with the view, I should use a dummy table to 
> hold the Unioned data and index that instead?

It wouldn't have to be a dummy table. You could have both sets of data
in the same table. Since they seem to be related enough that you went to
the trouble to give them compatible primary keys this may not be
inappropiate (though you must have had some reason for keeping them separate).
You can use a flag to indicate what the data type is. If you need fast
access to the smaller part of the table, a partial index might work.
If the column that didn't apply to the one table is always not null in the
other table, you could use is null on that column as your flag.


Re: Indexing UNIONs

From
"Josh Berkus"
Date:
Bruno,
> It wouldn't have to be a dummy table. You could have both sets of
> data
> in the same table. 

Per my original e-mail, this is not an option.

Basically, the two tables have nothing in commmon *except* that events
can be scheduled against either table.   Otherwise, the two tables have
vastly different data, which comes from completely different sources,
and is related to a totally different set of dependant tables.

So, no go.   

I run into this sort of thing a lot.  Is it just the way I design
databases, or is there a need for a more sophisticated model of
relationality for SQL03?

-Josh


Re: Indexing UNIONs

From
Bruno Wolff III
Date:
On Tue, Jul 16, 2002 at 09:36:31 -0700, Josh Berkus <josh@agliodbs.com> wrote:
> Bruno,
>  
> > It wouldn't have to be a dummy table. You could have both sets of
> > data
> > in the same table. 
> 
> Per my original e-mail, this is not an option.
> 
> Basically, the two tables have nothing in commmon *except* that events
> can be scheduled against either table.   Otherwise, the two tables have
> vastly different data, which comes from completely different sources,
> and is related to a totally different set of dependant tables.
> 
> So, no go.   
> 
> I run into this sort of thing a lot.  Is it just the way I design
> databases, or is there a need for a more sophisticated model of
> relationality for SQL03?

This sounds like a design issue. This makes it seem like the events
should be broken out into their own table and the other two tables
should get joined with the events table when needed.


Re: Indexing UNIONs

From
Josh Berkus
Date:
Bruno,

> This sounds like a design issue. This makes it seem like the events
> should be broken out into their own table and the other two tables
> should get joined with the events table when needed.
>

OK, I guess I'll have to get into detail:

Table "cases" is the database's third largest table, with 100,000 records,
plus three dependant tables and 19 attributes (fields).

Table "trial groups" is a small table listing a few dozen "cases" which are
aggregated for settlement bargaining.  Thus, each "trial group" relates to
one to many "cases".  Beyond this relationship, trial groups has only 5
attributes and 2 dependant tables.

Table "events", the largest table in the database, contains event schedule
listing with 11 attributes and one dependant table as well as recursive
relationships between events.   Each event record can be (and Must be)
related to either one Case or one Trial Group.

Thus, I need to relate (in views and queries) each Event to the Union of Cases
and Trial Groups.   I just can't figure out how to do so without the database
discarding the indexes on Cases in the process and things slowing to a crawl.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Indexing UNIONs

From
Richard Huxton
Date:
On Tuesday 16 Jul 2002 11:42 pm, Josh Berkus wrote:
> OK, I guess I'll have to get into detail:
>
[detail on cases and trial-groups]
>
> Thus, I need to relate (in views and queries) each Event to the Union of
> Cases and Trial Groups.   I just can't figure out how to do so without the
> database discarding the indexes on Cases in the process and things slowing
> to a crawl.

Well, if there was some commonality between cases and trial groups you'd have
noticed it. How about two event tables, one for each type of schedulable
event and unioning those?

Of course, that's just shuffling the complexity around since you'll need a
view with the relevant rewrites and possibly some way of detecting scheduling
conflicts?

- Richard Huxton


Re: Indexing UNIONs

From
Bruno Wolff III
Date:
On Tue, Jul 16, 2002 at 15:42:23 -0700, Josh Berkus <josh@agliodbs.com> wrote:
> 
> Table "events", the largest table in the database, contains event schedule 
> listing with 11 attributes and one dependant table as well as recursive 
> relationships between events.   Each event record can be (and Must be) 
> related to either one Case or one Trial Group.
> 
> Thus, I need to relate (in views and queries) each Event to the Union of Cases 
> and Trial Groups.   I just can't figure out how to do so without the database 
> discarding the indexes on Cases in the process and things slowing to a crawl.

I think you might be able to do this using (one sided) outer joins of the event
table to the Case and Trial Group tables. The join rules will need to work for
exactly one of the two tables. You probably will want to use case statements in
the select list to pick values from the right table.


Re: Indexing UNIONs

From
Josh Berkus
Date:
Bruno,

> I think you might be able to do this using (one sided) outer joins of the
event
> table to the Case and Trial Group tables. The join rules will need to work
for
> exactly one of the two tables. You probably will want to use case statements
in
> the select list to pick values from the right table.

Yeah, I'm doing that in some places.  It just doesn't work for all queries.
COALESCE() is my friend.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Indexing UNIONs

From
Bruno Wolff III
Date:
Just in case there was some misunderstanding of my suggestion here is
what I had in mind.

Your query:
SELECT t1.id, t1.name, t1.abbreviation, t1.juris_id
FROM t1
UNION ALL
SELECT t2.id, t2.name, NULL, t2.juris_id
FROM t2;

My suggestion:
SELECT t3.id, coalesce(t1.name, t2.name), t1.abbreviation, coalesce(t1.juris_id, t2.juris_id) from (t3 left join t1
using(id)) left join t2 using (id);
 

t3 is the event table.
This will result in one row for each row in t3 (since id is unique accross
t1 and t2). It will contain the name, juris_id and abbreviation from
whichever table matched. I expect the query to be able to make use of
indexes in this form, though I haven;t tested it to make sure.


Re: Indexing UNIONs

From
Josh Berkus
Date:
Bruno,

> My suggestion:
> SELECT t3.id, coalesce(t1.name, t2.name), t1.abbreviation,
>   coalesce(t1.juris_id, t2.juris_id) from
>   (t3 left join t1 using (id)) left join t2 using (id);

Cool!   I didn't think of that.   I'll give it a try.

-Josh