Re: Inheritance & Indexes - Mailing list pgsql-general

From Stephan Szabo
Subject Re: Inheritance & Indexes
Date
Msg-id 20030624120902.M9143-100000@megazone23.bigpanda.com
Whole thread Raw
In response to Re: Inheritance & Indexes  (Alan Williams <alan_williams@affymetrix.com>)
Responses Re: Inheritance & Indexes
List pgsql-general
On Tue, 24 Jun 2003, Alan Williams wrote:

>
> On Tue, 24 Jun 2003, Stephan Szabo wrote:
> > >  hs.exon.2=> explain select * from ga_psr_transcript_1 t,
> > ga_psr_exon_1e where e.parent = t.id;
> > >                                                      QUERY PLAN
> > >
> > ------------------------------------------------------------------------
> > --------------------------------------------
> > >  Merge Join  (cost=0.00..9087.71 rows=176908 width=98)
> > >    Merge Cond: ("outer".id = "inner".parent)
> > >    ->  Index Scan using ga_psr_transcript_1_pkey on
> > ga_psr_transcript_1 t  (cost=0.00..1066.17 rows=43398 width=47)
> > >    ->  Index Scan using ga_psr_exon_1_parent on ga_psr_exon_1 e
> > (cost=0.00..5259.52 rows=176908 width=51)
> > > (4 rows)
> > >
> > > If I do a join on the parent table, the optimizer refuses to use the
> > > indicies:
> > >
> > > hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e
> > where e.parent = t.id;
> >
> > In this case, you can't use a single index scan to get the rows in order
> > so the part that makes the above a nice plan doesn't really apply.  If
> > you're getting all the rows and sorting them, index scans are probably a
> > waste of time unless you have alot of dead space.  If we supported
> > multi-table indexes, that'd potentially let you get a plan like
> > the above.
>
> Because of the foreign key constraint, the database engine could do
> the above query on each of the child tables and concatenate the
> results. This is because there is a notion in our schema of "paired"

I don't think it can do exon_1 -> transcript_1 union exon_2 ->
transcript_2 etc from the above unless there's also a guarantee of
uniqueness since if the same id showed up in transcript_1 and transcript_2
you'd have to join them both to a parent in exon_1.  The individual id
primary keys are not sufficient to show that though so you'd have to join
exon_1 -> transcript_1 union exon_1 -> transcript_2 union exon_2 ->
transcript_1... to guarantee the same results I think.

I don't think that we're ever likely to figure out the optimization for
those cases in any case. Multi-table indexes will probably be coming
eventually which will allow a scan over that rather than the append step.

> > ------------------------------------------------------------------------
> > -----------------------
> > >  Merge Join  (cost=1239155.37..70188119.40 rows=5514877218 width=334)
> > >    Merge Cond: ("outer".id = "inner".parent)
> > >    ->  Sort  (cost=243481.37..244816.14 rows=533908 width=165)
> > >          Sort Key: t.id
> > >          ->  Append  (cost=0.00..10980.08 rows=533908 width=165)
> > [lots of seqscans snipped]
> > >    ->  Sort  (cost=995674.00..1000838.64 rows=2065853 width=169)
> > >          Sort Key: e.parent
> > >          ->  Append  (cost=0.00..43563.52 rows=2065853 width=169)
> > [more seqscans snipped]
> >
> > > Same thing even if I'm querying for a specific tuple:
> > >
> > > hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e
> > > where e.parent = t.id and t.id = 123;
> >
> > ISTM it's willing to use an index scan on at least some of t's
> > subtables.
> > Does explicitly saying e.parent=123 help?
>
> Yes, adding e.parent=123 results in the desired result of index scans
> into both tables. However, without including this the optimizer still
> predicts 31 results from the index scans on ga_psr_transcript* and yet
> insists on using a seq scan into each ga_psr_exon* table. It expects
> to get 2065853 rows back from the ga_psr_exon* tables when in reality
> it is more like 310 rows.

Yeah, it's guessing the number of rows rather poorly.  Without the
implied search condition, the index scan wouldn't help barring a small
estimated number of rows in t making nested loop look good (I assume the
t estimate is way off too, does analyzing the various tables or possibly
raising the analyze buckets for the id column and analyzing get that
estimate to something reasonable?).  I *think* 7.4 may be smarter about
implying these conditions as well.



pgsql-general by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: A creepy story about dates. How to prevent it?
Next
From: Mike Mascari
Date:
Subject: Re: Failure to install 7.3.3