Re: Inheritance & Indexes - Mailing list pgsql-general
From | Stephan Szabo |
---|---|
Subject | Re: Inheritance & Indexes |
Date | |
Msg-id | 20030624120902.M9143-100000@megazone23.bigpanda.com Whole thread Raw |
In response to | Re: Inheritance & Indexes (Alan Williams <alan_williams@affymetrix.com>) |
Responses |
Re: Inheritance & Indexes
|
List | pgsql-general |
On Tue, 24 Jun 2003, Alan Williams wrote: > > On Tue, 24 Jun 2003, Stephan Szabo wrote: > > > hs.exon.2=> explain select * from ga_psr_transcript_1 t, > > ga_psr_exon_1e where e.parent = t.id; > > > QUERY PLAN > > > > > ------------------------------------------------------------------------ > > -------------------------------------------- > > > Merge Join (cost=0.00..9087.71 rows=176908 width=98) > > > Merge Cond: ("outer".id = "inner".parent) > > > -> Index Scan using ga_psr_transcript_1_pkey on > > ga_psr_transcript_1 t (cost=0.00..1066.17 rows=43398 width=47) > > > -> Index Scan using ga_psr_exon_1_parent on ga_psr_exon_1 e > > (cost=0.00..5259.52 rows=176908 width=51) > > > (4 rows) > > > > > > If I do a join on the parent table, the optimizer refuses to use the > > > indicies: > > > > > > hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e > > where e.parent = t.id; > > > > In this case, you can't use a single index scan to get the rows in order > > so the part that makes the above a nice plan doesn't really apply. If > > you're getting all the rows and sorting them, index scans are probably a > > waste of time unless you have alot of dead space. If we supported > > multi-table indexes, that'd potentially let you get a plan like > > the above. > > Because of the foreign key constraint, the database engine could do > the above query on each of the child tables and concatenate the > results. This is because there is a notion in our schema of "paired" I don't think it can do exon_1 -> transcript_1 union exon_2 -> transcript_2 etc from the above unless there's also a guarantee of uniqueness since if the same id showed up in transcript_1 and transcript_2 you'd have to join them both to a parent in exon_1. The individual id primary keys are not sufficient to show that though so you'd have to join exon_1 -> transcript_1 union exon_1 -> transcript_2 union exon_2 -> transcript_1... to guarantee the same results I think. I don't think that we're ever likely to figure out the optimization for those cases in any case. Multi-table indexes will probably be coming eventually which will allow a scan over that rather than the append step. > > ------------------------------------------------------------------------ > > ----------------------- > > > Merge Join (cost=1239155.37..70188119.40 rows=5514877218 width=334) > > > Merge Cond: ("outer".id = "inner".parent) > > > -> Sort (cost=243481.37..244816.14 rows=533908 width=165) > > > Sort Key: t.id > > > -> Append (cost=0.00..10980.08 rows=533908 width=165) > > [lots of seqscans snipped] > > > -> Sort (cost=995674.00..1000838.64 rows=2065853 width=169) > > > Sort Key: e.parent > > > -> Append (cost=0.00..43563.52 rows=2065853 width=169) > > [more seqscans snipped] > > > > > Same thing even if I'm querying for a specific tuple: > > > > > > hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e > > > where e.parent = t.id and t.id = 123; > > > > ISTM it's willing to use an index scan on at least some of t's > > subtables. > > Does explicitly saying e.parent=123 help? > > Yes, adding e.parent=123 results in the desired result of index scans > into both tables. However, without including this the optimizer still > predicts 31 results from the index scans on ga_psr_transcript* and yet > insists on using a seq scan into each ga_psr_exon* table. It expects > to get 2065853 rows back from the ga_psr_exon* tables when in reality > it is more like 310 rows. Yeah, it's guessing the number of rows rather poorly. Without the implied search condition, the index scan wouldn't help barring a small estimated number of rows in t making nested loop look good (I assume the t estimate is way off too, does analyzing the various tables or possibly raising the analyze buckets for the id column and analyzing get that estimate to something reasonable?). I *think* 7.4 may be smarter about implying these conditions as well.
pgsql-general by date: