Re: Inheritance & Indexes - Mailing list pgsql-general

From Stephan Szabo
Subject Re: Inheritance & Indexes
Date
Msg-id 20030624110450.O8003-100000@megazone23.bigpanda.com
Whole thread Raw
In response to Inheritance & Indexes  (Alan Williams <alan_williams@affymetrix.com>)
Responses Re: Inheritance & Indexes
List pgsql-general
On Tue, 24 Jun 2003, Alan Williams wrote:

>  hs.exon.2=> \d ga_psr_transcript_1
>         Table "public.ga_psr_transcript_1"
>     Column    |          Type          | Modifiers
> --------------+------------------------+-----------
>  id           | integer                | not null
>  parent       | integer                |
>  seqname      | character varying(100) | not null
>  source_type  | smallint               | not null
>  feature_type | smallint               | not null
>  start        | integer                | not null
>  stop         | integer                | not null
>  strand       | character(1)           | not null
>  annot_name   | character varying(100) | not null
>  depth        | integer                | not null
> Indexes: ga_psr_transcript_1_pkey primary key btree (id),
>          ga_psr_transcript_1_start_stop btree ("start", stop),
>          ga_psr_transcript_1_stop btree (stop)
> Check constraints: "aw_psr_transcript_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand =
'.'::bpchar))
> Triggers: RI_ConstraintTrigger_1412526244,
>           RI_ConstraintTrigger_1412526245
>
> hs.exon.2=> \d ga_psr_exon_1
>                 Table "public.ga_psr_exon_1"
>         Column         |          Type          | Modifiers
> -----------------------+------------------------+-----------
>  id                    | integer                | not null
>  parent                | integer                |
>  seqname               | character varying(100) | not null
>  source_type           | smallint               | not null
>  feature_type          | smallint               | not null
>  start                 | integer                | not null
>  stop                  | integer                | not null
>  strand                | character(1)           | not null
>  annot_name            | character varying(100) | not null
>  transcript_cluster_id | integer                | not null
>  depth                 | integer                | not null
> Indexes: ga_psr_exon_1_pkey primary key btree (id),
>          ga_psr_exon_1_parent btree (parent),
>          ga_psr_exon_1_start_stop btree ("start", stop),
>          ga_psr_exon_1_stop btree (stop)
> Check constraints: "aw_psr_exon_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand =
'.'::bpchar))
> Triggers: RI_ConstraintTrigger_1412526088,
>           RI_ConstraintTrigger_1412526089
>
> hs.exon.2=> select count(*) from ga_psr_transcript_1;
>  count
> -------
>  43398
> (1 row)
>
> hs.exon.2=> select count(*) from ga_psr_exon_1;
>  count
> --------
>  176908
> (1 row)
>
> Now if I do a join on the "leaf" tables everything looks good:
>
>  hs.exon.2=> explain select * from ga_psr_transcript_1 t, ga_psr_exon_1e where e.parent = t.id;
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=0.00..9087.71 rows=176908 width=98)
>    Merge Cond: ("outer".id = "inner".parent)
>    ->  Index Scan using ga_psr_transcript_1_pkey on ga_psr_transcript_1 t  (cost=0.00..1066.17 rows=43398 width=47)
>    ->  Index Scan using ga_psr_exon_1_parent on ga_psr_exon_1 e (cost=0.00..5259.52 rows=176908 width=51)
> (4 rows)
>
> If I do a join on the parent table, the optimizer refuses to use the
> indicies:
>
> hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e where e.parent = t.id;

In this case, you can't use a single index scan to get the rows in order
so the part that makes the above a nice plan doesn't really apply.  If
you're getting all the rows and sorting them, index scans are probably a
waste of time unless you have alot of dead space.  If we supported
multi-table indexes, that'd potentially let you get a plan like
the above.

> -----------------------------------------------------------------------------------------------
>  Merge Join  (cost=1239155.37..70188119.40 rows=5514877218 width=334)
>    Merge Cond: ("outer".id = "inner".parent)
>    ->  Sort  (cost=243481.37..244816.14 rows=533908 width=165)
>          Sort Key: t.id
>          ->  Append  (cost=0.00..10980.08 rows=533908 width=165)
[lots of seqscans snipped]
>    ->  Sort  (cost=995674.00..1000838.64 rows=2065853 width=169)
>          Sort Key: e.parent
>          ->  Append  (cost=0.00..43563.52 rows=2065853 width=169)
[more seqscans snipped]

> Same thing even if I'm querying for a specific tuple:
>
> hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e
> where e.parent = t.id and t.id = 123;

ISTM it's willing to use an index scan on at least some of t's subtables.
Does explicitly saying e.parent=123 help?

>                                                         QUERY PLAN
> ------------------------------------------------------------------------
> ---------------------------------------------------
>  Hash Join  (cost=99.06..73488.33 rows=320207 width=334)
>    Hash Cond: ("outer".parent = "inner".id)
>    ->  Append  (cost=0.00..43563.52 rows=2065853 width=169)
[lots of seqscans snipped again]
>    ->  Hash  (cost=98.98..98.98 rows=31 width=165)
>          ->  Append  (cost=0.00..98.98 rows=31 width=165)
>                ->  Seq Scan on ga_psr_transcript t  (cost=0.00..22.50 rows=5 width=165)
>                      Filter: (id = 123)
[snipping a bunch of index scans]
>                ->  Seq Scan on ga_psr_transcript_m t  (cost=0.00..1.12 rows=1 width=47)
>                      Filter: (id = 123)
> (86 rows)



pgsql-general by date:

Previous
From: Ernest E Vogelsinger
Date:
Subject: Re: Database design problem: multilingual strings
Next
From: "Daniel E. Fisher"
Date:
Subject: Failure to install 7.3.3