Inheritance & Indexes - Mailing list pgsql-general

From Alan Williams
Subject Inheritance & Indexes
Date
Msg-id Pine.LNX.4.44.0306241034140.21113-100000@anzio.staff.emeryville.affymetrix.com
Whole thread Raw
Responses Re: Inheritance & Indexes  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-general
Hi,

I'm trying to understand an issue we've encountered when using
inheritance. We've observed this behavior with both 7.2 and 7.3.
The crux of the issue is that indexes are not getting used when
one has a join on the parent table.

Say I have two tables:

  hs.exon.2=> \d ga_psr_transcript
         Table "public.ga_psr_transcript"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 id           | integer                | not null
 parent       | integer                |
 seqname      | character varying(100) | not null
 source_type  | smallint               | not null
 feature_type | smallint               | not null
 start        | integer                | not null
 stop         | integer                | not null
 strand       | character(1)           | not null
 annot_name   | character varying(100) | not null
 depth        | integer                | not null

hs.exon.2=> \d ga_psr_exon
                 Table "public.ga_psr_exon"
        Column         |          Type          | Modifiers
-----------------------+------------------------+-----------
 id                    | integer                | not null
 parent                | integer                |
 seqname               | character varying(100) | not null
 source_type           | smallint               | not null
 feature_type          | smallint               | not null
 start                 | integer                | not null
 stop                  | integer                | not null
 strand                | character(1)           | not null
 annot_name            | character varying(100) | not null
 transcript_cluster_id | integer                | not null
 depth                 | integer                | not null

Both of these tables inherit from a base table ga_annot. Each of these
tables are further subclassed by chromosome (these tables hold genome
annotation information with several 10's thousands to several 100's
thousands of rows in each of the chromosome level tables). Tuples are
only loaded into the chromosome level tables. So we have:

                                     /--- ga_psr_transcript_1
                                    /---- ga_psr_transcript_2
           /---- ga_psr_transcript <----- ga_psr_transcript_3
          /                         \---- ...
ga_annot <
          \                          /--- ga_psr_transcript_1
           \                        /---- ga_psr_transcript_2
            \--- ga_psr_exon ----- <----- ga_psr_transcript_3
                                    \---- ...

                                          ^
                                          All Tuples Loaded into here

There is also a foreign key relationship where the parent of
ga_psr_exon_N references id in ga_psr_transcript_N. An example of
"leaf" tables:

 hs.exon.2=> \d ga_psr_transcript_1
        Table "public.ga_psr_transcript_1"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 id           | integer                | not null
 parent       | integer                |
 seqname      | character varying(100) | not null
 source_type  | smallint               | not null
 feature_type | smallint               | not null
 start        | integer                | not null
 stop         | integer                | not null
 strand       | character(1)           | not null
 annot_name   | character varying(100) | not null
 depth        | integer                | not null
Indexes: ga_psr_transcript_1_pkey primary key btree (id),
         ga_psr_transcript_1_start_stop btree ("start", stop),
         ga_psr_transcript_1_stop btree (stop)
Check constraints: "aw_psr_transcript_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand =
'.'::bpchar))
Triggers: RI_ConstraintTrigger_1412526244,
          RI_ConstraintTrigger_1412526245

hs.exon.2=> \d ga_psr_exon_1
                Table "public.ga_psr_exon_1"
        Column         |          Type          | Modifiers
-----------------------+------------------------+-----------
 id                    | integer                | not null
 parent                | integer                |
 seqname               | character varying(100) | not null
 source_type           | smallint               | not null
 feature_type          | smallint               | not null
 start                 | integer                | not null
 stop                  | integer                | not null
 strand                | character(1)           | not null
 annot_name            | character varying(100) | not null
 transcript_cluster_id | integer                | not null
 depth                 | integer                | not null
Indexes: ga_psr_exon_1_pkey primary key btree (id),
         ga_psr_exon_1_parent btree (parent),
         ga_psr_exon_1_start_stop btree ("start", stop),
         ga_psr_exon_1_stop btree (stop)
Check constraints: "aw_psr_exon_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand =
'.'::bpchar))
Triggers: RI_ConstraintTrigger_1412526088,
          RI_ConstraintTrigger_1412526089

hs.exon.2=> select count(*) from ga_psr_transcript_1;
 count
-------
 43398
(1 row)

hs.exon.2=> select count(*) from ga_psr_exon_1;
 count
--------
 176908
(1 row)

Now if I do a join on the "leaf" tables everything looks good:

 hs.exon.2=> explain select * from ga_psr_transcript_1 t, ga_psr_exon_1e where e.parent = t.id;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..9087.71 rows=176908 width=98)
   Merge Cond: ("outer".id = "inner".parent)
   ->  Index Scan using ga_psr_transcript_1_pkey on ga_psr_transcript_1 t  (cost=0.00..1066.17 rows=43398 width=47)
   ->  Index Scan using ga_psr_exon_1_parent on ga_psr_exon_1 e (cost=0.00..5259.52 rows=176908 width=51)
(4 rows)

If I do a join on the parent table, the optimizer refuses to use the
indicies:

hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e where e.parent = t.id;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=1239155.37..70188119.40 rows=5514877218 width=334)
   Merge Cond: ("outer".id = "inner".parent)
   ->  Sort  (cost=243481.37..244816.14 rows=533908 width=165)
         Sort Key: t.id
         ->  Append  (cost=0.00..10980.08 rows=533908 width=165)
               ->  Seq Scan on ga_psr_transcript t  (cost=0.00..20.00rows=1000 width=165)
               ->  Seq Scan on ga_psr_transcript_1 t  (cost=0.00..881.98rows=43398 width=47)
               ->  Seq Scan on ga_psr_transcript_2 t  (cost=0.00..905.44rows=44544 width=47)
               ->  Seq Scan on ga_psr_transcript_3 t  (cost=0.00..694.82rows=34182 width=47)
               ->  Seq Scan on ga_psr_transcript_4 t  (cost=0.00..617.48rows=30348 width=47)
               ->  Seq Scan on ga_psr_transcript_5 t  (cost=0.00..653.92rows=32192 width=47)
               ->  Seq Scan on ga_psr_transcript_6 t  (cost=0.00..653.88rows=32088 width=47)
               ->  Seq Scan on ga_psr_transcript_7 t  (cost=0.00..570.88rows=28088 width=47)
               ->  Seq Scan on ga_psr_transcript_8 t  (cost=0.00..537.06rows=26406 width=47)
               ->  Seq Scan on ga_psr_transcript_9 t  (cost=0.00..460.06rows=22606 width=47)
               ->  Seq Scan on ga_psr_transcript_10 t(cost=0.00..543.20 rows=26020 width=48)
               ->  Seq Scan on ga_psr_transcript_11 t(cost=0.00..542.82 rows=25982 width=48)
               ->  Seq Scan on ga_psr_transcript_12 t(cost=0.00..490.66 rows=23466 width=48)
               ->  Seq Scan on ga_psr_transcript_13 t(cost=0.00..346.64 rows=16564 width=48)
               ->  Seq Scan on ga_psr_transcript_14 t(cost=0.00..345.68 rows=16568 width=47)
               ->  Seq Scan on ga_psr_transcript_15 t(cost=0.00..358.46 rows=17146 width=48)
               ->  Seq Scan on ga_psr_transcript_16 t(cost=0.00..352.76 rows=16876 width=48)
               ->  Seq Scan on ga_psr_transcript_17 t(cost=0.00..375.72 rows=17972 width=48)
               ->  Seq Scan on ga_psr_transcript_18 t(cost=0.00..295.60 rows=14160 width=48)
               ->  Seq Scan on ga_psr_transcript_19 t(cost=0.00..262.24 rows=12524 width=48)
               ->  Seq Scan on ga_psr_transcript_20 t(cost=0.00..268.34 rows=12834 width=47)
               ->  Seq Scan on ga_psr_transcript_21 t(cost=0.00..145.18 rows=6918 width=47)
               ->  Seq Scan on ga_psr_transcript_22 t(cost=0.00..157.54 rows=7554 width=46)
               ->  Seq Scan on ga_psr_transcript_x t  (cost=0.00..431.08rows=21208 width=47)
               ->  Seq Scan on ga_psr_transcript_y t  (cost=0.00..60.40rows=2940 width=47)
               ->  Seq Scan on ga_psr_transcript_un t  (cost=0.00..7.14rows=314 width=55)
               ->  Seq Scan on ga_psr_transcript_m t  (cost=0.00..1.10rows=10 width=47)
   ->  Sort  (cost=995674.00..1000838.64 rows=2065853 width=169)
         Sort Key: e.parent
         ->  Append  (cost=0.00..43563.52 rows=2065853 width=169)
               ->  Seq Scan on ga_psr_exon e  (cost=0.00..0.00 rows=1width=169)
               ->  Seq Scan on ga_psr_exon_1 e  (cost=0.00..3691.08rows=176908 width=51)
               ->  Seq Scan on ga_psr_exon_2 e  (cost=0.00..3565.00rows=170800 width=51)
               ->  Seq Scan on ga_psr_exon_3 e  (cost=0.00..2709.88rows=129788 width=51)
               ->  Seq Scan on ga_psr_exon_4 e  (cost=0.00..2206.62rows=105662 width=51)
               ->  Seq Scan on ga_psr_exon_5 e  (cost=0.00..2376.48rows=113848 width=51)
               ->  Seq Scan on ga_psr_exon_6 e  (cost=0.00..2559.56rows=122156 width=51)
               ->  Seq Scan on ga_psr_exon_7 e  (cost=0.00..2352.78rows=112778 width=51)
               ->  Seq Scan on ga_psr_exon_8 e  (cost=0.00..1981.14rows=94914 width=51)
               ->  Seq Scan on ga_psr_exon_9 e  (cost=0.00..1807.76rows=86576 width=51)
               ->  Seq Scan on ga_psr_exon_10 e  (cost=0.00..2144.76rows=100376 width=52)
               ->  Seq Scan on ga_psr_exon_11 e  (cost=0.00..2158.90rows=100990 width=52)
               ->  Seq Scan on ga_psr_exon_12 e  (cost=0.00..2035.66rows=95266 width=52)
               ->  Seq Scan on ga_psr_exon_13 e  (cost=0.00..1212.48rows=56748 width=52)
               ->  Seq Scan on ga_psr_exon_14 e  (cost=0.00..1380.52rows=64652 width=51)
               ->  Seq Scan on ga_psr_exon_15 e  (cost=0.00..1529.88rows=71588 width=52)
               ->  Seq Scan on ga_psr_exon_16 e  (cost=0.00..1575.04rows=73704 width=52)
               ->  Seq Scan on ga_psr_exon_17 e  (cost=0.00..1811.04rows=84704 width=52)
               ->  Seq Scan on ga_psr_exon_18 e  (cost=0.00..1033.48rows=48348 width=52)
               ->  Seq Scan on ga_psr_exon_19 e  (cost=0.00..1375.44rows=64344 width=52)
               ->  Seq Scan on ga_psr_exon_20 e  (cost=0.00..1037.50rows=48550 width=51)
               ->  Seq Scan on ga_psr_exon_21 e  (cost=0.00..537.40rows=25140 width=51)
               ->  Seq Scan on ga_psr_exon_22 e  (cost=0.00..736.96rows=34596 width=50)
               ->  Seq Scan on ga_psr_exon_x e  (cost=0.00..1512.30rows=72430 width=51)
               ->  Seq Scan on ga_psr_exon_y e  (cost=0.00..205.00rows=9800 width=51)
               ->  Seq Scan on ga_psr_exon_un e  (cost=0.00..25.58rows=1158 width=59)
               ->  Seq Scan on ga_psr_exon_m e  (cost=0.00..1.28 rows=28 width=51)
(62 rows)


Same thing even if I'm querying for a specific tuple:

hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e
where e.parent = t.id and t.id = 123;
                                                        QUERY PLAN
------------------------------------------------------------------------
---------------------------------------------------
 Hash Join  (cost=99.06..73488.33 rows=320207 width=334)
   Hash Cond: ("outer".parent = "inner".id)
   ->  Append  (cost=0.00..43563.52 rows=2065853 width=169)
         ->  Seq Scan on ga_psr_exon e  (cost=0.00..0.00 rows=1width=169)
         ->  Seq Scan on ga_psr_exon_1 e  (cost=0.00..3691.08rows=176908 width=51)
         ->  Seq Scan on ga_psr_exon_2 e  (cost=0.00..3565.00rows=170800 width=51)
         ->  Seq Scan on ga_psr_exon_3 e  (cost=0.00..2709.88rows=129788 width=51)
         ->  Seq Scan on ga_psr_exon_4 e  (cost=0.00..2206.62rows=105662 width=51)
         ->  Seq Scan on ga_psr_exon_5 e  (cost=0.00..2376.48rows=113848 width=51)
         ->  Seq Scan on ga_psr_exon_6 e  (cost=0.00..2559.56rows=122156 width=51)
         ->  Seq Scan on ga_psr_exon_7 e  (cost=0.00..2352.78rows=112778 width=51)
         ->  Seq Scan on ga_psr_exon_8 e  (cost=0.00..1981.14 rows=94914 width=51)
         ->  Seq Scan on ga_psr_exon_9 e  (cost=0.00..1807.76 rows=86576 width=51)
         ->  Seq Scan on ga_psr_exon_10 e  (cost=0.00..2144.76rows=100376 width=52)
         ->  Seq Scan on ga_psr_exon_11 e  (cost=0.00..2158.90rows=100990 width=52)
         ->  Seq Scan on ga_psr_exon_12 e  (cost=0.00..2035.66rows=95266 width=52)
         ->  Seq Scan on ga_psr_exon_13 e  (cost=0.00..1212.48rows=56748 width=52)
         ->  Seq Scan on ga_psr_exon_14 e  (cost=0.00..1380.52rows=64652 width=51)
         ->  Seq Scan on ga_psr_exon_15 e  (cost=0.00..1529.88rows=71588 width=52)
         ->  Seq Scan on ga_psr_exon_16 e  (cost=0.00..1575.04rows=73704 width=52)
         ->  Seq Scan on ga_psr_exon_17 e  (cost=0.00..1811.04rows=84704 width=52)
         ->  Seq Scan on ga_psr_exon_18 e  (cost=0.00..1033.48rows=48348 width=52)
         ->  Seq Scan on ga_psr_exon_19 e  (cost=0.00..1375.44rows=64344 width=52)
         ->  Seq Scan on ga_psr_exon_20 e  (cost=0.00..1037.50rows=48550 width=51)
         ->  Seq Scan on ga_psr_exon_21 e  (cost=0.00..537.40 rows=25140 width=51)
         ->  Seq Scan on ga_psr_exon_22 e  (cost=0.00..736.96 rows=34596 width=50)
         ->  Seq Scan on ga_psr_exon_x e  (cost=0.00..1512.30 rows=72430 width=51)
         ->  Seq Scan on ga_psr_exon_y e  (cost=0.00..205.00 rows=9800 width=51)
         ->  Seq Scan on ga_psr_exon_un e  (cost=0.00..25.58 rows=1158 width=59)
         ->  Seq Scan on ga_psr_exon_m e  (cost=0.00..1.28 rows=28 width=51)
   ->  Hash  (cost=98.98..98.98 rows=31 width=165)
         ->  Append  (cost=0.00..98.98 rows=31 width=165)
               ->  Seq Scan on ga_psr_transcript t  (cost=0.00..22.50 rows=5 width=165)
                     Filter: (id = 123)
               ->  Index Scan using ga_psr_transcript_1_pkey on ga_psr_transcript_1 t  (cost=0.00..3.03 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_2_pkey on ga_psr_transcript_2 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_3_pkey on ga_psr_transcript_3 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_4_pkey on ga_psr_transcript_4 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_5_pkey on ga_psr_transcript_5 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_6_pkey on ga_psr_transcript_6 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_7_pkey on ga_psr_transcript_7 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_8_pkey on ga_psr_transcript_8 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_9_pkey on ga_psr_transcript_9 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_10_pkey on ga_psr_transcript_10 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_11_pkey on ga_psr_transcript_11 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_12_pkey on ga_psr_transcript_12 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_13_pkey on ga_psr_transcript_13 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_14_pkey on ga_psr_transcript_14 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_15_pkey on ga_psr_transcript_15 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_16_pkey on ga_psr_transcript_16 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_17_pkey on ga_psr_transcript_17 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_18_pkey on ga_psr_transcript_18 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_19_pkey on ga_psr_transcript_19 t  (cost=0.00..3.01 rows=1
width=48)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_20_pkey on ga_psr_transcript_20 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_21_pkey on ga_psr_transcript_21 t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_22_pkey on ga_psr_transcript_22 t  (cost=0.00..3.01 rows=1
width=46)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_x_pkey on ga_psr_transcript_x t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_y_pkey on ga_psr_transcript_y t  (cost=0.00..3.01 rows=1
width=47)
                     Index Cond: (id = 123)
               ->  Index Scan using ga_psr_transcript_un_pkey on ga_psr_transcript_un t  (cost=0.00..3.01 rows=1
width=55)
                     Index Cond: (id = 123)
               ->  Seq Scan on ga_psr_transcript_m t  (cost=0.00..1.12 rows=1 width=47)
                     Filter: (id = 123)
(86 rows)

Any comments or suggestions would be greatly appreciated. Thanks.


---------------------------------------------------------
Alan Williams, PhD           Alan_Williams@Affymetrix.com
Bioinformatics Manager
---------------------------------------------------------



pgsql-general by date:

Previous
From: Antonios Christofides
Date:
Subject: Database design problem: multilingual strings
Next
From: Karsten Hilbert
Date:
Subject: Re: postgres 7.3.3 problem - not talking across port