Inheritance & Indexes - Mailing list pgsql-general
From | Alan Williams |
---|---|
Subject | Inheritance & Indexes |
Date | |
Msg-id | Pine.LNX.4.44.0306241034140.21113-100000@anzio.staff.emeryville.affymetrix.com Whole thread Raw |
Responses |
Re: Inheritance & Indexes
|
List | pgsql-general |
Hi, I'm trying to understand an issue we've encountered when using inheritance. We've observed this behavior with both 7.2 and 7.3. The crux of the issue is that indexes are not getting used when one has a join on the parent table. Say I have two tables: hs.exon.2=> \d ga_psr_transcript Table "public.ga_psr_transcript" Column | Type | Modifiers --------------+------------------------+----------- id | integer | not null parent | integer | seqname | character varying(100) | not null source_type | smallint | not null feature_type | smallint | not null start | integer | not null stop | integer | not null strand | character(1) | not null annot_name | character varying(100) | not null depth | integer | not null hs.exon.2=> \d ga_psr_exon Table "public.ga_psr_exon" Column | Type | Modifiers -----------------------+------------------------+----------- id | integer | not null parent | integer | seqname | character varying(100) | not null source_type | smallint | not null feature_type | smallint | not null start | integer | not null stop | integer | not null strand | character(1) | not null annot_name | character varying(100) | not null transcript_cluster_id | integer | not null depth | integer | not null Both of these tables inherit from a base table ga_annot. Each of these tables are further subclassed by chromosome (these tables hold genome annotation information with several 10's thousands to several 100's thousands of rows in each of the chromosome level tables). Tuples are only loaded into the chromosome level tables. So we have: /--- ga_psr_transcript_1 /---- ga_psr_transcript_2 /---- ga_psr_transcript <----- ga_psr_transcript_3 / \---- ... ga_annot < \ /--- ga_psr_transcript_1 \ /---- ga_psr_transcript_2 \--- ga_psr_exon ----- <----- ga_psr_transcript_3 \---- ... ^ All Tuples Loaded into here There is also a foreign key relationship where the parent of ga_psr_exon_N references id in ga_psr_transcript_N. An example of "leaf" tables: hs.exon.2=> \d ga_psr_transcript_1 Table "public.ga_psr_transcript_1" Column | Type | Modifiers --------------+------------------------+----------- id | integer | not null parent | integer | seqname | character varying(100) | not null source_type | smallint | not null feature_type | smallint | not null start | integer | not null stop | integer | not null strand | character(1) | not null annot_name | character varying(100) | not null depth | integer | not null Indexes: ga_psr_transcript_1_pkey primary key btree (id), ga_psr_transcript_1_start_stop btree ("start", stop), ga_psr_transcript_1_stop btree (stop) Check constraints: "aw_psr_transcript_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand = '.'::bpchar)) Triggers: RI_ConstraintTrigger_1412526244, RI_ConstraintTrigger_1412526245 hs.exon.2=> \d ga_psr_exon_1 Table "public.ga_psr_exon_1" Column | Type | Modifiers -----------------------+------------------------+----------- id | integer | not null parent | integer | seqname | character varying(100) | not null source_type | smallint | not null feature_type | smallint | not null start | integer | not null stop | integer | not null strand | character(1) | not null annot_name | character varying(100) | not null transcript_cluster_id | integer | not null depth | integer | not null Indexes: ga_psr_exon_1_pkey primary key btree (id), ga_psr_exon_1_parent btree (parent), ga_psr_exon_1_start_stop btree ("start", stop), ga_psr_exon_1_stop btree (stop) Check constraints: "aw_psr_exon_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand = '.'::bpchar)) Triggers: RI_ConstraintTrigger_1412526088, RI_ConstraintTrigger_1412526089 hs.exon.2=> select count(*) from ga_psr_transcript_1; count ------- 43398 (1 row) hs.exon.2=> select count(*) from ga_psr_exon_1; count -------- 176908 (1 row) Now if I do a join on the "leaf" tables everything looks good: hs.exon.2=> explain select * from ga_psr_transcript_1 t, ga_psr_exon_1e where e.parent = t.id; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Merge Join (cost=0.00..9087.71 rows=176908 width=98) Merge Cond: ("outer".id = "inner".parent) -> Index Scan using ga_psr_transcript_1_pkey on ga_psr_transcript_1 t (cost=0.00..1066.17 rows=43398 width=47) -> Index Scan using ga_psr_exon_1_parent on ga_psr_exon_1 e (cost=0.00..5259.52 rows=176908 width=51) (4 rows) If I do a join on the parent table, the optimizer refuses to use the indicies: hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e where e.parent = t.id; QUERY PLAN ----------------------------------------------------------------------------------------------- Merge Join (cost=1239155.37..70188119.40 rows=5514877218 width=334) Merge Cond: ("outer".id = "inner".parent) -> Sort (cost=243481.37..244816.14 rows=533908 width=165) Sort Key: t.id -> Append (cost=0.00..10980.08 rows=533908 width=165) -> Seq Scan on ga_psr_transcript t (cost=0.00..20.00rows=1000 width=165) -> Seq Scan on ga_psr_transcript_1 t (cost=0.00..881.98rows=43398 width=47) -> Seq Scan on ga_psr_transcript_2 t (cost=0.00..905.44rows=44544 width=47) -> Seq Scan on ga_psr_transcript_3 t (cost=0.00..694.82rows=34182 width=47) -> Seq Scan on ga_psr_transcript_4 t (cost=0.00..617.48rows=30348 width=47) -> Seq Scan on ga_psr_transcript_5 t (cost=0.00..653.92rows=32192 width=47) -> Seq Scan on ga_psr_transcript_6 t (cost=0.00..653.88rows=32088 width=47) -> Seq Scan on ga_psr_transcript_7 t (cost=0.00..570.88rows=28088 width=47) -> Seq Scan on ga_psr_transcript_8 t (cost=0.00..537.06rows=26406 width=47) -> Seq Scan on ga_psr_transcript_9 t (cost=0.00..460.06rows=22606 width=47) -> Seq Scan on ga_psr_transcript_10 t(cost=0.00..543.20 rows=26020 width=48) -> Seq Scan on ga_psr_transcript_11 t(cost=0.00..542.82 rows=25982 width=48) -> Seq Scan on ga_psr_transcript_12 t(cost=0.00..490.66 rows=23466 width=48) -> Seq Scan on ga_psr_transcript_13 t(cost=0.00..346.64 rows=16564 width=48) -> Seq Scan on ga_psr_transcript_14 t(cost=0.00..345.68 rows=16568 width=47) -> Seq Scan on ga_psr_transcript_15 t(cost=0.00..358.46 rows=17146 width=48) -> Seq Scan on ga_psr_transcript_16 t(cost=0.00..352.76 rows=16876 width=48) -> Seq Scan on ga_psr_transcript_17 t(cost=0.00..375.72 rows=17972 width=48) -> Seq Scan on ga_psr_transcript_18 t(cost=0.00..295.60 rows=14160 width=48) -> Seq Scan on ga_psr_transcript_19 t(cost=0.00..262.24 rows=12524 width=48) -> Seq Scan on ga_psr_transcript_20 t(cost=0.00..268.34 rows=12834 width=47) -> Seq Scan on ga_psr_transcript_21 t(cost=0.00..145.18 rows=6918 width=47) -> Seq Scan on ga_psr_transcript_22 t(cost=0.00..157.54 rows=7554 width=46) -> Seq Scan on ga_psr_transcript_x t (cost=0.00..431.08rows=21208 width=47) -> Seq Scan on ga_psr_transcript_y t (cost=0.00..60.40rows=2940 width=47) -> Seq Scan on ga_psr_transcript_un t (cost=0.00..7.14rows=314 width=55) -> Seq Scan on ga_psr_transcript_m t (cost=0.00..1.10rows=10 width=47) -> Sort (cost=995674.00..1000838.64 rows=2065853 width=169) Sort Key: e.parent -> Append (cost=0.00..43563.52 rows=2065853 width=169) -> Seq Scan on ga_psr_exon e (cost=0.00..0.00 rows=1width=169) -> Seq Scan on ga_psr_exon_1 e (cost=0.00..3691.08rows=176908 width=51) -> Seq Scan on ga_psr_exon_2 e (cost=0.00..3565.00rows=170800 width=51) -> Seq Scan on ga_psr_exon_3 e (cost=0.00..2709.88rows=129788 width=51) -> Seq Scan on ga_psr_exon_4 e (cost=0.00..2206.62rows=105662 width=51) -> Seq Scan on ga_psr_exon_5 e (cost=0.00..2376.48rows=113848 width=51) -> Seq Scan on ga_psr_exon_6 e (cost=0.00..2559.56rows=122156 width=51) -> Seq Scan on ga_psr_exon_7 e (cost=0.00..2352.78rows=112778 width=51) -> Seq Scan on ga_psr_exon_8 e (cost=0.00..1981.14rows=94914 width=51) -> Seq Scan on ga_psr_exon_9 e (cost=0.00..1807.76rows=86576 width=51) -> Seq Scan on ga_psr_exon_10 e (cost=0.00..2144.76rows=100376 width=52) -> Seq Scan on ga_psr_exon_11 e (cost=0.00..2158.90rows=100990 width=52) -> Seq Scan on ga_psr_exon_12 e (cost=0.00..2035.66rows=95266 width=52) -> Seq Scan on ga_psr_exon_13 e (cost=0.00..1212.48rows=56748 width=52) -> Seq Scan on ga_psr_exon_14 e (cost=0.00..1380.52rows=64652 width=51) -> Seq Scan on ga_psr_exon_15 e (cost=0.00..1529.88rows=71588 width=52) -> Seq Scan on ga_psr_exon_16 e (cost=0.00..1575.04rows=73704 width=52) -> Seq Scan on ga_psr_exon_17 e (cost=0.00..1811.04rows=84704 width=52) -> Seq Scan on ga_psr_exon_18 e (cost=0.00..1033.48rows=48348 width=52) -> Seq Scan on ga_psr_exon_19 e (cost=0.00..1375.44rows=64344 width=52) -> Seq Scan on ga_psr_exon_20 e (cost=0.00..1037.50rows=48550 width=51) -> Seq Scan on ga_psr_exon_21 e (cost=0.00..537.40rows=25140 width=51) -> Seq Scan on ga_psr_exon_22 e (cost=0.00..736.96rows=34596 width=50) -> Seq Scan on ga_psr_exon_x e (cost=0.00..1512.30rows=72430 width=51) -> Seq Scan on ga_psr_exon_y e (cost=0.00..205.00rows=9800 width=51) -> Seq Scan on ga_psr_exon_un e (cost=0.00..25.58rows=1158 width=59) -> Seq Scan on ga_psr_exon_m e (cost=0.00..1.28 rows=28 width=51) (62 rows) Same thing even if I'm querying for a specific tuple: hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e where e.parent = t.id and t.id = 123; QUERY PLAN ------------------------------------------------------------------------ --------------------------------------------------- Hash Join (cost=99.06..73488.33 rows=320207 width=334) Hash Cond: ("outer".parent = "inner".id) -> Append (cost=0.00..43563.52 rows=2065853 width=169) -> Seq Scan on ga_psr_exon e (cost=0.00..0.00 rows=1width=169) -> Seq Scan on ga_psr_exon_1 e (cost=0.00..3691.08rows=176908 width=51) -> Seq Scan on ga_psr_exon_2 e (cost=0.00..3565.00rows=170800 width=51) -> Seq Scan on ga_psr_exon_3 e (cost=0.00..2709.88rows=129788 width=51) -> Seq Scan on ga_psr_exon_4 e (cost=0.00..2206.62rows=105662 width=51) -> Seq Scan on ga_psr_exon_5 e (cost=0.00..2376.48rows=113848 width=51) -> Seq Scan on ga_psr_exon_6 e (cost=0.00..2559.56rows=122156 width=51) -> Seq Scan on ga_psr_exon_7 e (cost=0.00..2352.78rows=112778 width=51) -> Seq Scan on ga_psr_exon_8 e (cost=0.00..1981.14 rows=94914 width=51) -> Seq Scan on ga_psr_exon_9 e (cost=0.00..1807.76 rows=86576 width=51) -> Seq Scan on ga_psr_exon_10 e (cost=0.00..2144.76rows=100376 width=52) -> Seq Scan on ga_psr_exon_11 e (cost=0.00..2158.90rows=100990 width=52) -> Seq Scan on ga_psr_exon_12 e (cost=0.00..2035.66rows=95266 width=52) -> Seq Scan on ga_psr_exon_13 e (cost=0.00..1212.48rows=56748 width=52) -> Seq Scan on ga_psr_exon_14 e (cost=0.00..1380.52rows=64652 width=51) -> Seq Scan on ga_psr_exon_15 e (cost=0.00..1529.88rows=71588 width=52) -> Seq Scan on ga_psr_exon_16 e (cost=0.00..1575.04rows=73704 width=52) -> Seq Scan on ga_psr_exon_17 e (cost=0.00..1811.04rows=84704 width=52) -> Seq Scan on ga_psr_exon_18 e (cost=0.00..1033.48rows=48348 width=52) -> Seq Scan on ga_psr_exon_19 e (cost=0.00..1375.44rows=64344 width=52) -> Seq Scan on ga_psr_exon_20 e (cost=0.00..1037.50rows=48550 width=51) -> Seq Scan on ga_psr_exon_21 e (cost=0.00..537.40 rows=25140 width=51) -> Seq Scan on ga_psr_exon_22 e (cost=0.00..736.96 rows=34596 width=50) -> Seq Scan on ga_psr_exon_x e (cost=0.00..1512.30 rows=72430 width=51) -> Seq Scan on ga_psr_exon_y e (cost=0.00..205.00 rows=9800 width=51) -> Seq Scan on ga_psr_exon_un e (cost=0.00..25.58 rows=1158 width=59) -> Seq Scan on ga_psr_exon_m e (cost=0.00..1.28 rows=28 width=51) -> Hash (cost=98.98..98.98 rows=31 width=165) -> Append (cost=0.00..98.98 rows=31 width=165) -> Seq Scan on ga_psr_transcript t (cost=0.00..22.50 rows=5 width=165) Filter: (id = 123) -> Index Scan using ga_psr_transcript_1_pkey on ga_psr_transcript_1 t (cost=0.00..3.03 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_2_pkey on ga_psr_transcript_2 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_3_pkey on ga_psr_transcript_3 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_4_pkey on ga_psr_transcript_4 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_5_pkey on ga_psr_transcript_5 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_6_pkey on ga_psr_transcript_6 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_7_pkey on ga_psr_transcript_7 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_8_pkey on ga_psr_transcript_8 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_9_pkey on ga_psr_transcript_9 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_10_pkey on ga_psr_transcript_10 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_11_pkey on ga_psr_transcript_11 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_12_pkey on ga_psr_transcript_12 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_13_pkey on ga_psr_transcript_13 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_14_pkey on ga_psr_transcript_14 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_15_pkey on ga_psr_transcript_15 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_16_pkey on ga_psr_transcript_16 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_17_pkey on ga_psr_transcript_17 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_18_pkey on ga_psr_transcript_18 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_19_pkey on ga_psr_transcript_19 t (cost=0.00..3.01 rows=1 width=48) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_20_pkey on ga_psr_transcript_20 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_21_pkey on ga_psr_transcript_21 t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_22_pkey on ga_psr_transcript_22 t (cost=0.00..3.01 rows=1 width=46) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_x_pkey on ga_psr_transcript_x t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_y_pkey on ga_psr_transcript_y t (cost=0.00..3.01 rows=1 width=47) Index Cond: (id = 123) -> Index Scan using ga_psr_transcript_un_pkey on ga_psr_transcript_un t (cost=0.00..3.01 rows=1 width=55) Index Cond: (id = 123) -> Seq Scan on ga_psr_transcript_m t (cost=0.00..1.12 rows=1 width=47) Filter: (id = 123) (86 rows) Any comments or suggestions would be greatly appreciated. Thanks. --------------------------------------------------------- Alan Williams, PhD Alan_Williams@Affymetrix.com Bioinformatics Manager ---------------------------------------------------------
pgsql-general by date: