Thread: Query optimization

Query optimization

From
Sean Davis
Date:
I have a couple of very large tables that I am querying on that gives the
following explain analyze output.  If someone can help out with my mess,
that would be great.

Thanks,
Sean

explain analyze
    select e.*,c.*
    from u_all_est_mrna c
        join g_rna_acc d on c.accession=d.accession,
        (select a.gene_id,
            b.db_id,
            max(tend-tstart)
        from g_rna_acc a
            join u_all_est_mrna b on a.accession=b.accession
        where gene_id<200
        group by a.gene_id,b.db_id) e
    where abs(tstart-tend)=e.max
        and d.gene_id=e.gene_id
        and c.db_id=e.db_id;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-----------------------------
 Hash Join  (cost=1054997.38..2472083.62 rows=2 width=219) (actual
time=122796.024..357269.576 rows=327 loops=1)
   Hash Cond: ((("outer".accession)::text = ("inner".accession)::text) AND
(abs(("outer".tstart - "outer".tend)) = "inner".max) AND
(("outer".db_id)::text = ("inner".db_id)::text))
   ->  Seq Scan on u_all_est_mrna c  (cost=0.00..932582.74 rows=24225174
width=179) (actual time=17.384..302484.904 rows=24225174 loops=1)
   ->  Hash  (cost=1054973.98..1054973.98 rows=3119 width=52) (actual
time=11562.968..11562.968 rows=2276 loops=1)
         ->  Nested Loop  (cost=1046393.15..1054973.98 rows=3119 width=52)
(actual time=11546.931..11558.704 rows=2276 loops=1)
               ->  HashAggregate  (cost=1046393.15..1046395.98 rows=567
width=19) (actual time=11546.892..11547.188 rows=276 loops=1)
                     ->  Nested Loop  (cost=4.78..1046285.32 rows=14377
width=19) (actual time=0.148..11537.307 rows=1854 loops=1)
                           ->  Index Scan using g_rna_acc_gene_id on
g_rna_acc a  (cost=0.00..1049.44 rows=519 width=16) (actual
time=0.026..50.006 rows=1139 loops=1)
                                 Index Cond: (gene_id < 200)
                           ->  Bitmap Heap Scan on u_all_est_mrna b
(cost=4.78..2007.57 rows=510 width=26) (actual time=7.100..10.068 rows=2
loops=1139)
                                 Recheck Cond: (("outer".accession)::text =
(b.accession)::text)
                                 ->  Bitmap Index Scan on uaem_accession
(cost=0.00..4.78 rows=510 width=0) (actual time=4.270..4.270 rows=2
loops=1139)
                                       Index Cond:
(("outer".accession)::text = (b.accession)::text)
               ->  Index Scan using g_rna_acc_gene_id on g_rna_acc d
(cost=0.00..15.04 rows=6 width=16) (actual time=0.010..0.037 rows=8
loops=276)
                     Index Cond: (d.gene_id = "outer".gene_id)
 Total runtime: 357270.873 ms
(16 rows)


\d+ u_all_est_mrna
                                                             Table
"public.u_all_est_mrna"
     Column      |       Type        |
Modifiers                                  |            Description
-----------------+-------------------+--------------------------------------
---------------------------------------+------------------------------------
 all_est_mrna_id | integer           | not null default
nextval('public.u_all_est_mrna_all_est_mrna_id_seq'::text) |
 db_id           | character varying |
|
 seqtype         | character varying |
|
 matches         | integer           |
|
 mismatches      | integer           |
|
 repmatches      | integer           |
|
 ncount          | integer           |
|
 qnuminsert      | integer           |
|
 qbaseinsert     | integer           |
|
 tnuminsert      | integer           |
|
 tbaseinsert     | integer           |
|
 strand          | character(1)      |
|
 accession       | character varying |
| Genbank Accession without version.
 qsize           | integer           |
|
 qstart          | integer           |
|
 qend            | integer           |
|
 chrom           | character varying |
| Chromosome, notation like "chr1"
 tsize           | integer           |
|
 tstart          | integer           |
| Blat hit start
 tend            | integer           |
| Blat hit end
 blockcount      | integer           |
|
 blocksizes      | character varying |
|
 qstarts         | character varying |
|
 tstarts         | character varying |
|
Indexes:
    "u_all_est_mrna_pkey" PRIMARY KEY, btree (all_est_mrna_id)
    "uaem_accession" btree (accession)
    "uaem_chrom" btree (chrom)
    "uaem_db_id" btree (db_id)
    "uaem_seqtype" btree (seqtype)
    "uaem_tend_chrom" btree (tend, chrom)
    "uaem_tstart_chrom" btree (tstart, chrom)
Has OIDs: yes


=== psql 78 ===
\d+ g_rna_acc
                  Table "public.g_rna_acc"
    Column    |       Type        | Modifiers | Description
--------------+-------------------+-----------+-------------
 gene_id      | integer           |           |
 accession    | character varying | not null  |
 version      | integer           |           |
 accession_gi | integer           |           |
Indexes:
    "g_rna_acc_pkey" PRIMARY KEY, btree (accession)
    "g_rna_acc_accession" btree (accession)
    "g_rna_acc_gene_id" btree (gene_id)
Foreign-key constraints:
    "g_rna_acc_gene_id_fkey" FOREIGN KEY (gene_id) REFERENCES
g_main(gene_id) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE
Has OIDs: no


Re: Query optimization

From
"Jim C. Nasby"
Date:
On Fri, Sep 30, 2005 at 06:24:52PM -0400, Sean Davis wrote:
>    ->  Seq Scan on u_all_est_mrna c  (cost=0.00..932582.74 rows=24225174
> width=179) (actual time=17.384..302484.904 rows=24225174 loops=1)

That step is where most of your time is being spent, which isn't
surprising given that it's scanning 24M rows. An index on
abs(tend-tstart) might help eliminate that. It will also probably help
if you increase the statistics_target for u_all_est_mrna, since it might
be best to join before filtering on abs(tend-tstart), which is the
opposite of what it's doing now.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461