Re: TimeOf(Subselects|Joins)FromLargeTables? - Mailing list pgsql-general

From Dann Corbit
Subject Re: TimeOf(Subselects|Joins)FromLargeTables?
Date
Msg-id D90A5A6C612A39408103E6ECDD77B8299CAB6A@voyager.corporate.connx.com
Whole thread Raw
In response to TimeOf(Subselects|Joins)FromLargeTables?  ("Hegedus, Tamas ." <Hegedus.Tamas@mayo.edu>)
List pgsql-general
How does this query perform:

SELECT p.name, p.seq
FROM prots p, kwx k
WHERE
    p.fid=k.fid
    AND
    k.kw_acc=812
;

> -----Original Message-----
> From: Hegedus, Tamas . [mailto:Hegedus.Tamas@mayo.edu]
> Sent: Thursday, June 03, 2004 6:48 PM
> To: 'pgsql-general@postgresql.org'
> Subject: [GENERAL] TimeOf(Subselects|Joins)FromLargeTables?
>
>
> Dear All,
>
> I am a biologist and I do not know what to expect from an RDB
> (PgSQL). I have large tables: 1215607 rows in prots, 2184596
> rows in kwx (see table details below). I would like to do
> something like that:
>
> SELECT name, seq FROM prots WHERE fid in (SELECT fid FROM kwx
> WHERE kw_acc=812);
>
> After executing this (either as a subquery or joins) the
> best/fastest result what I had (SET enable_seqscan=off):
> 83643.482 ms (see EXPLAIN ANALYZE below).
>
> The two (similar) parts of this query are executed much
> faster: SELECT fid FROM kwx WHERE kw_acc=812 -- takes 302ms,
> n(rows)=78050 SELECT name, seq FROM prots WHERE fid < 80000
> -- takes 1969.231 ms
>
> Is this realistic? OK?
> If not: how can I increase the speed by fine tuning of the
> RDB (indexes, run-time parameters) or my SQL query? (It came
> now into my mind: if I decrease the number of columns in the
> prots table (to have only 3 fields (fid, name, seq) instead
> of 20 columns), than the prots table will have smaller file
> size on disk, than this table may need less disk page
> fetches, queries may be faster. Is this true?)
>
> Thanks for your help!
> Tamas
>
> ===============================================
> Table "public.prots"
>   Column   |         Type          | Modifiers
> -----------+----------------------+----------
>  fid       | integer               | not null
>  name      | character varying(10) | not null
> [...other 17 columns...]
>  seq       | text                  |
> Indexes:
>     "prots_pkey" primary key, btree (fid)
>     "ix_prots_acc" unique, btree (acc)
>     "ix_prots_name" unique, btree (name)
>     "ix_prots_class" btree ("class")
> ===============================================
> Table "public.kwx"
>  Column |  Type   | Modifiers
> --------+--------+----------
>  fid    | integer |
>  kw_acc | integer |
> Indexes:
>     "ix_kwx_acc" btree (kw_acc)
>     "ix_kwx_fid" btree (fid)
> Foreign-key constraints:
>     "fk_kws_acc" FOREIGN KEY (kw_acc) REFERENCES kw_ref(kw_acc)
>     "fk_kws_fid" FOREIGN KEY (fid) REFERENCES prots(fid)
> ===============================================
>
> EXPLAIN ANALYZE SELECT name, seq from prots inner join kwx on
> (prots.fid=kwx.fid) where kwx.kw_acc = 812;
>
>     QUERY PLAN
>
> --------------------------------------------------------------
> --------------------------------------------------------------
> ------------------
>  Merge Join  (cost=0.00..160429.66 rows=84473 width=349)
> (actual time=29.039..83505.629 rows=78050 loops=1)
>    Merge Cond: ("outer".fid = "inner".fid)
>    ->  Index Scan using ix_kwx_fid on kwx
> (cost=0.00..44987.55 rows=84473 width=4) (actual
> time=18.893..5730.468 rows=78050 loops=1)
>          Filter: (kw_acc = 812)
>    ->  Index Scan using prots_pkey on prots
> (cost=0.00..112005.24 rows=981127 width=353) (actual
> time=0.083..76059.235 rows=1210377 loops=1)  Total runtime:
> 83643.482 ms (6 rows)
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>

pgsql-general by date:

Previous
From: "Hegedus, Tamas ."
Date:
Subject: TimeOf(Subselects|Joins)FromLargeTables?
Next
From: "Marc G. Fournier"
Date:
Subject: Re: Problem with mailing list or time? Was: Re: Best