Re: How to improve speed of 3 table join &group (HUGE tables) - Mailing list pgsql-performance

From ismo.tuononen@solenovo.fi
Subject Re: How to improve speed of 3 table join &group (HUGE tables)
Date
Msg-id Pine.LNX.4.64.0710190735550.19669@ismoli.solenovo.jns
Whole thread Raw
In response to How to improve speed of 3 table join &group (HUGE tables)  (John Major <major@cbio.mskcc.org>)
List pgsql-performance
Hi,

how about:

select sf.library_id, fio.clip_type , count(sf.sequence_id)
  from    sequence_fragment sf, fragment_external_info fio
        ,(SELECT distinct sequence_id from sequence_alignment) sa
  where sf.seq_frag_id = fio.sequence_frag_id
  and  sf.sequence_id = sa.sequence_id
  group by sf.library_id, fio.clip_type

I don't know postgres well, but I would put my bet in Oracle in that
derived table instead of that in clause.

Ismo

On Thu, 18 Oct 2007, John Major wrote:

> Hi Hekki-
>
> When I turn seq_scan off for the new query:
>
> explain
> select sf.library_id, fio.clip_type , count(sf.sequence_id)
>   from    sequence_fragment sf, fragment_external_info fio
>   where sf.seq_frag_id = fio.sequence_frag_id
>   and  sf.sequence_id IN
>       (SELECT sequence_id from sequence_alignment)
>   group by sf.library_id, fio.clip_type
>
> The index is used... but the cost gets worse!
> it goes from:
> 11831119
> -TO-
> 53654888
>
> Actually... The new query executes in ~ 15 minutes... which is good enough for
> me for now.
>
> Thanks Nis!
>
> john
>
>
>
> Heikki Linnakangas wrote:
> > John Major wrote:
> >
> > > ~there are indexes on all of the fields being joined (but not on
> > > library_id or clip_type ). ~Everything has been re-analyzed post index
> > > creation
> > > ~I've tried "set enable_seqscan=off" and set (join_table_order or
> > > something) = 1
> > >
> >
> > Seqscanning and sorting a table is generally faster than a full scan of
> > the table using an index scan, unless the heap is roughly in the index
> > order. You probably need to CLUSTER the tables to use the indexes
> > effectively.
> >
> > Are you sure you have an index on sequence_alignment.sequence_id? The
> > planner seems to choose a seqscan + sort, even though you've set
> > enable_seqscan=false.
> >
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
>
>


pgsql-performance by date:

Previous
From: Chris Kratz
Date:
Subject: Re: Incorrect estimates on columns
Next
From: "Yinan Li"
Date:
Subject: how to improve the performance of creating index