Re: [SQL] bad select performance fixed by forbidding hash joins - Mailing list pgsql-general

From George Young
Subject Re: [SQL] bad select performance fixed by forbidding hash joins
Date
Msg-id 199907211506.LAA16984@ll.mit.edu
Whole thread Raw
Responses Re: [SQL] bad select performance fixed by forbidding hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
[PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66]
table opset_steps
      (name text, id int2, ver int2) [1400 rows]
      non-unique index is on (id, ver)

table run_opsets
      (status int2, id int2, ver int2, run_id int2, seq int2) [17000 rows]
      pkey is (id, seq), second index on(status, id, ver, run_id)
      select count(*) from run_opsets where status=1; --> 187
      select count(*) from run_opsets where status=3; --> 10564

table runs
      (run_name text, run_id int2, status int2) [900 rows]
      pkey is run_name, second index(run_id, status)
      select count(*)from runs where status=1; -->68

I have vacuum analyzed all relevant tables.

explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where (ro.status=3 or ro.status=1)
andro.opset_id=os.opset_id and ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; 

Hash Join  (cost=1793.58 rows=14560 width=38)
  ->  Hash Join  (cost=1266.98 rows=14086 width=24)
        ->  Seq Scan on run_opsets ro  (cost=685.51 rows=13903 width=8)
        ->  Hash  (cost=70.84 rows=1389 width=16)
              ->  Seq Scan on opset_steps os  (cost=70.84 rows=1389 width=16)
  ->  Hash  (cost=47.43 rows=374 width=14)
        ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)

This query takes 16 seconds. [returns 3126 rows]

On Tue, Jul 20, 1999 at 05:42:20PM -0400, Tom Lane wrote:
> On Tue, 20 Jul 1999 14:56:46 -0400 George Young wrote:
> > ... Is this then
> > the best that postgres can do?  Is there some rephrasing/restructuring of
> > this query that would make it faster?
>
> Hard to say.  The query looks reasonable as it stands ---
> ...  You have no restriction
> clause on opset_steps so all of those entries get loaded for hashing;
> can you provide one?
No.
> The system's plan looks pretty reasonable as well.  It might be that
> a merge join would be faster than a hash join but I wouldn't assume
> so.  If you want, you can try forcing the system not to use hashes;
> start psql with environment variable
>     PGOPTIONS="-fh"
> and see what sort of plan and execution time you get.  If that does
> turn out to be a big win it would indicate that the planner is using
> poor cost estimates, which is certainly possible...

Yes!  PGOPTIONS="-fh" made the query time go from 16 seconds to 2 seconds!
Is this a safe thing to leave on permanently, or is there some way to set
PGOPTIONS for just this query?

explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where (ro.status=3 or ro.status=1)
andro.opset_id=os.opset_id and ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; 

Merge Join  (cost=9295.54 rows=14560 width=38)
  ->  Seq Scan  (cost=8676.01 rows=14371 width=22)
        ->  Sort  (cost=8676.01 rows=14371 width=22)
              ->  Merge Join  (cost=1657.30 rows=14371 width=22)
                    ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=1031.25 rows=13903 width=8)
                    ->  Seq Scan  (cost=154.91 rows=374 width=14)
                          ->  Sort  (cost=154.91 rows=374 width=14)
                                ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)
  ->  Index Scan using opset_steps_idx_ver_id on opset_steps os  (cost=99.45 rows=1389 width=16)


With PGOPTIONS=-fh, this query takes ~ 2 seconds! [returns 3126 rows]

--
George Young,  Rm. L-204        gry@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St.
Lexington, Massachusetts  02420-9108    (781) 981-2756

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] inheritance
Next
From: Leon
Date:
Subject: Re: [GENERAL] OID IS INVALID?