Thread: how to make query more efficient?

how to make query more efficient?

From
"Johnson, Shaunn"
Date:

Howdy:

Running PostgreSQL 7.2.1 on RedHat Linux 7.2.

I'm trying to run a query and it seems by using EXPLAIN
that it will take a long time.  Is there a way to get the
results in a more efficient manner?  Basically, I want
a list of records that are NOT in the table ref_person.

[snip query]
explain

select
x.contract,
x.mbr_num,
x.pref_contract,
x.pref_mbr_num,
x.person_id
--into
--t_refperson_diff
from
ref_person x
where
x.contract not in
(
        select
        r.contract
        --r.mbr_num
        from
        ref_person r, db2_cn1pmemb d
        where
        (d.c_contract_num, d.c_mbr_num) =
        (r.contract, r.mbr_num)
)
;

[/snip]

[snip explain results]

psql:./tom_refperson_diff.sql:25: NOTICE:  QUERY PLAN:

Seq Scan on ref_person x  (cost=0.00..1425041431903.38 rows=886436 width=48)
  SubPlan
    ->  Materialize  (cost=803803.88..803803.88 rows=2167192 width=44)
          ->  Merge Join  (cost=762748.49..803803.88 rows=2167192 width=44)
                ->  Sort  (cost=371783.06..371783.06 rows=1772872 width=22)
                      ->  Seq Scan on ref_person r  (cost=0.00..37959.72 rows=1772872 width=22)
                ->  Sort  (cost=390965.43..390965.43 rows=1646129 width=22)
                      ->  Seq Scan on db2_cn1pmemb d  (cost=0.00..82307.29 rows=1646129 width=22)

EXPLAIN

[/snip explain results]

[snip index]

bcn=> \d db2_cn1pmemb_i
     Index "db2_cn1pmemb_i"
     Column     |     Type
----------------+---------------
 c_contract_num | character(12)
 c_mbr_num      | character(2)
btree

--

bcn=> \d ref_person_pkey
 Index "ref_person_pkey"
  Column  |     Type
----------+---------------
 contract | character(12)
 mbr_num  | character(2)
unique btree (primary key)

[/snip index | primary key]

Suggestions?  Thanks!

-X

Re: how to make query more efficient?

From
"Nigel J. Andrews"
Date:
On Wed, 19 Mar 2003, Johnson, Shaunn wrote:

> Howdy:
>
> Running PostgreSQL 7.2.1 on RedHat Linux 7.2.
>
> I'm trying to run a query and it seems by using EXPLAIN
> that it will take a long time.  Is there a way to get the
> results in a more efficient manner?  Basically, I want
> a list of records that are NOT in the table ref_person.
>
> [snip query]
> explain
>
> select
> x.contract,
> x.mbr_num,
> x.pref_contract,
> x.pref_mbr_num,
> x.person_id
> --into
> --t_refperson_diff
> from
> ref_person x
> where
> x.contract not in
> (
>         select
>         r.contract
>         --r.mbr_num
>         from
>         ref_person r, db2_cn1pmemb d
>         where
>         (d.c_contract_num, d.c_mbr_num) =
>         (r.contract, r.mbr_num)
> )
> ;

Are you sure that's the correct query? I'm not going to try it but I'm
surprised that works at all. That's an awful lot of rows it thinks it wants to
return, perhaps that's a feature of this odd syntax you've used.

Anyway, apart from that the main thing is to rewrite this so it avoids the IN
clause. Try doing something like:

select
  x.contract,
  x.mbr_num,
  x.pref_contract,
  x.pref_mbr_num,
  x.person_id
 from
  ref_person x
 where
  not exists (
         select
        1
         from
             ref_person r, db2_cn1pmemb d
         where
        r.contract = x.contract
        and
             d.c_contract_num = r.contract
        and
         d.c_mbr_num = r.mbr_num
  )


>
> [/snip]
>
> [snip explain results]
>
> psql:./tom_refperson_diff.sql:25: NOTICE:  QUERY PLAN:
>
> Seq Scan on ref_person x  (cost=0.00..1425041431903.38 rows=886436 width=48)
>   SubPlan
>     ->  Materialize  (cost=803803.88..803803.88 rows=2167192 width=44)
>           ->  Merge Join  (cost=762748.49..803803.88 rows=2167192 width=44)
>                 ->  Sort  (cost=371783.06..371783.06 rows=1772872 width=22)
>                       ->  Seq Scan on ref_person r  (cost=0.00..37959.72
> rows=1772872 width=22)
>                 ->  Sort  (cost=390965.43..390965.43 rows=1646129 width=22)
>                       ->  Seq Scan on db2_cn1pmemb d  (cost=0.00..82307.29
> rows=1646129 width=22)
>
> EXPLAIN
>
> [/snip explain results]
>
> [snip index]
>
> bcn=> \d db2_cn1pmemb_i
>      Index "db2_cn1pmemb_i"
>      Column     |     Type
> ----------------+---------------
>  c_contract_num | character(12)
>  c_mbr_num      | character(2)
> btree
>
> --
>
> bcn=> \d ref_person_pkey
>  Index "ref_person_pkey"
>   Column  |     Type
> ----------+---------------
>  contract | character(12)
>  mbr_num  | character(2)
> unique btree (primary key)
>
> [/snip index | primary key]
>
>
> Suggestions?  Thanks!
>
>
> -X
>

--
Nigel J. Andrews



Re: how to make query more efficient?

From
Richard Huxton
Date:
On Wednesday 19 Mar 2003 6:23 pm, Johnson, Shaunn wrote:
> Howdy:
>
> Running PostgreSQL 7.2.1 on RedHat Linux 7.2.
>
> I'm trying to run a query and it seems by using EXPLAIN
> that it will take a long time.  Is there a way to get the
> results in a more efficient manner?  Basically, I want
> a list of records that are NOT in the table ref_person.
>
> [snip query]
> explain
>
> select
...
> from
> ref_person x
> where
> x.contract not in

Try rewriting the IN as an EXISTS clause - PG has traditionally not been too
clever at handling IN (though I believe this is to improve in the next
version).

--
  Richard Huxton