Home > mailing lists

another query optimization question - Mailing list pgsql-performance

From	David Teran
Subject	another query optimization question
Date	January 30, 2004 17:00:47
Msg-id	32DEEB93-534E-11D8-9392-000A95A6F0DC@cluster9.com Whole thread Raw
Responses	Re: another query optimization question Re: another query optimization question
List	pgsql-performance

Tree view

Hi,

its me again. As far as we tested postgresql ist fast, very fast
compared to the other db system we test and are using currently.

  We are now testing some test databases on Postgres. We use one
function which simply calculates a difference between two values and
checks if on value is 0, so something like this:


declare
diff integer;
begin
if $1 > $2
then
diff := $1 -$2;
return diff * diff;
else
return 0;
end if;
end;

Language for this function is plpgsql

executing a select like this:

select
sum(job_property_difference(t0.int_value, t1.int_value)) as rank
   from
   job_property t0,
   job_property t1
   where
   t0.id_job_profile = 911
   and t0.id_job_attribute = t1.id_job_attribute
   and t1.id_job_profile in (select id_job_profile from unemployed)
   and t1.id_job_profile <> 911;

results in a query plan result:

                                                             QUERY PLAN
------------------------------------------------------------------------
-----------------------------------------------------------
  Aggregate  (cost=70521.28..70521.28 rows=1 width=8)
    ->  Merge Join  (cost=66272.11..70158.29 rows=145194 width=8)
          Merge Cond: ("outer".id_job_attribute =
"inner".id_job_attribute)
          ->  Sort  (cost=31.53..32.44 rows=366 width=8)
                Sort Key: t0.id_job_attribute
                ->  Index Scan using
job_property__id_job_profile__fk_index on job_property t0
(cost=0.00..15.95 rows=366 width=8)
                      Index Cond: (id_job_profile = 911)
          ->  Sort  (cost=66240.58..67456.79 rows=486483 width=8)
                Sort Key: t1.id_job_attribute
                ->  Hash IN Join  (cost=34.08..20287.32 rows=486483
width=8)
                      Hash Cond: ("outer".id_job_profile =
"inner".id_job_profile)
                      ->  Seq Scan on job_property t1
(cost=0.00..12597.89 rows=558106 width=12)
                            Filter: (id_job_profile <> 911)
                      ->  Hash  (cost=31.46..31.46 rows=1046 width=4)
                            ->  Seq Scan on unemployed
(cost=0.00..31.46 rows=1046 width=4)
(21 rows)




This takes about 1minute, 45 seconds on a test database with about
31.882 job_profile and 8.483.005 job_property records. The final
solution will have about 1.000.000 job_profile records and, well ...
about 266.074.901  so we wonder what options we have in order to
improve this select. Should we rewrite the function (and others) in C?
Turning off seqscans makes it slower which might be because psql is
hopping between the index and the row values back and forth as a lot of
rows are involved.

Any hint how to speed up this would be great.


regards David

pgsql-performance by date:

From: Jack Coates
Date: 30 January 2004, 16:13:27
Subject: Re: query optimization question

From: PC Drew
Date: 30 January 2004, 17:09:50
Subject: Re: another query optimization question

another query optimization question - Mailing list pgsql-performance

Previous

Next