Thread: Improve performance of query

From:
Richard Rowell
Date:

I'm trying to port our application from MS-SQL to Postgres.  We have
implemented all of our rather complicated application security in the
database.  The query that follows takes a half of a second or less on
MS-SQL server and around 5 seconds on Postgres.  My concern is that this
data set is rather "small" by our applications standards.  It is not
unusual for the da_answer table to have 2-4 million records.  I'm
worried that if this very small data set is taking 5 seconds, then a
"regular sized" data set will take far too long.

I originally thought the NOT EXISTS on the
"da_data_restrict_except_open" table was killing performance, but the
query took the exact same amount of time after I deleted all rows from
this table.  Note that the hard-coded 999999999.0, and 4000 parameters,
as well as the parameter to svp_getparentproviders are the three
variables that change from one run of this query to the next.

I'm using Postgres 7.4.5 as packaged in Debian.  shared_buffers is set
to 57344 and sort_mem=4096.

The machine has an AMD 1.8+ and ` gig of RAM.  Here are some relevant
performance statistics:
richard:/usr/share/cups/model# cat /proc/sys/kernel/shmmax
536870912
richard:/usr/share/cups/model# cat /proc/sys/kernel/shmall
536870912
richard:/home/richard# hdparm -tT /dev/hda
 Timing cached reads:   1112 MB in  2.00 seconds = 556.00 MB/sec
 Timing buffered disk reads:  176 MB in  3.02 seconds =  58.28 MB/sec

I have included an EXPLAIN ANALYZE, relevant table counts, and relevant
indexing information.  If anyone has any suggestions on how to improve
performance....  TIA!

SELECT tab.answer_id, client_id, question_id, recordset_id,
date_effective, virt_field_name
FROM
(
    SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id,
date_effective
    FROM da_answer a
    WHERE a.date_effective <= 9999999999.0
    AND a.inactive != 1
    AND
    (
            5000 = 4000
            OR
            (EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE
svp_getparentproviderids = a.provider_id))
    )
    UNION
    SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id,
date_effective
    FROM da_answer a,
    (
        SELECT main_id
        FROM da_data_restrict
        WHERE type_id = 2
        AND (provider_id IN (SELECT * FROM svp_getparentproviderids(1)))

        UNION

            SELECT sa.uid AS main_id
            FROM da_answer sa
                JOIN da_data_restrict_except_closed dr ON dr.main_id =
sa.uid AND dr.type_id = 2 AND dr.except_provider_id = 1
            WHERE (restricted = 1)
                AND (restricted_closed_except = 1)
                AND sa.covered_by_roi = 1
            UNION
            SELECT sa.uid AS main_id
            FROM da_answer sa
            WHERE (restricted = 0)
                AND (restricted_open_except = 1)
                AND (NOT EXISTS (SELECT dr.main_id FROM
da_data_restrict_except_open dr WHERE (dr.main_id = sa.uid) AND
(dr.type_id = 2) AND (dr.except_provider_id in (select * from
svp_getparentproviderids(1)))))
                AND sa.covered_by_roi = 1
            UNION
            SELECT sa.uid AS main_id FROM da_answer sa WHERE (restricted
= 0) AND (restricted_open_except = 0)
                AND sa.covered_by_roi = 1
    ) sec
    WHERE a.covered_by_roi = 1
        AND a.date_effective <= 9999999999.0
        AND a.inactive != 1
        AND a.uid = sec.main_id
        AND 5000 > 4000
) tab, da_question q
WHERE tab.question_id = q.uid AND (min_access_level <= 4000 OR
min_access_level IS NULL)

Table counts from relevant tables
da_question                      1095
da_answer                       21117
da_question                      1095
da_data_restrict_except_closed   3087
da_data_restrict_except_open    13391
svp_getparentproviderids(1)         1

Relevant Index
create index in_da_data_restrict_provider_id on
da_data_restrict(provider_id);
create index in_da_data_restrict_main_id on da_data_restrict(main_id);
create index in_da_data_restrict_type_id on da_data_restrict(type_id);
create index in_da_data_restrict_client_id on
da_data_restrict(client_id);
create index in_da_dr_type_provider on
da_data_restrict(type_id,provider_id);

create index in_da_data_rec_provider_id ON
da_data_restrict_except_closed(provider_id);
create index in_da_data_rec_type_id ON
da_data_restrict_except_closed(type_id);
create index in_da_data_rec_main_id ON
da_data_restrict_except_closed(main_id);
create index in_da_data_rec_except_provider_id ON
da_data_restrict_except_closed(except_provider_id);

create index in_da_data_reo_provider_id ON
da_data_restrict_except_open(provider_id);
create index in_da_data_reo_type_id ON
da_data_restrict_except_open(type_id);
create index in_da_data_reo_main_id ON
da_data_restrict_except_open(main_id);
create index in_da_data_reo_except_provider_id ON
da_data_restrict_except_open(except_provider_id);

create index in_da_answer_client_id ON da_answer(client_id);
create index in_da_answer_provider_id ON da_answer(provider_id);
create index in_da_answer_question_id ON da_answer(question_id);
create index in_da_answer_recordset_id ON da_answer(recordset_id);
create index in_da_answer_restricted ON da_answer(restricted);
create index in_da_answer_restricted_open_except ON
da_answer(restricted_open_except);
create index in_da_answer_restricted_closed_except ON
da_answer(restricted_closed_except);
create index in_da_answer_date_effective ON da_answer(date_effective);
create index in_da_answer_inactive ON da_answer(inactive);
create index in_da_answer_covered_by_roi ON da_answer(covered_by_roi);

create index in_da_ed_inactive_roi ON da_answer(date_effective,inactive,
covered_by_roi);

create index in_da_question_mal ON da_question(min_access_level);


From:
Stephen Frost
Date:

* Richard Rowell () wrote:
> I have included an EXPLAIN ANALYZE, relevant table counts, and relevant
> indexing information.  If anyone has any suggestions on how to improve
> performance....  TIA!

Just a thought- do the UNION's actually have to be union's or would
having them be 'UNION ALL's work?

    Stephen

From:
Richard Huxton
Date:

Richard Rowell wrote:
> I'm trying to port our application from MS-SQL to Postgres.  We have
> implemented all of our rather complicated application security in the
> database.  The query that follows takes a half of a second or less on
> MS-SQL server and around 5 seconds on Postgres.  My concern is that this
> data set is rather "small" by our applications standards.  It is not
> unusual for the da_answer table to have 2-4 million records.  I'm
> worried that if this very small data set is taking 5 seconds, then a
> "regular sized" data set will take far too long.
>
> I originally thought the NOT EXISTS on the
> "da_data_restrict_except_open" table was killing performance, but the
> query took the exact same amount of time after I deleted all rows from
> this table.  Note that the hard-coded 999999999.0, and 4000 parameters,
> as well as the parameter to svp_getparentproviders are the three
> variables that change from one run of this query to the next.
>
> I'm using Postgres 7.4.5 as packaged in Debian.  shared_buffers is set
> to 57344 and sort_mem=4096.

That shared_buffers value sounds too large for 1GB RAM - rewind to 10000
say. Also make sure you've read the "performance tuning" article at:
   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> I have included an EXPLAIN ANALYZE, relevant table counts, and relevant
> indexing information.  If anyone has any suggestions on how to improve
> performance....  TIA!

I think it's the function call(s).

> SELECT tab.answer_id, client_id, question_id, recordset_id,
> date_effective, virt_field_name
> FROM
> (
>     SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id,
> date_effective
>     FROM da_answer a
>     WHERE a.date_effective <= 9999999999.0
>     AND a.inactive != 1
>     AND
>     (
>             5000 = 4000
>             OR
>             (EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE
> svp_getparentproviderids = a.provider_id))
>     )
...
>SubPlan
>  ->  Function Scan on svp_getparentproviderids  (cost=0.00..15.00 rows=5 width=4) (actual time=0.203..0.203 rows=0
loops=21089)
>        Filter: (svp_getparentproviderids = $1)

Here it's running 21,089 loops around your function. Each one isn't
costing much, but it's the total that's killing you I think. It might be
possible to mark the function STABLE or such, depending on what it does
- see http://www.postgresql.org/docs/7.4/static/sql-createfunction.html

--
   Richard Huxton
   Archonet Ltd

From:
Tom Lane
Date:

Richard Rowell <> writes:
> I'm trying to port our application from MS-SQL to Postgres.  We have
> implemented all of our rather complicated application security in the
> database.  The query that follows takes a half of a second or less on
> MS-SQL server and around 5 seconds on Postgres.

The EXPLAIN shows that most of the time is going into repeated
executions of svp_getparentproviderids() in the first UNION arm:

>                                  ->  Seq Scan on da_answer a  (cost=0.00..63928.75 rows=10540 width=24) (actual
time=279.080..4418.808rows=161 loops=1) 
>                                        Filter: ((date_effective <= 9999999999::double precision) AND (inactive <> 1)
AND(subplan)) 
>                                        SubPlan
>                                          ->  Function Scan on svp_getparentproviderids  (cost=0.00..15.00 rows=5
width=4)(actual time=0.203..0.203 rows=0 loops=21089) 
>                                                Filter: (svp_getparentproviderids = $1)

I'd suggest replacing the EXISTS coding by IN:
    (EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE svp_getparentproviderids = a.provider_id))
to
    (a.provider_id IN (SELECT * FROM svp_getparentproviderids(1)))
The latter form is likely to be significantly faster in PG 7.4.

It's also possible that the speed loss compared to MSSQL is really
inside the svp_getparentproviderids function; you should look into
that rather than assuming this query per se is at fault.

Also, do you actually need UNION as opposed to UNION ALL?  The
duplicate-elimination behavior of UNION is a bit expensive if not
needed.  It looks from the EXPLAIN output that some of the unions
aren't actually eliminating any rows.

            regards, tom lane

From:
John A Meinel
Date:

The first thing to check... Did you do a recent VACUUM ANALYZE? This
updates all the statistics. There are a number of places where it says
"rows=1000" which is usually the "I have no idea, let me guess 1000".
Also, there are a number of places where the estimates are pretty far
off. For instance:

Richard Rowell wrote:

>->  Subquery Scan "*SELECT* 1"  (cost=0.00..64034.15 rows=10540 width=24) (actual time=279.089..4419.371 rows=161
loops=1)
>
>
estimating 10,000 when only 161 is a little bit different.

>                                 ->  Seq Scan on da_answer a  (cost=0.00..63928.75 rows=10540 width=24) (actual
time=279.080..4418.808rows=161 loops=1) 
>                                       Filter: ((date_effective <= 9999999999::double precision) AND (inactive <> 1)
AND(subplan)) 
>
>
Though this could be a lack of cross-column statistics. If 2 columns are
correlated, the planner isn't as accurate as it could be. Also,
date_effective <= 9999999999 doesn't seem very restrictive, could you
use a between statement? (date between 0 and 9999999). I know for
timestamps usually giving a between is better than a single sided query.

This one was underestimated.

>->  Subquery Scan "*SELECT* 2"  (cost=988627.58..989175.52 rows=2799 width=24) (actual time=290.730..417.720 rows=7556
loops=1)
>                                 ->  Hash Join  (cost=988627.58..989147.53 rows=2799 width=24) (actual
time=290.722..395.739rows=7556 loops=1) 
>                                       Hash Cond: ("outer".main_id = "inner".uid)
>
>
This is one of the ones that looks like it didn't have any ideas. It
could be because of the function. You might consider adding a function
index, though I think there are some caveats there.

>->  Function Scan on svp_getparentproviderids  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.473..0.474 rows=1
loops=1)
>
>
Another very poor estimation. It might be a need to increase the
statistics for this column (ALTER TABLE, ALTER COLUMN, SET STATISTICS).
IIRC, compared with other db's postgres defaults to a much lower
statistics value. Try changing it from 10 (?) to 100 or so. There was a
discussion that every column with an index should use higher statistics.

>->  Index Scan using in_da_dr_type_provider on da_data_restrict  (cost=0.00..145.50 rows=46 width=8) (actual
time=0.041..26.627rows=7280 loops=1) 
>
>
I'm not a great optimizer, these are just some first things to look at.
Your sort mem seems pretty low to me (considering you have 1GB of RAM).
Perhaps you could bump that up to 40MB instead of 4MB. Also, if you run
this query twice in a row, is it still slow? (Sometimes it takes a bit
of work to get the right indexes loaded into ram, but then it is faster.)

Just some guesses,
John
=:->



From:
John A Meinel
Date:

The first thing to check... Did you do a recent VACUUM ANALYZE? This
updates all the statistics. There are a number of places where it says
"rows=1000" which is usually the "I have no idea, let me guess 1000".
Also, there are a number of places where the estimates are pretty far
off. For instance:

Richard Rowell wrote:

>->  Subquery Scan "*SELECT* 1"  (cost=0.00..64034.15 rows=10540 width=24) (actual time=279.089..4419.371 rows=161
loops=1)
>
>
estimating 10,000 when only 161 is a little bit different.

>                                 ->  Seq Scan on da_answer a  (cost=0.00..63928.75 rows=10540 width=24) (actual
time=279.080..4418.808rows=161 loops=1) 
>                                       Filter: ((date_effective <= 9999999999::double precision) AND (inactive <> 1)
AND(subplan)) 
>
>
Though this could be a lack of cross-column statistics. If 2 columns are
correlated, the planner isn't as accurate as it could be. Also,
date_effective <= 9999999999 doesn't seem very restrictive, could you
use a between statement? (date between 0 and 9999999). I know for
timestamps usually giving a between is better than a single sided query.

This one was underestimated.

>->  Subquery Scan "*SELECT* 2"  (cost=988627.58..989175.52 rows=2799 width=24) (actual time=290.730..417.720 rows=7556
loops=1)
>                                 ->  Hash Join  (cost=988627.58..989147.53 rows=2799 width=24) (actual
time=290.722..395.739rows=7556 loops=1) 
>                                       Hash Cond: ("outer".main_id = "inner".uid)
>
>
This is one of the ones that looks like it didn't have any ideas. It
could be because of the function. You might consider adding a function
index, though I think there are some caveats there.

>->  Function Scan on svp_getparentproviderids  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.473..0.474 rows=1
loops=1)
>
>
Another very poor estimation. It might be a need to increase the
statistics for this column (ALTER TABLE, ALTER COLUMN, SET STATISTICS).
IIRC, compared with other db's postgres defaults to a much lower
statistics value. Try changing it from 10 (?) to 100 or so. There was a
discussion that every column with an index should use higher statistics.

>->  Index Scan using in_da_dr_type_provider on da_data_restrict  (cost=0.00..145.50 rows=46 width=8) (actual
time=0.041..26.627rows=7280 loops=1) 
>
>
I'm not a great optimizer, these are just some first things to look at.
Your sort mem seems pretty low to me (considering you have 1GB of RAM).
Perhaps you could bump that up to 40MB instead of 4MB. Also, if you run
this query twice in a row, is it still slow? (Sometimes it takes a bit
of work to get the right indexes loaded into ram, but then it is faster.)

Just some guesses,
John
=:->