Thread: Help with performance on current status column

Help with performance on current status column

From: Chris Kratz

Hello All,

We are struggling with a specific query that is killing us.  When doing
explain analyze on the entire query, we *seem* to be getting killed by the
estimated number of rows on a case statement calculation.

I've included a snippet from the explain analyze of the much larger query. The
line in question, (cost=0.00..106.52 rows=1 width=16) (actual
time=0.048..67.728 rows=4725 loops=1), shows that the scan returned 4725 rows
instead of the estimated 1, which, when coupled with a later join, causes the
statement to run for over 3 minutes. [1]

It seems the planner thinks the scan on role_id is going to return 1 row, but
in reality it returns 4725 rows.  The case statement causing the problem uses
today's date to decide whether a particular row is still active.  Here is a
test case showing how far off the estimate is from reality. [2]

I'm not too surprised to see that the estimate is off, since the value is
calculated, but does anyone know how to make the estimate more accurate so
the planner picks a better plan? Alternatively, is there a more efficient way
to compute a "status" value from the current date?  I've played with the
statistics on this table (ratcheting them up to 1000) with no change in the
plan.
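
For reference, here is roughly what we tried (a sketch from memory; we cycled
the per-column target through 10, 100, and 1000):

    -- raise the statistics target on the date columns used in the CASE,
    -- then re-gather stats
    ALTER TABLE roles ALTER COLUMN "begin" SET STATISTICS 1000;
    ALTER TABLE roles ALTER COLUMN "end" SET STATISTICS 1000;
    ANALYZE roles;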

Any thoughts?

-Chris

[1] explain analyze snippet from larger query
->  Nested Loop  (cost=0.00..955.70 rows=1 width=204) (actual
time=3096.689..202704.649 rows=17 loops=1)
      Join Filter: ("inner".nameid = "outer".name_id)
      ->  Nested Loop  (cost=0.00..112.25 rows=1 width=33) (actual
time=0.271..90.760 rows=4725 loops=1)
            ->  Index Scan using role_definition_description_idx on
role_definition rdf  (cost=0.00..5.72 rows=1 width=21) (actual
time=0.215..0.218 rows=1 loops=1)
                  Index Cond: (description = 'Participant'::text)
                  Filter: (program_id = 120)
            ->  Index Scan using roles_role_id_idx on roles rol
(cost=0.00..106.52 rows=1 width=16) (actual time=0.048..67.728 rows=4725
loops=1)
                  Index Cond: (rol.role_id = "outer".role_id)
                  Filter: (CASE WHEN (role_id IS NULL) THEN NULL::text WHEN
(("begin" IS NOT NULL) AND ("end" IS NOT NULL)) THEN CASE WHEN
((('now'::text)::date >= "begin") AND (('now'::text)::date <= "end")) THEN
'Active'::text ELSE 'Inactive'::text END WHEN ("begin" IS NOT NULL) THEN CASE
WHEN (('now'::text)::date >= "begin") THEN 'Active'::text ELSE
'Inactive'::text END WHEN ("end" IS NOT NULL) THEN CASE WHEN
(('now'::text)::date <= "end") THEN 'Active'::text ELSE 'Inactive'::text END
ELSE 'Active'::text END = 'Active'::text)
      ->  Nested Loop Left Join  (cost=0.00..842.19 rows=97 width=175) (actual
time=6.820..42.863 rows=21 loops=4725)
            ->  Index Scan using namemaster_programid_idx on namemaster dem
(cost=0.00..470.12 rows=97 width=164) (actual time=6.811..42.654 rows=21
loops=4725)
                  Index Cond: (programid = 120)
                  Filter: ((name_float_lfm ~~* '%clark%'::text) OR
(metaphone(name_float_lfm, 4) = 'KLRK'::text) OR (soundex(name_float_lfm) =
'C462'::text))
            ->  Index Scan using validanswerid_pk on validanswer ina
(cost=0.00..3.82 rows=1 width=19) (actual time=0.003..0.004 rows=1
loops=99225)
                  Index Cond: (ina.validanswerid = "outer".inactive)

---------------------
[2] A much simpler statement reproduces the incorrect row estimate here.

explain analyze
select *
from roles rol
where
    CASE
        WHEN rol.role_id IS NULL
        THEN NULL
        WHEN rol."begin" IS NOT NULL and rol."end" IS NOT NULL
        THEN
            CASE WHEN TIMESTAMP 'now' >= rol."begin"
                  and TIMESTAMP 'now' <= rol."end"
                 THEN 'Active'
                 ELSE 'Inactive' END
        WHEN rol."begin" IS NOT NULL
        THEN
            CASE WHEN TIMESTAMP 'now' >= rol."begin"
                 THEN 'Active'
                 ELSE 'Inactive' END
        WHEN rol."end" IS NOT NULL
        THEN
            CASE WHEN TIMESTAMP 'now' <= rol."end"
                 THEN 'Active'
                 ELSE 'Inactive' END
        ELSE 'Active'
    END = 'Active'

Seq Scan on roles rol  (cost=0.00..2368.54 rows=413 width=20) (actual
time=0.046..562.635 rows=79424 loops=1)
   Filter: (CASE WHEN (role_id IS NULL) THEN NULL::text WHEN (("begin" IS NOT
NULL) AND ("end" IS NOT NULL)) THEN CASE WHEN (('2005-09-13
16:43:18.721214'::timestamp without time zone >= "begin") AND ('2005-09-13
16:43:18.721214'::timestamp without time zone <= "end")) THEN 'Active'::text
ELSE 'Inactive'::text END WHEN ("begin" IS NOT NULL) THEN CASE WHEN
('2005-09-13 16:43:18.721214'::timestamp without time zone >= "begin") THEN
'Active'::text ELSE 'Inactive'::text END WHEN ("end" IS NOT NULL) THEN CASE
WHEN ('2005-09-13 16:43:18.721214'::timestamp without time zone <= "end")
THEN 'Active'::text ELSE 'Inactive'::text END ELSE 'Active'::text END =
'Active'::text)
 Total runtime: 884.456 ms
(3 rows)
--
Chris Kratz

Re: Help with performance on current status column

From: Richard Huxton

Chris Kratz wrote:
> Hello All,
>
> We are struggling with a specific query that is killing us.  When doing
> explain analyze on the entire query, we *seem* to be getting killed by the
> estimated number of rows on a case statement calculation.
>
> I've included a snippet from the explain analyze of the much larger query. The
> line in question, (cost=0.00..106.52 rows=1 width=16) (actual
> time=0.048..67.728 rows=4725 loops=1) shows that it returned 4700 rows
> instead of 1 which when coupled with a later join causes the statement to run
> over 3 minutes.[1]
>
> It seems that it thinks that the scan on role_id is going to return 1 row, but
> in reality returns 4725 rows.  The case statement causing the problem uses
> todays date to see if a particular row is still active.  Here is a test case
> showing how far off the estimate is from the reality. [2]

> [2] A much simpler statement triggers the incorrect row counts here.
>
> explain analyze
> select *
> from roles rol
> where
>
>             CASE
>            WHEN rol.role_id IS NULL
>            THEN NULL
>            WHEN rol."begin" IS NOT NULL and rol."end" IS NOT NULL
>            THEN
>              CASE WHEN TIMESTAMP 'now'>=rol."begin" and TIMESTAMP
> 'now'<=rol."end"
>              THEN 'Active'
>              ELSE 'Inactive' END
>            WHEN rol."begin" IS NOT NULL
>            THEN
>              CASE WHEN TIMESTAMP 'now'>=rol."begin"
>              THEN 'Active'
>              ELSE 'Inactive' END
>            WHEN rol."end" IS NOT NULL
>            THEN
>              CASE WHEN TIMESTAMP 'now'<=rol."end"
>              THEN 'Active'
>              ELSE 'Inactive' END
>            ELSE 'Active'
>            END = 'Active'

Aside #1 - I'm not entirely clear how role_id can be null, since you seem
to be joining against it in the real query.

Aside #2 - Since begin/end seem to be dates, you're probably better off with
CURRENT_DATE than TIMESTAMP 'now' - and in any case TIMESTAMP 'now' gives you
a "timestamp without time zone" when you presumably wanted "timestamp with
time zone". Note also that 'now' gets folded to a constant when the statement
is planned, which is why your second explain shows the literal '2005-09-13
16:43:18' rather than a runtime call.

OK, I think the root of your problem is your use of null to mean "not
ended" or "not started" (whatever 'not started' means). PostgreSQL has
the handy timestamptz value "infinity", but only for timestamps and not
for dates. I'd probably cheat a little and use an end date of
'9999-12-31' or similar to simulate "infinity". Then your test is simply:

WHERE
   ...
   AND (rol."begin" <= CURRENT_DATE AND rol."end" >= CURRENT_DATE)

That should estimate simply enough.
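
Something like this, assuming plain date columns (the sentinel values are
picked arbitrarily, and note that begin/end need quoting since they are
reserved words):

   -- replace open-ended NULLs with sentinel dates
   UPDATE roles SET "begin" = DATE '0001-01-01' WHERE "begin" IS NULL;
   UPDATE roles SET "end"   = DATE '9999-12-31' WHERE "end" IS NULL;
   ANALYZE roles;

   -- a two-column index then supports the range test directly
   CREATE INDEX roles_begin_end_idx ON roles ("begin", "end");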

--
   Richard Huxton
   Archonet Ltd

Re: Help with performance on current status column

From: Chris Kratz

Hello Richard,

Thank you for the response.  I forgot to mention what the columns mean: if a
begin or end date is null, the role is open ended in that direction.  For
example, if there is no end date, the role goes on indefinitely from its
start date.

Your idea of using minimum and maximum sentinel dates is an interesting one,
and not one I had considered.  I will do some testing later today and see if
it makes a difference.
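
While I am at it, I may also test the CASE collapsed into a plain predicate
that keeps our null convention (a sketch, untested; it ignores the role_id
IS NULL branch, which per your aside #1 shouldn't arise):

    explain analyze
    select *
    from roles rol
    where (rol."begin" IS NULL OR rol."begin" <= CURRENT_DATE)
      and (rol."end" IS NULL OR rol."end" >= CURRENT_DATE);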

The other option I am toying with is simply having a status column that is
updated nightly via a cron job.  That would probably be the most efficient
approach, and the column could be indexed.
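
Roughly what I have in mind (a sketch; the column and index names are made
up):

    -- one-time setup
    ALTER TABLE roles ADD COLUMN status text;
    CREATE INDEX roles_status_idx ON roles (status);

    -- run nightly from cron
    UPDATE roles
       SET status = CASE
                      WHEN ("begin" IS NULL OR "begin" <= CURRENT_DATE)
                       AND ("end" IS NULL OR "end" >= CURRENT_DATE)
                      THEN 'Active' ELSE 'Inactive'
                    END;
    -- (in practice a WHERE clause limiting this to rows whose status
    --  actually changes would cut down on the churn)
    ANALYZE roles;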

I also forgot to say that we have seen this behavior on two boxes, both on
Linux (Red Hat ES and Mandrake), running PostgreSQL 8.0 (8.0.1 and 8.0.3).
Strangely, after playing with the statistics some yesterday (setting the
target to 10, 100, 1000, and back to 10, analyzing each time), the 8.0.1
machine picks a different plan and runs in 101.104 ms.  The larger machine
(dual-processor Opteron, six-disk RAID 10, etc.) with 8.0.3 still takes 3-5
minutes to run the same query with the same data set, even after playing with
the statistics and repeatedly analyzing the same table.  It just seems odd.
It appears to be picking the wrong plan based on an overly optimistic
estimate of the rows returned from the calculation.

The other frustration is that this SQL is machine generated, which is why the
calculation is somewhat awkward.  The same calculation gets used for a lot of
different things, including column definitions when people want to see the
column on screen.

Thanks,

-Chris

On Wednesday 14 September 2005 05:13 am, Richard Huxton wrote:
> Chris Kratz wrote:
> > Hello All,
> >
> > We are struggling with a specific query that is killing us.  When doing
> > explain analyze on the entire query, we *seem* to be getting killed by
> > the estimated number of rows on a case statement calculation.
> >
> > I've included a snippet from the explain analyze of the much larger
> > query. The line in question, (cost=0.00..106.52 rows=1 width=16) (actual
> > time=0.048..67.728 rows=4725 loops=1) shows that it returned 4700 rows
> > instead of 1 which when coupled with a later join causes the statement to
> > run over 3 minutes.[1]
> >
> > It seems that it thinks that the scan on role_id is going to return 1
> > row, but in reality returns 4725 rows.  The case statement causing the
> > problem uses todays date to see if a particular row is still active.
> > Here is a test case showing how far off the estimate is from the reality.
> > [2]
> >
> > [2] A much simpler statement triggers the incorrect row counts here.
> >
> > explain analyze
> > select *
> > from roles rol
> > where
> >
> >             CASE
> >            WHEN rol.role_id IS NULL
> >            THEN NULL
> >            WHEN rol."begin" IS NOT NULL and rol."end" IS NOT NULL
> >            THEN
> >              CASE WHEN TIMESTAMP 'now'>=rol."begin" and TIMESTAMP
> > 'now'<=rol."end"
> >              THEN 'Active'
> >              ELSE 'Inactive' END
> >            WHEN rol."begin" IS NOT NULL
> >            THEN
> >              CASE WHEN TIMESTAMP 'now'>=rol."begin"
> >              THEN 'Active'
> >              ELSE 'Inactive' END
> >            WHEN rol."end" IS NOT NULL
> >            THEN
> >              CASE WHEN TIMESTAMP 'now'<=rol."end"
> >              THEN 'Active'
> >              ELSE 'Inactive' END
> >            ELSE 'Active'
> >            END = 'Active'
>
> Aside #1 - I'm not entirely clear how role_id can be null since you
> seemed to be joining against it in the real query.
>
> Aside #2 - You're probably better off with CURRENT_DATE since begin/end
> seem to be dates, rather than TIMESTAMP 'now' - and in any case you
> wanted "timestamp with time zone"
>
> OK, I think the root of your problem is your use of null to mean "not
> ended" or "not started" (whatever 'not started' means). PostgreSQL has
> the handy timestamptz value "infinity", but only for timestamps and not
> for dates. I'd probably cheat a little and use an end date of
> '9999-12-31' or similar to simulate "infinity". Then your test is simply:
>
> WHERE
>    ...
>    AND (rol."begin" <= CURRENT_DATE AND rol."end" >= CURRENT_DATE)
>
> That should estimate simply enough.
>
> --
>    Richard Huxton
>    Archonet Ltd

--
Chris Kratz