Re: Help analyzing 7.2.4 EXPLAIN - Mailing list pgsql-performance

From Tom Lane
Subject Re: Help analyzing 7.2.4 EXPLAIN
Date
Msg-id 11860.1049935883@sss.pgh.pa.us
Whole thread Raw
In response to Help analyzing 7.2.4 EXPLAIN  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Help analyzing 7.2.4 EXPLAIN
Re: Help analyzing 7.2.4 EXPLAIN
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:
>  Becuase it's 7.2.4, it's a little hard to tell exactly which part of the
> query is taking up 90% of the processing time.

Keep in mind that in the subqueries, the "actual time" shown is the time
per iteration --- you should multiply by the "loops" value to get an
accurate idea of where the time is going.  With that in mind, it's real
clear that the first subplan is eating the bulk of the time.

I think you are probably right that execution of the
if_addendee_conflict() function is the main cost.  But
given this subquery that's not too surprising:

>      (SELECT if_addendee_conflict(users.user_id, 3272, '2003-04-15 10:00', '1 days'::INTERVAL,
>          events.event_id, events.event_date, events.duration, event_cats.status, '30 minutes') as cflt
>         FROM events, event_types, event_cats, event_days
>         WHERE events.event_id = event_days.event_id
>             and events.etype_id = event_types.etype_id
>              AND event_types.ecat_id = event_cats.ecat_id
>              AND event_days.event_day
>                  BETWEEN '2003-04-15' AND '2003-04-16 10:00'
>         ORDER BY cflt LIMIT 1)  AS conflict

What you have here is a subquery that will execute
if_addendee_conflict() for *each* row of the events table; then throw
away all but one of the results.  And then do that over again for each
user row.  It looks to me like if_addendee_conflict() is being called
nearly 4000 times in this query.  No wonder it's slow.

The first thing that pops to mind is whether you really need the *first*
conflict, or would it be enough to find any old conflict?  If you could
dispense with the ORDER BY then at least some evaluations of
if_addendee_conflict() could be saved.

Realistically, though, I think you're going to have to refactor the work
to make this perform reasonably.  How much of what
if_addendee_conflict() does is actually dependent on the user_id?  Could
you separate out tests that depend only on the event, and do that in a
separate pass that is done only once per event, instead once per
event*user?  If you could reduce the number of events that need to be
examined for any given user, you could get somewhere.

Also, I don't see where this query checks to see if the user is actually
interested in attending the event.  Is that one of the things
if_addendee_conflict checks?  If so, you should pull it out and make it
a join condition.  You're essentially forcing the stupidest possible
join algorithm by burying that condition inside a user-defined function.
It would win to check that sooner instead of later, since presumably the
set of interesting events for any one user is a lot smaller than the set
of all events.

            regards, tom lane


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Caching (was Re: choosing the right platform)
Next
From: "Matthew Nuzum"
Date:
Subject: Re: Caching (was Re: choosing the right platform)