Re: SELECT too complex? - Mailing list pgsql-novice

From Josh Berkus
Subject Re: SELECT too complex?
Date
Msg-id 200306270835.32822.josh@agliodbs.com
Whole thread Raw
In response to SELECT too complex?  (Rory Campbell-Lange <rory@campbell-lange.net>)
Responses Re: SELECT too complex?
List pgsql-novice
Rory,

> I have a set of complex selects that I'm worried about from the point of
> view of 1) can't these be made simpler 2) performance 3) describing the
> selects to other developers in my documentation. One of the complex
> selects is shown below.

Keep in mind that (2) is not in sync with (1) or (3); I have frequently run
into situations where huge, messy-looking versions of queries run much faster
than simpler-looking queries that produce the same results.  i.e. "simpler"
to a human is not necessarity "simpler" to the planner.

> General comments on the nature of the select, how to optimise it or
> rewrite it, and how to document what it does would be most appreciated.

Sure.  My general experience with complex queries has been that:
1) The more complex a query, the more possible ways there are to write it.
2) With a query which can be written many ways, the best way to
"performance-tune" it is to try different query structures and keep
re-executing it against the back end until you get something you're happy
with (EXPLAIN ANALYZE can help)

As for simplifying, I'm afraid that I don't really understand what you're
trying to get with this query.  If I'm clueless, send me a version of the
query with each clause commented.

>     FOR resulter IN
>         SELECT
>             o.t_text_id                            as itemid,
>             pers.t_nickname                     as itemauthor,
>             o.n_type                            as itemtype,
>             o.t_name                            as itemtitle,
>             o.t_description                     as itembody,
>             to_char(o.dt_created,  'DD Mon'')   as itemcreated,
>             to_char(botimer, ''DD  on'')        as itemupdated,
>             COALESCE (c_count, 0)               as itemcommentcount,
>             p.n_id                              as imageid,
>             COALESCE(p.t_path, ''none'')        as imagesrc,
>             p.n_width                           as imagewidth,
>             p.n_height                          as imageheight
>         FROM
>             people pers,
>             objects o
>             LEFT OUTER JOIN photo p ON o.n_id_photo = p.n_id
>             LEFT OUTER JOIN (
>                 SELECT
>                     bo.n_object_id as boider,
>                     max(bo.dt_modified) as botimer
>                 FROM
>                     board_objects bo, board_object_members bom
>                 WHERE
>                     bom.n_person_id = personid
>                     AND
>                     bom.n_board_object_id = bo.n_id
>                 GROUP BY
>                     bo.n_object_id
>             ) as timer

You have a lot of left outer joins on aggregate subselects in this query.
try putting the subselect in the SELECT clause, and using ORDER BY ____ DESC
LIMIT 1 instead of MAX
Also, I fail to see the reason you need to left outer join on Board_Objects,
when as subselect to board_objects is in your WHERE criteria.  Presumably, if
there are no records in board_objects, you'll lose the row from person
anyway.
What about doing a straight join to board_objects and simplifing your outer
joins thereby?

>             ON o.n_id = timer.boider
>             LEFT OUTER JOIN (
>                 SELECT
>                     bo.n_object_id as c_bo_id,
>                     count(comm.n_id) as c_count
>                 FROM
>                     board_objects bo, comments comm
>                 WHERE
>                     comm.n_board_object_id = bo.n_id
>                     AND
>                     bo.n_board_id IN

You *definitely* don't want to use IN in this subquery, unless you're on 7.4.
Use EXISTS instead.  I'm also unclear on why you need this IN subquery at all.
You're checking board_objects against itself; why not just put it in the main
subselect?

>                         (SELECT
>                             n_board_id
>                         FROM
>                             board_objects
>                         WHERE
>                             n_object_id = persobjid
>                         )
>                 GROUP BY
>                     bo.n_object_id, bo.n_object_id
>                 ) as counter
>                 ON o.n_id = counter.c_bo_id
>         WHERE
>             o.n_creator = pers.n_id
>             AND
>             o.n_id IN (

Once again, through 7.3, use EXISTS, not IN.

>                 SELECT DISTINCT ON (bo.dt_modified, bo.n_object_id)
>                     bo.n_object_id as bo_obid

Why do you need a SELECT DISTINCT if you're just checking for existence?

>                 FROM
>                     board_objects bo, board_object_members bom
>                 WHERE
>                     bo.n_id = bom.n_board_object_id
>                     AND
>                     bom.n_person_id = personid
>             )
>         ORDER BY
>             botimer DESC
>         LIMIT
>             locallimit
>         OFFSET
>             localoffset LOOP

Overall, yeah, I think you're making the query too complex.  You have 3
subselects consulting the same table; surely at least 2 of these could be
consolidated.


--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-novice by date:

Previous
From: "Chad Thompson"
Date:
Subject: Re: Fw: Removing foreign key and adding sequence
Next
From: Tom Lane
Date:
Subject: Re: Which GRANTS are needed on a Sequnce?