Thread: SELECT too complex?

SELECT too complex?

From
Rory Campbell-Lange
Date:
This is a rather ill-defined enquiry. The main reason behind it is to
find out if I'm going down the right path as a Postgres newbie.

I have a set of complex selects that I'm worried about from the point of
view of 1) can't these be made simpler 2) performance 3) describing the
selects to other developers in my documentation. One of the complex
selects is shown below.

The database schema consists of objects, which are shared on boards by
being recorded in board_objects. Individuals can view things on a board
they themselves are joined to (people are like objects). People are
joined to specific board_objects as board_object_members, or may comment
on specific board_objects. It is relatively straightforward doing a
query on a board_object and discerning what object it refers to and what
comments it has aggregated against it.

The complex query below provides a list of objects which a person has
commented on or joined, garnering summaries of the comments on those
objects across boards the user is joined to, and sorting the objects on
the basis of the last updated board_object instance of that object
which, again, the user has access to.

I realise life would be easier if there was more duplicated data.

General comments on the nature of the select, how to optimise it or
rewrite it, and how to document what it does would be most appreciated.

Rory

A rough sketch of the main tables:

           +----------+                   +----------+
           |  people  |   +-----------+   |  boards  |
           +----+-----+   |  objects  |   +----+-----+
                |         +-----+-----+        |
                |               |              |
           +----+---------------+--------------+-----+
           |              board   objects            |
           + ---------+--------------------+---------+
                      |                    |
              +-------+-------+     +------+-------+
              |  obj members  |     |   comments   |
              +---------------+     +--------------+

Example for a personal view as described above: (the query runs within a
function):

    FOR resulter IN
        SELECT
            o.t_text_id                            as itemid,
            pers.t_nickname                     as itemauthor,
            o.n_type                            as itemtype,
            o.t_name                            as itemtitle,
            o.t_description                     as itembody,
            to_char(o.dt_created,  'DD Mon'')   as itemcreated,
            to_char(botimer, ''DD  on'')        as itemupdated,
            COALESCE (c_count, 0)               as itemcommentcount,
            p.n_id                              as imageid,
            COALESCE(p.t_path, ''none'')        as imagesrc,
            p.n_width                           as imagewidth,
            p.n_height                          as imageheight
        FROM
            people pers,
            objects o
            LEFT OUTER JOIN photo p ON o.n_id_photo = p.n_id
            LEFT OUTER JOIN (
                SELECT
                    bo.n_object_id as boider,
                    max(bo.dt_modified) as botimer
                FROM
                    board_objects bo, board_object_members bom
                WHERE
                    bom.n_person_id = personid
                    AND
                    bom.n_board_object_id = bo.n_id
                GROUP BY
                    bo.n_object_id
            ) as timer
            ON o.n_id = timer.boider
            LEFT OUTER JOIN (
                SELECT
                    bo.n_object_id as c_bo_id,
                    count(comm.n_id) as c_count
                FROM
                    board_objects bo, comments comm
                WHERE
                    comm.n_board_object_id = bo.n_id
                    AND
                    bo.n_board_id IN
                        (SELECT
                            n_board_id
                        FROM
                            board_objects
                        WHERE
                            n_object_id = persobjid
                        )
                GROUP BY
                    bo.n_object_id, bo.n_object_id
                ) as counter
                ON o.n_id = counter.c_bo_id
        WHERE
            o.n_creator = pers.n_id
            AND
            o.n_id IN (
                SELECT DISTINCT ON (bo.dt_modified, bo.n_object_id)
                    bo.n_object_id as bo_obid
                FROM
                    board_objects bo, board_object_members bom
                WHERE
                    bo.n_id = bom.n_board_object_id
                    AND
                    bom.n_person_id = personid
            )
        ORDER BY
            botimer DESC
        LIMIT
            locallimit
        OFFSET
            localoffset LOOP

        RETURN NEXT
            resulter;

    END LOOP;

--
Rory Campbell-Lange
<rory@campbell-lange.net>
<www.campbell-lange.net>

Re: SELECT too complex?

From
Josh Berkus
Date:
Rory,

> I have a set of complex selects that I'm worried about from the point of
> view of 1) can't these be made simpler 2) performance 3) describing the
> selects to other developers in my documentation. One of the complex
> selects is shown below.

Keep in mind that (2) is not in sync with (1) or (3); I have frequently run
into situations where huge, messy-looking versions of queries run much faster
than simpler-looking queries that produce the same results.  i.e. "simpler"
to a human is not necessarity "simpler" to the planner.

> General comments on the nature of the select, how to optimise it or
> rewrite it, and how to document what it does would be most appreciated.

Sure.  My general experience with complex queries has been that:
1) The more complex a query, the more possible ways there are to write it.
2) With a query which can be written many ways, the best way to
"performance-tune" it is to try different query structures and keep
re-executing it against the back end until you get something you're happy
with (EXPLAIN ANALYZE can help)

As for simplifying, I'm afraid that I don't really understand what you're
trying to get with this query.  If I'm clueless, send me a version of the
query with each clause commented.

>     FOR resulter IN
>         SELECT
>             o.t_text_id                            as itemid,
>             pers.t_nickname                     as itemauthor,
>             o.n_type                            as itemtype,
>             o.t_name                            as itemtitle,
>             o.t_description                     as itembody,
>             to_char(o.dt_created,  'DD Mon'')   as itemcreated,
>             to_char(botimer, ''DD  on'')        as itemupdated,
>             COALESCE (c_count, 0)               as itemcommentcount,
>             p.n_id                              as imageid,
>             COALESCE(p.t_path, ''none'')        as imagesrc,
>             p.n_width                           as imagewidth,
>             p.n_height                          as imageheight
>         FROM
>             people pers,
>             objects o
>             LEFT OUTER JOIN photo p ON o.n_id_photo = p.n_id
>             LEFT OUTER JOIN (
>                 SELECT
>                     bo.n_object_id as boider,
>                     max(bo.dt_modified) as botimer
>                 FROM
>                     board_objects bo, board_object_members bom
>                 WHERE
>                     bom.n_person_id = personid
>                     AND
>                     bom.n_board_object_id = bo.n_id
>                 GROUP BY
>                     bo.n_object_id
>             ) as timer

You have a lot of left outer joins on aggregate subselects in this query.
try putting the subselect in the SELECT clause, and using ORDER BY ____ DESC
LIMIT 1 instead of MAX
Also, I fail to see the reason you need to left outer join on Board_Objects,
when as subselect to board_objects is in your WHERE criteria.  Presumably, if
there are no records in board_objects, you'll lose the row from person
anyway.
What about doing a straight join to board_objects and simplifing your outer
joins thereby?

>             ON o.n_id = timer.boider
>             LEFT OUTER JOIN (
>                 SELECT
>                     bo.n_object_id as c_bo_id,
>                     count(comm.n_id) as c_count
>                 FROM
>                     board_objects bo, comments comm
>                 WHERE
>                     comm.n_board_object_id = bo.n_id
>                     AND
>                     bo.n_board_id IN

You *definitely* don't want to use IN in this subquery, unless you're on 7.4.
Use EXISTS instead.  I'm also unclear on why you need this IN subquery at all.
You're checking board_objects against itself; why not just put it in the main
subselect?

>                         (SELECT
>                             n_board_id
>                         FROM
>                             board_objects
>                         WHERE
>                             n_object_id = persobjid
>                         )
>                 GROUP BY
>                     bo.n_object_id, bo.n_object_id
>                 ) as counter
>                 ON o.n_id = counter.c_bo_id
>         WHERE
>             o.n_creator = pers.n_id
>             AND
>             o.n_id IN (

Once again, through 7.3, use EXISTS, not IN.

>                 SELECT DISTINCT ON (bo.dt_modified, bo.n_object_id)
>                     bo.n_object_id as bo_obid

Why do you need a SELECT DISTINCT if you're just checking for existence?

>                 FROM
>                     board_objects bo, board_object_members bom
>                 WHERE
>                     bo.n_id = bom.n_board_object_id
>                     AND
>                     bom.n_person_id = personid
>             )
>         ORDER BY
>             botimer DESC
>         LIMIT
>             locallimit
>         OFFSET
>             localoffset LOOP

Overall, yeah, I think you're making the query too complex.  You have 3
subselects consulting the same table; surely at least 2 of these could be
consolidated.


--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: SELECT too complex?

From
Rory Campbell-Lange
Date:
Hi Josh. Thanks very much for your input.

On 27/06/03, Josh Berkus (josh@agliodbs.com) wrote:

> As for simplifying, I'm afraid that I don't really understand what
> you're trying to get with this query.  If I'm clueless, send me a
> version of the query with each clause commented.

The aim of the select is to find all objects which a person has joined.
Objects are shared on boards, recorded in board_objects. People join
objects as board_object_members, which point to instances of
board_objects. So, an object can show up on one or more boards. One can
be a member of the object on one or more boards. One may also not have
the ability to "see" a board_object record of an object because one
isn't joined to that particular board.

So I am also an object and my membership of a board depends on whether
or not a board_objects record exists for that board. (See ***)

Mmm. Clear as mud.

> >     FOR resulter IN
> >         SELECT
> >             o.t_text_id                            as itemid,
> >             pers.t_nickname                     as itemauthor,
> >             o.n_type                            as itemtype,
> >             o.t_name                            as itemtitle,
> >             o.t_description                     as itembody,
> >             to_char(o.dt_created,  'DD Mon'')   as itemcreated,
> >             to_char(botimer, ''DD  on'')        as itemupdated,
> >             COALESCE (c_count, 0)               as itemcommentcount,
> >             p.n_id                              as imageid,
> >             COALESCE(p.t_path, ''none'')        as imagesrc,
> >             p.n_width                           as imagewidth,
> >             p.n_height                          as imageheight
> >         FROM
> >             people pers,
> >             objects o
> >             LEFT OUTER JOIN photo p ON o.n_id_photo = p.n_id
                            -- join photo on objects. There may or may
                            -- not be a photo.
> >             LEFT OUTER JOIN (
> >                 SELECT
> >                     bo.n_object_id as boider,
> >                     max(bo.dt_modified) as botimer
> >                 FROM
> >                     board_objects bo, board_object_members bom
> >                 WHERE
> >                     bom.n_person_id = personid ***
> >                     AND
> >                     bom.n_board_object_id = bo.n_id
> >                 GROUP BY
> >                     bo.n_object_id
> >             ) as timer

                -- here I grab the most recently updated instance of a
                -- board_ object to which I am attached. It is important to
                -- grab the latest as I may be attached to the same object
                -- on several boards, each which may have different
                -- update times.
                -- personid is my object id.
>
> You have a lot of left outer joins on aggregate subselects in this query.
> try putting the subselect in the SELECT clause, and using ORDER BY ____ DESC
> LIMIT 1 instead of MAX.

    OK. Sounds sensible.

> Also, I fail to see the reason you need to left outer join on Board_Objects,
> when as subselect to board_objects is in your WHERE criteria.  Presumably, if
> there are no records in board_objects, you'll lose the row from person
> anyway.
> What about doing a straight join to board_objects and simplifing your outer
> joins thereby?

    Good point! Excellent analysis; thanks. Presumably reducing outer
    joins will in theory reduce the query time?
>
> >             ON o.n_id = timer.boider
> >             LEFT OUTER JOIN (
> >                 SELECT
> >                     bo.n_object_id as c_bo_id,
> >                     count(comm.n_id) as c_count
> >                 FROM
> >                     board_objects bo, comments comm
> >                 WHERE
> >                     comm.n_board_object_id = bo.n_id
> >                     AND
> >                     bo.n_board_id IN
>
> You *definitely* don't want to use IN in this subquery, unless you're on 7.4.
> Use EXISTS instead.  I'm also unclear on why you need this IN subquery at all.
> You're checking board_objects against itself; why not just put it in the main
> subselect?

                         Mmm. This bit of the query grabs the count of
                         all comments on all versions of the object that
                         I'm attached to. It disregards comments on
                         board_objects I'm not joined to.

                         To paraphrase:
                         Get count of comments on board_objects where
                         board_objects.object_id is this object id BUT
                         only get objects on boards where I'm a member.

                         I haven't really used EXISTS. Makes sense,
                         thanks.

>
> >                         (SELECT
> >                             n_board_id
> >                         FROM
> >                             board_objects
> >                         WHERE
> >                             n_object_id = persobjid
> >                         )
> >                 GROUP BY
> >                     bo.n_object_id, bo.n_object_id
> >                 ) as counter
> >                 ON o.n_id = counter.c_bo_id
> >         WHERE
> >             o.n_creator = pers.n_id
> >             AND
> >             o.n_id IN (
>
> Once again, through 7.3, use EXISTS, not IN.

Yup.
>
> >                 SELECT DISTINCT ON (bo.dt_modified, bo.n_object_id)
> >                     bo.n_object_id as bo_obid
>
> Why do you need a SELECT DISTINCT if you're just checking for existence?

I found that a normal query was finding more copies of an object because
an object can be reflected several times in board_objects.
>
> >                 FROM
> >                     board_objects bo, board_object_members bom
> >                 WHERE
> >                     bo.n_id = bom.n_board_object_id
> >                     AND
> >                     bom.n_person_id = personid
> >             )
> >         ORDER BY
> >             botimer DESC
> >         LIMIT
> >             locallimit
> >         OFFSET
> >             localoffset LOOP
>
> Overall, yeah, I think you're making the query too complex.  You have 3
> subselects consulting the same table; surely at least 2 of these could be
> consolidated.

Going through it I'm not sure the aims of the query subselects are all
that wrong (i.e. most recent object I'm attached to; sum of comments on
this object that I'm attached to but only in the context of the boards I
belong to; ensuring that I'm WHEREing only objects that I can access).
Although I could be horribly wrong :-).

However your analysis has made me think much more deeply about how the
joins work and whether or not I need left outer joins. More than that,
the difficulty I've had explaining the database structure to you means
that I should carefully reconsider it.

Many thanks for the time you've taken to answer my question.

Rory
--
Rory Campbell-Lange
<rory@campbell-lange.net>
<www.campbell-lange.net>

Re: SELECT too complex?

From
Josh Berkus
Date:
Rory,

> However your analysis has made me think much more deeply about how the
> joins work and whether or not I need left outer joins. More than that,
> the difficulty I've had explaining the database structure to you means
> that I should carefully reconsider it.

Sure.  Although sometimes a very good, well-normalized structure is impossible
to explain without a clearer diagram than you can do in ASCII -- and without
a clear understanding of what the database is for.

--
Josh Berkus
Aglio Database Solutions
San Francisco