Re: Best approach for query with optional constraints - Mailing list pgsql-general

From Kevin Grittner
Subject Re: Best approach for query with optional constraints
Date
Msg-id 1359396969.76236.YahooMailNeo@web162901.mail.bf1.yahoo.com
Whole thread Raw
In response to Best approach for query with optional constraints  (Jon Smark <jon.smark@yahoo.com>)
List pgsql-general
Jon Smark <jon.smark@yahoo.com> wrote:

> Here's the problem: I want to retrieve a list of bugs (possibly) matching
> certain constraints.  One possible constraint is a user ID: if given, only
> those bugs reported by the user will be returned.  Another constraint is
> a set of tags: only those bugs that contain *all* the tags in the set must
> be returned (if the set is empty, then all bugs match).  Moreover, the user
> and tags constraints are conjunctive: if both set, that means I want only
> the bugs by that user having also the given set of tags.
>
> Below is one possible solution using a CTE to fetch all bugs that match
> the tags constraint.  (Note that $IDENTIFIER will be replaced by the actual
> contents of the variable using prepared statements).

The best thing to do in such a situation is to mock up some data
and try.

I'll show some timings for your alternatives and one I thought of,
using one million rows in bugs and about 10 million rows in
bug_tags, with about on hundred thousand tags, randomly assigned.
Searches were for a user and three tags.  I eliminated the
apparently unneeded joins to tags.

> with tagged as
>   (select bid
>   from bug_tags inner join tags on bug_tags.tid = tags.tid
>   group by bid
>   having array_agg (tags.tag) @> $TAGS)
> select *
> from bugs
> where (($UID :: int4) is null or uid = $UID) and
> ($TAGS = (array[] :: text[]) or bid in (select bid from tagged))

About 1550 ms.

> The first alternative is shown below.  The basic principle is the same,
> except that instead of a CTE there is a subquery.  The idea is that the
> subquery will only be performed if the set of tags is non-empty.  Or is
> the query planner smart enough to never execute the CTE if this is the
> case?  If so, then both these solutions should be identical.
>
> select *
> from bugs
> where (($UID :: int4) is null or uid = $UID) and
> ($TAGS = (array[] :: text[]) or
> bid in (
>   select bid
>   from bug_tags inner join tags on bug_tags.tid = tags.tid
>   group by bid
>   having array_agg (tags.tag) @> $TAGS))

About 1550 ms.

> The second alternative is shown below.  This one uses a different approach:
> instead of constructing a set of bug IDs matching the tag constraints, it
> goes one by one through the bugs, verifying if each one matches the tag
> constraints.
>
> select *
> from bugs
> where (($UID :: int4) is null or uid = $UID) and
> ($TAGS = (array[] :: text[]) or
> $TAGSs <@ array (select tags.tag from bug_tags inner join tags on
> bug_tags.tid = tags.tid where bug_tags.bid = bugs.bid))

About 90 ms.

My idea:

SELECT *
  FROM bugs b
  WHERE (($UID :: int4) is null or uid = $UID)
    AND NOT EXISTS
          (
            SELECT * FROM (SELECT unnest($TAGSs)) t(tid)
              WHERE NOT EXISTS
                      (
                        SELECT * FROM bug_tags bt
                          WHERE bt.bid = b.bid
                            AND bt.tid = t.tid
                      )
          )

About 85 ms.

> To conclude: is it worth worrying about such details, or can I assume the
> query optimizer will automagically choose the best approach.

The optimizer gets smarter with every major release in terms of
recognizing logically equivalent plans, so that it can pick the one
with the lowest estimated cost; but there will always be constructs
which it does not recognize as logically equivalent either because
it would cost to much to test for equivalence or we just haven't
gotten to that yet.  So if you can see different ways to write a
query, it's not a bad idea to compare them on performance.

-Kevin


pgsql-general by date:

Previous
From: Steve Atkins
Date:
Subject: Re: Installing PostgreSQL on OSX Server
Next
From: Anson Abraham
Date:
Subject: Re: main.log file not being updated