Re: Optimizing execution of expensive subqueries - Mailing list pgsql-general

From David Rowley
Subject Re: Optimizing execution of expensive subqueries
Date
Msg-id CAKJS1f8eutt9TbEgrVg6KRXqpcE0FLxkYen+TgZZR+0yJYxL_A@mail.gmail.com
In response to Optimizing execution of expensive subqueries  (Mathieu Fenniak <mathieu.fenniak@replicon.com>)
List pgsql-general
On 12 July 2018 at 02:24, Mathieu Fenniak <mathieu.fenniak@replicon.com> wrote:
> I'm currently looking at a query that is generally selecting a bunch of
> simple columns from a table, and also performing some subqueries to
> aggregate related data, and then sorting by one of the simple columns and
> paginating the result.
>
> eg.
>
> SELECT
>   tbl.field1, tbl.field2, tbl.field3, ...,
>   (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = tbl.UserId
> AND anothertbl.ThingyId = 1) as Thingy1Sum,
>   ... repeat for multiply thingies ...
> FROM
>   tbl
> ORDER BY tbl.field1 LIMIT 20
>
> I'm finding that if "tbl" contains hundreds of thousands of rows, the
> subqueries are being executed hundreds of thousands of times.  Because of
> the sorting and pagination, this appears to be unnecessary, and the
> result is slow performance.  (PostgreSQL 9.5.9 server)

You've got two choices.

1) You can add a btree index on field1 so that the executor does not
need to examine all records before taking the top-20, or;
2) move the subquery out of the target list and instead make it a LEFT
JOIN adding an appropriate GROUP BY clause.
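For #1, the index is just a plain btree on the ORDER BY column. A sketch
using the column name from your example (adjust to the real one):

```sql
-- Assumes tbl.field1 is the ORDER BY column from your example query.
CREATE INDEX ON tbl (field1);
```

With that index the planner can walk the index in field1 order and stop
after 20 rows, so the subqueries run only for those 20 rows instead of
for every row in tbl.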

#2 might not be a great option since it may require building groups
that don't get used, but it would likely be the best option if you
didn't have a LIMIT clause, or the LIMIT was a larger percentage of
the total records.
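An untested sketch of the #2 rewrite, using the table and column names
from your example (the FILTER clause needs 9.4+, so it's available on
your 9.5 server):

```sql
SELECT
  tbl.field1, tbl.field2, tbl.field3, ...,
  -- One filtered aggregate per thingy replaces each correlated subquery.
  SUM(anothertbl.Duration) FILTER (WHERE anothertbl.ThingyId = 1) AS Thingy1Sum
FROM tbl
LEFT JOIN anothertbl ON anothertbl.UserId = tbl.UserId
GROUP BY tbl.field1, tbl.field2, tbl.field3, ...
ORDER BY tbl.field1
LIMIT 20;
```

Note this form aggregates for every tbl row before the LIMIT is applied,
which is why it mainly pays off without a LIMIT or with a large one.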

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

