Re: Poor OFFSET performance in PostgreSQL 9.1.6 - Mailing list pgsql-performance

From David Rowley
Subject Re: Poor OFFSET performance in PostgreSQL 9.1.6
Date
Msg-id CAApHDvrTpkK-6XUvKxWNKMw09mwW-xVbuw7WS-5Yti0NhNdy_w@mail.gmail.com
Whole thread Raw
In response to Poor OFFSET performance in PostgreSQL 9.1.6  (<fburgess@radiantblue.com>)
List pgsql-performance
On Thu, Aug 29, 2013 at 8:39 AM, <fburgess@radiantblue.com> wrote:
Can anyone offer suggestions on how I can optimize a query that contains the LIMIT OFFSET clause?

The explain plan of the query is included in the notepad attachment.

thanks


Before I write anything, I should warn that it has been a while since I had to read an explain analyze without also having a schema design to back it up, so I may have this wrong.

Since the sort must be applied before the limit and the join must take place before the sort, it's naturally pretty slow given the number of rows being processed.

It's hard to tell how the tables are constrained by looking at the explain analyze. It seems that labor_uid is the primary key for the labor_tasks table... (going by the 1 record that is returned. If so, then it's a bit confusing why you'd need to sort on labor1.start_time at all, since only 1 row exists anyway...

I got that from here:
Index Scan using corporate_labor_pkey on labor_tasks labor1_  (cost=0.00..4.27 rows=1 width=954) (actual time=0.017..0.020 rows=1 loops=1)                   Index Cond:  (labor_uid = 20178)

So this is the case then you could do all of your sorting before the join takes place by using a sub query. This would also mean you could do your LIMIT and OFFSET in the same sub query. If the tables in the sub query were properly indexed then this could be a a pretty fast operation which would only hit less than 1 million rows. I'm not quite sure how your partitioning is setup though as the names seem to indicate both dates and labour UIDs... so I'll just ignore these for now...
The query I came up with looked like this:

 SELECT * 
 FROM labor_tasks labor_1
 INNER JOIN (SELECT *
             FROM labor_task_report this_
             WHERE labor_UID = 20178
             ORDER BY work_DATE_TIME asc, work_UID asc, task_REPORT_UID 
             LIMIT 10000 OFFSET 940000
) this_ ON labor_1.labor_uid = this_.labor_uid
ORDER BY this_.work_DATE_TIME asc, this.work_UID asc, this.task_REPORT_UID 
If you had an index on labor_UID, work_date_time, work_UID, task_Report_uid then inner would have a better chance at running quickly.

Though I may have mis-read the explain analyze results and have it completely wrong.
Postgresql is a pretty amazing feat of engineering, but it does lack sometimes when it comes to properly optimising queries to the most optimal way by detecting functional dependencies in tables to speed up joins and grouping.
Regards
David


 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Poor OFFSET performance in PostgreSQL 9.1.6
Next
From: hubert depesz lubaczewski
Date:
Subject: Re: Poor OFFSET performance in PostgreSQL 9.1.6