Home > mailing lists

Query performance issue - Mailing list pgsql-performance

From	Jonathan Gray
Subject	Query performance issue
Date	July 24, 2007 05:12:07
Msg-id	0f1701c7cdc6$f9cc1a30$ed644e90$@com Whole thread
Responses	Re: Query performance issue
List	pgsql-performance

Tree view

We’re experiencing a query performance problem related to the planner and its ability to perform a specific type of merge.

We have created a test case (as attached, or here: http://www3.streamy.com/postgres/indextest.sql) which involves a hypothetical customer ordering system, with customers, orders, and customer groups.

If we want to retrieve a single customers 10 most recent orders, sorted by date, we can use a double index on (customer,date); Postgres’s query planner will use the double index with a backwards index scan on the second indexed column (date).

However, if we want to retrieve a “customer class’s” 10 most recent orders, sorted by date, we are not able to get Postgres to use double indexes.

We have come to the conclusion that the fastest way to accomplish this type of query is to merge, in sorted order, each customers set of orders (for which we can use the double index). Using a heap to merge these ordered lists (until we reach the limit) seems the most algorithmically efficient way we are able to find. This is implemented in the attachment as a pl/pythonu function.

Another less algorithmically efficient solution, but faster in practice for many cases, is to fetch the full limit of orders from each customer, sort these by date, and return up to the limit.

We are no masters of reading query plans, but for straight SQL queries the planner seems to yield two different types of plan. They are fast in certain cases but breakdown in our typical use cases, where the number of orders per customer is sparse compared to the total number of orders across the date range.

We are interested in whether a mechanism internal to Postgres can accomplish this type of merging of indexed columns in sorted order.

If this cannot currently be accomplished (or if there is something we are missing about why it shouldn’t be) we would appreciate any pointers to be able to translate our python heap approach into C functions integrated more closely with Postgres. The python function incurs large constant costs because of type conversions and repeated queries to the database.

Thanks for any help or direction.

Jonathan Gray / Miguel Simon

Attachment

indextest.sql

pgsql-performance by date:

From: andrew@pillette.com
Date: 24 July 2007, 04:57:57
Subject: Re: multicolumn index column order

From: Chris
Date: 24 July 2007, 05:42:03
Subject: Re: Query performance issue

Query performance issue - Mailing list pgsql-performance

Attachment

Previous

Next