Re: Query performance issue - Mailing list pgsql-performance

From: Jonathan Gray
Subject: Re: Query performance issue
Msg-id: 0f3001c7cdd3$a7db7740$f79265c0$@com
In response to: Re: Query performance issue (Chris <dmagick@gmail.com>)
Responses: Re: Query performance issue
List: pgsql-performance
Chris,

Creating indexes on the customerclass table does speed up the queries, but still does not produce the plan we are looking for (using the double index with a backward index scan on the orders table). The plans we now get, with times on par with or slightly better than the plpgsql hack, are:

EXPLAIN ANALYZE
SELECT o.orderid, o.orderstamp
FROM indextest.orders o
INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
WHERE o.customerid = cc.customerid
ORDER BY o.orderstamp DESC LIMIT 5;

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..176.65 rows=5 width=12) (actual time=0.930..3.675 rows=5 loops=1)
   ->  Nested Loop  (cost=0.00..46388.80 rows=1313 width=12) (actual time=0.927..3.664 rows=5 loops=1)
         ->  Index Scan Backward using orders_orderstamp_idx on orders o  (cost=0.00..6225.26 rows=141001 width=16) (actual time=0.015..0.957 rows=433 loops=1)
         ->  Index Scan using customerclass_customerid_idx on customerclass cc  (cost=0.00..0.27 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=433)
               Index Cond: (o.customerid = cc.customerid)
               Filter: (classid = 2)

And:

EXPLAIN ANALYZE
SELECT o.orderid, o.orderstamp
FROM indextest.orders o
INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
WHERE o.customerid = cc.customerid
ORDER BY o.orderstamp DESC LIMIT 100;

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1978.80..1979.05 rows=100 width=12) (actual time=6.167..6.448 rows=100 loops=1)
   ->  Sort  (cost=1978.80..1982.09 rows=1313 width=12) (actual time=6.165..6.268 rows=100 loops=1)
         Sort Key: o.orderstamp
         ->  Nested Loop  (cost=3.99..1910.80 rows=1313 width=12) (actual time=0.059..4.576 rows=939 loops=1)
               ->  Bitmap Heap Scan on customerclass cc  (cost=3.99..55.16 rows=95 width=4) (actual time=0.045..0.194 rows=95 loops=1)
                     Recheck Cond: (classid = 2)
                     ->  Bitmap Index Scan on customerclass_classid_idx  (cost=0.00..3.96 rows=95 width=0) (actual time=0.032..0.032 rows=95 loops=1)
                           Index Cond: (classid = 2)
               ->  Index Scan using orders_customerid_idx on orders o  (cost=0.00..19.35 rows=15 width=16) (actual time=0.006..0.025 rows=10 loops=95)
                     Index Cond: (o.customerid = cc.customerid)

As I said, this is a hypothetical test case we have arrived at that describes our situation as well as we can in a simple case. We're interested in potential issues with the approach, why postgres would not attempt something like it, and how we might go about implementing it ourselves at a lower level than we currently have (in SPI, libpq, etc).

If it could be generalized, then we could use it in cases where we aren't pulling from just one table (the orders table) but rather trying to merge, in sorted order, results from different conditions on different tables. Right now we use something like the plpgsql or plpythonu functions in the example, and they outperform our regular SQL queries by a fairly significant margin. An example might be:

SELECT * FROM (
  (SELECT orderid, stamp FROM indextest.orders_usa
   WHERE customerid = <setof customerids>
   ORDER BY stamp DESC LIMIT 5)
  UNION
  (SELECT orderid, stamp FROM indextest.orders_can
   WHERE customerid = <setof customerids>
   ORDER BY stamp DESC LIMIT 5)
) AS x ORDER BY x.stamp DESC

Again, that's a general example, but some of my queries contain between 5 and 10 different sorted joins of this kind, and it would be helpful to have something internal in postgres to efficiently handle it (do something just like the above query but not have to do the full LIMIT 5 for each set; some kind of in-order merge/heap join?)
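The "in-order merge/heap join" asked for above is essentially a k-way merge of streams that are each already sorted descending, stopping once the outer LIMIT is satisfied. A minimal sketch of that idea in Python (the table names and rows are hypothetical stand-ins for per-table backward index scans, not anything PostgreSQL exposes):

```python
import heapq

# Each list stands in for one descending-sorted index scan
# (e.g. orders_usa, orders_can above). Rows are (orderstamp, orderid);
# all names and data here are made up for illustration.
orders_usa = [(50, "u1"), (40, "u2"), (10, "u3")]
orders_can = [(45, "c1"), (30, "c2"), (20, "c3")]

def top_n_merged(streams, n):
    """Lazily merge descending-sorted streams and stop after n rows,
    so no stream is scanned further than needed -- unlike running a
    full LIMIT in every branch and re-sorting the union."""
    merged = heapq.merge(*streams, reverse=True)  # preserves descending order
    return [row for _, row in zip(range(n), merged)]

print(top_n_merged([orders_usa, orders_can], 5))
```

Because `heapq.merge` is lazy, pulling 5 rows touches at most 5 rows plus one lookahead per stream, which is the behavior the plpgsql/plpythonu hack approximates.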
Jonathan Gray

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Chris
Sent: Tuesday, July 24, 2007 1:51 AM
To: Jonathan Gray
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance issue

Chris wrote:
> Jonathan Gray wrote:
>> We're experiencing a query performance problem related to the planner
>> and its ability to perform a specific type of merge.
>>
>> We have created a test case (as attached, or here:
>> http://www3.streamy.com/postgres/indextest.sql) which involves a
>> hypothetical customer ordering system, with customers, orders, and
>> customer groups.
>>
>> If we want to retrieve a single customer's 10 most recent orders,
>> sorted by date, we can use a double index on (customer, date);
>> Postgres's query planner will use the double index with a backward
>> index scan on the second indexed column (date).
>>
>> However, if we want to retrieve a "customer class's" 10 most recent
>> orders, sorted by date, we are not able to get Postgres to use double
>> indexes.
>
> You don't have any indexes on the 'customerclass' table.
>
> Creating a foreign key doesn't create an index; you need to do that
> separately.
>
> Try
>
> create index cc_customerid_class on indextest.customerclass(classid,
> customerid);

It could also be that since you don't have very much data (10,000 rows), postgres is ignoring the indexes because it'll be quicker to scan the tables. If you bump it up to, say, 100k rows, what happens?

--
Postgresql & php tutorials
http://www.designmagick.com/
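Chris's suggestion to grow the test tables before re-checking the plans can be done with generate_series. A hypothetical sketch, assuming the column names from the indextest.sql test case:

```sql
-- Hypothetical bulk load; column names assume the indextest schema.
INSERT INTO indextest.orders (customerid, orderstamp)
SELECT (random() * 10000)::int,
       now() - (random() * interval '365 days')
FROM generate_series(1, 100000);

-- Refresh statistics so the planner sees the new row count.
ANALYZE indextest.orders;
```

At small table sizes the planner can correctly prefer a sequential scan, so index-choice comparisons are only meaningful after loading enough rows and re-running ANALYZE.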