Odd sorting behaviour - Mailing list pgsql-performance
From | Steinar H. Gunderson |
---|---|
Subject | Odd sorting behaviour |
Date | |
Msg-id | 20040708101913.GA15871@uio.no Whole thread Raw |
Responses |
Re: Odd sorting behaviour
Re: Odd sorting behaviour |
List | pgsql-performance |
[Apologies if this reaches the list twice -- I sent a copy before subscribing, but it seems to be stuck waiting for listmaster forever, so I subscribed and sent it again.] Hi, I'm trying to find out why one of my queries is so slow -- I'm primarily using PostgreSQL 7.2 (Debian stable), but I don't really get much better performance with 7.4 (Debian unstable). My prototype table looks like this: CREATE TABLE opinions ( prodid INTEGER NOT NULL, uid INTEGER NOT NULL, opinion INTEGER NOT NULL, PRIMARY KEY ( prodid, uid ) ); In addition, there are separate indexes on prodid and uid. I've run VACUUM ANALYZE before all queries, and they are repeatable. (If anybody needs the data, that could be arranged -- it's not secret or anything :-) ) My query looks like this: EXPLAIN ANALYZE SELECT o3.prodid, SUM(o3.opinion*o12.correlation) AS total_correlation FROM opinions o3 RIGHT JOIN ( SELECT o2.uid, SUM(o1.opinion*o2.opinion)/SQRT(count(*)+0.0) AS correlation FROM opinions o1 LEFT JOIN opinions o2 ON o1.prodid=o2.prodid WHERE o1.uid=1355 GROUP BY o2.uid ) o12 ON o3.uid=o12.uid LEFT JOIN ( SELECT o4.prodid, COUNT(*) as num_my_comments FROM opinions o4 WHERE o4.uid=1355 GROUP BY o4.prodid ) nmc ON o3.prodid=nmc.prodid WHERE nmc.num_my_comments IS NULL AND o3.opinion<>0 AND o12.correlation<>0 GROUP BY o3.prodid ORDER BY total_correlation desc; And produces the query plan at http://www.samfundet.no/~sesse/queryplan.txt (The lines were a bit too long to include in an e-mail :-) ) Note that the "o3.opinion<>0 AND o12.correleation<>0" lines are an optimization; I can run the query fine without them and it will produce the same results, but it goes slower both in 7.2 and 7.4. There are a few oddities here: - The "subquery scan o12" phase outputs 1186 rows, yet 83792 are sorted. Where do the other ~82000 rows come from? And why would it take ~100ms to sort the rows at all? (In earlier tests, this was _one full second_ but somehow that seems to have improved, yet without really improving the overall query time. shared_buffers is 4096 and sort_mem is 16384, so it should really fit into RAM.) - Why does it use uid_index for an index scan on the table, when it obviously has no filter on it (since it returns all the rows)? Furthermore, why would this take half a second? (The machine is a 950MHz machine with SCSI disks.) - Also, the outer sort (the sorting of the 58792 rows from the merge join) is slow. :-) 7.4 isn't really much better: http://www.samfundet.no/~sesse/queryplan74.txt Note that this is run on a machine with almost twice the speed (in terms of CPU speed, at least). The same oddities are mostly present (such as o12 returning 1186 rows, but 58788 rows are sorted), so I really don't understand what's going on here. Any ideas on how to improve this? /* Steinar */ -- Homepage: http://www.sesse.net/
pgsql-performance by date: