Re: How can I make this query faster (resend) - Mailing list pgsql-performance
From | Cstdenis |
---|---|
Subject | Re: How can I make this query faster (resend) |
Date | |
Msg-id | 013801c6832d$1cf73d90$6401a8c0@chris Whole thread Raw |
In response to | How can I make this query faster (resend) ("Cstdenis" <cstdenis@voicio.com>) |
Responses |
Re: How can I make this query faster (resend)
|
List | pgsql-performance |
(Resending because my other send didn't get a CC to the list) From: "Jim C. Nasby" <jnasby@pervasive.com> To: "Cstdenis" <cstdenis@voicio.com> Cc: <pgsql-performance@postgresql.org> Sent: Monday, May 22, 2006 8:20 AM Subject: Re: [PERFORM] How can I make this query faster (resend) > On Fri, May 19, 2006 at 03:56:49PM -0700, Cstdenis wrote: > > (Its been a hour and I dont see my message on the list so I'm sending it again. I've moved the queries and analyze out of the email incase it was rejected because too long) > > > > query: http://pastebin.ca/57218 > > > > In the pictures table all the ratings have a shared index > > > > CREATE INDEX idx_rating ON pictures USING btree (rating_nudity, rating_violence, rating_sex, rating_racism, rating_spoilers, rating_yaoi, rating_yuri, rating_profanity); > > > > and approved and date_submitted and user_id also have their own btree indexes. > > > > In the picture_categories table pid and cat_id have their own btree indices plus one together. > > > > Full table definition: http://pastebin.ca/57219 > > > > the cat_id and rating values vary from query to query. The one listed above took 54 seconds in a test run just now. Here is explain analyze: http://pastebin.ca/57220 > > pictures is the interesting table here. It looks like the planner would > do better to choose something other than a nested loop on it. Try > running EXPLAIN ANALYZE on the query with enable_nestloop=off and see > what you get (you'll need to compare it to what you get with > enable_nestloop on to see what the change is). With enable_nestloop=off the same query as is explained further down in this email took much longer 63 seconds insted of 6. It decided to do sequencial scans on pictures and users with nested loop disabled. Merge Join (cost=146329.63..146963.96 rows=231 width=66) (actual time=61610.538..62749.176 rows=1305 loops=1) Merge Cond: ("outer".user_id = "inner".user_id) -> Sort (cost=123828.88..123829.46 rows=231 width=47) (actual time=60445.367..60451.176 rows=1305 loops=1) Sort Key: pictures.user_id -> Hash Join (cost=634.36..123819.81 rows=231 width=47) (actual time=128.088..60423.623 rows=1305 loops=1) Hash Cond: ("outer".pid = "inner".pid) -> Seq Scan on pictures (cost=0.00..121670.43 rows=302543 width=47) (actual time=0.210..58795.925 rows=291318 loops=1) -> Hash (cost=633.78..633.78 rows=231 width=4) (actual time=38.443..38.443 rows=1305 loops=1) -> Bitmap Heap Scan on picture_categories (cost=2.81..633.78 rows=231 width=4) (actual time=4.753..32.259 rows=1305 loops=1) Recheck Cond: (cat_id = 182) -> Bitmap Index Scan on idx_picture_categories_cat_id (cost=0.00..2.81 rows=231 width=0) (actual time=4.398..4.398 rows=1305 loops=1) Index Cond: (cat_id = 182) -> Sort (cost=22500.74..22816.79 rows=126418 width=23) (actual time=1163.788..1505.104 rows=52214 loops=1) Sort Key: users.user_id -> Seq Scan on users (cost=0.00..11788.18 rows=126418 width=23) (actual time=0.017..692.992 rows=54605 loops=1) Total runtime: 62776.720 ms > > Both pictures and picture categories have about 287,000 rows > > > > This query needs to run in under about a second or it kills my site by clogging apache slots (apache maxes out at 256 and I can have several hundred people on my site at a time). How can I make it run faster? > > > > > > Server is a dual xeon with a gig of ram dedicated mostly to postgresql. > > Here is the changed lines in my postgresql.conf: http://pastebin.ca/57222 > > I suspect the low work_mem may be why it's using a nested loop. In > addition to the test above, it would be interesting to see what happens > to the plan if you set work_mem to 10000. I moved to a more powerful server (2gb ram and mirrored scsi HDs) and upped the work mem to 10mb. Its much faster now, however its still doing a nested loop. (see also my reply to Markus Schaber) Nested Loop (cost=2.81..3398.76 rows=231 width=66) (actual time=14.946..5797.701 rows=1305 loops=1) -> Nested Loop (cost=2.81..2022.71 rows=231 width=47) (actual time=14.551..5181.042 rows=1305 loops=1) -> Bitmap Heap Scan on picture_categories (cost=2.81..633.78 rows=231 width=4) (actual time=9.966..140.606 rows=1305 loops=1) Recheck Cond: (cat_id = 182) -> Bitmap Index Scan on idx_picture_categories_cat_id (cost=0.00..2.81 rows=231 width=0) (actual time=9.720..9.720 rows=1305 loops=1) Index Cond: (cat_id = 182) -> Index Scan using pictures_pkey on pictures (cost=0.00..6.00 rows=1 width=47) (actual time=3.802..3.820 rows=1 loops=1305) Index Cond: (pictures.pid = "outer".pid) -> Index Scan using users_pkey on users (cost=0.00..5.94 rows=1 width=23) (actual time=0.095..0.100 rows=1 loops=1305) Index Cond: ("outer".user_id = users.user_id) Total runtime: 5812.238 ms > To be honest, you're pushing things expecting a machine with only 1G to > serve 300 active connections. How large is the database itself? The database is 3.7G on disk. There is about 1G of actual data in it -- the rest is dead tuples and indices. (I vacuum regularly, but a vacuum full causes too much downtime to do unless I have to) > > I know hyperthreading is considered something that can slow down a server but with my very high concurancy (averages about 400-500 concurant users during peak hours) I am hoping the extra virtual CPUs wil help. Anyone have experance that says diferent at high concurancy? > > Best bet is to try it and see. Generally, people find HT hurts, but I > recently saw it double the performance of pgbench on a windows XP > machine, so it's possible that windows is just more clever about how to > use it than linux is. Anyone know if those who have found it hurts are low concurancy complex cpu intensive queries or high concurancy simple queries or both? I can understand it hurting in the former, but not the later. I'll have to give it a try I guess. It should at least help my very high load averages. > -- > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > Pervasive Software http://pervasive.com work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
pgsql-performance by date: