a lot of problems with pg 7.4 - Mailing list pgsql-performance
From | Kari Lavikka |
---|---|
Subject | a lot of problems with pg 7.4 |
Date | |
Msg-id | Pine.HPX.4.51.0312132027230.9281@purple.bdb.fi Whole thread Raw |
Responses |
Re: a lot of problems with pg 7.4
Re: a lot of problems with pg 7.4 Re: a lot of problems with pg 7.4 |
List | pgsql-performance |
Hi! We have been running a rather busy website using pg 7.3 as the database. Peak hitrate is something like 120 request / second without images and other static stuff. The site is a sort of image gallery for IRC users. I evaluated pg 7.4 on our development server and it looked just fine but performance with production loads seems to be quite poor. Most of performance problems are caused by nonsensical query plans but there's also some strange slowness that I can't locate. I have included the essential tables, columns and indexes that participate to queries in this mail. table rows ------- ---- users 50k image 400k comment 17M Table "public.users" Column | Type | -------------+-----------------------------+ uid | integer | nick | character varying(40) | status | character(1) | Indexes: "users_pkey" primary key, btree (uid) "users_upper_nick" unique, btree (upper((nick)::text)) "users_status" btree (status) Table "public.image" Column | Type | ----------------------+-----------------------------+ image_id | integer | uid | integer | status | character(1) | Indexes: "image_pkey" primary key, btree (image_id) "image_uid_status" btree (uid, status) Table "public.comment" Column | Type | ------------+-----------------------------+ comment_id | integer | image_id | integer | uid_sender | integer | comment | character varying(255) | Indexes: "comment_pkey" primary key, btree (comment_id) "comment_image_id" btree (image_id) "comment_uid_sender" btree (uid_sender) Planner estimates the cost of nested loop to be much higher than hash join _although_ the other side of join consists of only one row (which is found using a unique index). Well, bad estimation. Difference in runtime is huge. galleria=# explain analyze SELECT i.image_id, i.info, i.stamp, i.status, i.t_width, i.t_height, u.nick, u.uid FROM usersu INNER JOIN image i ON i.uid = u.uid WHERE upper(u.nick) = upper('Intuitio') AND (i.status = 'd' OR i.status = 'v')AND u.status = 'a' ORDER BY status, stamp DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=21690.23..21694.22 rows=1595 width=64) (actual time=2015.615..2015.637 rows=35 loops=1) Sort Key: i.status, i.stamp -> Hash Join (cost=907.20..21605.38 rows=1595 width=64) (actual time=891.400..2015.464 rows=35 loops=1) Hash Cond: ("outer".uid = "inner".uid) -> Seq Scan on image i (cost=0.00..18207.19 rows=330005 width=54) (actual time=0.012..1607.278 rows=341086 loops=1) Filter: ((status = 'd'::bpchar) OR (status = 'v'::bpchar)) -> Hash (cost=906.67..906.67 rows=213 width=14) (actual time=0.128..0.128 rows=0 loops=1) -> Index Scan using users_upper_nick on users u (cost=0.00..906.67 rows=213 width=14) (actual time=0.120..0.122rows=1 loops=1) Index Cond: (upper((nick)::text) = 'INTUITIO'::text) Filter: (status = 'a'::bpchar) Total runtime: 2015.756 ms galleria=# set enable_hashjoin = false; SET galleria=# explain analyze SELECT i.image_id, i.info, i.stamp, i.status, i.t_width, i.t_height, u.nick, u.uid FROM usersu INNER JOIN image i ON i.uid = u.uid WHERE upper(u.nick) = upper('Intuitio') AND (i.status = 'd' OR i.status = 'v')AND u.status = 'a' ORDER BY status, stamp DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=31090.72..31094.71 rows=1595 width=64) (actual time=5.240..5.267 rows=35 loops=1) Sort Key: i.status, i.stamp -> Nested Loop (cost=0.00..31005.87 rows=1595 width=64) (actual time=4.474..5.082 rows=35 loops=1) -> Index Scan using users_upper_nick on users u (cost=0.00..906.67 rows=213 width=14) (actual time=3.902..3.906rows=1 loops=1) Index Cond: (upper((nick)::text) = 'INTUITIO'::text) Filter: (status = 'a'::bpchar) -> Index Scan using image_uid_status on image i (cost=0.00..141.03 rows=23 width=54) (actual time=0.537..0.961rows=35 loops=1) Index Cond: (i.uid = "outer".uid) Filter: ((status = 'd'::bpchar) OR (status = 'v'::bpchar)) Total runtime: 5.479 ms (10 rows) Is there anything to do for this besides forcing hashjoin off? I think there were similar problems with 7.3 Now something specific to 7.4. The following query selects all comments written to user's image. It worked just fine with pg 7.3 but there seems to be a Materialize in a bit strange place. galleria=# explain SELECT s.nick, c.comment, c.private, c.admin, c.parsable, c.uid_sender, c.stamp, i.image_id, c.comment_idFROM users s, comment c, image i WHERE s.uid = c.uid_sender AND s.status = 'a' AND c.visible = 'y' AND c.image_id= i.image_id AND i.image_id = 184239 ORDER BY c.comment_id DESC; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Sort (cost=1338.43..1339.41 rows=392 width=92) Sort Key: c.comment_id -> Nested Loop (cost=1308.41..1321.54 rows=392 width=92) -> Index Scan using image_pkey on image i (cost=0.00..5.29 rows=2 width=4) Index Cond: (image_id = 184239) -> Materialize (cost=1308.41..1310.37 rows=196 width=92) -> Nested Loop (cost=0.00..1308.41 rows=196 width=92) -> Index Scan using comment_image_id on "comment" c (cost=0.00..60.68 rows=207 width=82) Index Cond: (184239 = image_id) Filter: (visible = 'y'::bpchar) -> Index Scan using users_pkey on users s (cost=0.00..6.02 rows=1 width=14) Index Cond: (s.uid = "outer".uid_sender) Filter: (status = 'a'::bpchar) However, when the joins are written in a different style the plan seems to be just right. galleria=# explain SELECT u.nick, c.comment, c.private, c.admin, c.parsable, c.uid_sender, c.stamp, i.image_id, c.comment_idFROM image i INNER JOIN comment c ON c.image_id = i.image_id INNER JOIN users u ON u.uid = c.uid_sender WHEREc.visible = 'y' AND c.image_id = i.image_id AND i.image_id = 184239 ORDER BY c.comment_id DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------- Sort (cost=17.76..17.76 rows=1 width=92) Sort Key: c.comment_id -> Nested Loop (cost=0.00..17.75 rows=1 width=92) -> Nested Loop (cost=0.00..11.72 rows=1 width=82) -> Index Scan using image_pkey on image i (cost=0.00..5.29 rows=2 width=4) Index Cond: (image_id = 184239) -> Index Scan using comment_image_id on "comment" c (cost=0.00..3.20 rows=1 width=82) Index Cond: ((c.image_id = "outer".image_id) AND (184239 = c.image_id)) Filter: (visible = 'y'::bpchar) -> Index Scan using users_pkey on users u (cost=0.00..6.01 rows=1 width=14) Index Cond: (u.uid = "outer".uid_sender) (11 rows) I happened to look into this query when one of them got stuck. Normally postgres performs tens of these in a second, but after shutting down the web server one them was still running. I gathered some statistics and the runtime was something like half an hour! It was causing pretty much disk io but quite little cpu load. Don't know what it was doing... galleria=# select * from pg_stat_activity where current_query != '<IDLE>'; datid | datname | procpid | usesysid | usename | current_query | query_start -------+----------+---------+----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------- 17144 | galleria | 27849 | 100 | galleria | SELECT s.nick, c.comment, c.private, c.admin, c.parsable, c.uid_sender,c.stamp, i.image_id, c.comment_id FROM users s, comment c, image i WHERE s.uid = c.uid_sender AND s.status ='a' AND c.visible = 'y' AND c.image_id = i.image_id AND i.image_id = 95406 | 2003-12-08 19:15:10.218859+02 (1 row) PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command 29756 tuner 25 0 1212 1212 800 R 42.7 0.0 2:51.03 top 27849 postgres 15 0 783m 783m 780m D 6.2 20.0 0:55.86 postmaster procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 90912 551104 16180 3195612 0 0 2724 0 692 628 5 0 95 0 0 1 90912 548280 16192 3198464 0 0 2864 0 810 689 3 2 95 0 1 0 90912 545644 16192 3201068 0 0 2604 0 686 663 5 1 95 0 0 1 90912 542980 16192 3203712 0 0 2644 0 684 673 3 1 96 0 1 0 90912 540260 16220 3206480 0 0 2780 40 827 684 4 1 95 0 0 1 90912 537724 16224 3209032 0 0 2556 0 613 666 3 0 97 0 0 1 90912 534920 16224 3211840 0 0 2808 0 658 714 6 0 94 0 0 1 90912 532172 16224 3214596 0 0 2756 0 678 769 5 0 95 0 There's some other slowness with 7.4 as well. After running just fine for several hours pg starts to eat a LOT of cpu. Query plans are just like normally and pg_stat_activity shows nothing special. Disconnecting and reconnecting all clients seems to help (restarting web server). hw/sw configuration is something like this: Dual Xeon 2.8GHz with 4GB of memory. RAID is fast enuff. Linux 2.4 (Debian) Postgres is compiled using gcc 3.2, cflags: CFLAGS=-march=pentium4 -O3 -msse2 -mmmx |\__/| ( oo ) Kari Lavikka - tuner@bdb.fi - (050) 380 3808 __ooO( )Ooo_______ _____ ___ _ _ _ _ _ _ _ ""
pgsql-performance by date: