Very strange query difference between 7.3.6 and 7.4.6 (7.3.6 kicking 7.4.6 butt) - Mailing list pgsql-hackers

From Joshua D. Drake
Subject Very strange query difference between 7.3.6 and 7.4.6 (7.3.6 kicking 7.4.6 butt)
Date
Msg-id 423C6BD4.6010001@commandprompt.com
Whole thread Raw
Responses Re: Very strange query difference between 7.3.6 and 7.4.6 (7.3.6 kicking 7.4.6 butt)
List pgsql-hackers
Specs:

7.3.6 machine Dual Athlon MP
   2 GIG of ram,
   4 Drive IDE (3ware) RAID 10
   OS FC1 with 2.4 kernel

7.4.6 machine Dual Opteron MP (64bit PostgreSQL),
   2 Gig of Ram
   10 Drive RAID 10 with 128 Meg battery backed cache (3WARE).
   OS FC3 x86_64 with 2.6 kernel

Essentials parameters:

7.3.6:

shared_buffers = 8192
wal_buffers = 2048
sort_mem = 4096
checkpoint_segments = 25
effective_cache_size = 65536
random_page_cost = 1.5
statistics_target = 150

7.4.6
Same as above except 8192 sort mem and 50 checkpoint segments

Both are running fsync with open_sync

Both have been vacuumed and analyze repeatedly while trying to figure
this out.

Explain Analyzes:

7.3.6 (old)

 Aggregate  (cost=320.49..324.89 rows=7 width=338) (actual time=630.21..630.21 rows=1 loops=1)
   ->  Group  (cost=320.49..324.71 rows=70 width=338) (actual time=447.98..623.91 rows=8845 loops=1)
         ->  Sort  (cost=320.49..320.67 rows=70 width=338) (actual time=447.95..460.77 rows=8845 loops=1)
               Sort Key: p.post_id, t.topic_id, t.topic_title, t.topic_status, t.topic_replies, t.topic_time,
t.topic_type,t.topic_vote, t.topic_last_post_id, f.forum_name, f.forum_status, f.forum_id, f.auth_view, f.auth_read,
f.auth_post,f.auth_reply, f.auth_edit, f.auth_delete, f.auth_sticky, f.auth_announce, f.auth_pollcreate, f.auth_vote,
f.auth_attachments
               ->  Nested Loop  (cost=0.00..318.33 rows=70 width=338) (actual time=0.41..109.19 rows=8845 loops=1)
                     ->  Nested Loop  (cost=0.00..17.55 rows=1 width=330) (actual time=0.11..0.34 rows=1 loops=1)
                           ->  Nested Loop  (cost=0.00..6.75 rows=1 width=291) (actual time=0.08..0.12 rows=1 loops=1)
                                 ->  Index Scan using nuke_bbposts_pkey on nuke_bbposts p  (cost=0.00..3.21 rows=1
width=8)(actual time=0.04..0.05 rows=1 loops=1) 
                                       Index Cond: (post_id = 352888)
                                 ->  Index Scan using nuke_bbtopics_pkey on nuke_bbtopics t  (cost=0.00..3.53 rows=1
width=283)(actual time=0.02..0.05 rows=1 loops=1) 
                                       Index Cond: (t.topic_id = "outer".topic_id)
                           ->  Index Scan using nuke_bbforums_pkey on nuke_bbforums f  (cost=0.00..10.78 rows=1
width=39)(actual time=0.02..0.21 rows=1 loops=1) 
                                 Index Cond: (f.forum_id = "outer".forum_id)
                     ->  Index Scan using topic_id_nuke_bbposts_index on nuke_bbposts p2  (cost=0.00..299.33 rows=117
width=8)(actual time=0.29..31.68 rows=8845 loops=1) 
                           Index Cond: (p2.topic_id = "outer".topic_id)
                           Filter: (post_id <= 352888)
 Total runtime: 633.72 msec
(17 rows)

7.4.6 (new)

GroupAggregate  (cost=209.11..213.73 rows=71 width=328) (actual time=3701.837..3701.837 rows=1 loops=1)
   ->  Sort  (cost=209.11..209.29 rows=71 width=328) (actual time=2725.518..2728.590 rows=8845 loops=1)
         Sort Key: p.post_id, t.topic_id, t.topic_title, t.topic_status, t.topic_replies, t.topic_time, t.topic_type,
t.topic_vote,t.topic_last_post_id, f.forum_name, f.forum_status, f.forum_id, f.auth_view, f.auth_read, f.auth_post,
f.auth_reply,f.auth_edit, f.auth_delete, f.auth_sticky, f.auth_announce, f.auth_pollcreate, f.auth_vote,
f.auth_attachments
         ->  Nested Loop  (cost=0.00..206.93 rows=71 width=328) (actual time=0.621..190.112 rows=8845 loops=1)
               ->  Nested Loop  (cost=0.00..9.04 rows=1 width=328) (actual time=0.347..0.365 rows=1 loops=1)
                     ->  Nested Loop  (cost=0.00..6.04 rows=1 width=291) (actual time=0.298..0.307 rows=1 loops=1)
                           ->  Index Scan using nuke_bbposts_pkey on nuke_bbposts p  (cost=0.00..3.01 rows=1 width=8)
(actualtime=0.209..0.212 rows=1 loops=1) 
                                 Index Cond: (post_id = 352888)
                           ->  Index Scan using nuke_bbtopics_pkey on nuke_bbtopics t  (cost=0.00..3.01 rows=1
width=283)(actual time=0.069..0.073 rows=1 loops=1) 
                                 Index Cond: (t.topic_id = "outer".topic_id)
                     ->  Index Scan using nuke_bbforums_pkey on nuke_bbforums f  (cost=0.00..2.99 rows=1 width=39)
(actualtime=0.028..0.035 rows=1 loops=1) 
                           Index Cond: (f.forum_id = "outer".forum_id)
               ->  Index Scan using topic_id_nuke_bbposts_index on nuke_bbposts p2  (cost=0.00..196.46 rows=114
width=8)(actual time=0.256..95.501 rows=8845 loops=1) 
                     Index Cond: (p2.topic_id = "outer".topic_id)
                     Filter: (post_id <= 352888)
 Total runtime: 3728.376 ms
(16 rows)


If you look at the second line in each explain it is the sort that is
causing the grief. On 7.3.6 it only takes say 447ms (on an completely
unused machine), on the Opteron it takes 2725.518.

The query on the opteron even after a fresh restart of apache and
PostgreSQL takes at least 1100 ms.

Other 7.4.6 information:

[root@www contrib]# mpstat
Linux 2.6.10-1.770_FC3smp (www.radioparadise.com)       03/19/2005

10:05:59 AM  CPU   %user   %nice %system %iowait    %irq   %soft
%idle    intr/s
10:05:59 AM  all    4.48    0.00    0.79    2.96    0.01    0.04
91.72   1112.02


[root@www contrib]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
id wa
 1  0   3560 230524  31196 1599572    0    0    10   189    3     3  4
1 92  3


[root@www contrib]# iostat
Linux 2.6.10-1.770_FC3smp (www.radioparadise.com)       03/19/2005

avg-cpu:  %user   %nice    %sys %iowait   %idle
           4.48    0.00    0.84    2.96   91.72

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              29.72        38.28       757.12   49022777  969575882




The database is identical in the sense of schema (direct dump from 7.3.6
to 7.4.6).
Both were initalized with initdb --no-locale .

Any ideas?

Sincerely,

Joshua D. Drake




--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: GUC variable for setting number of local buffers
Next
From: Tom Lane
Date:
Subject: Re: Very strange query difference between 7.3.6 and 7.4.6 (7.3.6 kicking 7.4.6 butt)