Thread: strange performance regression between 7.4 and 8.1
Hello,

I have noticed a strange performance regression and I'm at a loss as to what's happening. We have a fairly large database (~16 GB). The original postgres 7.4 was running on a Sun V880 with 4 CPUs and 8 GB of RAM, running Solaris on local SCSI discs. The new server is a Sun Opteron box with 4 cores and 8 GB of RAM, running postgres 8.1.4 on Linux (AMD64) on a 4 Gbps FC SAN volume. When we created the new database it was created from scratch rather than copying over the old one; however, the table structure is almost identical (UTF8 encoding on the new one vs. C on the old). The problem is that queries are ~10x slower on the new hardware. I have read in several places that the SAN might be to blame, but testing with bonnie and dd indicates that the SAN is actually almost twice as fast as the SCSI discs in the old Sun server. I've tried adjusting just about every option in the postgres config file, but performance remains the same. Any ideas?

Thanks,

Alex
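[For reference, a rough sketch of the kind of sequential-throughput check dd gives; the path and sizes below are placeholders, not the actual commands used in this thread, and bonnie measures the same thing more thoroughly.]

```shell
# Crude sequential write/read check with dd. Point TESTFILE at a scratch
# location on the volume under test (e.g. the SAN mount) to compare it
# against local discs. conv=fsync forces the write to reach the device.
TESTFILE="${TESTFILE:-/tmp/dd-seq-test}"
dd if=/dev/zero of="$TESTFILE" bs=8k count=10000 conv=fsync 2>&1 | tail -n1   # write speed
dd if="$TESTFILE" of=/dev/null bs=8k 2>&1 | tail -n1                          # read speed
rm -f "$TESTFILE"
```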
Alex Deucher wrote:
> I have noticed a strange performance regression and I'm at a loss as
> to what's happening. [...] I've tried adjusting just about every option
> in the postgres config file, but performance remains the same. Any ideas?

Vacuum? Analyze? default_statistics_target? How many shared_buffers? effective_cache_size? work_mem?

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
On Thu, 1 Mar 2007, Joshua D. Drake wrote:
> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> effective_cache_size? work_mem?

Also, an explain analyze from both the 7.4 and 8.1 systems with one of the 10x slower queries would probably be handy.

What do you mean by "created from scratch rather than copying over the old one"? How did you put the data in? Did you run analyze after loading it? Is autovacuum enabled, and if so, what are the thresholds?

--
Jeff Frost, Owner  <jeff@frostconsultingllc.com>
Frost Consulting, LLC  http://www.frostconsultingllc.com/
Phone: 650-780-7908  FAX: 650-649-1954
On 3/1/07, Joshua D. Drake <jd@commandprompt.com> wrote:
> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> effective_cache_size? work_mem?

I'm running the autovacuum process on the 8.1 server. Vacuuming on the old server was done manually.

default_statistics_target and effective_cache_size are set to the defaults on both.

postgres 7.4 server:

# - Memory -
shared_buffers = 82000          # 1000; min 16, at least max_connections*2, 8KB each
sort_mem = 8000                 # 1024; min 64, size in KB
vacuum_mem = 32000              # 8192; min 1024, size in KB
# - Free Space Map -
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each
# - Kernel Resource Usage -
#max_files_per_process = 1000   # min 25

postgres 8.1 server:

# - Memory -
shared_buffers = 100000             # min 16 or max_connections*2, 8KB each
temp_buffers = 2000                 # 1000; min 100, 8KB each
max_prepared_transactions = 100     # 5; can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 10000                    # 1024; min 64, size in KB
maintenance_work_mem = 524288       # 16384; min 1024, size in KB
#max_stack_depth = 2048             # min 100, size in KB

I've also tried using the same settings from the old server on the new one; same performance issues.

Thanks,

Alex
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
> Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
> 10x slower queries would probably be handy.

I'll run some and get back to you.

> What do you mean by "created from scratch rather than copying over the old
> one"? How did you put the data in? Did you run analyze after loading it?
> Is autovacuum enabled and if so, what are the thresholds?

Both databases were originally created from XML files. We just re-created the new one from the XML rather than copying the old database over. I didn't manually run analyze on it, but we are running the autovacuum process:

autovacuum = on                        # off; enable autovacuum subprocess?
autovacuum_naptime = 360               # 60; time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 10000    # 1000; min # of tuple updates before vacuum
autovacuum_analyze_threshold = 5000    # 500; min # of tuple updates before analyze

Thanks,

Alex
On Thu, 1 Mar 2007, Alex Deucher wrote:
> Both databases were originally created from XML files. We just
> re-created the new one from the XML rather than copying the old
> database over. I didn't manually run analyze on it, but we are
> running the autovacuum process:

You should probably manually run analyze and see if that resolves your problem.

> autovacuum = on                        # off; enable autovacuum subprocess?
> autovacuum_naptime = 360               # 60; time between autovacuum runs, in secs
> autovacuum_vacuum_threshold = 10000    # 1000; min # of tuple updates before vacuum
> autovacuum_analyze_threshold = 5000    # 500; min # of tuple updates before analyze

Most people make autovacuum more aggressive, not less aggressive. In fact, the new defaults in 8.2 are:

#autovacuum_vacuum_threshold = 500        # min # of tuple updates before vacuum
#autovacuum_analyze_threshold = 250       # min # of tuple updates before analyze
#autovacuum_vacuum_scale_factor = 0.2     # fraction of rel size before vacuum
#autovacuum_analyze_scale_factor = 0.1    # fraction of rel size before analyze

I'd recommend trying those; otherwise you might not vacuum enough. It'll be interesting to see the explain analyze output after you've run analyze by hand.

--
Jeff Frost, Frost Consulting, LLC
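[A back-of-the-envelope check of what those thresholds mean in practice: per the PostgreSQL documentation, autovacuum vacuums a table once dead tuples exceed roughly autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, and analogously for analyze. The 1,000,000-row table below is a made-up example.]

```shell
# Approximate autovacuum trigger points for a hypothetical 1,000,000-row
# table using the 8.2 defaults quoted above.
reltuples=1000000
awk -v n="$reltuples" 'BEGIN { print 500 + 0.2 * n }'   # vacuum after ~200500 dead tuples
awk -v n="$reltuples" 'BEGIN { print 250 + 0.1 * n }'   # analyze after ~100250 changed tuples
```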
On Thu, 1 Mar 2007, Alex Deucher wrote:
> I've also tried using the same settings from the old server on the new
> one; same performance issues.

If this is a Linux system, could you give us the output of the 'free' command? Postgresql might be choosing a bad plan because your effective_cache_size is way off (it's the default now, right?). Also, what was the block read/write speed of the SAN from your bonnie tests? You probably want to tune random_page_cost as well if it's also at the default.

--
Jeff Frost, Frost Consulting, LLC
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
> If this is a linux system, could you give us the output of the 'free'
> command?

             total       used       free     shared    buffers     cached
Mem:       8059852    8042868      16984          0        228    7888648
-/+ buffers/cache:     153992    7905860
Swap:     15631224       2164   15629060

> Postgresql might be choosing a bad plan because your effective_cache_size
> is way off (it's the default now right?).

Yes, it's set to the default.

> Also, what was the block read/write speed of the SAN from your bonnie
> tests? Probably want to tune random_page_cost as well if it's also at
> the default.

                    ------Sequential Output------  --Sequential Input-  --Random-
                    -Per Chr- --Block-- -Rewrite-  -Per Chr- --Block--  --Seeks--
Machine       Size  K/sec %CP K/sec %CP K/sec %CP  K/sec %CP K/sec %CP   /sec %CP
luna12-san  16000M  58896  91 62931   9 35870   5  54869  82 145504 13  397.7   0

effective_cache_size is the default.

Alex
On Thu, 1 Mar 2007, Alex Deucher wrote:
>              total       used       free     shared    buffers     cached
> Mem:       8059852    8042868      16984          0        228    7888648
> -/+ buffers/cache:     153992    7905860
> Swap:     15631224       2164   15629060

So, I would set effective_cache_size = 988232 (7905860/8).

> luna12-san  16000M  58896  91 62931   9 35870   5  54869  82 145504 13  397.7   0

So, you're getting 62MB/s writes and 145MB/s reads. Just FYI, that write speed is about the same as the single SATA drive write speed on my workstation, so not that great. The read speed is decent, though, and with that sort of read performance you might want to lower random_page_cost to something like 2.5 or 2 so the planner will tend to prefer index scans.

--
Jeff Frost, Frost Consulting, LLC
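[To make the arithmetic explicit: effective_cache_size is expressed in 8 kB buffer pages, and the suggested figure comes from the kernel's "cached" value (in kB) on the "-/+ buffers/cache" line of the free output. A quick sketch with the values from this thread:]

```shell
# effective_cache_size is in 8 kB pages; take the cached figure (kB)
# from the "-/+ buffers/cache" line of `free` and divide by 8.
cached_kb=7905860
echo $(( cached_kb / 8 ))   # prints 988232
```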
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
> So, I would set effective_cache_size = 988232 (7905860/8).
>
> So, you're getting 62MB/s writes and 145MB/s reads. [...] The read speed
> is decent, though, and with that sort of read performance you might want
> to lower random_page_cost to something like 2.5 or 2 so the planner will
> tend to prefer index scans.

Right, but the old box was getting ~45MBps on both reads and writes, so it's an improvement for me :) Thanks for the advice, I'll let you know how it goes.

Alex
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
> Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
> 10x slower queries would probably be handy.

Here are some examples. Analyze is still running on the new db; I'll post results when that is done. Mostly what our apps do is prepared row selects from different tables:

select c1,c2,c3,c4,c5 from t1 where c1='XXX';

old server:

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42 width=26) (actual time=5.722..5.809 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 5.912 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72 width=26) (actual time=12.423..12.475 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 12.538 ms
(3 rows)

new server:

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11 width=26) (actual time=33.461..51.377 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 51.419 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 width=26) (actual time=45.733..46.271 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 46.325 ms
(3 rows)

Alex
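[For context, the "prepared row selects" described above presumably take roughly the following shape; this is a sketch only, using the thread's anonymized table/column names, and the real statements are prepared by the application.]

```sql
-- Hypothetical shape of the application's prepared single-row selects
PREPARE get_rows (text) AS
    SELECT c1, c2, c3, c4, c5 FROM t1 WHERE c1 = $1;

EXECUTE get_rows('6258261');
```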
On Thu, 1 Mar 2007, Alex Deucher wrote:
> Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907
> width=26) (actual time=45.733..46.271 rows=12 loops=1)

Notice the huge disparity here between the expected number of rows (2907) and the actual rows (12)? That's indicative of needing to run analyze. The time is only about 4x the 7.4 runtime, and that's with the analyze running merrily along in the background. It's probably not as bad off as you think. At least this query isn't 10x. :-) Run these again for us after analyze is complete.

--
Jeff Frost, Frost Consulting, LLC
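[Concretely, the hand-run step being asked for is just a manual statistics pass followed by a plan re-check; a sketch against a live session, using the anonymized names from the examples above:]

```sql
-- Rebuild planner statistics for the table by hand, then re-check the plan;
-- the rows= estimate in the new plan should land much closer to the actual.
ANALYZE VERBOSE t1;
EXPLAIN ANALYZE SELECT c1, c2 FROM t1 WHERE c1 = '6258261';
```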
On Thu, 1 Mar 2007, Alex Deucher wrote:
> Right, but the old box was getting ~45MBps on both reads and writes,
> so it's an improvement for me :) Thanks for the advice, I'll let you
> know how it goes.

Do you think that is because you have a different interface between you and the SAN? ~45MBps is pretty slow; your average 7200RPM ATA133 drive can do that and costs quite a bit less than a SAN.

Is the SAN being shared between the database servers and other servers? Maybe it was just random timing that gave you the poor write performance on the old server, which might also be yielding occasional poor performance on the new one.

--
Jeff Frost, Frost Consulting, LLC
At 07:36 PM 3/1/2007, Jeff Frost wrote:
> Do you think that is because you have a different interface between
> you and the SAN? ~45MBps is pretty slow - your average 7200RPM ATA133
> drive can do that and costs quite a bit less than a SAN.

Remember that pg, even pg 8.2.3, has a known history of very poor insert speed (see comments on this point by Josh Berkus, Luke Lonergan, etc.). For some reason, the code changes that have resulted in dramatic improvements in pg's read speed have not had nearly the same efficacy for writes.

Bottom line: pg presently has a fairly low and fairly harsh upper bound on write performance. What exactly that bound is has been the subject of some discussion, but IIUC the fact of its existence is well established. Various proposals for improving the situation exist, I've even made some of them, but AFAIK this is currently considered one of the "tough pg problems".

Cheers,

Ron Peacetree
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote: > On Thu, 1 Mar 2007, Alex Deucher wrote: > > >> >> Postgresql might be choosing a bad plan because your > >> effective_cache_size > >> >> is > >> >> way off (it's the default now right?). Also, what was the block > >> read/write > >> > > >> > yes it's set to the default. > >> > > >> >> speed of the SAN from your bonnie tests? Probably want to tune > >> >> random_page_cost as well if it's also at the default. > >> >> > >> > > >> > ------Sequential Output------ --Sequential Input- > >> > --Random- > >> > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > >> > --Seeks-- > >> > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP > >> /sec > >> > %CP > >> > luna12-san 16000M 58896 91 62931 9 35870 5 54869 82 145504 13 > >> 397.7 > >> > 0 > >> > > >> > >> So, you're getting 62MB/s writes and 145MB/s reads. Just FYI, that write > >> speed is about the same as my single SATA drive write speed on my > >> workstation, > >> so not that great. The read speed is decent, though and with that sort of > >> read performance, you might want to lower random_page_cost to something > >> like > >> 2.5 or 2 so the planner will tend to prefer index scans. > >> > > > > Right, but the old box was getting ~45MBps on both reads and writes, > > so it's an improvement for me :) Thanks for the advice, I'll let you > > know how it goes. > > Do you think that is because you have a different interface between you and > the SAN? ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do > that and costs quite a bit less than a SAN. > > Is the SAN being shared between the database servers and other servers? Maybe > it was just random timing that gave you the poor write performance on the old > server which might be also yielding occasional poor performance on the new > one. > The direct attached scsi discs on the old database server were getting 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad. 
We have 4 servers on the SAN, each with its own 4 GBps FC link via an FC switch. I'll try and re-run the numbers when the servers are idle this weekend. Alex
>> Is the SAN being shared between the database servers and other >> servers? Maybe >> it was just random timing that gave you the poor write performance on >> the old >> server which might be also yielding occasional poor performance on >> the new >> one. >> > > The direct attached scsi discs on the old database server were getting > 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad. How many spindles have you got in that SAN? > We have 4 servers on the SAN each with it's own 4 GBps FC link via an FC > switch. I'll try and re-run the numbers when the servers are idle > this weekend. > > Alex > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On 3/1/07, Joshua D. Drake <jd@commandprompt.com> wrote: > \ > >> Is the SAN being shared between the database servers and other > >> servers? Maybe > >> it was just random timing that gave you the poor write performance on > >> the old > >> server which might be also yielding occassional poor performance on > >> the new > >> one. > >> > > > > The direct attached scsi discs on the old database server we getting > > 45MBps not the SAN. The SAN got 62/145Mbps, which is not as bad. > > How many spindles you got in that SAN? 105 IIRC. Alex
On Thu, 1 Mar 2007, Alex Deucher wrote: > On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote: >> On Thu, 1 Mar 2007, Alex Deucher wrote: >> >> >> >> Postgresql might be choosing a bad plan because your >> >> effective_cache_size >> >> >> is >> >> >> way off (it's the default now right?). Also, what was the block >> >> read/write >> >> > >> >> > yes it's set to the default. >> >> > >> >> >> speed of the SAN from your bonnie tests? Probably want to tune >> >> >> random_page_cost as well if it's also at the default. >> >> >> >> >> > >> >> > ------Sequential Output------ --Sequential Input- >> >> > --Random- >> >> > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- >> >> > --Seeks-- >> >> > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP >> >> /sec >> >> > %CP >> >> > luna12-san 16000M 58896 91 62931 9 35870 5 54869 82 145504 13 >> >> 397.7 >> >> > 0 >> >> > >> >> >> >> So, you're getting 62MB/s writes and 145MB/s reads. Just FYI, that >> write >> >> speed is about the same as my single SATA drive write speed on my >> >> workstation, >> >> so not that great. The read speed is decent, though and with that sort >> of >> >> read performance, you might want to lower random_page_cost to something >> >> like >> >> 2.5 or 2 so the planner will tend to prefer index scans. >> >> >> > >> > Right, but the old box was getting ~45MBps on both reads and writes, >> > so it's an improvement for me :) Thanks for the advice, I'll let you >> > know how it goes. >> >> Do you think that is because you have a different interface between you and >> the SAN? ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do >> that and costs quite a bit less than a SAN. >> >> Is the SAN being shared between the database servers and other servers? >> Maybe >> it was just random timing that gave you the poor write performance on the >> old >> server which might be also yielding occassional poor performance on the new >> one. 
>> > > The direct attached scsi discs on the old database server were getting > 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad. > We have 4 servers on the SAN each with it's own 4 GBps FC link via an FC > switch. I'll try and re-run the numbers when the servers are idle > this weekend. Sorry, I thought the old server was also attached to the SAN. My fault for not hanging onto the entire email thread. I think you're mixing and matching your capital and lower case Bs in your sentence above though. :-) I suspect what you really mean is the SAN got 62/145MBps (megabytes/sec) and the FC link is 4Gbps (gigabits/sec) or 500MBps. Is that correct? If so, and seeing that you think there are 105 spindles on the SAN, I'd say you're either maxing out the switch fabric of the SAN with your servers or you have a really poorly performing SAN in general, or you just misunderstood the numbers. As a comparison, with 8 WD Raptors configured in a RAID10 with normal ext3 I get about 160MB/s write and 305MB/s read performance. Hopefully the SAN has lots of other super nifty features that make up for the poor performance. :-( -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote: > On Thu, 1 Mar 2007, Alex Deucher wrote: > > > On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote: > >> On Thu, 1 Mar 2007, Alex Deucher wrote: > >> > >> >> >> Postgresql might be choosing a bad plan because your > >> >> effective_cache_size > >> >> >> is > >> >> >> way off (it's the default now right?). Also, what was the block > >> >> read/write > >> >> > > >> >> > yes it's set to the default. > >> >> > > >> >> >> speed of the SAN from your bonnie tests? Probably want to tune > >> >> >> random_page_cost as well if it's also at the default. > >> >> >> > >> >> > > >> >> > ------Sequential Output------ --Sequential Input- > >> >> > --Random- > >> >> > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > >> >> > --Seeks-- > >> >> > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP > >> >> /sec > >> >> > %CP > >> >> > luna12-san 16000M 58896 91 62931 9 35870 5 54869 82 145504 13 > >> >> 397.7 > >> >> > 0 > >> >> > > >> >> > >> >> So, you're getting 62MB/s writes and 145MB/s reads. Just FYI, that > >> write > >> >> speed is about the same as my single SATA drive write speed on my > >> >> workstation, > >> >> so not that great. The read speed is decent, though and with that sort > >> of > >> >> read performance, you might want to lower random_page_cost to something > >> >> like > >> >> 2.5 or 2 so the planner will tend to prefer index scans. > >> >> > >> > > >> > Right, but the old box was getting ~45MBps on both reads and writes, > >> > so it's an improvement for me :) Thanks for the advice, I'll let you > >> > know how it goes. > >> > >> Do you think that is because you have a different interface between you and > >> the SAN? ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do > >> that and costs quite a bit less than a SAN. > >> > >> Is the SAN being shared between the database servers and other servers? 
> >> Maybe > >> it was just random timing that gave you the poor write performance on the > >> old > >> server which might be also yielding occassional poor performance on the new > >> one. > >> > > > > The direct attached scsi discs on the old database server we getting > > 45MBps not the SAN. The SAN got 62/145Mbps, which is not as bad. We > > have 4 servers on the SAN each with it's own 4 GBps FC link via an FC > > switch. I'll try and re-run the numbers when the servers are idle > > this weekend. > > Sorry, I thought the old server was also attached to the SAN. My fault for > not hanging onto the entire email thread. > > I think you're mixing and matching your capitol and lower case Bs in your > sentence above though. :-) whoops :) > > I suspect what you really mean is The SAN got 62/145MBps (megabytes/sec) and > teh FC link is 4Gbps (gigabits/sec) or 500MBps. Is that correct? If so, and > seeing that you think there are 105 spindles on the SAN, I'd say you're either > maxxing out the switch fabric of the SAN with your servers or you have a > really poorly performing SAN in general, or you just misunderstood the . > > As a comparison With 8 WD Raptors configured in a RAID10 with normal ext3 I > get about 160MB/s write and 305MB/s read performance. Hopefully the SAN has > lots of other super nifty features that make up for the poor performance. :-( > It's big and reliable (and compared to lots of others, relatively inexpensive) which is why we bought it. We bought it mostly as a huge file store. The RAID groups on the SAN were set up for maximum capacity rather than for performance. Using it for the databases just came up recently. Alex
Alex Deucher wrote: > On 3/1/07, Joshua D. Drake <jd@commandprompt.com> wrote: >> >> Is the SAN being shared between the database servers and other >> >> servers? Maybe >> >> it was just random timing that gave you the poor write performance on >> >> the old >> >> server which might be also yielding occasional poor performance on >> >> the new >> >> one. >> >> >> > >> > The direct attached scsi discs on the old database server were getting >> > 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad. >> >> How many spindles have you got in that SAN? > > 105 IIRC. You have 105 spindles and you are only getting 62MB/s on writes? That seems about half of what you should be getting (at least). Joshua D. Drake > > Alex > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On 3/1/07, Joshua D. Drake <jd@commandprompt.com> wrote: > Alex Deucher wrote: > > On 3/1/07, Joshua D. Drake <jd@commandprompt.com> wrote: > >> >> Is the SAN being shared between the database servers and other > >> >> servers? Maybe > >> >> it was just random timing that gave you the poor write performance on > >> >> the old > >> >> server which might be also yielding occasional poor performance on > >> >> the new > >> >> one. > >> >> > >> > > >> > The direct attached scsi discs on the old database server were getting > >> > 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad. > >> > >> How many spindles have you got in that SAN? > > > > 105 IIRC. > > You have 105 spindles and you are only getting 62MB/s on writes? That seems > about half of what you should be getting (at least). > Take the numbers with a grain of salt. They are by no means a thorough evaluation. I just ran bonnie a couple of times to get a rough reference point. I can do a more thorough analysis. Alex > Joshua D. Drake > > > > > > Alex > > >
* Alex Deucher: > I have noticed a strange performance regression and I'm at a loss as > to what's happening. We have a fairly large database (~16 GB). Sorry for asking, but is this a typo? Do you mean 16 *TB* instead of 16 *GB*? If it's really 16 GB, you should check if it's cheaper to buy more RAM than to fiddle with the existing infrastructure. > however the table structure is almost identical (UTF8 on the new one > vs. C on the old). Locale settings make a huge difference for sorting and LIKE queries. We usually use the C locale and SQL_ASCII encoding, mostly for performance reasons. (Proper UTF-8 can be enforced through constraints if necessary.) -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
Florian Weimer wrote: > * Alex Deucher: > > >> I have noticed a strange performance regression and I'm at a loss as >> to what's happening. We have a fairly large database (~16 GB). >> > > Sorry for asking, but is this a typo? Do you mean 16 *TB* instead of > 16 *GB*? > > If it's really 16 GB, you should check if it's cheaper to buy more RAM > than to fiddle with the existing infrastructure. > This brings me to a related question: Do I need to specifically configure something to take advantage of such increase of RAM? In particular, is the amount of things that postgres can do with RAM limited by the amount of shared_buffers or some other parameter? Should shared_buffers be a fixed fraction of the total amount of physical RAM, or should it be the total amount minus half a gigabyte or so? As an example, if one upgrades a host from 1GB to 4GB, what would be the right thing to do in the configuration, assuming 8.1 or 8.2? (at least what would be the critical aspects?) Thanks, Carlos --
On 3/2/07, Florian Weimer <fweimer@bfk.de> wrote: > * Alex Deucher: > > > I have noticed a strange performance regression and I'm at a loss as > > to what's happening. We have a fairly large database (~16 GB). > > Sorry for asking, but is this a typo? Do you mean 16 *TB* instead of > 16 *GB*? > > If it's really 16 GB, you should check if it's cheaper to buy more RAM > than to fiddle with the existing infrastructure. > Yes, 16 GB. I'd rather not shell out for more ram, if I'm not even sure that will help. The new system should be faster, or at least as fast, so I'd like to sort out what's going on before I buy more ram. > > however the table structure is almost identical (UTF8 on the new one > > vs. C on the old). > > Locale settings make a huge difference for sorting and LIKE queries. > We usually use the C locale and SQL_ASCII encoding, mostly for > performance reasons. (Proper UTF-8 can be enforced through > constraints if necessary.) > I suppose that might be a factor. How much of a performance difference do you see between utf-8 and C? Alex
At 08:56 AM 3/2/2007, Carlos Moreno wrote: >Florian Weimer wrote: >>* Alex Deucher: >> >> >>>I have noticed a strange performance regression and I'm at a loss as >>>to what's happening. We have a fairly large database (~16 GB). >>> >> >>Sorry for asking, but is this a typo? Do you mean 16 *TB* instead of >>16 *GB*? >> >>If it's really 16 GB, you should check if it's cheaper to buy more RAM >>than to fiddle with the existing infrastructure. >> > >This brings me to a related question: > >Do I need to specifically configure something to take advantage of >such increase of RAM? > >In particular, is the amount of things that postgres can do with RAM >limited by the amount of shared_buffers or some other parameter? >Should shared_buffers be a fixed fraction of the total amount of >physical RAM, or should it be the total amount minus half a gigabyte >or so? > >As an example, if one upgrades a host from 1GB to 4GB, what would >be the right thing to do in the configuration, assuming 8.1 or 8.2? (at >least what would be the critical aspects?) > >Thanks, > >Carlos Unfortunately, pg does not (yet! ;-) ) treat all available RAM as a common pool and dynamically allocate it intelligently to each of the various memory data structures. So if you increase your RAM, you will have to manually change the entries in the pg config file to take advantage of it. (and start pg after changing it for the new config values to take effect) The pertinent values are all those listed under "Memory" in the annotated pg conf file: shared_buffers, work_mem, maintenance_work_mem, etc. http://www.powerpostgresql.com/Downloads/annotated_conf_80.html Cheers, Ron Peacetree
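As a concrete illustration of the "Memory" entries Ron is referring to, here is a postgresql.conf sketch for an 8 GB box on 8.1. The numbers are assumptions for illustration only, not tuned recommendations, and note that 8.1 takes these values in units of 8 kB pages or kB rather than in memory units:

```
# illustrative postgresql.conf fragment for ~8 GB RAM on 8.1
shared_buffers = 50000           # ~400 MB (units of 8 kB pages)
work_mem = 16384                 # 16 MB per sort/hash, per backend (kB)
maintenance_work_mem = 262144    # 256 MB for VACUUM, CREATE INDEX (kB)
effective_cache_size = 524288    # ~4 GB; planner hint about the OS cache (pages)
```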
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote: > On Thu, 1 Mar 2007, Alex Deucher wrote: > > > here are some examples. Analyze is still running on the new db, I'll > > post results when that is done. Mostly what our apps do is prepared > > row selects from different tables: > > select c1,c2,c3,c4,c5 from t1 where c1='XXX'; > > > > old server: > > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261'; > > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------------- > > Index Scan using t1_c2_index on t1 (cost=0.00..166.89 rows=42 > > width=26) (actual time=5.722..5.809 rows=2 loops=1) > > Index Cond: ((c2)::text = '6258261'::text) > > Total runtime: 5.912 ms > > (3 rows) > > > > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261'; > > QUERY PLAN > > ---------------------------------------------------------------------------------------------------------------------------- > > Index Scan using t1_c1_key on t1 (cost=0.00..286.08 rows=72 > > width=26) (actual time=12.423..12.475 rows=12 loops=1) > > Index Cond: ((c1)::text = '6258261'::text) > > Total runtime: 12.538 ms > > (3 rows) > > > > > > new server: > > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261'; > > QUERY PLAN > > ---------------------------------------------------------------------------------------------------------------------------- > > Index Scan using t1_c2_index on t1 (cost=0.00..37.63 rows=11 > > width=26) (actual time=33.461..51.377 rows=2 loops=1) > > Index Cond: ((c2)::text = '6258261'::text) > > Total runtime: 51.419 ms > > (3 rows) > > > > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261'; > > QUERY PLAN > > -------------------------------------------------------------------------------------------------------------------------------- > > Index Scan using t1_c1_index on t1 (cost=0.00..630.45 rows=2907 > > width=26) (actual time=45.733..46.271 rows=12 loops=1) > > 
Index Cond: ((c1)::text = '6258261'::text) > > Total runtime: 46.325 ms > > (3 rows) > > Notice the huge disparity here between the expected number of rows (2907) and > the actual rows? That's indicative of needing to run analyze. The time is > only about 4x the 7.4 runtime and that's with the analyze running merrily > along in the background. It's probably not as bad off as you think. At least > this query isn't 10x. :-) > > Run these again for us after analyze is complete. Well, while the DB isn't 10x, the application using the DB shows a 10x decrease in performance. Pages that used to take 5 seconds to load take 50 secs (I suppose the problem is compounded as there are several queries per page). Anyway, new numbers after the analyze. Unfortunately, they are improved, but still not great:

old server:

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42 width=26) (actual time=0.204..0.284 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 0.421 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72 width=26) (actual time=0.299..0.354 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 0.451 ms
(3 rows)

new server:

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11 width=26) (actual time=0.126..0.134 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 0.197 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 width=26) (actual time=5.820..5.848 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 5.899 ms
(3 rows)

Here's another example:

old server:

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6000001';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42 width=26) (actual time=4.031..55.349 rows=8 loops=1)
   Index Cond: ((c2)::text = '6000001'::text)
 Total runtime: 55.459 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6000001';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72 width=26) (actual time=0.183..0.203 rows=4 loops=1)
   Index Cond: ((c1)::text = '6000001'::text)
 Total runtime: 0.289 ms
(3 rows)

new server:

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6000001';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11 width=26) (actual time=115.412..202.151 rows=8 loops=1)
   Index Cond: ((c2)::text = '6000001'::text)
 Total runtime: 202.234 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6000001';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 width=26) (actual time=99.811..99.820 rows=4 loops=1)
   Index Cond: ((c1)::text = '6000001'::text)
 Total runtime: 99.861 ms
(3 rows)

I haven't gotten a chance to restart postgres with the config changes you suggested yet. The rows have improved for some but not all and the times are still slow. Any ideas? Alex
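Given the estimate of 2907 rows against 12 actual above, one hedged next step is to raise the statistics target for the badly estimated column and re-analyze (100 is an illustrative value; the 8.1 default target is 10):

```sql
ALTER TABLE t1 ALTER COLUMN c1 SET STATISTICS 100;
ANALYZE t1;

-- Then re-check whether the estimate tightens up
EXPLAIN ANALYZE SELECT c1, c2 FROM t1 WHERE c1 = '6258261';
```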
At 10:16 AM 3/2/2007, Alex Deucher wrote: >On 3/2/07, Florian Weimer <fweimer@bfk.de> wrote: >>* Alex Deucher: >> >> > I have noticed a strange performance regression and I'm at a loss as >> > to what's happening. We have a fairly large database (~16 GB). >> >>Sorry for asking, but is this a typo? Do you mean 16 *TB* instead of >>16 *GB*? >> >>If it's really 16 GB, you should check if it's cheaper to buy more RAM >>than to fiddle with the existing infrastructure. > >Yes, 16 GB. I'd rather not shell out for more ram, if I'm not even >sure that will help. The new system should be faster, or at least as >fast, so I'd like to sort out what's going on before I buy more ram. > OK. You a= went from pg 7.4.x to 8.1.4 AND b= you changed from 4 SPARC CPUs (how many cores? If this is > 4...) to 2 2C Opterons AND (SPEC and TPC bench differences between these CPUs?) c= you went from a Sun box to a "white box" AND (memory subsystem differences? other differences?) d= you went from local HD IO to a SAN (many differences hidden in that one line... ...and is the physical layout of tables and things like pg_xlog sane on the SAN?) ...and you did this by just pulling over the old DB onto the new HW? May I suggest that it is possible that your schema, queries, etc were all optimized for pg 7.x running on the old HW? (explain analyze shows the old system taking ~1/10 the time per row as well as estimating the number of rows more accurately) RAM is =cheap=. Much cheaper than the cost of a detective hunt followed by rework to queries, schema, etc. Fitting the entire DB into RAM is guaranteed to help unless this is an OLTP like application where HD IO is required to be synchronous.. If you can fit the entire DB comfortably into RAM, do it and buy yourself the time to figure out the rest of the story w/o impacting on production performance. Cheers, Ron Peacetree
On 3/2/07, Ron <rjpeace@earthlink.net> wrote: > At 10:16 AM 3/2/2007, Alex Deucher wrote: > >On 3/2/07, Florian Weimer <fweimer@bfk.de> wrote: > >>* Alex Deucher: > >> > >> > I have noticed a strange performance regression and I'm at a loss as > >> > to what's happening. We have a fairly large database (~16 GB). > >> > >>Sorry for asking, but is this a typo? Do you mean 16 *TB* instead of > >>16 *GB*? > >> > >>If it's really 16 GB, you should check if it's cheaper to buy more RAM > >>than to fiddle with the existing infrastructure. > > > >Yes, 16 GB. I'd rather not shell out for more ram, if I'm not even > >sure that will help. The new system should be faster, or at least as > >fast, so I'd like to sort out what's going on before I buy more ram. > > > OK. You > a= went from pg 7.4.x to 8.1.4 AND > yes. > b= you changed from 4 SPARC CPUs (how many cores? If this is > 4...) > to 2 2C Opterons AND > (SPEC and TPC bench differences between these CPUs?) > 4 single core 800 MHz SPARCs to 2 dual core 2.2 GHz Opterons. > c= you went from a Sun box to a "white box" AND > (memory subsystem differences? other differences?) > The new hardware is Sun as well. X4100s running Linux. It should be faster all around because the old server is 5 years old. > d= you went from local HD IO to a SAN > (many differences hidden in that one line... ...and is the physical > layout of tables and things like pg_xlog sane on the SAN?) > > > ...and you did this by just pulling over the old DB onto the new HW? > We rebuilt the DB from scratch on the new server. Same table structure though. We reloaded from the source material directly. > May I suggest that it is possible that your schema, queries, etc were > all optimized for pg 7.x running on the old HW? > (explain analyze shows the old system taking ~1/10 the time per row > as well as estimating the number of rows more accurately) > > RAM is =cheap=. Much cheaper than the cost of a detective hunt > followed by rework to queries, schema, etc. 
> Fitting the entire DB into RAM is guaranteed to help unless this is > an OLTP like application where HD IO is required to be synchronous.. > If you can fit the entire DB comfortably into RAM, do it and buy > yourself the time to figure out the rest of the story w/o impacting > on production performance. Perhaps so. I just don't want to spend $1000 on ram and have it only marginally improve performance if at all. The old DB works, so we can keep using that until we sort this out. Alex > > Cheers, > Ron Peacetree > >
"Alex Deucher" <alexdeucher@gmail.com> writes: > Anyway, new numbers after the analyze. > Unfortunately, they are improved, but still not great: Why are the index names different between the old and new servers? Is that just cosmetic, or is 8.2 actually picking a different (and less suitable) index for the c1 queries? regards, tom lane
On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Alex Deucher" <alexdeucher@gmail.com> writes: > > Anyway, new numbers after the analyze. > > Unfortunately, they are improved, but still not great: > > Why are the index names different between the old and new servers? > Is that just cosmetic, or is 8.2 actually picking a different > (and less suitable) index for the c1 queries? > That's just cosmetic. They are the same. Alex
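One quick way to confirm the indexes really are identical apart from their names is to compare their definitions from the catalogs on both servers, e.g.:

```sql
-- Run on both the old and new servers and diff the output
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 't1'
ORDER BY indexname;
```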
On Fri, 2007-03-02 at 10:03, Alex Deucher wrote: > On 3/2/07, Ron <rjpeace@earthlink.net> wrote: > > At 10:16 AM 3/2/2007, Alex Deucher wrote: > > d= you went from local HD IO to a SAN > > (many differences hidden in that one line... ...and is the physical > > layout of tables and things like pg_xlog sane on the SAN?) > > > > > > ...and you did this by just pulling over the old DB onto the new HW? > > > > We rebuild the DB from scratch on the new server. Same table > structure though. We reloaded from the source material directly. I would REALLY recommend testing this machine out with a simple software RAID-1 pair of SCSI or SATA drives just to eliminate or confirm the SAN as the root problem.
At 11:03 AM 3/2/2007, Alex Deucher wrote: >On 3/2/07, Ron <rjpeace@earthlink.net> wrote: > >>May I suggest that it is possible that your schema, queries, etc were >>all optimized for pg 7.x running on the old HW? >>(explain analyze shows the old system taking ~1/10 the time per row >>as well as estimating the number of rows more accurately) >> >>RAM is =cheap=. Much cheaper than the cost of a detective hunt >>followed by rework to queries, schema, etc. >>Fitting the entire DB into RAM is guaranteed to help unless this is >>an OLTP like application where HD IO is required to be synchronous.. >>If you can fit the entire DB comfortably into RAM, do it and buy >>yourself the time to figure out the rest of the story w/o impacting >>on production performance. > >Perhaps so. I just don't want to spend $1000 on ram and have it only >marginally improve performance if at all. The old DB works, so we can >keep using that until we sort this out. > >Alex 1= $1000 worth of RAM is very likely less than the $ worth of, say, 10 hours of your time to your company. Perhaps much less. (Your =worth=, not your pay or even your fully loaded cost. This number tends to be >= 4x what you are paid unless the organization you are working for is in imminent financial danger.) You've already put more considerably more than 10 hours of your time into this... 2= If the DB goes from not fitting completely into RAM to being completely RAM resident, you are almost 100% guaranteed a big performance boost. The exception is an OLTP like app where DB writes can't be done a-synchronously (doing financial transactions, real time control systems, etc). Data mines should never have this issue. 3= Whether adding enough RAM to make the DB RAM resident (and re-configuring conf, etc, appropriately) solves the problem or not, you will have gotten a serious lead as to what's wrong. ...and I still think looking closely at the actual physical layout of the tables in the SAN is likely to be worth it. 
Cheers, Ron Peacetree
On Thursday, 1 March 2007 21:44, Alex Deucher wrote: > Hello, > > I have noticed a strange performance regression and I'm at a loss as > to what's happening. We have a fairly large database (~16 GB). The > original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB > of ram running Solaris on local scsi discs. The new server is a sun > Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux > (AMD64) on a 4 Gbps FC SAN volume. When we created the new database > it was created from scratch rather than copying over the old one, > however the table structure is almost identical (UTF8 on the new one > vs. C on the old). The problem is queries are ~10x slower on the new > hardware. I read several places that the SAN might be to blame, but > testing with bonnie and dd indicates that the SAN is actually almost > twice as fast as the scsi discs in the old sun server. I've tried > adjusting just about every option in the postgres config file, but > performance remains the same. Any ideas? > 1. Do you use numactl to lock the db on one node? 2. Do you use the BIOS to interleave memory? 3. Do you spread the cache over more than one NUMA node? > Thanks, > > Alex > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- ATRSoft GmbH Rosellstrasse 9 D 50354 Hürth Deutschland Tel.: +49(0)2233 691324 Geschäftsführer Anton Rommerskirchen Köln HRB 44927 STNR 224/5701 - 1010
Florian Weimer wrote:
> Locale settings make a huge difference for sorting and LIKE queries.
> We usually use the C locale and SQL_ASCII encoding, mostly for
> performance reasons. (Proper UTF-8 can be enforced through
> constraints if necessary.)

Hmm, you are aware of varchar_pattern_ops and related opclasses, right?
That helps for LIKE queries in non-C locales (though you do have to keep
almost-duplicate indexes).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
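For readers unfamiliar with the opclass trick mentioned above: under a
non-C locale a plain b-tree index cannot serve left-anchored LIKE, so a
second, almost-duplicate index with the pattern opclass is kept alongside
the ordinary one. A sketch, with hypothetical table and column names:

```sql
-- "parts" and "name" are illustrative names, not from the thread.
CREATE INDEX parts_name_idx ON parts (name);                          -- ORDER BY, =, range scans
CREATE INDEX parts_name_like_idx ON parts (name varchar_pattern_ops); -- LIKE 'foo%'

-- With both in place, the planner can use an index scan for:
--   SELECT * FROM parts WHERE name LIKE 'WIDGET%';
```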
On 3/2/07, Ron <rjpeace@earthlink.net> wrote:
> At 11:03 AM 3/2/2007, Alex Deucher wrote:
> >On 3/2/07, Ron <rjpeace@earthlink.net> wrote:
> >
> >>May I suggest that it is possible that your schema, queries, etc. were
> >>all optimized for pg 7.x running on the old HW?
> >>(explain analyze shows the old system taking ~1/10 the time per row
> >>as well as estimating the number of rows more accurately)
> >>
> >>RAM is =cheap=. Much cheaper than the cost of a detective hunt
> >>followed by rework to queries, schema, etc.
> >>Fitting the entire DB into RAM is guaranteed to help unless this is
> >>an OLTP-like application where HD IO is required to be synchronous.
> >>If you can fit the entire DB comfortably into RAM, do it and buy
> >>yourself the time to figure out the rest of the story w/o impacting
> >>production performance.
> >
> >Perhaps so. I just don't want to spend $1000 on RAM and have it only
> >marginally improve performance, if at all. The old DB works, so we can
> >keep using that until we sort this out.
> >
> >Alex
>
> 1= $1000 worth of RAM is very likely less than the $ worth of, say,
> 10 hours of your time to your company. Perhaps much less.
> (Your =worth=, not your pay or even your fully loaded cost. This
> number tends to be >= 4x what you are paid unless the organization
> you are working for is in imminent financial danger.)
> You've already put considerably more than 10 hours of your time
> into this...
>
> 2= If the DB goes from not fitting completely into RAM to being
> completely RAM resident, you are almost 100% guaranteed a big
> performance boost.
> The exception is an OLTP-like app where DB writes can't be done
> asynchronously (financial transactions, real-time control systems, etc.).
> Data mines should never have this issue.
>
> 3= Whether adding enough RAM to make the DB RAM resident (and
> re-configuring postgresql.conf etc. appropriately) solves the problem
> or not, you will have gotten a serious lead as to what's wrong.
>
> ...and I still think looking closely at the actual physical layout of
> the tables in the SAN is likely to be worth it.

How would I go about doing that?

Thanks,

Alex
At 02:43 PM 3/2/2007, Alex Deucher wrote:
>On 3/2/07, Ron <rjpeace@earthlink.net> wrote:
>>
>>...and I still think looking closely at the actual physical layout of
>>the tables in the SAN is likely to be worth it.
>
>How would I go about doing that?
>
>Alex

Hard for me to give specific advice when I don't know what SAN
product we are talking about, nor what kind of HDs are in it, nor how
those HDs are presently configured...

I quote you from an earlier post:
"The RAID groups on the SAN were set up for maximum capacity rather
than for performance. Using it for the databases just came up recently."

That implies to me that the SAN is more or less set up as one huge
105-HD (assuming this number is correct? We all know how "assume" is
spelled...) JBOD or RAID 5 (or 6, or 5*, or 6*) set.

=IF= that is true, tables are not being given dedicated RAID
groups. That implies that traditional lore like having pg_xlog on
dedicated spindles is being ignored.
Nor is the more general Best Practice of putting the most heavily
used tables onto dedicated spindles being followed.

In addition, the most space-efficient RAID levels, 5* or 6*, are not
the best-performing ones; RAID 10 (striping your mirrors) is.

In short, configuring a SAN for maximum capacity is exactly the wrong
thing to do if one is planning to use it in the best way to support
DB performance.

I assume (there's that word again...) that there is someone in your
organization who understands how the SAN is configured and administered.
You need to talk to them about these issues.

Cheers,
Ron
On 3/2/07, Ron <rjpeace@earthlink.net> wrote: > At 02:43 PM 3/2/2007, Alex Deucher wrote: > >On 3/2/07, Ron <rjpeace@earthlink.net> wrote: > >> > >>...and I still think looking closely at the actual physical layout of > >>the tables in the SAN is likely to be worth it. > > > >How would I go about doing that? > > > >Alex > > Hard for me to give specific advice when I don't know what SAN > product we are talking about nor what kind of HDs are in it nor how > those HDs are presently configured... > > I quote you in an earlier post: > "The RAID groups on the SAN were set up for maximum capacity rather > than for performance. Using it for the databases just came up recently." > > That implies to me that the SAN is more or less set up as a huge 105 > HD (assuming this number is correct? We all know how "assume" is > spelled...) JBOD or RAID 5 (or 6, or 5*, or 6*) set. > > =IF= that is true, tables are not being given dedicated RAID > groups. That implies that traditional lore like having pg_xlog on > dedicated spindles is being ignored. > Nor is the more general Best Practice of putting the most heavily > used tables onto dedicated spindles being followed. > > In addition, the most space efficient RAID levels: 5* or 6*, are not > the best performing one (RAID 10 striping your mirrors) > > In short, configuring a SAN for maximum capacity is exactly the wrong > thing to do if one is planning to use it in the best way to support > DB performance. > > I assume (there's that word again...) that there is someone in your > organization who understands how the SAN is configured and administered. > You need to talk to them about these issues. > Ah OK. I see what you are saying; thank you for clarifying. Yes, the SAN is configured for maximum capacity; it has large RAID 5 groups. As I said earlier, we never intended to run a DB on the SAN, it just happened to come up, hence the configuration. Alex
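One way to act on the spindle advice above without rebuilding the whole SAN
is 8.1's tablespaces: dedicate a faster RAID group (or local discs) to the
hottest tables. A sketch with hypothetical paths and table names; note that
pg_xlog itself cannot be moved with a tablespace (relocating it takes a
symlink while the server is stopped):

```sql
-- '/mnt/fastdisk/pgdata' and 'hot_table' are illustrative names.
CREATE TABLESPACE fast_space LOCATION '/mnt/fastdisk/pgdata';
ALTER TABLE hot_table SET TABLESPACE fast_space;  -- physically rewrites the table there
-- New indexes can be placed there directly:
-- CREATE INDEX hot_table_id_idx ON hot_table (id) TABLESPACE fast_space;
```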
On 02.03.2007, at 14:20, Alex Deucher wrote:
> Ah OK. I see what you are saying; thank you for clarifying. Yes,
> the SAN is configured for maximum capacity; it has large RAID 5
> groups. As I said earlier, we never intended to run a DB on the SAN,
> it just happened to come up, hence the configuration.

So why not dump the data once, import it into a PG instance configured
to use local discs (or even ONE local disc; the ~16 GB you gave as the
DB size should fit on the local machine, right?), and test whether the
problem is with PG talking to the SAN? That gives you one factor less
to consider after all your changes.

Maybe it's just that something in the chain from PG to the actual HD
spindles kills your random-access performance for fetching the actual rows.

cug
On Fri, 2 Mar 2007, Guido Neitzer wrote:

> On 02.03.2007, at 14:20, Alex Deucher wrote:
>
>> Ah OK. I see what you are saying; thank you for clarifying. Yes,
>> the SAN is configured for maximum capacity; it has large RAID 5
>> groups. As I said earlier, we never intended to run a DB on the SAN,
>> it just happened to come up, hence the configuration.
>
> So why not dump the data once, import it into a PG instance configured
> to use local discs (or even ONE local disc; the ~16 GB you gave as the
> DB size should fit on the local machine, right?), and test whether the
> problem is with PG talking to the SAN? That gives you one factor less
> to consider after all your changes.
>
> Maybe it's just that something in the chain from PG to the actual HD
> spindles kills your random-access performance for fetching the actual rows.

I am actually starting to think that the SAN may be introducing some
amount of latency that is enough to kill your random IO, which is what
all of the queries in question are doing - look up in index - fetch row
from table.

If you have the time, it would be totally worth it to test with a local
disk and see how that affects the speed.

I would think that even with RAID5, a SAN with that many spindles would
be quite fast in raw throughput, but perhaps it's just seek latency
that's killing you.

When you run the bonnie tests again, take note of the seeks/sec compared
with the old disks. Also, you should run bonnie with the -b switch to see
if that causes a significant slowdown of the writes... maybe minor synced
write activity to pg_xlog is bogging the entire system down. Is the
system spending most of its time in IO wait?

Also, another item of note might be the actual on-disk DB size... I
wonder if it has changed significantly going from SQL_ASCII to UTF8.

In 8.1 you can do this:

SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database;

In 7.4, you'll need to install the dbsize contrib module to get the same info.
-- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
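A per-table variant of the query above (also 8.1; pg_total_relation_size
counts indexes and TOAST data as well) can show where any UTF8 growth is
concentrated:

```sql
-- Ten largest tables by total on-disk size; run inside the database in question.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```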
On 3/2/07, Jeff Frost <jeff@frostconsultingllc.com> wrote: > On Fri, 2 Mar 2007, Guido Neitzer wrote: > > > On 02.03.2007, at 14:20, Alex Deucher wrote: > > > >> Ah OK. I see what you are saying; thank you for clarifying. Yes, > >> the SAN is configured for maximum capacity; it has large RAID 5 > >> groups. As I said earlier, we never intended to run a DB on the SAN, > >> it just happened to come up, hence the configuration. > > > > So why not dumping the stuff ones, importing into a PG configured to use > > local discs (Or even ONE local disc, you might have the 16GB you gave as a > > size for the db on the local machine, right?) and testing whether the problem > > is with PG connecting to the SAN. So you have one factor less to consider > > after all your changes. > > > > Maybe it's just that something in the chain from PG to the actual HD spindles > > kills your random access performance for getting the actual rows. > > I am actually starting to think that the SAN may be introducing some amount of > latency that is enough to kill your random IO which is what all of the queries > in question are doing - look up in index - fetch row from table. > > If you have the time, it would be totally worth it to test with a local disk > and see how that affects the speed. > > I would think that even with RAID5, a SAN with that many spindles would be > quite fast in raw throughput, but perhaps it's just seek latency that's > killing you. > > When you run the bonnie tests again, take note of what the seeks/sec is > compared with the old disk. Also, you should run bonnie with the -b switch to > see if that causes significant slowdown of the writes...maybe minor synced > write activity to pg_xlog is bogging the entire system down. Is the system > spending most of its time in IO wait? > > Also, another item of note might be the actual on disk DB size..I wonder if it > has changed significantly going from SQL_ASCII to UTF8. 
>
> In 8.1 you can do this:
>
> SELECT datname,
>        pg_size_pretty(pg_database_size(datname)) AS size
> FROM pg_database;
>
> In 7.4, you'll need to install the dbsize contrib module to get the same info.

I'm beginning to think the same thing. I'm planning to try the tests
above next week. I'll let you know what I find out.

Thanks!

Alex