Thread: High CPU Load
Hi All,

I migrated my Postgres server from 7.4.6 to 8.1.4, but the server is now completely overloaded; at times the load average is > 40. All queries were analyzed with EXPLAIN and all indexes are used. I/O looks good. Is my configuration correct? It is the default configuration plus a few changes:

max_connections = 512
superuser_reserved_connections = 2
shared_buffers = 65536
work_mem = 65536
effective_cache_size = 131072
log_destination = 'syslog'
redirect_stderr = off
log_directory = '/var/log/pgsql'
log_min_duration_statement = 100
silent_mode = on
log_statement = 'none'
default_with_oids = on

My server is a dual Xeon 3.06 GHz with 2 GB RAM and good SCSI disks.

Best Regards,
Jérôme BENOIS.
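The units here are easy to misread: on 8.1, shared_buffers and effective_cache_size are counted in 8 kB pages, while work_mem is counted in kB. A quick conversion (plain arithmetic only, nothing server-side) shows what the settings above actually mean:

```python
# Convert the postgresql.conf values above into megabytes.
PAGE_8K = 8 * 1024   # unit of shared_buffers / effective_cache_size
KB = 1024            # unit of work_mem

shared_buffers_mb = 65536 * PAGE_8K // 2**20         # 512 MB of shared buffers
work_mem_mb = 65536 * KB // 2**20                    # 64 MB per sort/hash, per backend
effective_cache_size_mb = 131072 * PAGE_8K // 2**20  # planner assumes 1 GB of OS cache

print(shared_buffers_mb, work_mem_mb, effective_cache_size_mb)  # 512 64 1024
```

On a 2 GB machine with up to 512 connections, a 64 MB work_mem is the number to watch.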
On 9/14/06, Jérôme BENOIS <benois@argia-engineering.fr> wrote: > I migrated Postgres server from 7.4.6 to 8.1.4, But my server is > completely full, by moment load average > 40 > All queries analyzed by EXPLAIN, all indexes are used .. IO is good ... What is the bottleneck? Are you CPU bound? Do you have iowait? Do you swap? Any weird things in vmstat output? > My configuration is correct ? > work_mem = 65536 If you have a lot of concurrent queries, it's probably far too much. That said, if you don't swap, it's probably not the problem. -- Guillaume
Hi Guillaume,

Le jeudi 14 septembre 2006 à 15:46 +0200, Guillaume Smet a écrit :
> On 9/14/06, Jérôme BENOIS <benois@argia-engineering.fr> wrote:
> > I migrated Postgres server from 7.4.6 to 8.1.4, But my server is
> > completely full, by moment load average > 40
> > All queries analyzed by EXPLAIN, all indexes are used .. IO is good ...
>
> What is the bottleneck? Are you CPU bound? Do you have iowait? Do you
> swap? Any weird things in vmstat output?

The load average swings between 1 and 70, which is strange. I/O wait and swap are fine; I just have very high CPU load, and it is userland time.

top output:

top - 15:57:57 up 118 days, 9:04, 4 users, load average: 8.16, 9.16, 15.51
Tasks: 439 total, 7 running, 432 sleeping, 0 stopped, 0 zombie
Cpu(s): 87.3% us, 6.8% sy, 0.0% ni, 4.8% id, 0.1% wa, 0.2% hi, 0.8% si
Mem: 2076404k total, 2067812k used, 8592k free, 13304k buffers
Swap: 1954312k total, 236k used, 1954076k free, 1190296k cached

  PID USER     PR NI VIRT RES  SHR  S %CPU %MEM TIME+   COMMAND
15667 postgres 25  0 536m 222m 532m R 98.8 11.0 1:39.29 postmaster
19533 postgres 25  0 535m 169m 532m R 92.9  8.3 0:38.68 postmaster
16278 postgres 25  0 537m 285m 532m R 86.3 14.1 1:37.56 postmaster
18695 postgres 16  0 535m 171m 532m S 16.1  8.5 0:14.46 postmaster
18092 postgres 16  0 544m 195m 532m R 11.5  9.7 0:31.87 postmaster
16896 postgres 15  0 534m 215m 532m S  6.3 10.6 0:27.13 postmaster
 4835 postgres 15  0 535m 147m 532m S  2.6  7.3 1:27.20 postmaster
 4836 postgres 15  0 536m 154m 532m S  2.0  7.6 1:26.07 postmaster
 4833 postgres 15  0 535m 153m 532m S  1.0  7.6 1:26.54 postmaster
 4839 postgres 15  0 535m 148m 532m S  1.0  7.3 1:25.10 postmaster
15083 postgres 15  0 535m  44m 532m S  1.0  2.2 0:16.13 postmaster

vmstat output:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd  free  buff    cache   si so   bi  bo in cs us sy id wa
 4  0  236 13380 13876 1192036    0  0    0   0  1  1 19  6 70  5
 4  0  236 13252 13876 1192036    0  0   10   0  0  0 92  8  0  0
16  0  236 13764 13884 1192096    0  0   52  28  0  0 91  9  0  0
 4  0  236 11972 13904 1192824    0  0  320  17  0  0 92  8  0  0
 4  0  236 12548 13904 1192892    0  0   16   0  0  0 92  8  0  0
 9  0  236 11908 13912 1192884    0  0    4  38  0  0 91  9  0  0
 8  0  236  8832 13568 1195676    0  0 6975 140  0  0 91  9  0  0
 8  0  236 10236 13588 1193208    0  0   82  18  0  0 93  7  0  0
 6  0  236  9532 13600 1193264    0  0   76  18  0  0 92  8  0  0
10  1  236 11060 13636 1193432    0  0   54 158  0  0 91  9  0  0
 6  0  236 10204 13636 1193432    0  0    8   0  0  0 92  8  0  0
 8  1  236 10972 13872 1192720    0  0   28 316  0  0 91  9  0  0
 6  0  236 11004 13936 1192724    0  0    4  90  0  0 92  8  0  0
 7  0  236 10300 13936 1192996    0  0  150   0  0  0 92  8  0  0
11  0  236 11004 13944 1192988    0  0   16   6  0  0 91  8  0  0
17  0  236 10732 13996 1193208    0  0  118  94  0  0 91  9  0  0
 6  0  236 10796 13996 1193820    0  0  274   0  0  0 91  9  0  0
24  0  236  9900 13996 1193820    0  0    8   0  0  0 92  8  0  0
13  0  236  9420 14016 1194004    0  0  100  98  0  0 92  8  0  0
 8  0  236  9276 13944 1188976    0  0   42   0  0  0 92  8  0  0
 3  0  236 14524 13952 1188968    0  0    0  38  0  0 77  8 16  0
 3  0  236 15164 13960 1189164    0  0   92   6  0  0 65  7 28  0
 3  0  236 16380 13968 1189156    0  0    8  36  0  0 57  7 36  0
 1  0  236 15604 14000 1189260    0  0   38  37  0  0 39  6 54  1
 1  0  236 16564 14000 1189328    0  0    0   0  0  0 38  5 57  0
 1  1  236 14900 14024 1189372    0  0   28 140  0  0 47  7 46  0
 1  1  236 10212 14100 1195280    0  0 2956 122  0  0 21  3 71  5
 5  0  236 13156 13988 1192400    0  0  534   6  0  0 19  3 77  1
 0  0  236  8408 13996 1197016    0  0 4458 200  0  0 18  2 78  2
 1  0  236  9784 13996 1195588    0  0   82   0  0  0 16  3 81  0
 0  0  236 10728 14028 1195556    0  0   30 118  0  0 11  2 87  1

Thanks for your help,
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"

> > My configuration is correct ?
> > work_mem = 65536
>
> If you have a lot of concurrent queries, it's probably far too much.
> That said, if you don't swap, it's probably not the problem.
> --
> Guillaume
Jérôme BENOIS <benois@argia-engineering.fr> writes:
> I migrated Postgres server from 7.4.6 to 8.1.4, But my server is
> completely full, by moment load average > 40

Did you remember to ANALYZE the whole database after reloading it?
pg_dump/reload won't by itself regenerate statistics.

			regards, tom lane
Hi Tom,

Le jeudi 14 septembre 2006 à 10:13 -0400, Tom Lane a écrit :
> Jérôme BENOIS <benois@argia-engineering.fr> writes:
> > I migrated Postgres server from 7.4.6 to 8.1.4, But my server is
> > completely full, by moment load average > 40
>
> Did you remember to ANALYZE the whole database after reloading it?
> pg_dump/reload won't by itself regenerate statistics.

I tested dump + restore + vacuumdb --analyze on all databases, but no change ...

Cheers,
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
On Thu, 2006-09-14 at 09:00, Jérôme BENOIS wrote: > Hi Guillaume, > > Le jeudi 14 septembre 2006 à 15:46 +0200, Guillaume Smet a écrit : > > On 9/14/06, Jérôme BENOIS <benois@argia-engineering.fr> wrote: > > > I migrated Postgres server from 7.4.6 to 8.1.4, But my server is > > > completely full, by moment load average > 40 > > > All queries analyzed by EXPLAIN, all indexes are used .. IO is good ... > > > > What is the bottleneck? Are you CPU bound? Do you have iowait? Do you > > swap? Any weird things in vmstat output? > the load average goes up and goes down between 1 and 70, it's strange. > IO wait and swap are good. I have just very high CPU load. And it's user > land time. > > top output : > > top - 15:57:57 up 118 days, 9:04, 4 users, load average: 8.16, 9.16, > 15.51 > Tasks: 439 total, 7 running, 432 sleeping, 0 stopped, 0 zombie > Cpu(s): 87.3% us, 6.8% sy, 0.0% ni, 4.8% id, 0.1% wa, 0.2% hi, > 0.8% si > Mem: 2076404k total, 2067812k used, 8592k free, 13304k buffers > Swap: 1954312k total, 236k used, 1954076k free, 1190296k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 15667 postgres 25 0 536m 222m 532m R 98.8 11.0 1:39.29 postmaster > 19533 postgres 25 0 535m 169m 532m R 92.9 8.3 0:38.68 postmaster > 16278 postgres 25 0 537m 285m 532m R 86.3 14.1 1:37.56 postmaster > 18695 postgres 16 0 535m 171m 532m S 16.1 8.5 0:14.46 postmaster > 18092 postgres 16 0 544m 195m 532m R 11.5 9.7 0:31.87 postmaster > 16896 postgres 15 0 534m 215m 532m S 6.3 10.6 0:27.13 postmaster Somewhere, the query planner is likely making a really bad decision. Have you analyzed your dbs?
On Thu, 2006-09-14 at 09:17, Jérôme BENOIS wrote: > Hi Tom, > > Le jeudi 14 septembre 2006 à 10:13 -0400, Tom Lane a écrit : > > =?ISO-8859-1?Q?J=E9r=F4me?= BENOIS <benois@argia-engineering.fr> writes: > > > I migrated Postgres server from 7.4.6 to 8.1.4, But my server is > > > completely full, by moment load average > 40 > > > > Did you remember to ANALYZE the whole database after reloading it? > > pg_dump/reload won't by itself regenerate statistics. > > > > regards, tom lane > I tested, dump + restore + vaccumdb --analyze on all databases but no change ... OK, set your db to log queries that take more than a few seconds to run. Execute those queries by hand with an explain analyze in front and post the output here.
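Scott's suggestion amounts to something like the following in postgresql.conf (the 2000 ms threshold is only an example; pick whatever "a few seconds" means for this workload — the original post already has it at 100 ms, which may log far too much):

```
# Log any statement running longer than this many milliseconds.
log_min_duration_statement = 2000
```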
On 9/14/06, Jérôme BENOIS <benois@argia-engineering.fr> wrote: > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 15667 postgres 25 0 536m 222m 532m R 98.8 11.0 1:39.29 postmaster > 19533 postgres 25 0 535m 169m 532m R 92.9 8.3 0:38.68 postmaster > 16278 postgres 25 0 537m 285m 532m R 86.3 14.1 1:37.56 postmaster Enable stats_command_string and see which queries are running on these backends by selecting on pg_stat_activity. Do the queries finish? Do you have them in your query log? -- Guillaume
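With stats_command_string enabled, the queries running in those backends can be read from pg_stat_activity, for example (column names as in 8.1; the PIDs are taken from the top output above and are only illustrative):

```sql
-- current_query shows what each listed backend is executing right now.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE procpid IN (15667, 19533, 16278);
```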
Hello,

Le jeudi 14 septembre 2006 à 09:21 -0500, Scott Marlowe a écrit :
> On Thu, 2006-09-14 at 09:17, Jérôme BENOIS wrote:
> > I tested, dump + restore + vacuumdb --analyze on all databases but no change ...
>
> OK, set your db to log queries that take more than a few seconds to
> run. Execute those queries by hand with an explain analyze in front and
> post the output here.

I tested all the queries; they all use indexes. An example:

explain analyze
select distinct INTEGER_VALUE, DATE_VALUE, EI_ID, VALUE_TYPE, FLOAT_VALUE, ID,
       TEXT_VALUE, CATEGORY_ID, STRING_VALUE, CATEGORYATTR_ID, NAME
from (((
    select distinct ei_id as EIID
    from mpng2_ei_attribute as reqin1
    where reqin1.CATEGORYATTR_ID = 0
      and reqin1.TEXT_VALUE ilike ''
      and ei_id in (
        select distinct ei_id as EIID
        from mpng2_ei_attribute as reqin2
        where reqin2.CATEGORYATTR_ID = 0
          and reqin2.TEXT_VALUE ilike ''
          and ei_id in (
            select distinct ei_id as EIID
            from mpng2_ei_attribute as reqin3
            where reqin3.NAME = '' and reqin3.STRING_VALUE = ''
          )
      )
))) as req0
join mpng2_ei_attribute on req0.eiid = mpng2_ei_attribute.ei_id
order by ei_id asc;

QUERY PLAN
----------------------------------------------------------------------
 Unique (cost=758.53..762.19 rows=122 width=233) (actual time=0.191..0.191 rows=0 loops=1)
   -> Sort (cost=758.53..758.84 rows=122 width=233) (actual time=0.182..0.182 rows=0 loops=1)
        Sort Key: mpng2_ei_attribute.ei_id, mpng2_ei_attribute.integer_value, mpng2_ei_attribute.date_value, mpng2_ei_attribute.value_type, mpng2_ei_attribute.float_value, mpng2_ei_attribute.id, mpng2_ei_attribute.text_value, mpng2_ei_attribute.category_id, mpng2_ei_attribute.string_value, mpng2_ei_attribute.categoryattr_id, mpng2_ei_attribute.name
        -> Nested Loop (cost=365.83..754.31 rows=122 width=233) (actual time=0.126..0.126 rows=0 loops=1)
             -> Unique (cost=365.83..374.34 rows=1 width=4) (actual time=0.116..0.116 rows=0 loops=1)
                  -> Nested Loop (cost=365.83..374.34 rows=1 width=4) (actual time=0.108..0.108 rows=0 loops=1)
                       -> Unique (cost=350.22..354.69 rows=1 width=4) (actual time=0.097..0.097 rows=0 loops=1)
                            -> Nested Loop (cost=350.22..354.69 rows=1 width=4) (actual time=0.089..0.089 rows=0 loops=1)
                                 -> Unique (cost=334.60..335.03 rows=1 width=4) (actual time=0.080..0.080 rows=0 loops=1)
                                      -> Sort (cost=334.60..334.82 rows=86 width=4) (actual time=0.072..0.072 rows=0 loops=1)
                                           Sort Key: reqin3.ei_id
                                           -> Bitmap Heap Scan on mpng2_ei_attribute reqin3 (cost=2.52..331.84 rows=86 width=4) (actual time=0.056..0.056 rows=0 loops=1)
                                                Recheck Cond: (((name)::text = ''::text) AND ((string_value)::text = ''::text))
                                                -> Bitmap Index Scan on mpng2_ei_attribute_name_svalue (cost=0.00..2.52 rows=86 width=0) (actual time=0.043..0.043 rows=0 loops=1)
                                                     Index Cond: (((name)::text = ''::text) AND ((string_value)::text = ''::text))
                                 -> Bitmap Heap Scan on mpng2_ei_attribute reqin2 (cost=15.61..19.63 rows=1 width=4) (never executed)
                                      Recheck Cond: ((reqin2.ei_id = "outer".ei_id) AND (reqin2.categoryattr_id = 0))
                                      Filter: (text_value ~~* ''::text)
                                      -> BitmapAnd (cost=15.61..15.61 rows=1 width=0) (never executed)
                                           -> Bitmap Index Scan on mpng2_ei_attribute_ei_id (cost=0.00..2.43 rows=122 width=0) (never executed)
                                                Index Cond: (reqin2.ei_id = "outer".ei_id)
                                           -> Bitmap Index Scan on mpng2_ei_attribute_categoryattr (cost=0.00..12.94 rows=1982 width=0) (never executed)
                                                Index Cond: (categoryattr_id = 0)
                       -> Bitmap Heap Scan on mpng2_ei_attribute reqin1 (cost=15.61..19.63 rows=1 width=4) (never executed)
                            Recheck Cond: ((reqin1.ei_id = "outer".ei_id) AND (reqin1.categoryattr_id = 0))
                            Filter: (text_value ~~* ''::text)
                            -> BitmapAnd (cost=15.61..15.61 rows=1 width=0) (never executed)
                                 -> Bitmap Index Scan on mpng2_ei_attribute_ei_id (cost=0.00..2.43 rows=122 width=0) (never executed)
                                      Index Cond: (reqin1.ei_id = "outer".ei_id)
                                 -> Bitmap Index Scan on mpng2_ei_attribute_categoryattr (cost=0.00..12.94 rows=1982 width=0) (never executed)
                                      Index Cond: (categoryattr_id = 0)
             -> Index Scan using mpng2_ei_attribute_ei_id on mpng2_ei_attribute (cost=0.00..378.43 rows=122 width=233) (never executed)
                  Index Cond: ("outer".ei_id = mpng2_ei_attribute.ei_id)

Thanks,
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> Jérôme BENOIS
>
> explain analyze select distinct
> INTEGER_VALUE,DATE_VALUE,EI_ID,VALUE_TYPE,FLOAT_VALUE,ID,TEXT_
> VALUE,CATEGORY_ID,STRING_VALUE,CATEGORYATTR_ID,NAME from (((
> select distinct ei_id as EIID from mpng2_ei_attribute as
> reqin1 where reqin1.CATEGORYATTR_ID = 0 AND reqin1.TEXT_VALUE
> ilike '' and ei_id in ( select distinct ei_id as EIID from
> mpng2_ei_attribute as reqin2 where reqin2.CATEGORYATTR_ID = 0
> AND reqin2.TEXT_VALUE ilike '' and ei_id in ( select distinct
> ei_id as EIID from mpng2_ei_attribute as reqin3 where
> reqin3.NAME = '' AND reqin3.STRING_VALUE = '' ) ) ) ) ) as
> req0 join mpng2_ei_attribute on req0.eiid =
> mpng2_ei_attribute.ei_id order by ei_id asc;

That is a lot of distincts. Sorts are one thing that can really use up CPU, and this query is doing lots of sorts, so it's not surprising the CPU usage is high.

On the subqueries you have a couple of cases where you say "... in (select distinct ...)". I don't think the distinct clause is necessary in that case. I'm not a hundred percent sure, but you might want to try removing them and see if the query results are the same and maybe the query will execute faster.
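Dave's point, sketched on the innermost subquery: IN already ignores duplicate values, so the DISTINCT (and the sort it forces) inside each IN list can simply be dropped. An untested rewrite, assuming the semantics described above:

```sql
-- Same IN-list semantics, one sort fewer:
... and ei_id in (
    select ei_id
    from mpng2_ei_attribute as reqin3
    where reqin3.NAME = '' and reqin3.STRING_VALUE = ''
)
```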
Hi Dave,

Le jeudi 14 septembre 2006 à 10:02 -0500, Dave Dutcher a écrit :
> That is a lot of distinct's. Sorts are one thing that can really use up
> CPU. This query is doing lots of sorts, so its not surprising the CPU usage
> is high.
>
> On the subqueries you have a couple of cases where you say "... in (select
> distinct ...)" I don't think the distinct clause is necessary in that case.
> I'm not a hundred percent sure, but you might want to try removing them and
> see if the query results are the same and maybe the query will execute
> faster.

Thanks for your advice, but the load was fine with the previous version of Postgres (7.4.6) on the same server, with the same data, the same application and the same end users ...

So we suspect some system parameter, but which one?

Also, vmstat -s shows a lot of "pages swapped out". Do you have an idea?

Thanks a lot,
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
On Thu, 2006-09-14 at 10:02, Dave Dutcher wrote: > > -----Original Message----- > > From: pgsql-performance-owner@postgresql.org > > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of > > Jérôme BENOIS > > > explain analyze select distinct > > INTEGER_VALUE,DATE_VALUE,EI_ID,VALUE_TYPE,FLOAT_VALUE,ID,TEXT_ > > VALUE,CATEGORY_ID,STRING_VALUE,CATEGORYATTR_ID,NAME from ((( > > select distinct ei_id as EIID from mpng2_ei_attribute as > > reqin1 where reqin1.CATEGORYATTR_ID = 0 AND reqin1.TEXT_VALUE > > ilike '' and ei_id in ( select distinct ei_id as EIID from > > mpng2_ei_attribute as reqin2 where reqin2.CATEGORYATTR_ID = 0 > > AND reqin2.TEXT_VALUE ilike '' and ei_id in ( select distinct > > ei_id as EIID from mpng2_ei_attribute as reqin3 where > > reqin3.NAME = '' AND reqin3.STRING_VALUE = '' ) ) ) ) ) as > > req0 join mpng2_ei_attribute on req0.eiid = > > mpng2_ei_attribute.ei_id order by ei_id asc; > > > That is a lot of distinct's. Sorts are one thing that can really use up > CPU. This query is doing lots of sorts, so its not surprising the CPU usage > is high. I'm gonna make a SWAG here and guess that maybe your 7.4 db was initdb'd with a locale of C and the new one is initdb'd with a real locale, like en_US. Can Jérôme confirm or deny this?
Hi Scott,

Le jeudi 14 septembre 2006 à 10:56 -0500, Scott Marlowe a écrit :
> I'm gonna make a SWAG here and guess that maybe your 7.4 db was initdb'd
> with a locale of C and the new one is initdb'd with a real locale, like
> en_US. Can Jérôme confirm or deny this?

The locale used to run initdb is:

su - postgres
:~$ locale
LANG=POSIX
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Cheers,
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
Jérôme BENOIS <benois@argia-engineering.fr> writes:
> Le jeudi 14 septembre 2006 à 10:56 -0500, Scott Marlowe a écrit :
>> I'm gonna make a SWAG here and guess that maybe your 7.4 db was initdb'd
>> with a locale of C and the new one is initdb'd with a real locale, like
>> en_US. Can Jérôme confirm or deny this?

> The locale used to run initdb is :
> su - postgres
> :~$ locale
> LANG=POSIX

It'd be more convincing if "show lc_collate" etc. display C or POSIX.
The fact that postgres' current default environment is LANG=POSIX
doesn't prove much about what initdb saw.

			regards, tom lane
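Tom's check is a one-liner from any psql session:

```sql
-- 'C' or 'POSIX' means cheap byte-wise string comparisons;
-- a real locale (e.g. en_US) makes sorts noticeably more expensive.
SHOW lc_collate;
SHOW lc_ctype;
```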
Jérôme,

Perhaps it's a stupid question, but are your queries slower than before? You didn't say. IMHO, it's not a problem to have a high load if you have a lot of users and your queries are fast (and with 8.1, they should be far faster than before).

To take a real example: we had a problem with a quad Xeon running Postgres 7.4. Even when there were a lot of queries the load was always lower than 4, yet suddenly the queries became really slow and the database was completely unusable. When we upgraded to 8.1, under very high load the CPU load was far higher but the queries stayed far faster.

Considering your top output, I suspect you use HT, and you should really disable it if that's the case.

--
Guillaume
Hi Evgeny,

Le jeudi 14 septembre 2006 à 20:47 +0400, Evgeny Gridasov a écrit :
> Jérôme,
>
> How many concurrent connections do you have?

I have between 300 and 400 concurrent connections.

> Because you've got only 2GB of ram this is important! A Postgres
> process takes some memory (I don't know exactly how much, but
> if it is about 2MB you'll get about 1GB of RAM used only by
> postgres' processes for 512 connections)! Don't forget about your
> 512MB shared memory setting, postgres shared libraries and the OS
> filesystem cache...
>
> I hope your postgres binaries are not statically linked?

No, I don't use static binaries.

> Try using connection pooling in your software, or add some RAM, it's cheap.
> And I think that work_mem of 65536 is too high for your system...

I already use a connection pool, but I have many servers in front of the database server. OK, I will test a lower work_mem tomorrow.

--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
Hi Guillaume,

Le jeudi 14 septembre 2006 à 23:22 +0200, Guillaume Smet a écrit :
> Perhaps it's a stupid question but are your queries slower than
> before? You didn't tell it.

No, it's not a stupid question! The queries are fast, but when the load average exceeds 40 all queries become slower than before.

> IMHO, it's not a problem to have a high load if you have a lot of
> users and your queries are fast (and with 8.1, they should be far
> faster than before).

Yes, I have a lot of users ;-)

> To take a real example, we had a problem with a quad xeon running
> postgres 7.4 and even when there were a lot of queries, the load was
> always lower than 4 and suddenly the queries were really slow and the
> database was completely unusable.
> When we upgraded to 8.1, on very high load, we had a far higher cpu
> load but queries were far faster even with a high cpu load.

I agree, but at times the DB server is very slow.

> Considering your top output, I suspect you use HT and you should
> really remove it if it's the case.

What does "HT" mean, please?

If you want, my JabberId: jerome.benois AT gmail.com

--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
On 9/14/06, Jérôme BENOIS <benois@argia-engineering.fr> wrote:
> Yes i have a lot of users ;-)

So your work_mem is probably far too high (that's what I told you in my first message) and you probably swap when you have too many users. Remember that work_mem can be used several times per query, and that's especially the case when you have a lot of sorts.

When your load is high, check your swap activity and your I/O wait; top gives you this information. If you swap, lower your work_mem to 32 MB for example, then see if it's enough for your queries to run fast (you can check whether temporary files are created in $PGDATA/base/<your database oid>/pgsql_tmp) and whether it stops swapping. Retry with a lower/higher value to find the one that best fits your queries and load.

> I agree but by moment DB Server is so slow.

Yep, that's the information that was missing :).

> what's means "HT" please ?

Hyper-Threading. It's usually not recommended to enable it on PostgreSQL servers. On most servers, you can disable it directly in the BIOS.

--
Guillaume
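Guillaume's experiment can be tried per session before touching postgresql.conf (32768 kB, i.e. 32 MB, is just his example value):

```sql
-- On 8.1, work_mem is expressed in kB; this affects only this session.
SET work_mem = 32768;
-- then re-run the slow query with EXPLAIN ANALYZE and compare timings,
-- watching swap and the pgsql_tmp directory while it runs
```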
>Hyper threading. It's usually not recommended to enable it on
>PostgreSQL servers. On most servers, you can disable it directly in
>the BIOS.

Maybe for specific usage scenarios, but that's generally not been my experience with relatively recent versions of PG. We ran some tests with pgbench, and averaged 10% or more performance improvement. Now, I agree pgbench isn't the most realistic performance test, but we did notice a slight improvement in our application performance too.

Also, here are some benchmarks posted earlier by the folks at tweakers.net, also showing hyperthreading to be faster: http://tweakers.net/reviews/646/10

I'm not sure if it's dependent on the OS. Our tests were on BSD 5.x and PG 7.4 and 8.0/8.1, and were several months ago, so I don't remember many more specifics than that.

So, not saying it's a best practice one way or another, but this is pretty easy to test and you should definitely try it out both ways for your workload.

- Bucky
Hi, Jérôme,

Jérôme BENOIS wrote:
> max_connections = 512

Do you really have that many concurrent connections? Then you should probably think about getting a larger machine. You will definitely want to play with the commit_delay and commit_siblings settings in that case, especially if you have significant write traffic.

> work_mem = 65536
> effective_cache_size = 131072

Hmm, 131072*8*1024 + 512*65536*1024 = 35433480192 - that's 33 gigs of memory you assume here, not counting OS usage, and the fact that certain queries can use a multiple of work_mem. Even on a machine that big, I'd be inclined to dedicate more memory to caching and less to the backends, unless specific needs dictate it.

You could try to use sqlrelay or pgpool to cut down the number of backends you need.

> My Server is Dual Xeon 3.06GHz

For Xeons, there were rumours about "context switch storms" which kill performance.

> with 2 Go RAM and good SCSI disks.

With 2 gigs of RAM, you should cut down the number of concurrent backends. Does your machine go into swap?

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
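Markus's arithmetic, spelled out (plain arithmetic using the numbers from the posted configuration; nothing here queries the server):

```python
# Memory the posted config can theoretically commit, in bytes.
PAGE_8K = 8 * 1024   # effective_cache_size is counted in 8 kB pages
KB = 1024            # work_mem is counted in kB (PostgreSQL 8.1)

effective_cache_size = 131072 * PAGE_8K   # 1 GB of assumed OS cache
work_mem_worst_case = 512 * 65536 * KB    # 64 MB x 512 backends = 32 GB

total = effective_cache_size + work_mem_worst_case
print(total)  # 35433480192 bytes, i.e. ~33 GB promised on a 2 GB machine
```

And that is before counting shared_buffers, the OS, and queries that use several work_mem allocations at once.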
On 9/15/06, Markus Schaber <schabi@logix-tt.com> wrote: > For xeons, there were rumours about "context switch storms" which kill > performance. It's not that much a problem in 8.1. There are a few corner cases when you still have the problem but on a regular load you don't have it anymore (validated here with a quad Xeon MP and a dual Xeon). -- Guillaume
Jérôme,

How many concurrent connections do you have? Because you've got only 2GB of RAM, this is important! A Postgres process takes some memory; I don't know exactly how much, but if it is about 2MB you'll get about 1GB of RAM used only by postgres' processes (for 512 connections)! Don't forget about your 512MB shared memory setting, postgres shared libraries and the OS filesystem cache...

I hope your postgres binaries are not statically linked?

Try using connection pooling in your software, or add some RAM, it's cheap. And I think that work_mem of 65536 is too high for your system...

On Thu, 14 Sep 2006 17:09:25 +0200
Jérôme BENOIS <benois@argia-engineering.fr> wrote:
> Thanks for your advice, but the load was good with previous version of
> postgres -> 7.4.6 on the same server and same datas, same application,
> same final users ...
>
> So we supect some system parameter, but which ?
>
> With vmstat -s is showing a lot of "pages swapped out", have you an
> idea ?

--
Evgeny Gridasov
Software Engineer
I-Free, Russia
Hi Guillaume,

Now I disabled Hyper-Threading in the BIOS, and the "context switch storms"
disappeared (when I look with the command sar -t). I decreased the work_mem
parameter to 32768. My CPU load is better, but it is still too high, for
example:

top - 16:27:05 up 9:13, 3 users, load average: 45.37, 43.43, 41.43
Tasks: 390 total, 26 running, 363 sleeping, 0 stopped, 1 zombie
Cpu(s): 89.5% us, 9.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.2% hi, 0.4% si
Mem: 2076404k total, 2039552k used, 36852k free, 40412k buffers
Swap: 1954312k total, 468k used, 1953844k free, 1232000k cached

  PID USER     PR NI VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
30907 postgres 16  0 537m  51m 532m R 20.4  2.5  1:44.73 postmaster
25631 postgres 16  0 538m 165m 532m R 17.4  8.2  8:43.76 postmaster
29357 postgres 16  0 537m 311m 532m R 17.4 15.3  0:26.47 postmaster
32294 postgres 16  0 535m  86m 532m R 14.9  4.3  0:04.97 postmaster
31406 postgres 16  0 536m 180m 532m R 14.4  8.9  0:22.04 postmaster
31991 postgres 16  0 535m  73m 532m R 14.4  3.6  0:08.21 postmaster
30782 postgres 16  0 536m 205m 532m R 14.0 10.1  0:19.63 postmaster

Tomorrow morning I plan to add 2GB of RAM in order to test the difference
with my current config.

Have you any other ideas?

Best Regards,
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"

Le vendredi 15 septembre 2006 à 00:24 +0200, Guillaume Smet a écrit :
> On 9/14/06, Jérôme BENOIS <benois@argia-engineering.fr> wrote:
> > Yes i have a lot of users ;-)
>
> So your work_mem is probably far too high (that's what I told you in
> my first message) and you probably swap when you have too many users.
> Remember that work_mem can be used several times per query (and it's
> especially the case when you have a lot of sorts).
> When your load is high, check your swap activity and your io/wait. top
> gives you this information. If you swap, lower your work_mem to 32 MB
> for example, then see if it's enough for your queries to run fast (you
> can check if there are files created in $PGDATA/base/<your
> database oid>/pgsql_tmp) and if it doesn't swap. Retry with a
> lower/higher value to find the one that fits best to your queries and
> load.
>
> > I agree but by moment DB Server is so slow.
>
> Yep, that's the information that was missing :).
>
> > what's means "HT" please ?
>
> Hyper-Threading. It's usually not recommended to enable it on
> PostgreSQL servers. On most servers, you can disable it directly in
> the BIOS.
>
> --
> Guillaume
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
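Guillaume's check for on-disk sort files can be scripted. A minimal sketch; the PGDATA path here is an assumption, adjust it to your installation:

```shell
# Count temporary sort/spill files under each database directory.
# Any hits mean some queries' sorts exceeded work_mem and spilled to disk
# (in 8.1 the per-database directory is named pgsql_tmp).
PGDATA=${PGDATA:-/var/lib/pgsql/data}   # assumed path -- adjust
find "$PGDATA"/base/*/pgsql_tmp -type f 2>/dev/null | wc -l
```

Run it while the load is high; a count of zero, as Jérôme later reports, points away from work_mem being the bottleneck.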
Hi Markus,

Le vendredi 15 septembre 2006 à 11:43 +0200, Markus Schaber a écrit :
> Hi, Jérôme,
>
> Jérôme BENOIS wrote:
>
> > max_connections = 512
>
> Do you really have that many concurrent connections? Then you should
> think about getting a larger machine, probably.
>
> You will definitely want to play with the commit_delay and
> commit_siblings settings in that case, especially if you have write
> access.
>
> > work_mem = 65536
> > effective_cache_size = 131072
>
> hmm, 131072*8*1024 + 512*65536*1024 = 35433480192 - that's 33 gigs of
> memory you assume here, not counting OS usage, and the fact that certain
> queries can use up a multiple of work_mem.

Now I have 335 concurrent connections, I decreased the work_mem parameter
to 32768, and I disabled Hyper-Threading in the BIOS. But my CPU load is
still very high.

Tomorrow morning I plan to add 2GB of RAM ... But I don't understand why
my database server worked well with the previous version of postgres and
the same queries ...

> Even on a machine that big, I'd be inclined to dedicate more memory to
> caching, and less to the backends, unless specific needs dictate it. You
> could try to use sqlrelay or pgpool to cut down the number of backends
> you need.

I already use a database pool in my application, and when I decrease the
number of connections my application is slower ;-(

> > My Server is Dual Xeon 3.06GHz
>
> For Xeons, there were rumours about "context switch storms" which kill
> performance.

I disabled Hyper-Threading.

> > with 2 Go RAM and good SCSI disks.
>
> For 2 gigs of RAM, you should cut down the number of concurrent
> backends.
>
> Does your machine go into swap?

No, 0 swap used, and I cannot find any pgsql_tmp files in $PGDATA/base/...

> Markus
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
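Markus's arithmetic in the quoted text checks out, and is easy to reproduce:

```shell
# effective_cache_size (131072 pages of 8 kB) plus the worst case of all
# 512 backends each using one full work_mem (65536 kB), in bytes.
echo $(( 131072*8*1024 + 512*65536*1024 ))
# -> 35433480192, i.e. about 33 GiB assumed on a 2 GB machine
```

The dominant term is max_connections * work_mem, which is why both Guillaume and Evgeny point at work_mem first.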
On 9/18/06, Jérôme BENOIS <benois@argia-engineering.fr> wrote:
> Tomorrow morning i plan to add 2Go RAM in order to test difference with
> my actual config.

I don't think more RAM will change anything if you don't swap at all.
You can try setting shared_buffers lower (try 32768 and 16384), but I
don't think it will change anything in 8.1.

The only thing left IMHO is that 8.1 is choosing a bad plan which
consumes a lot of CPU for at least one query. When you analyze your
logs, do you see a particularly slow query? Can you compare a query log
analysis from your old server and your new one?

--
Guillaume
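With the posted settings (log_min_duration_statement = 100, log_destination = 'syslog'), the slow queries Guillaume asks about can be pulled out of the log with a short pipeline. A sketch only: the log file path is an assumption, and the exact line format depends on your syslog configuration:

```shell
# Extract logged statement durations and list the worst offenders first.
# /var/log/pgsql/postgresql.log is a hypothetical path -- adjust to where
# your syslog actually writes the postgres facility.
LOG=/var/log/pgsql/postgresql.log
grep -o 'duration: [0-9.]* ms' "$LOG" | sort -t' ' -k2 -rn | head -20
```

Running the same pipeline against the 7.4.6 and 8.1.4 logs gives exactly the comparison Guillaume suggests.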
Hi, Jerome,

Jérôme BENOIS wrote:
> Now i Have 335 concurrent connections, i decreased work_mem parameter to
> 32768 and disabled Hyper Threading in BIOS. But my CPU load is still
> very important.

What are your settings for commit_siblings and commit_delay?

> Tomorrow morning i plan to add 2Giga RAM ... But I don't understand why
> my database server worked good with previous version of postgres and
> same queries ...

I don't think any more that it's the RAM, as you told us you don't go
into swap. It has to be something else.

Could you try logging which queries are the problematic ones? Maybe they
have bad plans for whatever reason.

> I used already database pool on my application and when i decrease
> number of connection my application is more slow ;-(

Could you just make sure that the pool really uses persistent
connections, and is not broken or misconfigured to always reconnect?

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
Markus,

Le mardi 19 septembre 2006 à 11:53 +0200, Markus Schaber a écrit :
> Hi, Jerome,
>
> Jérôme BENOIS wrote:
>
> > Now i Have 335 concurrent connections, i decreased work_mem parameter to
> > 32768 and disabled Hyper Threading in BIOS. But my CPU load is still
> > very important.
>
> What are your settings for commit_siblings and commit_delay?

The defaults:

#commit_delay = 0		# range 0-100000, in microseconds
#commit_siblings = 5		# range 1-1000

> > Tomorrow morning i plan to add 2Giga RAM ... But I don't understand why
> > my database server worked good with previous version of postgres and
> > same queries ...
>
> I don't think any more that it's the RAM, as you told you don't go into
> swap. It has to be something else.

Yes, I agree with you.

> Could you try logging which are the problematic queries, maybe they have
> bad plans for whatever reason.
>
> > I used already database pool on my application and when i decrease
> > number of connection my application is more slow ;-(
>
> Could you just make sure that the pool really uses persistent
> connections, and is not broken or misconfigured, always reconnect?

Yes, it's persistent.

I plan to return to the previous version, 7.4.6, and I will reinstall
everything on a dedicated server in order to reproduce and solve the
problem.

Jérôme.

> HTH,
> Markus
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
Hi, Jerome,

Jérôme BENOIS wrote:
>>> Now i Have 335 concurrent connections, i decreased work_mem parameter to
>>> 32768 and disabled Hyper Threading in BIOS. But my CPU load is still
>>> very important.
>> What are your settings for commit_siblings and commit_delay?
> It default :
>
> #commit_delay = 0		# range 0-100000, in microseconds
> #commit_siblings = 5		# range 1-1000

You should uncomment them and play with different settings. I'd try a
commit_delay of 100 and commit_siblings of 5 to start with.

> I plan to return to previous version : 7.4.6 in and i will reinstall all
> in a dedicated server in order to reproduce and solve the problem.

You should use at least 7.4.13, as it fixes some critical bugs that were
in 7.4.6. They use the same on-disk format and query planner logic, so
they should not show any difference.

I don't have many more ideas about what the problem could be.

Can you try to do some profiling (e.g. with statement logging) to see
which specific statements are the ones that cause high CPU load?

Are there other differences (besides the PostgreSQL version) between the
two installations? (Kernel, libraries, other software...)

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
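Markus's suggested starting point, written out as a postgresql.conf fragment. These are values to experiment with, not a definitive recommendation; the right numbers depend on the commit rate of the workload:

```
commit_delay = 100        # wait up to 100 microseconds before flushing WAL,
                          # so concurrent commits can share one fsync
commit_siblings = 5       # only delay when at least 5 other transactions
                          # are open, so idle periods pay no latency cost
```

After editing, a reload (pg_ctl reload) is enough; no restart is needed for these two settings.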
Hi, Markus,

Le mardi 19 septembre 2006 à 15:09 +0200, Markus Schaber a écrit :
> You should uncomment them, and play with different settings. I'd try a
> commit_delay of 100 and commit_siblings of 5 to start with.
>
> You should use at least 7.4.13 as it fixes some critical bugs that were
> in 7.4.6. They use the same on-disk format and query planner logic, so
> they should not have any difference.
>
> I don't have much more ideas what the problem could be.
>
> Can you try to do some profiling (e. g. with statement logging) to see
> what specific statements are the ones that cause high cpu load?
>
> Are there other differences (besides the PostgreSQL version) between the
> two installations? (Kernel, libraries, other software...)

Nothing.

I returned to the previous version, 7.4.6, on my production server, and
it works fine!

I plan to reproduce this problem on a dedicated server, and I will send
all the information to this list next week.

I hope for your help to solve this problem.

Cheers,
Jérôme.

> HTH,
> Markus
--
Jérôme,
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'sioneb@gnireenigne-aigra.rf'.split('@')])"
Hi All,

Replying to myself: we solved the CPU load problem. An external batch job
was using an expensive SQL view and taking 99% of the CPU.

Thanks all for your help!

-------------------

I started the HAPlatform open-source project, part of the Share'nGo
Project; its goal is to define all the documentation and scripts required
to install and maintain a highly available platform. Two platforms are
targeted:

* LAPJ : Linux Apache PostgreSQL Java
* LAMP : Linux Apache MySQL PHP

The first document is here (it's my postgres configuration):
http://sharengo.org/haplatform/docs/PostgreSQL/en/html_single/index.html

Cheers,
Jérôme.
--
Open-Source : http://www.sharengo.org
Corporate : http://www.argia-engineering.fr

Le vendredi 22 septembre 2006 à 09:43 +0200, Jérôme BENOIS a écrit :
> Hi, Markus,
>
> Le mardi 19 septembre 2006 à 15:09 +0200, Markus Schaber a écrit :
> > Can you try to do some profiling (e. g. with statement logging) to see
> > what specific statements are the ones that cause high cpu load?
> >
> > Are there other differences (besides the PostgreSQL version) between the
> > two installations? (Kernel, libraries, other software...)
>
> Nothing.
>
> I returned to the previous version, 7.4.6, on my production server, and
> it works fine!
>
> I plan to reproduce this problem on a dedicated server, and I will send
> all the information to this list next week.
>
> Cheers,
> Jérôme.