Thread: Problem with ExclusiveLock on inserts
Hi all.
Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).
2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
--
I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.
I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:
root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
root@rpopdb01e ~ # free -m
total used free shared buffers cached
Mem: 129028 123558 5469 0 135 119504
-/+ buffers/cache: 3918 125110
Swap: 16378 0 16378
root@rpopdb01e ~ #
PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.
The settings changed in postgresql.conf are here [5].
When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.
(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].
I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:
rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
count
--------
115953
(1 row)
rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
count
-------
0
(1 row)
rpopdb_p0=# \dS+ rpop.rpop_uidl
Table "rpop.rpop_uidl"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
popid | bigint | not null | plain | |
uidl | character varying(200) | not null | extended | |
Indexes:
"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
Has OIDs: no
rpopdb_p0=#
My questions are:
1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?
2. How much space do we extend to the relation when we get exclusive lock on it?
3. Why extended page is not visible for other backends?
4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.
5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
--
Vladimir
Hi Vladimir,
Just in case: how is your ext4 mount?
Best regards,
Ilya
Hi all.Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfomodel name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHzroot@rpopdb01e ~ # free -mtotal used free shared buffers cachedMem: 129028 123558 5469 0 135 119504-/+ buffers/cache: 3918 125110Swap: 16378 0 16378root@rpopdb01e ~ #PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.The settings changed in postgresql.conf are here [5].When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;count--------115953(1 row)rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;count-------0(1 row)rpopdb_p0=# \dS+ rpop.rpop_uidlTable "rpop.rpop_uidl"Column | Type | Modifiers | Storage | Stats target | Description--------+------------------------+-----------+----------+--------------+-------------popid | bigint | not null | plain | |uidl | character varying(200) | not null | extended | |Indexes:"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)Has OIDs: norpopdb_p0=#My questions are:1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?2. How much space do we extend to the relation when we get exclusive lock on it?3. Why extended page is not visible for other backends?4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
--Vladimir
root@rpopdb01e ~ # fgrep data /etc/fstab
UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
root@rpopdb01e ~ #
According to iostat the disks are not the bottleneck.
12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
Hi Vladimir,Just in case: how is your ext4 mount?Best regards,IlyaHi all.Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfomodel name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHzroot@rpopdb01e ~ # free -mtotal used free shared buffers cachedMem: 129028 123558 5469 0 135 119504-/+ buffers/cache: 3918 125110Swap: 16378 0 16378root@rpopdb01e ~ #PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.The settings changed in postgresql.conf are here [5].When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;count--------115953(1 row)rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;count-------0(1 row)rpopdb_p0=# \dS+ rpop.rpop_uidlTable "rpop.rpop_uidl"Column | Type | Modifiers | Storage | Stats target | Description--------+------------------------+-----------+----------+--------------+-------------popid | bigint | not null | plain | |uidl | character varying(200) | not null | extended | |Indexes:"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)Has OIDs: norpopdb_p0=#My questions are:1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?2. How much space do we extend to the relation when we get exclusive lock on it?3. Why extended page is not visible for other backends?4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
--Vladimir
--
Vladimir
My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?
root@rpopdb01e ~ # fgrep data /etc/fstabUUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1root@rpopdb01e ~ #According to iostat the disks are not the bottleneck.12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):Hi Vladimir,Just in case: how is your ext4 mount?Best regards,IlyaHi all.Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfomodel name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHzroot@rpopdb01e ~ # free -mtotal used free shared buffers cachedMem: 129028 123558 5469 0 135 119504-/+ buffers/cache: 3918 125110Swap: 16378 0 16378root@rpopdb01e ~ #PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.The settings changed in postgresql.conf are here [5].When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;count--------115953(1 row)rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;count-------0(1 row)rpopdb_p0=# \dS+ rpop.rpop_uidlTable "rpop.rpop_uidl"Column | Type | Modifiers | Storage | Stats target | Description--------+------------------------+-----------+----------+--------------+-------------popid | bigint | not null | plain | |uidl | character varying(200) | not null | extended | |Indexes:"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)Has OIDs: norpopdb_p0=#My questions are:1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?2. How much space do we extend to the relation when we get exclusive lock on it?3. Why extended page is not visible for other backends?4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
--Vladimir
--Vladimir
Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
root@rpopdb01e ~ #
12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?root@rpopdb01e ~ # fgrep data /etc/fstabUUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1root@rpopdb01e ~ #According to iostat the disks are not the bottleneck.12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):Hi Vladimir,Just in case: how is your ext4 mount?Best regards,IlyaHi all.Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfomodel name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHzroot@rpopdb01e ~ # free -mtotal used free shared buffers cachedMem: 129028 123558 5469 0 135 119504-/+ buffers/cache: 3918 125110Swap: 16378 0 16378root@rpopdb01e ~ #PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.The settings changed in postgresql.conf are here [5].When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;count--------115953(1 row)rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;count-------0(1 row)rpopdb_p0=# \dS+ rpop.rpop_uidlTable "rpop.rpop_uidl"Column | Type | Modifiers | Storage | Stats target | Description--------+------------------------+-----------+----------+--------------+-------------popid | bigint | not null | plain | |uidl | character varying(200) | not null | extended | |Indexes:"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)Has OIDs: norpopdb_p0=#My questions are:1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?2. How much space do we extend to the relation when we get exclusive lock on it?3. Why extended page is not visible for other backends?4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
--Vladimir
--Vladimir
another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000 definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrier enabled and software raid.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.
On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:
Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.root@rpopdb01e ~ # mount | fgrep data/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2root@rpopdb01e ~ # mount | fgrep data/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)root@rpopdb01e ~ #12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?root@rpopdb01e ~ # fgrep data /etc/fstabUUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1root@rpopdb01e ~ #According to iostat the disks are not the bottleneck.12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):Hi Vladimir,Just in case: how is your ext4 mount?Best regards,IlyaHi all.Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfomodel name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHzroot@rpopdb01e ~ # free -mtotal used free shared buffers cachedMem: 129028 123558 5469 0 135 119504-/+ buffers/cache: 3918 125110Swap: 16378 0 16378root@rpopdb01e ~ #PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.The settings changed in postgresql.conf are here [5].When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;count--------115953(1 row)rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;count-------0(1 row)rpopdb_p0=# \dS+ rpop.rpop_uidlTable "rpop.rpop_uidl"Column | Type | Modifiers | Storage | Stats target | Description--------+------------------------+-----------+----------+--------------+-------------popid | bigint | not null | plain | |uidl | character varying(200) | not null | extended | |Indexes:"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)Has OIDs: norpopdb_p0=#My questions are:1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?2. How much space do we extend to the relation when we get exclusive lock on it?3. Why extended page is not visible for other backends?4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
--Vladimir
--Vladimir
Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When the workload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem with locks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the number of context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate the problem, but imho not create it.
I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement?
--
12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000 definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrier enabled and software raid.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.root@rpopdb01e ~ # mount | fgrep data/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2root@rpopdb01e ~ # mount | fgrep data/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)root@rpopdb01e ~ #12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?root@rpopdb01e ~ # fgrep data /etc/fstabUUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1root@rpopdb01e ~ #According to iostat the disks are not the bottleneck.12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):Hi Vladimir,Just in case: how is your ext4 mount?Best regards,IlyaHi all.Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfomodel name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHzroot@rpopdb01e ~ # free -mtotal used free shared buffers cachedMem: 129028 123558 5469 0 135 119504-/+ buffers/cache: 3918 125110Swap: 16378 0 16378root@rpopdb01e ~ #PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.The settings changed in postgresql.conf are here [5].When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;count--------115953(1 row)rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;count-------0(1 row)rpopdb_p0=# \dS+ rpop.rpop_uidlTable "rpop.rpop_uidl"Column | Type | Modifiers | Storage | Stats target | Description--------+------------------------+-----------+----------+--------------+-------------popid | bigint | not null | plain | |uidl | character varying(200) | not null | extended | |Indexes:"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)Has OIDs: norpopdb_p0=#My questions are:1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?2. How much space do we extend to the relation when we get exclusive lock on it?3. Why extended page is not visible for other backends?4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
--Vladimir
--Vladimir
--
Vladimir
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote: > > Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When theworkload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem withlocks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the numberof context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate theproblem, but imho not create it. I'am afraid that is the problem. More than 1000 backends, most of them are simply waiting. > > > I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement? You need to fsync your huge shared buffers any time your database performs checkpoint. By default it usually happens too often because checkpoint_timeout is 5min by default. Without bbu, on software raid that leads to io spike and you commit waits for wal. > > 12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а): > > another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrierenabled and software raid. > > Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu,otherwise performance will be poor. > > > On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote: >> >> Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off onone cluster shard to try. >> >> root@rpopdb01e ~ # mount | fgrep data >> /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime) >> root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2 >> root@rpopdb01e ~ # mount | fgrep data >> /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier) >> root@rpopdb01e ~ # >> >> 12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а): >> >> My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneckon io before disks are actually involved. What says mount without arguments? >> >> On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote: >> >> root@rpopdb01e ~ # fgrep data /etc/fstab >> UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1 >> root@rpopdb01e ~ # >> >> According to iostat the disks are not the bottleneck. >> >> 12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а): >> >> Hi Vladimir, >> >> Just in case: how is your ext4 mount? >> >> Best regards, >> Ilya >> >> On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote: >> >> Hi all. >> >> Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When ithappens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout). >> >> 2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values(i_folder_id, i_uidl)"" >> >> I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics. >> >> I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores)and memory: >> >> root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo >> model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz >> root@rpopdb01e ~ # free -m >> total used free shared buffers cached >> Mem: 129028 123558 5469 0 135 119504 >> -/+ buffers/cache: 3918 125110 >> Swap: 16378 0 16378 >> root@rpopdb01e ~ # >> >> PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATAtakes only 95G. >> The settings changed in postgresql.conf are here [5]. >> >> When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they doa simple insert. >> >> (extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock| 5459,8053 >> (extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock| 5567,5490 >> (extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock| 5611,3963 >> >> I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids.I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updatesand deletes (only inserts and reads). Some more info about one of such tables is here [7]. >> >> I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken,but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for mostof them there are free pages but there are, for example, such cases: >> >> rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0; >> count >> -------- >> 115953 >> (1 row) >> >> rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0; >> count >> ------- >> 0 >> (1 row) >> >> rpopdb_p0=# \dS+ rpop.rpop_uidl >> Table "rpop.rpop_uidl" >> Column | Type | Modifiers | Storage | Stats target | Description >> --------+------------------------+-----------+----------+--------------+------------- >> popid | bigint | not null | plain | | >> uidl | character varying(200) | not null | extended | | >> Indexes: >> "pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl) >> Has OIDs: no >> >> rpopdb_p0=# >> >> >> My questions are: >> 1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring?How can I dig it? >> 2. How much space do we extend to the relation when we get exclusive lock on it? >> 3. Why extended page is not visible for other backends? >> 4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPUscheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And soon for many backends. >> 5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have anyideas what to do with this situation :( Any help would be really appropriate. >> >> [1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com >> [2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks >> [3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com >> [4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com >> [5] http://pastebin.com/raw.php?i=Bd40Vn6h >> [6] http://wiki.postgresql.org/wiki/Lock_dependency_information >> [7 http://pastebin.com/raw.php?i=eGrtG524] >> >> -- >> Vladimir >> >> >> >> >> >> >> -- >> Vladimir >> >> >> >> >> >> >> -- >> Да пребудет с вами сила... >> http://simply.name >> >> >> >> >> > > > > -- > Vladimir > > > >
I have limited max connections to 1000, reduced shared buffers to 8G and restarted postgres.
The logs say that checkpoint finishes in 2,5 minutes (as expected due to default checkpoint_completion_target = 0.5) with no IO spikes so I don't want to increase checkpoint_timeout of checkpoint_segments.
I have also noticed that this big tables stopped vacuuming automatically a couple of weeks ago. It could be the reason of the problem, I will now try to tune autovacuum parameters to turn it back. But yesterday I ran "vacuum analyze" for all relations manually but that did not help.
--
13.02.2014, в 0:14, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote:
Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When the workload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem with locks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the number of context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate the problem, but imho not create it.
I'am afraid that is the problem. More than 1000 backends, most of them
are simply waiting.
I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement?
You need to fsync your huge shared buffers any time your database
performs checkpoint. By default it usually happens too often because
checkpoint_timeout is 5min by default. Without bbu, on software raid
that leads to io spike and you commit waits for wal.
12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000 definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrier enabled and software raid.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.
On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:
Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
root@rpopdb01e ~ #
12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?
On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote:
root@rpopdb01e ~ # fgrep data /etc/fstab
UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
root@rpopdb01e ~ #
According to iostat the disks are not the bottleneck.
12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
Hi Vladimir,
Just in case: how is your ext4 mount?
Best regards,
Ilya
On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote:
Hi all.
Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).
2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.
I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:
root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
root@rpopdb01e ~ # free -m
total used free shared buffers cached
Mem: 129028 123558 5469 0 135 119504
-/+ buffers/cache: 3918 125110
Swap: 16378 0 16378
root@rpopdb01e ~ #
PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.
The settings changed in postgresql.conf are here [5].
When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.
(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].
I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:
rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
count
--------
115953
(1 row)
rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
count
-------
0
(1 row)
rpopdb_p0=# \dS+ rpop.rpop_uidl
Table "rpop.rpop_uidl"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
popid | bigint | not null | plain | |
uidl | character varying(200) | not null | extended | |
Indexes:
"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
Has OIDs: no
rpopdb_p0=#
My questions are:
1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?
2. How much space do we extend to the relation when we get exclusive lock on it?
3. Why extended page is not visible for other backends?
4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.
5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
[1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com
[2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
[3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com
[4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com
[5] http://pastebin.com/raw.php?i=Bd40Vn6h
[6] http://wiki.postgresql.org/wiki/Lock_dependency_information
[7 http://pastebin.com/raw.php?i=eGrtG524]
--
Vladimir
--
Vladimir
--
Да пребудет с вами сила...
http://simply.name
--
Vladimir
--
Vladimir
2014-02-12 18:59, Бородин Владимир <root@simply.name>: > I have read several topics [1, 2, 3, 4] with similar problems but haven't > find a good solution. Below is some more diagnostics. I reported the second one. The diagnostics was very similar to yours. I think a lot people experienced this problem with big servers. It was only because of large shared_buffers. The problem disappeared after we reduced it to 2 GB.
Cool. How much ram do you have on your server and what is the size of the database?
My settings of 32 GB was set due to recommendation of 25% from here [1]. Right now I am going to investigate the content of buffer cache with pg_buffercache and maybe OS page cache with recently released pg_stat_kcache to determine the optimal size for my load profile.
[1] http://www.postgresql.org/docs/current/static/runtime-config-resource.html
13.02.2014, в 13:13, Emre Hasegeli <emre@hasegeli.com> написал(а):
2014-02-12 18:59, Бородин Владимир <root@simply.name>:I have read several topics [1, 2, 3, 4] with similar problems but haven't
find a good solution. Below is some more diagnostics.
I reported the second one. The diagnostics was very similar to yours.
I think a lot people experienced this problem with big servers. It was
only because of large shared_buffers. The problem disappeared after
we reduced it to 2 GB.
Vladimir, And, any effect on your problem? On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир <root@simply.name> wrote: > I have limited max connections to 1000, reduced shared buffers to 8G and restarted postgres. 1000 is still to much in most cases. With pgbouncer in transaction pooling mode normaly pool size 8-32, max_connections = 100 (default value) and client_connections 500-1500 looks more reasonable. > I have also noticed that this big tables stopped vacuuming automatically a couple of weeks ago. It could be the reasonof the problem, I will now try to tune autovacuum parameters to turn it back. But yesterday I ran "vacuum analyze"for all relations manually but that did not help. How do your autovacuum parameters look like now? > 13.02.2014, в 0:14, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а): > > On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote: > > > Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When theworkload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem withlocks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the numberof context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate theproblem, but imho not create it. > > > > I'am afraid that is the problem. More than 1000 backends, most of them > are simply waiting. > > > > I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement? > > > > You need to fsync your huge shared buffers any time your database > performs checkpoint. By default it usually happens too often because > checkpoint_timeout is 5min by default. Without bbu, on software raid > that leads to io spike and you commit waits for wal. > > > > 12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а): > > another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrierenabled and software raid. > > Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu,otherwise performance will be poor. > > > On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote: > > > Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on onecluster shard to try. > > root@rpopdb01e ~ # mount | fgrep data > /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime) > root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2 > root@rpopdb01e ~ # mount | fgrep data > /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier) > root@rpopdb01e ~ # > > 12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а): > > My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneckon io before disks are actually involved. What says mount without arguments? > > On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote: > > root@rpopdb01e ~ # fgrep data /etc/fstab > UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1 > root@rpopdb01e ~ # > > According to iostat the disks are not the bottleneck. > > 12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а): > > Hi Vladimir, > > Just in case: how is your ext4 mount? > > Best regards, > Ilya > > On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote: > > Hi all. > > Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When ithappens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout). > > 2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values(i_folder_id, i_uidl)"" > > I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics. > > I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores)and memory: > > root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo > model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz > root@rpopdb01e ~ # free -m > total used free shared buffers cached > Mem: 129028 123558 5469 0 135 119504 > -/+ buffers/cache: 3918 125110 > Swap: 16378 0 16378 > root@rpopdb01e ~ # > > PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takesonly 95G. > The settings changed in postgresql.conf are here [5]. > > When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do asimple insert. > > (extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock| 5459,8053 > (extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock| 5567,5490 > (extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock| 5611,3963 > > I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids.I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updatesand deletes (only inserts and reads). Some more info about one of such tables is here [7]. > > I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken,but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for mostof them there are free pages but there are, for example, such cases: > > rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0; > count > -------- > 115953 > (1 row) > > rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0; > count > ------- > 0 > (1 row) > > rpopdb_p0=# \dS+ rpop.rpop_uidl > Table "rpop.rpop_uidl" > Column | Type | Modifiers | Storage | Stats target | Description > --------+------------------------+-----------+----------+--------------+------------- > popid | bigint | not null | plain | | > uidl | character varying(200) | not null | extended | | > Indexes: > "pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl) > Has OIDs: no > > rpopdb_p0=# > > > My questions are: > 1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? Howcan I dig it? > 2. How much space do we extend to the relation when we get exclusive lock on it? > 3. Why extended page is not visible for other backends? > 4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPUscheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And soon for many backends. > 5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have anyideas what to do with this situation :( Any help would be really appropriate. > > [1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com > [2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks > [3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com > [4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com > [5] http://pastebin.com/raw.php?i=Bd40Vn6h > [6] http://wiki.postgresql.org/wiki/Lock_dependency_information > [7 http://pastebin.com/raw.php?i=eGrtG524] > > -- > Vladimir > > > > > > > -- > Vladimir > > > > > > > -- > Да пребудет с вами сила... > http://simply.name > > > > > > > > > -- > Vladimir > > > > > > > -- > Vladimir > > > >
On Thu, Feb 13, 2014 at 11:23 AM, Бородин Владимир <root@simply.name> wrote: > Cool. How much ram do you have on your server and what is the size of the > database? It has 200 GiB of memory for 100 GB database at that time. We had migrated the database from MySQL, that was the reason of overmuch resources. I do not maintain it anymore and I moved the database to much smaller server before I leave. > My settings of 32 GB was set due to recommendation of 25% from here [1]. Maybe we should add a note to that page. > Right now I am going to investigate the content of buffer cache with > pg_buffercache and maybe OS page cache with recently released pg_stat_kcache > to determine the optimal size for my load profile. > > [1] > http://www.postgresql.org/docs/current/static/runtime-config-resource.html
13.02.2014, в 13:29, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
Vladimir,
And, any effect on your problem?
It worked without problems longer than previous configuration but repeated again several minutes ago :(
On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир <root@simply.name> wrote:I have limited max connections to 1000, reduced shared buffers to 8G and restarted postgres.
1000 is still to much in most cases. With pgbouncer in transaction
pooling mode normaly pool size 8-32, max_connections = 100 (default
value) and client_connections 500-1500 looks more reasonable.
Clients for this db are plproxy hosts. As far as I know plproxy can work only with statement pooling.
I have also noticed that this big tables stopped vacuuming automatically a couple of weeks ago. It could be the reason of the problem, I will now try to tune autovacuum parameters to turn it back. But yesterday I ran "vacuum analyze" for all relations manually but that did not help.
How do your autovacuum parameters look like now?
They were all default except for vacuum_defer_cleanup_age = 100000. I have increased autovacuum_max_workers = 20 because I have 10 databases with about 10 tables each. That did not make better (I haven't seen more than two auto vacuum workers simultaneously). Then I have tried to set vacuum_cost_limit = 1000. Still not vacuuming big tables. Right now the parameters look like this:
root@rpopdb01e ~ # fgrep vacuum /var/lib/pgsql/9.3/data/conf.d/postgresql.conf
#vacuum_cost_delay = 0 # 0-100 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
vacuum_cost_limit = 1000 # 1-10000 credits
vacuum_defer_cleanup_age = 100000 # number of xacts by which cleanup is delayed
autovacuum = on # Enable autovacuum subprocess? 'on'
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
autovacuum_max_workers = 20 # max number of autovacuum subprocesses
#autovacuum_naptime = 1min # time between autovacuum runs
#autovacuum_vacuum_threshold = 50 # min number of row updates before
# vacuum
#autovacuum_analyze_threshold = 50 # min number of row updates before
#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for
# autovacuum, in milliseconds;
# -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for
# autovacuum, -1 means use
# vacuum_cost_limit
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
root@rpopdb01e ~ #
13.02.2014, в 0:14, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote:
Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When the workload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem with locks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the number of context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate the problem, but imho not create it.
I'am afraid that is the problem. More than 1000 backends, most of them
are simply waiting.
I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement?
You need to fsync your huge shared buffers any time your database
performs checkpoint. By default it usually happens too often because
checkpoint_timeout is 5min by default. Without bbu, on software raid
that leads to io spike and you commit waits for wal.
12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000 definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrier enabled and software raid.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.
On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:
Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
root@rpopdb01e ~ #
12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?
On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote:
root@rpopdb01e ~ # fgrep data /etc/fstab
UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
root@rpopdb01e ~ #
According to iostat the disks are not the bottleneck.
12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
Hi Vladimir,
Just in case: how is your ext4 mount?
Best regards,
Ilya
On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote:
Hi all.
Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).
2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.
I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:
root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
root@rpopdb01e ~ # free -m
total used free shared buffers cached
Mem: 129028 123558 5469 0 135 119504
-/+ buffers/cache: 3918 125110
Swap: 16378 0 16378
root@rpopdb01e ~ #
PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.
The settings changed in postgresql.conf are here [5].
When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.
(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].
I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:
rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
count
--------
115953
(1 row)
rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
count
-------
0
(1 row)
rpopdb_p0=# \dS+ rpop.rpop_uidl
Table "rpop.rpop_uidl"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
popid | bigint | not null | plain | |
uidl | character varying(200) | not null | extended | |
Indexes:
"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
Has OIDs: no
rpopdb_p0=#
My questions are:
1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?
2. How much space do we extend to the relation when we get exclusive lock on it?
3. Why extended page is not visible for other backends?
4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.
5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
[1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com
[2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
[3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com
[4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com
[5] http://pastebin.com/raw.php?i=Bd40Vn6h
[6] http://wiki.postgresql.org/wiki/Lock_dependency_information
[7 http://pastebin.com/raw.php?i=eGrtG524]
--
Vladimir
--
Vladimir
--
Да пребудет с вами сила...
http://simply.name
--
Vladimir
--
Vladimir
Vladimir,
pgbouncer works with pl/proxy in transaction pooling mode. A wide spread phrase that statement mode is for plproxy does not mean any limitations for transaction pooling mode until you have atocommit on client. Anyway, try to reduce connections.autovacuum_analyze_scale_
factor=0.05 #or like that
autovacuum_analyze_threshold=5
autovacuum_freeze_max_age=200000000
autovacuum_max_workers=20 # that is fine for slow disks
autovacuum_naptime=1
autovacuum_vacuum_cost_delay=5 # or at least 10
autovacuum_vacuum_cost_limit =-1
autovacuum_vacuum_scale_factor=0.01 # this setting is to be really aggressive, otherwise you simply postpone huge vacuums and related disk io, smaller portions are better
autovacuum_vacuum_threshold=20
probably you will also need some ionice for autovacuum workers
autovacuum_analyze_threshold=5
autovacuum_freeze_max_age=200000000
autovacuum_max_workers=20 # that is fine for slow disks
autovacuum_naptime=1
autovacuum_vacuum_cost_delay=5 # or at least 10
autovacuum_vacuum_cost_limit =-1
autovacuum_vacuum_scale_factor=0.01 # this setting is to be really aggressive, otherwise you simply postpone huge vacuums and related disk io, smaller portions are better
autovacuum_vacuum_threshold=20
probably you will also need some ionice for autovacuum workers
On Thu, Feb 13, 2014 at 1:13 PM, Ilya Kosmodemiansky <ilya.kosmodemiansky@postgresql-consulting.com> wrote:
Vladimir,pgbouncer works with pl/proxy in transaction pooling mode. A wide spread phrase that statement mode is for plproxy does not mean any limitations for transaction pooling mode until you have atocommit on client. Anyway, try to reduce connections.try to set your autovacuum a bit more aggressive:
autovacuum_analyze_scale_factor=0.05 #or like that
autovacuum_analyze_threshold=5
autovacuum_freeze_max_age=200000000
autovacuum_max_workers=20 # that is fine for slow disks
autovacuum_naptime=1
autovacuum_vacuum_cost_delay=5 # or at least 10
autovacuum_vacuum_cost_limit =-1
autovacuum_vacuum_scale_factor=0.01 # this setting is to be really aggressive, otherwise you simply postpone huge vacuums and related disk io, smaller portions are better
autovacuum_vacuum_threshold=20
probably you will also need some ionice for autovacuum workers--On Thu, Feb 13, 2014 at 11:26 AM, Бородин Владимир <root@simply.name> wrote:13.02.2014, в 13:29, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):Vladimir,
And, any effect on your problem?It worked without problems longer than previous configuration but repeated again several minutes ago :(
On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир <root@simply.name> wrote:I have limited max connections to 1000, reduced shared buffers to 8G and restarted postgres.
1000 is still to much in most cases. With pgbouncer in transaction
pooling mode normaly pool size 8-32, max_connections = 100 (default
value) and client_connections 500-1500 looks more reasonable.Clients for this db are plproxy hosts. As far as I know plproxy can work only with statement pooling.I have also noticed that this big tables stopped vacuuming automatically a couple of weeks ago. It could be the reason of the problem, I will now try to tune autovacuum parameters to turn it back. But yesterday I ran "vacuum analyze" for all relations manually but that did not help.
How do your autovacuum parameters look like now?They were all default except for vacuum_defer_cleanup_age = 100000. I have increased autovacuum_max_workers = 20 because I have 10 databases with about 10 tables each. That did not make better (I haven't seen more than two auto vacuum workers simultaneously). Then I have tried to set vacuum_cost_limit = 1000. Still not vacuuming big tables. Right now the parameters look like this:root@rpopdb01e ~ # fgrep vacuum /var/lib/pgsql/9.3/data/conf.d/postgresql.conf#vacuum_cost_delay = 0 # 0-100 milliseconds#vacuum_cost_page_hit = 1 # 0-10000 credits#vacuum_cost_page_miss = 10 # 0-10000 credits#vacuum_cost_page_dirty = 20 # 0-10000 creditsvacuum_cost_limit = 1000 # 1-10000 creditsvacuum_defer_cleanup_age = 100000 # number of xacts by which cleanup is delayedautovacuum = on # Enable autovacuum subprocess? 'on'log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions andautovacuum_max_workers = 20 # max number of autovacuum subprocesses#autovacuum_naptime = 1min # time between autovacuum runs#autovacuum_vacuum_threshold = 50 # min number of row updates before# vacuum#autovacuum_analyze_threshold = 50 # min number of row updates before#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for# autovacuum, in milliseconds;# -1 means use vacuum_cost_delay#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for# autovacuum, -1 means use# vacuum_cost_limit#vacuum_freeze_min_age = 50000000#vacuum_freeze_table_age = 150000000root@rpopdb01e ~ #13.02.2014, в 0:14, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote:
Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When the workload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem with locks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the number of context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate the problem, but imho not create it.
I'am afraid that is the problem. More than 1000 backends, most of them
are simply waiting.
I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement?
You need to fsync your huge shared buffers any time your database
performs checkpoint. By default it usually happens too often because
checkpoint_timeout is 5min by default. Without bbu, on software raid
that leads to io spike and you commit waits for wal.
12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000 definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrier enabled and software raid.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.
On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:
Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
root@rpopdb01e ~ #
12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?
On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote:
root@rpopdb01e ~ # fgrep data /etc/fstab
UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
root@rpopdb01e ~ #
According to iostat the disks are not the bottleneck.
12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
Hi Vladimir,
Just in case: how is your ext4 mount?
Best regards,
Ilya
On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote:
Hi all.
Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).
2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.
I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:
root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
root@rpopdb01e ~ # free -m
total used free shared buffers cached
Mem: 129028 123558 5469 0 135 119504
-/+ buffers/cache: 3918 125110
Swap: 16378 0 16378
root@rpopdb01e ~ #
PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.
The settings changed in postgresql.conf are here [5].
When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.
(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].
I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:
rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
count
--------
115953
(1 row)
rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
count
-------
0
(1 row)
rpopdb_p0=# \dS+ rpop.rpop_uidl
Table "rpop.rpop_uidl"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
popid | bigint | not null | plain | |
uidl | character varying(200) | not null | extended | |
Indexes:
"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
Has OIDs: no
rpopdb_p0=#
My questions are:
1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?
2. How much space do we extend to the relation when we get exclusive lock on it?
3. Why extended page is not visible for other backends?
4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.
5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
[1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com
[2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
[3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com
[4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com
[5] http://pastebin.com/raw.php?i=Bd40Vn6h
[6] http://wiki.postgresql.org/wiki/Lock_dependency_information
[7 http://pastebin.com/raw.php?i=eGrtG524]
--
Vladimir
--
Vladimir
--
Да пребудет с вами сила...
http://simply.name
--
Vladimir
--
Vladimir
Ilya Kosmodemiansky
Database consultant,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
--
Ilya Kosmodemiansky
Database consultant,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
Vladimir,
pgbouncer works with pl/proxy in transaction pooling mode. A wide spread phrase that statement mode is for plproxy does not mean any limitations for transaction pooling mode until you have atocommit on client. Anyway, try to reduce connections.try to set your autovacuum a bit more aggressive:
autovacuum_analyze_scale_factor=0.05 #or like that
autovacuum_analyze_threshold=5
autovacuum_freeze_max_age=200000000
autovacuum_max_workers=20 # that is fine for slow disks
autovacuum_naptime=1
autovacuum_vacuum_cost_delay=5 # or at least 10
autovacuum_vacuum_cost_limit =-1
autovacuum_vacuum_scale_factor=0.01 # this setting is to be really aggressive, otherwise you simply postpone huge vacuums and related disk io, smaller portions are better
autovacuum_vacuum_threshold=20
probably you will also need some ionice for autovacuum workers
autovacuum_analyze_scale_factor=0.05 #or like that
autovacuum_analyze_threshold=5
autovacuum_freeze_max_age=200000000
autovacuum_max_workers=20 # that is fine for slow disks
autovacuum_naptime=1
autovacuum_vacuum_cost_delay=5 # or at least 10
autovacuum_vacuum_cost_limit =-1
autovacuum_vacuum_scale_factor=0.01 # this setting is to be really aggressive, otherwise you simply postpone huge vacuums and related disk io, smaller portions are better
autovacuum_vacuum_threshold=20
probably you will also need some ionice for autovacuum workers
On Thu, Feb 13, 2014 at 11:26 AM, Бородин Владимир <root@simply.name> wrote:
13.02.2014, в 13:29, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):Vladimir,
And, any effect on your problem?It worked without problems longer than previous configuration but repeated again several minutes ago :(
On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир <root@simply.name> wrote:I have limited max connections to 1000, reduced shared buffers to 8G and restarted postgres.
1000 is still to much in most cases. With pgbouncer in transaction
pooling mode normaly pool size 8-32, max_connections = 100 (default
value) and client_connections 500-1500 looks more reasonable.Clients for this db are plproxy hosts. As far as I know plproxy can work only with statement pooling.I have also noticed that this big tables stopped vacuuming automatically a couple of weeks ago. It could be the reason of the problem, I will now try to tune autovacuum parameters to turn it back. But yesterday I ran "vacuum analyze" for all relations manually but that did not help.
How do your autovacuum parameters look like now?They were all default except for vacuum_defer_cleanup_age = 100000. I have increased autovacuum_max_workers = 20 because I have 10 databases with about 10 tables each. That did not make better (I haven't seen more than two auto vacuum workers simultaneously). Then I have tried to set vacuum_cost_limit = 1000. Still not vacuuming big tables. Right now the parameters look like this:root@rpopdb01e ~ # fgrep vacuum /var/lib/pgsql/9.3/data/conf.d/postgresql.conf#vacuum_cost_delay = 0 # 0-100 milliseconds#vacuum_cost_page_hit = 1 # 0-10000 credits#vacuum_cost_page_miss = 10 # 0-10000 credits#vacuum_cost_page_dirty = 20 # 0-10000 creditsvacuum_cost_limit = 1000 # 1-10000 creditsvacuum_defer_cleanup_age = 100000 # number of xacts by which cleanup is delayedautovacuum = on # Enable autovacuum subprocess? 'on'log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions andautovacuum_max_workers = 20 # max number of autovacuum subprocesses#autovacuum_naptime = 1min # time between autovacuum runs#autovacuum_vacuum_threshold = 50 # min number of row updates before# vacuum#autovacuum_analyze_threshold = 50 # min number of row updates before#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for# autovacuum, in milliseconds;# -1 means use vacuum_cost_delay#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for# autovacuum, -1 means use# vacuum_cost_limit#vacuum_freeze_min_age = 50000000#vacuum_freeze_table_age = 150000000root@rpopdb01e ~ #13.02.2014, в 0:14, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote:
Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When the workload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem with locks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the number of context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate the problem, but imho not create it.
I'am afraid that is the problem. More than 1000 backends, most of them
are simply waiting.
I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement?
You need to fsync your huge shared buffers any time your database
performs checkpoint. By default it usually happens too often because
checkpoint_timeout is 5min by default. Without bbu, on software raid
that leads to io spike and you commit waits for wal.
12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000 definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrier enabled and software raid.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.
On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:
Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
root@rpopdb01e ~ #
12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?
On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote:
root@rpopdb01e ~ # fgrep data /etc/fstab
UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
root@rpopdb01e ~ #
According to iostat the disks are not the bottleneck.
12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
Hi Vladimir,
Just in case: how is your ext4 mount?
Best regards,
Ilya
On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote:
Hi all.
Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).
2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.
I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:
root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
root@rpopdb01e ~ # free -m
total used free shared buffers cached
Mem: 129028 123558 5469 0 135 119504
-/+ buffers/cache: 3918 125110
Swap: 16378 0 16378
root@rpopdb01e ~ #
PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.
The settings changed in postgresql.conf are here [5].
When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.
(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].
I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:
rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
count
--------
115953
(1 row)
rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
count
-------
0
(1 row)
rpopdb_p0=# \dS+ rpop.rpop_uidl
Table "rpop.rpop_uidl"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
popid | bigint | not null | plain | |
uidl | character varying(200) | not null | extended | |
Indexes:
"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
Has OIDs: no
rpopdb_p0=#
My questions are:
1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?
2. How much space do we extend to the relation when we get exclusive lock on it?
3. Why extended page is not visible for other backends?
4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.
5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
[1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com
[2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
[3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com
[4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com
[5] http://pastebin.com/raw.php?i=Bd40Vn6h
[6] http://wiki.postgresql.org/wiki/Lock_dependency_information
[7 http://pastebin.com/raw.php?i=eGrtG524]
--
Vladimir
--
Vladimir
--
Да пребудет с вами сила...
http://simply.name
--
Vladimir
--
Vladimir
--
Ilya Kosmodemiansky
Database consultant,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
With these settings auto vacuuming of all tables became more often (as was expected), but not of big tables with lots of inserts (and with absolutely no updates). I've done a cron-script that does "vacuum analyze" for all databases which runs every hour. And it seems that right now I don't have any performance problems.
Ilya and Emre, thank you for help.
13.02.2014, в 16:20, Ilya Kosmodemiansky <ilya.kosmodemiansky@postgresql-consulting.com> написал(а):
try to set your autovacuum a bit more aggressive:Vladimir,pgbouncer works with pl/proxy in transaction pooling mode. A wide spread phrase that statement mode is for plproxy does not mean any limitations for transaction pooling mode until you have atocommit on client. Anyway, try to reduce connections.
autovacuum_analyze_scale_factor=0.05 #or like that
autovacuum_analyze_threshold=5
autovacuum_freeze_max_age=200000000
autovacuum_max_workers=20 # that is fine for slow disks
autovacuum_naptime=1
autovacuum_vacuum_cost_delay=5 # or at least 10
autovacuum_vacuum_cost_limit =-1
autovacuum_vacuum_scale_factor=0.01 # this setting is to be really aggressive, otherwise you simply postpone huge vacuums and related disk io, smaller portions are better
autovacuum_vacuum_threshold=20
probably you will also need some ionice for autovacuum workersOn Thu, Feb 13, 2014 at 1:13 PM, Ilya Kosmodemiansky <ilya.kosmodemiansky@postgresql-consulting.com> wrote:Vladimir,pgbouncer works with pl/proxy in transaction pooling mode. A wide spread phrase that statement mode is for plproxy does not mean any limitations for transaction pooling mode until you have atocommit on client. Anyway, try to reduce connections.try to set your autovacuum a bit more aggressive:
autovacuum_analyze_scale_factor=0.05 #or like that
autovacuum_analyze_threshold=5
autovacuum_freeze_max_age=200000000
autovacuum_max_workers=20 # that is fine for slow disks
autovacuum_naptime=1
autovacuum_vacuum_cost_delay=5 # or at least 10
autovacuum_vacuum_cost_limit =-1
autovacuum_vacuum_scale_factor=0.01 # this setting is to be really aggressive, otherwise you simply postpone huge vacuums and related disk io, smaller portions are better
autovacuum_vacuum_threshold=20
probably you will also need some ionice for autovacuum workers--On Thu, Feb 13, 2014 at 11:26 AM, Бородин Владимир <root@simply.name> wrote:13.02.2014, в 13:29, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):Vladimir,
And, any effect on your problem?It worked without problems longer than previous configuration but repeated again several minutes ago :(
On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир <root@simply.name> wrote:I have limited max connections to 1000, reduced shared buffers to 8G and restarted postgres.
1000 is still to much in most cases. With pgbouncer in transaction
pooling mode normaly pool size 8-32, max_connections = 100 (default
value) and client_connections 500-1500 looks more reasonable.Clients for this db are plproxy hosts. As far as I know plproxy can work only with statement pooling.I have also noticed that this big tables stopped vacuuming automatically a couple of weeks ago. It could be the reason of the problem, I will now try to tune autovacuum parameters to turn it back. But yesterday I ran "vacuum analyze" for all relations manually but that did not help.
How do your autovacuum parameters look like now?They were all default except for vacuum_defer_cleanup_age = 100000. I have increased autovacuum_max_workers = 20 because I have 10 databases with about 10 tables each. That did not make better (I haven't seen more than two auto vacuum workers simultaneously). Then I have tried to set vacuum_cost_limit = 1000. Still not vacuuming big tables. Right now the parameters look like this:root@rpopdb01e ~ # fgrep vacuum /var/lib/pgsql/9.3/data/conf.d/postgresql.conf#vacuum_cost_delay = 0 # 0-100 milliseconds#vacuum_cost_page_hit = 1 # 0-10000 credits#vacuum_cost_page_miss = 10 # 0-10000 credits#vacuum_cost_page_dirty = 20 # 0-10000 creditsvacuum_cost_limit = 1000 # 1-10000 creditsvacuum_defer_cleanup_age = 100000 # number of xacts by which cleanup is delayedautovacuum = on # Enable autovacuum subprocess? 'on'log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions andautovacuum_max_workers = 20 # max number of autovacuum subprocesses#autovacuum_naptime = 1min # time between autovacuum runs#autovacuum_vacuum_threshold = 50 # min number of row updates before# vacuum#autovacuum_analyze_threshold = 50 # min number of row updates before#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for# autovacuum, in milliseconds;# -1 means use vacuum_cost_delay#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for# autovacuum, -1 means use# vacuum_cost_limit#vacuum_freeze_min_age = 50000000#vacuum_freeze_table_age = 150000000root@rpopdb01e ~ #13.02.2014, в 0:14, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <root@simply.name> wrote:
Yes, this is legacy, I will fix it. We had lots of inactive connections but right now we use pgbouncer for this. When the workload is normal we have some kind of 80-120 backends. Less than 10 of them are in active state. Having problem with locks we get lots of sessions (sometimes more than 1000 of them are in active state). According to vmstat the number of context switches is not so big (less than 20k), so I don't think it is the main reason. Yes, it can aggravate the problem, but imho not create it.
I'am afraid that is the problem. More than 1000 backends, most of them
are simply waiting.
I don't understand the correlation of shared buffers size and synchronous_commit. Could you please explain your statement?
You need to fsync your huge shared buffers any time your database
performs checkpoint. By default it usually happens too often because
checkpoint_timeout is 5min by default. Without bbu, on software raid
that leads to io spike and you commit waits for wal.
12.02.2014, в 23:37, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
another thing which is arguable - concurrency degree. How many of your max_connections = 4000 are actually running? 4000 definitely looks like an overkill and they could be a serious source of concurrency, especially then you have had barrier enabled and software raid.
Plus for 32Gb of shared buffers with synchronous_commit = on especially on heavy workload one should definitely have bbu, otherwise performance will be poor.
On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <root@simply.name> wrote:
Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
root@rpopdb01e ~ #
12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
My question was actually about barrier option, by default it is enabled on RHEL6/ext4 and could cause serious bottleneck on io before disks are actually involved. What says mount without arguments?
On Feb 12, 2014, at 18:43, Бородин Владимир <root@simply.name> wrote:
root@rpopdb01e ~ # fgrep data /etc/fstab
UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
root@rpopdb01e ~ #
According to iostat the disks are not the bottleneck.
12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobiont@gmail.com> написал(а):
Hi Vladimir,
Just in case: how is your ext4 mount?
Best regards,
Ilya
On Feb 12, 2014, at 17:59, Бородин Владимир <root@simply.name> wrote:
Hi all.
Today I have started getting errors like below in logs (seems that I have not changed anything for last week). When it happens the db gets lots of connections in state active, eats 100% cpu and clients get errors (due to timeout).
2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
I have read several topics [1, 2, 3, 4] with similar problems but haven't find a good solution. Below is some more diagnostics.
I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host is running with the following CPU (32 cores) and memory:
root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
root@rpopdb01e ~ # free -m
total used free shared buffers cached
Mem: 129028 123558 5469 0 135 119504
-/+ buffers/cache: 3918 125110
Swap: 16378 0 16378
root@rpopdb01e ~ #
PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the disks are really free. Right now PGDATA takes only 95G.
The settings changed in postgresql.conf are here [5].
When it happens the last query from here [6] shows that almost all queries are waiting for ExclusiveLock, but they do a simple insert.
(extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
(extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
(extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
I have several databases running on one host with one postmaster process and ExclusiveLock is being waited by many oids. I suppose the only common thing for all of them is that they are bigger than others and they almost do not get updates and deletes (only inserts and reads). Some more info about one of such tables is here [7].
I have tried to look at the source code (src/backend/access/heap/hio.c) to understand when the exclusive lock can be taken, but I could only read comments :) I have also examined FSM for this tables and their indexes and found that for most of them there are free pages but there are, for example, such cases:
rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
count
--------
115953
(1 row)
rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
count
-------
0
(1 row)
rpopdb_p0=# \dS+ rpop.rpop_uidl
Table "rpop.rpop_uidl"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
popid | bigint | not null | plain | |
uidl | character varying(200) | not null | extended | |
Indexes:
"pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
Has OIDs: no
rpopdb_p0=#
My questions are:
1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does it happen during exclusive lock acquiring? How can I dig it?
2. How much space do we extend to the relation when we get exclusive lock on it?
3. Why extended page is not visible for other backends?
4. Is there any possibility of situation where backend A got exclusive lock on some relation to extend it. Then OS CPU scheduler made a context switch to backend B while backend B is waiting for exclusive lock on the same relation. And so on for many backends.
5. (and the main question) what can I do to get rid of such situations? It is a production cluster and I do not have any ideas what to do with this situation :( Any help would be really appropriate.
[1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc159@mail.gmail.com
[2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
[3] http://www.postgresql.org/message-id/50A2C93E.9070607@dalibo.com
[4] http://www.postgresql.org/message-id/CAL_0b1sypYeOyNkYNV95nNV2d+4jXTug3HkKF6FahfW7Gvgb_Q@mail.gmail.com
[5] http://pastebin.com/raw.php?i=Bd40Vn6h
[6] http://wiki.postgresql.org/wiki/Lock_dependency_information
[7 http://pastebin.com/raw.php?i=eGrtG524]
--
Vladimir
--
Vladimir
--
Да пребудет с вами сила...
http://simply.name
--
Vladimir
--
Vladimir
Ilya Kosmodemiansky
Database consultant,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
--
Ilya Kosmodemiansky
Database consultant,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com