Thread: Simple select hangs while CPU close to 100%
I'm having this very disturbing problem. I got a table with about 100,000 rows in it. Our software deletes the majority of these rows and then bulk loads another 100,000 rows into the same table. All this is happening within a single transaction. I then perform a simple "select count(*) from ..." statement that never returns. In the meantime, the backend Postgres process is taking close to 100% of the CPU. The hang-up does not always happen on the same statement, but eventually it happens 2 out of 3 times. If I dump and then restore the schema where this table resides, the problem is gone until the next time we run through the whole process of deleting, loading and querying the table.
There is no other activity in the database. All requested locks are granted.
Has anyone seen similar behavior?
Some details:
Postgres v 8.1.2
Linux Fedora 3
shared_buffers = 65536
temp_buffers = 32768
work_mem = 131072
maintenance_work_mem = 131072
max_stack_depth = 8192
max_fsm_pages = 40000
wal_buffers = 16
checkpoint_segments = 16
top reports
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19478 postgres 25 0 740m 721m 536m R 99.7 4.4 609:41.16 postmaster
ps -ef | grep postgres reports
postgres 19478 8061 99 00:11 ? 10:13:03 postgres: user dbase [local] SELECT
strace -p 19478
no system calls reported
Thanks for the help!
Jozsef
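A quick way to gauge whether dead-row bloat could explain a slow sequential scan like this is to compare the table's on-disk size with its live row count. This is only a diagnostic sketch; the table name is hypothetical:

```sql
-- Pages and estimated live tuples as of the last VACUUM/ANALYZE:
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'my_table';

-- On 8.1, the table's physical size in bytes:
SELECT pg_relation_size('my_table');
```

If relpages (or the physical size) is far larger than 100,000 rows would justify, the scan is grinding through dead tuples left behind by the large delete, even though only 100K rows are visible.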
Hello,

Did you vacuum? It's good technique to vacuum a table after removing a bigger number of rows.

Regards
Pavel Stehule

2007/7/22, Jozsef Szalay <jszalay@storediq.com>:
> [snip]
> I then perform a simple "select count(*) from ..." statement that never
> returns. In the meantime, the backend Postgres process is taking close
> to 100% of the CPU.

COUNT(*) is always slow; but either way, if the process is deleting and then adding records, can't you just keep track of how many records you loaded [aka count++] rather than turning around and asking the database before any statistics have had a chance to be updated?
Hi Pavel,

Yes I did vacuum. In fact the only way to "fix" this problem is executing a "full" vacuum. The plain vacuum did not help.

Regards,
Jozsef

-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Sunday, July 22, 2007 10:53 AM
To: Jozsef Szalay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

> [snip]
2007/7/25, Jozsef Szalay <jszalay@storediq.com>:
> Hi Pavel,
>
> Yes I did vacuum. In fact the only way to "fix" this problem is
> executing a "full" vacuum. The plain vacuum did not help.

It's a question whether the vacuum actually accomplished anything. Try VACUUM VERBOSE; maybe your max_fsm_pages is too low and you have to raise it.

Regards
Pavel Stehule
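The tail of the VACUUM VERBOSE output is what reveals an undersized free space map. A sketch of what to look for (output abbreviated, numbers elided):

```sql
VACUUM VERBOSE;
-- On 8.1, the end of a database-wide VACUUM VERBOSE reports roughly:
--   INFO:  free space map contains ... pages in ... relations
--   NOTICE: number of page slots needed (...) exceeds max_fsm_pages (40000)
-- If that NOTICE appears, vacuum can no longer track all of the free
-- space, so the space freed by deleted rows is never reused and the
-- table keeps growing on disk no matter how often plain VACUUM runs.
```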
The actual application does not have to perform this statement since, as you suggested, it keeps track of what got loaded. However, the table has to go through a de-duplication process because bulk load is utilized to load the potentially large number (millions) of rows. All indexes were dropped for the bulk load. This de-duplication procedure starts with a SELECT statement that identifies duplicate rows. This is the original SELECT that never returned. Later on I used the SELECT COUNT(*) to see if somehow my original SELECT had something to do with the hang, and I found that this simple query hung as well.

The only way I could avoid getting into this state was to perform a VACUUM FULL on the table before the bulk load. I would prefer not using a full vacuum every time due to the exclusive access to the table and the time it requires. The plain VACUUM did not work.

Regards,
Jozsef

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Adam Tauno Williams
Sent: Sunday, July 22, 2007 1:33 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

> [snip]
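The de-duplication query described above is presumably of this general shape (table and column names are hypothetical; the poster did not show the actual statement):

```sql
-- Find key values that occur more than once after the bulk load:
SELECT dedup_key, COUNT(*) AS occurrences
FROM loaded_table
GROUP BY dedup_key
HAVING COUNT(*) > 1;
```

With all indexes dropped, this runs as a sequential scan feeding a hash or sort aggregate, so its runtime is dominated by the table's physical size, dead tuples included.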
In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Hi Pavel,
>
> Yes I did vacuum. In fact the only way to "fix" this problem is
> executing a "full" vacuum. The plain vacuum did not help.

Based on the information, I would expect that this problem is the result of improper PG tuning, or inadequate server sizing (RAM, etc). A table changing 100,000 rows shouldn't cause enough bloat to hurt count(*)'s performance significantly, unless something else is wrong.

Some quick piddling around shows that tables with over a million rows can count(*) the whole table in about 1/2 second on a system with 2G of RAM. A table with 13 mil takes a minute and a half. We haven't specifically tuned this server for count(*) performance, as it's not a priority for this database, so I expect the performance drop for the 13 mil table is a result of exhausting shared_buffers and hitting the disks.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Yes I did vacuum. In fact the only way to "fix" this problem is
> executing a "full" vacuum. The plain vacuum did not help.

I read over my previous reply and picked up on something else ...

What is your vacuum _policy_? i.e. how often do you vacuum/analyze? The fact that you had to do a vacuum full to get things back under control tends to suggest that your current vacuum schedule is not aggressive enough.

An explicit vacuum of this table after the large delete/insert may be helpful.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Our application is such that any update to the database is done by a single session in a batch process using bulk load. The frequency of these usually larger scale updates is variable, but an update runs every 2-3 days on average.

Originally a plain VACUUM ANALYZE was executed on every affected table after every load.

VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

I do understand the need for vacuuming. Nevertheless I expect Postgres to return data eventually even if I do not vacuum. In my case, the simple SELECT COUNT(*) FROM table; statement on a table that had around 100K "live" rows had not returned the result after more than 6 hours, at which point I manually killed it.

Jozsef

-----Original Message-----
From: Bill Moran [mailto:wmoran@collaborativefusion.com]
Sent: Wednesday, July 25, 2007 1:12 PM
To: Jozsef Szalay
Cc: Pavel Stehule; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

> [snip]
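Bill's suggestion amounts to vacuuming the affected table immediately after each batch, rather than relying on the weekly schedule. A sketch of such a batch, with hypothetical table and file names:

```sql
BEGIN;
DELETE FROM loaded_table WHERE batch_id < 42;        -- the large delete
COPY loaded_table FROM '/path/to/new_batch.dat';     -- the bulk load
COMMIT;

-- Immediately afterwards, outside the transaction:
VACUUM ANALYZE loaded_table;
```

Note that the VACUUM must run after the COMMIT; while the deleting transaction is open, its dead rows cannot yet be reclaimed.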
Hello,

Show me, please, the output from VACUUM VERBOSE on the table.

Pavel

2007/7/25, Jozsef Szalay <jszalay@storediq.com>:
> [snip]
In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Our application is such that any update to the database is done by a
> single session in a batch process using bulk load.
>
> Originally a plain VACUUM ANALYZE was executed on every affected table
> after every load.

Any other insert/update activity outside of the bulk loads? What's the vacuum policy outside the bulk loads? You say originally, does it still do so? I agree with Pavel that the output of vacuum verbose when the problem is occurring would be helpful.

> VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

If you need to do this, then other settings are incorrect.

> I do understand the need for vacuuming. Nevertheless I expect Postgres
> to return data eventually even if I do not vacuum. In my case, the
> simple SELECT COUNT(*) FROM table; statement on a table that had around
> 100K "live" rows has not returned the result for more than 6 hours after
> which I manually killed it.

It should; 6 hours is far too long for that process, unless you're running a 486dx2. You didn't mention your hardware or your postgresql.conf settings. What other activity is occurring during this long count()? Can you give us a shot of the iostat output and/or top during this phenomenon?

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
With the limited time I had, I could not produce a test case to submit
to this forum. I have found another issue in production, though.

In addition to the "SELECT COUNT(*) FROM table;" statement taking
forever (>6 hours on a table with 100,000 rows and no indexes, on a
Linux Fedora 3 system with two 3.2GHz Xeon processors plus
hyperthreading), the "SELECT column_1 FROM table GROUP BY column_1
HAVING COUNT(*) > 1;" statement actually ran out of memory (on a table
with 300,000 rows and no indexes, while the OS reported >3.5 GB of
virtual memory for the postgres backend process).

To make it short, we found that both problems could be solved with a
single magic bullet, namely by calling ANALYZE every time after a large
amount of changes was introduced to the table. (Earlier, we called
ANALYZE only after we did some serious post-processing on freshly
bulk-loaded data.)

I don't know why ANALYZE would have any effect on a sequential scan of
a table, but it does appear to impact both performance and memory usage
significantly. Both of our production issues have vanished after this
simple change! We do not have to call a FULL VACUUM on the table
anymore; the plain VACUUM is satisfactory.

Thanks for all the responses!

Jozsef

-----Original Message-----
From: Bill Moran [mailto:wmoran@collaborativefusion.com]
Sent: Wednesday, July 25, 2007 3:29 PM
To: Jozsef Szalay
Cc: Pavel Stehule; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Our application is such that any update to the database is done by a
> single session in a batch process using bulk load. The frequency of
> these usually larger scale updates is variable but an update runs every
> 2-3 days on average.
>
> Originally a plain VACUUM ANALYZE was executed on every affected table
> after every load.

Any other insert/update activity outside of the bulk loads?
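The fix Jozsef describes amounts to adding an explicit maintenance step to the batch job. A minimal sketch (the table name, filter, and load file are placeholders, not from the original post); note that VACUUM cannot run inside a transaction block, so it comes after the COMMIT:

```sql
BEGIN;
DELETE FROM loaded_table WHERE batch_id < 42;          -- drop the old rows
COPY loaded_table FROM '/path/to/new_batch.csv' CSV;   -- bulk load the new rows
COMMIT;

-- Reclaim the dead tuples and refresh planner statistics before querying.
-- Plain VACUUM plus ANALYZE is enough; VACUUM FULL should not be needed.
VACUUM ANALYZE loaded_table;

SELECT count(*) FROM loaded_table;
```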
What's the vacuum policy outside the bulk loads? You say "originally";
does it still do so? I agree with Pavel that the output of vacuum
verbose when the problem is occurring would be helpful.

> VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

If you need to do this, then other settings are incorrect.

> I do understand the need for vacuuming. Nevertheless I expect Postgres
> to return data eventually even if I do not vacuum. In my case, the
> simple SELECT COUNT(*) FROM table; statement on a table that had around
> 100K "live" rows has not returned the result for more than 6 hours,
> after which I manually killed it.

It should; 6 hours is far too long for that, unless you're running a
486dx2. You didn't mention your hardware or your postgresql.conf
settings.

What other activity is occurring during this long count()? Can you give
us a shot of the iostat output and/or top during this phenomenon?

> Jozsef
>
> -----Original Message-----
> From: Bill Moran [mailto:wmoran@collaborativefusion.com]
> Sent: Wednesday, July 25, 2007 1:12 PM
> To: Jozsef Szalay
> Cc: Pavel Stehule; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%
>
> In response to "Jozsef Szalay" <jszalay@storediq.com>:
>
> > Hi Pavel,
> >
> > Yes I did vacuum. In fact the only way to "fix" this problem is
> > executing a "full" vacuum. The plain vacuum did not help.
>
> I read over my previous reply and picked up on something else ...
>
> What is your vacuum _policy_? i.e. how often do you vacuum/analyze?
> The fact that you had to do a vacuum full to get things back under
> control tends to suggest that your current vacuum schedule is not
> aggressive enough.
>
> An explicit vacuum of this table after the large delete/insert may
> be helpful.
[snip: quoted earlier messages]

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Jozsef Szalay wrote:
> I don't know why ANALYZE would have any effect on a sequential scan of a
> table but it does appear to impact both performance and memory usage
> significantly.

It doesn't. What it does is provide the query optimizer with the
information it needs to know that the table contains many different
values, which makes it turn the initial hashed aggregation into a sort
plus group aggregation. This allows the aggregation to use less memory.

As an exercise, see an EXPLAIN of the query, both before and after the
analyze, and study the difference.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
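Alvaro's suggested exercise can be sketched as follows (table and column names are placeholders standing in for Jozsef's table):

```sql
-- Before ANALYZE: with no statistics, the planner may underestimate the
-- number of distinct values and pick a HashAggregate, whose memory use
-- grows with the real number of groups.
EXPLAIN SELECT column_1 FROM big_table GROUP BY column_1 HAVING count(*) > 1;

ANALYZE big_table;

-- After ANALYZE: with an accurate distinct-value estimate, the planner
-- can switch to a Sort followed by a GroupAggregate, which streams
-- groups one at a time instead of holding all of them in memory.
EXPLAIN SELECT column_1 FROM big_table GROUP BY column_1 HAVING count(*) > 1;
```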