Thread: Simple select hangs while CPU close to 100%
I'm having this very disturbing problem. I got a table with about 100,000 rows in it. Our software deletes the majority of these rows and then bulk loads another 100,000 rows into the same table. All this is happening within a single transaction. I then perform a simple "select count(*) from ..." statement that never returns. In the meantime, the backend Postgres process is taking close to 100% of the CPU. The hang-up does not always happen on the same statement, but eventually it happens 2 out of 3 times. If I dump and then restore the schema where this table resides, the problem is gone until the next time we run through the whole process of deleting, loading and querying the table.
There is no other activity in the database. All requested locks are granted.
Has anyone seen similar behavior?
Some details:
Postgres v 8.1.2
Linux Fedora 3
shared_buffers = 65536
temp_buffers = 32768
work_mem = 131072
maintenance_work_mem = 131072
max_stack_depth = 8192
max_fsm_pages = 40000
wal_buffers = 16
checkpoint_segments = 16
top reports
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19478 postgres 25 0 740m 721m 536m R 99.7 4.4 609:41.16 postmaster
ps -ef | grep postgres reports
postgres 19478 8061 99 00:11 ? 10:13:03 postgres: user dbase [local] SELECT
strace -p 19478
no system calls reported
Thanks for the help!
Jozsef
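A quick way to gauge whether dead-row bloat could explain a slow sequential scan like this is to compare the table's on-disk size with its live row count. This is only a diagnostic sketch; the table name is hypothetical:

```sql
-- Pages and estimated live tuples as of the last VACUUM/ANALYZE:
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'my_table';

-- On 8.1, the table's physical size in bytes:
SELECT pg_relation_size('my_table');
```

If relpages (or the physical size) is far larger than 100,000 rows would justify, the scan is grinding through dead tuples left behind by the large delete, even though only 100K rows are visible.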
Hello,

Did you vacuum? It's good technique to vacuum a table after removing a bigger number of rows.

Regards
Pavel Stehule

2007/7/22, Jozsef Szalay <jszalay@storediq.com>:
> [snip]
> I then perform a simple "select count(*) from ..." statement that never
> returns. In the meantime, the backend Postgres process is taking close
> to 100% of the CPU.

COUNT(*) is always slow; but either way, if the process is deleting and then adding records, can't you just keep track of how many records you loaded [aka count++] rather than turning around and asking the database before any statistics have had a chance to be updated?
Hi Pavel,

Yes I did vacuum. In fact the only way to "fix" this problem is executing a "full" vacuum. The plain vacuum did not help.

Regards,
Jozsef

-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Sunday, July 22, 2007 10:53 AM
To: Jozsef Szalay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

> [snip]
2007/7/25, Jozsef Szalay <jszalay@storediq.com>:
> Hi Pavel,
>
> Yes I did vacuum. In fact the only way to "fix" this problem is
> executing a "full" vacuum. The plain vacuum did not help.

It's a question whether the vacuum actually accomplished anything. Try VACUUM VERBOSE; maybe your max_fsm_pages is too low and you have to raise it.

Regards
Pavel Stehule
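The tail of the VACUUM VERBOSE output is what reveals an undersized free space map. A sketch of what to look for (output abbreviated, numbers elided):

```sql
VACUUM VERBOSE;
-- On 8.1, the end of a database-wide VACUUM VERBOSE reports roughly:
--   INFO:  free space map contains ... pages in ... relations
--   NOTICE: number of page slots needed (...) exceeds max_fsm_pages (40000)
-- If that NOTICE appears, vacuum can no longer track all of the free
-- space, so the space freed by deleted rows is never reused and the
-- table keeps growing on disk no matter how often plain VACUUM runs.
```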
The actual application does not have to perform this statement since, as you suggested, it keeps track of what got loaded. However, the table has to go through a de-duplication process because bulk load is utilized to load the potentially large number (millions) of rows. All indexes were dropped for the bulk load. This de-duplication procedure starts with a SELECT statement that identifies duplicate rows. This is the original SELECT that never returned. Later on I used the SELECT COUNT(*) to see if somehow my original SELECT had something to do with the hang, and I found that this simple query hung as well.

The only way I could avoid getting into this state was to perform a VACUUM FULL on the table before the bulk load. I would prefer not using a full vacuum every time due to the exclusive access to the table and the time it requires. The plain VACUUM did not work.

Regards,
Jozsef

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Adam Tauno Williams
Sent: Sunday, July 22, 2007 1:33 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

> [snip]
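The de-duplication query described above is presumably of this general shape (table and column names are hypothetical; the poster did not show the actual statement):

```sql
-- Find key values that occur more than once after the bulk load:
SELECT dedup_key, COUNT(*) AS occurrences
FROM loaded_table
GROUP BY dedup_key
HAVING COUNT(*) > 1;
```

With all indexes dropped, this runs as a sequential scan feeding a hash or sort aggregate, so its runtime is dominated by the table's physical size, dead tuples included.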
In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Hi Pavel,
>
> Yes I did vacuum. In fact the only way to "fix" this problem is
> executing a "full" vacuum. The plain vacuum did not help.

Based on the information, I would expect that this problem is the result of improper PG tuning, or inadequate server sizing (RAM, etc). A table changing 100,000 rows shouldn't cause enough bloat to hurt count(*)'s performance significantly, unless something else is wrong.

Some quick piddling around shows that tables with over a million rows can count(*) the whole table in about 1/2 second on a system with 2G of RAM. A table with 13 mil takes a minute and a half. We haven't specifically tuned this server for count(*) performance, as it's not a priority for this database, so I expect the performance drop for the 13 mil table is a result of exhausting shared_buffers and hitting the disks.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Yes I did vacuum. In fact the only way to "fix" this problem is
> executing a "full" vacuum. The plain vacuum did not help.

I read over my previous reply and picked up on something else ...

What is your vacuum _policy_? i.e. how often do you vacuum/analyze? The fact that you had to do a vacuum full to get things back under control tends to suggest that your current vacuum schedule is not aggressive enough.

An explicit vacuum of this table after the large delete/insert may be helpful.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Our application is such that any update to the database is done by a single session in a batch process using bulk load. The frequency of these usually larger scale updates is variable, but an update runs every 2-3 days on average.

Originally a plain VACUUM ANALYZE was executed on every affected table after every load.

VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

I do understand the need for vacuuming. Nevertheless I expect Postgres to return data eventually even if I do not vacuum. In my case, the simple SELECT COUNT(*) FROM table; statement on a table that had around 100K "live" rows had not returned the result after more than 6 hours, at which point I manually killed it.

Jozsef

-----Original Message-----
From: Bill Moran [mailto:wmoran@collaborativefusion.com]
Sent: Wednesday, July 25, 2007 1:12 PM
To: Jozsef Szalay
Cc: Pavel Stehule; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

> [snip]
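Bill's suggestion amounts to vacuuming the affected table immediately after each batch, rather than relying on the weekly schedule. A sketch of such a batch, with hypothetical table and file names:

```sql
BEGIN;
DELETE FROM loaded_table WHERE batch_id < 42;        -- the large delete
COPY loaded_table FROM '/path/to/new_batch.dat';     -- the bulk load
COMMIT;

-- Immediately afterwards, outside the transaction:
VACUUM ANALYZE loaded_table;
```

Note that the VACUUM must run after the COMMIT; while the deleting transaction is open, its dead rows cannot yet be reclaimed.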
Hello,

Show me, please, the output from VACUUM VERBOSE on the table.

Pavel

2007/7/25, Jozsef Szalay <jszalay@storediq.com>:
> [snip]
In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Our application is such that any update to the database is done by a
> single session in a batch process using bulk load.
>
> Originally a plain VACUUM ANALYZE was executed on every affected table
> after every load.

Any other insert/update activity outside of the bulk loads? What's the vacuum policy outside the bulk loads? You say originally, does it still do so? I agree with Pavel that the output of vacuum verbose when the problem is occurring would be helpful.

> VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

If you need to do this, then other settings are incorrect.

> I do understand the need for vacuuming. Nevertheless I expect Postgres
> to return data eventually even if I do not vacuum. In my case, the
> simple SELECT COUNT(*) FROM table; statement on a table that had around
> 100K "live" rows has not returned the result for more than 6 hours after
> which I manually killed it.

It should; 6 hours is far too long for that process, unless you're running a 486dx2. You didn't mention your hardware or your postgresql.conf settings. What other activity is occurring during this long count()? Can you give us a shot of the iostat output and/or top during this phenomenon?

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
With the limited time I had, I could not produce a test case to submit
to this forum. I have found another issue in production, though.

In addition to the "SELECT COUNT(*) FROM table;" statement taking
forever (>6 hours on a table with 100,000 rows and no indexes, on a
Linux Fedora 3 system with two 3.2GHz Xeon processors plus
hyperthreading), the "SELECT column_1 FROM table GROUP BY column_1
HAVING COUNT(*) > 1;" statement actually ran out of memory (on a table
with 300,000 rows and no indexes, while the OS reported >3.5 GB of
virtual memory for the postgres backend process).

To make it short, we found that both problems could be solved with a
single magic bullet, namely by calling ANALYZE every time after a large
amount of changes was introduced to the table. (Earlier, we called
ANALYZE only after we did some serious post-processing on freshly
bulk-loaded data.)

I don't know why ANALYZE would have any effect on a sequential scan of
a table, but it does appear to impact both performance and memory usage
significantly. Both of our production issues have vanished after this
simple change! We do not have to call a FULL VACUUM on the table
anymore; the plain VACUUM is satisfactory.

Thanks for all the responses!

Jozsef

-----Original Message-----
From: Bill Moran [mailto:wmoran@collaborativefusion.com]
Sent: Wednesday, July 25, 2007 3:29 PM
To: Jozsef Szalay
Cc: Pavel Stehule; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

In response to "Jozsef Szalay" <jszalay@storediq.com>:

> Our application is such that any update to the database is done by a
> single session in a batch process using bulk load. The frequency of
> these usually larger scale updates is variable but an update runs every
> 2-3 days on average.
>
> Originally a plain VACUUM ANALYZE was executed on every affected table
> after every load.

Any other insert/update activity outside of the bulk loads?
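The fix Jozsef describes amounts to adding an explicit maintenance step to the batch job. A minimal sketch (the table name, filter, and load file are placeholders, not from the original post); note that VACUUM cannot run inside a transaction block, so it comes after the COMMIT:

```sql
BEGIN;
DELETE FROM loaded_table WHERE batch_id < 42;          -- drop the old rows
COPY loaded_table FROM '/path/to/new_batch.csv' CSV;   -- bulk load the new rows
COMMIT;

-- Reclaim the dead tuples and refresh planner statistics before querying.
-- Plain VACUUM plus ANALYZE is enough; VACUUM FULL should not be needed.
VACUUM ANALYZE loaded_table;

SELECT count(*) FROM loaded_table;
```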
What's the vacuum policy outside the bulk loads? You say "originally";
does it still do so? I agree with Pavel that the output of vacuum
verbose when the problem is occurring would be helpful.

> VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

If you need to do this, then other settings are incorrect.

> I do understand the need for vacuuming. Nevertheless I expect Postgres
> to return data eventually even if I do not vacuum. In my case, the
> simple SELECT COUNT(*) FROM table; statement on a table that had around
> 100K "live" rows has not returned the result for more than 6 hours,
> after which I manually killed it.

It should; 6 hours is far too long for that, unless you're running a
486dx2. You didn't mention your hardware or your postgresql.conf
settings.

What other activity is occurring during this long count()? Can you give
us a shot of the iostat output and/or top during this phenomenon?

> Jozsef
>
> -----Original Message-----
> From: Bill Moran [mailto:wmoran@collaborativefusion.com]
> Sent: Wednesday, July 25, 2007 1:12 PM
> To: Jozsef Szalay
> Cc: Pavel Stehule; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%
>
> In response to "Jozsef Szalay" <jszalay@storediq.com>:
>
> > Hi Pavel,
> >
> > Yes I did vacuum. In fact the only way to "fix" this problem is
> > executing a "full" vacuum. The plain vacuum did not help.
>
> I read over my previous reply and picked up on something else ...
>
> What is your vacuum _policy_? i.e. how often do you vacuum/analyze?
> The fact that you had to do a vacuum full to get things back under
> control tends to suggest that your current vacuum schedule is not
> aggressive enough.
>
> An explicit vacuum of this table after the large delete/insert may
> be helpful.
[snip: quoted earlier messages]

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Jozsef Szalay wrote:
> I don't know why ANALYZE would have any effect on a sequential scan of a
> table but it does appear to impact both performance and memory usage
> significantly.

It doesn't. What it does is provide the query optimizer with the
information it needs to know that the table contains many different
values, which makes it turn the initial hashed aggregation into a sort
plus group aggregation. This allows the aggregation to use less memory.

As an exercise, see an EXPLAIN of the query, both before and after the
analyze, and study the difference.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
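Alvaro's suggested exercise can be sketched as follows (table and column names are placeholders standing in for Jozsef's table):

```sql
-- Before ANALYZE: with no statistics, the planner may underestimate the
-- number of distinct values and pick a HashAggregate, whose memory use
-- grows with the real number of groups.
EXPLAIN SELECT column_1 FROM big_table GROUP BY column_1 HAVING count(*) > 1;

ANALYZE big_table;

-- After ANALYZE: with an accurate distinct-value estimate, the planner
-- can switch to a Sort followed by a GroupAggregate, which streams
-- groups one at a time instead of holding all of them in memory.
EXPLAIN SELECT column_1 FROM big_table GROUP BY column_1 HAVING count(*) > 1;
```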