Thread: very long updates very small tables

very long updates very small tables

From
Lars Feistner
Date:
Dear list,

we have got a web application and when people log in some information is
written to the user tables. We have around 500 active users, but at the
most 5 people are logged in at the same time. From times to times people
log in and then the application is not responsive any more.

What we see in the postgres server logs is that processes are waiting
for other transactions to finish though not because of a deadlock.

The log tells me that certain update statements take sometimes about
3-10 minutes. But we are talking about updates on tables with 1000 to
10000 rows and updates that are supposed to update 1 row.

We are running under windows 2008 and postgres 8.4.7. ( Sorry for the
windows, it was not MY first choice )

My only explanation at the moment would be, that there must be any kind
of windows process that stops all other processes until it is finished
or something like that. ( Could it even be autovaccuum? ). Is there a
way to find out how long autovaccum took ? Has anyone seen anything
similiar? Or could it really be that we need a bigger machine with more
io? But the one disk in the system still seems not very busy and
response times in windows resource monitor are not higher than 28 ms.

Following is an excerpt of our server log.

LOG:  process 1660 acquired ShareLock on transaction 74652 after
533354.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  process 4984 acquired ShareLock on transaction 74652 after
1523530.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  process 956 acquired ExclusiveLock on tuple (4,188) of relation
16412 of database 16384 after 383055.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  process 4312 acquired ExclusiveLock on tuple (9,112) of relation
16412 of database 16384 after 1422677.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  duration: 1523567.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '1362'
LOG:  duration: 533391.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '31'
LOG:  process 5504 acquired ExclusiveLock on tuple (9,112) of relation
16412 of database 16384 after 183216.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  process 1524 acquired ExclusiveLock on tuple (4,188) of relation
16412 of database 16384 after 376370.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  duration: 1422688.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '1362'
LOG:  duration: 383067.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 'f', $2 = '31'
LOG:  process 4532 acquired ExclusiveLock on tuple (9,112) of relation
16412 of database 16384 after 118851.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  process 4448 acquired ExclusiveLock on tuple (4,188) of relation
16412 of database 16384 after 366304.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  duration: 183241.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '1362'
LOG:  duration: 376395.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '31'
LOG:  process 4204 acquired ExclusiveLock on tuple (4,188) of relation
16412 of database 16384 after 339893.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  duration: 366342.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '31'
LOG:  process 4760 acquired ExclusiveLock on tuple (4,188) of relation
16412 of database 16384 after 205943.000 ms
STATEMENT:  UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG:  duration: 339923.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '31'
LOG:  duration: 205963.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '31'
LOG:  duration: 124654.000 ms  execute <unnamed>: UPDATE
extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL:  parameters: $1 = 't', $2 = '1362'
LOG:  process 3844 still waiting for ShareLock on transaction 74839
after 8000.000 ms

Thanx in advance.

Lars
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lars Feistner

Kompetenzzentrum für Prüfungen in der Medizin
Medizinische Fakultät Heidelberg,
Im Neuenheimer Feld 346, Raum 013
69120 Heidelberg

E-Mail: feistner@uni-heidelberg.de
Fon:   +49-6221-56-8269
Fax:   +49-6221-56-7175

WWW:   http://www.ims-m.de
        http://www.kompmed.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: very long updates very small tables

From
"Kevin Grittner"
Date:
Lars Feistner <feistner@uni-heidelberg.de> wrote:

> The log tells me that certain update statements take sometimes
> about 3-10 minutes. But we are talking about updates on tables
> with 1000 to 10000 rows and updates that are supposed to update 1
> row.

The top possibilities that come to my mind are:

(1)  The tables are horribly bloated.  If autovacuum is off or not
aggressive enough, things can degenerate to this level.

(2)  Memory is over-committed and your machine is thrashing.

(3)  There are explicit LOCK commands in the software which is
contributing to the blocking.

(4)  There is some external delay within the transaction, such as
waiting for user input while the transaction is open.

Maybe there's a combination of the above at play.  Can you rule any
of these out?

-Kevin

Re: very long updates very small tables

From
Lars Feistner
Date:
Hello Kevin,


On 03/29/2011 09:28 PM, Kevin Grittner wrote:
> Lars Feistner<feistner@uni-heidelberg.de>  wrote:
>
>> The log tells me that certain update statements take sometimes
>> about 3-10 minutes. But we are talking about updates on tables
>> with 1000 to 10000 rows and updates that are supposed to update 1
>> row.
>
> The top possibilities that come to my mind are:
>
> (1)  The tables are horribly bloated.  If autovacuum is off or not
> aggressive enough, things can degenerate to this level.
>
Some tables are auto vacuumed regularly others are not. The specific
table extjs_recentlist was never autovacuumed. So i would think that
updates on this table should be always very slow, but they are not. Only
every 4 or 5th day for maybe half an hour and then everything is fine
again. And;-) there is no anti virus installed.
> (2)  Memory is over-committed and your machine is thrashing.
>
We can rule this out. There is enough memory installed and the database
is less than 500MB.
> (3)  There are explicit LOCK commands in the software which is
> contributing to the blocking.
We use the the jdbc driver. The jdbc driver might do some locking but we
don't.
>
> (4)  There is some external delay within the transaction, such as
> waiting for user input while the transaction is open.
>
No, no user interaction within a transaction.
> Maybe there's a combination of the above at play.  Can you rule any
> of these out?
>
> -Kevin
>
So, i will try to get the autovacuum to be more aggressive and will
report again if nothing changes.

Thanks a lot.
Lars

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lars Feistner

Kompetenzzentrum für Prüfungen in der Medizin
Medizinische Fakultät Heidelberg,
Im Neuenheimer Feld 346, Raum 013
69120 Heidelberg

E-Mail: feistner@uni-heidelberg.de
Fon:   +49-6221-56-8269
Fax:   +49-6221-56-7175

WWW:   http://www.ims-m.de
        http://www.kompmed.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: very long updates very small tables

From
pasman pasmański
Date:
2011/3/30, Lars Feistner <feistner@uni-heidelberg.de>:
> Hello Kevin,
>
>
> On 03/29/2011 09:28 PM, Kevin Grittner wrote:
>> Lars Feistner<feistner@uni-heidelberg.de>  wrote:
>>
>>> The log tells me that certain update statements take sometimes
>>> about 3-10 minutes. But we are talking about updates on tables
>>> with 1000 to 10000 rows and updates that are supposed to update 1
>>> row.
>>
>> The top possibilities that come to my mind are:
>>
>> (1)  The tables are horribly bloated.  If autovacuum is off or not
>> aggressive enough, things can degenerate to this level.
>>
> Some tables are auto vacuumed regularly others are not. The specific
> table extjs_recentlist was never autovacuumed. So i would think that
> updates on this table should be always very slow, but they are not. Only
> every 4 or 5th day for maybe half an hour and then everything is fine
> again. And;-) there is no anti virus installed.
>> (2)  Memory is over-committed and your machine is thrashing.
>>
> We can rule this out. There is enough memory installed and the database
> is less than 500MB.
>> (3)  There are explicit LOCK commands in the software which is
>> contributing to the blocking.
> We use the the jdbc driver. The jdbc driver might do some locking but we
> don't.
>>
>> (4)  There is some external delay within the transaction, such as
>> waiting for user input while the transaction is open.
>>
> No, no user interaction within a transaction.
>> Maybe there's a combination of the above at play.  Can you rule any
>> of these out?
>>
>> -Kevin
>>
> So, i will try to get the autovacuum to be more aggressive and will
> report again if nothing changes.
>
> Thanks a lot.
> Lars
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Lars Feistner
>
> Kompetenzzentrum für Prüfungen in der Medizin
> Medizinische Fakultät Heidelberg,
> Im Neuenheimer Feld 346, Raum 013
> 69120 Heidelberg
>
> E-Mail: feistner@uni-heidelberg.de
> Fon:   +49-6221-56-8269
> Fax:   +49-6221-56-7175
>
> WWW:   http://www.ims-m.de
>         http://www.kompmed.de
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Hi. try to log all statements for an hour and show us it. And Postgresql.conf .



------------
pasman

Re: very long updates very small tables

From
"Kevin Grittner"
Date:
Lars Feistner <feistner@uni-heidelberg.de> wrote:
> On 03/29/2011 09:28 PM, Kevin Grittner wrote:
>> Lars Feistner<feistner@uni-heidelberg.de>  wrote:
>>
>>> The log tells me that certain update statements take sometimes
>>> about 3-10 minutes. But we are talking about updates on tables
>>> with 1000 to 10000 rows and updates that are supposed to update
>>> 1 row.
>>
>> The top possibilities that come to my mind are:

> [all eliminated as possibilities]

If you haven't already done so, you should probably turn on
checkpoint logging to see if this corresponds to checkpoint
activity.  If it does, you can try cranking up how aggressive your
background writer is, and perhaps limiting your shared_buffers to
something around the size of your RAID controller's BBU cache.  (I
hope you have a RAID controller with BBU cache configured for
write-back, anyway.)

-Kevin

Re: very long updates very small tables

From
Lars Feistner
Date:

On 03/30/2011 06:54 PM, Kevin Grittner wrote:
> Lars Feistner<feistner@uni-heidelberg.de>  wrote:
>> On 03/29/2011 09:28 PM, Kevin Grittner wrote:
>>> Lars Feistner<feistner@uni-heidelberg.de>   wrote:
>>>
>>>> The log tells me that certain update statements take sometimes
>>>> about 3-10 minutes. But we are talking about updates on tables
>>>> with 1000 to 10000 rows and updates that are supposed to update
>>>> 1 row.
>>>
>>> The top possibilities that come to my mind are:
>
>> [all eliminated as possibilities]
>
> If you haven't already done so, you should probably turn on
> checkpoint logging to see if this corresponds to checkpoint
> activity.  If it does, you can try cranking up how aggressive your
> background writer is, and perhaps limiting your shared_buffers to
> something around the size of your RAID controller's BBU cache.  (I
> hope you have a RAID controller with BBU cache configured for
> write-back, anyway.)
>
> -Kevin
>

Hello Kevin,

i am sorry to disappoint you here. As I said in my first E-Mail we don't
have much traffic and the database fits easily into memory. The traffic
might increase, at least it was increasing the last 12 months. The
database will always fit into memory.
No, we don't have a raid and thus we don't have a bbu. Actually we
started off with a big SAN that our data centre offered. But sometimes
this SAN was a bit slow and when we first encountered the very long
updates i thought there was a connection between the long running
updates and the slowliness of the SAN, so i started to use the local
disk (we are talking about one disk not disks) for the database. I am
still seeing the long running inserts and updates. I am still following
the auto vacuum trail, it does still not run frequently enough. Thanks a
lot for the replies so far. I will keep you guys informed about my next
steps and the results.

Thanx a lot
Lars

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lars Feistner

Kompetenzzentrum für Prüfungen in der Medizin
Medizinische Fakultät Heidelberg,
Im Neuenheimer Feld 346, Raum 013
69120 Heidelberg

E-Mail: feistner@uni-heidelberg.de
Fon:   +49-6221-56-8269
Fax:   +49-6221-56-7175

WWW:   http://www.ims-m.de
        http://www.kompmed.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: very long updates very small tables

From
"Kevin Grittner"
Date:
Lars Feistner <feistner@uni-heidelberg.de> wrote:
> On 03/30/2011 06:54 PM, Kevin Grittner wrote:

>> If you haven't already done so, you should probably turn on
>> checkpoint logging to see if this corresponds to checkpoint
>> activity.  If it does, you can try cranking up how aggressive
>> your background writer is, and perhaps limiting your
>> shared_buffers to something around the size of your RAID
>> controller's BBU cache. (I hope you have a RAID controller with
>> BBU cache configured for write-back, anyway.)

> i am sorry to disappoint you here. As I said in my first E-Mail we
> don't have much traffic and the database fits easily into memory.
> The traffic might increase, at least it was increasing the last 12
> months. The database will always fit into memory.
> No, we don't have a raid and thus we don't have a bbu. Actually
> we started off with a big SAN that our data centre offered. But
> sometimes this SAN was a bit slow and when we first encountered
> the very long updates i thought there was a connection between the
> long running updates and the slowliness of the SAN, so i started
> to use the local disk (we are talking about one disk not disks)
> for the database. I am still seeing the long running inserts and
> updates. I am still following the auto vacuum trail, it does still
> not run frequently enough. Thanks a lot for the replies so far. I
> will keep you guys informed about my next steps and the results.

Nothing there makes a write glut on checkpoint less likely to be the
cause.  Without a BBU write-back cache it is actually *more* likely,
and having enough RAM to hold the whole database makes it *more*
likely.  If you haven't placed your pg_xlog directory on a separate
file system, it is also more likely.

Turning on logging of checkpoint activity and checking whether that
correlates with your problem times is strongly indicated.

-Kevin

Re: very long updates very small tables

From
Tomas Vondra
Date:
Dne 4.4.2011 16:32, Kevin Grittner napsal(a):
> Nothing there makes a write glut on checkpoint less likely to be the
> cause.  Without a BBU write-back cache it is actually *more* likely,
> and having enough RAM to hold the whole database makes it *more*
> likely.  If you haven't placed your pg_xlog directory on a separate
> file system, it is also more likely.
>
> Turning on logging of checkpoint activity and checking whether that
> correlates with your problem times is strongly indicated.
>
> -Kevin

Checkpoints would be my first guess too, but the whole database is just
500MB. Lars, how did you get this number? Did you measure the amount of
disk space occupied or somehow else?

BTW how much memory is there (total RAM and dedicated to shared
buffers)? How many checkpoint segments are there?

Have you monitored the overall behavior of the system (what processes
are running etc.) when the problems occur? I don't have much experience
with Windows but tools from sysinternals are reasonable.

And yet another idea - have you tried to use the stats collected by
PostgreSQL? I mean the pg_stat_ tables, especially pg_stat_bgwriter and
maybe pg_stat_all_tables. Those numbers are cummulative, so do two
snapshot when the problems are happening and subtract them to get an
idea of what's going on.

regards
Tomas