Thread: High activity short table and locks

High activity short table and locks

From
"Guillaume Bog"
Date:
Hi!

We have been using postgresql since a while without problems. But now I find we experience some slowness and the weird thing is that it seems to happen because of a very short table (less than 200 lines), called "lockers" (see below it's structure).

This table is accessed very often by a lot of different sessions, with SELECT, UPDATE and DELETE statements. We keep in this table some shared state about our application users, so we make sure they don't work together on the same things, thus the name.

Maybe we do something really wrong. Maybe we even should not use a database for this kind of persistence (we tried a simple file but it was much worse). We have improved the code by merging many little SELECTs into one bigger with all the results needed. This ridiculously short table should hold in cache memory, I'd suppose, so why is it so slow?

Today I found something that could help me to find an answer: by running the "locks" sql below I have above 100 lines of results, with many locks detected.

I have added the indexes sometime ago because, to my great surprise, it did really improve the speed. I just tried to remove them on the fly and it was worse, *but* I had less locks.

It seems I'm a bit stuck here. I'd appreciate some help. My main general question is "how to handle very small but hot status table that has to be updated every 30 seconds by 100 different persons, read and updated from many sides, and also joined with some more common tables (i.e. much larger but less hot)"

Thanks for your help. Please find below some tech info.


Table structure

        Column         |           Type           |                       Modifiers                       
-----------------------+--------------------------+--------------------------------------------------------
 l_id                  | integer                  | not null default nextval('lockers_l_id_seq'::regclass)
 l_xref_u_id           | integer                  |
 l_type                | character varying        |
 l_what                | character varying        |
 l_status              | character varying        |
 l_tech_modification   | timestamp with time zone | not null default now()
 l_tech_creation       | timestamp with time zone | not null default now()
 l_tech_deleted        | boolean                  | not null default false
 l_status_modification | timestamp with time zone |
 l_comment             | character varying        |
 l_csl                 | character varying        |
Indexes:
    "lockers_pkey" PRIMARY KEY, btree (l_id)
    "lockers_l_csl_idx" btree (l_csl)
    "lockers_l_type_idx" btree (l_type)
    "lockers_l_what_idx" btree (l_what)
    "lockers_l_xref_u_id_idx" btree (l_xref_u_id)


Lockers SQL

SELECT pg_stat_activity.datname,pg_class.relname,pg_locks.transaction, pg_locks.mode, pg_locks.granted,
pg_stat_activity.usename, substr(pg_stat_activity.current_query,1,20),
to_char(pg_stat_activity.query_start,'HH24:MI'),
to_char(age(now(),pg_stat_activity.query_start),'HH24:MI') AS "age", pg_stat_activity.procpid
FROM pg_stat_activity,pg_locks LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
WHERE pg_locks.pid=pg_stat_activity.procpid
ORDER BY query_start;


Sample result of lockers SQL

 vf_cn2fr | lockers                        |   468474452 | RowExclusiveLock | t       | vf_cn2fr | UPDATE lockers SET l | 15:50   | 00:00 |   30395
 vf_cn2fr | lockers                        |   468474458 | AccessShareLock  | t       | vf_cn2fr | SELECT * FROM locker | 15:50   | 00:00 |   30414
 vf_cn2fr | lockers_l_csl_idx              |   468474458 | AccessShareLock  | t       | vf_cn2fr | SELECT * FROM locker | 15:50   | 00:00 |   30414
 vf_cn2fr | lockers_l_what_idx             |   468474458 | AccessShareLock  | t       | vf_cn2fr | SELECT * FROM locker | 15:50   | 00:00 |   30414
 vf_cn2fr | lockers_l_type_idx             |   468474458 | AccessShareLock  | t       | vf_cn2fr | SELECT * FROM locker | 15:50   | 00:00 |   30414
 vf_cn2fr | lockers_l_xref_u_id_idx        |   468474458 | AccessShareLock  | t       | vf_cn2fr | SELECT * FROM locker | 15:50   | 00:00 |   30414
 vf_cn2fr | lockers_pkey                   |   468474458 | AccessShareLock  | t       | vf_cn2fr | SELECT * FROM locker | 15:50   | 00:00 |   30414
 vf_cn2fr |                                |   468474458 | ExclusiveLock    | t       | vf_cn2fr | SELECT * FROM locker | 15:50   | 00:00 |   30414
 vf_cn2fr | pg_locks                       |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_database                    |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_authid_oid_index            |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_class                       |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr |                                |   468474459 | ExclusiveLock    | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_class_relname_nsp_index     |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_class_oid_index             |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_authid                      |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_authid_rolname_index        |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_stat_activity               |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_database_datname_index      |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008
 vf_cn2fr | pg_database_oid_index          |   468474459 | AccessShareLock  | t       | postgres | SELECT pg_stat_activ | 15:50   | 00:00 |   18008

Re: High activity short table and locks

From
Richard Huxton
Date:
Guillaume Bog wrote:
> It seems I'm a bit stuck here. I'd appreciate some help. My main general
> question is "how to handle very small but hot status table that has to be
> updated every 30 seconds by 100 different persons, read and updated from
> many sides, and also joined with some more common tables (i.e. much larger
> but less hot)"

Remove all indexes except the one backing the primary-key. Run a VACUUM
  FULL and REINDEX or CLUSTER the table. Vacuum *very frequently* -
you'll want custom values in pg_autovacuum. Add indexes back one at a
time to see what's really necessary. If you can keep the dead rows to a
reasonable level, I'd have thought you could get by without indexes.

You might want to consider setting synchronous_commit=off for updates to
the table. I'm assuming the information in the table isn't vital in the
event of a system crash, and that could reduce WAL activity if you're
limited by disk bandwidth.

--
   Richard Huxton
   Archonet Ltd

Re: High activity short table and locks

From
"Guillaume Bog"
Date:


On Wed, Jul 23, 2008 at 4:50 PM, Richard Huxton <dev@archonet.com> wrote:
Guillaume Bog wrote:
It seems I'm a bit stuck here. I'd appreciate some help. My main general
question is "how to handle very small but hot status table that has to be
updated every 30 seconds by 100 different persons, read and updated from
many sides, and also joined with some more common tables (i.e. much larger
but less hot)"

Remove all indexes except the one backing the primary-key. Run a VACUUM  FULL and REINDEX or CLUSTER the table. Vacuum *very frequently* - you'll want custom values in pg_autovacuum. Add indexes back one at a time to see what's really necessary. If you can keep the dead rows to a reasonable level, I'd have thought you could get by without indexes.

Yes, such a small table, very frequently updated, would suggest no index at all. I understand that I may have many dead rows, that would explain that fact that indexes do speed up the selects. We have a daily VACUUM FULL on the database, but it may be not enough. I'll check tomorrow if a verboze vacuum tell me that many rows are dead. The problem is that I need to work directly on production server, as everything goes very well when there is not enough people actually working.

 
You might want to consider setting synchronous_commit=off for updates to the table. I'm assuming the information in the table isn't vital in the event of a system crash, and that could reduce WAL activity if you're limited by disk bandwidth.

Yes, this table's data is very short lived and can be lost without problem in case of a crash. I could even have no WAL at all for this table if it is possible. In my mind, this data should be stored and modified in a fixed chunk of RAM and never go to hard-drive.
 
Thanks for your help. I'll try your suggestions tomorrow.

Re: High activity short table and locks

From
Richard Huxton
Date:
Guillaume Bog wrote:
> On Wed, Jul 23, 2008 at 11:17 PM, Richard Huxton <dev@archonet.com> wrote:
>
> I tried a vacuum full and had to stop it as it was blocking the server for
> too long. Below is the partial results I got. It seems you are right:
> enormous amount of dead space and rows. I did the same vacuum later and it
> seems to have improved a lot the performance. I need to check again
> tomorrow.
>
> We don't have autovacuum, but as it seems autovacuum cannot target a
> specific table, I may prefer do it by cron every minute, as you suggest.

There's a pg_autovacuum system table that lets you tune things
table-by-table. See the manual for details. In your case, a manual
vacuum once a minute will be a huge step forward anyway.

> vf_cn2fr=# VACUUM FULL VERBOSE lockers ;
> INFO:  vacuuming "public.lockers"
> INFO:  "lockers": found 4228421 removable, 107 nonremovable row versions in
> 64803 pages

Well, that table is at least 1000 times larger than it needs to be.

If you've run a VACUUM FULL, you'll want to run REINDEX on all the
indexes on that table too.

> 64803 pages containing 512643700 free bytes are potential move destinations.

Ouch! that's a 500MB table holding 100 live rows.

>> You could fiddle around setting up ramdisks and pointing tablespaces there,
>> but I'm not sure it's worth it.
>
> If it is possible to have no WAL at all on this table, I'd prefer to try it.
> It seems completely useless and is probably taking a fair amount of i/o.
>
> It's a bit early to be sure if the solution is there, but I feel you already
> did throw some good light on my dark path, I have to thank you for that.

Afraid not. The synchronous_commit setting can reduce the disk I/O though.

--
   Richard Huxton
   Archonet Ltd

Re: High activity short table and locks

From
"Guillaume Bog"
Date:
Hello,

My first impression is that vacuuming the offending table very often helps a lot. I'm doing it by hand for now but I will have a cronjob for this. By the way, it seems I don't need thoses indexes anymore. Thanks a lot for your helpful advices.

On Thu, Jul 24, 2008 at 6:19 PM, Richard Huxton <dev@archonet.com> wrote:
Guillaume Bog wrote:
On Wed, Jul 23, 2008 at 11:17 PM, Richard Huxton <dev@archonet.com> wrote:

I tried a vacuum full and had to stop it as it was blocking the server for
too long. Below is the partial results I got. It seems you are right:
enormous amount of dead space and rows. I did the same vacuum later and it
seems to have improved a lot the performance. I need to check again
tomorrow.

We don't have autovacuum, but as it seems autovacuum cannot target a
specific table, I may prefer do it by cron every minute, as you suggest.

There's a pg_autovacuum system table that lets you tune things table-by-table. See the manual for details. In your case, a manual vacuum once a minute will be a huge step forward anyway.


vf_cn2fr=# VACUUM FULL VERBOSE lockers ;
INFO:  vacuuming "public.lockers"
INFO:  "lockers": found 4228421 removable, 107 nonremovable row versions in
64803 pages

Well, that table is at least 1000 times larger than it needs to be.

If you've run a VACUUM FULL, you'll want to run REINDEX on all the indexes on that table too.


64803 pages containing 512643700 free bytes are potential move destinations.

Ouch! that's a 500MB table holding 100 live rows.


You could fiddle around setting up ramdisks and pointing tablespaces there,
but I'm not sure it's worth it.

If it is possible to have no WAL at all on this table, I'd prefer to try it.
It seems completely useless and is probably taking a fair amount of i/o.

It's a bit early to be sure if the solution is there, but I feel you already
did throw some good light on my dark path, I have to thank you for that.

Afraid not. The synchronous_commit setting can reduce the disk I/O though.

--
 Richard Huxton
 Archonet Ltd

Re: High activity short table and locks

From
Richard Huxton
Date:
Guillaume Bog wrote:
> Hello,
>
> My first impression is that vacuuming the offending table very often helps a
> lot. I'm doing it by hand for now but I will have a cronjob for this. By the
> way, it seems I don't need thoses indexes anymore. Thanks a lot for your
> helpful advices.

Excellent! Oh, when you set up your cron-job, it's probably better to
vacuum this sort of table too often rather than not enough.

--
   Richard Huxton
   Archonet Ltd