Thread: optimizing db for small table with tons of updates

optimizing db for small table with tons of updates

From
Kenji Morishige
Date:
I am using postgresql to be the central database for a variety of tools for
our testing infrastructure. We have web tools and CLI tools that require access
to machine configuration and other states for automation.  We have one tool that
uses a table that looks like this:

systest_live=# \d cuty
                Table "public.cuty"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 resource_id | integer                  | not null
 lock_start  | timestamp with time zone |
 lock_by     | character varying(12)    |
 frozen      | timestamp with time zone |
 freeze_end  | timestamp with time zone |
 freeze_by   | character varying(12)    |
 state       | character varying(15)    |
Indexes:
    "cuty_pkey" PRIMARY KEY, btree (resource_id)
    "cuty_main_idx" btree (resource_id, lock_start)
Foreign-key constraints:
    "cuty_resource_id_fkey" FOREIGN KEY (resource_id) REFERENCES resource(resource_id) ON UPDATE CASCADE ON DELETE
CASCADE

Various users run a tool that updates this table to determine whether a
particular resource is available or not.  Over the course of a few days, this
table can be updated up to 200,000 times.  There are only about 3,500 records
in this table, but the update and select queries against it start to slow
down considerably after a few days.  Ideally, this table wouldn't even need
to be stored and written to the filesystem.  After I run a vacuum against this
table, overall database performance seems to recover.  When the database is
running with a recent vacuum, the average server load is about 0.40, but after
this table has been updated 200,000+ times, the load can go up to 5.0.

here is a typical update query:
2006-04-03 10:53:39 PDT testtool systest_live kyoto.englab.juniper.net(4888) LOG:  duration: 2263.741 ms  statement:
        UPDATE cuty SET
            lock_start = NOW(),
            lock_by = 'tlim'
        WHERE resource_id = '2262' AND (lock_start IS NULL OR lock_start < (NOW() - interval '3600 second'))

We used to use MySQL for these tools and never had any issues, but I believe
it is the transactional nature of Postgres that is adding overhead here.  Are
there any table options that enable the table contents to be maintained in
RAM only, or that delay writes for this particular table?

Thanks in advance,
Kenji

Re: optimizing db for small table with tons of updates

From
Josh Berkus
Date:
Kenji,

> We used to use MySQL for these tools and we never had any issues, but I
> believe it is due to the transactional nature of Postgres that is adding
> an overhead to this problem.  

You're correct.

> Are there any table options that enables
> the table contents to be maintained in ram only or have delayed writes
> for this particular table?

No.  That's not really the right solution anyway; if you want
non-transactional data, why not just use a flat file?  Or Memcached?

Possible solutions:
1) if the data is non-transactional, consider using pgmemcached.
2) if you want to maintain transactions, use a combination of autovacuum
and vacuum cost delay to do more-or-less continuous low-level vacuuming of
the table.  Postgres 8.1 makes this much easier to manage.
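
As a rough sketch, the relevant 8.1-era settings look like this (the values
below are illustrative starting points, not tuned recommendations):

```ini
# postgresql.conf (8.1) -- illustrative values, tune for your workload
autovacuum = on
autovacuum_naptime = 60              # seconds between autovacuum checks
autovacuum_vacuum_threshold = 500    # min updated tuples before vacuuming a table
autovacuum_vacuum_scale_factor = 0.2 # plus this fraction of the table's rows
vacuum_cost_delay = 10               # ms sleep per cost batch; throttles vacuum I/O
vacuum_cost_limit = 200              # cost units accumulated before sleeping
```

With a small, hot table like this, a low threshold and scale factor keep
autovacuum visiting it frequently, while the cost delay keeps each pass from
competing with the update traffic.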

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: optimizing db for small table with tons of updates

From
"Rajesh Kumar Mallah"
Date:
Dear Kenji,

We had a similar issue with a banner-impression update system that had high
concurrency.  We modified the system to use INSERT instead of UPDATE of the
same row.  Performance-wise, things are much better, but you have to keep
deleting old data.

Hope you can extrapolate from this if it's applicable to your case.
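
To sketch the idea in SQL (the table and column names here are hypothetical,
loosely modeled on the cuty table):

```sql
-- Hypothetical append-only variant: each lock attempt INSERTs a new row
-- instead of UPDATEing the single row for that resource.
CREATE TABLE cuty_log (
    resource_id integer NOT NULL,
    lock_start  timestamp with time zone NOT NULL DEFAULT now(),
    lock_by     character varying(12)
);
CREATE INDEX cuty_log_idx ON cuty_log (resource_id, lock_start);

-- Taking a lock becomes an insert, leaving no dead tuple behind:
INSERT INTO cuty_log (resource_id, lock_by) VALUES (2262, 'tlim');

-- The current state is the newest row per resource:
SELECT DISTINCT ON (resource_id) resource_id, lock_start, lock_by
  FROM cuty_log
 ORDER BY resource_id, lock_start DESC;

-- Old rows must still be purged periodically:
DELETE FROM cuty_log WHERE lock_start < now() - interval '1 day';
```

Note that the periodic DELETE creates dead rows of its own, so vacuuming is
still needed; the win is that it moves off the hot update path.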

Regds
Rajesh Kumar Mallah

On 4/3/06, Kenji Morishige <kenjim@juniper.net> wrote:
> Various users run a tool that updates this table to determine if the
> particular resource is available or not.  Within a course of a few days,
> this table can be updated up to 200,000 times.  There are only about 3500
> records in this table, but the update and select queries against this
> table start to slow down considerably after a few days.
> [...]

Re: optimizing db for small table with tons of updates

From
Tom Lane
Date:
Kenji Morishige <kenjim@juniper.net> writes:
> Various users run a tool that updates this table to determine if the particular
> resource is available or not.  Within a course of a few days, this table can
> be updated up to 200,000 times.  There are only about 3500 records in this
> table, but the update and select queries against this table start to slow
> down considerablly after a few days.  Ideally, this table doesn't even need
> to be stored and written to the filesystem.  After I run a vacuum against this
> table, the overall database performance seems to rise again.

You should never have let such a table go that long without vacuuming.

You might consider using autovac to take care of it for you.  If you
don't want to use autovac, set up a cron job that will vacuum the table
at least once per every few thousand updates.
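
For example, a cron entry along these lines (the schedule and connection
details are placeholders to adjust for your update rate and setup):

```shell
# crontab: vacuum the hot table every 10 minutes
*/10 * * * *  vacuumdb --quiet --dbname=systest_live --table=cuty
```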

            regards, tom lane

Re: optimizing db for small table with tons of updates

From
Kenji Morishige
Date:
I've been stumped as to how to call psql from the command line without it
prompting me for a password.  Is there an environment variable I can specify
for the password, or something I can place in .pgsql?  I could write a Perl
wrapper around it, but I've been wondering how I can call psql -c without it
prompting me.  Is it possible?

-Kenji

On Mon, Apr 03, 2006 at 02:39:10PM -0400, Tom Lane wrote:
> You should never have let such a table go that long without vacuuming.
> [...]

Re: optimizing db for small table with tons of updates

From
Alvaro Herrera
Date:
Kenji Morishige wrote:
> I've been stumped as to how to call psql from the command line without it
> prompting me for a password.  Is there an environment variable I can specify
> for the password, or something I can place in .pgsql?  I could write a Perl
> wrapper around it, but I've been wondering how I can call psql -c without it
> prompting me.  Is it possible?

Sure it is.  Set up a .pgpass file.
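
The format is one connection per line, hostname:port:database:username:password,
and the file must be readable only by you (chmod 600).  The host and
credentials below are placeholders:

```ini
db.example.com:5432:systest_live:testtool:secret
```

With a matching line in ~/.pgpass, psql -c picks up the password automatically
and never prompts.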

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: optimizing db for small table with tons of updates

From
Kenji Morishige
Date:
Sweet! Thanks.
-Kenji

On Mon, Apr 03, 2006 at 03:03:54PM -0400, Alvaro Herrera wrote:
> Sure it is.  Set up a .pgpass file.

Re: optimizing db for small table with tons of updates

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Kenji Morishige wrote:
>> I've been stumped as to how to call psql from the command line without it
>> prompting me for a password.

> Sure it is.  Set up a .pgpass file.

Also, consider whether a non-password-based auth method (eg, ident)
might work for you.  Personally, I wouldn't trust ident over TCP, but
if your kernel supports it on unix-socket connections it is secure.
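
On a unix-socket connection that would look something like this in
pg_hba.conf (8.1 syntax; the database and user names are placeholders):

```ini
# TYPE  DATABASE       USER       METHOD
local   systest_live   testtool   ident sameuser
```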

            regards, tom lane

Re: optimizing db for small table with tons of updates

From
Kenji Morishige
Date:
Cool.  It looks like I had tried the .pgpass thing a while back and it wasn't
working; I realized I had a typo or something in there.  It works like a charm.
Security on our intranet is not a big issue at the moment.  Thanks for the help!
-Kenji

On Mon, Apr 03, 2006 at 03:23:50PM -0400, Tom Lane wrote:
> Also, consider whether a non-password-based auth method (eg, ident)
> might work for you.
> [...]