Thread: optimizing db for small table with tons of updates
I am using postgresql to be the central database for a variety of tools for our testing infrastructure. We have web tools and CLI tools that require access to machine configuration and other states for automation. We have one tool that uses a table that looks like this: systest_live=# \d cuty Table "public.cuty" Column | Type | Modifiers -------------+--------------------------+----------- resource_id | integer | not null lock_start | timestamp with time zone | lock_by | character varying(12) | frozen | timestamp with time zone | freeze_end | timestamp with time zone | freeze_by | character varying(12) | state | character varying(15) | Indexes: "cuty_pkey" PRIMARY KEY, btree (resource_id) "cuty_main_idx" btree (resource_id, lock_start) Foreign-key constraints: "cuty_resource_id_fkey" FOREIGN KEY (resource_id) REFERENCES resource(resource_id) ON UPDATE CASCADE ON DELETE CASCADE Various users run a tool that updates this table to determine if the particular resource is available or not. Within a course of a few days, this table can be updated up to 200,000 times. There are only about 3500 records in this table, but the update and select queries against this table start to slow down considerablly after a few days. Ideally, this table doesn't even need to be stored and written to the filesystem. After I run a vacuum against this table, the overall database performance seems to rise again. When database is running with recent vacuum the average server load is about .40, but after this table is updated 200,000+ times, the server load can go up to 5.0. here is a typical update query: 2006-04-03 10:53:39 PDT testtool systest_live kyoto.englab.juniper.net(4888) LOG: duration: 2263.741 ms statement: UPDATEcuty SET lock_start = NOW(), lock_by = 'tlim' WHERE resource_id='2262' and (lock_start IS NULL OR lock_start < (NOW() - interval '3600 second')) We used to use MySQL for these tools and we never had any issues, but I believe it is due to the transactional nature of Postgres that is adding an overhead to this problem. Are there any table options that enables the table contents to be maintained in ram only or have delayed writes for this particular table? Thanks in advance, Kenji
Kenji, > We used to use MySQL for these tools and we never had any issues, but I > believe it is due to the transactional nature of Postgres that is adding > an overhead to this problem. You're correct. > Are there any table options that enables > the table contents to be maintained in ram only or have delayed writes > for this particular table? No. That's not really the right solution anyway; if you want non-transactional data, why not just use a flat file? Or Memcached? Possible solutions: 1) if the data is non-transactional, consider using pgmemcached. 2) if you want to maintain transactions, use a combination of autovacuum and vacuum delay to do more-or-less continuous low-level vacuuming of the table. Using Postgres 8.1 will help you to be able to manage this. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Dear Kenji, we had similar issuse with a banner impression update system, that had high concurrency. we modfied the system to use insert instead of update of the same row. performance wise things are much better , but you have to keep deleting old data. hope you extrapolate what i mean if its applicable to your case. Regds Rajesh Kumar Mallah On 4/3/06, Kenji Morishige <kenjim@juniper.net> wrote: > I am using postgresql to be the central database for a variety of tools for > our testing infrastructure. We have web tools and CLI tools that require access > to machine configuration and other states for automation. We have one tool that > uses a table that looks like this: > > systest_live=# \d cuty > Table "public.cuty" > Column | Type | Modifiers > -------------+--------------------------+----------- > resource_id | integer | not null > lock_start | timestamp with time zone | > lock_by | character varying(12) | > frozen | timestamp with time zone | > freeze_end | timestamp with time zone | > freeze_by | character varying(12) | > state | character varying(15) | > Indexes: > "cuty_pkey" PRIMARY KEY, btree (resource_id) > "cuty_main_idx" btree (resource_id, lock_start) > Foreign-key constraints: > "cuty_resource_id_fkey" FOREIGN KEY (resource_id) REFERENCES resource(resource_id) ON UPDATE CASCADE ON DELETE CASCADE > > Various users run a tool that updates this table to determine if the particular > resource is available or not. Within a course of a few days, this table can > be updated up to 200,000 times. There are only about 3500 records in this > table, but the update and select queries against this table start to slow > down considerablly after a few days. Ideally, this table doesn't even need > to be stored and written to the filesystem. After I run a vacuum against this > table, the overall database performance seems to rise again. When database > is running with recent vacuum the average server load is about .40, but after > this table is updated 200,000+ times, the server load can go up to 5.0. > > here is a typical update query: > 2006-04-03 10:53:39 PDT testtool systest_live kyoto.englab.juniper.net(4888) LOG: duration: 2263.741 ms statement: UPDATEcuty SET > lock_start = NOW(), > lock_by = 'tlim' > WHERE resource_id='2262' and (lock_start IS NULL OR lock_start < (NOW() - interval '3600 second')) > > We used to use MySQL for these tools and we never had any issues, but I believe > it is due to the transactional nature of Postgres that is adding an overhead > to this problem. Are there any table options that enables the table contents > to be maintained in ram only or have delayed writes for this particular table? > > Thanks in advance, > Kenji > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
Kenji Morishige <kenjim@juniper.net> writes: > Various users run a tool that updates this table to determine if the particular > resource is available or not. Within a course of a few days, this table can > be updated up to 200,000 times. There are only about 3500 records in this > table, but the update and select queries against this table start to slow > down considerablly after a few days. Ideally, this table doesn't even need > to be stored and written to the filesystem. After I run a vacuum against this > table, the overall database performance seems to rise again. You should never have let such a table go that long without vacuuming. You might consider using autovac to take care of it for you. If you don't want to use autovac, set up a cron job that will vacuum the table at least once per every few thousand updates. regards, tom lane
I've been stumped as to how to call psql from the command line without it prompting me for a password. Is there a enviornoment variable I can specify for the password or something I can place in .pgsql? I could write a perl wrapper around it, but I've been wondering how I can call psql -c without it prompting me. Is it possible? -Kenji On Mon, Apr 03, 2006 at 02:39:10PM -0400, Tom Lane wrote: > Kenji Morishige <kenjim@juniper.net> writes: > > Various users run a tool that updates this table to determine if the particular > > resource is available or not. Within a course of a few days, this table can > > be updated up to 200,000 times. There are only about 3500 records in this > > table, but the update and select queries against this table start to slow > > down considerablly after a few days. Ideally, this table doesn't even need > > to be stored and written to the filesystem. After I run a vacuum against this > > table, the overall database performance seems to rise again. > > You should never have let such a table go that long without vacuuming. > > You might consider using autovac to take care of it for you. If you > don't want to use autovac, set up a cron job that will vacuum the table > at least once per every few thousand updates. > > regards, tom lane
Kenji Morishige wrote: > I've been stumped as to how to call psql from the command line without it > prompting me for a password. Is there a enviornoment variable I can specify for > the password or something I can place in .pgsql? I could write a perl wrapper > around it, but I've been wondering how I can call psql -c without it prompting > me. Is it possible? Sure it is. Set up a .pgpass file. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Sweet! Thanks. -Kenji On Mon, Apr 03, 2006 at 03:03:54PM -0400, Alvaro Herrera wrote: > Kenji Morishige wrote: > > I've been stumped as to how to call psql from the command line without it > > prompting me for a password. Is there a enviornoment variable I can specify for > > the password or something I can place in .pgsql? I could write a perl wrapper > > around it, but I've been wondering how I can call psql -c without it prompting > > me. Is it possible? > > Sure it is. Set up a .pgpass file. > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Kenji Morishige wrote: >> I've been stumped as to how to call psql from the command line without it >> prompting me for a password. Is there a enviornoment variable I can specify for >> the password or something I can place in .pgsql? I could write a perl wrapper >> around it, but I've been wondering how I can call psql -c without it prompting >> me. Is it possible? > Sure it is. Set up a .pgpass file. Also, consider whether a non-password-based auth method (eg, ident) might work for you. Personally, I wouldn't trust ident over TCP, but if your kernel supports it on unix-socket connections it is secure. regards, tom lane
Cool, looks like I had tried the .pgpass thing a while back and wasn't working, I realized I had a typo or something in there. It works like a charm. Security in our intranet is not a big issue at the moment. Thanks for the help! -Kenji On Mon, Apr 03, 2006 at 03:23:50PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Kenji Morishige wrote: > >> I've been stumped as to how to call psql from the command line without it > >> prompting me for a password. Is there a enviornoment variable I can specify for > >> the password or something I can place in .pgsql? I could write a perl wrapper > >> around it, but I've been wondering how I can call psql -c without it prompting > >> me. Is it possible? > > > Sure it is. Set up a .pgpass file. > > Also, consider whether a non-password-based auth method (eg, ident) > might work for you. Personally, I wouldn't trust ident over TCP, but > if your kernel supports it on unix-socket connections it is secure. > > regards, tom lane