Thread: how delete/insert/update affects select performace?
Hi!
I have a table not too big but with aprox. 5 millions of rows, this table must have 300 to 400 select per second. But also must have 10~20 delete/insert/update per second.
So, I need to know if the insert/delete/update really affect the select performance and how to deal with it.
The table structure is very simple:
account_id integer (PK)
service_id integer (PK)
enabled char(1)
The index created on this has the same 3 columns.
Most of time the table has more insert or delete than update, when update occur the column changed is enabled;
Thanks!
On 8 Září 2011, 14:51, Anibal David Acosta wrote: > Hi! > > > > I have a table not too big but with aprox. 5 millions of rows, this table > must have 300 to 400 select per second. But also must have 10~20 > delete/insert/update per second. > > So, I need to know if the insert/delete/update really affect the select > performance and how to deal with it. Yes, insert/update do affect query performance, because whenever a row is modified a new copy is created. So the table might grow over time, and bigger tables mean more data to read. There are two ways to prevent this: 1) autovacuum - has to be configured properly (watch the table size and number of rows, and if it grows then make it a bit more aggressive) 2) HOT > The table structure is very simple: > > account_id integer (PK) > > service_id integer (PK) > > enabled char(1) > > The index created on this has the same 3 columns. > > Most of time the table has more insert or delete than update, when update > occur the column changed is enabled; So there's one index on all three columns? I'd remove the "enabled" from the index, it's not going to help much I guess and it makes HOT possible (the modified column must not be indexed). Plus there will be one less index (the other two columns are already a PK, so there's a unique index). Tomas
"Anibal David Acosta" <aa@devshock.com> wrote: > I have a table not too big but with aprox. 5 millions of rows, > this table must have 300 to 400 select per second. But also must > have 10~20 delete/insert/update per second. > > So, I need to know if the insert/delete/update really affect the > select performance and how to deal with it. In addition to the advice from Tomas (which was all good) you should be aware that depending on the version of PostgreSQL (which you didn't mention), your hardware (which you didn't describe), and your configuration (which you didn't show) the data modification can make you vulnerable to a phenomenon where a checkpoint can cause a blockage of all disk I/O for a matter of minutes, causing even simple SELECT statements which normally run in under a millisecond to run for minutes. This is more likely to occur in a system which has been aggressively tuned for maximum throughput -- you may need to balance throughput needs against response time needs. Every one of the last several major releases of PostgreSQL has gotten better at preventing this problem, so your best protection from it is to use a recent version. There's a good chance that you won't run into this, but if you do, you can generally correct it by reducing your shared_buffers setting or making your background writer more aggressive. -Kevin
Postgres 9.0 on windows server 2008 r2 HW is a dell dual processor with 16gb of ram . Tthe reason I add the enabled column to index is because a select won't need to read the table to get this value My select is : exists(select * from table where account_id=X and service_id=Y and enabled='T') So, do you think I must remove the enabled from index? Thanks -----Mensaje original----- De: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov] Enviado el: jueves, 08 de septiembre de 2011 10:51 a.m. Para: Anibal David Acosta; pgsql-performance@postgresql.org Asunto: Re: [PERFORM] how delete/insert/update affects select performace? "Anibal David Acosta" <aa@devshock.com> wrote: > I have a table not too big but with aprox. 5 millions of rows, this > table must have 300 to 400 select per second. But also must have 10~20 > delete/insert/update per second. > > So, I need to know if the insert/delete/update really affect the > select performance and how to deal with it. In addition to the advice from Tomas (which was all good) you should be aware that depending on the version of PostgreSQL (which you didn't mention), your hardware (which you didn't describe), and your configuration (which you didn't show) the data modification can make you vulnerable to a phenomenon where a checkpoint can cause a blockage of all disk I/O for a matter of minutes, causing even simple SELECT statements which normally run in under a millisecond to run for minutes. This is more likely to occur in a system which has been aggressively tuned for maximum throughput -- you may need to balance throughput needs against response time needs. Every one of the last several major releases of PostgreSQL has gotten better at preventing this problem, so your best protection from it is to use a recent version. There's a good chance that you won't run into this, but if you do, you can generally correct it by reducing your shared_buffers setting or making your background writer more aggressive. -Kevin
"Anibal David Acosta" <aa@devshock.com> wrote: > Tthe reason I add the enabled column to index is because a select > won't need to read the table to get this value That's not true in PostgreSQL, although there is an effort to support that optimization, at least to some degree. In all current versions of PostgreSQL, it will always need to read the heap to determine whether the index entry is pointing at a version of the row which is visible to your transaction. Adding the enabled column to an index will prevent faster HOT updates to that column. > My select is : exists(select * from table where account_id=X and > service_id=Y and enabled='T') On the other hand, if you have very many rows where enabled is not 'T', and you are generally searching for where enabled = 'T', you might want a partial index (an index with a WHERE clause in its definition). If enabled only has two states, you will probably get better performance using a boolean column. -Kevin
>On the other hand, if you have very many rows where enabled is not 'T', and you are generally searching for where enabled = 'T', you might want a partial index (an index with a WHERE clause in its definition). If >enabled only has two states, you will probably get better performance using a boolean column. Maybe 1% or 2% are enabled='F' all others are 'T' Another question Kevin (thanks for your time) When an insert/update occur, the index is "reindexed" how index deals with new or deleted rows. Whay happened with select, it wait that index "reindex" or rebuild or something? Or just select view another "version" of the table? Thanks -----Mensaje original----- De: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov] Enviado el: jueves, 08 de septiembre de 2011 01:01 p.m. Para: Anibal David Acosta; pgsql-performance@postgresql.org CC: 'Tomas Vondra' Asunto: RE: [PERFORM] how delete/insert/update affects select performace? "Anibal David Acosta" <aa@devshock.com> wrote: > Tthe reason I add the enabled column to index is because a select > won't need to read the table to get this value That's not true in PostgreSQL, although there is an effort to support that optimization, at least to some degree. In all current versions of PostgreSQL, it will always need to read the heap to determine whether the index entry is pointing at a version of the row which is visible to your transaction. Adding the enabled column to an index will prevent faster HOT updates to that column. > My select is : exists(select * from table where account_id=X and > service_id=Y and enabled='T') On the other hand, if you have very many rows where enabled is not 'T', and you are generally searching for where enabled = 'T', you might want a partial index (an index with a WHERE clause in its definition). If enabled only has two states, you will probably get better performance using a boolean column. -Kevin
"Anibal David Acosta" <aa@devshock.com> wrote: > Maybe 1% or 2% are enabled='F' all others are 'T' Then an index on this column is almost certainly going to be counter-productive. The only index on this column which *might* make sense is WHERE enabled = 'F', and only if you run queries for that often enough to outweigh the added maintenance cost. If it's always one of those two values, I would use boolean (with NOT NULL if appropriate). > When an insert/update occur, the index is "reindexed" how index > deals with new or deleted rows. Ignoring details of HOT updates, where less work is done if no indexed column is updated and there is room for the new version of the row (tuple) on the same page, an UPDATE is almost exactly like a DELETE and an INSERT in the same transaction. A new tuple (from an INSERT or UPDATE) is added to the index(es), and if you query through the index, it will see entries for both the old and new versions of the row; this is why it must visit both versions -- to check tuple visibility. Eventually the old tuples and their index entries are cleaned up through a "vacuum" process (autovacuum or an explicit VACUUM command). Until then queries do extra work visiting and ignoring the old tuples. (That is why people who turn off autovacuum almost always regret it later.) > Whay happened with select, it wait that index "reindex" or rebuild > or something? Or just select view another "version" of the table? The new information is immediately *added*, but there may be other transactions which should still see the old state of the table, so cleanup of old tuples and their index entries must wait for those transactions to complete. See this for more information: http://www.postgresql.org/docs/9.0/interactive/mvcc.html -Kevin
On 09/08/2011 12:40 PM, Anibal David Acosta wrote: > Postgres 9.0 on windows server 2008 r2 > HW is a dell dual processor with 16gb of ram . > The general guidelines for Windows servers such as http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server recommend a fairly small setting for the shared_buffers parameters on Windows--no more than 512MB. That makes your server a bit less likely to run in the nasty checkpoint spike issues Kevin was alluding to. I don't think we've seen any reports of that on Windows. The problem is worst on Linux. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Even if I have a server with 16GB of ram, I must set the shared_buffer to 512MB on windows? In the wiki page they talk about 1/4 of ram, in my case that represent a shared_buffer = 4GB, that is incorrect? I have 8 GB of ram for each processor, each processor is a quad core with hyperthreading, that means 16 "processors" o something like that. Windows show 16 in task manager. If I can't configure more than 512MB of shared_buffer all other RAM is unnecessary? Thanks for your time. -----Mensaje original----- De: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] En nombre de Greg Smith Enviado el: jueves, 08 de septiembre de 2011 09:29 p.m. Para: pgsql-performance@postgresql.org Asunto: Re: [PERFORM] how delete/insert/update affects select performace? On 09/08/2011 12:40 PM, Anibal David Acosta wrote: > Postgres 9.0 on windows server 2008 r2 HW is a dell dual processor > with 16gb of ram . > The general guidelines for Windows servers such as http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server recommend a fairly small setting for the shared_buffers parameters on Windows--no more than 512MB. That makes your server a bit less likely to run in the nasty checkpoint spike issues Kevin was alluding to. I don't think we've seen any reports of that on Windows. The problem is worst on Linux. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance