Thread: synchronous_commit=off doesn't always return immediately
Hi,

I set synchronous_commit to off and expected trivial inserts (single row,
6-8 columns) to always return quickly. However, I see that during
checkpoints they sometimes take over a second. I tried setting
full_page_writes to off, but this didn't seem to have an effect.

The version I'm using is 8.3.3. The configuration is:

bgwriter_delay = 20ms                   # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000            # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 5.0           # 0-10.0 multiplier on buffers scanned/round
synchronous_commit = off                # immediate fsync at commit
full_page_writes = off                  # recover from partial page writes
checkpoint_segments = 32                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min              # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s                # 0 is off

Thanks,
Tomer
tomrevam <tomer@fabrix.tv> writes:
> I set synchronous_commit to off and expected trivial inserts (single row,
> 6-8 columns) to always return quickly. However, I see that during
> checkpoints they sometimes take over a second.

What sort of disk hardware have you got?  It sounds overstressed.

It might help to increase wal_buffers.

			regards, tom lane
On Thu, 23 Jul 2009, tomrevam wrote:

> bgwriter_delay = 20ms                  # 10-10000ms between rounds
> bgwriter_lru_maxpages = 1000           # 0-1000 max buffers written/round
> bgwriter_lru_multiplier = 5.0          # 0-10.0 multiplier on buffers scanned/round

These settings may be contributing to the problem.  You should never run
the background writer that frequently--it just wastes resources and writes
more than it should.  I'd suggest turning it off altogether
(bgwriter_lru_maxpages = 0) and seeing if things improve any, just to rule
that out as a potential source of issues.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
Greg Smith-12 wrote:
>
> On Thu, 23 Jul 2009, tomrevam wrote:
>
>> bgwriter_delay = 20ms                 # 10-10000ms between rounds
>> bgwriter_lru_maxpages = 1000          # 0-1000 max buffers written/round
>> bgwriter_lru_multiplier = 5.0         # 0-10.0 multiplier on buffers scanned/round
>
> These settings may be contributing to the problem.  You should never run
> the background writer that frequently--it just wastes resources and writes
> more than it should.  I'd suggest turning it off altogether
> (bgwriter_lru_maxpages = 0) and seeing if things improve any, just to rule
> that out as a potential source of issues.
>

The behavior of the system is the same with bgwriter_lru_maxpages = 0.
Can you explain why transactions are sometimes synchronous even with
synchronous_commit set to off?

Thanks,
Tomer
tomrevam wrote:
> The behavior of the system is the same with bgwriter_lru_maxpages = 0.
> Can you explain why transactions are sometimes synchronous even with
> synchronous_commit set to off?

Asynchronous transactions must still be logged to the journal from time to
time.  There's a background process called the "WAL writer" that's supposed
to do that in the background, but if there is too much WAL traffic, some
transactions may block when there isn't enough free space in the in-memory
WAL buffer to hold their changes.  This is why Tom was suggesting you
increase wal_buffers.  Did you try that?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
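A minimal sketch of the change being discussed, assuming an 8.3 server
(where wal_buffers defaults to 64kB, can be inspected with SHOW wal_buffers,
and only takes effect after a restart); the 2MB figure is just an
illustration, not a recommendation:

    # postgresql.conf -- on 8.3 this setting needs a full server restart
    wal_buffers = 2MB               # 8.3 default is 64kB; 2MB is only an example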
Alvaro Herrera-7 wrote:
>
> This is why Tom was suggesting you increase wal_buffers.  Did you try
> that?
>

Thanks for the explanation. I will try increasing wal_buffers.
Unfortunately this is on a system I can't restart for the next few days.

Tomer
Alvaro Herrera-7 wrote:
> tomrevam wrote:
>
> This is why Tom was suggesting you increase wal_buffers.  Did you try
> that?
>

Increasing wal_buffers improved the performance. Is there a limit on how
far I can raise it? Is there anything that would work worse with bigger
wal_buffers?

Thanks,
Tomer
Tom Lane-2 wrote:
>
> tomrevam <tomer@fabrix.tv> writes:
> It might help to increase wal_buffers.
>

Hi,

I increased wal_buffers to 2 MB. Initially this improved the performance
very much, but after 4 days of continuous operation the system is back to
very long inserts and updates (at least as bad as it was before).

Is there a way to know what resource the DB is running out of?

Thanks,
Tomer
On Mon, Aug 3, 2009 at 2:14 AM, tomrevam <tomer@fabrix.tv> wrote:
> I increased wal_buffers to 2 MB. Initially this improved the performance
> very much, but after 4 days of continuous operation the system is back to
> very long inserts and updates (at least as bad as it was before).
> Is there a way to know what resource the DB is running out of?

I can almost guarantee you that you are i/o bound.  synchronous_commit=off
helps, but is not magic: your single sata disk can only deliver about
100-200 iops and you are expecting more than that.  I think you have only
two options: add more/faster disks (and consider separating WAL from the
data volume) or disable sync completely with fsync=off (which is of course
quite dangerous).

This can be confirmed with iostat, noting the tps on each volume and the
iowait.

merlin
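The iostat numbers are the direct way to confirm that. As a complementary
check from inside the database (assuming 8.3, which ships the
pg_stat_bgwriter view), a query like the sketch below shows whether
checkpoints are being forced early and whether ordinary backends are having
to write buffers themselves -- both of which line up with the I/O stalls
described above:

    -- counters are cumulative since the last server start
    SELECT checkpoints_timed,     -- checkpoints started by checkpoint_timeout
           checkpoints_req,       -- requested checkpoints (e.g. checkpoint_segments filled)
           buffers_checkpoint,    -- buffers written during checkpoints
           buffers_clean,         -- buffers written by the background writer
           buffers_backend        -- buffers backends had to write themselves
    FROM pg_stat_bgwriter;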
I am considering whether or not to partition a few tables in our system.
The guidelines I read say that this can help when the size of a table
exceeds the amount of system memory. I'm wondering whether that statement
applies per table or to the whole database.

The way our system operates is that each individual location is actually a
separate operation owned by another party. In most cases the data from one
location should not be seen by others. There are several tables which are
global in nature which hold mainly lookup information, but the actual
processing tables are by location.

I am wondering if partitioning would be a good way to isolate the
information for each location. Each partition would be keyed by location
number. Our db server is a dual quad-core Intel with 4 GB of RAM running
Windows 2000 Enterprise Server. All client machines are quad-core servers
with 8-16 GB of RAM partitioned using Windows 2003 and accessed via Citrix.

The total size of our database with 5 years worth of data is about 3.4 GB.
In the business we are in, we open about 5-7 new locations each year and
close 2-3. I was also thinking that if each partition was by location, it
would be easier to detach the partition tables for historical use when we
close a location. We would probably get 10 or so queries on the closed
locations in the 6 months after closing.

Does this seem like an appropriate use of table partitioning?

Best Regards

--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax
Ok, I'm a bit stumped on getting my group by query to work which iterates
through a number of months that basically a generate_series provides for me.

Here is what I am using in the from clause (along with other tables) to
generate the series of numbers for the number of months. This seems to work:

generate_series( 0, ((extract(years from age(current_date,
DATE('2008-05-01')))*12) + extract(month from age(current_date,
DATE('2008-05-01'))))::INTEGER) as s(a)

Next, I want to group by and compare against dates in my table. These are
the two parts I can't get to work:

In the select part:
select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

and in the where clause:
and DATE(sometimestamp) >= DATE('2008-05-01') || 's.a??? months' - someOffsetVariable - 30
and DATE(sometimestamp) < DATE('2008-05-01') + s.a + 1 || 'months'

The point is I want to use interval (s.a 'months') to iterate through them.
I can obviously get DATE('2008-05-01') + interval '3 months' to work, but I
want the '3' part to be generated with a series. I have tried various type
casting to no avail.

Any help with this syntax would be appreciated. Thanks in advance for
insight on how to do this.
Michael Gould wrote:
> I am considering whether or not to partition a few tables in our system.
> The guidelines I read say that this can help when the size of a table
> exceeds the amount of system memory. I'm wondering whether that statement
> applies per table or to the whole database.
>
> [...]
>
> Does this seem like an appropriate use of table partitioning?

Hi - partitioning, by definition, should be used to get at your data faster
and, in most cases, should be transparent to your client. Questions I'd
suggest asking first:

1. Do you have performance issues reading from your db?
2. Can you come up with a pattern to segment your data into smaller chunks
   (like weeks, months, ids), creating partitions that are reasonably big
   but not huge?
3. How do you populate your db? Inserts? COPY? If you create partitions,
   will you write to many of them or just one (the most recent)?

Yes, it can be useful for archiving old 'locations'. In that case I'd
suggest creating a dedicated tablespace on (slower = cheaper) disks and
moving your older partitions there... but again, that's not the main reason
to consider partitioning.

Regards,
foo
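For reference, a minimal sketch of what location-based partitioning looks
like on 8.3-era PostgreSQL (inheritance plus a CHECK constraint per child,
with constraint_exclusion enabled); the table and column names below are
made up for illustration:

    -- hypothetical parent table
    CREATE TABLE orders (
        order_id     bigint    NOT NULL,
        location_id  integer   NOT NULL,
        created_at   timestamp NOT NULL DEFAULT now()
    );

    -- one child per location; the CHECK constraint lets the planner skip
    -- irrelevant children when constraint_exclusion = on in postgresql.conf
    CREATE TABLE orders_loc_101 (CHECK (location_id = 101)) INHERITS (orders);
    CREATE TABLE orders_loc_102 (CHECK (location_id = 102)) INHERITS (orders);

    -- closing a location later just detaches its table for archival
    ALTER TABLE orders_loc_101 NO INHERIT orders;

Routing inserts into the correct child (via a trigger or rule) is the part
that adds ongoing maintenance as locations open and close.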
"Bill Reynolds" <Bill.Reynolds@ateb.com> writes: > Ok, I'm a bit stumped on getting my group by query to work which > iterates through a number of months that basically a generate_series > provides for me. > ... > select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of No, you're confusing a syntax that's meant to handle literal constants with something that's appropriate for computation. What you want is > select DATE('2008-05-01') + s.a * interval '1 month' as Month_of that is, multiply the sequence output by a suitable interval constant. regards, tom lane
Bill Reynolds <Bill.Reynolds@ateb.com> wrote:
> Ok, I'm a bit stumped on getting my group by query to work which
> iterates through a number of months that basically a generate_series
> provides for me.
>
> Here is what I am using in the from clause (along with other tables) to
> generate the series of numbers for the number of months. This seems to
> work:
> generate_series( 0, ((extract(years from age(current_date,
> DATE('2008-05-01')))*12) + extract(month from age(current_date,
> DATE('2008-05-01'))))::INTEGER) as s(a)
>
> Next, I want to group by and compare against dates in my table. These
> are the two parts I can't get to work:
>
> In the select part:
> select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

The trick is easy:

test=*# select current_date + s.a * '1 month'::interval from (select generate_series(0,10) as a) as s;
      ?column?
---------------------
 2009-08-03 00:00:00
 2009-09-03 00:00:00
 2009-10-03 00:00:00
 2009-11-03 00:00:00
 2009-12-03 00:00:00
 2010-01-03 00:00:00
 2010-02-03 00:00:00
 2010-03-03 00:00:00
 2010-04-03 00:00:00
 2010-05-03 00:00:00
 2010-06-03 00:00:00
(11 rows)

I think you can solve your problem now.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
Ok, it is Monday :-)  Thanks Tom!

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 03, 2009 11:44 AM
To: Bill Reynolds
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] using generate_series to iterate through months

[...]  What you want is

	select DATE('2008-05-01') + s.a * interval '1 month' as Month_of

that is, multiply the sequence output by a suitable interval constant.

			regards, tom lane
On Mon, Aug 03, 2009 at 11:15:25AM -0400, Bill Reynolds wrote:
> Here is what I am using in the from clause (along with other tables) to
> generate the series of numbers for the number of months. This seems to
> work:
> generate_series( 0, ((extract(years from age(current_date,
> DATE('2008-05-01')))*12) + extract(month from age(current_date,
> DATE('2008-05-01'))))::INTEGER) as s(a)

I doubt you're using it, but generate_series in 8.4 knows how to handle
dates and intervals; for example you can do:

  SELECT generate_series(timestamp '2001-1-1','2004-1-1',interval '1 month');

to go from 2001 to 2004 in one month steps.

If not, I'd be tempted to bung the above into a function that does the
same.  Something like this should work OK for series with only a few
thousand rows, but don't use it to generate a microsecond spaced series
covering several years:

  CREATE FUNCTION generate_series(timestamp,timestamp,interval)
      RETURNS SETOF timestamp
      LANGUAGE plpgsql IMMUTABLE AS $$
    DECLARE
      _c timestamp := $1;
    BEGIN
      WHILE _c < $2 LOOP
        RETURN NEXT _c;
        _c := _c + $3;
      END LOOP;
      RETURN;   -- end of result set
    END;
  $$;

--
  Sam  http://samason.me.uk/
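A quick usage sketch of the substitute function on a pre-8.4 server (the
dates are arbitrary examples; the upper bound is exclusive because of the
< comparison in the loop):

  -- monthly steps from May 2008 up to, but not including, May 2009
  SELECT generate_series(timestamp '2008-05-01',
                         timestamp '2009-05-01',
                         interval '1 month');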
On Mon, Aug 3, 2009 at 10:51 AM, Michael Gould
<mgould@intermodalsoftwaresolutions.net> wrote:
> The total size of our database with 5 years worth of data is about 3.4 GB.
> In the business we are in, we open about 5-7 new locations each year and
> close 2-3. I was also thinking that if each partition was by location it

It seems to me it would be cheaper to populate the server with 8 GB of RAM
(or more) and be done with it.  The effort you will expend to partition and
manage the partitions as locations come and go will be worth far more than
the cost of the RAM.

Once your DB is in the 100+ GB range, then you may want to consider
partitioning, or perhaps using a schema per location and, for your few
global queries, making a view that encompasses all the schemas.  Given your
rate of change, redefining the views will not be a major burden.
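A minimal sketch of that schema-per-location layout, with made-up schema
and table names (loc101, loc102, shipments); the view is the only thing
that has to be redefined when a location opens or closes:

    -- one schema per location, each holding identically shaped tables
    CREATE SCHEMA loc101;
    CREATE SCHEMA loc102;

    CREATE TABLE loc101.shipments (shipment_id bigint, shipped_on date);
    CREATE TABLE loc102.shipments (shipment_id bigint, shipped_on date);

    -- a global view for the occasional cross-location query;
    -- recreate it whenever a location is added or retired
    CREATE OR REPLACE VIEW all_shipments AS
        SELECT 101 AS location_id, s.* FROM loc101.shipments s
        UNION ALL
        SELECT 102 AS location_id, s.* FROM loc102.shipments s;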
I am wondering how others handle the login situation.  We use Active
Directory and require our users to change their passwords every 30 days.
Currently, in our old system using SQL Anywhere, we use the integrated
login feature.  Our db server is Windows 2003 R2.  I believe we can mimic
this in Postgres.

What are people's feelings about using passwords in Postgres in this
situation?  We know that only people authenticated to access our servers
are actually getting logged in.  All of our users must log in through
Citrix and access our system via our Citrix web page login.

I do not believe we can capture the password that the user types from
Active Directory, so I really do not want to use a password on the Postgres
side.  We do have application-level security as well, which only allows
certain users (same as the login id) access to the allowed areas within the
system, and only at the level of access prescribed within the system.

What are others' thoughts on this?  With SQL Anywhere, if you are using
integrated logins, you need to enter a password when the account is first
defined to the database, but it is bypassed from that point forward unless
you remove their access to use integrated logins.

Best Regards

--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax
On Wed, Aug 5, 2009 at 18:47, Michael Gould
<mgould@intermodalsoftwaresolutions.net> wrote:
> I am wondering how others handle the login situation.  We use Active
> Directory and require our users to change their passwords every 30 days.
> Currently, in our old system using SQL Anywhere, we use the integrated
> login feature.  Our db server is Windows 2003 R2.  I believe we can mimic
> this in Postgres.

You can do this as well with PostgreSQL, using SSPI or GSSAPI (depending on
your client platforms).

> What are people's feelings about using passwords in Postgres in this
> situation?  We know that only people authenticated to access our servers
> are actually getting logged in.  All of our users must log in through
> Citrix and access our system via our Citrix web page login.
>
> I do not believe we can capture the password that the user types from
> Active Directory, so I really do not want to use a password on the
> Postgres side.  We do have application-level security as well, which only
> allows certain users (same as the login id) access to the allowed areas
> within the system, and only at the level of access prescribed within the
> system.

No, I'd definitely avoid that.  If you use LDAP, you don't need to capture
the passwords.  Just create the accounts without passwords, and PostgreSQL
will ask the AD server to verify the login.  Or if you use SSPI or GSSAPI,
you will get a fully integrated login.

> What are others' thoughts on this?  With SQL Anywhere, if you are using
> integrated logins, you need to enter a password when the account is first
> defined to the database, but it is bypassed from that point forward unless
> you remove their access to use integrated logins.

Um, ok, I have to take that back.  So SQL Anywhere is basically "store the
password in a file on the client", then?  You can use a .pgpass file for
that, and just add something to your application that will prompt for the
password and add it to the file when the app starts.  pgAdmin does it this
way.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
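A rough sketch of what those two approaches look like in configuration; the
network, host, database, and account names are placeholders, and the exact
pg_hba.conf option syntax should be checked against the documentation for
your server version (SSPI and GSSAPI require 8.3 or later):

    # pg_hba.conf -- integrated Windows login via SSPI for one trusted subnet
    host    all    all    192.168.1.0/24    sspi

    # ~/.pgpass (pgpass.conf under %APPDATA%\postgresql on Windows),
    # one entry per line: hostname:port:database:username:password
    dbserver:5432:mydb:appuser:s3cret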