Thread: synchronous_commit=off doesn't always return immediately

synchronous_commit=off doesn't always return immediately

From
tomrevam
Date:
Hi,

I set synchronous_commit to off and expected trivial inserts (single row,
6-8 columns) to always return quickly. However, I see that during
checkpoints they sometimes take over a second. I tried setting
full_page_writes to off, but this didn't seem to have an effect.

The version I'm using is 8.3.3
The configuration is:
bgwriter_delay = 20ms                   # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000            # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 5.0           # 0-10.0 multiplier on buffers scanned/round
synchronous_commit = off                # immediate fsync at commit
full_page_writes = off                  # recover from partial page writes
checkpoint_segments = 32                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min              # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s                # 0 is off

Thanks,
Tomer


Re: synchronous_commit=off doesn't always return immediately

From
Tom Lane
Date:
tomrevam <tomer@fabrix.tv> writes:
> I set synchronous_commit to off and expected trivial inserts (single row,
> 6-8 columns) to always return quickly. However, I see that during
> checkpoints they sometimes take over a second.

What sort of disk hardware have you got?  It sounds overstressed.

It might help to increase wal_buffers.
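
As a sketch (the value is only illustrative; on 8.3, wal_buffers can only
be set in postgresql.conf, and changing it requires a server restart):

    -- check the current setting from psql; the 8.3 default is 64kB
    SHOW wal_buffers;

and then in postgresql.conf something like:

    wal_buffers = 1MB                   # illustrative; requires restart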

            regards, tom lane

Re: synchronous_commit=off doesn't always return immediately

From
Greg Smith
Date:
On Thu, 23 Jul 2009, tomrevam wrote:

> bgwriter_delay = 20ms                   # 10-10000ms between rounds
> bgwriter_lru_maxpages = 1000            # 0-1000 max buffers written/round
> bgwriter_lru_multiplier = 5.0           # 0-10.0 multiplier on buffers scanned/round

These settings may be contributing to the problem.  You should never run
the background writer that frequently--it just wastes resources and writes
more than it should.  I'd suggest turning it off altogether
(bgwriter_lru_maxpages = 0) and seeing if things improve any, just to rule
that out as a potential source of issues.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: synchronous_commit=off doesn't always return immediately

From
tomrevam
Date:


Greg Smith wrote:
>
> On Thu, 23 Jul 2009, tomrevam wrote:
>
>> bgwriter_delay = 20ms                   # 10-10000ms between rounds
>> bgwriter_lru_maxpages = 1000            # 0-1000 max buffers written/round
>> bgwriter_lru_multiplier = 5.0           # 0-10.0 multiplier on buffers scanned/round
>
> These settings may be contributing to the problem.  You should never run
> the background writer that frequently--it just wastes resources and writes
> more than it should.  I'd suggest turning it off altogether
> (bgwriter_lru_maxpages = 0) and seeing if things improve any, just to rule
> that out as a potential source of issues.
>
>

The behavior of the system is the same with bgwriter_lru_maxpages = 0.
Can you explain why transactions are sometimes synchronous even with
synchronous_commit set to off?

Thanks,
Tomer


Re: synchronous_commit=off doesn't always return immediately

From
Alvaro Herrera
Date:
tomrevam wrote:

> The behavior of the system is the same with bgwriter_lru_maxpages =
> 0. Can you explain why transactions are sometimes synchronous even
> with synchronous_commit set to off?

Asynchronous transactions must still be written to the WAL from time
to time.  There's a background process called the "wal writer" that's
supposed to do that in the background, but if you have too much WAL
traffic then some transactions may block when there's not enough free
space in the in-memory WAL buffers to hold their changes.  This is why
Tom was suggesting you increase wal_buffers.  Did you try that?
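
To get a rough sense of how much WAL traffic you're generating (a sketch
using functions that exist in 8.3; the sampling interval is arbitrary):

    SELECT pg_current_xlog_insert_location();
    -- wait a while, then sample again and compare the two positions;
    -- the difference is the amount of WAL generated in between
    SELECT pg_current_xlog_insert_location();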

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: synchronous_commit=off doesn't always return immediately

From
tomrevam
Date:


Alvaro Herrera wrote:
>
> This is why Tom was suggesting you increase wal_buffers.  Did
> you try that?
>

Thanks for the explanation. I will try increasing wal_buffers.
Unfortunately, this is on a system I can't restart for the next few days.

Tomer


Re: synchronous_commit=off doesn't always return immediately

From
tomrevam
Date:


Alvaro Herrera wrote:
>
> tomrevam wrote:
>
> This is why Tom was suggesting you increase wal_buffers.  Did
> you try that?
>
>

Increasing wal_buffers improved the performance. Is there a limit on how
high I can raise it? Is there anything that would perform worse with bigger
wal_buffers?

Thanks,
Tomer


Re: synchronous_commit=off doesn't always return immediately

From
tomrevam
Date:


Tom Lane wrote:
>
> tomrevam <tomer@fabrix.tv> writes:
> It might help to increase wal_buffers.
>

Hi,

I increased wal_buffers to 2 MB. Initially this improved performance
considerably, but after 4 days of continuous operation the system is back to
very long inserts and updates (at least as bad as it was before).
Is there a way to know what resource the DB is running out of?

Thanks,
Tomer


Re: synchronous_commit=off doesn't always return immediately

From
Merlin Moncure
Date:
On Mon, Aug 3, 2009 at 2:14 AM, tomrevam <tomer@fabrix.tv> wrote:
>
>
>
> Tom Lane wrote:
>>
>> tomrevam <tomer@fabrix.tv> writes:
>> It might help to increase wal_buffers.
>>
>
> Hi,
>
> I increased wal_buffers to 2 MB. Initially this improved performance
> considerably, but after 4 days of continuous operation the system is back to
> very long inserts and updates (at least as bad as it was before).
> Is there a way to know what resource the DB is running out of?


I can almost guarantee you that you are I/O bound.  Turning synchronous
commit off helps, but it is not magic: your single SATA disk can only
deliver about 100-200 iops and you are expecting more than that.  I think
you have only two options: add more/faster disks (and consider separating
the WAL from the data volume), or disable syncing completely with fsync=off
(which is of course quite dangerous).

This can be confirmed with iostat, noting the tps on each volume
and the iowait.
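
On the database side, 8.3's pg_stat_bgwriter view gives a complementary
picture; sample it twice and compare the deltas to see checkpoint and
backend-write pressure (a sketch):

    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_clean, buffers_backend
    FROM pg_stat_bgwriter;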

merlin

Partition tables

From
Michael Gould
Date:
I am considering whether or not to partition a few tables in our system.
The guidelines I read said that this could help when the size of a table
exceeds the amount of system memory.  I'm wondering whether that guidance
applies to a single table or to the database as a whole.

The way our system operates is that each individual location is actually a
separate operation owned by another party.  In most cases the data from one
location should not be seen by others.  There are several tables which are
global in nature and hold mainly lookup information, but the actual
processing tables are per location.

I am wondering if partitioning would be a good way to isolate the
information for each location.  Each partition would be created by location
number.  Our db server is a dual quad-core Intel with 4 GB of RAM running
Windows 2000 Enterprise Server.  All client machines are quad-core
servers with 8-16 GB of RAM, partitioned using Windows 2003 and accessed via
Citrix.

The total size of our database with 5 years' worth of data is about 3.4 GB.
In the business we are in, we open about 5-7 new locations each year and
close 2-3.  I was also thinking that if each partition was by location it
would be easier to disconnect the partition tables for historical use
when we close a location.  We would probably get 10 or so queries on a
closed location in the 6 months after closing.

Does this seem like an appropriate use of table partitioning?

Best Regards


--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax



using generate_series to iterate through months

From
"Bill Reynolds"
Date:
Ok, I'm a bit stumped on getting my group by query to work which
iterates through a number of months that basically a generate_series
provides for me.

Here is what I am using in the from clause (along with other tables) to
generate the series of numbers for the number of months.  This seems to
work:
generate_series( 0, ((extract(years from age(current_date,
DATE('2008-05-01')))*12) + extract(month from age(current_date,
DATE('2008-05-01'))))::INTEGER) as s(a)

Next, I want to group by and compare against dates in my table.  These
are the two parts I can't get to work:

In the select part:
select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

and in the where clause:
   and DATE(sometimestamp) >= DATE('2008-05-01') || 's.a??? months' -
someOffsetVariable - 30
   and DATE(sometimestamp) < DATE('2008-05-01') + s.a + 1 || 'months'

The point is I want to use interval (s.a 'months') to iterate through
them.  I can obviously get DATE('2008-05-01') + interval '3 months' to
work, but I want the '3' part to be generated by the series.  I have
tried various type casts to no avail.  Any help with this syntax would
be appreciated.

Thanks in advance for insight on how to do this.


Re: Partition tables

From
Wojtek
Date:
Michael Gould wrote:
> I am considering whether or not to partition a few tables in our system.
> The guidelines I read said that this could help when the size of a table
> exceeds the amount of system memory.  I'm wondering whether that guidance
> applies to a single table or to the database as a whole.
>
> The way our system operates is that each individual location is actually a
> separate operation owned by another party.  In most cases the data from one
> location should not be seen by others.  There are several tables which are
> global in nature and hold mainly lookup information, but the actual
> processing tables are per location.
>
> I am wondering if partitioning would be a good way to isolate the
> information for each location.  Each partition would be created by location
> number.  Our db server is a dual quad-core Intel with 4 GB of RAM running
> Windows 2000 Enterprise Server.  All client machines are quad-core
> servers with 8-16 GB of RAM, partitioned using Windows 2003 and accessed via
> Citrix.
>
> The total size of our database with 5 years' worth of data is about 3.4 GB.
> In the business we are in, we open about 5-7 new locations each year and
> close 2-3.  I was also thinking that if each partition was by location it
> would be easier to disconnect the partition tables for historical use
> when we close a location.  We would probably get 10 or so queries on a
> closed location in the 6 months after closing.
>
> Does this seem like an appropriate use of table partitioning?
>
Hi - partitioning is primarily meant to get at your data faster and, in
most cases, should be transparent to your client.
Questions I'd suggest asking first:
1. do you have performance issues reading from your db?
2. can you come up with a pattern you can use to segment your data into
smaller chunks (like weeks, months, ids) to create partitions that are
reasonably big but not huge?
3. how do you populate your db? inserts? copy? if you ever create
partitions, will you write to many or just to one (the most recent)?

Yes, it can be useful for archiving old 'locations'. In that case I'd
suggest creating a dedicated tablespace on (slower = cheaper) disks and
moving your older partitions there...  but again, that's not the main
reason to consider partitioning.
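
For reference, a minimal sketch of the 8.3 mechanics, with hypothetical
table and column names (one child table per location, plus a rule or
trigger to route inserts):

    CREATE TABLE orders (
        location_id integer NOT NULL,
        order_date  date    NOT NULL,
        amount      numeric
    );

    -- the CHECK constraint lets the planner skip this child when
    -- constraint_exclusion is on
    CREATE TABLE orders_loc_1 (
        CHECK (location_id = 1)
    ) INHERITS (orders);

    CREATE RULE orders_insert_loc_1 AS
        ON INSERT TO orders WHERE (location_id = 1)
        DO INSTEAD INSERT INTO orders_loc_1 VALUES (NEW.*);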

> Best Regards
>
>
> --
> Michael Gould, Managing Partner
> Intermodal Software Solutions, LLC
> 904.226.0978
> 904.592.5250 fax
>
>
>
>
Regards,
foo


Re: using generate_series to iterate through months

From
Tom Lane
Date:
"Bill Reynolds" <Bill.Reynolds@ateb.com> writes:
> Ok, I'm a bit stumped on getting my group by query to work which
> iterates through a number of months that basically a generate_series
> provides for me.
> ...
> select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

No, you're confusing a syntax that's meant to handle literal constants
with something that's appropriate for computation.  What you want is

> select DATE('2008-05-01') + s.a * interval '1 month' as Month_of

that is, multiply the sequence output by a suitable interval constant.
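
Put together with your series, the monthly grouping might look like this
(table and column names are hypothetical):

    SELECT date '2008-05-01' + s.a * interval '1 month' AS month_of,
           count(*)
    FROM generate_series(0, 14) AS s(a)
    JOIN some_table t
      ON t.sometimestamp >= date '2008-05-01' + s.a * interval '1 month'
     AND t.sometimestamp <  date '2008-05-01' + (s.a + 1) * interval '1 month'
    GROUP BY 1
    ORDER BY 1;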

            regards, tom lane

Re: using generate_series to iterate through months

From
Andreas Kretschmer
Date:
Bill Reynolds <Bill.Reynolds@ateb.com> wrote:

> Ok, I'm a bit stumped on getting my group by query to work which
> iterates through a number of months that basically a generate_series
> provides for me.
>
> Here is what I am using in the from clause (along with other tables) to
> generate the series of numbers for the number of months.  This seems to
> work:
> generate_series( 0, ((extract(years from age(current_date,
> DATE('2008-05-01')))*12) + extract(month from age(current_date,
> DATE('2008-05-01'))))::INTEGER) as s(a)
>
> Next, I want to group by and compare against dates in my table.  These
> are the two parts I can't get to work:
>
> In the select part:
> select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

The trick is easy:

test=*# select current_date + s.a * '1 month'::interval from (select generate_series(0,10) as a) as s;
      ?column?
---------------------
 2009-08-03 00:00:00
 2009-09-03 00:00:00
 2009-10-03 00:00:00
 2009-11-03 00:00:00
 2009-12-03 00:00:00
 2010-01-03 00:00:00
 2010-02-03 00:00:00
 2010-03-03 00:00:00
 2010-04-03 00:00:00
 2010-05-03 00:00:00
 2010-06-03 00:00:00
(11 rows)

I think you can solve your problem now.



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: using generate_series to iterate through months

From
"Bill Reynolds"
Date:
Ok, it is Monday -:)  Thanks Tom!



Re: using generate_series to iterate through months

From
Sam Mason
Date:
On Mon, Aug 03, 2009 at 11:15:25AM -0400, Bill Reynolds wrote:
> Here is what I am using in the from clause (along with other tables) to
> generate the series of numbers for the number of months.  This seems to
> work:
> generate_series( 0, ((extract(years from age(current_date,
> DATE('2008-05-01')))*12) + extract(month from age(current_date,
> DATE('2008-05-01'))))::INTEGER) as s(a)

I doubt you're using it, but the generate_series in 8.4 knows how to
handle dates and intervals, for example you can do:

  SELECT generate_series(timestamp '2001-1-1','2004-1-1',interval '1 month');

to go from 2001 to 2004 in one-month steps.  If not, I'd be tempted to
bung the above into a function that does the same.  Something like
this should work OK for series with only a few thousand rows, but don't
use it to generate a microsecond-spaced series covering several years:

  CREATE FUNCTION generate_series(timestamp,timestamp,interval)
      RETURNS SETOF timestamp
      LANGUAGE plpgsql
      IMMUTABLE AS $$
    DECLARE
      _c timestamp := $1;
    BEGIN
      WHILE _c < $2 LOOP
        RETURN NEXT _c;
        _c := _c + $3;
      END LOOP;
    END;
  $$;
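
It's called just like the 8.4 built-in.  Note that this sketch stops short
of the upper bound, whereas the built-in includes it when a step lands on
it exactly:

  SELECT * FROM generate_series(timestamp '2001-01-01',
                                timestamp '2001-06-01',
                                interval '1 month');
  -- returns 2001-01-01 through 2001-05-01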

--
  Sam  http://samason.me.uk/

Re: Partition tables

From
Vick Khera
Date:
On Mon, Aug 3, 2009 at 10:51 AM, Michael Gould
<mgould@intermodalsoftwaresolutions.net> wrote:
> The total size of our database with 5 years' worth of data is about 3.4 GB.
> In the business we are in, we open about 5-7 new locations each year and
> close 2-3.  I was also thinking that if each partition was by location it
>

It seems to me it would be cheaper to populate the server with 8 GB of
RAM (or more) and be done with it.  The effort you will expend to
partition and manage the partitions as locations come and go will be
worth far more than the cost of the RAM.

Once your DB is in the 100+ GB range, then you may want to consider
partitioning, or perhaps using a schema per location, and for your few
global queries, make a view that encompasses all the schemas.  Given
your rate of change, redefining the views will not be a major burden.
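
A sketch of the schema-per-location idea, with hypothetical names (it
assumes a template table public.shipments and one schema per location):

    CREATE SCHEMA loc_1001;
    CREATE TABLE loc_1001.shipments
        (LIKE public.shipments INCLUDING DEFAULTS);

    -- a global view over the per-location schemas, redefined as
    -- locations open and close
    CREATE VIEW all_shipments AS
        SELECT 1001 AS location_id, s.* FROM loc_1001.shipments s
        UNION ALL
        SELECT 1002, s.* FROM loc_1002.shipments s;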

LDAP using Active Directory

From
Michael Gould
Date:
I am wondering how others handle the login situation.  We use Active
Directory and require our users to change their passwords every 30 days.
Currently, in our old system using SQL Anywhere, we use the integrated login
feature.  Our db server is Windows 2003 R2.

I believe we can mimic this in Postgres.

What are people's feelings about using passwords in Postgres in this
situation? We know that only people authenticated to access our servers are
actually getting logged in.  All of our users must log in through Citrix and
access our system via our Citrix web page login.

I do not believe we can capture the password from Active Directory that
the user types, so I really do not want to use a password on the Postgres
side.  We also have application-level security, which only allows certain
users (same as the login id) access to the allowed areas within the system,
and only at the level of access prescribed within the system.

What are others' thoughts on this?  With SQL Anywhere, if you are using
integrated logins, you need to enter a password when the account is first
defined to the database, but it is bypassed from that point forward unless
you remove their access to use integrated logins.

Best Regards
--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax



Re: LDAP using Active Directory

From
Magnus Hagander
Date:
On Wed, Aug 5, 2009 at 18:47, Michael Gould
<mgould@intermodalsoftwaresolutions.net> wrote:
>
> I am wondering how others handle the login situation.  We use Active
> Directory and require our users to change their passwords every 30 days.
> Currently, in our old system using SQL Anywhere, we use the integrated login
> feature.  Our db server is Windows 2003 R2.
>
> I believe we can mimic this in Postgres.

You can do this with PostgreSQL as well, using SSPI or GSSAPI
(depending on your client platforms).
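
A pg_hba.conf sketch (the address range is a placeholder; the sspi method
requires 8.4, while gss is available from 8.3 given Kerberos against AD):

    # TYPE  DATABASE  USER  CIDR-ADDRESS  METHOD
    host    all       all   10.0.0.0/8    sspi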


> What are people's feelings about using passwords in Postgres in this
> situation? We know that only people authenticated to access our servers are
> actually getting logged in.  All of our users must log in through Citrix and
> access our system via our Citrix web page login.
>
> I do not believe we can capture the password from Active Directory that
> the user types, so I really do not want to use a password on the Postgres
> side.  We also have application-level security, which only allows certain
> users (same as the login id) access to the allowed areas within the system,
> and only at the level of access prescribed within the system.

No, I'd definitely avoid that. If you use LDAP, you don't need to
capture the passwords. Just create the accounts without passwords, and
PostgreSQL will ask the AD server for the login. Or if you use SSPI or
GSSAPI, you will get a fully integrated login.


> What are others' thoughts on this?  With SQL Anywhere, if you are using
> integrated logins, you need to enter a password when the account is first
> defined to the database, but it is bypassed from that point forward unless
> you remove their access to use integrated logins.

Um, ok, I have to take that back. So SQL Anywhere is basically "store
the password in a file on the client", then? You can use a .pgpass file
for that, and just add something to your application that prompts
for the password and adds it to the file when the app starts. pgAdmin
does it this way.
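
Each .pgpass line has the form hostname:port:database:username:password;
for example (all values are placeholders):

    db.example.com:5432:mydb:appuser:secret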

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/