Thread: Postgresql performance in production environment

Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
[Sorry for the length of this post. It stretched as I provided as much
info as possible..]

So the rubber meets the road. We've put PostgreSQL into a production
environment with some heavy simultaneous usage. It works well in
general, but often PG doesn't respond. How should I test what is going
wrong? All tables are vacuumed, analyzed and such. INSERTs and UPDATEs
are happening, but not many; it's mostly SELECTs. Is PG running out
of connections? We can temporarily fix this by restarting it, but
I'd like a more lasting solution.

Speculating that it could be some conf variable somewhere
(max_fsm_pages in particular) I am including three things at the
bottom of this post:

1. Our ps output (for "postgres")
2. Our postgresql.conf settings
3. *Verbose* vacuum info for a table that shows the max_fsm warning

My question 1 -- how should we test and tweak our production
installation? Where should we look? In MySQL we could do a "show
status" at the console and it would give a mountain of information.
Then there was that handy little "tuning-primer" script that made it
all come alive. I suppose this stuff is also available in pg_catalog,
but is there any website that goes in depth into HOW to tune, what
the different values mean, and such?
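
(From skimming the 8.2 docs, I'm guessing the rough equivalents are
something like the following, but corrections welcome:

SELECT * FROM pg_stat_activity;          -- who is connected, doing what
SELECT * FROM pg_stat_database;          -- per-database activity counters
SELECT name, setting FROM pg_settings;   -- current configuration values
)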

My question 2 -- in production, we're constantly seeing this message
while vacuuming one table with less than 3 million rows, but one that
we expect to keep growing:

[-------------
WARNING:  relation "public.links" contains more than "max_fsm_pages"
pages with useful free space
HINT:  Consider compacting this relation or increasing the
configuration parameter "max_fsm_pages".
VACUUM
-------------]

I can merrily increase the "max_fsm_pages" directive, but the manual
also cautions that "this can use more System V memory than
available on your system". My full verbose vacuum info below includes
the line:

[-------------
INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
rows and 0 dead rows; 3000 rows in sample, 2710124
estimated total rows
-------------]

Does this mean my table needs nearly 200,000 pages, and that should be
the setting of max_fsm_pages? This server is on a fairly common setup
these days: Dual AMD Opterons, 4GB memory, SATA RAID 1, 250GB each. I
don't mind letting postgres use up to 1GB of the memory for itself,
but the rest is needed for others.

From http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html
, it seems the "max_fsm_relations" is about how many tables and
indexes can be tracked in the free space map. Does this mean the
number of actual tables and indexes in postgres databases, or
instances of these tables? For example, if I only run 5 databases,
each of which has about 10 tables and 20 indexes, then I have only
150 (5 * 30) actual "relations" in postgresql lingo. So my
max_fsm_relations setting can be 150? (Which seems a little low
compared to what I see in several posts online.)

Assuming 150 is OK, that manual page and other tuning write-ups -- such as
http://www.revsys.com/writings/postgresql-performance.html -- suggest
that "max_fsm_pages" is even more critical. The manual says this
should be at least 16 times max_fsm_relations, so in my
example it should be at least 150 * 16, which is 2400. This
seems abysmally low! If I up this figure to, say, 24000 instead, I
still keep seeing the kinds of errors posted above.


My question 3 -- for a SELECT-heavy database, is there any
tried-and-tested caching tool that could be of use? I'd like to skip
connection pooling if possible, but would be very interested in good
caching products or contribs. I noticed pgmemcache (
http://pgfoundry.org/projects/pgmemcache/ ) but it's in beta (version 1.2).
Would love some thoughts from people who have used it...is it worth
the effort?


More of my info below. TIA for your thoughts and advice!

-/Phoenix




==========EXHIBIT 1: PS OUTPUT ==============
> ps auxnm | grep postgres
      26 20665  0.0  0.0 11760  612 ?        -    Aug18   0:00
postgres: logger process
      26 20670  0.0  1.1 188684 48312 ?      -    Aug18   0:00
postgres: writer process
      26 20671  0.0  0.0 12032  804 ?        -    Aug18   0:28
postgres: stats collector process
      26 14497  0.0  4.1 452108 172656 ?     -    02:05   0:02
postgres: traders_traders traders 127.0.0.1(56204) VACUUM
       0  9444  0.0  0.0  5008  656 pts/0    -    02:53   0:00 grep postgres




==========EXHIBIT 2: POSTGRESQL.CONF ==============
listen_addresses = 'localhost,*'

max_connections = 250
shared_buffers = 21000   # Not much more: http://snipr.com/pgperf
effective_cache_size = 32000
max_fsm_relations = 500
max_fsm_pages = 60000
sort_mem = 4096           # Low when not needed: http://snipr.com/pgperf
work_mem = 4096
temp_buffers = 4096
authentication_timeout = 10s
ssl = off

#VACUUM SETTINGS
autovacuum = on
vacuum_cost_delay = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 300
autovacuum_analyze_threshold = 100

#FOR BACKGROUND TASKS PERFORMANCE
wal_buffers=64
checkpoint_segments=128
checkpoint_timeout=900
fsync = on
maintenance_work_mem = 256MB  # Too high? Well, watch for it...




========== EXHIBIT 3: VACUUM VERBOSE OUTPUT =======

mydbuser=# vacuum analyze verbose traders;

INFO:  vacuuming "public.traders"
INFO:  scanned index "traders_pkey" to remove 6 row versions
DETAIL:  CPU 0.07s/0.04u sec elapsed 140.61 sec.
INFO:  scanned index "idx_traders_userid" to remove 6 row versions
DETAIL:  CPU 0.05s/0.04u sec elapsed 49.70 sec.
INFO:  scanned index "idx_traders_mdate" to remove 6 row versions
DETAIL:  CPU 0.02s/0.04u sec elapsed 32.66 sec.
INFO:  scanned index "traders_unique_alias" to remove 6 row versions
DETAIL:  CPU 0.10s/0.11u sec elapsed 167.20 sec.
INFO:  "traders": removed 6 row versions in 5 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "traders_pkey" now contains 2780925 row versions in 22821 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_traders_userid" now contains 2780925 row versions in
11785 pages
DETAIL:  6 index row versions were removed.
127 index pages have been deleted, 127 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_traders_mdate" now contains 2780925 row versions in 7912 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "traders_unique_alias" now contains 2780925 row versions
in 9342 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "traders": found 6 removable, 2780925 nonremovable row versions
in 199396 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 2959732 unused item pointers.
137272 pages contain useful free space.
0 pages are entirely empty.
CPU 0.74s/0.40u sec elapsed 1335.71 sec.
WARNING:  relation "public.traders" contains more than "max_fsm_pages"
pages with useful free space
HINT:  Consider compacting this relation or increasing the
configuration parameter "max_fsm_pages".
INFO:  vacuuming "pg_toast.pg_toast_41513"
INFO:  index "pg_toast_41513_index" now contains 26 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_41513": found 0 removable, 26 nonremovable row
versions in 5 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
4 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "public.traders"
INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
rows and 0 dead rows; 3000 rows in sample, 2710124 estimated total
rows
VACUUM
Time: 1533601.235 ms

Re: Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
On 19/08/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> [....original post snipped for brevity....]




Well based on some past posts, I looked into my pg_log stuff and found
a number of these lines:


[----------------
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
----------------]


Which suggests that our guess of running out of connections is the right one.

So, we have three options (to begin with) --

1. Increase the number of max_connections. This seems to be a voodoo
art and a complex calculation of database size (which in our case is
difficult to predict; it grows very fast), hardware, and such. I
cannot risk starving the other apps running on this same machine.

2. Use connection pooling. I've found pgpool2 and pgbouncer from the
Skype group. Does anyone have experience using either? The latter
looks good, although we're usually skeptical about connection pooling
in general (or is that just the mysqli_pconnect() hangover?)

3. Use caching of queries. Memcache comes recommended, but there's a
discussion as recently as Jan 2007 on this list about race conditions
and such (most of which I don't quite understand) which cautions
against its use. We do expect plenty of transactions, and if something
that has been updated is not correctly and promptly invalidated
in the cache, it has huge business repercussions for us.

I'd love to hear other recommendations.

This is for the connections bit. I'm also looking for advice on the
max_fsm_pages stuff. That's another voodoo!!

Re: Postgresql performance in production environment

From
Martijn van Oosterhout
Date:
On Sun, Aug 19, 2007 at 05:15:34PM +0800, Phoenix Kiula wrote:
> Well based on some past posts, I looked into my pg_log stuff and found
> a number of these lines:
> [----------------
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable

Usually this message means that the *kernel* refused to let you fork a
new process. Some resource was unavailable. So this means (usually) one
of two things:

- You have a maximum on the number of processes on the system and
you're exceeding it
- You have a limit on the amount of memory. If you have overcommit
disabled, this may be causing the issue.

I couldn't quickly find details of your setup but if you're running
linux with overcommit disabled, make sure you have at least twice as
much space allocated for swap as you have real memory.
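
For example, on Linux you could check with something like this (paths
and commands may vary by distribution):

cat /proc/sys/vm/overcommit_memory    # 2 means strict overcommit accounting
free -m                               # compare swap against real memory
su - postgres -c 'ulimit -u'          # max processes for the postgres user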

Make sure you don't have lots of idle postgres processes lying around.
You'll have to provide more detail about your system before getting any
better recommendations.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: Postgresql performance in production environment

From
Magnus Hagander
Date:
Phoenix Kiula wrote:

>
>
> Well based on some past posts, I looked into my pg_log stuff and found
> a number of these lines:
>
>
> [----------------
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> (... same message repeated several times ...)
> ----------------]
>
>
> Which suggests that our guess of running out of connections is the right one.

No, you're not running out of "connections". You are, however, running
over some kernel limit.

> So, we have three options (to begin with) --
>
> 1. Increase the number of max_connections. This seems to be a voodoo
> art and a complex calculation of database size (which in our case is
> difficult to predict; it grows very fast), hardware, and such. I
> cannot risk starving the other apps running on this same machine.

No. You are not yet reaching max_connections; that would give you an
error message that actually says so. This message indicates that you
have an ulimit for the account that postgresql runs under that limits
some resources - in this case most likely the number of processes. And
this limit is not "compatible" with your settings for max_connections.
You need to find this ulimit and either raise it or remove it.
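
For example, assuming Linux with PAM limits (file location and numbers
here are just an illustration):

su - postgres -c 'ulimit -a'     # show all limits for the postgres account

# then raise the process limit, e.g. in /etc/security/limits.conf:
#   postgres  soft  nproc  4096
#   postgres  hard  nproc  4096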


> 2. Use connection pooling. I've found pgpool2 and pgbouncer from the
> Skype group. Does anyone have experience using either? The latter
> looks good, although we're usually skeptical about connection pooling
> in general (or is that just the mysqli_pconnect() hangover?)

Connection pooling, in general, is something that pretty much *every*
larger app will use. It may be implemented in the app (something
which has often been troublesome in PHP solutions, but it's certainly
the norm for Java or .Net apps) or in middleware like pgpool or
pgbouncer. There should be no need to be sceptical about it in general ;-)

Can't speak for either of those apps specifically, as I haven't used
them in production.


> 3. Use caching of queries. Memcache comes recommended, but there's a
> discussion as recently as Jan 2007 on this list about race conditions
> and such (most of which I don't quite understand) which cautions
> against its use. We do expect plenty of transactions, and if something
> that has been updated is not correctly and promptly invalidated
> in the cache, it has huge business repercussions for us.

There are ways to do this, but if you can't just use timeouts to expire
from the cache, things can become pretty complicated pretty fast. But
perhaps you can isolate some kinds of queries that can be cached for <n>
minutes, and keep the rest without caching?
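
As a rough illustration of the timeout approach in PHP, using the old
Memcache extension (an untested sketch -- the key name, the TTL and the
query helper are all made up):

$mc = new Memcache();
$mc->connect('127.0.0.1', 11211);
$rows = $mc->get('top_links_query');
if ($rows === false) {
    $rows = run_select_against_pg();              // your real SELECT goes here
    $mc->set('top_links_query', $rows, 0, 300);   // cache for 5 minutes
}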


//Magnus


Re: Postgresql performance in production environment

From
Magnus Hagander
Date:
Phoenix Kiula wrote:
> [....snipped....]
>
> My question 2 -- in production, we're constantly seeing this message
> while vacuuming one table with less than 3 million rows, but one that
> we expect to keep growing:
>
> [-------------
> WARNING:  relation "public.links" contains more than "max_fsm_pages"
> pages with useful free space
> HINT:  Consider compacting this relation or increasing the
> configuration parameter "max_fsm_pages".
> VACUUM
> -------------]
>
> I can merrily increase the "max_fsm_pages" directive, but the manual
> also cautions that "this can use more System V memory than
> available on your system". My full verbose vacuum info below includes
> the line:

Do you actually run VACUUM FULL, or do you just mean you run VACUUM over
the full database? If you run VACUUM FULL, you need to stop doing that
:-) However, you will need to run it at least once over the whole
database once you've fixed your max_fsm_pages setting.


> [-------------
> INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
> rows and 0 dead rows; 3000 rows in sample, 2710124
> estimated total rows
> -------------]

There should be a line like this at the end of a "VACUUM VERBOSE" command:
INFO:  free space map contains 33 pages in 74 relations
DETAIL:  A total of 1184 page slots are in use (including overhead).
1184 page slots are required to track all free space.
Current limits are:  153600 page slots, 1000 relations, using 965 kB.
VACUUM


(note that my numbers are for a more or less empty database. Yours will
be much higher)

If your database size is reasonably stable, pick a good value a bit
above the numbers suggested. If you expect it to grow a lot, add some
more overhead, but monitor this value.
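
For example, if vacuum were reporting well over 130,000 page slots
needed, something along these lines in postgresql.conf (note that these
settings size shared memory, so they only take effect after a server
restart):

max_fsm_pages = 200000       # comfortably above what vacuum reports
max_fsm_relations = 1000     # well above your actual relation count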


> Does this mean my table needs nearly 200,000 pages, and that should be
> the setting of max_fsm_pages? This server is on a fairly common setup

No. It means that the table "traders" is using 199396 pages - most of
them aren't free, so they are not tracked in the FSM.
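
Note also that FSM page slots are cheap - in 8.2 each slot takes six
bytes of shared memory, per the docs - so even a generous setting costs
very little:

200000 slots * 6 bytes = ~1.2 MB of shared memory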


//Magnus


Re: Postgresql performance in production environment

From
Stefan Kaltenbrunner
Date:
Phoenix Kiula wrote:
> On 19/08/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

[... ]

> Well based on some past posts, I looked into my pg_log stuff and found
> a number of these lines:
>
>
> [----------------
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> (... same message repeated several times ...)
> ----------------]
>
>
> Which suggests that our guess of running out of connections is the right one.
>
> So, we have three options (to begin with) --
>
> 1. Increase the number of max_connections. This seems to be a voodoo
> art and a complex calculation of database size (which in our case is
> difficult to predict; it grows very fast), hardware, and such. I
> cannot risk starving the other apps running on this same machine.

this error is a sign that the OS(!) is running out of resources (or at
least won't allow pg to fork another process) - either you hit an ulimit
for the user postgresql runs under, or you need to flip some kernel
setting to increase the number of processes. Increasing max_connections
will NOT help because you are not even hitting the current one yet ...

>
> 2. Use connection pooling. I've found pgpool2 and pgbouncer from the
> Skype group. Does anyone have experience using either? The latter
> looks good, although we're usually skeptical about connection pooling
> in general (or is that just the mysqli_pconnect() hangover?)

pgbouncer works quite fine here.

Stefan

Re: Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
> Phoenix Kiula wrote:

.....snipped....

> > I can merrily increase the "max_fsm_pages" directive, but the manual
> > also cautions that "this can use more System V memory than
> > available on your system". My full verbose vacuum info below includes
> > the line:
>
> Do you actually run VACUUM FULL, or do you just mean you run VACUUM over
> the full database? If you run VACUUM FULL, you need to stop doing that
> :-) However, you will need to run it at least once over the whole
> database once you've fixed your max_fsm_pages setting.



No we only do a "vacuum analyze" when we do something manually.
Otherwise, it's all "autovacuum". Never done a "vacuum full" --
should we do one now given that we've overrun our max_fsm_pages?



> > [-------------
> > INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
> > rows and 0 dead rows; 3000 rows in sample, 2710124
> > estimated total rows
> > -------------]
>
> There should be a line like this at the end of a "VACUUM VERBOSE" command:
> INFO:  free space map contains 33 pages in 74 relations
> DETAIL:  A total of 1184 page slots are in use (including overhead).
> 1184 page slots are required to track all free space.
> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
> VACUUM
>


Nope, there's no line that includes such useful info. The entire
verbose output was included in my note. I did not see the words
"Current limits are". Do I need to enable something in the conf file
to get more verbose output -- "debug2", "debug3", that kind of stuff?


TIA

Re: Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
> Phoenix Kiula wrote:

> There are ways to do this, but if you can't just use timeouts to expire
> from the cache, things can become pretty complicated pretty fast. But
> perhaps you can isolate some kinds of queries that can be cached for <n>
> minutes, and keep the rest without caching?


Thanks. In fact we need caching on a very specific part of our
application, for only three queries which hit the DB hard with
thousands of simultaneous SELECTs.

Do pgmemcache or pgbouncer allow for very specific usage? Both look
way too complex. I don't mind the initial headachy setup and config,
but then I would like the system to hum on its own, and the querying
should be simple and intuitive.

I need a simple mechanism to query the cache, and invalidate a
specific query in the cache when the underlying table is UPDATED so
that the query gets cached afresh when issued later. (And a way to use
this mechanism through PHP or Perl would be splendid).

TIA for any tips!

Re: Postgresql performance in production environment

From
Magnus Hagander
Date:
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
>> Phoenix Kiula wrote:
>
> .....snipped....
>
>>> I can merrily increase the "max_fsm_pages" directive, but the manual
>>> also cautions that "this can use more System V memory than
>>> available on your system". My full verbose vacuum info below includes
>>> the line:
>> Do you actually run VACUUM FULL, or do you just mean you run VACUUM over
>> the full database? If you run VACUUM FULL, you need to stop doing that
>> :-) However, you will need to run it at least once over the whole
>> database once you've fixed your max_fsm_pages setting.
>
>
>
> No we only do a "vacuum analyze" when we do something manually.
> Otherwise, it's all "autovacuum". Never done a "vacuum full" --

Ok. That's good.

> should we do one now given that we've overrun our max_fsm_pages?

Yes, but not until you've fixed it. And only once.

>>> [-------------
>>> INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
>>> rows and 0 dead rows; 3000 rows in sample, 2710124
>>> estimated total rows
>>> -------------]
>> There should be a line like this at the end of a "VACUUM VERBOSE" command:
>> INFO:  free space map contains 33 pages in 74 relations
>> DETAIL:  A total of 1184 page slots are in use (including overhead).
>> 1184 page slots are required to track all free space.
>> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
>> VACUUM
>>
>
>
> Nope, there's no line that includes such useful info. The entire
> verbose output was included in my note. I did not see the words
> "Current limits are". Do I need to enable something in the conf file
> to get more verbose output -- "debug2", "debug3", that kind of stuff?

Strange. It comes out at level INFO, and you do see other stuff at INFO
level. Any chance this just got mixed up with an autovacuum run, and
that it ended up somewhere in the middle of your output? (I.e. that
this "traders" info is from autovac.)

//Magnus

Re: Postgresql performance in production environment

From
Magnus Hagander
Date:
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
>> Phoenix Kiula wrote:
>
>> There are ways to do this, but if you can't just use timeouts to expire
>> from the cache, things can become pretty complicated pretty fast. But
>> perhaps you can isolate some kinds of queries that can be cached for <n>
>> minutes, and keep the rest without caching?
>
>
> Thanks. In fact we need caching on a very specific part of our
> application, for only three queries which hit the DB hard with
> thousands of simultaneous SELECTs.
>
> Do pgmemcache or pgbouncer allow for very specific usage? Both look
> way too complex. I don't mind the initial headachy setup and config,
> but then I would like the system to hum on its own, and the querying
> should be simple and intuitive.
>
> I need a simple mechanism to query the cache, and invalidate a
> specific query in the cache when the underlying table is UPDATED so
> that the query gets cached afresh when issued later. (And a way to use
> this mechanism through PHP or Perl would be splendid).
>
> TIA for any tips!

You can use LISTEN and NOTIFY to clear the cache, if you have many
clients that can cause cache invalidations. If you only have a single
app that can update the database, you can invalidate the cache from that
applications code directly (such as using asp.net output caching if you
were doing it in .net).
I've implemented the former a couple of times, but it does get a bit
complex. The second part would be easier, but I don't have any direct
pointers on that since it depends on the app development framework
you're using.
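
For the LISTEN/NOTIFY route, a minimal sketch (the names here are made
up, and note that in 8.2 NOTIFY carries no payload, so listeners only
learn "something in traders changed" and must drop or refresh their
cache keys):

-- fire a notification whenever the table changes
-- (assumes plpgsql is installed: CREATE LANGUAGE plpgsql;)
CREATE FUNCTION traders_changed() RETURNS trigger AS $$
BEGIN
    NOTIFY traders_updated;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER traders_cache_inval
    AFTER INSERT OR UPDATE OR DELETE ON traders
    FOR EACH STATEMENT EXECUTE PROCEDURE traders_changed();

-- each caching client then runs:
LISTEN traders_updated;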

//Magnus

Re: Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
> Phoenix Kiula wrote:
> > On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:


> > should we do one (VACUUM FULL) now given that we've overrun our max_fsm_pages?
>
> Yes, but not until you've fixed it. And only once.
>



Fixed what - the max_fsm_pages? That was my question: how to know what
value to set for this. If the "vacuum verbose" won't give me the info
you suggested because it is likely overlapping with autovacuum, should
I temporarily turn autovacuum off and then run vacuum verbose? Also,
while running vacuum full, any precautions to take?

Re: Postgresql performance in production environment

From
Magnus Hagander
Date:
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
>> Phoenix Kiula wrote:
>>> On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>>> should we do one (VACUUM FULL) now given that we've overrun our max_fsm_pages?
>> Yes, but not until you've fixed it. And only once.
>>
>
>
>
> Fixed what - the max_fsm_pages? That was my question: how to know what
> value to set for this. If the "vacuum verbose" won't give me the info
> you suggested because it is likely overlapping with autovacuum, should
> I temporarily turn autovacuum off and then run vacuum verbose? Also,
> while running vacuum full, any precautions to take?

Yeah, you can do that - or you can just trawl back through the logs to
find that information - it's there somewhere. grep would be helpful to
find it.
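
For example (assuming your logs end up under pg_log; adjust the path):

grep -A 3 'free space map' /path/to/data/pg_log/*.log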

vacuum full will take out blocking locks on your database, so run it
during a maintenance window or at least during a low-traffic time.

//Magnus

Re: Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
> Phoenix Kiula wrote:

...snip....

> There should be a line like this at the end of a "VACUUM VERBOSE" command:
> INFO:  free space map contains 33 pages in 74 relations
> DETAIL:  A total of 1184 page slots are in use (including overhead).
> 1184 page slots are required to track all free space.
> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
> VACUUM
>


I ran the vacuum analyze verbose again, and did not see anything like
that. Should I run a vacuum alone?

In any case, in your example, which number would I take note of, and
derive the max_fsm_pages from?

I do notice this in my own output:

     There were 2959498 unused item pointers.
     133616 pages contain useful free space.
     0 pages are entirely empty.

Does this mean I should have over 133,616 in my max_fsm_pages? Should
I set it at 150,000, for example?

Secondly, the max_fsm_relations -- if I have about 150 "relations" in
my database (relations as per PGSQL lingo) then can this figure be,
say, 200? Or does this have to match max_fsm_pages?

Many thanks

Re: Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
Btw, related to one of my earlier questions: where can I see how many
connections are being made to the DB, what was the maximum number
attempted at any given time, and so on? The connections-related info.

Thanks!

Re: Postgresql performance in production environment

From
Magnus Hagander
Date:
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
>> Phoenix Kiula wrote:
>
> ...snip....
>
>> There should be a line like this at the end of a "VACUUM VERBOSE" command:
>> INFO:  free space map contains 33 pages in 74 relations
>> DETAIL:  A total of 1184 page slots are in use (including overhead).
>> 1184 page slots are required to track all free space.
>> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
>> VACUUM
>>
>
>
> I ran the vacuum analyze verbose again, and did not see anything like
> that. Should I run a vacuum alone?

It shows up in both variants for me.
Oh, hang on. I think it only shows up if you're logged in as a
superuser - table owner is not enough. Check that.


> In any case, in your example, which number would I take note of, and
> derive the max_fsm_pages from?

The 1184 number (the one for "page slots are required to track")


> I do notice this in my own output:
>
>      There were 2959498 unused item pointers.
>      133616 pages contain useful free space.
>      0 pages are entirely empty.
>
> Does this mean I should have over 133,616 in my max_fsm_pages? Should
> I set it at 150,000, for example?

Probably not enough - that's for a single table, no?

> Secondly, the max_fsm_relations -- if I have about 150 "relations" in
> my database (relations as per PGSQL lingo) then can this figure be,
> say, 200? Or does this have to match max_fsm_pages?

No need to match. If you have 150 relations, 200 is a reasonable value.
But once you get the proper output from the vacuum command, it tells you
that as well (74 in my example above)

As for your other question, how to view connections: use "SELECT * FROM
pg_stat_activity". See
http://www.postgresql.org/docs/8.2/static/monitoring-stats.html.
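
For example:

SELECT count(*) FROM pg_stat_activity;   -- connections right now
SHOW max_connections;                    -- the configured ceiling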

//Magnus



Re: Postgresql performance in production environment

From
"Phoenix Kiula"
Date:
On 19/08/07, Magnus Hagander <magnus@hagander.net> wrote:
> Phoenix Kiula wrote:
> No need to match. If you have 150 relations, 200 is a reasonable value.
> But once you get the proper output from the vacuum command, it tells you
> that as well (74 in my example above)



Found it! You get those words if you do a generic "vacuum verbose",
not a specific "vacuum verbose MYTABLE". In hindsight, the conf
variable is for the entire database, so it makes sense to do a generic
one!

Here is my output:

[----------
INFO:  free space map contains 76059 pages in 32 relations
DETAIL:  A total of 136688 page slots are in use (including overhead).
136688 page slots are required to track all free space.
Current limits are:  150000 page slots, 200 relations, using 893 kB.
VACUUM
Time: 202065.807 ms
----------]

Now, will this value of "136688" keep incrementing, or will autovacuum
keep it in check? I have increased my max_fsm_pages to 150,000 as you
can see.

Thanks for the "Monitoring Stats" link. Looks like pgsql is a bit more
involved. I was looking for information that would allow me to set my
"max_connections" well. The command you gave only shows currently
active users, not the historic peak of connections, for instance. I'll
keep digging in the manual but would love any nudges in the right
direction, thanks!

Re: Postgresql performance in production environment

From
"Webb Sprague"
Date:
> The command you gave only shows currently
> active users, not the historic peak of connections, for instance. I'll
> keep digging in the manual but would love any nudges in the right
> direction, thanks!

Can you set up a snapshot in a cronjob?  It would still only be a sample
of a sample, but it might help?
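
Something like this in the postgres user's crontab, for instance
(untested; the database name and log path are placeholders):

* * * * * psql -d yourdb -t -c "SELECT now(), count(*) FROM pg_stat_activity" >> /var/log/pg_conn_samples.log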
