Postgresql performance in production environment - Mailing list pgsql-general

From Phoenix Kiula
Subject Postgresql performance in production environment
Date
Msg-id e373d31e0708190128i928bd02v629a49660601cca@mail.gmail.com
List pgsql-general
[Sorry for the length of this post. It stretched as I provided as much
info as possible.]

So the rubber meets the road. We've put PostgreSQL into a production
environment with some heavy simultaneous usage. It works well in
general, but PG often stops responding. How should I diagnose what is
going wrong? All tables are vacuumed, analyzed and such. INSERTs and
UPDATEs are happening, but not many; it's mostly SELECTs. Is PG running
out of connections? We can temporarily fix this by restarting it, but
I'd like a more sustainable solution.
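
In case it's relevant, this is roughly how I've been checking the
connection count when it stops responding (assuming pg_stat_activity
is the right place to look):

[-------------
-- how many backends are open, versus the configured ceiling
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;

-- what each backend is doing (8.2 column names); a backend sitting
-- "<IDLE> in transaction" for a long time can hold locks
SELECT procpid, usename, current_query, query_start
FROM pg_stat_activity
ORDER BY query_start;
-------------]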

Speculating that it could be some conf variable somewhere
(max_fsm_pages in particular) I am including three things at the
bottom of this post:

1. Our ps output (for "postgres")
2. Our postgresql.conf settings
3. *Verbose* vacuum info for a table that shows the max_fsm warning

My question 1 -- how should we test and tweak our production
installation? Where should we look? In MySQL we could do a "SHOW
STATUS" at the console and it would give a mountain of information,
and there was that handy little "tuning-primer" script that made it
all come alive. I suppose this stuff is also available in pg_catalog,
but is there any website that goes in depth into HOW to tune, what
the different values mean, and so on?

My question 2 -- in production, we're constantly seeing this message
while vacuuming one table with fewer than 3 million rows, one that
we expect to keep growing:

[-------------
WARNING:  relation "public.links" contains more than "max_fsm_pages"
pages with useful free space
HINT:  Consider compacting this relation or increasing the
configuration parameter "max_fsm_pages".
VACUUM
-------------]

I can merrily increase the "max_fsm_pages" directive, but the manual
also cautions that this can use more System V shared memory than is
available on your system. My full verbose vacuum output below includes
the line:

[-------------
INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
rows and 0 dead rows; 3000 rows in sample, 2710124
estimated total rows
-------------]

Does this mean my table needs nearly 200,000 pages, and that should be
the setting of max_fsm_pages? This server is on a fairly common setup
these days: Dual AMD Opterons, 4GB memory, SATA RAID 1, 250GB each. I
don't mind letting postgres use up to 1GB of the memory for itself,
but the rest is needed for others.

From http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html
it seems "max_fsm_relations" is about how many tables and indexes can
be tracked in the free space map. Does this mean the number of actual
tables and indexes in the postgres databases, or instances of these
tables? For example, if I run only 5 databases, each of which has
about 10 tables and 20 indexes, then I have only 150 (5 * 30) actual
"relations" in postgresql lingo. So my max_fsm_relations setting can
be 150? (Which seems a little low compared to what I see in several
posts online.)
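
For what it's worth, I counted the relations in one of our databases
with the query below; if max_fsm_relations is cluster-wide, I suppose
I'd need to run this in each database (system catalogs and TOAST
tables included) and add the results up:

[-------------
-- ordinary tables ('r'), indexes ('i') and TOAST tables ('t')
SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');
-------------]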

Assuming 150 is OK, that manual page, and other tuning articles such as
http://www.revsys.com/writings/postgresql-performance.html -- suggest
that "max_fsm_pages" is even more critical. The manual says it should
be at least 16 times max_fsm_relations, so in my example it should be
at least 150 * 16 = 2400. This seems abysmally low! If I raise the
figure to, say, 24000 instead, I still keep seeing the kinds of
warnings posted above.
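
For the record, here's roughly what I'm thinking of setting these to
in postgresql.conf. The numbers are my own guesses worked back from
the VACUUM VERBOSE output below (which says 137272 pages of "traders"
alone contain useful free space), so please correct me:

[-------------
# Must exceed the total pages with free space across ALL relations;
# the 8.2 docs say each page slot costs about six bytes of shared memory
max_fsm_pages = 500000

# Total tables + indexes across all databases, with some headroom
max_fsm_relations = 1000
-------------]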


My question no. 3 -- for a SELECT-heavy database, is there any
tried-and-tested caching tool that could be of use? I'd like to skip
connection pooling if possible, but would be very interested in good
caching products or contribs. I noticed pgmemcache (
http://pgfoundry.org/projects/pgmemcache/ ) but it's at beta 1.2.
Would love some thoughts from people who have used it... is it worth
the effort?


More of my info below. TIA for your thoughts and advice!

-/Phoenix




==========EXHIBIT 1: PS OUTPUT ==============
> ps auxnm | grep postgres
      26 20665  0.0  0.0 11760  612 ?        -    Aug18   0:00
postgres: logger process
      26 20670  0.0  1.1 188684 48312 ?      -    Aug18   0:00
postgres: writer process
      26 20671  0.0  0.0 12032  804 ?        -    Aug18   0:28
postgres: stats collector process
      26 14497  0.0  4.1 452108 172656 ?     -    02:05   0:02
postgres: traders_traders traders 127.0.0.1(56204) VACUUM
       0  9444  0.0  0.0  5008  656 pts/0    -    02:53   0:00 grep postgres




==========EXHIBIT 2: POSTGRESQL.CONF ==============
listen_addresses = 'localhost,*'

max_connections = 250
shared_buffers = 21000   # Not much more: http://snipr.com/pgperf
effective_cache_size = 32000
max_fsm_relations = 500
max_fsm_pages = 60000
sort_mem = 4096           # Low when not needed: http://snipr.com/pgperf
work_mem = 4096
temp_buffers = 4096
authentication_timeout = 10s
ssl = off

#VACUUM SETTINGS
autovacuum = on
vacuum_cost_delay = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 300
autovacuum_analyze_threshold = 100

#FOR BACKGROUND TASKS PERFORMANCE
wal_buffers=64
checkpoint_segments=128
checkpoint_timeout=900
fsync = on
maintenance_work_mem = 256MB  # Too high? Well, watch for it...




========== EXHIBIT 3: VACUUM VERBOSE OUTPUT =======

mydbuser=# vacuum analyze verbose traders;

INFO:  vacuuming "public.traders"
INFO:  scanned index "traders_pkey" to remove 6 row versions
DETAIL:  CPU 0.07s/0.04u sec elapsed 140.61 sec.
INFO:  scanned index "idx_traders_userid" to remove 6 row versions
DETAIL:  CPU 0.05s/0.04u sec elapsed 49.70 sec.
INFO:  scanned index "idx_traders_mdate" to remove 6 row versions
DETAIL:  CPU 0.02s/0.04u sec elapsed 32.66 sec.
INFO:  scanned index "traders_unique_alias" to remove 6 row versions
DETAIL:  CPU 0.10s/0.11u sec elapsed 167.20 sec.
INFO:  "traders": removed 6 row versions in 5 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "traders_pkey" now contains 2780925 row versions in 22821 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_traders_userid" now contains 2780925 row versions in
11785 pages
DETAIL:  6 index row versions were removed.
127 index pages have been deleted, 127 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_traders_mdate" now contains 2780925 row versions in 7912 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "traders_unique_alias" now contains 2780925 row versions
in 9342 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "traders": found 6 removable, 2780925 nonremovable row versions
in 199396 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 2959732 unused item pointers.
137272 pages contain useful free space.
0 pages are entirely empty.
CPU 0.74s/0.40u sec elapsed 1335.71 sec.
WARNING:  relation "public.traders" contains more than "max_fsm_pages"
pages with useful free space
HINT:  Consider compacting this relation or increasing the
configuration parameter "max_fsm_pages".
INFO:  vacuuming "pg_toast.pg_toast_41513"
INFO:  index "pg_toast_41513_index" now contains 26 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_41513": found 0 removable, 26 nonremovable row
versions in 5 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
4 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "public.traders"
INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
rows and 0 dead rows; 3000 rows in sample, 2710124 estimated total
rows
VACUUM
Time: 1533601.235 ms
