Re: Postgresql performance in production environment - Mailing list pgsql-general

From Phoenix Kiula
Subject Re: Postgresql performance in production environment
Date 19 August 2007
Msg-id e373d31e0708190215x1fa26e78ma5fac0058e8ee461@mail.gmail.com
In response to Postgresql performance in production environment  ("Phoenix Kiula" <phoenix.kiula@gmail.com>)
Responses Re: Postgresql performance in production environment  (Martijn van Oosterhout <kleptog@svana.org>)
Re: Postgresql performance in production environment  (Magnus Hagander <magnus@hagander.net>)
Re: Postgresql performance in production environment  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-general
On 19/08/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> [Sorry for the length of this post. It stretched as I provided as much
> info as possible.]
>
> So the rubber meets the road. We've put PostgreSQL in a production
> environment with some heavy simultaneous usage. It works well in
> general, but often PG doesn't respond. How should I test what is going
> wrong? All tables are vacuumed, analyzed and so on. INSERTs and
> UPDATEs are happening, but not many; it's mostly SELECTs. Is PG
> running out of connections? We can temporarily fix this by restarting
> it, but I'd like a more sustainable solution.
>
> Speculating that it could be some conf variable somewhere
> (max_fsm_pages in particular), I am including three things at the
> bottom of this post:
>
> 1. Our ps output (for "postgres")
> 2. Our postgresql.conf settings
> 3. *Verbose* vacuum info for a table that shows the max_fsm warning
>
> My question 1 -- how should we test and tweak our production
> installation? Where should we look? In MySQL we could do a "show
> status" at the console and it would give a mountain of information,
> and then there was that handy little "tuning-primer" script that made
> it all come alive. I suppose this stuff is also available in
> pg_catalog, but is there any website that goes in depth into HOW to
> tune, what the different values mean, and such?
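>
> The closest thing I've found so far to MySQL's "show status" is the
> set of statistics views -- a couple of starting points, assuming I'm
> reading the 8.2 docs right:
>
> [-------------
> SELECT * FROM pg_stat_activity;     -- one row per connected backend
> SELECT * FROM pg_stat_database;     -- per-database counters
> SELECT * FROM pg_stat_user_tables;  -- per-table scan and tuple counters
> -------------]
>
> The row-level counters need stats_row_level = on, which we already
> have. But a proper tuning guide would still be great.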
>
> My question 2 -- in production, we're constantly seeing this message
> while vacuuming one table with fewer than 3 million rows, but one that
> we expect to keep growing:
>
> [-------------
> WARNING:  relation "public.links" contains more than "max_fsm_pages"
> pages with useful free space
> HINT:  Consider compacting this relation or increasing the
> configuration parameter "max_fsm_pages".
> VACUUM
> -------------]
>
> I can merrily increase the "max_fsm_pages" directive, but the manual
> also warns that this can require more System V shared memory than is
> available on the system. My full verbose vacuum info below includes
> the line:
>
> [-------------
> INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
> rows and 0 dead rows; 3000 rows in sample, 2710124
> estimated total rows
> -------------]
>
> Does this mean my table needs nearly 200,000 pages, and that should be
> the setting of max_fsm_pages? This server is on a fairly common setup
> these days: Dual AMD Opterons, 4GB memory, SATA RAID 1, 250GB each. I
> don't mind letting postgres use up to 1GB of the memory for itself,
> but the rest is needed for others.
>
> From http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html
> , it seems the "max_fsm_relations" is about how many tables and
> indexes can be tracked in the free space map. Does this mean the
> number of actual tables and indexes in postgres databases, or
> instances of these tables? For example, if I only run 5 databases,
> each of which have about 10 tables and 20 indexes, then I have only
> 150 (5 * 30) actual "relations" in postgresql lingo. So my
> max_fsm_relations setting can be 150? (Which seems a little low
> compared to what I see in several posts online.)
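>
> For whatever it's worth, I've been counting mine like this -- again
> per database, since the free space map is shared by the whole
> cluster:
>
> [-------------
> -- 'r' = tables, 'i' = indexes, 't' = TOAST tables
> SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');
> -------------]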
>
> Assuming 150 is OK: that manual page, and other tuning guides such as
> http://www.revsys.com/writings/postgresql-performance.html , suggest
> that "max_fsm_pages" is even more critical. The manual says it should
> be at least 16 times max_fsm_relations, so in my example it should be
> at least 150 * 16, which is 2400. This seems abysmally low! If I up
> this figure to, say, 24000 instead, I still keep seeing the kinds of
> warnings posted above.
>
>
> My question 3 -- for a SELECT-heavy database, is there any
> tried-and-tested caching tool that could be of use? I'd like to skip
> connection pooling if possible, but would be very interested in good
> caching products or contribs. I noticed pgmemcache (
> http://pgfoundry.org/projects/pgmemcache/ ) but it's at beta 1.2.
> I'd love some thoughts from people who have used it... is it worth
> the effort?
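>
> From skimming the pgmemcache docs, the API looks roughly like the
> following. I'm quoting the function names from memory, so treat this
> as a sketch rather than gospel:
>
> [-------------
> SELECT memcache_server_add('127.0.0.1:11211');
> SELECT memcache_set('links:12345', '...cached value...');
> SELECT memcache_get('links:12345');
> SELECT memcache_delete('links:12345');  -- e.g. from an UPDATE trigger
> -------------]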
>
>
> More of my info below. TIA for your thoughts and advice!
>
> -/Phoenix
>
>
>
>
> ==========EXHIBIT 1: PS OUTPUT ==============
> > ps auxnm | grep postgres
>       26 20665  0.0  0.0 11760  612 ?        -    Aug18   0:00
> postgres: logger process
>       26 20670  0.0  1.1 188684 48312 ?      -    Aug18   0:00
> postgres: writer process
>       26 20671  0.0  0.0 12032  804 ?        -    Aug18   0:28
> postgres: stats collector process
>       26 14497  0.0  4.1 452108 172656 ?     -    02:05   0:02
> postgres: traders_traders traders 127.0.0.1(56204) VACUUM
>        0  9444  0.0  0.0  5008  656 pts/0    -    02:53   0:00 grep postgres
>
>
>
>
> ========== EXHIBIT 2: POSTGRESQL.CONF ==============
> listen_addresses = 'localhost,*'
>
> max_connections = 250
> shared_buffers = 21000   # Not much more: http://snipr.com/pgperf
> effective_cache_size = 32000
> max_fsm_relations = 500
> max_fsm_pages = 60000
> sort_mem = 4096           # Low when not needed: http://snipr.com/pgperf
> work_mem = 4096
> temp_buffers = 4096
> authentication_timeout = 10s
> ssl = off
>
> #VACUUM SETTINGS
> autovacuum = on
> vacuum_cost_delay = 10
> stats_start_collector = on
> stats_row_level = on
> autovacuum_vacuum_threshold = 300
> autovacuum_analyze_threshold = 100
>
> #FOR BACKGROUND TASKS PERFORMANCE
> wal_buffers=64
> checkpoint_segments=128
> checkpoint_timeout=900
> fsync = on
> maintenance_work_mem = 256MB  # Too high? Well, watch for it...
>
>
>
>
> ========== EXHIBIT 3: VACUUM VERBOSE OUTPUT =======
>
> mydbuser=# vacuum analyze verbose traders;
>
> INFO:  vacuuming "public.traders"
> INFO:  scanned index "traders_pkey" to remove 6 row versions
> DETAIL:  CPU 0.07s/0.04u sec elapsed 140.61 sec.
> INFO:  scanned index "idx_traders_userid" to remove 6 row versions
> DETAIL:  CPU 0.05s/0.04u sec elapsed 49.70 sec.
> INFO:  scanned index "idx_traders_mdate" to remove 6 row versions
> DETAIL:  CPU 0.02s/0.04u sec elapsed 32.66 sec.
> INFO:  scanned index "traders_unique_alias" to remove 6 row versions
> DETAIL:  CPU 0.10s/0.11u sec elapsed 167.20 sec.
> INFO:  "traders": removed 6 row versions in 5 pages
> DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  index "traders_pkey" now contains 2780925 row versions in 22821 pages
> DETAIL:  6 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  index "idx_traders_userid" now contains 2780925 row versions in
> 11785 pages
> DETAIL:  6 index row versions were removed.
> 127 index pages have been deleted, 127 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  index "idx_traders_mdate" now contains 2780925 row versions in 7912 pages
> DETAIL:  6 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  index "traders_unique_alias" now contains 2780925 row versions
> in 9342 pages
> DETAIL:  6 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "traders": found 6 removable, 2780925 nonremovable row versions
> in 199396 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 2959732 unused item pointers.
> 137272 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 0.74s/0.40u sec elapsed 1335.71 sec.
> WARNING:  relation "public.traders" contains more than "max_fsm_pages"
> pages with useful free space
> HINT:  Consider compacting this relation or increasing the
> configuration parameter "max_fsm_pages".
> INFO:  vacuuming "pg_toast.pg_toast_41513"
> INFO:  index "pg_toast_41513_index" now contains 26 row versions in 2 pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "pg_toast_41513": found 0 removable, 26 nonremovable row
> versions in 5 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 4 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.01 sec.
> INFO:  analyzing "public.traders"
> INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
> rows and 0 dead rows; 3000 rows in sample, 2710124 estimated total
> rows
> VACUUM
> Time: 1533601.235 ms
>




Well, based on some past posts, I looked into my pg_log files and
found a number of lines like these:


[----------------
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
LOG:  could not fork new process for connection: Resource temporarily
unavailable
----------------]


This suggests that our guess about running out of connections is on
the right track.
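
To verify, I'm comparing the live backend count against the configured
ceiling -- keeping in mind that a fork() failing with "Resource
temporarily unavailable" can also mean an OS-level process or memory
limit, not just max_connections:

[----------------
SELECT count(*) FROM pg_stat_activity;  -- backends currently connected
SHOW max_connections;                   -- the configured ceiling
----------------]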

So, we have three options (to begin with) --

1. Increase max_connections. This seems to be a voodoo art: a complex
calculation involving database size (which in our case is difficult to
predict; it grows very fast), hardware, and so on. And I cannot put
the other apps running on this same machine at risk.

2. Use connection pooling. I've found pgpool-II and PgBouncer, the
latter from the Skype group. Does anyone have experience using either?
PgBouncer looks good -- a rough sketch of a config follows this list --
although we're usually skeptical about connection pooling in general
(or is that just the mysqli_pconnect() hangover?)

3. Use caching of queries. Memcached comes recommended, but there's a
discussion as recent as Jan 2007 on this list about race conditions
and such (most of which I don't quite understand) which cautions
against its use. We do expect plenty of transactions, and if something
that has been updated is not correctly and promptly invalidated in the
cache, that has huge business repercussions for us.
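
For option 2, here is the rough shape of a pgbouncer.ini I've sketched
from its docs. The host and pool numbers are pure guesses on my part,
not tested advice:

[----------------
[databases]
traders = host=127.0.0.1 port=5432 dbname=traders

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: the server connection is released at commit/rollback
pool_mode = transaction
; clients may queue far beyond PG's max_connections
max_client_conn = 1000
; actual backend connections kept per database/user pair
default_pool_size = 50
----------------]

The app would then connect to port 6432 instead of 5432, and only
default_pool_size real backends ever stay busy.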

I'd love to hear other recommendations.

That's for the connections bit. I'm also looking for advice on the
max_fsm_pages stuff. That's another bit of voodoo!
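
One thing I plan to try for the fsm numbers, based on what I've read:
a database-wide VACUUM VERBOSE run as superuser is supposed to end
with a summary of how many page slots are actually required versus the
current limits, which would take the guesswork out of max_fsm_pages:

[----------------
mydbuser=# VACUUM VERBOSE;
-- ...per-table output, then at the very end something like:
-- "NNNNN page slots are required to track all free space"
-- "Current limits are: 60000 page slots, 500 relations, ..."
----------------]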
