Thread: Huge number of INSERTs

Huge number of INSERTs

From

Phoenix Kiula

Date:

17 November 2011, 20:44:43

Hi. I have a massive traffic website.

I keep getting "FATAL: Sorry, too many clients already" problems.

It's a Quad core machine with dual servers, 4 SCSI disks with RAID 10,
with RAM of 8GB.

Server is Nginx backed by Apache for the php.

Postgresql just has to do about 1000 SELECTs a minute, and about 200
INSERTs a minute. Maybe 10-20 UPDATEs.

My conf file is below. My vmstat + top are below too.

What else can I do?








max_connections                 = 350
shared_buffers                  = 256MB
effective_cache_size            = 1400MB   # Nov 11 2011, was 1500MB
temp_buffers                    = 16MB     # min 800kB
maintenance_work_mem            = 256MB    # min 1MB
wal_buffers                     = 12MB     # min 32kB
fsync                           = on       # turns forced synchronization on or off
checkpoint_segments             = 128       # was 128
checkpoint_timeout              = 1000     # was 1000
enable_indexscan                = on

#------------------------- LOGGING ----------------------
log_directory                   = 'pg_log'
log_filename                    = 'pglog.postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age                = 1d
log_min_messages                = 'error'
log_min_error_statement         = 'error'
log_min_duration_statement    = 5000       # In milliseconds
client_min_messages             = 'warning'
log_duration                    = off

#------------------------- AUTOVAC ----------------------
autovacuum                      = on
autovacuum_max_workers          = 5     # max number of autovacuum subprocesses
autovacuum_vacuum_cost_delay    = 10ms
autovacuum_vacuum_cost_limit    = 350









vmstat
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  1   4044 101396  84376 5569592    0    0   168   221  326  200 55
22 21  1  0







top - 19:43:49 up  7:33,  3 users,  load average: 19.63, 19.61, 19.25
Tasks: 663 total,  19 running, 644 sleeping,   0 stopped,   0 zombie
Cpu(s): 65.8%us, 15.5%sy,  0.0%ni,  1.7%id,  0.1%wa,  0.0%hi, 17.0%si,  0.0%st
Mem:   8177444k total,  8062608k used,   114836k free,    84440k buffers
Swap:  2096440k total,     4044k used,  2092396k free,  5572456k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6337 postgres  15   0  397m 100m  97m S  2.3  1.3   0:16.56 postgres:
MYDB_MYDB MYDB 127.0.0.1(60118) SELECT
  424 postgres  15   0  397m 101m  98m S  2.0  1.3   1:01.79 postgres:
MYDB_MYDB MYDB 127.0.0.1(37036) SELECT
 2887 postgres  15   0  397m 100m  98m S  2.0  1.3   0:34.55 postgres:
MYDB_MYDB MYDB 127.0.0.1(57710) SELECT
 3030 postgres  15   0  397m 101m  98m S  2.0  1.3   0:32.35 postgres:
MYDB_MYDB MYDB 127.0.0.1(45574) SELECT
 5273 postgres  15   0  397m 100m  98m S  2.0  1.3   0:22.38 postgres:
MYDB_MYDB MYDB 127.0.0.1(52143) SELECT
 5560 postgres  15   0  397m 100m  98m S  2.0  1.3   0:20.05 postgres:
MYDB_MYDB MYDB 127.0.0.1(56767) SELECT
 5613 postgres  16   0  397m 100m  98m S  2.0  1.3   0:19.51 postgres:
MYDB_MYDB MYDB 127.0.0.1(57745) SELECT
 5652 postgres  15   0  397m 100m  98m S  2.0  1.3   0:19.76 postgres:
MYDB_MYDB MYDB 127.0.0.1(58464) SELECT
32062 postgres  15   0  397m 101m  98m S  2.0  1.3   1:55.79 postgres:
MYDB_MYDB MYDB 127.0.0.1(55341) SELECT
  358 postgres  15   0  397m 101m  98m S  1.6  1.3   1:04.11 postgres:
MYDB_MYDB MYDB 127.0.0.1(35841) SELECT
  744 postgres  15   0  397m 101m  98m S  1.6  1.3   0:53.01 postgres:
MYDB_MYDB MYDB 127.0.0.1(50058) SELECT
  903 postgres  15   0  397m 101m  98m S  1.6  1.3   0:50.79 postgres:
MYDB_MYDB MYDB 127.0.0.1(51258) SELECT
  976 postgres  15   0  397m 101m  98m S  1.6  1.3   0:48.24 postgres:
MYDB_MYDB MYDB 127.0.0.1(52828) SELECT
 1011 postgres  15   0  397m 101m  98m S  1.6  1.3   0:48.20 postgres:
MYDB_MYDB MYDB 127.0.0.1(53503) SELECT
 2446 postgres  15   0  397m 101m  98m S  1.6  1.3   0:38.97 postgres:
MYDB_MYDB MYDB 127.0.0.1(51982) SELECT
 2806 postgres  16   0  397m 100m  98m R  1.6  1.3   0:34.83 postgres:
MYDB_MYDB MYDB 127.0.0.1(57204) SELECT
 3361 postgres  15   0  397m 101m  98m R  1.6  1.3   0:30.32 postgres:
MYDB_MYDB MYDB 127.0.0.1(48782) idle
 3577 postgres  15   0  397m 100m  98m S  1.6  1.3   0:27.92 postgres:
MYDB_MYDB MYDB 127.0.0.1(52019) SELECT
 3618 postgres  15   0  397m 101m  98m S  1.6  1.3   0:27.53 postgres:
MYDB_MYDB MYDB 127.0.0.1(41291) SELECT
 3704 postgres  15   0  397m 100m  98m S  1.6  1.3   0:25.70 postgres:
MYDB_MYDB MYDB 127.0.0.1(43642) SELECT
 5073 postgres  15   0  397m 100m  98m S  1.6  1.3   0:23.92 postgres:
MYDB_MYDB MYDB 127.0.0.1(47398) SELECT
 5185 postgres  15   0  397m 100m  98m S  1.6  1.3   0:23.03 postgres:
MYDB_MYDB MYDB 127.0.0.1(49137) SELECT
 5528 postgres  15   0  397m 100m  98m S  1.6  1.3   0:20.81 postgres:
MYDB_MYDB MYDB 127.0.0.1(55531) SELECT
 5549 postgres  15   0  397m 100m  98m S  1.6  1.3   0:20.71 postgres:
MYDB_MYDB MYDB 127.0.0.1(56391) SELECT
 5976 postgres  16   0  397m 100m  98m R  1.6  1.3   0:17.47 postgres:
MYDB_MYDB MYDB 127.0.0.1(57053) idle
 6301 postgres  15   0  397m 100m  97m S  1.6  1.3   0:16.58 postgres:
MYDB_MYDB MYDB 127.0.0.1(59544) SELECT
32318 postgres  15   0  397m 101m  98m S  1.6  1.3   1:24.09 postgres:
MYDB_MYDB MYDB 127.0.0.1(32942) SELECT
32728 postgres  15   0  397m 101m  98m S  1.6  1.3   1:09.87 postgres:
MYDB_MYDB MYDB 127.0.0.1(33792) SELECT
  377 postgres  16   0  397m 101m  98m S  1.3  1.3   1:03.51 postgres:
MYDB_MYDB MYDB 127.0.0.1(35925) SELECT

Re: Huge number of INSERTs

From

John R Pierce

Date:

17 November 2011, 21:04:13

On 11/17/11 4:44 PM, Phoenix Kiula wrote:
> I keep getting "FATAL: Sorry, too many clients already" problems.
>
> It's a Quad core machine with dual servers, 4 SCSI disks with RAID 10,
> with RAM of 8GB.
>
> Server is Nginx backed by Apache for the php.
>
> Postgresql just has to do about 1000 SELECTs a minute, and about 200
> INSERTs a minute. Maybe 10-20 UPDATEs.
>
> My conf file is below. My vmstat + top are below too.

are you using a connection pool?  it should be, pgbouncer or something,
the pooling built into php is weak sauce.   your php pages should be
grabbing a pool connection, doing their thing, releasing the pool
connection.



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast

Re: Huge number of INSERTs

From

Steve Crawford

Date:

17 November 2011, 21:26:21

On 11/17/2011 04:44 PM, Phoenix Kiula wrote:
> Hi. I have a massive traffic website.
"Massive" = what, exactly?
> I keep getting "FATAL: Sorry, too many clients already" problems.
>
> It's a Quad core machine with dual servers, 4 SCSI disks with RAID 10,
> with RAM of 8GB.
Database only? Or is it also your webserver?
> Server is Nginx backed by Apache for the php.
>
> Postgresql just has to do about 1000 SELECTs a minute, and about 200
> INSERTs a minute. Maybe 10-20 UPDATEs.
>
> My conf file is below. My vmstat + top are below too.
>
> What else can I do?
Provide more info.

What version of PostgreSQL? What OS? What OS tuning, if any, have you
done? (Have you increased readahead? Changed swappiness, turned off
atime on your mounts, made syslogging asynchronous, etc?). Does your
RAID have battery-backed cache? What are the cache settings?

What is the nature of the queries? Single record inserts or bulk? Same
for the selects. Have you run analyze on them and optimized the queries?
What is the typical duration of your queries? Are lots of queries
duplicated (caching candidates)? What is the size of your database? Do
you have any bandwidth bottleneck to the Internet?

Is this your database server only or is it running web and/or other
processes? How long does a typical web-request take to handle?

At first blush, and shooting in the dark, I'll guess there are lots of
things you can do. Your shared_buffers seems a bit low - a rough
starting point would be closer to 25% of your available RAM.

You are a prime candidate for using a connection pooler. I have had good
luck with pgbouncer but there are others.

If you have lots of repeated queries, you could benefit from memcached
or similar.

If your typical web request involves a database hit, there is not really
a benefit to having so many web processes that you exhaust your database
connections. At least until you fix the underlying issues, you might
want to decrease the maximum number of allowed web connections. (If you
server lots of static content, you may want to adjust your process count
accordingly).

Note: bandwidth bottlenecks can screw everything up. Your web processes
stay alive dribbling the data to the client and, even though they don't
have much work to do, they are still holding database connections, using
memory, etc. Such cases can often benefit from a reverse proxy.

Provide more data and we can provide more assistance.

Cheers,
Steve

Re: Huge number of INSERTs

From

"Tomas Vondra"

Date:

17 November 2011, 22:41:57

Hi, there's a pretty wiki page about tuning PostgreSQL databases:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

On 18 Listopad 2011, 1:44, Phoenix Kiula wrote:
> Hi. I have a massive traffic website.
>
> I keep getting "FATAL: Sorry, too many clients already" problems.

That has nothing to do with the inserts, it means the number of connection
requests exceeds the max_connections. You've set it to 350, and that seems
too high - the processes are going to struggle for resources (CPU, I/O and
memory) and the scheduling gets expensive too.

A good starting point is usually 2*(number of cores + number of drives)
which is 16 or 24 (not sure what a "dual server" is - probably dual CPU).
You may increase that if the database more or less fits into memory (so
less I/O is needed).

But go step by step - by 10 connections or something like that. The
problem is that each connection can allocate memory (work_mem), and if you
have too many connections doing that at the same time you'll get OOM or a
swapping system (and that's not what you want). And you should give the
sessions enough memory, because otherwise they're going to do on-disk
sort.

So you have to keep in mind these "rules"

(1) give each session enough memory to perform the operations in RAM
(enough work_mem to sort in memory etc.), but not more

(2) don't use too many connections - watch the I/O utilization and don't
overload it (you won't get higher throughput, just higher latencies)

BTW the same rule holds for the number of Apache workers - how many are
you using? Is that on the same machine or on a dedicated one? The fact
that you're receiving "too many clients" suggests that you have MaxClients
higher than 350. Have you actually tested this to see if it gives better
performance than 50? If the clients actually need to connect / query the
database, there's probably no point in having more than max_connections of
them.

> It's a Quad core machine with dual servers, 4 SCSI disks with RAID 10,
> with RAM of 8GB.
>
> Server is Nginx backed by Apache for the php.
>
> Postgresql just has to do about 1000 SELECTs a minute, and about 200
> INSERTs a minute. Maybe 10-20 UPDATEs.

That's completely information-less description. Those SELECTs may be a
simple "fetch by PK" queries or complex queries aggregating data from 20
tables. So the fact that you need to execute 1000 of them is useless.

The same for UPDATEs and INSERTs.

Post an example of the queries with EXPLAIN ANALYZE for each of them (use
explain.depesz.com to post it).

> My conf file is below. My vmstat + top are below too.

A static (single line) of vmstat is not very useful - we need a few lines
of "vmstat 1" (say 30) collected when the application is in use.

> What else can I do?

1) Decrease the number of connections to a reasonable value.

2) Use a connection pooler. You may also use persistent connections in PHP
too, but you have to set MaxClients in apache config to the same value
(otherwise you'll get "too many clients"). The connection pooler can
handle this for you - it will wait until a connection is available.

And the most important thing - prepare a simple stress script (a few HTTP
requests, performed by a typical client) and use it to stress test the
application. Start with low max_connections / MaxClients (say 20),
increase them gradually and watch the performance (throughput).

The usual behavior is that at the beginning the throughput scales linearly
(2 clients give you 2x the throughput of 1 client, with the same latency).
Then this growth stops and the throughput does not grow anymore - adding
more clients just increases the latency. Then the throughput usually goes
down.

Tomas

Re: Huge number of INSERTs

From

Phoenix Kiula

Date:

18 November 2011, 08:30:42

On Fri, Nov 18, 2011 at 9:25 AM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

>> Database only? Or is it also your webserver?

It's my webserver and DB. Webserver is nginx, proxying all PHP
requests to apache in the backend.

> What version of PostgreSQL? What OS? What OS tuning, if any, have you done?
> (Have you increased readahead? Changed swappiness, turned off atime on your
> mounts, made syslogging asynchronous, etc?). Does your RAID have
> battery-backed cache? What are the cache settings?

PG 9.0.5

CentOS 5 64 bit

OS tuning - lots of it since the beginning of time. What specifically
would you like to know? Please let me know and I can share info. Like
SHM Max and Min variables type of things?

RAID has the 3Com battery backed cache, yes. Not reporting any errors.

> What is the nature of the queries? Single record inserts or bulk? Same for
> the selects. Have you run analyze on them and optimized the queries?

Simple INSERTs. Into a table with 6 columns. Column 1 is a primary
key, column 5 is a date. There are two indexes on this table, on the
pkey (col1) and one on the date (col5).

SELECTs are simple straight selects, based on pkey with limit 1. No
joins, no sorting.

> What is
> the typical duration of your queries? Are lots of queries duplicated
> (caching candidates)?

The bulk of the big SELECTs are in "memcached". Much faster than PG.

It's INSERTs I don't know what to do with. Memcached is not a good
solution for INSERTs, which do need to go into a proper DB.

> What is the size of your database? Do you have any
> bandwidth bottleneck to the Internet?

Full DB:   32GB
The big table referenced above:  28 GB

It's inserts into this one that are taking time.

> Is this your database server only or is it running web and/or other
> processes? How long does a typical web-request take to handle?

How can I measure the time taken per web request? Nginx is super fast,
based on apache bench. Apache -- how do I test it? Don't want to do
fake inserts. With selects, apache bench uses memcached instead..

> At first blush, and shooting in the dark, I'll guess there are lots of
> things you can do. Your shared_buffers seems a bit low - a rough starting
> point would be closer to 25% of your available RAM.

If I make shared_buffers too high, (2GB....25% of my 8GB RAM), this
brings the server to its knees instantly. Probably because I have
apache, nginx, memcached running on the same server. Nginx and
memcached are negligible in terms of memory consumption.

> You are a prime candidate for using a connection pooler. I have had good
> luck with pgbouncer but there are others.

Will pgbouncer or pgpool help with INSERTs?

> Note: bandwidth bottlenecks can screw everything up. Your web processes stay
> alive dribbling the data to the client and, even though they don't have much
> work to do, they are still holding database connections, using memory, etc.
> Such cases can often benefit from a reverse proxy.

In addition to nginx proxying to apache, I am using CloudFlare. Is
this a problem?

Many thanks for the informative seeking of information. Hope the above
details shed more light?

I've currently disabled any INSERT functions on my website...but even
with disabled INSERTs and only SELECTs alive, I still see the "psql:
FATAL:  sorry, too many clients already" message.

Btw, I don't see any PG logs. What could be the problem? The config
says that it should store it in the directory "pg_log", but this
directory is empty.

Also, here's the output of "vmstat 5 10"

> vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
11  3  17672  44860  24084 6559348    0    0   147   275   17   63 64
26  9  1  0
14  3  14376  48820  24208 6555968  438    0 24374  1287 1529 56176 73
26  1  0  0
13  2  14112  47320  24344 6555916   10    2 27350  1219 1523 57979 72
27  1  0  0
20  2  14100  46672  24468 6553420    2    3 28473  1172 1499 59492 71
27  1  0  0
17  3  10400  46284  24524 6548520  730    1 22237  1164 1482 59761 68
31  1  0  0
18  2   7984  45708  24712 6552308  478    0 26966  1164 1487 58218 69
30  1  0  0
12  2   7980  47636  24816 6549020    2    1 25210  1134 1486 57972 71
27  1  1  0
18  1   7924  44300  25108 6548836    1    0 25918  1310 1515 60067 70
28  1  1  0
18  2   7812  45444  25288 6543668   26    0 26474  1326 1465 62633 70
29  1  0  0
22  2   7800  46852  25488 6542360    0    0 25620  1258 1510 63181 69
29  1  1  0

> vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
10  4   7712  46420  27416 6449628    0    0   167   275   18  114 64
26  9  1  0
18  2   7704  47196  27580 6448252    4    0 22546  1146 1507 55693 68
26  3  4  0
20  2   7724  47616  27628 6444084    3    1 25419  1114 1424 58069 72
27  1  0  0
15  2   7840  47240  27852 6443056    0    0 22962  1145 2079 59501 71
27  1  1  0
17  3   7852  47400  28084 6442840    1    3 21262  1189 2038 58908 69
27  2  2  0
13  2   7864  47024  28220 6438784    0    2 21131  1030 1716 57518 69
30  1  0  0
18  0   7868  45948  28496 6442860    2    0 23282  1261 1479 57482 71
28  1  0  0
11  2   7904  45784  28708 6442748    0    1 25155  1239 1468 58439 72
27  1  0  0
13  2   7988  44616  28856 6443992    0    0 23411  1248 1435 58626 72
27  1  0  0
26  2   8024  44364  28848 6443120    0    0 22922  1229 1484 59022 71
27  1  0  0

Re: Huge number of INSERTs

From

Richard Huxton

Date:

18 November 2011, 08:53:25

On 18/11/11 12:30, Phoenix Kiula wrote:
> I've currently disabled any INSERT functions on my website...but even
> with disabled INSERTs and only SELECTs alive, I still see the "psql:
> FATAL:  sorry, too many clients already" message.

As Tomas has said, this is nothing to do with inserts and everything to
do with too many clients. Take the time to read through his reply. Fix
the number of clients before worrying about other details.

> Btw, I don't see any PG logs. What could be the problem? The config
> says that it should store it in the directory "pg_log", but this
> directory is empty.

You'll need to check the manuals for full details on how to configure
your logging - I'd expect a zero-length file even if you weren't logging
anything to it. Might be worth checking the directory is owned by user
"postgres" (or whoever your server runs as).

--
   Richard Huxton
   Archonet Ltd

Re: Huge number of INSERTs

From

Steve Crawford

Date:

18 November 2011, 13:30:15

On 11/18/2011 04:30 AM, Phoenix Kiula wrote:
> On Fri, Nov 18, 2011 at 9:25 AM, Steve Crawford
> <scrawford@pinpointresearch.com>  wrote:
>
>
>>> Database only? Or is it also your webserver?
>
> It's my webserver and DB. Webserver is nginx, proxying all PHP
> requests to apache in the backend.
You still didn't answer what "massive traffic" means.
>
>> What version of PostgreSQL? What OS? What OS tuning, if any, have you done?
>> (Have you increased readahead? Changed swappiness, turned off atime on your
>> mounts, made syslogging asynchronous, etc?). Does your RAID have
>> battery-backed cache? What are the cache settings?
>
> PG 9.0.5
>
> CentOS 5 64 bit
>
> OS tuning - lots of it since the beginning of time. What specifically
> would you like to know? Please let me know and I can share info. Like
> SHM Max and Min variables type of things?
>
> RAID has the 3Com battery backed cache, yes. Not reporting any errors.
3Com? Perhaps you mean 3Ware? And (again) what are the RAID cache
*settings*? In particular, the write-back/write-through setting.

>
>
>> What is the nature of the queries? Single record inserts or bulk? Same for
>> the selects. Have you run analyze on them and optimized the queries?
>
> Simple INSERTs. Into a table with 6 columns. Column 1 is a primary
> key, column 5 is a date. There are two indexes on this table, on the
> pkey (col1) and one on the date (col5).
>
> SELECTs are simple straight selects, based on pkey with limit 1. No
> joins, no sorting.
>
>
>
>> What is
>> the typical duration of your queries? Are lots of queries duplicated
>> (caching candidates)?
>
> The bulk of the big SELECTs are in "memcached". Much faster than PG.
>
> It's INSERTs I don't know what to do with. Memcached is not a good
> solution for INSERTs, which do need to go into a proper DB.
So most of your selects aren't hitting the database. Since we are
talking db tuning, it would have been nice to know how many queries are
hitting the database, not the number of requests hitting the webserver.
But the question was "what is the typical duration of the queries" -
specifically the queries hitting the database.
>
>
>> What is the size of your database? Do you have any
>> bandwidth bottleneck to the Internet?
>
> Full DB:   32GB
> The big table referenced above:  28 GB
>
> It's inserts into this one that are taking time.
Earlier you said you were doing 200 inserts/minute. Is that an average
throughout the day or is that at peak time. Peak load is really what is
of interest. 200 inserts/minute is not even 4/second.
>
>> Is this your database server only or is it running web and/or other
>> processes? How long does a typical web-request take to handle?
>
> How can I measure the time taken per web request? Nginx is super fast,
> based on apache bench. Apache -- how do I test it? Don't want to do
> fake inserts. With selects, apache bench uses memcached instead..
Look at your log. If it isn't set to record request time, set it to do
so. I set my Apache servers to log request time in microseconds.

>> At first blush, and shooting in the dark, I'll guess there are lots of
>> things you can do. Your shared_buffers seems a bit low - a rough starting
>> point would be closer to 25% of your available RAM.
>
> If I make shared_buffers too high, (2GB....25% of my 8GB RAM), this
> brings the server to its knees instantly. Probably because I have
> apache, nginx, memcached running on the same server. Nginx and
> memcached are negligible in terms of memory consumption.
Not total RAM, "*available* RAM" - that is the memory available after
loading the OS, Nginx, Apache, etc. Earlier you had a snapshot from
"top" which showed over 5G cached and swap basically unused which means
all your programs combined are using well under half your RAM and the
remaining RAM is acting as cache. But that output was from a
point-in-time. You would need to observe it over time and under load.
>> You are a prime candidate for using a connection pooler. I have had good
>> luck with pgbouncer but there are others.
>
> Will pgbouncer or pgpool help with INSERTs?
Only indirectly. As mentioned by myself and others, you have a real
problem with the number of simultaneous connections. A connection pooler
will allow you to have fewer database connections open and thus use
resources more efficiently. It will also reduce the overhead from
connections. In the simplest case, I've seen a 10x improvement in
database connection setup with pgbouncer. And a pooler can let you use
persistent connections and virtually eliminate the connection setup
overhead. You have to be careful, though. If you use the most aggressive
pooling settings, one web process can affect the operation of another.
In particular, things like "SET ... TO ..." statements or creation of
temporary tables will be associated with the backend connection to the
database. With aggressive settings, the same web request could have each
database statement handled by a different database backend. So start off
with conservative pool settings and advance only if required and have
studied the potential side-effects.

BTW, what things are competing for disk? Perhaps you could run iostat
for a few minutes at peak load. If you haven't turned off atime on your
mounting, every request will probably generate several write requests
just to update the access time for each file that gets read. If you are
logging your web requests and getting, say, 1200 requests/minute
(20/second) to syslog and it is set to synchronous writes which is often
the default then logging alone is triggering lots of fsync activity. If
you can live with losing a few log entries after a crash, switch logging
to asynchronous.

Also, are you using APC? It won't help PostgreSQL inserts directly but
by pre-compiling/caching the PHP code you will free up resources for
your other processes including PostgreSQL.

Cheers,
Steve

Re: Huge number of INSERTs

From

Phoenix Kiula

Date:

19 November 2011, 15:27:23

On Fri, Nov 18, 2011 at 10:41 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
> That has nothing to do with the inserts, it means the number of connection
> requests exceeds the max_connections. You've set it to 350, and that seems
> too high - the processes are going to struggle for resources (CPU, I/O and
> memory) and the scheduling gets expensive too.
>
> A good starting point is usually 2*(number of cores + number of drives)
> which is 16 or 24 (not sure what a "dual server" is - probably dual CPU).
> You may increase that if the database more or less fits into memory (so
> less I/O is needed).

Ok, there's just too much conflicting info on the web.

If I reduce the max_connections to 16, how does this reflect on the
Apache MaxClients? There's a school of thought that recommends that
MaxClients in Apache should be the same as max_connection in PGSQL.
But 16 for MaxClients with a prefork MPM would be disastrous. No?

Anyway, even if I do try 16 as the number, what about these settings:

work_mem
shared_buffers
effective_cache_size

With nginx and apache, and some other sundries, I think about 4-5GB is
left for PGSQL. This is 9.0.5. And all other details - Centos 5 on 64
bit, SCSI disks with RAID 10, 3Ware RAID controller...etc.

Any help on settings appreciated.

Thanks!

Re: Huge number of INSERTs

From

"Tomas Vondra"

Date:

19 November 2011, 16:25:29

On 19 Listopad 2011, 20:27, Phoenix Kiula wrote:
> On Fri, Nov 18, 2011 at 10:41 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>
>> That has nothing to do with the inserts, it means the number of
>> connection
>> requests exceeds the max_connections. You've set it to 350, and that
>> seems
>> too high - the processes are going to struggle for resources (CPU, I/O
>> and
>> memory) and the scheduling gets expensive too.
>>
>> A good starting point is usually 2*(number of cores + number of drives)
>> which is 16 or 24 (not sure what a "dual server" is - probably dual
>> CPU).
>> You may increase that if the database more or less fits into memory (so
>> less I/O is needed).
>
>
>
> Ok, there's just too much conflicting info on the web.
>
> If I reduce the max_connections to 16, how does this reflect on the
> Apache MaxClients? There's a school of thought that recommends that
> MaxClients in Apache should be the same as max_connection in PGSQL.
> But 16 for MaxClients with a prefork MPM would be disastrous. No?

See, the max_connections is the maximum number of allowed connections. So
if there are 16 open connections and someone asks for another one, he'll
receive "too many clients". So decreasing the max_connections without
MaxClients would make your problem even worse.

I'm not sure about the Apache prefork worker - why are you using it
instead the threaded worker? Anyway as I asked before, do you have a proof
the current MaxClient value provides the best performance? It seems to me
you've just used some very high values in belief that it will give better
performance. Have you performed some stress test to verify the settings.
I'm not saying you should set MaxClients to 16, but 350 probably is too
high?

But if you can't set MaxClients to the same value as max_connections (or
actually a bit lower, because there are connections reserved for superuser
etc.), that's exactly the proof that you need a pooler - see pgbouncer.

>
> Anyway, even if I do try 16 as the number, what about these settings:
>
> work_mem
> shared_buffers
> effective_cache_size
>
> With nginx and apache, and some other sundries, I think about 4-5GB is
> left for PGSQL. This is 9.0.5. And all other details - Centos 5 on 64
> bit, SCSI disks with RAID 10, 3Ware RAID controller...etc.

The phrase "I think" suggests that you actually don't know how much memory
is available. Suggestions with this number of components are going to be a
bit unreliable. Can you post a few lines of "vmstat 5" and "free"?

The recommended shared_buffers size is usually 25% of RAM, that's about
1GB of RAM. I see you've originally set it to 256MB - have you checked the
cache hit ratio, i.e. how many requests were resolved using the cache?

SELECT datname, (100 * blks_hit) / (blks_hit + blks_read + 1) hit_ratio
  FROM pg_stat_database;

Again, this is a point where an application benchmark would really help.
What is the database size, anyway?

It's difficult to recommend a work_mem without deeper knowledge of your
queries and how much memory is available. Using less connections actually
allows you to grant more memory to each of them, i.e. setting higher
work_mem.

Say you have 1GB available, 20 connections - that's about 50MB per
connection. The work_mem is actually per node, so if a query sorts twice
it may allocate 2x work_mem etc. So a conservative estimate would be
work_mem=20MB or something, so that even if all the connections start
sorting at the same time you won't get OOM. But is that really enough or
too much for your queries? I have no idea.

I recommend to set a conservative work_mem value (e.g. 4MB), log slow
queries and check if they'd benefit from higher work_mem values.

Regarding the effective_cache_size - this is just a hint how much data
might be cached. What does "free" says about the cache size? I see you've
decreased the size from 1500MB to 1400MB on (Nov 11 2011). Why?

Tomas

Re: Huge number of INSERTs

From

Phoenix Kiula

Date:

19 November 2011, 18:35:12

On Sat, Nov 19, 2011 at 1:29 AM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:
> On 11/18/2011 04:30 AM, Phoenix Kiula wrote:
>>
>> On Fri, Nov 18, 2011 at 9:25 AM, Steve Crawford
>> <scrawford@pinpointresearch.com>  wrote:
>>
>>
>>>> Database only? Or is it also your webserver?
>>
>> It's my webserver and DB. Webserver is nginx, proxying all PHP
>> requests to apache in the backend.
>
> You still didn't answer what "massive traffic" means.


Thousands of website hits per minute. (At peak time)

Average is a few hundred per minute.






> 3Com? Perhaps you mean 3Ware? And (again) what are the RAID cache
> *settings*? In particular, the write-back/write-through setting.


Yes 3Ware.

RAID cache settings:


----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name : RAID10-A
RAID level : 10
Status of logical device : Optimal
Size : 1906678 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
MaxIQ preferred cache setting : Disabled
MaxIQ cache setting : Disabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery/ZMM
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
Power settings : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Group 0, Segment 0 : Present (0,0) 9QJ00FMB
Group 0, Segment 1 : Present (0,1) 9QJ1R3NW
Group 1, Segment 0 : Present (0,2) 9QJ00L58
Group 1, Segment 1 : Present (0,3) 9QJ01JJ5





> So most of your selects aren't hitting the database. Since we are talking db
> tuning, it would have been nice to know how many queries are hitting the
> database, not the number of requests hitting the webserver. But the question
> was "what is the typical duration of the queries" - specifically the queries
> hitting the database.


Right now single SELECTs with just that one "WHERE indexed_column =
'Value' LIMIT 1" type queries are taking 3.0 of CPU, and so on. Why
should these queries be taking so much time and resources?



> Earlier you said you were doing 200 inserts/minute. Is that an average
> throughout the day or is that at peak time. Peak load is really what is of
> interest. 200 inserts/minute is not even 4/second.


As above.



> Look at your log. If it isn't set to record request time, set it to do so. I
> set my Apache servers to log request time in microseconds.


Could you specify how precisely you have set up this log? Through
CustomLog? Thanks!


Thanks!

Re: Huge number of INSERTs

From

Tomas Vondra

Date:

20 November 2011, 00:36:04

Dne 19.11.2011 23:34, Phoenix Kiula napsal(a):
> On Sat, Nov 19, 2011 at 1:29 AM, Steve Crawford
> <scrawford@pinpointresearch.com> wrote:
>> You still didn't answer what "massive traffic" means.
>
>
> Thousands of website hits per minute. (At peak time)
>
> Average is a few hundred per minute.

This is pretty vague description of the workload, as we have no clue how
demanding the request processing is. It might be a simple script that
does almost nothing (in that case the thousands of hits is easy to
handle) but it might be rather expensive.

Anyway I personally see this as a rather unrelated to the problem we're
trying to help you with - setting the right number of connections and
maybe fixing some of the queries.

>> 3Com? Perhaps you mean 3Ware? And (again) what are the RAID cache
>> *settings*? In particular, the write-back/write-through setting.
>
>
> Yes 3Ware.
>
> RAID cache settings:
>
>
> ----------------------------------------------------------------------
> Logical device information
> ----------------------------------------------------------------------
> Logical device number 0
> Logical device name : RAID10-A
> RAID level : 10
> Status of logical device : Optimal
> Size : 1906678 MB
> Stripe-unit size : 256 KB
> Read-cache mode : Enabled
> MaxIQ preferred cache setting : Disabled
> MaxIQ cache setting : Disabled
> Write-cache mode : Enabled (write-back)
> Write-cache setting : Enabled (write-back) when protected by battery/ZMM
> Partitioned : Yes
> Protected by Hot-Spare : No
> Bootable : Yes
> Failed stripes : No
> Power settings : Disabled
> --------------------------------------------------------
> Logical device segment information
> --------------------------------------------------------
> Group 0, Segment 0 : Present (0,0) 9QJ00FMB
> Group 0, Segment 1 : Present (0,1) 9QJ1R3NW
> Group 1, Segment 0 : Present (0,2) 9QJ00L58
> Group 1, Segment 1 : Present (0,3) 9QJ01JJ5

So how much write cache is there and what 3Ware model is that? Because I
don't see this information there.

>> So most of your selects aren't hitting the database. Since we are talking db
>> tuning, it would have been nice to know how many queries are hitting the
>> database, not the number of requests hitting the webserver. But the question
>> was "what is the typical duration of the queries" - specifically the queries
>> hitting the database.
>
>
> Right now single SELECTs with just that one "WHERE indexed_column =
> 'Value' LIMIT 1" type queries are taking 3.0 of CPU, and so on. Why
> should these queries be taking so much time and resources?

3.0 of CPU? Is that seconds or what? Anyway post a more detailed
description of the tables (columns, data types, indexes) and EXPLAIN
ANALYZE of the queries (using explain.depesz.com).

>> Earlier you said you were doing 200 inserts/minute. Is that an average
>> throughout the day or is that at peak time. Peak load is really what is of
>> interest. 200 inserts/minute is not even 4/second.
>
> As above.

As above what? Please, be more specific, it's a bit difficult to know
which paragraph above you're refering to.

Again - post an EXPLAIN ANALYZE of the queries using explain.depesz.com.

>> Look at your log. If it isn't set to record request time, set it to do so. I
>> set my Apache servers to log request time in microseconds.
>
>
> Could you specify how precisely you have set up this log? Through
> CustomLog? Thanks!

log_line_prefix = '%t'

And you should probably add a few more fields (session ID, ...)



Tomas

Re: Huge number of INSERTs

From

Tomas Vondra

Date:

20 November 2011, 01:13:30

Dne 18.11.2011 13:30, Phoenix Kiula napsal(a):
> Full DB:   32GB
> The big table referenced above:  28 GB
>
> It's inserts into this one that are taking time.

Hm, in that case the shared_buffers is probably too low. It'd be nice to
have at least the indexes on the table in the buffers, and I guess
they're significantly over 256MB (your shared_buffers).

But regarding the "vmstat 5 10" output you've posted, you probably don't
issue with I/O as the iowait is 0 most of the time.

You do have a serious problem with CPU, though - most of the time, the
CPU is almost 100% utilized. Not sure which process is responsible for
this, but this might be the main problem problem.

I'm not saying adding a row to the table (and indexes) is extremely
expensive, but you do have an insane number of processes (350
connections, a lot of apache workers) and a lot of them are asking for
CPU time.

So once again: set the number of connections and workers to sane values,
considering your current hardware. Those numbers are actually a handy
throttle - you may increase the numbers until the CPU is reasonably
utilized (don't use 100%, leave a reasonable margin - I wouldn't go
higher than 90%).

Tomas

Re: Huge number of INSERTs

From

Phoenix Kiula

Date:

20 November 2011, 08:46:31

On Sun, Nov 20, 2011 at 2:11 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Dne 18.11.2011 13:30, Phoenix Kiula napsal(a):
>> Full DB:   32GB
>> The big table referenced above:  28 GB
>>
>> It's inserts into this one that are taking time.
>
> Hm, in that case the shared_buffers is probably too low. It'd be nice to
> have at least the indexes on the table in the buffers, and I guess
> they're significantly over 256MB (your shared_buffers).
>
> But regarding the "vmstat 5 10" output you've posted, you probably don't
> issue with I/O as the iowait is 0 most of the time.
>
> You do have a serious problem with CPU, though - most of the time, the
> CPU is almost 100% utilized. Not sure which process is responsible for
> this, but this might be the main problem problem.
>
> I'm not saying adding a row to the table (and indexes) is extremely
> expensive, but you do have an insane number of processes (350
> connections, a lot of apache workers) and a lot of them are asking for
> CPU time.
>
> So once again: set the number of connections and workers to sane values,
> considering your current hardware. Those numbers are actually a handy
> throttle - you may increase the numbers until the CPU is reasonably
> utilized (don't use 100%, leave a reasonable margin - I wouldn't go
> higher than 90%).



Thanks Tomas. And others.

Some observations and questions from my ongoing saga.

I have disabled all ADDing of data (INSERT + UPDATE) and just allowed
SELECTs so far. Site is under maintenance.

For a moment there, I unleashed the valve and allowed the INSERT
functionality. The log was immediately flooded with this:


LOG:  duration: 6851.054 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6848.266 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6846.672 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6853.451 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6991.966 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 8244.315 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6991.071 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6990.043 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6988.483 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6986.793 ms  statement: select nextval('maintable_id_seq')
LOG:  duration: 6985.802 ms  statement: select nextval('maintable_id_seq')
...

I hope it's just because of too much load that even a simple query
such as this was taking so much time?

Other queries taking too much time are also indexed queries!

Anyway, right now, with that valve closed, and only SELECTs allowed,
here's the stats:



> vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 7  1   1352  47596  26412 6189960    3    3  5228   243   17   10 51
19 26  4  0
16  1   1352  45520  26440 6191656    0    0  1230  3819 1368 65722 68
31  1  0  0
 9  0   1352  61048  26464 6174688    0    0  1000  4290 1370 65545 67
32  1  0  0
27  1   1352  51908  26508 6183852    0    0  1332  3916 1381 65684 68
32  1  0  0
29  0   1352  48380  26536 6185764    0    0   977  3983 1368 65684 67
32  1  0  0
24  1   1352  46436  26576 6189080    0    0   220  4135 1373 65743 66
33  1  0  0
25  1   1352  46204  26616 6191452    0    0     0  3963 1348 66867 67
32  1  0  0
13  1   1352  57444  26692 6193220    0    0    24  4038 1436 66891 66
32  2  0  0
22  1   1352  51300  26832 6196736    0    0   439  5088 1418 66995 66
31  2  0  0
26  1   1352  51940  26872 6198384    0    0     0  3354 1385 67122 67
31  2  0  0




> iostat -d -x 5 3
Linux 2.6.18-238.9.1.el5 (host.MYDB.com)     11/20/2011

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda              86.34   151.41 392.90 92.36 41796.00  1949.66
90.15     1.59    3.27   0.40  19.65
sda1              0.00     0.00  0.00  0.00     0.01     0.00    22.38
    0.00    3.04   3.01   0.00
sda2              0.27     8.20  0.06  0.22     3.35    67.05   255.22
    0.01   34.36   3.02   0.08
sda3              1.02    13.83  3.29  3.65   165.35   139.75    43.96
    0.16   22.52   7.32   5.08
sda4              0.00     0.00  0.00  0.00     0.00     0.00     2.00
    0.00    0.00   0.00   0.00
sda5              0.57     3.63  0.64  0.72    26.52    34.72    45.16
    0.02   11.26   4.67   0.63
sda6              0.21     0.57  0.41  0.27    13.79     6.76    30.24
    0.02   24.31  16.51   1.12
sda7              0.24     5.36  0.11  0.44     1.92    46.32    86.94
    0.02   44.21   7.99   0.44
sda8              2.24     2.25  1.22  0.98    27.62    25.83    24.33
    0.06   27.61  18.20   4.00
sda9             81.79   117.57 387.18 86.08 41557.45  1629.24
91.25     1.30    2.75   0.39  18.30

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0.00   775.20  0.00 143.40     0.00  7348.80
51.25     0.04    0.30   0.16   2.28
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda3              0.00     8.20  0.00  1.60     0.00    78.40    49.00
    0.00    0.50   0.50   0.08
sda4              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda5              0.00     0.20  0.00  1.20     0.00    11.20     9.33
    0.00    0.00   0.00   0.00
sda6              0.00     0.80  0.00  0.60     0.00    11.20    18.67
    0.00    0.00   0.00   0.00
sda7              0.00     0.20  0.00  2.40     0.00    20.80     8.67
    0.00    0.50   0.25   0.06
sda8              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda9              0.00   765.80  0.00 137.60     0.00  7227.20
52.52     0.04    0.30   0.16   2.20

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0.60   913.00  4.40 149.40   160.00  8499.20
56.30     0.07    0.46   0.25   3.88
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda3              0.60    10.00  3.20  1.60   145.60    92.80    49.67
    0.02    3.29   2.58   1.24
sda4              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda5              0.00     1.00  0.80  1.40    11.20    19.20    13.82
    0.01    3.09   3.00   0.66
sda6              0.00     0.80  0.00  0.60     0.00    11.20    18.67
    0.00    0.00   0.00   0.00
sda7              0.00     0.20  0.00  2.40     0.00    20.80     8.67
    0.00    0.67   0.33   0.08
sda8              0.00     0.00  0.00  0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
sda9              0.00   901.00  0.40 143.40     3.20  8355.20
58.13     0.05    0.32   0.15   2.14

Re: Huge number of INSERTs

From

Tomas Vondra

Date:

20 November 2011, 20:43:37

Dne 20.11.2011 13:46, Phoenix Kiula napsal(a):
> On Sun, Nov 20, 2011 at 2:11 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

> For a moment there, I unleashed the valve and allowed the INSERT
> functionality. The log was immediately flooded with this:
>
>
> LOG:  duration: 6851.054 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6848.266 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6846.672 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6853.451 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6991.966 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 8244.315 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6991.071 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6990.043 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6988.483 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6986.793 ms  statement: select nextval('maintable_id_seq')
> LOG:  duration: 6985.802 ms  statement: select nextval('maintable_id_seq')
> ...
>
> I hope it's just because of too much load that even a simple query
> such as this was taking so much time?

Probably, unless you have system with infinite amount of CPU time.

According to the vmstat output you've posted, the CPU is 99% utilized
all the time.

I'm not sure about the I/O, because you haven't posted iostat output
with the INSERTs enabled. But from the information you've provided so
far I guess the main issue is the load and overall system overload.

Another sign of this is rather high number of processes waiting in the
queue. So once again - decrease the number of connections and apache
clients to a reasonable number.

> Other queries taking too much time are also indexed queries!

And? Indexes are not a magical fairy dust - when the system is as
overloaded as yours, even the least expensive operations are going to
take insane amount of time.

And it's rather difficult to help you with queries, unless you provide
us EXPLAIN ANALYZE output - I've already asked you for this twice.
Without that piece of information, we can't tell whether the queries are
slo because of bad query plan or because of the load.

Tomas