Re: Huge number of INSERTs - Mailing list pgsql-general

From Steve Crawford
Subject Re: Huge number of INSERTs
Date
Msg-id 4EC6960F.2030501@pinpointresearch.com
Whole thread Raw
In response to Re: Huge number of INSERTs  (Phoenix Kiula <phoenix.kiula@gmail.com>)
Responses Re: Huge number of INSERTs
List pgsql-general
On 11/18/2011 04:30 AM, Phoenix Kiula wrote:
> On Fri, Nov 18, 2011 at 9:25 AM, Steve Crawford
> <scrawford@pinpointresearch.com>  wrote:
>
>
>>> Database only? Or is it also your webserver?
>
> It's my webserver and DB. Webserver is nginx, proxying all PHP
> requests to apache in the backend.
You still didn't answer what "massive traffic" means.
>
>> What version of PostgreSQL? What OS? What OS tuning, if any, have you done?
>> (Have you increased readahead? Changed swappiness, turned off atime on your
>> mounts, made syslogging asynchronous, etc?). Does your RAID have
>> battery-backed cache? What are the cache settings?
>
> PG 9.0.5
>
> CentOS 5 64 bit
>
> OS tuning - lots of it since the beginning of time. What specifically
> would you like to know? Please let me know and I can share info. Like
> SHM Max and Min variables type of things?
>
> RAID has the 3Com battery backed cache, yes. Not reporting any errors.
3Com? Perhaps you mean 3Ware? And (again) what are the RAID cache
*settings*? In particular, the write-back/write-through setting.

>
>
>> What is the nature of the queries? Single record inserts or bulk? Same for
>> the selects. Have you run analyze on them and optimized the queries?
>
> Simple INSERTs. Into a table with 6 columns. Column 1 is a primary
> key, column 5 is a date. There are two indexes on this table, on the
> pkey (col1) and one on the date (col5).
>
> SELECTs are simple straight selects, based on pkey with limit 1. No
> joins, no sorting.
>
>
>
>> What is
>> the typical duration of your queries? Are lots of queries duplicated
>> (caching candidates)?
>
> The bulk of the big SELECTs are in "memcached". Much faster than PG.
>
> It's INSERTs I don't know what to do with. Memcached is not a good
> solution for INSERTs, which do need to go into a proper DB.
So most of your selects aren't hitting the database. Since we are
talking db tuning, it would have been nice to know how many queries are
hitting the database, not the number of requests hitting the webserver.
But the question was "what is the typical duration of the queries" -
specifically the queries hitting the database.
>
>
>> What is the size of your database? Do you have any
>> bandwidth bottleneck to the Internet?
>
> Full DB:   32GB
> The big table referenced above:  28 GB
>
> It's inserts into this one that are taking time.
Earlier you said you were doing 200 inserts/minute. Is that an average
throughout the day or is that at peak time. Peak load is really what is
of interest. 200 inserts/minute is not even 4/second.
>
>> Is this your database server only or is it running web and/or other
>> processes? How long does a typical web-request take to handle?
>
> How can I measure the time taken per web request? Nginx is super fast,
> based on apache bench. Apache -- how do I test it? Don't want to do
> fake inserts. With selects, apache bench uses memcached instead..
Look at your log. If it isn't set to record request time, set it to do
so. I set my Apache servers to log request time in microseconds.

>> At first blush, and shooting in the dark, I'll guess there are lots of
>> things you can do. Your shared_buffers seems a bit low - a rough starting
>> point would be closer to 25% of your available RAM.
>
> If I make shared_buffers too high, (2GB....25% of my 8GB RAM), this
> brings the server to its knees instantly. Probably because I have
> apache, nginx, memcached running on the same server. Nginx and
> memcached are negligible in terms of memory consumption.
Not total RAM, "*available* RAM" - that is the memory available after
loading the OS, Nginx, Apache, etc. Earlier you had a snapshot from
"top" which showed over 5G cached and swap basically unused which means
all your programs combined are using well under half your RAM and the
remaining RAM is acting as cache. But that output was from a
point-in-time. You would need to observe it over time and under load.
>> You are a prime candidate for using a connection pooler. I have had good
>> luck with pgbouncer but there are others.
>
> Will pgbouncer or pgpool help with INSERTs?
Only indirectly. As mentioned by myself and others, you have a real
problem with the number of simultaneous connections. A connection pooler
will allow you to have fewer database connections open and thus use
resources more efficiently. It will also reduce the overhead from
connections. In the simplest case, I've seen a 10x improvement in
database connection setup with pgbouncer. And a pooler can let you use
persistent connections and virtually eliminate the connection setup
overhead. You have to be careful, though. If you use the most aggressive
pooling settings, one web process can affect the operation of another.
In particular, things like "SET ... TO ..." statements or creation of
temporary tables will be associated with the backend connection to the
database. With aggressive settings, the same web request could have each
database statement handled by a different database backend. So start off
with conservative pool settings and advance only if required and have
studied the potential side-effects.

BTW, what things are competing for disk? Perhaps you could run iostat
for a few minutes at peak load. If you haven't turned off atime on your
mounting, every request will probably generate several write requests
just to update the access time for each file that gets read. If you are
logging your web requests and getting, say, 1200 requests/minute
(20/second) to syslog and it is set to synchronous writes which is often
the default then logging alone is triggering lots of fsync activity. If
you can live with losing a few log entries after a crash, switch logging
to asynchronous.

Also, are you using APC? It won't help PostgreSQL inserts directly but
by pre-compiling/caching the PHP code you will free up resources for
your other processes including PostgreSQL.

Cheers,
Steve


pgsql-general by date:

Previous
From: "Good Day Books"
Date:
Subject: Re: Result of ORDER-BY
Next
From: jonesd@xmission.com
Date:
Subject: Upgrading from 8.3.14 to 8.3.16 on Windows