Thread: Need to tune for Heavy Write
Dear all,

For the last few days I have been researching PostgreSQL performance tuning because of the slow speed of my server. My application selects about 100,000 rows from a MySQL database, processes them, and inserts them into 2 Postgres tables using about 45 connections.

I set my PostgreSQL parameters in postgresql.conf as below (OS: Ubuntu, RAM: 16 GB, Postgres: 8.4.2):

max_connections = 80
shared_buffers = 2048MB
work_mem = 32MB
maintenance_work_mem = 512MB
fsync = off
full_page_writes = off
synchronous_commit = off
checkpoint_segments = 32
checkpoint_completion_target = 0.7
effective_cache_size = 4096MB

After this I moved my pg_xlog directory to a separate directory outside the data directory by symlinking. The application issues INSERT statements through PostgreSQL connections only.

Please let me know if I am missing any other important configuration.

Thanks
Hi Adarsh,
Have you set checkpoint_segments and checkpoint_completion_target the right way?
Tuning these parameters is a MUST if you want good write performance.
See this link for more information: http://www.postgresql.org/docs/current/static/runtime-config-wal.html
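For example, a purely illustrative starting point for a write-heavy load (the right values depend on your disks and workload, so treat these as assumptions to benchmark, not recommendations):

checkpoint_segments = 64             # the default of 3 is far too low for heavy writes
checkpoint_completion_target = 0.9   # spread checkpoint I/O over more of the interval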
Cheers,
Dusan
On Thu, Aug 4, 2011 at 6:56 AM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
> After this I change my pg_xlog directory to a separate directory other than
> data directory by symlinking.
> (...)
> Please let me know if I missing any other important configuration.

Moving the pg_xlog to a different directory only helps when that directory is on a different hard disk (or whatever I/O device).

HTH,
WBL
--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
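(For illustration only, a sketch of the relocation Adarsh describes, assuming a hypothetical mount point /mnt/wal_disk on a separate device and a default Ubuntu 8.4 data directory; the server must be stopped first.)

# with PostgreSQL stopped, move the WAL directory to the dedicated device
mv /var/lib/postgresql/8.4/main/pg_xlog /mnt/wal_disk/pg_xlog
# symlink it back so the server finds it in the usual place
ln -s /mnt/wal_disk/pg_xlog /var/lib/postgresql/8.4/main/pg_xlog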
To put it simply, you need to set checkpoint_segments way higher than your current value!
Link: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
On Aug 4, 2011 6:57 AM, "Adarsh Sharma" <adarsh.sharma@orkash.com> wrote:
On Thu, Aug 4, 2011 at 2:34 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 6:56 AM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
>> After this I change my pg_xlog directory to a separate directory other than
>> data directory by symlinking.
>> (...)
>> Please let me know if I missing any other important configuration.
>
> Moving the pg_xlog to a different directory only helps when that
> directory is on a different harddisk (or whatever I/O device).

Not entirely true. Simply being on a different mounted file system moves the fsync calls on the pg_xlog directories off of the same file system as the main data store. Previous testing has shown performance improvements from just using a different file system.

That said, the only real solution to a heavy write load is a heavy-duty I/O subsystem, with lots of drives and battery-backed cache.
Scott is right; his answer solves the problem in the long run. Even if your write load increases, it will perform fast enough.
For now, try increasing checkpoint_segments, restart Postgres for the new setting to take effect, and try again with your write load.
If you are still not satisfied with the write speed, then it is time to upgrade your storage system, i.e. increase I/O performance.
On Aug 4, 2011 10:46 AM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 2:34 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
>> On Thu, Aug 4, 2011 at 6:56 AM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
>>> After this I change my pg_xlog directory to a separate directory other than
>>> data directory by symlinking.
>>>(...)
>>> Please let me know if I missing any other important configuration.
>>
>> Moving the pg_xlog to a different directory only helps when that
>> directory is on a different harddisk (or whatever I/O device).
>
> Not entirely true. By simply being on a different mounted file
> system this moves the fsync calls on the pg_xlog directories off of
> the same file system as the main data store. Previous testing has
> shown improvements in performance from just using a different file
> system.
>
> That said, the only real solution to a heavy write load is a heavy
> duty IO subsystem, with lots of drives and battery backed cache.
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Scott Marlowe
> Sent: Thursday, August 04, 2011 4:46 AM
> To: Willy-Bas Loos
> Cc: Adarsh Sharma; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Need to tune for Heavy Write
>
>> Moving the pg_xlog to a different directory only helps when that
>> directory is on a different harddisk (or whatever I/O device).
>
> Not entirely true. By simply being on a different mounted file
> system this moves the fsync calls on the pg_xlog directories off of
> the same file system as the main data store. Previous testing has
> shown improvements in performance from just using a different file
> system.

Is this still the case for xfs or ext4, where fsync properly flushes only the correct blocks to disk, or was this referring to the good old ext3 flush-everything-on-fsync issue?

Brad.
On 4/08/2011 12:56 PM, Adarsh Sharma wrote:
> Dear all,
>
> From the last few days, I researched a lot on Postgresql Performance
> Tuning due to slow speed of my server.
> My application selects data from mysql database about 100000 rows ,
> process it & insert into postgres 2 tables by making about 45 connections.

Why 45? Depending on your disk subsystem, that may be way too many for optimum throughput. Or too few, for that matter.

Also, how are you doing your inserts? Are they being done in a single big transaction per connection, or at least in reasonable chunks? If you're doing stand-alone INSERTs autocommit-style you'll see pretty shoddy performance.

Have you looked into using COPY to bulk load your data? Possibly using the libpq or JDBC copy APIs, or possibly using server-side COPY?

> fsync=off full_page_writes=off synchronous_commit=off

!!!! I hope you don't want to KEEP that data if you have a hardware fault or power loss. Setting fsync=off is pretty much saying "I don't mind if you eat my data". Keep. Really. Really. Good. Backups.

--
Craig Ringer
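(To illustrate the COPY options Craig mentions, a minimal sketch; the table, columns, and file names are invented for the example and use PostgreSQL 8.4 syntax.)

-- server-side COPY: the file must be readable by the postgres server process
COPY my_table (id, payload) FROM '/srv/load/rows.csv' WITH CSV;

-- or stream from the client with psql's \copy, which works over any connection:
-- \copy my_table (id, payload) FROM 'rows.csv' WITH CSV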
Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
> Postgres : 8.4.2

You should definitely update to a more recent bug patch level:
http://www.postgresql.org/support/versioning

> RAM : 16 GB
> effective_cache_size = 4096MB

That should probably be more like 12GB to 15GB. It probably won't affect the load time here, but could affect other queries.

> My application selects data from mysql database about 100000
> rows process it & insert into postgres 2 tables by making about 45
> connections.

How many cores do you have? How many disk spindles, in what sort of array, with what sort of controller? Quite possibly you can improve performance dramatically by not turning loose a "thundering herd" of competing processes.

Can you load the target table without indexes and then build the indexes? Can you use the COPY command (or at least prepared statements) for the inserts to minimize parse/plan time?

An important setting you're missing is:

wal_buffers = 16MB

-Kevin
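(A hedged sketch of the "load without indexes" idea Kevin raises; the table, index, and file names are assumptions made up for the example.)

-- drop (or simply defer creating) secondary indexes before the bulk load
DROP INDEX IF EXISTS my_table_payload_idx;

-- bulk load with COPY to avoid per-row parse/plan overhead
COPY my_table (id, payload) FROM '/srv/load/rows.csv' WITH CSV;

-- rebuild the index once at the end; a single sort is far cheaper than
-- maintaining the index incrementally for every inserted row
CREATE INDEX my_table_payload_idx ON my_table (payload);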
On Thu, Aug 4, 2011 at 7:57 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> RAM : 16 GB
>
>> effective_cache_size = 4096MB
>
> That should probably be more like 12GB to 15GB. It probably won't
> affect the load time here, but could affect other queries.

Actually, on a heavily written database a large effective cache size makes things slower.
On Thu, Aug 4, 2011 at 6:41 AM, Nicholson, Brad (Toronto, ON, CA) <bnicholson@hp.com> wrote:
>> Not entirely true. By simply being on a different mounted file
>> system this moves the fsync calls on the pg_xlog directories off of
>> the same file system as the main data store. Previous testing has
>> shown improvements in performance from just using a different file
>> system.
>
> Is this still the case for xfs or ext4, where fsync properly flushes
> only the correct blocks to disk, or was this referring to the good old
> ext3 flush-everything-on-fsync issue?

Good question. One I do not know the answer to. Since I run my dbs with separate pg_xlog drive sets I've never been able to test that.
On Wed, Aug 3, 2011 at 9:56 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
> Dear all,
>
> From the last few days, I researched a lot on Postgresql Performance
> Tuning due to slow speed of my server.
> My application selects data from mysql database about 100000 rows ,
> process it & insert into postgres 2 tables by making about 45 connections.

It's already been mentioned, but is worth reinforcing, that if you are inserting 100,000 rows in 100,000 transactions, you'll see a huge performance improvement by doing many more inserts per transaction. Try doing at least 500 inserts in each transaction (though you can possibly go quite a bit higher than that without any issues, depending upon what other traffic the database is handling in parallel).

You almost certainly don't need 45 connections in order to insert only 100,000 rows. I've got a crappy VM with 2GB of RAM in which inserting 100,000 relatively narrow rows requires less than 10 seconds if I do it in a single transaction on a single connection. Probably much less than 10 seconds, but the code I just tested with does other work while doing the inserts, so I don't have a pure test at hand.
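(A minimal sketch of the batching idea, assuming a hypothetical table; the values are placeholders.)

BEGIN;
INSERT INTO my_table VALUES (1, 'row 1');
INSERT INTO my_table VALUES (2, 'row 2');
-- ... continue for roughly 500 rows per transaction ...
COMMIT;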
On 05/08/11 05:40, Samuel Gendler wrote:
Also worth mentioning: doing those 500 inserts in *fewer* than 500 INSERT operations is likely to be a huge improvement, e.g.:
INSERT INTO table VALUES (....),(....);
instead of
INSERT INTO table VALUES (....);
INSERT INTO table VALUES (....);
I'd be tempted to do all 500 row insertions in one INSERT statement as above. You might find that 1 connection doing this is fast enough (it would then take only 200 actual INSERT calls to put in 100,000 rows).
regards
Mark
On Aug 4, 2011, at 10:07 AM, Scott Marlowe wrote:
> On Thu, Aug 4, 2011 at 7:57 AM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>>> RAM : 16 GB
>>
>>> effective_cache_size = 4096MB
>>
>> That should probably be more like 12GB to 15GB. It probably won't
>> affect the load time here, but could affect other queries.
>
> Actually on a heavily written database a large effective cache size
> makes things slower.

effective_cache_size or shared_buffers? I can see why a large shared_buffers could cause problems, but what effect does effective_cache_size have on a write workload?

--
Jim C. Nasby, Database Architect   jim@nasby.net
512.569.9461 (cell)                http://jim.nasby.net