Thread: Optimize update query
Hi, i have these update queries, that run very often, and takes too long time, in order for us to reach the throughput weare aiming at. However, the update query is very simple, and I can't figure out any way to improve the situation. The querylooks like this: UPDATE "adverts" SET "last_observed_at" = '2012-11-28 00:02:30.265154', "data_source_id" ='83d024a57bc2958940f3ca281bddcbf4'WHERE"adverts"."id" IN ( 1602382, 4916432, 3221246, 4741057, 3853335, 571429, 3222740,571736, 3544903, 325378,5774338, 5921451, 4295768, 3223170, 5687001, 4741966, 325519, 580867, 325721, 4412200, 4139598,325567, 1616653,1616664, 6202007, 3223748, 325613, 3223764, 325615, 4296536, 3854595, 4971428, 3224146, 5150522,4412617, 5073048,325747, 325771, 1622154, 5794384, 5736581, 1623767, 5686945, 3224627, 5073009, 3224747, 3224749,325809, 5687051,3224811, 5687052, 4917824, 5073013, 3224816, 3224834, 4297331, 1623907, 325864, 1623947, 6169706,325869, 325877,3225074, 3225112, 325893, 325912, 3225151, 3225184, 3225175, 1624659, 325901, 4033926, 325904, 325911,4412835,1624737, 5073004, 5921434, 325915, 3225285, 3225452, 4917672, 1624984, 3225472, 325940, 5380611, 325957, 5073258,3225500,1625002, 5923489, 4413009, 325952, 3961122, 3637777 ) ; An explain outputs me the following: "Update on adverts (cost=0.12..734.27 rows=95 width=168)" " -> Index Scan using adverts_pkey on adverts (cost=0.12..734.27 rows=95 width=168)" " Index Cond: (id = ANY 
('{1602382,4916432,3221246,4741057,3853335,571429,3222740,571736,3544903,325378,5774338,5921451,4295768,3223170,5687001,4741966,325519,580867,325721,4412200,4139598,325567,1616653,1616664,6202007,3223748,325613,3223764,325615,4296536,3854595,4971428,3224146,5150522,4412617,5073048,325747,325771,1622154,5794384,5736581,1623767,5686945,3224627,5073009,3224747,3224749,325809,5687051,3224811,5687052,4917824,5073013,3224816,3224834,4297331,1623907,325864,1623947,6169706,325869,325877,3225074,3225112,325893,325912,3225151,3225184,3225175,1624659,325901,4033926,325904,325911,4412835,1624737,5073004,5921434,325915,3225285,3225452,4917672,1624984,3225472,325940,5380611,325957,5073258,3225500,1625002,5923489,4413009,325952,3961122,3637777}'::integer[]))"

So as you can see, it's already pretty optimized; it's just not enough :-) So what can I do? The two columns last_observed_at and data_source_id each have an index, and those indexes are needed elsewhere, so I can't drop them.

PS. I'm on Postgres 9.2 on a server with 32 GB RAM, 8 cores and two 3 TB disks in a software RAID 1 setup. Is the only way out of this really an SSD?
On 11/28/2012 06:57 AM, Niels Kristian Schjødt wrote:

Before I go crazy here... you really need to tell us what "not enough" means. You didn't provide an EXPLAIN ANALYZE, so we don't know what your actual performance is. But I have my suspicions.

> So as you can see, it's already pretty optimized, it's just not
> enough :-) So what can I do? the two columns last_observed_at and
> data_source_id has an index, and it is needed elsewhere, so I can't
> delete those.

Ok, so part of your problem is that you're tying an advertising system directly to the database for direct updates. That's a big no-no. Any time you get a huge influx of views, there would be a logjam. You need to decouple this so you can use a second tool to load the database in larger batches. You'll get much higher throughput this way.

If you absolutely must use this approach, you're going to have to beef up your hardware.

> PS. I'm on postgres 9.2 on a server with 32gb ram, 8 cores and two 3T
> disks in a software raid 1 setup.

This is not sufficient for a high-bandwidth stream of updates. Not even close. Even if those 3T disks are 7200 RPM, and even in RAID-1, you're going to have major problems with concurrent reads and writes. You need to do several things:

1. Move your transaction logs (pg_xlog) to another pair of disks entirely. Do not put these on the same disks as your data if you need high write throughput.

2. Get a better disk architecture. You need 10k or 15k RPM disks. Starting with 6 or more of them in a RAID-10 would be a good beginning.

You never told us your postgresql.conf settings, so I'm just going with very generic advice. Essentially, you're expecting too much for too little. That machine would have been low-spec three years ago, and unsuited to database use simply due to the 2-disk RAID.

> Is the only way out of this really a SSD disk?

No. There are many, many steps you can and should take before going this route.
You need to know the problem you're solving before making potentially expensive hardware decisions. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
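Shaun's "decouple and batch" advice can be sketched in outline: instead of issuing one UPDATE per observed advert, buffer the advert ids and flush them in one statement per batch. This is an illustrative Python sketch, not code from the thread; `execute` stands in for a real database call (e.g. psycopg2's `cursor.execute`, which adapts a Python list for `ANY(%s)`), and the batch size is an arbitrary assumption.

```python
from datetime import datetime, timezone

class BatchedObserver:
    """Buffer advert ids and flush them as one UPDATE per batch,
    instead of one UPDATE per observed advert."""

    def __init__(self, execute, batch_size=500):
        self.execute = execute      # callable(sql, params); a real app would pass cursor.execute
        self.batch_size = batch_size
        self.pending = []

    def observe(self, advert_id):
        self.pending.append(advert_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One statement updates the whole batch; id = ANY(%s) keeps the SQL
        # text constant so the server can reuse the plan.
        self.execute(
            'UPDATE adverts SET last_observed_at = %s WHERE id = ANY(%s)',
            (datetime.now(timezone.utc), self.pending),
        )
        self.pending = []
```

The same idea scales up to a separate loader process fed from a queue, which is what the "second tool" suggestion amounts to.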
On 28.11.2012 15:07, Shaun Thomas wrote:
> Ok, so part of your problem is that you're tying an advertising system
> directly to the database for direct updates. That's a big no-no. Any
> time you get a huge influx of views, there would be a logjam. You need
> to decouple this so you can use a second tool to load the database in
> larger batches. You'll get much higher throughput this way.

+1. SQL databases have a limited number of inserts/updates per second. Even with high-end hardware you won't get more than XXX operations per second. As Thomas said, you should feed something like a NoSQL database from the www server and use another tool to do aggregation and batch inserts to PostgreSQL. It will scale much better.

Marcin
Niels Kristian Schjødt wrote:

> PS. I'm on postgres 9.2 on a server with 32gb ram, 8 cores and
> two 3T disks in a software raid 1 setup.

In addition to the excellent advice from Shaun, I would like to point out a few other things.

One query runs on one core. In a test of a single query, the other seven cores aren't doing anything. Be sure to pay attention to how a representative workload is handled.

Unless you have tuned your postgresql.conf settings, you probably aren't taking very good advantage of that RAM.

For heavy load you need lots of spindles and a good RAID controller with battery-backed cache configured for write-back.

You will probably benefit from reading this page: http://wiki.postgresql.org/wiki/SlowQueryQuestions

If you don't already have it, you will probably find Greg Smith's book on PostgreSQL performance a great investment: http://www.postgresql.org/docs/books/

-Kevin
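Kevin's "one query runs on one core" observation suggests splitting the daily batch across several connections. A hedged sketch of just the chunking step (the worker count and function name are illustrative, not from the thread):

```python
def chunk_ids(ids, workers):
    """Split a batch of advert ids into roughly equal chunks, one per worker
    connection, so several cores can apply the UPDATEs in parallel."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    if not ids:
        return []
    size = -(-len(ids) // workers)  # ceiling division
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Each chunk would then go to `UPDATE ... WHERE id = ANY(chunk)` on its own connection. Note that parallel writers only help if the disks can keep up, which is exactly the constraint discussed later in the thread.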
I assume that SQL databases (banks? telecom?) can handle a used car shop. No need for an unstructured data tool.
> +1. SQL databases have a limited number of inserts/updates per second. Even
> with high-end hardware you won't get more than XXX operations per
> second. As Thomas said, you should feed something like a NoSQL database
> from the www server and use another tool to do aggregation and batch inserts
> to PostgreSQL. It will scale much better.
>
> Marcin
Okay guys,
Thanks for all the great help and advice already! Let me clear a few things up, to make my question a little easier to answer :-)

My site is a search engine for used cars, not just a car shop with a few hundred cars. The update query you're looking at is executed once a day, in chunks, for all active adverts, so we know they are still for sale (one car can be advertised in several places, hence several "adverts"). So it's not a "constant stream", but it has a fairly high volume, especially at night time.

A compressed version of my .conf looks like this (note: there are some tweaks at the end of the file):
data_directory = '/var/lib/postgresql/9.2/main'
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.2-main.pid'
listen_addresses = '192.168.0.2, localhost'
port = 5432
max_connections = 1000
unix_socket_directory = '/var/run/postgresql'
wal_level = hot_standby
synchronous_commit = off
archive_mode = on
archive_command = 'rsync -a %p postgres@192.168.0.4:/var/lib/postgresql/9.2/wals/%f </dev/null'
max_wal_senders = 1
wal_keep_segments = 32
logging_collector = on
log_min_messages = debug1
log_min_error_statement = debug1
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_lock_waits = on
log_temp_files = 0
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 22GB
work_mem = 160MB
wal_buffers = 4MB
checkpoint_segments = 16
shared_buffers = 7680MB
# All the log stuff is mainly temporary requirement for pgBadger
# The database has been tuned with pgtuner
You might be familiar with New Relic; I use it for quite a lot of monitoring. This is what I see at night time (a lot of I/O), so I went to play around with pgBadger to get some insights at the database level.
https://rpm.newrelic.com/public/charts/h2dtedghfsv
This shows me that by far the most time-consuming queries are updates (in general). On average, a query like the one I showed you takes 1.3 s (but often it takes several minutes, which makes me wonder). So correct me if I'm wrong here: my theory is that I have too many too-slow update queries, which then often end up in a situation where they "wait" for each other to finish, hence the sometimes VERY long execution times. So my basic idea is that if I could reduce the cost of the updates, I could get higher throughput overall.
Here is a sample of the pgBadger analysis:
Queries that took up the most time:
Rank 1: total 1d15h28m38.71s, executed 948,711 times, avg 0.15s
COMMIT;

Rank 2: total 1d2h17m55.43s, executed 401,002 times, avg 0.24s
INSERT INTO "car_images" ( "car_id", "created_at", "image", "updated_at" ) VALUES ( '', '', '', '' ) returning "id";

Rank 3: total 23h18m33.68s, executed 195,093 times, avg 0.43s
SELECT DISTINCT "cars".id FROM "cars" LEFT OUTER JOIN "adverts" ON "adverts"."car_id" = "cars"."id" LEFT OUTER JOIN "sellers" ON "sellers"."id" = "adverts"."seller_id" WHERE "cars"."sales_state" = '' AND "cars"."year" = 0 AND "cars"."engine_size" = 0.0 AND ( ( "cars"."id" IS NOT NULL AND "cars"."brand" = '' AND "cars"."model_name" = '' AND "cars"."fuel" = '' AND "cars"."km" = 0 AND "cars"."price" = 0 AND "sellers"."kind" = '' ) ) LIMIT 0;

Rank 4: total 22h45m26.52s, executed 3,374,133 times, avg 0.02s
SELECT "adverts".* FROM "adverts" WHERE ( source_name = '' AND md5 ( url ) = md5 ( '' ) ) LIMIT 0;

Rank 5: total 10h31m37.18s, executed 29,671 times, avg 1.28s
UPDATE "adverts" SET "last_observed_at" = '', "data_source_id" = '' WHERE "adverts"."id" IN ( ... ) ;

Rank 6: total 7h18m40.65s, executed 396,393 times, avg 0.07s
UPDATE "cars" SET "updated_at" = '' WHERE "cars"."id" = 0;

Rank 7: total 7h6m7.87s, executed 241,294 times, avg 0.11s
UPDATE "cars" SET "images_count" = COALESCE ( "images_count", 0 ) + 0 WHERE "cars"."id" = 0;

Rank 8: total 6h56m11.78s, executed 84,571 times, avg 0.30s
INSERT INTO "failed_adverts" ( "active_record_object_class", "advert_candidate", "created_at", "exception_class", "exception_message", "from_rescraper", "last_retried_at", "retry_count", "source_name", "stack_trace", "updated_at", "url" ) VALUES ( NULL, '', '', '', '', NULL, NULL, '', '', '', '', '' ) returning "id";

Rank 9: total 5h47m25.45s, executed 188,402 times, avg 0.11s
INSERT INTO "adverts" ( "availability_state", "car_id", "created_at", "data_source_id", "deactivated_at", "first_extraction", "last_observed_at", "price", "seller_id", "source_id", "source_name", "updated_at", "url" ) VALUES ( '', '', '', '', NULL, '', '', '', '', '', '', '', '' ) returning "id";

Rank 10: total 3h4m26.86s, executed 166,235 times, avg 0.07s
UPDATE "adverts" SET "deactivated_at" = '', "availability_state" = '', "updated_at" = '' WHERE "adverts"."id" = 0;
(Yes I'm already on the task of improving the selects)
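To put the rank-5 numbers in perspective, a bit of arithmetic helps (assuming each IN list holds about 95 ids, as in the query at the top of the thread; that batch size is an assumption, not something pgBadger reports):

```python
executions = 29_671       # times the batched UPDATE ran (from pgBadger)
avg_seconds = 1.28        # average duration per statement (from pgBadger)
ids_per_update = 95       # assumed batch size, from the example query

rows_touched = executions * ids_per_update
ms_per_row = avg_seconds / ids_per_update * 1000

print(f"~{rows_touched:,} rows updated in total")   # ~2.8 million rows
print(f"~{ms_per_row:.1f} ms per updated row")
```

Roughly 13 ms per updated row is in the neighborhood of a single 7200 RPM disk seek, which is consistent with the IO-bound diagnosis that emerges later in the thread.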
max_connections = 1000 looks bad... why not put a pooler in place?

Cheers
Bèrto

On 28 November 2012 16:19, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> max_connections = 1000

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.
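A pooler such as pgbouncer (named here only as a common example; the thread just says "a pooler") would sit in front of Postgres so max_connections can drop to a few dozen server-side connections. A minimal illustrative pgbouncer.ini sketch; the database name, paths, and pool size are assumptions, not from the thread:

```ini
[databases]
; route the app's database through the pooler (hypothetical names)
autouncle = host=127.0.0.1 port=5432 dbname=autouncle

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20   ; far fewer server connections than max_connections = 1000
```

The application then connects to port 6432 instead of 5432, and a thousand client connections share twenty real backends.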
On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote:

> https://rpm.newrelic.com/public/charts/h2dtedghfsv

Doesn't this answer your question? That iowait is crushing your server into the ground. It's no surprise updates are taking several seconds. That update you sent us *should* execute on the order of only a few milliseconds.

So I'll reiterate that you *must* move your pg_xlog location elsewhere. You've got row lookup bandwidth conflicting with writes. There are a couple of other changes you should probably make to your config:

> checkpoint_segments = 16

This is not enough for the workload you describe. Every time the database checkpoints, all of those changes in pg_xlog are applied to the backend data files. You should set these values:

checkpoint_segments = 100
checkpoint_timeout = 10m
checkpoint_completion_target = 0.9

This will reduce your overall write workload and make it less bursty. Too many checkpoints massively reduce write throughput. With the settings you have, it's probably checkpointing constantly while your load runs. Start with this, but experiment with increasing checkpoint_segments further.

If you check your logs now, you probably see a ton of "checkpoint starting: xlog" in there. That's very bad. It should say "checkpoint starting: time", meaning it's keeping up with your writes naturally.

> work_mem = 160MB

This is probably way too high. work_mem is used for every sort operation in a query, so each connection could have several of these allocated, starving your system of memory and reducing what's available for the page cache. Change it to 8MB, and increase it in small increments if necessary.

> So correct me if I'm wrong here: my theory is, that I have too many
> too slow update queries, that then often end up in a situation, where
> they "wait" for each other to finish, hence the sometimes VERY long
> execution times.

Sometimes this is the case, but for you, you're running into IO contention, not lock contention.
Your 3TB RAID-1 is simply insufficient for this workload.

If you check your logs after making the changes I've suggested, take a look at your checkpoint sync times. That will tell you how long it took the kernel to physically commit those blocks to disk and get a confirmation back from the controller. If those take longer than a second or two, you're probably running into controller buffer overflows.

You have a large amount of RAM, so you should also make these two kernel changes to sysctl.conf:

vm.dirty_ratio = 10
vm.dirty_writeback_ratio = 1

Then run this:

sysctl -p

This will help prevent large IO write spikes caused when the kernel decides to write out dirty memory. That can make checkpoints take minutes to commit in some cases, which basically stops all write traffic to your database entirely.

That should get you going, anyway. You still need more/better disks so you can move your pg_xlog directory. With your write load, that will make a huge difference.

--
Shaun Thomas
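Shaun's "check your logs for checkpoint starting: xlog" can be automated with a small scan. An illustrative Python sketch; the message strings follow standard PostgreSQL 9.x log output, and the log path in the usage note is an assumption:

```python
from collections import Counter

def checkpoint_causes(log_lines):
    """Count why checkpoints started: 'xlog' means checkpoint_segments was
    exhausted (bad under heavy writes), 'time' means the timeout fired
    (good: the server is keeping up with writes naturally)."""
    marker = "checkpoint starting: "
    causes = Counter()
    for line in log_lines:
        idx = line.find(marker)
        if idx != -1:
            cause = line[idx + len(marker):].split()[0].rstrip(",")
            causes[cause] += 1
    return causes
```

Usage would be something like `checkpoint_causes(open("/var/log/postgresql/postgresql-9.2-main.log"))`; a large "xlog" count relative to "time" confirms the constant-checkpointing diagnosis.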
On 11/28/2012 11:44 AM, Niels Kristian Schjødt wrote:

> Thanks a lot - on the server I already have one additional SSD 250gb
> disk, that I don't use for anything at the moment.

Goooood. An SSD would actually be better for your data, as it follows more random access patterns, and xlogs are more sequential. But it's better than nothing.

And yes, you'd be better off with a RAID-1 of two of these SSDs, because the xlogs are critical to database health. You have your archived copy due to the rsync, which helps. But if you had a crash, there could potentially be a need to replay unarchived transaction logs, and you'd end up with some data loss.

> BTW. as you might have seen from the .conf I have a second slave
> server with the exact same setup, which currently runs as a hot
> streaming replication slave. I might ask a stupid question here, but
> this does not affect the performance of the master does it?

Only if you're using synchronous replication. From what I saw in the config, that isn't the case.

--
Shaun Thomas
Hi, I have started to implement your suggestions. I have a small error so far though. The "vm.dirty_writeback_ratio = 1" command returns:

error: "vm.dirty_writeback_ratio" is an unknown key

I'm on Ubuntu 12.04.
In later kernels these have been renamed:

Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-32-generic x86_64)

$ sysctl -a | grep dirty
vm.dirty_background_ratio = 5
vm.dirty_background_bytes = 0
vm.dirty_ratio = 10
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

You have the option of specifying either a ratio or, more usefully for machines with a lot of RAM, bytes.

Regards
Mark

P.s: People on this list usually prefer it if you *bottom* post (i.e. reply underneath the original).
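On Mark's point about bytes versus ratios: with 32 GB of RAM even a 1% ratio is ~320 MB of dirty pages, so the byte-based knobs give finer control. An illustrative /etc/sysctl.conf fragment; the exact values are assumptions to tune, not numbers endorsed in the thread:

```
# Start background writeback once ~64 MB of dirty pages accumulate
vm.dirty_background_bytes = 67108864

# Block writers once ~512 MB is dirty, instead of 10% of 32 GB
vm.dirty_bytes = 536870912
```

Note the kernel treats the *_bytes and *_ratio variants as mutually exclusive; setting the bytes form zeroes the corresponding ratio.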
Okay, now I'm done with the updating described above. I did the postgresql.conf changes, I did the kernel changes, and I added two SSDs in a software RAID 1 where pg_xlog is now located. Unfortunately the picture is still the same :-(

When the database is under "heavy" load, there is almost no improvement in performance compared to before the changes. A lot of both reads and writes take more than 1000 times as long as they usually do under "lighter" overall load. I added all the overview charts I can get hold of from New Relic beneath. What am I overlooking? There must be an obvious bottleneck.
Where should I dive in?

Database server CPU usage: https://rpm.newrelic.com/public/charts/cEdIvvoQZCr
Database server load average: https://rpm.newrelic.com/public/charts/cMNdrYW51QJ
Database server physical memory: https://rpm.newrelic.com/public/charts/c3dZBntNpa1
Database server disk I/O utilization: https://rpm.newrelic.com/public/charts/9YEVw6RekFG
Database server network I/O (Mb/s): https://rpm.newrelic.com/public/charts/lKiZ0Szmwe7
Top 5 database operations by wall clock time: https://rpm.newrelic.com/public/charts/dCt45YH12FK
Database throughput: https://rpm.newrelic.com/public/charts/bIbtQ1mDzMI
Database response time: https://rpm.newrelic.com/public/charts/fPcNL8WA6xx
Niels Kristian Schjødt wrote:

> Okay, now I'm done the updating as described above. I did the
> postgres.conf changes. I did the kernel changes, i added two
> SSD's in a software RAID1 where the pg_xlog is now located -
> unfortunately the the picture is still the same :-(

You said before that you were seeing high disk wait numbers. Now it is zero according to your disk utilization graph. That sounds like a change to me.

> When the database is under "heavy" load, there is almost no
> improvement to see in the performance compared to before the
> changes.

In client-visible response time and throughput, I assume, not resource usage numbers?

> A lot of both read and writes takes more than a 1000 times as
> long as they usually do, under "lighter" overall load.

As an odd coincidence, you showed your max_connections setting to be 1000.

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

-Kevin
Hehe, I'm sorry if it somehow was misleading. I just wrote "a lot of I/O", but it was CPU I/O wait, as the chart in the link also states. However, as I'm not very familiar with these deep-down database and server things, I had no idea whether a disk bottleneck could hide in this I/O, so I went along with Shaun's great help, which unfortunately didn't solve my issues.

Back to the issue: could it be that I'm using Ubuntu's built-in software RAID for my disks, and that it is not at all capable of handling the throughput?
Hmm, I'm getting suspicious here. Maybe my new great setup with the SSDs is not really working as it should, and maybe New Relic is not monitoring as it should. If I do a "sudo iostat -k 1" I get a lot of output like this:

Device:  tps      kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda      0.00     0.00       0.00       0        0
sdb      0.00     0.00       0.00       0        0
sdc      546.00   2296.00    6808.00    2296     6808
sdd      593.00   1040.00    7416.00    1040     7416
md1      0.00     0.00       0.00       0        0
md0      0.00     0.00       0.00       0        0
md2      1398.00  3328.00    13064.00   3328     13064
md3      0.00     0.00       0.00       0        0

The thing is, sda and sdb are the SSD drives and sdc and sdd are the HDD drives. md0, md1 and md2 are the RAID arrays on the HDDs, and md3 is the RAID on the SSDs. Neither md3 nor the SSDs are getting utilized, and I would expect them to be, since they are serving my pg_xlog, right? So maybe I did something wrong in the setup. Here is the path I followed:

# 1) First set up the SSD drives in a software RAID 1:
# http://askubuntu.com/questions/223194/setup-of-two-additional-ssd-drives-in-raid-1
#
# 2) Then move the postgres pg_xlog dir
sudo /etc/init.d/postgresql-9.2 stop
sudo mkdir -p /ssd/pg_xlog
sudo chown -R postgres.postgres /ssd/pg_xlog
sudo chmod 700 /ssd/pg_xlog
sudo cp -rf /var/lib/postgresql/9.2/main/pg_xlog/* /ssd/pg_xlog
sudo mv /var/lib/postgresql/9.2/main/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog_old
sudo ln -s /ssd/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
sudo /etc/init.d/postgresql-9.2 start

Can you spot something wrong?
Actually, what's the point in putting the logs on SSD? SSDs are good for random access, while logs are written sequentially. I'd put the tablespaces on SSD and leave the logs on HDD.
On 30 Nov 2012 at 04:33, "Niels Kristian Schjødt" <nielskristian@autouncle.com> wrote:
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Most modern SSDs are much faster for fsync-type operations than a spinning disk - similar performance to spinning disk + write-back RAID controller + battery. However, as you mention, they are great at random IO too, so Niels, it might be worth putting your postgres logs *and* data on the SSDs and retesting.

Regards

Mark

On 30/11/12 21:37, Vitalii Tymchyshyn wrote:
> Actually, what's the point in putting logs to ssd? SSDs are good for
> random access and logs are accessed sequentially. I'd put table spaces
> on ssd and leave logs on hdd
When I try your command sequence I end up with the contents of the new pg_xlog owned by root, and Postgres will not start:

PANIC: could not open file "pg_xlog/000000010000000600000080" (log file 6, segment 128): Permission denied

While this is fixable, I suspect you have managed to leave the xlog directory that Postgres is actually using on the HDD drives. When I do this I normally do:

$ service postgresql stop
$ sudo mkdir -p /ssd/pg_xlog
$ sudo chown -R postgres.postgres /ssd/pg_xlog
$ sudo chmod 700 /ssd/pg_xlog
$ sudo su - postgres
postgres $ mv /var/lib/postgresql/9.2/main/pg_xlog/* /ssd/pg_xlog
postgres $ rmdir /var/lib/postgresql/9.2/main/pg_xlog
postgres $ ln -s /ssd/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
postgres $ service postgresql start

regards

Mark

On 30/11/12 15:32, Niels Kristian Schjødt wrote:
> Hmm I'm getting suspicious here. Maybe my new great setup with the SSD's is not really working as it should, and maybe new relic is not monitoring as it should...
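Mark's checks can be bundled into a quick sanity test to run before starting Postgres again. This is a sketch only - the function name is made up, the paths and the "postgres" owner come from the thread, and `stat -c %U` assumes GNU coreutils:

```shell
# check_xlog_move LINK OWNER: verify that the relocated pg_xlog is a symlink
# pointing at an existing directory owned by OWNER (normally "postgres"),
# so the server won't PANIC with "Permission denied" on startup.
check_xlog_move() {
    link="$1"; owner="$2"
    [ -L "$link" ] || { echo "FAIL: $link is not a symlink"; return 1; }
    target=$(readlink -f "$link")
    [ -d "$target" ] || { echo "FAIL: target $target is not a directory"; return 1; }
    actual=$(stat -c %U "$target")
    [ "$actual" = "$owner" ] || { echo "FAIL: $target owned by $actual, expected $owner"; return 1; }
    echo "OK: $link -> $target (owner $owner)"
}

# e.g.: check_xlog_move /var/lib/postgresql/9.2/main/pg_xlog postgres
```

Run as root (or the postgres user) after the move and before `service postgresql start`.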
Oh, yes. I can't imagine a DB server without RAID+BBU :)
--
Best regards,
Vitalii Tymchyshyn
When there is no BBU, an SSD can be handy.
But you know, an SSD is worse at linear read/write than an HDD.

Best regards, Vitalii Tymchyshyn
2012/11/30 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
On 30/11/12, Vitalii Tymchyshyn wrote:
> Actually, what's the point in putting logs on SSD? SSDs are good for random access and logs are accessed sequentially. I'd put tablespaces on SSD and leave logs on HDD.
Because SSDs are considered faster, you put the most physically IO-intensive operations on SSD. For the majority of databases, these are the log files. But you should investigate where the optimum is for your situation.
SSDs are not faster for sequential IO, as far as I know. That's why (with a BBU, or with synchronous_commit=off) I prefer to have the logs on regular HDDs.
--
Best regards,
Vitalii Tymchyshyn
Niels Kristian Schjødt wrote:

>> You said before that you were seeing high disk wait numbers. Now
>> it is zero according to your disk utilization graph. That
>> sounds like a change to me.
>
> Hehe, I'm sorry if it somehow was misleading, I just wrote "a lot
> of I/O" it was CPU I/O

>>> A lot of both reads and writes take more than 1000 times as
>>> long as they usually do, under "lighter" overall load.
>>
>> As an odd coincidence, you showed your max_connections setting
>> to be 1000.
>>
>> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>
> Back to the issue: Could it be that it is the fact that I'm using
> ubuntus built in software raid to raid my disks, and that it is
> not at all capable of handling the throughput?

For high performance situations I would always use a high quality RAID controller with battery-backed RAM configured for write-back; however: the graphs you included suggest that your problem has nothing to do with your storage system. Now, maybe you didn't capture the data for the graphs while the problem was occurring, in which case the graphs would be absolutely useless; but based on what slim data you have provided, you need a connection pool (like maybe pgbouncer configured in transaction mode) to limit the number of database connections used to something like twice the number of cores.

If you still have problems, pick the query which is using the most time on your database server, and post it with the information suggested on this page:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin
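For what it's worth, the wiki page Kevin links gives (as far as I recall) a starting-point formula of roughly ((core_count * 2) + effective_spindle_count) for the number of active connections. A back-of-the-envelope check for this box:

```shell
# Rule-of-thumb pool sizing; the formula is the one from the wiki page above,
# and counting the RAID1 pair as 2 effective spindles is an assumption.
cores=8
spindles=2
pool_size=$(( cores * 2 + spindles ))
echo "$pool_size"   # prints 18 - a far cry from max_connections = 1000
```

Treat the result as a starting point to benchmark around, not a hard rule.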
Okay. So, to understand this better before I go with that solution: in theory, what difference should it make to the performance to have a pool in front of the database that all my workers and web servers connect to, instead of connecting directly? Where is the performance gain coming from in that situation?

On 30/11/2012 at 13.03, "Kevin Grittner" <kgrittn@mail.com> wrote:
> ... based on what slim data you have provided, you need a connection
> pool (like maybe pgbouncer configured in transaction mode) to limit
> the number of database connections used to something like twice the
> number of cores.
On 11/30/2012 07:31 AM, Niels Kristian Schjødt wrote:
> In theory what difference should it make to the performance, to have
> a pool in front of the database, that all my workers and web servers
> connect to instead of connecting directly? Where is the performance
> gain coming from in that situation?

If you have several more connections than you have processors, the database does a *lot* more context switching, and among other things, that drastically reduces PG performance. On a testbed, I can get over 150k transactions per second on PG 9.1 with a 1:1 relationship between CPU and client. Increase that to a few hundred, and my TPS drops down to 30k. Simply having the clients there kills performance.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
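A minimal pgbouncer configuration along the lines Kevin and Shaun suggest might look like this. A sketch only: the database name, auth file path, and pool size are illustrative, not taken from this thread:

```ini
; pgbouncer.ini (sketch; adjust names and paths to your setup)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb   ; hypothetical database name

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction     ; as Kevin suggested
default_pool_size = 16      ; roughly 2x the 8 cores on this box
max_client_conn = 1000      ; the app may still open many client connections;
                            ; only default_pool_size of them hit Postgres at once
```

The workers and web servers then connect to port 6432 instead of 5432; Postgres itself can run with a much lower max_connections.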
On 11/29/2012 08:32 PM, Niels Kristian Schjødt wrote:
> Neither of the md3 or the SSD's are getting utilized - and I should
> expect that since they are serving my pg_xlog right?

No, that's right. They are, but it would appear that the majority of your traffic actually isn't due to transaction logs like I'd suspected. If you get a chance, could you monitor the contents of:

/var/lib/postgresql/9.2/main/base/pgsql_tmp

Your main drives are getting way, way more writes than they should. 13MB per second is ridiculous even under heavy write loads. Based on the TPS count, you're basically saturating the ability of those two 3TB drives. Those writes have to be coming from somewhere.

> # sudo mkdir -p /ssd/pg_xlog

This is going to sound stupid, but are you *sure* the SSD is mounted at /ssd?

> # sudo chown -R postgres.postgres /ssd/pg_xlog
> # sudo chmod 700 /ssd/pg_xlog
> # sudo cp -rf /var/lib/postgresql/9.2/main/pg_xlog/* /ssd/pg_xlog
> # sudo mv /var/lib/postgresql/9.2/main/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog_old
> # sudo ln -s /ssd/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
> # sudo /etc/init.d/postgresql-9.2 start

The rest of this is fine, except that you probably should have added:

sudo chown -R postgres:postgres /ssd/pg_xlog/*

--
Shaun Thomas
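Shaun's mount question can be answered mechanically. A sketch (the helper name is made up; `mountpoint` is from util-linux, standard on Ubuntu):

```shell
# is_mounted DIR: succeed only when DIR is a real mountpoint, i.e. backed by
# its own filesystem, rather than a plain directory silently living on the
# parent disk.
is_mounted() {
    mountpoint -q "$1"
}

# e.g.: is_mounted /ssd || echo "/ssd is NOT mounted - did 'mount -a' run?"
```

If /ssd is just an empty directory on the root filesystem, everything written "to the SSD" actually lands on the HDDs.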
On 11/30/2012 02:37 AM, Vitalii Tymchyshyn wrote:
> Actually, what's the point in putting logs to ssd? SSDs are good for
> random access and logs are accessed sequentially.

While this is true, Niels' problem is that his regular HDs are getting saturated. In that case, moving any activity off of them is an improvement.

Why not move the data to the SSDs, you ask? Because he bought two 3TB drives. The assumption here is that a 256GB SSD will not have enough space for the long-term lifespan of this database.

Either way, based on the iostat activity he posted, clearly there's some other write stream happening we're not privy to.

--
Shaun Thomas
On 30/11/2012 at 15.02, Shaun Thomas <sthomas@optionshouse.com> wrote:
> This is going to sound stupid, but are you *sure* the SSD is mounted at /ssd?

Oh my, Shaun, once again you nailed it! That's what you get from working too late in the night - I forgot to run 'sudo mount -a'. I feel so embarrassed now :-( In other words, no, the drive was not mounted to the /ssd dir.

So now it is, and this has gained me a performance increase of roughly 20% - a little less than what I would have hoped for, but still better.

I still see a lot of CPU I/O wait when doing a lot of writes, so the question is: what's next? Should I go for the connection pooling thing, or monitor that /var/lib/postgresql/9.2/main/base/pgsql_tmp dir (and what exactly do you mean by monitor - its size?)

PS. A comment on the "Why not move the data to the SSDs": you are exactly right. I don't think the SSDs will be big enough for the data within a not-too-long timeframe, so that is exactly why I want to keep my data on the "big" drives.

PPS. I talked with New Relic and it turns out there is something wrong with their disk monitoring tool; that's why there was nothing in the disk charts even though iostat showed a lot of activity.
On 11/30/2012 08:48 AM, Niels Kristian Schjødt wrote:
> I forgot to run 'sudo mount -a' I feel so embarrassed now :-( - In
> other words no the drive was not mounted to the /ssd dir.

Yeah, that'll get ya.

> I still see a lot of CPU I/O when doing a lot of writes, so the
> question is, what's next. Should I try and go for the connection
> pooling thing or monitor that
> /var/lib/postgresql/9.2/main/base/pgsql_tmp dir (and what exactly do
> you mean by monitor - size?)

Well, like Kevin said, if you have more than a couple dozen connections on your hardware, you're losing TPS. It's probably a good idea to install pgbouncer or pgpool and let your clients connect to those instead. You should see a good performance boost from that.

But what concerns me is that your previous CPU charts showed a lot of iowait. Even with the SSD taking some of the load off your write stream, something else is going on there. That's why you need to monitor the "size" in MB, or the number of files, in the pgsql_tmp directory. That's where PG puts temp files when sorts are too big for your work_mem. If that's getting a ton of activity, that would explain some of your write overhead.

> PPS. I talked with New Relic and it turns out there is something
> wrong with the disk monitoring tool, so that's why there was nothing
> in the disk charts but iostat showed a lot of activity.

Yeah. Next time you need to check IO, use iostat. It's not as pretty, but it tells everything. ;) Just to help out with that, use:

iostat -dmx

That will give you extended information, including the % utilization of your drives. TPS stats are nice, but I was just guessing your drives were stalling out based on experience. Getting an outright percentage is better.

You should also use sar. Just a plain:

sar 1 100

will give you a lot of info on what the CPU is doing. You want that %iowait column to be as low as possible.

Keep us updated.

--
Shaun Thomas
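The pgsql_tmp check is easy to script. A sketch (the helper name is made up; the path is the one from the thread; GNU `du` assumed):

```shell
# pgsql_tmp_kb DIR: print the on-disk size of DIR in kilobytes, so it can be
# sampled (from cron or a while/sleep loop) to spot sorts spilling to disk.
pgsql_tmp_kb() {
    du -sk "$1" | cut -f1
}

# e.g. sample once a minute against the cluster's temp directory:
#   while sleep 60; do
#       echo "$(date +%T) $(pgsql_tmp_kb /var/lib/postgresql/9.2/main/base/pgsql_tmp)"
#   done
```

If the number jumps into the hundreds of MB under load, raising work_mem (or fixing the offending queries) may cut a lot of that write traffic.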
Hmm, very interesting. Currently I'm running at "medium" load compared to the very high loads during the night.

This is what the CPU I/O on New Relic shows: https://rpm.newrelic.com/public/charts/8RnSOlWjfBy

And this is what iostat shows:
Linux 3.2.0-33-generic (master-db) 11/30/2012 _x86_64_ (8 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 3.46 26.62 57.06 1.66 0.68 57.41 0.04 0.43 0.77 0.28 0.09 0.73
sdb 0.03 16.85 0.01 70.26 0.00 2.35 68.36 0.06 0.81 0.21 0.81 0.10 0.73
sdc 1.96 56.37 25.45 172.56 0.53 3.72 43.98 30.83 155.70 25.15 174.96 1.74 34.46
sdd 1.83 56.52 25.48 172.42 0.52 3.72 43.90 30.50 154.11 25.66 173.09 1.74 34.37
md1 0.00 0.00 0.00 0.00 0.00 0.00 3.02 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.57 0.59 0.00 0.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 54.14 227.94 1.05 3.72 34.61 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 0.01 60.46 0.00 0.68 23.12 0.00 0.00 0.00 0.00 0.00 0.00
A little reminder: md3 is the RAID array of the SSD drives (sda and sdb), and md0-md2 are the arrays of the regular HDD drives (sdc and sdd).
The pgsql_tmp dir is not changing at all; it's constantly empty (a size of 4.0K).
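For reference, a minimal sketch of such a check (the path is the default 9.2 layout mentioned earlier in the thread; override PGSQL_TMP if your cluster lives elsewhere):

```shell
#!/bin/sh
# Report size (kB) and top-level file count for a pgsql_tmp directory.
# PGSQL_TMP is an assumed default; point it at your actual cluster path.
PGSQL_TMP=${PGSQL_TMP:-/var/lib/postgresql/9.2/main/base/pgsql_tmp}
size_kb=$(du -sk "$PGSQL_TMP" | awk '{print $1}')
files=$(find "$PGSQL_TMP" -maxdepth 1 -type f | wc -l)
echo "pgsql_tmp: ${size_kb} kB in ${files} file(s)"
```

Run it under `watch -n 5` during a heavy write burst to see whether sort spill files come and go.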
So it doesn't seem like the SSD drives are being utilized at all, but the regular drives certainly are. At least now I know for sure that /ssd is mounted correctly:
"sudo df /ssd"
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md3 230619228 5483796 213420620 3% /ssd
On 11/30/2012 09:44 AM, Niels Kristian Schjødt wrote:

Just a note on your iostat numbers. The first reading is actually just a summary. You want the subsequent readings.

> The pgsql_tmp dir is not changing at all it's constantly empty (a size
> of 4.0K).

Good.

> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/md3 230619228 5483796 213420620 3% /ssd

Good.

You could just be seeing lots of genuine activity. But going back on the thread, I remember seeing this in your postgresql.conf:

shared_buffers = 7680MB

Change this to:

shared_buffers = 4GB

I say that because you mentioned you're using Ubuntu 12.04, and we were having some problems with PG on that platform. With shared_buffers over 4GB, it starts doing really weird things to the memory subsystem. Whatever it does causes the kernel to purge cache rather aggressively. We saw a 60% reduction in read IO by reducing shared_buffers to 4GB. Without as many reads, your writes should be much less disruptive.

You'll need to restart PG to adopt that change.

But I encourage you to keep iostat running in a terminal window so you can watch it for a while. It's very revealing.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
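Concretely, the suggested change is a one-line edit to postgresql.conf:

```ini
# postgresql.conf -- the change suggested above
shared_buffers = 4GB    # down from 7680MB; requires a PostgreSQL restart
```

On Ubuntu's packaging, the restart would be something like `sudo pg_ctlcluster 9.2 main restart`; adjust for your own init setup.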
On Nov 30, 2012, at 8:06 AM, Shaun Thomas wrote:
> I say that because you mentioned you're using Ubuntu 12.04, and we were
> having some problems with PG on that platform. With shared_buffers over
> 4GB, it starts doing really weird things to the memory subsystem.
> Whatever it does causes the kernel to purge cache rather aggressively.
> We saw a 60% reduction in read IO by reducing shared_buffers to 4GB.
> Without as many reads, your writes should be much less disruptive.
Hm, this sounds like something we should look into. Before we start digging, do you have more to share, or did you leave it with the "huh, that's weird; this seems to fix it" solution?
On 11/30/2012 01:57 PM, Ben Chobot wrote:

> Hm, this sounds like something we should look into. Before we start
> digging do you have more to share, or did you leave it with the "huh,
> that's weird; this seems to fix it" solution?

We're still testing. We're still on the -31 kernel. We tried the -33 kernel which *might* fix it, but then this happened:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1084264

So now we're testing -34, which is currently proposed. Either way, it's pretty clear that Ubuntu's choice of patches to backport is rather eclectic and a little wonky, or that nailing down load calculations went awry since the NOHZ stuff started, or both. At this point, I wish we'd stayed on CentOS.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On Fri, Nov 30, 2012 at 02:01:45PM -0600, Shaun Thomas wrote:
> On 11/30/2012 01:57 PM, Ben Chobot wrote:
>
> > Hm, this sounds like something we should look into. Before we start
> > digging do you have more to share, or did you leave it with the "huh,
> > that's weird; this seems to fix it" solution?
>
> We're still testing. We're still on the -31 kernel. We tried the -33
> kernel which *might* fix it, but then this happened:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1084264
>
> So now we're testing -34 which is currently proposed. Either way,
> it's pretty clear that Ubuntu's choice of patches to backport is
> rather eclectic and a little wonky, or that nailing down load
> calculations went awry since the NOHZ stuff started, or both. At
> this point, I wish we'd stayed on CentOS.

Or Debian. Not sure what would justify use of Ubuntu as a server, except wanting to have the exact same OS as their personal computers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
On 11/30/2012 02:38 PM, Bruce Momjian wrote:

> Or Debian. Not sure what would justify use of Ubuntu as a server,
> except wanting to have the exact same OS as their personal computers.

Honestly not sure why we went that direction. I'm not in the sysadmin group, though I do work with them pretty closely. I think it was because of the LTS label, and the fact that the packages are quite a bit more recent than Debian stable.

I can say, however, that I'm testing the 3.4 kernel right now, and it seems much better. I may be able to convince them to install that instead if their own tests prove beneficial.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On Fri, Nov 30, 2012 at 12:38 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Or Debian. Not sure what would justify use of Ubuntu as a server,
> except wanting to have the exact same OS as their personal computers.

We have switched from Debian to Ubuntu: there is definitely non-zero value in the PPA hosting (although it's rather terrible in many ways), regular LTS releases (even if you choose not to use them right away, and know they are somewhat buggy at times), and working with AWS and Canonical as organizations (that, most importantly, can interact directly without my own organization) on certain issues. For example, this dog of a bug:

https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/929941

I also frequently take advantage of Debian unstable for backporting of specific packages that are very important to me, so there's a lot of value to me in Ubuntu being quite similar to Debian. In fact, even though I say we 'switched', it's not as though we re-did some entrenched systems from Debian to Ubuntu -- rather, we employ both systems at the same time and I don't recall gnashing of teeth about that, because they are very similar. Yet, there is a clear Ubuntu preference for new systems made today and, to wit, I can't think of anyone with more than the most mild preference for Debian. Conversely, I'd say the preference for Ubuntu for the aforementioned reasons is clear but moderate at most.

Also, there's the similarity to the lap/desktop environment. Often cited with some derision, yet it does add a lot of value, even if people run slightly newer Ubuntus on their non-production computer.

--
fdr
On 01/12/12 11:21, Daniel Farina wrote:
> On Fri, Nov 30, 2012 at 12:38 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Or Debian. Not sure what would justify use of Ubuntu as a server,
>> except wanting to have the exact same OS as their personal computers.
>
> We have switched from Debian to Ubuntu: there is definitely non-zero
> value in the PPA hosting (although it's rather terrible in many ways),
> regular LTS releases (even if you choose not to use them right away,
> and know they are somewhat buggy at times), and working with AWS and
> Canonical as organizations (that, most importantly, can interact
> directly without my own organization) on certain issues. For example,
> this dog of a bug:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/929941
>
> I also frequently take advantage of Debian unstable for backporting of
> specific packages that are very important to me, so there's a lot of
> value to me in Ubuntu being quite similar to Debian. In fact, even
> though I say we 'switched', it's not as though we re-did some
> entrenched systems from Debian to Ubuntu -- rather, we employ both
> systems at the same time and I don't recall gnashing of teeth about
> that, because they are very similar. Yet, there is a clear Ubuntu
> preference for new systems made today and, to wit, I can't think of
> anyone with more than the most mild preference for Debian. Conversely,
> I'd say the preference for Ubuntu for the aforementioned reasons is
> clear but moderate at most.
>
> Also, there's the similarity to the lap/desktop environment. Often
> cited with some derision, yet it does add a lot of value, even if
> people run slightly newer Ubuntus on their non-production computer.

+1

We have gone through pretty much the same process in the last couple of years. Most of our new systems run Ubuntu, some Debian.

There is definitely value in running the "same" system on the desktop too - it often makes bug replication ridiculously easy (no having to find the appropriate test environment, no asking if I can hammer/punish/modify it, and no need to even spin up a VM).

Cheers

Mark
Hmm - not strictly true as stated: 1 SSD will typically do 500MB/s sequential read/write. 1 HDD will be lucky to get 1/3 of that.

We are looking at replacing 4 to 6 disk RAID10 arrays of HDD with a RAID1 pair of SSD, as they perform about the same for sequential work and vastly better at random. Plus they only use 2x 2.5" slots (or, ahem, 2x PCIe sockets), so they allow smaller form factor servers and save on power and cooling.

Cheers

Mark

On 30/11/12 23:07, Vitalii Tymchyshyn wrote:
> Oh, yes. I don't imagine DB server without RAID+BBU :)
> When there is no BBU, SSD can be handy.
> But you know, SSD is worse in linear read/write than HDD.
>
> Best regards, Vitalii Tymchyshyn
>
> 2012/11/30 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
>
>     Most modern SSD are much faster for fsync type operations than a
>     spinning disk - similar performance to spinning disk + writeback
>     raid controller + battery.
>
>     However as you mention, they are great at random IO too, so Niels,
>     it might be worth putting your postgres logs *and* data on the SSDs
>     and retesting.
Well, it seems that my data may be outdated, sorry for that. I've just checked performance numbers on Tom's Hardware, and it seems that the best SSDs really do 500 MB/s. Some others do 100. So, I'd say one must choose wisely (as always :-) ).
Best regards,
Vitalii Tymchyshyn
Yeah, this area is changing very fast! I agree - choosing carefully is important, as there are still plenty of older models around that are substantially slower. Also, choice of motherboard chipset can strongly affect overall performance too. The 6 Gbit/s ports on Sandy Bridge and Ivy Bridge mobos [1] seem to get close to that rated performance out of the SSDs that I've tested (Crucial m4, Intel various).

Cheers

Mark

[1] Which I think are actually Intel or Marvell controllers.

On 03/12/12 00:14, Vitalii Tymchyshyn wrote:
> Well, it seems that my data can be outdated, sorry for that. I've just
> checked performance numbers on Tom's hardware and it seems that best sad
> really do 500 MB/s. Some others do 100. So, I'd say one must choose wisely
> (as always :-) ).
>
> Best regards,
> Vitalii Tymchyshyn
>
> 1 груд. 2012 00:43, "Mark Kirkwood" <mark.kirkwood@catalyst.net.nz> wrote:
>
>> Hmm - not strictly true as stated: 1 SSD will typically do 500MB/s
>> sequential read/write. 1 HDD will be lucky to get a 1/3 that.
>>
>> We are looking at replacing 4 to 6 disk RAID10 arrays of HDD with a RAID1
>> pair of SSD, as they perform about the same for sequential work and vastly
>> better at random. Plus they only use 2x 2.5" slots (or, ahem 2x PCIe
>> sockets), so allow smaller form factor servers and save on power and
>> cooling.
>>
>> Cheers
>>
>> Mark
>>
>> On 30/11/12 23:07, Vitalii Tymchyshyn wrote:
>>
>>> Oh, yes. I don't imagine DB server without RAID+BBU :)
>>> When there is no BBU, SSD can be handy.
>>> But you know, SSD is worse in linear read/write than HDD.
>>>
>>> Best regards, Vitalii Tymchyshyn
>>>
>>> 2012/11/30 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
>>>
>>>     Most modern SSD are much faster for fsync type operations than a
>>>     spinning disk - similar performance to spinning disk + writeback
>>>     raid controller + battery.
>>>
>>>     However as you mention, they are great at random IO too, so Niels,
>>>     it might be worth putting your postgres logs *and* data on the SSDs
>>>     and retesting.
On 30/11/2012 at 17.06, Shaun Thomas <sthomas@optionshouse.com> wrote:

> On 11/30/2012 09:44 AM, Niels Kristian Schjødt wrote:
>
> Just a note on your iostat numbers. The first reading is actually just a summary. You want the subsequent readings.
>
>> The pgsql_tmp dir is not changing at all it's constantly empty (a size
>> of 4.0K).
>
> Good.
>
>> Filesystem 1K-blocks Used Available Use% Mounted on
>> /dev/md3 230619228 5483796 213420620 3% /ssd
>
> Good.
>
> You could just be seeing lots of genuine activity. But going back on the thread, I remember seeing this in your postgresql.conf:
>
> shared_buffers = 7680MB
>
> Change this to:
>
> shared_buffers = 4GB
>
> I say that because you mentioned you're using Ubuntu 12.04, and we were having some problems with PG on that platform. With shared_buffers over 4GB, it starts doing really weird things to the memory subsystem. Whatever it does causes the kernel to purge cache rather aggressively. We saw a 60% reduction in read IO by reducing shared_buffers to 4GB. Without as many reads, your writes should be much less disruptive.
>
> You'll need to restart PG to adopt that change.
>
> But I encourage you to keep iostat running in a terminal window so you can watch it for a while. It's very revealing.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas@optionshouse.com

Couldn't this be because you haven't changed these: http://www.postgresql.org/docs/9.2/static/kernel-resources.html ?

I have changed the following in my configuration:

kernel.shmmax = 8589934592 #(8GB)
kernel.shmall = 17179869184 #(16GB)
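One caveat worth flagging on those sysctl values: on Linux, kernel.shmmax is specified in bytes, but kernel.shmall is specified in pages (usually 4 kB each, see `getconf PAGE_SIZE`), so a byte-sized shmall vastly overshoots. A sketch along the lines of the 9.2 kernel-resources page, with illustrative values:

```ini
# /etc/sysctl.conf sketch -- illustrative values, assuming a 4 kB page size.
# kernel.shmmax is in bytes; kernel.shmall is in pages.
kernel.shmmax = 8589934592   # bytes: largest single shared memory segment, 8 GB
kernel.shmall = 4194304      # pages: 4194304 * 4 kB = 16 GB total shared memory
```

Apply with `sudo sysctl -p`; no reboot is needed.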