Thread: Optimize update query
Hi, i have these update queries, that run very often, and takes too long time, in order for us to reach the throughput weare aiming at. However, the update query is very simple, and I can't figure out any way to improve the situation. The querylooks like this: UPDATE "adverts" SET "last_observed_at" = '2012-11-28 00:02:30.265154', "data_source_id" ='83d024a57bc2958940f3ca281bddcbf4'WHERE"adverts"."id" IN ( 1602382, 4916432, 3221246, 4741057, 3853335, 571429, 3222740,571736, 3544903, 325378,5774338, 5921451, 4295768, 3223170, 5687001, 4741966, 325519, 580867, 325721, 4412200, 4139598,325567, 1616653,1616664, 6202007, 3223748, 325613, 3223764, 325615, 4296536, 3854595, 4971428, 3224146, 5150522,4412617, 5073048,325747, 325771, 1622154, 5794384, 5736581, 1623767, 5686945, 3224627, 5073009, 3224747, 3224749,325809, 5687051,3224811, 5687052, 4917824, 5073013, 3224816, 3224834, 4297331, 1623907, 325864, 1623947, 6169706,325869, 325877,3225074, 3225112, 325893, 325912, 3225151, 3225184, 3225175, 1624659, 325901, 4033926, 325904, 325911,4412835,1624737, 5073004, 5921434, 325915, 3225285, 3225452, 4917672, 1624984, 3225472, 325940, 5380611, 325957, 5073258,3225500,1625002, 5923489, 4413009, 325952, 3961122, 3637777 ) ; An explain outputs me the following: "Update on adverts (cost=0.12..734.27 rows=95 width=168)" " -> Index Scan using adverts_pkey on adverts (cost=0.12..734.27 rows=95 width=168)" " Index Cond: (id = ANY 
('{1602382,4916432,3221246,4741057,3853335,571429,3222740,571736,3544903,325378,5774338,5921451,4295768,3223170,5687001,4741966,325519,580867,325721,4412200,4139598,325567,1616653,1616664,6202007,3223748,325613,3223764,325615,4296536,3854595,4971428,3224146,5150522,4412617,5073048,325747,325771,1622154,5794384,5736581,1623767,5686945,3224627,5073009,3224747,3224749,325809,5687051,3224811,5687052,4917824,5073013,3224816,3224834,4297331,1623907,325864,1623947,6169706,325869,325877,3225074,3225112,325893,325912,3225151,3225184,3225175,1624659,325901,4033926,325904,325911,4412835,1624737,5073004,5921434,325915,3225285,3225452,4917672,1624984,3225472,325940,5380611,325957,5073258,3225500,1625002,5923489,4413009,325952,3961122,3637777}'::integer[]))"

So as you can see, it's already pretty optimized; it's just not enough :-) So what can I do? The two columns last_observed_at and data_source_id each have an index, and those indexes are needed elsewhere, so I can't drop them.

PS. I'm on Postgres 9.2 on a server with 32 GB RAM, 8 cores and two 3 TB disks in a software RAID 1 setup. Is the only way out of this really an SSD?
On 11/28/2012 06:57 AM, Niels Kristian Schjødt wrote:

Before I go crazy here... you really need to tell us what "not enough" means. You didn't provide an EXPLAIN ANALYZE, so we don't know what your actual performance is. But I have my suspicions.

> So as you can see, it's already pretty optimized, it's just not
> enough :-) So what can I do? the two columns last_observed_at and
> data_source_id has an index, and it is needed elsewhere, so I can't
> delete those.

Ok, so part of your problem is that you're tying an advertising system directly to the database for direct updates. That's a big no-no. Any time you get a huge influx of views, there would be a logjam. You need to decouple this so you can use a second tool to load the database in larger batches. You'll get much higher throughput this way.

If you absolutely must use this approach, you're going to have to beef up your hardware.

> PS. I'm on postgres 9.2 on a server with 32gb ram, 8 cores and two 3T
> disks in a software raid 1 setup.

This is not sufficient for a high-bandwidth stream of updates. Not even close. Even if those 3T disks are 7200 RPM, and even in RAID-1, you're going to have major problems with concurrent reads and writes. You need to do several things:

1. Move your transaction logs (pg_xlog) to another pair of disks entirely. Do not put these on the same disks as your data if you need high write throughput.

2. Get a better disk architecture. You need 10k or 15k RPM disks. Starting with 6 or more of them in a RAID-10 would be a good beginning.

You never told us your postgresql.conf settings, so I'm just going with very generic advice. Essentially, you're expecting too much for too little. That machine would have been low-spec three years ago, and unsuited to database use simply due to the 2-disk RAID.

> Is the only way out of this really a SSD disk?

No. There are many, many steps you can and should take before going this route.
You need to know the problem you're solving before making potentially expensive hardware decisions. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
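Shaun's "decouple and batch" advice can be sketched in outline: instead of issuing one UPDATE per observed advert, buffer the advert ids and flush them in one statement per batch. This is an illustrative Python sketch, not code from the thread; `execute` stands in for a real database call (e.g. psycopg2's `cursor.execute`, which adapts a Python list for `ANY(%s)`), and the batch size is an arbitrary assumption.

```python
from datetime import datetime, timezone

class BatchedObserver:
    """Buffer advert ids and flush them as one UPDATE per batch,
    instead of one UPDATE per observed advert."""

    def __init__(self, execute, batch_size=500):
        self.execute = execute      # callable(sql, params); a real app would pass cursor.execute
        self.batch_size = batch_size
        self.pending = []

    def observe(self, advert_id):
        self.pending.append(advert_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One statement updates the whole batch; id = ANY(%s) keeps the SQL
        # text constant so the server can reuse the plan.
        self.execute(
            'UPDATE adverts SET last_observed_at = %s WHERE id = ANY(%s)',
            (datetime.now(timezone.utc), self.pending),
        )
        self.pending = []
```

The same idea scales up to a separate loader process fed from a queue, which is what the "second tool" suggestion amounts to.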
On 28.11.2012 15:07, Shaun Thomas wrote:
> Ok, so part of your problem is that you're tying an advertising system
> directly to the database for direct updates. That's a big no-no. Any
> time you get a huge influx of views, there would be a logjam. You need
> to decouple this so you can use a second tool to load the database in
> larger batches. You'll get much higher throughput this way.

+1. SQL databases have a limited number of inserts/updates per second. Even with high-end hardware you won't get more than XXX operations per second. As Thomas said, you should feed something like a NoSQL database from the www server and use another tool to do aggregation and batch inserts to PostgreSQL. It will scale much better.

Marcin
Niels Kristian Schjødt wrote:

> PS. I'm on postgres 9.2 on a server with 32gb ram, 8 cores and
> two 3T disks in a software raid 1 setup.

In addition to the excellent advice from Shaun, I would like to point out a few other things.

One query runs on one core. In a test of a single query, the other seven cores aren't doing anything. Be sure to pay attention to how a representative workload is handled.

Unless you have tuned your postgresql.conf settings, you probably aren't taking very good advantage of that RAM.

For heavy load you need lots of spindles and a good RAID controller with battery-backed cache configured for write-back.

You will probably benefit from reading this page: http://wiki.postgresql.org/wiki/SlowQueryQuestions

If you don't already have it, you will probably find Greg Smith's book on PostgreSQL performance a great investment: http://www.postgresql.org/docs/books/

-Kevin
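Kevin's "one query runs on one core" observation suggests splitting the daily batch across several connections. A hedged sketch of just the chunking step (the worker count and function name are illustrative, not from the thread):

```python
def chunk_ids(ids, workers):
    """Split a batch of advert ids into roughly equal chunks, one per worker
    connection, so several cores can apply the UPDATEs in parallel."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    if not ids:
        return []
    size = -(-len(ids) // workers)  # ceiling division
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Each chunk would then go to `UPDATE ... WHERE id = ANY(chunk)` on its own connection. Note that parallel writers only help if the disks can keep up, which is exactly the constraint discussed later in the thread.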
I assume that SQL databases (banks? telecom?) can handle a used car shop. No need for an unstructured data tool.
> +1. SQL databases have a limited number of inserts/updates per second. Even
> with high-end hardware you won't get more than XXX operations per
> second. As Thomas said, you should feed something like a NoSQL database
> from the www server and use another tool to do aggregation and batch inserts
> to PostgreSQL. It will scale much better.
>
> Marcin
Okay guys,
Thanks for all the great help and advice already! Let me clear a few things up, to make my question a little easier to answer :-)

My site is a search engine for used cars, not just a car shop with a few hundred cars. The update query you're looking at is executed once a day, in chunks, for all active adverts, so we know they are still for sale (one car can be advertised in several places, hence several "adverts"). So it's not a "constant stream", but it has a fairly high volume, especially at night time.

A compressed version of my .conf looks like this (note: there are some tweaks at the end of the file):
data_directory = '/var/lib/postgresql/9.2/main'
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.2-main.pid'
listen_addresses = '192.168.0.2, localhost'
port = 5432
max_connections = 1000
unix_socket_directory = '/var/run/postgresql'
wal_level = hot_standby
synchronous_commit = off
archive_mode = on
archive_command = 'rsync -a %p postgres@192.168.0.4:/var/lib/postgresql/9.2/wals/%f </dev/null'
max_wal_senders = 1
wal_keep_segments = 32
logging_collector = on
log_min_messages = debug1
log_min_error_statement = debug1
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_lock_waits = on
log_temp_files = 0
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 22GB
work_mem = 160MB
wal_buffers = 4MB
checkpoint_segments = 16
shared_buffers = 7680MB
# All the log stuff is mainly temporary requirement for pgBadger
# The database has been tuned with pgtuner
You might be familiar with New Relic; I use it for quite a lot of monitoring. This is what I see at night time (a lot of I/O), so I went to play around with pgBadger to get some insights at the database level.
https://rpm.newrelic.com/public/charts/h2dtedghfsv
This shows me that by far the most time-consuming queries are updates (in general). On average, a query like the one I showed you takes 1.3 s (but often it takes several minutes, which makes me wonder). So correct me if I'm wrong here: my theory is that I have too many too-slow update queries, which then often end up in a situation where they "wait" for each other to finish, hence the sometimes VERY long execution times. So my basic idea is that if I could reduce the cost of the updates, I could get higher throughput overall.
Here is a sample of the pgBadger analysis:
Queries that took up the most time:
Rank 1: total 1d15h28m38.71s, executed 948,711 times, avg 0.15s
COMMIT;

Rank 2: total 1d2h17m55.43s, executed 401,002 times, avg 0.24s
INSERT INTO "car_images" ( "car_id", "created_at", "image", "updated_at" ) VALUES ( '', '', '', '' ) returning "id";

Rank 3: total 23h18m33.68s, executed 195,093 times, avg 0.43s
SELECT DISTINCT "cars".id FROM "cars" LEFT OUTER JOIN "adverts" ON "adverts"."car_id" = "cars"."id" LEFT OUTER JOIN "sellers" ON "sellers"."id" = "adverts"."seller_id" WHERE "cars"."sales_state" = '' AND "cars"."year" = 0 AND "cars"."engine_size" = 0.0 AND ( ( "cars"."id" IS NOT NULL AND "cars"."brand" = '' AND "cars"."model_name" = '' AND "cars"."fuel" = '' AND "cars"."km" = 0 AND "cars"."price" = 0 AND "sellers"."kind" = '' ) ) LIMIT 0;

Rank 4: total 22h45m26.52s, executed 3,374,133 times, avg 0.02s
SELECT "adverts".* FROM "adverts" WHERE ( source_name = '' AND md5 ( url ) = md5 ( '' ) ) LIMIT 0;

Rank 5: total 10h31m37.18s, executed 29,671 times, avg 1.28s
UPDATE "adverts" SET "last_observed_at" = '', "data_source_id" = '' WHERE "adverts"."id" IN ( ... ) ;

Rank 6: total 7h18m40.65s, executed 396,393 times, avg 0.07s
UPDATE "cars" SET "updated_at" = '' WHERE "cars"."id" = 0;

Rank 7: total 7h6m7.87s, executed 241,294 times, avg 0.11s
UPDATE "cars" SET "images_count" = COALESCE ( "images_count", 0 ) + 0 WHERE "cars"."id" = 0;

Rank 8: total 6h56m11.78s, executed 84,571 times, avg 0.30s
INSERT INTO "failed_adverts" ( "active_record_object_class", "advert_candidate", "created_at", "exception_class", "exception_message", "from_rescraper", "last_retried_at", "retry_count", "source_name", "stack_trace", "updated_at", "url" ) VALUES ( NULL, '', '', '', '', NULL, NULL, '', '', '', '', '' ) returning "id";

Rank 9: total 5h47m25.45s, executed 188,402 times, avg 0.11s
INSERT INTO "adverts" ( "availability_state", "car_id", "created_at", "data_source_id", "deactivated_at", "first_extraction", "last_observed_at", "price", "seller_id", "source_id", "source_name", "updated_at", "url" ) VALUES ( '', '', '', '', NULL, '', '', '', '', '', '', '', '' ) returning "id";

Rank 10: total 3h4m26.86s, executed 166,235 times, avg 0.07s
UPDATE "adverts" SET "deactivated_at" = '', "availability_state" = '', "updated_at" = '' WHERE "adverts"."id" = 0;
(Yes I'm already on the task of improving the selects)
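To put the rank-5 numbers in perspective, a bit of arithmetic helps (assuming each IN list holds about 95 ids, as in the query at the top of the thread; that batch size is an assumption, not something pgBadger reports):

```python
executions = 29_671       # times the batched UPDATE ran (from pgBadger)
avg_seconds = 1.28        # average duration per statement (from pgBadger)
ids_per_update = 95       # assumed batch size, from the example query

rows_touched = executions * ids_per_update
ms_per_row = avg_seconds / ids_per_update * 1000

print(f"~{rows_touched:,} rows updated in total")   # ~2.8 million rows
print(f"~{ms_per_row:.1f} ms per updated row")
```

Roughly 13 ms per updated row is in the neighborhood of a single 7200 RPM disk seek, which is consistent with the IO-bound diagnosis that emerges later in the thread.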
max_connections = 1000 looks bad... why not put a pooler in place?

Cheers
Bèrto

On 28 November 2012 16:19, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> max_connections = 1000

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.
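A pooler such as pgbouncer (named here only as a common example; the thread just says "a pooler") would sit in front of Postgres so max_connections can drop to a few dozen server-side connections. A minimal illustrative pgbouncer.ini sketch; the database name, paths, and pool size are assumptions, not from the thread:

```ini
[databases]
; route the app's database through the pooler (hypothetical names)
autouncle = host=127.0.0.1 port=5432 dbname=autouncle

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20   ; far fewer server connections than max_connections = 1000
```

The application then connects to port 6432 instead of 5432, and a thousand client connections share twenty real backends.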
On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote:

> https://rpm.newrelic.com/public/charts/h2dtedghfsv

Doesn't this answer your question? That iowait is crushing your server into the ground. It's no surprise updates are taking several seconds. That update you sent us *should* execute on the order of only a few milliseconds.

So I'll reiterate that you *must* move your pg_xlog location elsewhere. You've got row lookup bandwidth conflicting with writes. There are a couple of other changes you should probably make to your config:

> checkpoint_segments = 16

This is not enough for the workload you describe. Every time the database checkpoints, all of those changes in pg_xlog are applied to the backend data files. You should set these values:

checkpoint_segments = 100
checkpoint_timeout = 10m
checkpoint_completion_target = 0.9

This will reduce your overall write workload and make it less bursty. Too many checkpoints massively reduce write throughput. With the settings you have, it's probably checkpointing constantly while your load runs. Start with this, but experiment with increasing checkpoint_segments further.

If you check your logs now, you probably see a ton of "checkpoint starting: xlog" in there. That's very bad. It should say "checkpoint starting: time", meaning it's keeping up with your writes naturally.

> work_mem = 160MB

This is probably way too high. work_mem is used for every sort operation in a query, so each connection could have several of these allocated, starving your system of memory and reducing what's available for the page cache. Change it to 8MB, and increase it in small increments if necessary.

> So correct me if I'm wrong here: my theory is, that I have too many
> too slow update queries, that then often end up in a situation, where
> they "wait" for each other to finish, hence the sometimes VERY long
> execution times.

Sometimes this is the case, but for you, you're running into IO contention, not lock contention.
Your 3TB RAID-1 is simply insufficient for this workload.

If you check your logs after making the changes I've suggested, take a look at your checkpoint sync times. That will tell you how long it took the kernel to physically commit those blocks to disk and get a confirmation back from the controller. If those take longer than a second or two, you're probably running into controller buffer overflows.

You have a large amount of RAM, so you should also make these two kernel changes to sysctl.conf:

vm.dirty_ratio = 10
vm.dirty_writeback_ratio = 1

Then run this:

sysctl -p

This will help prevent large IO write spikes caused when the kernel decides to write out dirty memory. That can make checkpoints take minutes to commit in some cases, which basically stops all write traffic to your database entirely.

That should get you going, anyway. You still need more/better disks so you can move your pg_xlog directory. With your write load, that will make a huge difference.

--
Shaun Thomas
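Shaun's "check your logs for checkpoint starting: xlog" can be automated with a small scan. An illustrative Python sketch; the message strings follow standard PostgreSQL 9.x log output, and the log path in the usage note is an assumption:

```python
from collections import Counter

def checkpoint_causes(log_lines):
    """Count why checkpoints started: 'xlog' means checkpoint_segments was
    exhausted (bad under heavy writes), 'time' means the timeout fired
    (good: the server is keeping up with writes naturally)."""
    marker = "checkpoint starting: "
    causes = Counter()
    for line in log_lines:
        idx = line.find(marker)
        if idx != -1:
            cause = line[idx + len(marker):].split()[0].rstrip(",")
            causes[cause] += 1
    return causes
```

Usage would be something like `checkpoint_causes(open("/var/log/postgresql/postgresql-9.2-main.log"))`; a large "xlog" count relative to "time" confirms the constant-checkpointing diagnosis.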
On 11/28/2012 11:44 AM, Niels Kristian Schjødt wrote:

> Thanks a lot - on the server I already have one additional SSD 250gb
> disk, that I don't use for anything at the moment.

Goooood. An SSD would actually be better for your data, as it follows more random access patterns, and xlogs are more sequential. But it's better than nothing.

And yes, you'd be better off with a RAID-1 of two of these SSDs, because the xlogs are critical to database health. You have your archived copy due to the rsync, which helps. But if you had a crash, there could potentially be a need to replay unarchived transaction logs, and you'd end up with some data loss.

> BTW. as you might have seen from the .conf I have a second slave
> server with the exact same setup, which currently runs as a hot
> streaming replication slave. I might ask a stupid question here, but
> this does not affect the performance of the master does it?

Only if you're using synchronous replication. From what I saw in the config, that isn't the case.

--
Shaun Thomas
Hi, I have started to implement your suggestions. I have a small error so far though. The "vm.dirty_writeback_ratio = 1" command returns:

error: "vm.dirty_writeback_ratio" is an unknown key

I'm on Ubuntu 12.04.
In later kernels these have been renamed:

Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-32-generic x86_64)

$ sysctl -a | grep dirty
vm.dirty_background_ratio = 5
vm.dirty_background_bytes = 0
vm.dirty_ratio = 10
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

You have the option of specifying either a ratio or, more usefully for machines with a lot of RAM, bytes.

Regards
Mark

P.s: People on this list usually prefer it if you *bottom* post (i.e. reply underneath the original).
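On Mark's point about bytes versus ratios: with 32 GB of RAM even a 1% ratio is ~320 MB of dirty pages, so the byte-based knobs give finer control. An illustrative /etc/sysctl.conf fragment; the exact values are assumptions to tune, not numbers endorsed in the thread:

```
# Start background writeback once ~64 MB of dirty pages accumulate
vm.dirty_background_bytes = 67108864

# Block writers once ~512 MB is dirty, instead of 10% of 32 GB
vm.dirty_bytes = 536870912
```

Note the kernel treats the *_bytes and *_ratio variants as mutually exclusive; setting the bytes form zeroes the corresponding ratio.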
Okay, now I'm done with the updating described above. I did the postgresql.conf changes, I did the kernel changes, and I added two SSDs in a software RAID 1 where pg_xlog is now located. Unfortunately the picture is still the same :-(

When the database is under "heavy" load, there is almost no improvement in performance compared to before the changes. A lot of both reads and writes take more than 1000 times as long as they usually do under "lighter" overall load. I added all the overview charts I can get hold of from New Relic beneath. What am I overlooking? There must be an obvious bottleneck.
Where should I dive in?

Database server CPU usage: https://rpm.newrelic.com/public/charts/cEdIvvoQZCr
Database server load average: https://rpm.newrelic.com/public/charts/cMNdrYW51QJ
Database server physical memory: https://rpm.newrelic.com/public/charts/c3dZBntNpa1
Database server disk I/O utilization: https://rpm.newrelic.com/public/charts/9YEVw6RekFG
Database server network I/O (Mb/s): https://rpm.newrelic.com/public/charts/lKiZ0Szmwe7
Top 5 database operations by wall clock time: https://rpm.newrelic.com/public/charts/dCt45YH12FK
Database throughput: https://rpm.newrelic.com/public/charts/bIbtQ1mDzMI
Database response time: https://rpm.newrelic.com/public/charts/fPcNL8WA6xx
Niels Kristian Schjødt wrote:

> Okay, now I'm done the updating as described above. I did the
> postgres.conf changes. I did the kernel changes, i added two
> SSD's in a software RAID1 where the pg_xlog is now located -
> unfortunately the the picture is still the same :-(

You said before that you were seeing high disk wait numbers. Now it is zero according to your disk utilization graph. That sounds like a change to me.

> When the database is under "heavy" load, there is almost no
> improvement to see in the performance compared to before the
> changes.

In client-visible response time and throughput, I assume, not resource usage numbers?

> A lot of both read and writes takes more than a 1000 times as
> long as they usually do, under "lighter" overall load.

As an odd coincidence, you showed your max_connections setting to be 1000.

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

-Kevin
Hehe, I'm sorry if it somehow was misleading. I just wrote "a lot of I/O", but it was CPU I/O wait, as the chart in the link also states. However, as I'm not very familiar with these deep-down database and server things, I had no idea whether a disk bottleneck could hide in this I/O, so I went along with Shaun's great help, which unfortunately didn't solve my issues.

Back to the issue: could it be that I'm using Ubuntu's built-in software RAID for my disks, and that it is not at all capable of handling the throughput?
Hmm, I'm getting suspicious here. Maybe my new great setup with the SSDs is not really working as it should, and maybe New Relic is not monitoring as it should. If I do a "sudo iostat -k 1" I get a lot of output like this:

Device:  tps      kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda      0.00     0.00       0.00       0        0
sdb      0.00     0.00       0.00       0        0
sdc      546.00   2296.00    6808.00    2296     6808
sdd      593.00   1040.00    7416.00    1040     7416
md1      0.00     0.00       0.00       0        0
md0      0.00     0.00       0.00       0        0
md2      1398.00  3328.00    13064.00   3328     13064
md3      0.00     0.00       0.00       0        0

The thing is, sda and sdb are the SSD drives and sdc and sdd are the HDD drives. md0, md1 and md2 are the RAID arrays on the HDDs, and md3 is the RAID on the SSDs. Neither md3 nor the SSDs are getting utilized, and I would expect them to be, since they are serving my pg_xlog, right? So maybe I did something wrong in the setup. Here is the path I followed:

# 1) First set up the SSD drives in a software RAID 1:
# http://askubuntu.com/questions/223194/setup-of-two-additional-ssd-drives-in-raid-1
#
# 2) Then move the postgres pg_xlog dir
sudo /etc/init.d/postgresql-9.2 stop
sudo mkdir -p /ssd/pg_xlog
sudo chown -R postgres.postgres /ssd/pg_xlog
sudo chmod 700 /ssd/pg_xlog
sudo cp -rf /var/lib/postgresql/9.2/main/pg_xlog/* /ssd/pg_xlog
sudo mv /var/lib/postgresql/9.2/main/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog_old
sudo ln -s /ssd/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
sudo /etc/init.d/postgresql-9.2 start

Can you spot something wrong?
Actually, what's the point in putting the logs on SSD? SSDs are good for random access, while logs are written sequentially. I'd put the tablespaces on SSD and leave the logs on HDD.
On 30 Nov 2012 at 04:33, "Niels Kristian Schjødt" <nielskristian@autouncle.com> wrote:
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Most modern SSDs are much faster for fsync-type operations than a spinning disk - similar performance to spinning disk + write-back RAID controller + battery. However, as you mention, they are great at random IO too, so Niels, it might be worth putting your postgres logs *and* data on the SSDs and retesting.

Regards

Mark

On 30/11/12 21:37, Vitalii Tymchyshyn wrote:
> Actually, what's the point in putting logs to ssd? SSDs are good for
> random access and logs are accessed sequentially. I'd put table spaces
> on ssd and leave logs on hdd
When I try your command sequence I end up with the contents of the new pg_xlog owned by root, and Postgres will not start:

PANIC: could not open file "pg_xlog/000000010000000600000080" (log file 6, segment 128): Permission denied

While this is fixable, I suspect you have managed to leave the xlog directory that Postgres is actually using on the HDD drives. When I do this I normally do:

$ service postgresql stop
$ sudo mkdir -p /ssd/pg_xlog
$ sudo chown -R postgres.postgres /ssd/pg_xlog
$ sudo chmod 700 /ssd/pg_xlog
$ sudo su - postgres
postgres $ mv /var/lib/postgresql/9.2/main/pg_xlog/* /ssd/pg_xlog
postgres $ rmdir /var/lib/postgresql/9.2/main/pg_xlog
postgres $ ln -s /ssd/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
postgres $ service postgresql start

regards

Mark

On 30/11/12 15:32, Niels Kristian Schjødt wrote:
> Hmm I'm getting suspicious here. Maybe my new great setup with the SSD's is not really working as it should, and maybe new relic is not monitoring as it should...
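Mark's checks can be bundled into a quick sanity test to run before starting Postgres again. This is a sketch only - the function name is made up, the paths and the "postgres" owner come from the thread, and `stat -c %U` assumes GNU coreutils:

```shell
# check_xlog_move LINK OWNER: verify that the relocated pg_xlog is a symlink
# pointing at an existing directory owned by OWNER (normally "postgres"),
# so the server won't PANIC with "Permission denied" on startup.
check_xlog_move() {
    link="$1"; owner="$2"
    [ -L "$link" ] || { echo "FAIL: $link is not a symlink"; return 1; }
    target=$(readlink -f "$link")
    [ -d "$target" ] || { echo "FAIL: target $target is not a directory"; return 1; }
    actual=$(stat -c %U "$target")
    [ "$actual" = "$owner" ] || { echo "FAIL: $target owned by $actual, expected $owner"; return 1; }
    echo "OK: $link -> $target (owner $owner)"
}

# e.g.: check_xlog_move /var/lib/postgresql/9.2/main/pg_xlog postgres
```

Run as root (or the postgres user) after the move and before `service postgresql start`.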
Oh, yes. I can't imagine a DB server without RAID+BBU :)
--
Best regards,
Vitalii Tymchyshyn
When there is no BBU, an SSD can be handy.
But you know, an SSD is worse at linear read/write than an HDD.

Best regards, Vitalii Tymchyshyn
2012/11/30 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
On 30/11/12, Vitalii Tymchyshyn wrote:
> Actually, what's the point in putting logs on SSD? SSDs are good for random access and logs are accessed sequentially. I'd put tablespaces on SSD and leave logs on HDD.
Because SSDs are considered faster, you put the most physically IO-intensive operations on SSD. For the majority of databases, these are the log files. But you should investigate where the optimum is for your situation.
SSDs are not faster for sequential IO, as far as I know. That's why (with a BBU, or with synchronous_commit=off) I prefer to have the logs on regular HDDs.
--
Best regards,
Vitalii Tymchyshyn
Niels Kristian Schjødt wrote:

>> You said before that you were seeing high disk wait numbers. Now
>> it is zero according to your disk utilization graph. That
>> sounds like a change to me.
>
> Hehe, I'm sorry if it somehow was misleading, I just wrote "a lot
> of I/O" it was CPU I/O

>>> A lot of both reads and writes take more than 1000 times as
>>> long as they usually do, under "lighter" overall load.
>>
>> As an odd coincidence, you showed your max_connections setting
>> to be 1000.
>>
>> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>
> Back to the issue: Could it be that it is the fact that I'm using
> ubuntus built in software raid to raid my disks, and that it is
> not at all capable of handling the throughput?

For high performance situations I would always use a high quality RAID controller with battery-backed RAM configured for write-back; however: the graphs you included suggest that your problem has nothing to do with your storage system. Now, maybe you didn't capture the data for the graphs while the problem was occurring, in which case the graphs would be absolutely useless; but based on what slim data you have provided, you need a connection pool (like maybe pgbouncer configured in transaction mode) to limit the number of database connections used to something like twice the number of cores.

If you still have problems, pick the query which is using the most time on your database server, and post it with the information suggested on this page:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin
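For what it's worth, the wiki page Kevin links gives (as far as I recall) a starting-point formula of roughly ((core_count * 2) + effective_spindle_count) for the number of active connections. A back-of-the-envelope check for this box:

```shell
# Rule-of-thumb pool sizing; the formula is the one from the wiki page above,
# and counting the RAID1 pair as 2 effective spindles is an assumption.
cores=8
spindles=2
pool_size=$(( cores * 2 + spindles ))
echo "$pool_size"   # prints 18 - a far cry from max_connections = 1000
```

Treat the result as a starting point to benchmark around, not a hard rule.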
Okay. So, to understand this better before I go with that solution: in theory, what difference should it make to the performance to have a pool in front of the database that all my workers and web servers connect to, instead of connecting directly? Where is the performance gain coming from in that situation?

On 30/11/2012 at 13.03, "Kevin Grittner" <kgrittn@mail.com> wrote:
> ... based on what slim data you have provided, you need a connection
> pool (like maybe pgbouncer configured in transaction mode) to limit
> the number of database connections used to something like twice the
> number of cores.
On 11/30/2012 07:31 AM, Niels Kristian Schjødt wrote:
> In theory what difference should it make to the performance, to have
> a pool in front of the database, that all my workers and web servers
> connect to instead of connecting directly? Where is the performance
> gain coming from in that situation?

If you have several more connections than you have processors, the database does a *lot* more context switching, and among other things, that drastically reduces PG performance. On a testbed, I can get over 150k transactions per second on PG 9.1 with a 1:1 relationship between CPU and client. Increase that to a few hundred, and my TPS drops down to 30k. Simply having the clients there kills performance.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
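A minimal pgbouncer configuration along the lines Kevin and Shaun suggest might look like this. A sketch only: the database name, auth file path, and pool size are illustrative, not taken from this thread:

```ini
; pgbouncer.ini (sketch; adjust names and paths to your setup)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb   ; hypothetical database name

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction     ; as Kevin suggested
default_pool_size = 16      ; roughly 2x the 8 cores on this box
max_client_conn = 1000      ; the app may still open many client connections;
                            ; only default_pool_size of them hit Postgres at once
```

The workers and web servers then connect to port 6432 instead of 5432; Postgres itself can run with a much lower max_connections.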
On 11/29/2012 08:32 PM, Niels Kristian Schjødt wrote:
> Neither of the md3 or the SSD's are getting utilized - and I should
> expect that since they are serving my pg_xlog right?

No, that's right. They are, but it would appear that the majority of your traffic actually isn't due to transaction logs like I'd suspected. If you get a chance, could you monitor the contents of:

/var/lib/postgresql/9.2/main/base/pgsql_tmp

Your main drives are getting way, way more writes than they should. 13MB per second is ridiculous even under heavy write loads. Based on the TPS count, you're basically saturating the ability of those two 3TB drives. Those writes have to be coming from somewhere.

> # sudo mkdir -p /ssd/pg_xlog

This is going to sound stupid, but are you *sure* the SSD is mounted at /ssd?

> # sudo chown -R postgres.postgres /ssd/pg_xlog
> # sudo chmod 700 /ssd/pg_xlog
> # sudo cp -rf /var/lib/postgresql/9.2/main/pg_xlog/* /ssd/pg_xlog
> # sudo mv /var/lib/postgresql/9.2/main/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog_old
> # sudo ln -s /ssd/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
> # sudo /etc/init.d/postgresql-9.2 start

The rest of this is fine, except that you probably should have added:

sudo chown -R postgres:postgres /ssd/pg_xlog/*

--
Shaun Thomas
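Shaun's mount question can be answered mechanically. A sketch (the helper name is made up; `mountpoint` is from util-linux, standard on Ubuntu):

```shell
# is_mounted DIR: succeed only when DIR is a real mountpoint, i.e. backed by
# its own filesystem, rather than a plain directory silently living on the
# parent disk.
is_mounted() {
    mountpoint -q "$1"
}

# e.g.: is_mounted /ssd || echo "/ssd is NOT mounted - did 'mount -a' run?"
```

If /ssd is just an empty directory on the root filesystem, everything written "to the SSD" actually lands on the HDDs.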
On 11/30/2012 02:37 AM, Vitalii Tymchyshyn wrote:
> Actually, what's the point in putting logs to ssd? SSDs are good for
> random access and logs are accessed sequentially.

While this is true, Niels' problem is that his regular HDs are getting saturated. In that case, moving any activity off of them is an improvement.

Why not move the data to the SSDs, you ask? Because he bought two 3TB drives. The assumption here is that a 256GB SSD will not have enough space for the long-term lifespan of this database.

Either way, based on the iostat activity he posted, clearly there's some other write stream happening we're not privy to.

--
Shaun Thomas
On 30/11/2012 at 15.02, Shaun Thomas <sthomas@optionshouse.com> wrote:
> This is going to sound stupid, but are you *sure* the SSD is mounted at /ssd?

Oh my, Shaun, once again you nailed it! That's what you get from working too late in the night - I forgot to run 'sudo mount -a'. I feel so embarrassed now :-( In other words, no, the drive was not mounted to the /ssd dir.

So now it is, and this has gained me a performance increase of roughly 20% - a little less than what I would have hoped for, but still better.

I still see a lot of CPU I/O wait when doing a lot of writes, so the question is: what's next? Should I go for the connection pooling thing, or monitor that /var/lib/postgresql/9.2/main/base/pgsql_tmp dir (and what exactly do you mean by monitor - its size?)

PS. A comment on the "Why not move the data to the SSDs": you are exactly right. I don't think the SSDs will be big enough for the data within a not-too-long timeframe, so that is exactly why I want to keep my data on the "big" drives.

PPS. I talked with New Relic and it turns out there is something wrong with their disk monitoring tool; that's why there was nothing in the disk charts even though iostat showed a lot of activity.
On 11/30/2012 08:48 AM, Niels Kristian Schjødt wrote:
> I forgot to run 'sudo mount -a' I feel so embarrassed now :-( - In
> other words no the drive was not mounted to the /ssd dir.

Yeah, that'll get ya.

> I still see a lot of CPU I/O when doing a lot of writes, so the
> question is, what's next. Should I try and go for the connection
> pooling thing or monitor that
> /var/lib/postgresql/9.2/main/base/pgsql_tmp dir (and what exactly do
> you mean by monitor - size?)

Well, like Kevin said, if you have more than a couple dozen connections on your hardware, you're losing TPS. It's probably a good idea to install pgbouncer or pgpool and let your clients connect to those instead. You should see a good performance boost from that.

But what concerns me is that your previous CPU charts showed a lot of iowait. Even with the SSD taking some of the load off your write stream, something else is going on there. That's why you need to monitor the "size" in MB, or the number of files, in the pgsql_tmp directory. That's where PG puts temp files when sorts are too big for your work_mem. If that's getting a ton of activity, that would explain some of your write overhead.

> PPS. I talked with New Relic and it turns out there is something
> wrong with the disk monitoring tool, so that's why there was nothing
> in the disk charts but iostat showed a lot of activity.

Yeah. Next time you need to check IO, use iostat. It's not as pretty, but it tells everything. ;) Just to help out with that, use:

iostat -dmx

That will give you extended information, including the % utilization of your drives. TPS stats are nice, but I was just guessing your drives were stalling out based on experience. Getting an outright percentage is better.

You should also use sar. Just a plain:

sar 1 100

will give you a lot of info on what the CPU is doing. You want that %iowait column to be as low as possible.

Keep us updated.

--
Shaun Thomas
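The pgsql_tmp check is easy to script. A sketch (the helper name is made up; the path is the one from the thread; GNU `du` assumed):

```shell
# pgsql_tmp_kb DIR: print the on-disk size of DIR in kilobytes, so it can be
# sampled (from cron or a while/sleep loop) to spot sorts spilling to disk.
pgsql_tmp_kb() {
    du -sk "$1" | cut -f1
}

# e.g. sample once a minute against the cluster's temp directory:
#   while sleep 60; do
#       echo "$(date +%T) $(pgsql_tmp_kb /var/lib/postgresql/9.2/main/base/pgsql_tmp)"
#   done
```

If the number jumps into the hundreds of MB under load, raising work_mem (or fixing the offending queries) may cut a lot of that write traffic.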
Hmm, very interesting. Currently I'm running at "medium" load compared to the very high loads during the night.

This is what the CPU I/O on New Relic shows: https://rpm.newrelic.com/public/charts/8RnSOlWjfBy

And this is what iostat shows:
Linux 3.2.0-33-generic (master-db) 11/30/2012 _x86_64_ (8 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 3.46 26.62 57.06 1.66 0.68 57.41 0.04 0.43 0.77 0.28 0.09 0.73
sdb 0.03 16.85 0.01 70.26 0.00 2.35 68.36 0.06 0.81 0.21 0.81 0.10 0.73
sdc 1.96 56.37 25.45 172.56 0.53 3.72 43.98 30.83 155.70 25.15 174.96 1.74 34.46
sdd 1.83 56.52 25.48 172.42 0.52 3.72 43.90 30.50 154.11 25.66 173.09 1.74 34.37
md1 0.00 0.00 0.00 0.00 0.00 0.00 3.02 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.57 0.59 0.00 0.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 54.14 227.94 1.05 3.72 34.61 0.00 0.00 0.00 0.00 0.00 0.00
md3 0.00 0.00 0.01 60.46 0.00 0.68 23.12 0.00 0.00 0.00 0.00 0.00 0.00
A little reminder: md3 is the RAID array of the SSD drives (sda and sdb), and md0-md2 are the arrays of the regular HDD drives (sdc and sdd).
The pgsql_tmp dir is not changing at all; it's constantly empty (a size of 4.0K).
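For reference, a minimal sketch of such a check (the path is the default 9.2 layout mentioned earlier in the thread; override PGSQL_TMP if your cluster lives elsewhere):

```shell
#!/bin/sh
# Report size (kB) and top-level file count for a pgsql_tmp directory.
# PGSQL_TMP is an assumed default; point it at your actual cluster path.
PGSQL_TMP=${PGSQL_TMP:-/var/lib/postgresql/9.2/main/base/pgsql_tmp}
size_kb=$(du -sk "$PGSQL_TMP" | awk '{print $1}')
files=$(find "$PGSQL_TMP" -maxdepth 1 -type f | wc -l)
echo "pgsql_tmp: ${size_kb} kB in ${files} file(s)"
```

Run it under `watch -n 5` during a heavy write burst to see whether sort spill files come and go.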
So it doesn't seem like the SSD drives are being utilized at all, but the regular drives certainly are. At least now I know for sure that /ssd is mounted correctly:
"sudo df /ssd"
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md3 230619228 5483796 213420620 3% /ssd
On 11/30/2012 09:44 AM, Niels Kristian Schjødt wrote:

Just a note on your iostat numbers. The first reading is actually just a summary. You want the subsequent readings.

> The pgsql_tmp dir is not changing at all it's constantly empty (a size
> of 4.0K).

Good.

> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/md3 230619228 5483796 213420620 3% /ssd

Good.

You could just be seeing lots of genuine activity. But going back on the thread, I remember seeing this in your postgresql.conf:

shared_buffers = 7680MB

Change this to:

shared_buffers = 4GB

I say that because you mentioned you're using Ubuntu 12.04, and we were having some problems with PG on that platform. With shared_buffers over 4GB, it starts doing really weird things to the memory subsystem. Whatever it does causes the kernel to purge cache rather aggressively. We saw a 60% reduction in read IO by reducing shared_buffers to 4GB. Without as many reads, your writes should be much less disruptive.

You'll need to restart PG to adopt that change.

But I encourage you to keep iostat running in a terminal window so you can watch it for a while. It's very revealing.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
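Concretely, the suggested change is a one-line edit to postgresql.conf:

```ini
# postgresql.conf -- the change suggested above
shared_buffers = 4GB    # down from 7680MB; requires a PostgreSQL restart
```

On Ubuntu's packaging, the restart would be something like `sudo pg_ctlcluster 9.2 main restart`; adjust for your own init setup.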
On Nov 30, 2012, at 8:06 AM, Shaun Thomas wrote:
> I say that because you mentioned you're using Ubuntu 12.04, and we were
> having some problems with PG on that platform. With shared_buffers over
> 4GB, it starts doing really weird things to the memory subsystem.
> Whatever it does causes the kernel to purge cache rather aggressively.
> We saw a 60% reduction in read IO by reducing shared_buffers to 4GB.
> Without as many reads, your writes should be much less disruptive.
Hm, this sounds like something we should look into. Before we start digging, do you have more to share, or did you leave it with the "huh, that's weird; this seems to fix it" solution?
On 11/30/2012 01:57 PM, Ben Chobot wrote:

> Hm, this sounds like something we should look into. Before we start
> digging do you have more to share, or did you leave it with the "huh,
> that's weird; this seems to fix it" solution?

We're still testing. We're still on the -31 kernel. We tried the -33 kernel which *might* fix it, but then this happened:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1084264

So now we're testing -34, which is currently proposed. Either way, it's pretty clear that Ubuntu's choice of patches to backport is rather eclectic and a little wonky, or that nailing down load calculations went awry since the NOHZ stuff started, or both. At this point, I wish we'd stayed on CentOS.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On Fri, Nov 30, 2012 at 02:01:45PM -0600, Shaun Thomas wrote:
> On 11/30/2012 01:57 PM, Ben Chobot wrote:
>
> > Hm, this sounds like something we should look into. Before we start
> > digging do you have more to share, or did you leave it with the "huh,
> > that's weird; this seems to fix it" solution?
>
> We're still testing. We're still on the -31 kernel. We tried the -33
> kernel which *might* fix it, but then this happened:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1084264
>
> So now we're testing -34 which is currently proposed. Either way,
> it's pretty clear that Ubuntu's choice of patches to backport is
> rather eclectic and a little wonky, or that nailing down load
> calculations went awry since the NOHZ stuff started, or both. At
> this point, I wish we'd stayed on CentOS.

Or Debian. Not sure what would justify use of Ubuntu as a server, except wanting to have the exact same OS as their personal computers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
On 11/30/2012 02:38 PM, Bruce Momjian wrote:

> Or Debian. Not sure what would justify use of Ubuntu as a server,
> except wanting to have the exact same OS as their personal computers.

Honestly not sure why we went that direction. I'm not in the sysadmin group, though I do work with them pretty closely. I think it was because of the LTS label, and the fact that the packages are quite a bit more recent than Debian stable.

I can say, however, that I'm testing the 3.4 kernel right now, and it seems much better. I may be able to convince them to install that instead if their own tests prove beneficial.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On Fri, Nov 30, 2012 at 12:38 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Or Debian. Not sure what would justify use of Ubuntu as a server,
> except wanting to have the exact same OS as their personal computers.

We have switched from Debian to Ubuntu: there is definitely non-zero value in the PPA hosting (although it's rather terrible in many ways), regular LTS releases (even if you choose not to use them right away, and know they are somewhat buggy at times), and working with AWS and Canonical as organizations (that, most importantly, can interact directly without my own organization) on certain issues. For example, this dog of a bug:

https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/929941

I also frequently take advantage of Debian unstable for backporting of specific packages that are very important to me, so there's a lot of value to me in Ubuntu being quite similar to Debian. In fact, even though I say we 'switched', it's not as though we re-did some entrenched systems from Debian to Ubuntu -- rather, we employ both systems at the same time and I don't recall gnashing of teeth about that, because they are very similar. Yet, there is a clear Ubuntu preference for new systems made today and, to wit, I can't think of anyone with more than the most mild preference for Debian. Conversely, I'd say the preference for Ubuntu for the aforementioned reasons is clear but moderate at most.

Also, there's the similarity to the lap/desktop environment. Often cited with some derision, yet it does add a lot of value, even if people run slightly newer Ubuntus on their non-production computer.

--
fdr
On 01/12/12 11:21, Daniel Farina wrote:
> On Fri, Nov 30, 2012 at 12:38 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Or Debian. Not sure what would justify use of Ubuntu as a server,
>> except wanting to have the exact same OS as their personal computers.
>
> We have switched from Debian to Ubuntu: there is definitely non-zero
> value in the PPA hosting (although it's rather terrible in many ways),
> regular LTS releases (even if you choose not to use them right away,
> and know they are somewhat buggy at times), and working with AWS and
> Canonical as organizations (that, most importantly, can interact
> directly without my own organization) on certain issues. For example,
> this dog of a bug:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/929941
>
> I also frequently take advantage of Debian unstable for backporting of
> specific packages that are very important to me, so there's a lot of
> value to me in Ubuntu being quite similar to Debian. In fact, even
> though I say we 'switched', it's not as though we re-did some
> entrenched systems from Debian to Ubuntu -- rather, we employ both
> systems at the same time and I don't recall gnashing of teeth about
> that, because they are very similar. Yet, there is a clear Ubuntu
> preference for new systems made today and, to wit, I can't think of
> anyone with more than the most mild preference for Debian. Conversely,
> I'd say the preference for Ubuntu for the aforementioned reasons is
> clear but moderate at most.
>
> Also, there's the similarity to the lap/desktop environment. Often
> cited with some derision, yet it does add a lot of value, even if
> people run slightly newer Ubuntus on their non-production computer.

+1

We have gone through pretty much the same process in the last couple of years. Most of our new systems run Ubuntu, some Debian.

There is definitely value in running the "same" system on the desktop too - it often makes bug replication ridiculously easy (no having to find the appropriate test environment, no asking if I can hammer/punish/modify it, and no need to even spin up a VM).

Cheers

Mark
Hmm - not strictly true as stated: 1 SSD will typically do 500MB/s sequential read/write. 1 HDD will be lucky to get 1/3 of that.

We are looking at replacing 4 to 6 disk RAID10 arrays of HDD with a RAID1 pair of SSD, as they perform about the same for sequential work and vastly better at random. Plus they only use 2x 2.5" slots (or, ahem, 2x PCIe sockets), so they allow smaller form factor servers and save on power and cooling.

Cheers

Mark

On 30/11/12 23:07, Vitalii Tymchyshyn wrote:
> Oh, yes. I don't imagine DB server without RAID+BBU :)
> When there is no BBU, SSD can be handy.
> But you know, SSD is worse in linear read/write than HDD.
>
> Best regards, Vitalii Tymchyshyn
>
> 2012/11/30 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
>
>     Most modern SSD are much faster for fsync type operations than a
>     spinning disk - similar performance to spinning disk + writeback
>     raid controller + battery.
>
>     However as you mention, they are great at random IO too, so Niels,
>     it might be worth putting your postgres logs *and* data on the SSDs
>     and retesting.
Well, it seems that my data may be outdated, sorry for that. I've just checked performance numbers on Tom's Hardware, and it seems that the best SSDs really do 500 MB/s. Some others do 100. So, I'd say one must choose wisely (as always :-) ).
Best regards,
Vitalii Tymchyshyn
Yeah, this area is changing very fast! I agree - choosing carefully is important, as there are still plenty of older models around that are substantially slower. Also, choice of motherboard chipset can strongly affect overall performance too. The 6 Gbit/s ports on Sandy Bridge and Ivy Bridge mobos [1] seem to get close to that rated performance out of the SSDs that I've tested (Crucial m4, Intel various).

Cheers

Mark

[1] Which I think are actually Intel or Marvell controllers.

On 03/12/12 00:14, Vitalii Tymchyshyn wrote:
> Well, it seems that my data can be outdated, sorry for that. I've just
> checked performance numbers on Tom's hardware and it seems that best sad
> really do 500 MB/s. Some others do 100. So, I'd say one must choose wisely
> (as always :-) ).
>
> Best regards,
> Vitalii Tymchyshyn
>
> 1 груд. 2012 00:43, "Mark Kirkwood" <mark.kirkwood@catalyst.net.nz> wrote:
>
>> Hmm - not strictly true as stated: 1 SSD will typically do 500MB/s
>> sequential read/write. 1 HDD will be lucky to get a 1/3 that.
>>
>> We are looking at replacing 4 to 6 disk RAID10 arrays of HDD with a RAID1
>> pair of SSD, as they perform about the same for sequential work and vastly
>> better at random. Plus they only use 2x 2.5" slots (or, ahem 2x PCIe
>> sockets), so allow smaller form factor servers and save on power and
>> cooling.
>>
>> Cheers
>>
>> Mark
>>
>> On 30/11/12 23:07, Vitalii Tymchyshyn wrote:
>>
>>> Oh, yes. I don't imagine DB server without RAID+BBU :)
>>> When there is no BBU, SSD can be handy.
>>> But you know, SSD is worse in linear read/write than HDD.
>>>
>>> Best regards, Vitalii Tymchyshyn
>>>
>>> 2012/11/30 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
>>>
>>>     Most modern SSD are much faster for fsync type operations than a
>>>     spinning disk - similar performance to spinning disk + writeback
>>>     raid controller + battery.
>>>
>>>     However as you mention, they are great at random IO too, so Niels,
>>>     it might be worth putting your postgres logs *and* data on the SSDs
>>>     and retesting.
On 30/11/2012 at 17.06, Shaun Thomas <sthomas@optionshouse.com> wrote:

> On 11/30/2012 09:44 AM, Niels Kristian Schjødt wrote:
>
> Just a note on your iostat numbers. The first reading is actually just a summary. You want the subsequent readings.
>
>> The pgsql_tmp dir is not changing at all it's constantly empty (a size
>> of 4.0K).
>
> Good.
>
>> Filesystem 1K-blocks Used Available Use% Mounted on
>> /dev/md3 230619228 5483796 213420620 3% /ssd
>
> Good.
>
> You could just be seeing lots of genuine activity. But going back on the thread, I remember seeing this in your postgresql.conf:
>
> shared_buffers = 7680MB
>
> Change this to:
>
> shared_buffers = 4GB
>
> I say that because you mentioned you're using Ubuntu 12.04, and we were having some problems with PG on that platform. With shared_buffers over 4GB, it starts doing really weird things to the memory subsystem. Whatever it does causes the kernel to purge cache rather aggressively. We saw a 60% reduction in read IO by reducing shared_buffers to 4GB. Without as many reads, your writes should be much less disruptive.
>
> You'll need to restart PG to adopt that change.
>
> But I encourage you to keep iostat running in a terminal window so you can watch it for a while. It's very revealing.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas@optionshouse.com

Couldn't this be because you haven't changed these: http://www.postgresql.org/docs/9.2/static/kernel-resources.html ?

I have changed the following in my configuration:

kernel.shmmax = 8589934592 #(8GB)
kernel.shmall = 17179869184 #(16GB)
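One caveat worth flagging on those sysctl values: on Linux, kernel.shmmax is specified in bytes, but kernel.shmall is specified in pages (usually 4 kB each, see `getconf PAGE_SIZE`), so a byte-sized shmall vastly overshoots. A sketch along the lines of the 9.2 kernel-resources page, with illustrative values:

```ini
# /etc/sysctl.conf sketch -- illustrative values, assuming a 4 kB page size.
# kernel.shmmax is in bytes; kernel.shmall is in pages.
kernel.shmmax = 8589934592   # bytes: largest single shared memory segment, 8 GB
kernel.shmall = 4194304      # pages: 4194304 * 4 kB = 16 GB total shared memory
```

Apply with `sudo sysctl -p`; no reboot is needed.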