Hardware performance for large updates

From: "Josh Berkus"
Date:
Folks,

I'm doing some massive data transformations on PostgreSQL, and they're
a lot slower than they should be.  I'm looking for some tips on
improving things.  If the PGSQL-PERFORMANCE list was ever created,
please tell me and I'll go over there.

The update:  A series of seven UPDATE statements that cull data from a
1.5 million row table to update a 120,000-row table.
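
Each statement is roughly of this shape (table and column names
simplified here for illustration):

   UPDATE case_summary
      SET total_amount = d.sum_amount
     FROM (SELECT case_id, sum(amount) AS sum_amount
             FROM case_detail
            GROUP BY case_id) d
    WHERE case_summary.case_id = d.case_id;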

The Machine:  A dual-processor RAID 5 UW SCSI server.

The postgresql.conf settings:
Connections: 128
Shared Buffers: 256
Sort Mem: 1024
Checkpoint Segments: 16
Stats on.
Light debug logging.
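
Spelled out in postgresql.conf terms (the last two parameter names are
my reading of the 7.2-era docs for "stats on" and "light debug"):

   max_connections = 128
   shared_buffers = 256          # 256 * 8K buffers
   sort_mem = 1024               # KB per sort
   checkpoint_segments = 16
   stats_start_collector = true  # "stats on"
   debug_level = 1               # "light debug logging"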

The problem: The update series (done as a function) takes 10-15
minutes.  During this time, the CPU is never more than 31% busy, only
256MB of 512MB is in use, and the disk channel is only 25%-50%
saturated.  As such, it seems like we could run things faster.

What does everybody suggest tweaking?

-Josh Berkus



Re: Hardware performance for large updates

From: Joe Conway
Date:
Josh Berkus wrote:
> The problem: The update series (done as a function) takes 10-15
> minutes.  During this time, the CPU is never more than 31% busy, only
> 256MB of 512MB is in use, and the disk channel is only 25%-50%
> saturated.  As such, it seems like we could run things faster.
> 
> What does everybody suggest tweaking?
> 

I think we'd need more information to be of any help -- schema, 
functions, explain output, etc.

I do think you probably could increase Shared Buffers, as 256 is pretty 
small. There's been a lot of debate over the best setting. The usual 
guidance is to start at 25% of physical RAM (16384 == 128MB if you have 
512MB RAM), then tweak to optimize performance for your application and 
hardware. You might also bump sort mem up a bit (maybe to 2048). Again, 
I'd test with your own app and hardware to find the best value. Are you 
on a Linux server? If so, I've found that fdatasync works better than 
the default (fsync) for wal_sync_method.
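
As a starting point, those three in postgresql.conf form (the values
are just a first guess to tune from):

   shared_buffers = 16384        # 16384 * 8K buffers = 128MB, ~25% of 512MB
   sort_mem = 2048               # per-sort working memory, in KB
   wal_sync_method = fdatasync   # on Linux; compare against the fsync default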

HTH,

Joe



Re: Hardware performance for large updates

From: "Josh Berkus"
Date:
Joe,

> I think we'd need more information to be of any help -- schema,
> functions, explain output, etc.

Yeah, I know.   I'm just looking for general tips here ... I need to do
the actual optimization interactively.    

Particularly, the difficulty is that this application gets many small
requests during the day (100 simultaneous users) and shares a server
with Apache.  So I have to be concerned about how much memory each
connection soaks up during the day.  At night, the maintenance tasks
run a few really massive procedures.

So I should probably restart Postgres with different settings at night,
hey?

> I do think you probably could increase Shared Buffers, as 256 is
> pretty small. There's been a lot of debate over the best setting. The
> usual guidance is to start at 25% of physical RAM (16384 == 128MB if
> you have 512MB RAM), then tweak to optimize performance for your
> application and hardware. 

Hmmm... how big is a shared buffer, anyway?   I'm having trouble
finding actual numbers in the docs.

> You might also bump sort mem up a bit
> (maybe to 2048). Again, I'd test with your own app and hardware to
> find the best value. 

> Are you on a Linux server? If so, I've found that fdatasync works
> better than the default (fsync) for wal_sync_method.

Yes, I am.   Any particular reason why fdatasync works better?

Thanks a lot!

-Josh Berkus



Re: Hardware performance for large updates

From: Joe Conway
Date:
Josh Berkus wrote:
> Particularly, the difficulty is that this application gets many small
> requests during the day (100 simultaneous users) and shares a server
> with Apache.  So I have to be concerned about how much memory each
> connection soaks up during the day.  At night, the maintenance tasks
> run a few really massive procedures.
> 
> So I should probably restart Postgres with different settings at night,
> hey?

Actually, if you can afford the twice-daily changes, it sounds like a 
great idea. I think you can get new conf settings to take by sending a 
SIGHUP to the postmaster, so you don't even really need any downtime to 
do it. Yup, here it is:  http://www.postgresql.org/idocs/index.php?runtime-config.html
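
Something like this (the night conf file name is made up; note that a
few settings, shared_buffers among them, only take effect at postmaster
start, so those would still need a restart):

   # swap in the night-time settings and signal a reload
   cp $PGDATA/postgresql.night.conf $PGDATA/postgresql.conf
   kill -HUP `head -1 $PGDATA/postmaster.pid`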

>>I do think you probably could increase Shared Buffers, as 256 is
>>pretty small. There's been a lot of debate over the best setting. The
>>usual guidance is to start at 25% of physical RAM (16384 == 128MB if
>>you have 512MB RAM), then tweak to optimize performance for your
>>application and hardware. 
> 
> 
> Hmmm... how big is a shared buffer, anyway?   I'm having trouble
> finding actual numbers in the docs.

By default it is 8K. It's mentioned here:  http://www.postgresql.org/idocs/index.php?kernel-resources.html
So, as I mentioned above, Shared Buffers of 16384 == 128MB if you have a 
default 8K block size.
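
Spelled out: your current 256 buffers * 8192 bytes is only 2MB, while
16384 buffers * 8192 bytes = 134,217,728 bytes = 128MB.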


>>Are you on a Linux server? If so, I've found that fdatasync works
>>better than the default (fsync) for wal_sync_method.
> 
> Yes, I am.   Any particular reason why fdatasync works better?

I can't remember the technical reason (although I've seen one on the 
list before), but I've found it to be true empirically, at least for my 
setup. Ahh, here we go:  http://archives.postgresql.org/pgsql-hackers/1998-04/msg00326.php
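
The gist, as I understand it: fdatasync flushes only the file's data
blocks and skips the inode metadata update (timestamps) that fsync also
forces, so each WAL flush costs one less disk write.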

Joe