Thread: big database performance

big database performance

From
Adrian Moisey
Date:
Hi

We recently converted to postgres (from mssql) and we're having
performance issues.  Not all the issues are related to postgres, but
we're trying to sort everything out.

The server is running Ubuntu Gutsy with the database stored on an IBM
SAN.  It is a Dell box with dual quad-core 2.00GHz Xeons and 8GB RAM.


The database is about 71GB in size.

I've looked at the postgres config files and we've tweaked as much as
our knowledge allows.

Can someone shed some light on the settings I should use?


Thanks in advance
--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: big database performance

From
"Joshua D. Drake"
Date:
Adrian Moisey wrote:
> Hi
>
> We recently converted to postgres (from mssql) and we're having
> performance issues.  Not all the issues are related to postgres, but
> we're trying to sort everything out.
>
> The server is running ubuntu Gutsy with the database stored on a IBM
> SAN.  It is a Dell box with dual quad core 2.00GHz Xeons and 8GB RAM.
>
>
> The database is about 71GB in size.
>
> I've looked at the postgres config files and we've tweaked as much as
> our knowledge allows.
>
> Can someone shed some light on the settings I should use ?

Umpf, that isn't quite enough info :) but assuming you are running 8.2.x:

Start with 1GB shared_buffers (you may be able to go higher), 4MB
work_mem, wal_sync_method = open_sync, checkpoint_segments = 30,
default_statistics_target = 150, effective_cache_size = 6GB.
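Collected into a postgresql.conf fragment, those suggestions would look roughly like this (starting points for this particular 8GB box, not universal defaults):

```ini
# Hedged starting points for an 8GB dedicated PostgreSQL 8.2 server;
# tune from here based on measurement.
shared_buffers = 1GB                 # may be able to go higher
work_mem = 4MB                       # per-sort/per-hash, so keep modest
wal_sync_method = open_sync          # test carefully before production use
checkpoint_segments = 30
default_statistics_target = 150
effective_cache_size = 6GB           # planner hint only: OS cache + shared_buffers
```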


Restart, run VACUUM ANALYZE VERBOSE, and post back the last 4 lines of output.

Other good items to know:

64bit Gutsy?
How is the SAN connected?
What does mpstat 5 (3 iterations) say?
Even better what does sar -A say over a 24 hour period?

Sincerely,

Joshua D. Drake




>
>
> Thanks in advance


Re: big database performance

From
Frank Habermann
Date:
Hi,

What segment size do you use for the SAN partition? This could also be a
bottleneck for DB servers.

Frank

Re: big database performance

From
Adrian Moisey
Date:
Hi

>> We recently converted to postgres (from mssql) and we're having
>> performance issues.  Not all the issues are related to postgres, but
>> we're trying to sort everything out.
>>
>> The server is running ubuntu Gutsy with the database stored on a IBM
>> SAN.  It is a Dell box with dual quad core 2.00GHz Xeons and 8GB RAM.
>>
>>
>> The database is about 71GB in size.
>>
>> I've looked at the postgres config files and we've tweaked as much as
>> our knowledge allows.
>>
>> Can someone shed some light on the settings I should use ?
>
> Umpf that isn't quite enough info :) but assuming you are running 8.2.x:

Sorry :/  Yes, we are running 8.2.x

> Start with 1GB shared_buffers (you may be able to go higher), 4MB
> work_mem, wal_sync_method = open_sync, checkpoint_segments = 30,
> default_statistics_target = 150, effective_cache_size = 6GB.

Our shared_buffers is 1GB.
work_mem is 32MB
I changed wal_sync_method to open_sync (which helped a ton!)

Can someone please explain effective_cache_size.  What cache does it
want to know about?  The Linux page cache?

Also, we're running the db on ext3 with noatime.  Should I look at
changing or getting rid of journaling?


> 64bit Gutsy?

Yes

> How is the SAN connected?

fibre

> What does mpstat 5 (3 iterations) say?
> Even better what does sar -A say over a 24 hour period?

I'll get these for you


Re: big database performance

From
Alan Hodgson
Date:
On Wednesday 09 January 2008, Adrian Moisey <adrian@careerjunction.co.za>
wrote:
>
> Also, we're running the db on ext3 with noatime.   Should I look at
> changing or getting rid of journaling ?

No (unless you like really long fsck times). data=writeback is safe with
PostgreSQL, though.
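As a concrete (hypothetical) example, the mount options being discussed would appear in /etc/fstab roughly like this; the device and mount point below are placeholders, not from this thread:

```
/dev/sdb1  /var/lib/postgresql  ext3  noatime,data=writeback  0  2
```

A remount (or reboot) is needed for data=writeback to take effect, since it changes the journaling mode.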


Re: big database performance

From
"Joshua D. Drake"
Date:

On Wed, 9 Jan 2008 08:16:48 -0800
Alan Hodgson <ahodgson@simkin.ca> wrote:

> On Wednesday 09 January 2008, Adrian Moisey
> <adrian@careerjunction.co.za> wrote:
> >
> > Also, we're running the db on ext3 with noatime.   Should I look at
> > changing or getting rid of journaling ?
> 
> No (unless you like really long fsck times). data=writeback is safe
> with PostgreSQL, though.
> 

Except :)... for pg_xlog. If you have pg_xlog on a different partition,
feel free to run ext2 for it.
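A sketch of how that relocation is typically done (with the server stopped; the paths here are scratch directories purely to illustrate the move-and-symlink step, not real cluster paths):

```shell
# Illustrative only: move pg_xlog onto a separate filesystem and leave a
# symlink behind. In real usage, stop postgres first and use $PGDATA and
# the ext2 partition's mount point instead of these temp dirs.
PGDATA=$(mktemp -d)/data
XLOG_FS=$(mktemp -d)              # stands in for the ext2 mount point
mkdir -p "$PGDATA/pg_xlog"
mv "$PGDATA/pg_xlog" "$XLOG_FS/pg_xlog"
ln -s "$XLOG_FS/pg_xlog" "$PGDATA/pg_xlog"
ls -ld "$PGDATA/pg_xlog"          # now a symlink to the dedicated filesystem
```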

Joshua D. Drake



--
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'



Re: big database performance

From
"Guillaume Smet"
Date:
Hi Joshua,

On Jan 9, 2008 9:27 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> wal_sync_method = open_sync

Do you recommend it in every situation, or just because the data is on a
SAN? Do you have any numbers or real cases explaining this choice?

Thanks.

--
Guillaume

Re: big database performance

From
Jared Mauch
Date:
On Wed, Jan 09, 2008 at 12:27:33AM -0800, Joshua D. Drake wrote:
> Adrian Moisey wrote:
>> Hi
>>
>> We recently converted to postgres (from mssql) and we're having
>> performance issues.  Not all the issues are related to postgres, but we're
>> trying to sort everything out.

    Hi,

    I do large databases in Pg, like 300GB/day of new data.  Need a lot
more data on what you're having issues with.

    Is your performance problem with database reads?
Writes (insert/copy)?  How many indices do you have?

    - jared

--
Jared Mauch  | pgp key available via finger from jared@puck.nether.net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.

Re: big database performance

From
Greg Smith
Date:
On Wed, 9 Jan 2008, Guillaume Smet wrote:

> On Jan 9, 2008 9:27 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> wal_sync_method = open_sync
>
> Do you recommend it in every situation or just because data are on a
> SAN? Do you have any numbers/real cases explaining this choice.

Sync writes are faster on Linux in every case I've ever tried, compared to
the default config that does a write followed by a sync.  With regular
discs they're just a little faster.  On some SAN configurations, they're
enormously faster, because the SANs are often optimized to handle
synchronous writes far more efficiently than write/sync ones.  This is
mainly because Oracle does its writes that way, so if you want good Oracle
performance you have to handle sync writes well.
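The difference can be felt even with plain dd; this rough sketch contrasts one sync at the end (write-then-sync) with per-write synchronous I/O. oflag=sync is a GNU dd flag, timings vary enormously by hardware, and this is an illustration rather than a benchmark:

```shell
# Crude illustration: write ~8MB once with a single fsync at the end
# (conv=fsync), then again with every write synchronous (oflag=sync).
f=$(mktemp)
time dd if=/dev/zero of="$f" bs=8k count=1000 conv=fsync 2>/dev/null
time dd if=/dev/zero of="$f" bs=8k count=1000 oflag=sync 2>/dev/null
rm -f "$f"
```

On a SAN with a battery-backed write cache the second run can be close to the first; on a bare disk it is usually far slower.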

I have something on this topic I keep meaning to publish, but I got
spooked about the potential to have silent problems or crashes when using
open_sync due to a Linux kernel issue reported here:

http://archives.postgresql.org/pgsql-hackers/2007-10/msg01310.php

Certainly, with that report floating out there, I'd only recommend open_sync
to people who are putting plenty of time into testing that their database is
robust under load with that configuration before deploying it; I sure
wouldn't just make that change on a production system just to see if it's
faster.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: big database performance

From
Simon Riggs
Date:
On Wed, 2008-01-09 at 10:18 +0200, Adrian Moisey wrote:

> We recently converted to postgres (from mssql) and we're having
> performance issues.

I think you need to say more about what the performance issues actually
are, otherwise everybody will just speculate you to death.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


Re: big database performance

From
Adrian Moisey
Date:
Hi

>> Also, we're running the db on ext3 with noatime.   Should I look at
>> changing or getting rid of journaling ?
>
> No (unless you like really long fsck times). data=writeback is safe with
> PostgreSQL, though.

I tested that on a dev box, and I didn't notice a difference when using
pgbench.


Re: big database performance

From
Adrian Moisey
Date:
Hi

>     I do large databases in Pg, like 300GB/day of new data.  Need a lot
> more data on what you're having issues with.

That is big!

What sort of information do you need from me?

>     Is your problem with performance database reads?
> writes? (insert/copy?)  How many indicies do you have?

I think the problem is related to load.  Everything is slow because
there are way too many connections.  So everything is making everything
else slow.  Not much detail, is it?

We have 345 indices on the db.


Re: big database performance

From
Stephane Bailliez
Date:
Jared Mauch wrote:
>     I do large databases in Pg, like 300GB/day of new data.
That's impressive.  Would it be possible to have details on your
hardware, schema, configuration and type of usage?

I'm sure there's something to learn in there for a lot of people (or at
least for me)

Cheers,

-- stephane

Re: big database performance

From
Jared Mauch
Date:
On Thu, Jan 10, 2008 at 10:57:46AM +0200, Adrian Moisey wrote:
> What sort of information do you need from me ?

    Ratio of read vs. write operations (SELECT vs. INSERT/COPY).

    Average number of indices per table.

    Average table size (ANALYZE VERBOSE <tablename> if you want to get
into more detail).

    What is the process doing (e.g. in top, is it just on the CPU or
is it blocking on I/O?).

    I/O information, from iostat -d (you may need to build an iostat
binary for Linux; the source is out there, I can give you a pointer if
you need it).
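When the sysstat tools aren't installed, a first-order CPU-vs-I/O answer can be pulled straight from /proc/stat (Linux only; this is a since-boot aggregate, so it's much cruder than mpstat's interval samples):

```shell
# Read the aggregate "cpu" line from /proc/stat:
#   cpu user nice system idle iowait irq softirq ...
# and report what share of CPU time was spent waiting on I/O.
read -r _ user nice system idle iowait _ < /proc/stat
total=$((user + nice + system + idle + iowait))
echo "iowait share since boot: $((100 * iowait / total))%"
```

A persistently high iowait share points at the disks or SAN; a low one with slow queries points back at CPU, locking, or plan problems.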

>>     Is your problem with performance database reads? writes? (insert/copy?)
>> How many indicies do you have?
>
> I think the problem is related to load.  Everything is slow because there
> are way too many connections.  So everything is making everything else
> slow.  Not much detail, is it?
>
> We have 345 indicies on the db.

    If the tables are heavily indexed this could easily slow down
insert performance.  When adding a second index to a large dataset,
postgres doesn't use threads to create the two indices on
different CPUs/cores in parallel.  This could account for some of your
performance difference.  If you're doing a lot of write operations
and fewer reads, perhaps the cost of an index isn't worth it in the
CPU time spent creating it vs. the amount of time for a seq scan.

    - Jared


Re: big database performance

From
Jared Mauch
Date:
On Thu, Jan 10, 2008 at 12:08:39PM +0100, Stephane Bailliez wrote:
> Jared Mauch wrote:
>>     I do large databases in Pg, like 300GB/day of new data.
>
> That's impressive.  Would it be possible to have details on your hardware,
> schema and configuration and type of usage ?
>
> I'm sure there's something to learn in there for a lot of people (or at
> least for me)

http://archives.postgresql.org/pgsql-performance/2007-12/msg00372.php

http://archives.postgresql.org/pgsql-performance/2006-05/msg00444.php

    The hardware specs are kinda boring since it's not
I/O bound, so you could get the same disk performance out of
some EIDE 7200 rpm disks (which I have done for testing).

    The current setup is a 4x Opteron 8218 (dual core) w/ 16GB RAM.
I have roughly 12TB of usable disk space on the system, connected via some
SATA <-> FC thing our systems folks got us.  The problem I have is that linear
CPU speed isn't enough, and there would be challenges splitting the
workload across multiple CPUs.  All my major reporting is done via
pg_dump and I'm pondering what would happen if I just removed Pg
from the equation for the major reporting tasks entirely.  I may see
much better performance without the database [in my way].  I've not
done that as some types of data access would need to be significantly
redone and I don't want to spend the time on that...

    - Jared
