Thread: Data Warehousing Tuning

Data Warehousing Tuning

From
"Paul Johnson"
Date:
Hi all, we have the following setup:

- Sun V250 server
- 2*1.3GHz Sparc IIIi CPU
- 8GB RAM
- 8*73GB SCSI drives
- Solaris 10
- Postgres 8

Disks 0 and 1 are mirrored and contain the OS and the various software
packages, disks 2-7 are configured as a 320GB concatenation mounted on
/data, which is where load files and Postgres database and log files live.

The box is used by a small number of developers doing solely
Postgres-based data warehousing work. There are no end-users on the box,
and we are aiming for the maximum IO throughput.

Questions are as follows:

1) Should we have set the page size to 32MB when we compiled Postgres?

We mainly do bulk loads using 'copy', full-table scans and large joins so
this would seem sensible. Tables are typically 10 million rows at present.

2) What are the obvious changes to make to postgresql.conf?

Things like shared_buffers, work_mem, maintenance_work_mem and
checkpoint_segments seem like good candidates given our data warehousing
workloads.

3) Ditto /etc/system?

4) We moved the pg_xlog files off /data/postgres (disks 2-7) and into
/opt/pg_xlog (disks 0-1), but it seemed like performance decreased, so we
moved them back again.

Has anyone experienced real performance gains by moving the pg_xlog files?

Thanks in anticipation,

Paul.



Re: Data Warehousing Tuning

From
"Frank Wosczyna"
Date:
Hi Paul, just some quick thoughts:


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> Paul Johnson
> Sent: Wednesday, July 06, 2005 10:16 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Data Warehousing Tuning
>

>
> Questions are as follows:
>
> 1) Should we have set the page size to 32MB when we compiled Postgres?
>
> We mainly do bulk loads using 'copy', full-table scans and
> large joins so this would seem sensible. Tables are typically
> 10 million rows at present.

I would defer changing page size to the "fine-tuning" category, our
experience with that has not produced substantial gains.  Would focus on
the other categories you mention below first.

Also, for heavy use of COPY, you may consider using the latest release
of Bizgres 0.6, which should speed loads:
http://www.bizgres.org/pages.php?pg=downloads or
http://www.greenplum.com/prod_download.html for compiled version.


>
> 2) What are the obvious changes to make to postgresql.conf?
>
> Things like shared_buffers, work_mem, maintenance_work_mem
> and checkpoint_segments seem like good candidates given our
> data warehousing workloads.

You're on the right track, it depends on nature of queries (sorry for
giving you the "consulting" answer on that one), but here are some
PostgreSQL configurations to consider:

a - Consider using separate disk partitions for transaction log, temp
space and WAL.  See separate note about WAL and directio in Solaris
tuning note, link below.  May put temp space on a separate partition, in
anticipation of forthcoming changes which take advantage of this.

b - Sizing temp space?  Should be as large as the largest index.  Set
max speed read/write config: minimal journaling, use write-through cache
on this.

c - Might try increasing checkpoint segments (64?).  More logs produces
significant benefit.  And turn checkpoint warnings on (Off by default).

d - Sort mem and work mem - What queries are you running?  Workmem used
in aggregation/sorts.  How many concurrent reports?  For 3 complex
queries, might try 256MB at work mem?

e - You probably do this already, but always ANALYZE after loads.

f - Maintenance work mem - used in vacuum, analyze, creating bulk
indexes, bulk checking for keys.  Might consider using 512 or 750?


>
> 3) Ditto /etc/system?

See http://www.bizgres.org/bb/viewtopic.php?t=6 for Solaris.


>
> 4) We moved the pg_xlog files off /data/postgres (disks 2-7)
> and into /opt/pg_xlog (disks 0-1), but it seemed like
> performance decreased, so we moved them back again.
>
> Has anyone experienced real performance gains by moving the
> pg_xlog files?
>

Likely to help only with COPY.

Feel free to contact me directly if you have any questions on my
statements above.  There is a Configurator in development which you
might find helpful when it is complete:
http://www.bizgres.org/pages.php?pg=developers%7Cprojects%7Cconfigurator



Kind Regards,
Frank

Frank Wosczyna
Systems Engineer
+1 650 224 7374

http://www.greenplum.com

GreenPlum, Inc.
1900 South Norfolk Street, Suite 224
San Mateo, California 94403 USA


Re: Data Warehousing Tuning

From
Josh Berkus
Date:
Paul,

> Has anyone experienced real performance gains by moving the pg_xlog
> files?

Yes.   Both for data load and on OLTP workloads, this increased write
performance by as much as 15%.   However, you need to configure the xlog
drive correctly, you can't just move it to a new disk.    Make sure the
other disk is dedicated exclusively to the xlog, set it forcedirectio, and
increase your checkpoint_segments to something like 128.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Data Warehousing Tuning

From
"Alexander Kirpa"
Date:
>- Sun V250 server
>- 2*1.3GHz Sparc IIIi CPU
>- 8GB RAM
>- 8*73GB SCSI drives
>- Solaris 10
>- Postgres 8
>4) We moved the pg_xlog files off /data/postgres (disks 2-7) and into
>/opt/pg_xlog (disks 0-1), but it seemed like performance decreased,
>so we moved them back again.
You have saturated SCSI bus.
1x160GB/s SCSI too small for 8xHDD with 30-70MB/s
Solutions:
Replace CD/DVD/tape at top 2x5" slots on 2xHDD (320 SCSI),
 install PCI 64/66 SCSI 320 controller
 (or simple RAID1 controller for minimize
 saturation of PCI buses)
 and attach to 2xHDD. Move /opt/pg_xlog on this drives.

Best regards,
 Alexander Kirpa