Re: Performance while loading data and indexing - Mailing list pgsql-general

From Richard Huxton
Subject Re: Performance while loading data and indexing
Date
Msg-id 200209261048.07761.dev@archonet.com
Whole thread Raw
In response to Performance while loading data and indexing  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
List pgsql-general
On Thursday 26 Sep 2002 9:35 am, Shridhar Daithankar wrote:

[questions re: large database]

Before reading my advice please bear in mind you are operating way beyond the
scale of anything I have ever built.

> Now the details. Note that this is a test run only..
>
> Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI
> RedHat7.2/PostgreSQL7.1.3
>
> Database in flat file:
> 125,000,000 records of around 100 bytes each.
> Flat file size 12GB
>
> Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec.
> Create unique composite index on 2 char and a timestamp field:  25226 sec.
> Database size on disk: 26GB
> Select query: 1.5 sec. for approx. 150 rows.
>
> Important postgresql.conf settings
[snipped setting details for moment]

Have you tried putting the wal files, syslog etc on separate disks/volumes? If
you've settled on Intel, about the only thing you can optimise further is the
disks.

Oh - and the OS - make sure you're running a (good) recent kernel for that
sort of hardware, I seem to remember some substantial changes in the 2.4
series regarding multi-processor.

> Now the requirements
>
> Initial flat data load: 250GB of data. This has gone up since last query.
> It was 150GB earlier..
> Ongoing inserts: 5000/sec.
> Number of queries: 4800 queries/hour
> Query response time: 10 sec.

Is this 5000 rows in say 500 transactions or 5000 insert transactions per
second. How many concurrent clients is this? Similarly for the 4800 queries,
how many concurrent clients is this? Are they expected to return approx 150
rows as in your test?

> Now questions.
>
> 1)  Instead of copying from a single 12GB data file,  will a parallel copy
> from say 5 files will speed up the things?

If the CPU is the bottle-neck then it should, but it's difficult to say
without figures.

> Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5
> setup..

What is saturating during the flat-file load? Something must be maxed in top /
iostat / vmstat.

[snip]
>
> 5)Will upgrading to 7.2.2/7.3 beta help?

It's unlikely to hurt.

> All in all, in the  test, we didn't see the performance where hardware is
> saturated to it's limits.

Something *must* be.

What are your disaster recovery plans? I can see problems with taking backups
if this beast is live 24/7.

- Richard Huxton

pgsql-general by date:

Previous
From: Justin Clift
Date:
Subject: Re: [HACKERS] Performance while loading data and indexing
Next
From: "mourad_dhambri"
Date:
Subject: two databases