Re: Upgrade Woes - Mailing list pgsql-performance

From aturner@neteconomist.com
Subject Re: Upgrade Woes
Date 2003-09-11 14:16:22
Msg-id 20030911141622.GB26759@neteconomist.com
In response to Re: Upgrade Woes  (Richard Huxton <dev@archonet.com>)
Responses Re: Upgrade Woes  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Upgrade Woes  (Jeff <threshar@torgo.978.org>)
List pgsql-performance

Thanks for the URL; I went through postgresql.conf and made some modifications to the config based on the information
therein. I will have to wait and see how it affects things, as I won't know for a week or so.
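
The sort of knobs I ended up looking at were along these lines (the numbers here are ballpark guesses for a 2GB box,
not necessarily what I actually set):

shared_buffers = 8192          # 64MB in 8KB pages; replaces the -B512 on the command line
sort_mem = 8192                # per-sort memory in KB
effective_cache_size = 100000  # rough size of the OS cache, in 8KB pages
max_fsm_pages = 100000         # free-space map large enough to track dead rows between vacuums
max_fsm_relations = 1000
checkpoint_segments = 8        # fewer checkpoints during the big nightly load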

Select time has never been a problem; the DB has always been very fast. It's the insert time that has been a problem.
I'm not sure how much of this is a function of the drive array sucking, the OS not doing a good job, or the DB getting
caught up in transaction logs.

What does seem odd is that the performance degrades as time goes on, and the space that the DB files take up increases
as well.

The vacuum full is performed once at the end of the whole job.  We could probably get away with doing this once per
week, but in the past I have noticed that if I don't run it regularly, it seems to take much longer when I do run it.
This has led me to run it more regularly than not.
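
If we do go weekly with the full vacuum, the last step of the load job would just alternate between something like
these two (database name here is made up):

vacuumdb --analyze propertydb           # nightly: plain vacuum + analyze, no exclusive locks, service stays up
vacuumdb --full --analyze propertydb    # weekly, when a short outage is acceptable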

As for 7.3.3, the project in question suffered a 10x performance degradation on 7.3.3 which went away when we rolled
back to 7.3.2.  Almost all the inserts had triggers which updated stats tables; the database in question was very
write-heavy, pretty much a data warehouse for X10 sensor information which was then mined for analysis.

I had certainly considered building the script to do binary-separation-style inserts: split the job in half, insert,
and if it fails, split in half again until you get everything in.  This would probably work okay considering only about
two dozen out of 30,000 rows fail.  The only reason not to do that is the time and effort required, particularly as we
are looking at a substantial overhaul of the whole system in the next 6 months.
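
For what it's worth, the split-on-failure idea is only a short script.  A rough sketch of what I have in mind follows;
the driver (psycopg), the table, the file path, and helpers like read_dump() and insert_row() are all just placeholders
for whatever our loader actually does:

import psycopg          # any Python DB-API 2.0 driver would do; psycopg is only an assumption here

def read_dump(path):
    # hypothetical parser: one tab-separated row per line of the nightly dump
    with open(path) as f:
        return [line.rstrip("\n").split("\t") for line in f]

def insert_row(cur, row):
    # placeholder for whatever INSERT/UPDATE the current loader does for one row
    cur.execute("INSERT INTO properties VALUES (%s, %s, %s)", row)

def load_batch(conn, rows, errors):
    # try the whole batch in one transaction; on failure roll back and split it
    # in half, so a handful of bad rows only cost a few extra round trips
    if not rows:
        return
    cur = conn.cursor()
    try:
        for row in rows:
            insert_row(cur, row)
        conn.commit()
    except psycopg.DatabaseError:
        conn.rollback()
        if len(rows) == 1:
            errors.append(rows[0])      # the ~two dozen bad rows end up here
        else:
            mid = len(rows) // 2
            load_batch(conn, rows[:mid], errors)
            load_batch(conn, rows[mid:], errors)

conn = psycopg.connect("dbname=propertydb")     # hypothetical connection string
rows = read_dump("/eda/dump/properties.txt")    # hypothetical dump location
errors = []
for i in range(0, len(rows), 128):              # start with batches of 128 rows
    load_batch(conn, rows[i:i + 128], errors)
print("failed rows: %d" % len(errors))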

Alex Turner


On Wed, Sep 10, 2003 at 07:31:53PM +0100, Richard Huxton wrote:
> On Wednesday 10 September 2003 18:53, aturner@neteconomist.com wrote:
> > Hi,
> >
> > My name is Alex Turner and I work for a small Tech company in Pottstown PA.
> >  We run Postgresql on a number of systems for a variety of different
> > applications, and it has been a joy to deal with all around, working fast
> > and reliably for over 2 years.
> >
> > We recently upgraded from RedHat 7.2 to RedHat 9.0, and we are running
> > Postgres 7.3.2 on our Proliant ML370 (Raid 1 2x18 10k, and Raid 5 3x36 10k,
> > 2x866 PIII, 2GB RAM).
> [snip]
> > I have noticed that whilst inserts seem to be slower than before, the
> > vacuum full doesn't seem to take as long overall.
> >
> > postgresql.conf is pretty virgin, and we run postmaster with -B512 -N256
> > -i.  /var/lib/pgsql/data is a symlink to /eda/data, /eda being the mount
> > point for the Raid 5 array.
>
> First things first then, go to:
> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
> and read the item on Performance Tuning and the commented postgresql.conf
>
> > the database isn't huge, storing about 30000 properties, and the largest
> > table is 2.1 Million rows for property features.  The dump file is only
> > 221MB. Alas, I did not design the schema, but I have made several 'tweaks'
> > to it to greatly improve read performance allowing us to be the fastest
> > provider in the Tristate area.  Unfortunately the Job starts at 01:05
> > (thats the earliest the dump is available) and runs until completion
> > finishing with a vacuum full.  The vacuum full locks areas of the database
> > long enough that our service is temporarily down.  At the worst point, the
> > vacuum full was starting after 09:00, which our customers didn't
> > appreciate.
>
> You might be able to avoid a vacuum full by tweaking the *fsm* settings to be
> able to cope with activity.
>
> > I'm wondering if there is anything I can do with postgres to alleviate
> > this problem.  Either upgrading to 7.3.4 (although I tried 7.3.3 for
> > another app, and we had to roll back to 7.3.2 because of performance
> > problems),
>
> Hmm - can't think what would have changed radically between 7.3.2 and 7.3.3,
> upgrading to .4 is probably sensible.
>
> [snip]
> > Any help/suggestions would be greatly appreciated,
>
> You say that each insert/update is a separate transaction. I don't know how
> much "bad" data you get in the dump, but you might be able to do something
> like:
>
> 1. Set batch size to 128 items
> 2. Read batch-size rows from the dump
> 3. Try to insert/update the batch. If it works, move along by the size of the
> batch and back to #1
> 4. If batch-size=1, record error, move along one row and back to #1
> 5. If batch-size>1, halve batch-size and go back to #3
>
> Your initial batch-size will depend on how many errors there are (but
> obviously use a power of 2).
>
> You could also run an ordinary vacuum every 1000 rows or so (number depends on
> your *fsm* settings as mentioned above).
>
> You might also want to try a REINDEX once a night/week too.
> --
>   Richard Huxton
>   Archonet Ltd
>
