Thread: Upgrade Woes

Upgrade Woes

From
aturner@neteconomist.com
Date:
Hi,

My name is Alex Turner and I work for a small Tech company in Pottstown PA.  We run Postgresql on a number of systems
for a variety of different applications, and it has been a joy to deal with all around, working fast and reliably for
over 2 years.

We recently upgraded from RedHat 7.2 to RedHat 9.0, and we are running Postgres 7.3.2 on our Proliant ML370 (Raid 1
2x18 10k, and Raid 5 3x36 10k, 2x866 PIII, 2GB RAM).

We seem to have had a serious drop in performance after the upgrade.  The database is a database of properties that is
updated on a daily basis, and when I say updated I mean that I insert/update the whole data download because the data
provider doesn't tell us what changed, just gives us a complete dump.  The integrity of the dump isn't great so I can't
process it as a COPY or a block transaction because some of the data is often bad.  Each and every row is a separate
insert or update.
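
(The per-row logic looks roughly like this, in a DB-API style Python driver; the table and column names are invented
for illustration and the placeholder style depends on the driver, so treat it as a sketch rather than our real code.)

    # rough sketch of the per-row update-or-insert, one transaction per row;
    # "property", "prop_id" and "price" are illustrative names only
    cur = conn.cursor()
    cur.execute("UPDATE property SET price = %s WHERE prop_id = %s",
                (price, prop_id))
    if cur.rowcount == 0:                      # no existing row, so insert
        cur.execute("INSERT INTO property (prop_id, price) VALUES (%s, %s)",
                    (prop_id, price))
    conn.commit()                              # each row committed separately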
 
Data insert performance used to degrade in a linear fashion as time progressed, I'm guessing as the transaction logs
filled up.  About once every six weeks I would dump the database, destroy and recreate the db and reload the dump.  This
'reset' the whole thing, and brought insert/vacuum times back down.  Since the upgrade, performance has degraded very
rapidly over the first week, and then more slowly later, but enough that we now have to reload the db every 2-3 weeks.
The insert procedure fires a trigger that calls a stored procedure to update a timestamp on the record, so that we can
figure out which records have been touched and which have not; any property whose record was not touched in the last
two days is treated as having been removed from the feed.
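
(The trigger is roughly of this shape; the table, column and function names below are invented for illustration and
are not our real schema.)

    -- illustrative only: touch a last_seen timestamp on every insert/update
    CREATE FUNCTION touch_last_seen() RETURNS TRIGGER AS '
    BEGIN
        NEW.last_seen := now();
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER property_touch BEFORE INSERT OR UPDATE ON property
        FOR EACH ROW EXECUTE PROCEDURE touch_last_seen();

    -- rows not touched in the last two days are treated as dropped from the feed
    DELETE FROM property WHERE last_seen < now() - interval '2 days';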

I have noticed that whilst inserts seem to be slower than before, the vacuum full doesn't seem to take as long overall.

postgresql.conf is pretty virgin, and we run postmaster with -B512 -N256 -i.  /var/lib/pgsql/data is a symlink to
/eda/data, /eda being the mount point for the Raid 5 array.
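
(For reference, those switches roughly correspond to the following postgresql.conf settings, if I read the 7.3 docs
right:)

    # roughly what -B512 -N256 -i mean in postgresql.conf terms
    shared_buffers = 512      # -B, number of shared buffer pages
    max_connections = 256     # -N
    tcpip_socket = true       # -i, accept TCP/IP connections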

The database isn't huge, storing about 30000 properties, and the largest table is 2.1 million rows for property
features.  The dump file is only 221MB.  Alas, I did not design the schema, but I have made several 'tweaks' to it to
greatly improve read performance, allowing us to be the fastest provider in the Tristate area.  Unfortunately the job
starts at 01:05 (that's the earliest the dump is available) and runs until completion, finishing with a vacuum full.
The vacuum full locks areas of the database long enough that our service is temporarily down.  At the worst point, the
vacuum full was starting after 09:00, which our customers didn't appreciate.

I'm wondering if there is anything I can do with postgres to alleviate this problem.  Either upgrading to 7.3.4
(although I tried 7.3.3 for another app, and we had to roll back to 7.3.2 because of performance problems), or working
with the postgresql.conf to enhance performance.  I really don't want to roll back the OS version if possible, but I'm
not ruling it out at this point, as that seems to be the biggest thing that has changed.  All the drive lights are
showing green, so I don't believe the array is running in degraded mode.  I keep logs of all the insert jobs, and
plotting average insert times on a graph revealed that this started at the time of the upgrade.

Any help/suggestions would be greatly appreciated,

Thanks,

Alex Turner
NetEconomist

P.S. Sorry this is so long, but I wanted to include as much info as possible.

Re: Upgrade Woes

From
Richard Huxton
Date:
On Wednesday 10 September 2003 18:53, aturner@neteconomist.com wrote:
> Hi,
>
> My name is Alex Turner and I work for a small Tech company in Pottstown PA.
>  We run Postgresql on a number of systems for a variety of different
> applications, and it has been a joy to deal with all around, working fast
> and reliably for over 2 years.
>
> We recently upgraded from RedHat 7.2 to RedHat 9.0, and we are running
> Postgres 7.3.2 on our Proliant ML370 (Raid 1 2x18 10k, and Raid 5 3x36 10k,
> 2x866 PIII, 2GB RAM).
[snip]
> I have noticed that whilst inserts seem to be slower than before, the
> vacuum full doesn't seem to take as long overall.
>
> postgresql.conf is pretty virgin, and we run postmaster with -B512 -N256
> -i.  /var/lib/pgsql/data is a symlink to /eda/data, /eda being the mount
> point for the Raid 5 array.

First things first then, go to:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
and read the item on Performance Tuning and the commented postgresql.conf

> the database isn't huge, storing about 30000 properties, and the largest
> table is 2.1 Million rows for property features.  The dump file is only
> 221MB. Alas, I did not design the schema, but I have made several 'tweaks'
> to it to greatly improve read performance allowing us to be the fastest
> provider in the Tristate area.  Unfortunately the Job starts at 01:05
> (that's the earliest the dump is available) and runs until completion
> finishing with a vacuum full.  The vacuum full locks areas of the database
> long enough that our service is temporarily down.  At the worst point, the
> vacuum full was starting after 09:00, which our customers didn't
> appreciate.

You might be able to avoid a vacuum full by tweaking the *fsm* settings to be
able to cope with activity.
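
Something like this might be a starting point - the numbers are only guesses, you'll need to size them against how
many pages your nightly run actually dirties:

    # free space map - must be big enough to track the pages freed between vacuums
    max_fsm_relations = 1000       # relations with free space tracked
    max_fsm_pages = 100000         # total pages with free space tracked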

> I'm wondering if there is anything I can do with postgres to alleviate
> this problem.  Either upgrading to 7.3.4 (although I tried 7.3.3 for
> another app, and we had to roll back to 7.3.2 because of performance
> problems),

Hmm - can't think what would have changed radically between 7.3.2 and 7.3.3;
upgrading to .4 is probably sensible.

[snip]
> Any help/suggestions would be greatly appreciated,

You say that each insert/update is a separate transaction. I don't know how
much "bad" data you get in the dump, but you might be able to do something
like:

1. Set batch size to 128 items
2. Read batch-size rows from the dump
3. Try to insert/update the batch. If it works, move along by the size of the
batch and back to #1
4. If batch-size=1, record error, move along one row and back to #1
5. If batch-size>1, halve batch-size and go back to #3

Your initial batch-size will depend on how many errors there are (but
obviously use a power of 2).
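
A rough sketch of that loop in Python (DB-API style driver assumed; insert_row() and log_error() stand in for your own
code, and rows is whatever your dump parser produces):

    # halving-batch loader: commit per batch, shrink the batch on failure
    INITIAL_BATCH = 128

    def load(conn, rows):
        i = 0
        batch = INITIAL_BATCH
        while i < len(rows):
            size = min(batch, len(rows) - i)
            cur = conn.cursor()
            try:
                for row in rows[i:i + size]:
                    insert_row(cur, row)        # your insert/update per row
                conn.commit()                   # whole batch went in
                i += size
                batch = INITIAL_BATCH           # back to step 1
            except Exception:
                conn.rollback()                 # batch contained a bad row
                if size == 1:
                    log_error(rows[i])          # step 4: record it and move on
                    i += 1
                    batch = INITIAL_BATCH
                else:
                    batch = size // 2           # step 5: halve and retry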

You could also run an ordinary vacuum every 1000 rows or so (number depends on
your *fsm* settings as mentioned above).

You might also want to try a REINDEX once a night/week too.
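
i.e. something along these lines (the table name is just an example):

    VACUUM ANALYZE property_feature;   -- plain vacuum: no exclusive lock, safe during the day
    REINDEX TABLE property_feature;    -- does take a lock, so keep it to the nightly window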
--
  Richard Huxton
  Archonet Ltd

Re: Upgrade Woes

From
Christopher Browne
Date:
aturner@neteconomist.com writes:
> P.S. Sorry this is so long, but I wanted to include as much info as possible.

Throw in the non-commented lines in postgresql.conf; that would more
than likely make numeric answers possible for some of this.  If the
config is "out-of-the-box," then it's pretty likely that some
significant improvements can be gotten from modifying a few of the
config parameters.  Increasing buffers would probably help query
speed, and if you're getting too many dead tuples, increasing the free
space map would make it possible for more of them to be vacuumed out.

Beyond that, you might want to grab the code for pg_autovacuum, and
drop that into place, as that would do periodic ANALYZEs that would
probably improve the quality of your selects somewhat.  (It's in the
7.4 code "contrib" base, but works fine with 7.3.)

I think you might also get some significant improvements out of
changing the way you load the properties.  If you set up a schema that
is suitably "permissive," and write a script that massages it a
little, COPY should do the trick to load the data in, which should be
helpful to the load process.  If the data comes in a little more
intelligently (which might well involve some parts of the process
"dumbing down" :-)), you might take advantage of COPY and perhaps
other things (we see through the glass darkly).
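
A sketch of that approach (all names and the file path are invented for illustration, the validation is only a token
example, and the merge back into the real tables is left out):

    -- permissive staging table: everything as text, no constraints to trip over
    CREATE TABLE property_staging (
        prop_id  text,
        feature  text,
        price    text
    );

    -- bulk load the massaged dump in one shot
    COPY property_staging FROM '/eda/load/properties.txt';

    -- then move only the rows that pass validation into the real table
    INSERT INTO property (prop_id, feature, price)
    SELECT prop_id::integer, feature, price::numeric
      FROM property_staging
     WHERE prop_id ~ '^[0-9]+$'
       AND price ~ '^[0-9.]+$';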

I would think it also raises the question of whether or not you _truly_
need the "vacuum full."  Are you _certain_ you need that?  I would
think it likely that running "vacuum analyze" (and perhaps doing it a
little bit, continuously, during the load, via pg_autovacuum) would
likely suffice.  Have you special reason to think otherwise?
--
output = ("cbbrowne" "@" "libertyrms.info")
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Re: Upgrade Woes

From
aturner@neteconomist.com
Date:
Thanks for the URL, I went through postgresql.conf and made some modifications to the config based on information
therein. I will have to wait and see how it affects things, as I won't know for a week or so. 

Select time has never been a problem, the DB has always been very fast, it's the insert time that has been a problem.
I'm not sure how much this is a function of the drive array sucking, the OS not doing a good job or the DB getting
caught up in transaction logs.

What does seem odd is that the performance degrades as time goes on, and the space that the DB files take up increases
as well.

The Vacuum full is performed once at the end of the whole job.  We could probably get away with doing this once per
week, but in the past I have noticed that if I don't run it regularly, when I do run it, it seems to take much longer.
This has led me to run it more regularly than not.

As for 7.3.3, the project in question suffered a 10x performance degradation on 7.3.3 which went away when we rolled
back to 7.3.2.  Almost all the inserts had triggers which updated stats tables; the database in question was very, very
write heavy, it was pretty much a data warehouse for X10 sensor information which was then mined for analysis.

I had certainly considered building the script to do binary separation style inserts: split the job in half, insert, if
it fails, split in half again until you get everything in.  This would probably work okay considering only about two
dozen out of 30,000 rows fail.  The only reason not to do that is the time and effort required, particularly as we are
looking at a substantial overhaul of the whole system in the next 6 months.

Alex Turner


On Wed, Sep 10, 2003 at 07:31:53PM +0100, Richard Huxton wrote:
> On Wednesday 10 September 2003 18:53, aturner@neteconomist.com wrote:
> > Hi,
> >
> > My name is Alex Turner and I work for a small Tech company in Pottstown PA.
> >  We run Postgresql on a number of systems for a variety of different
> > applications, and it has been a joy to deal with all around, working fast
> > and reliably for over 2 years.
> >
> > We recently upgraded from RedHat 7.2 to RedHat 9.0, and we are running
> > Postgres 7.3.2 on our Proliant ML370 (Raid 1 2x18 10k, and Raid 5 3x36 10k,
> > 2x866 PIII, 2GB RAM).
> [snip]
> > I have noticed that whilst inserts seem to be slower than before, the
> > vacuum full doesn't seem to take as long overall.
> >
> > postgresql.conf is pretty virgin, and we run postmaster with -B512 -N256
> > -i.  /var/lib/pgsql/data is a symlink to /eda/data, /eda being the mount
> > point for the Raid 5 array.
>
> First things first then, go to:
> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
> and read the item on Performance Tuning and the commented postgresql.conf
>
> > the database isn't huge, storing about 30000 properties, and the largest
> > table is 2.1 Million rows for property features.  The dump file is only
> > 221MB. Alas, I did not design the schema, but I have made several 'tweaks'
> > to it to greatly improve read performance allowing us to be the fastest
> > provider in the Tristate area.  Unfortunately the Job starts at 01:05
> > (that's the earliest the dump is available) and runs until completion
> > finishing with a vacuum full.  The vacuum full locks areas of the database
> > long enough that our service is temporarily down.  At the worst point, the
> > vacuum full was starting after 09:00, which our customers didn't
> > appreciate.
>
> You might be able to avoid a vacuum full by tweaking the *fsm* settings to be
> able to cope with activity.
>
> > I'm wondering if there is anything I can do with postgres to alleviate
> > this problem.  Either upgrading to 7.3.4 (although I tried 7.3.3 for
> > another app, and we had to roll back to 7.3.2 because of performance
> > problems),
>
> Hmm - can't think what would have changed radically between 7.3.2 and 7.3.3,
> upgrading to .4 is probably sensible.
>
> [snip]
> > Any help/suggestions would be greatly appreciated,
>
> You say that each insert/update is a separate transaction. I don't know how
> much "bad" data you get in the dump, but you might be able to do something
> like:
>
> 1. Set batch size to 128 items
> 2. Read batch-size rows from the dump
> 3. Try to insert/update the batch. If it works, move along by the size of the
> batch and back to #1
> 4. If batch-size=1, record error, move along one row and back to #1
> 5. If batch-size>1, halve batch-size and go back to #3
>
> Your initial batch-size will depend on how many errors there are (but
> obviously use a power of 2).
>
> You could also run an ordinary vacuum every 1000 rows or so (number depends on
> your *fsm* settings as mentioned above).
>
> You might also want to try a REINDEX once a night/week too.
> --
>   Richard Huxton
>   Archonet Ltd
>

Re: Upgrade Woes

From
Tom Lane
Date:
aturner@neteconomist.com writes:
> As for 7.3.3, the project in question suffered a 10x performance
> degradation on 7.3.3 which went away when we rolled back to 7.3.2.

I would like to pursue that report and find out why.  I've just gone
through the CVS logs between 7.3.2 and 7.3.3, and I don't see any change
that would explain a 10x slowdown.  Can you provide more details about
exactly what slowed down?

Also, what PG version were you using on the old RedHat 7.2 installation?

            regards, tom lane

Re: Upgrade Woes

From
Jeff
Date:
On Thu, 11 Sep 2003, aturner@neteconomist.com wrote:

>
> The Vacuum full is performed once at the end of the whole job.
>
Have you also tried vacuum analyze periodically?  It does not lock the
table and can help quite a bit.

It's still odd that it would be that much slower between those versions.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: Upgrade Woes

From
aturner@neteconomist.com
Date:
In the performance case the machine was running RedHat AS 2.1.  I have posted the database schema (obtained from
pg_dump -s) at:

http://serverbeach.plexq.com/~aturner/schema.sql

The time to run all the stats procedures dropped through the floor: refresh_hourly_iud, adl_hourly_iud,
rebuild_daily_total etc.  There is a python script that calls the proc once for each hour or day.  When running the
historical calc job for a 7 day period back, it would crawl on 7.3.3.  We started benching the drive array and found
other issues with the system in the mean time (like the drive array was giving us 10MB/sec write speed - the guy who
set it up did not enable write-to-cache).  Once it was reconfigured the DB performance did not improve much (bonnie++
was used to verify the RAID array speed).

Alex Turner

On Thu, Sep 11, 2003 at 11:47:32AM -0400, Tom Lane wrote:
> aturner@neteconomist.com writes:
> > As for 7.3.3, the project in question suffered a 10x performance
> > degradation on 7.3.3 which went away when we rolled back to 7.3.2.
>
> I would like to pursue that report and find out why.  I've just gone
> through the CVS logs between 7.3.2 and 7.3.3, and I don't see any change
> that would explain a 10x slowdown.  Can you provide more details about
> exactly what slowed down?
>
> Also, what PG version were you using on the old RedHat 7.2 installation?
>
>             regards, tom lane