Thread: please please please PLEASE help!

please please please PLEASE help!

From
Steve
Date:
Hi,

I've asked this question a couple of times before on this forum but no
one seems to be nice enough to point me in the right direction or help
me out with any information. Please help me out with this, because it
is a very serious issue for me and I need to learn more about it. So
here it is again:

I've been running postgres on my server for over a year now and the
tables have become huge. I have 3 tables that each hold over 10GB of
data, and these tables are read very frequently. In fact, heavy
searches on these tables are expected every 2 to 3 minutes. This
unfortunately gives very poor response times to the end user, so I'm
looking at other alternatives now.

Currently, the postgresql installation is on a single disk, so all the
tables have their data read from that one disk. Searches on different
tables by multiple users at the same time are very slow, as everything
is mainly dependent on the spindle speed. I recently gained access to
another server which has 3 SCSI disks. I know there is a way to spread
the tables across the three different disks, but I'm not sure if it's
as easy as symlinking the files (WAL files only?) across. Can anyone
please tell me what to do here and how to harness the power of the
three SCSI drives that I have? Which files in the data directory need
to be moved? Is this safe? Can backups etc. be done easily? Any
information will be greatly appreciated. Thank you,


Steve


Re: please please please PLEASE help!

From
"Goulet, Dick"
Date:
Steve,

    I'm a little short on PostgreSQL experience, but gaining fast.  I'm
more of an Oracle nut.  But I think you've pinpointed the problem,
namely disk contention.  The easiest solution I can think of would be
to stripe the volume group across all three drives, thereby spreading
the pain across the drives.  One other item I'd look at is creating an
index that specifically answers the most frequent queries.  Depending
on your OS, you could mirror the data across two of the drives at the
OS level, which would also spread the pain.  Another solution, but it
has a price tag on it, is to acquire an external disk array.  These
arrays have memory that they use as additional buffers.  The cache gets
populated with the most frequently accessed data, and then you're no
longer limited by the drives.
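For what it's worth, a query-specific index in PostgreSQL would look
something like this -- the table and column names below are made up,
just to show the shape of it:

    -- Hypothetical: say the most frequent search filters on customer_id
    -- and then sorts by order_date; a composite index covers both.
    CREATE INDEX orders_cust_date_idx ON orders (customer_id, order_date);

    -- Refresh the planner statistics so the new index gets considered.
    ANALYZE orders;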

Dick Goulet
Senior Oracle DBA
Oracle Certified 8i DBA


Re: please please please PLEASE help!

From
Andrew Biagioni
Date:
Steve,

I'm not as much of an expert on PostgreSQL as others, but at least I
have SOME answer.

The way to leverage the multi-disk capability is to use RAID, either
software or hardware, rather than trying to twist the PostgreSQL
structure and files to do so.

RAID (Redundant Array of Inexpensive -- or Independent -- Disks) is
basically a number of methods to provide increased reliability and/or
performance from multiple disks, while having them appear to the
programs (including DB programs) as a single disk.

You will want to look for RAID information elsewhere, especially since
you should weigh the pros and cons of the various types of RAID, but
basically what you want is to have your data spread out uniformly
across the disks so that any writes and reads (in your case, especially
the reads) are done in parallel on all three devices, thus
significantly cutting the access time.

Another solution that MIGHT help is increasing the memory.  If a
heavily-used index can be cached in RAM it will speed searches up a lot...
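On the PostgreSQL side the knob for this is shared_buffers, which is
set in postgresql.conf; you can at least see what you're currently
running with from any session:

    -- Show how much memory the server may use for its buffer cache.
    SHOW shared_buffers;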

I would recommend that you also study ways to redistribute your data
and/or add better indices.  Keep in mind that an index will be quite
large for a big table, and may not be used to its full extent even when
it might seem that it should (at least in my experience).  You can also
use EXPLAIN and EXPLAIN ANALYZE to find out where your bottlenecks are,
to see what else can be done.
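For example -- big_table and some_column here are just placeholders
for your own schema:

    -- Run the query through the planner and actually execute it,
    -- timing each step of the plan.
    EXPLAIN ANALYZE
    SELECT * FROM big_table WHERE some_column = 42;

If the output shows a Seq Scan over a 10GB table where you expected an
Index Scan, that is the first bottleneck to attack.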

I hope this helps, and that the last paragraph didn't come across as
patronizing.  I'm aware that maybe you already did all that, but I can't
know -- and if you didn't, then it would be a shame not to mention it!

             Andrew



Re: please please please PLEASE help!

From
Kris Kiger
Date:
Steve,

    Are the three SCSI drives RAIDed?  If so, I would move the live
database to that machine.  From what I understand it can be a real
hassle to set up and maintain symlinked tables.  If you don't have the
option to move the database, I would explore one of these two routes:
    Route one is replication.  Slony-I was just released to the open
source community.  Replicate the data to the new machine, then
distribute the database reads between the master and the slave.
    Route two is to symlink the database logs/transaction logs/etc. to
the new machine.  It takes extra write load off of your main machine.
Not as effective as route one, I would say, but it would definitely
lessen the load.

Kris

Re: please please please PLEASE help!

From
"Rob Bamber"
Date:
Another thought -

    We had a similar issue recently.  Our support guys dropped the
database and then rebuilt it from a dump file.  The size of the data
directory went down from 12GB to less than 2GB.  According to the
sysadmin that did the work, postgres is not very good at reclaiming
disk space after large quantities of tuples are deleted over time.

HTH


Re: please please please PLEASE help!

From
"Jouneau Luc"
Date:
Hi,

I don't mean to offend you by asking, but do you run VACUUM FULL
regularly?  I assume the 10GB really is all useful data, isn't it?
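
A quick way to check is to look at the biggest relations in the system
catalogs -- this uses only the standard pg_class columns:

    -- Largest tables and indexes by pages on disk (8KB per page by
    -- default); relpages is refreshed by VACUUM and ANALYZE.
    SELECT relname, relpages
    FROM pg_class
    ORDER BY relpages DESC
    LIMIT 10;

If the page counts stay huge even after a VACUUM FULL, the space really
is live data.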

Luc

Re: please please please PLEASE help!

From
Ian FREISLICH
Date:
"Rob Bamber" wrote:
> Another thought -
>
>       We had a similar issue recently.  Our support guys dropped the
> database and then rebuilt it from a dump file.  The size of the data
> directory went down from 12GB to less than 2GB.  According to the sys
> ad that did the work postgres is not very good a reclaiming disk space
> after large quantities of tuples are deleted over time.

And another thought,

Have you tried clustering your tables on the most frequently used
and/or most performance-critical index in your joins?  (Remember to
VACUUM ANALYZE after the cluster has completed.)  You might find a huge
performance increase.  Clustering should also fix the problem mentioned
above, because the table is physically re-written on disk.  Use EXPLAIN
ANALYZE to find out which is the most time-consuming part of your query
and optimise that.
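
Roughly like this -- the index and table names are placeholders, and
this is the 7.4 syntax:

    -- Physically reorder the table to match its most-used index.
    CLUSTER my_most_used_idx ON my_big_table;

    -- Rebuild the planner statistics for the freshly written table.
    VACUUM ANALYZE my_big_table;

Bear in mind that CLUSTER takes an exclusive lock and rewrites the
whole table, so run it during a quiet period.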

Also, have you analyzed your database recently?  If you've never
run VACUUM ANALYZE your query planner's statistics have never been
updated so the query planner might not be making the best choices.

Ian

--
Ian Freislich

Re: please please please PLEASE help!

From
Steve
Date:

Kris Kiger wrote:
> Steve,
>
>    Are the three SCSI drives RAIDed?  If so, I would move the live
> database to that machine. [...]

Thanks for replying, Kris.  Yes, the SCSI drives are RAIDed.  I believe
in that case I don't really need to do anything other than move the
database to that machine?  I think the drives are striped at 12K or so,
but I'm not sure.  Moving it to the new machine has made it a lot
faster than before.  It has around 2GB RAM and dual 3.0 GHz Intel Xeon
processors.  I've also run 'chattr -R +A /path/to/data/dir' on the
database (which stops atime updates on the files) and I think it has
made a slight difference.  I'll test it a bit more and see if I can do
something else to improve the speed.

Steve


Re: please please please PLEASE help!

From
Steve
Date:

Rob Bamber wrote:
> Another thought -
>
>     We had a similar issue recently.  Our support guys dropped the database and
> then rebuilt it from a dump file.  The size of the data directory went down
> from 12GB to less than 2GB.  According to the sysadmin that did the
> work, postgres is not very good at reclaiming disk space after large
> quantities of tuples are deleted over time.
>
> HTH
>

That's because you need to 'VACUUM [FULL | ANALYZE]' the database
frequently.  For example, in our case there are about 2000
updates/inserts and around 50 deletes every 3 minutes.  'Update' and
'delete' operations are the most expensive (in terms of disk space
utilization), as they leave the old versions of the affected tuples on
disk until VACUUM is run.  I've set it up so that a daemon runs VACUUM
ANALYZE every 6 minutes or so, and then a full vacuum, i.e. VACUUM FULL
ANALYZE, every 30 mins.  As far as I know, in postgresql 7.4 a plain
VACUUM doesn't block read operations, so occasionally running VACUUM on
the most 'popular' tables would be a good idea.  I'm sure in your case
the frequency of calling vacuum could vary, but that is basically why
the database shrank from 12GB to 2GB.  VACUUM basically reclaims the
space held by dead tuples and their index entries and so 'compresses'
the amount of disk space used (which in our case effectively speeds up
queries by two to three orders of magnitude).
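
For reference, the two variants behave quite differently -- these are
just the stock 7.4 commands, nothing custom:

    -- Lazy vacuum: marks dead tuples as reusable without blocking
    -- readers; run it often.
    VACUUM ANALYZE;

    -- Full vacuum: compacts the table files and gives disk space back
    -- to the OS, but takes an exclusive lock on each table while it
    -- works; schedule it with care.
    VACUUM FULL ANALYZE;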

Steve