Thread: performance issues for processing more than 150000 records / day.

performance issues for processing more than 150000 records / day.

From: "Jakab Laszlo"
Hello,
 
We would like to use PostgreSQL for one project.
This project would need to handle some 150000 - 200000 records (hundreds of thousands of records) per day.
 
Does anybody have experience with PostgreSQL in this kind of environment?
Can anybody advise us regarding specific PostgreSQL issues for this kind of data manipulation?
 
Best Regards,
Jaki

Re: performance issues for processing more than 150000 records / day.

From: "Christopher Kings-Lynne"
Hi Jakab,
 
I don't see why PostgreSQL wouldn't be able to handle that - it's a relatively small amount of data.
 
When you say 150000 records/day, do you mean:
 
1. 150000 inserts/day (i.e. the database grows constantly and quickly)
2. 150000 updates/deletes per day
3. 150000 transactions/selects per day
 
If you give us a little more information, we'll be able to help you tune your PostgreSQL setup for your application.
 
Regards,
 
Chris
 

Re: performance issues for processing more than 150000 records / day.

From: "Jakab Laszlo"
Hello Chris,
 
I mean 150000 inserts/day here (the table grows constantly and quickly), with transactions and into the same table. Maybe after a significant amount of time we can move one year's worth of records into an archive table.
Plus some 2-3 million selects per day.
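
Just to illustrate what we have in mind for the yearly archiving, it would be something roughly like this (table and column names are only placeholders, not our real schema):

BEGIN;

-- copy everything older than one year into the archive table
INSERT INTO barcodes_archive
SELECT * FROM barcodes
WHERE scanned_at < CURRENT_DATE - INTERVAL '1 year';

-- then remove those rows from the live table
DELETE FROM barcodes
WHERE scanned_at < CURRENT_DATE - INTERVAL '1 year';

COMMIT;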
 
I would also appreciate some hardware-related advice.
 
thanks,
Jaki

Re: performance issues for processing more than 150000 records / day.

From: "Shridhar Daithankar"
On 21 Feb 2003 at 10:33, Jakab Laszlo wrote:

>
> Hello Chris,
>
> I mean 150000 inserts/day here (the table grows constantly and quickly), with
> transactions and into the same table. Maybe after a significant amount of
> time we can move one year's worth of records into an archive table.
> Plus some 2-3 million selects per day.
>
> I would also appreciate some hardware-related advice.

Use a 64-bit machine with SCSI disks, preferably RAID, to start with. You can
search the list archives for similar problems and solutions.

HTH

Bye
 Shridhar

--
Fog Lamps, n.: Excessively (often obnoxiously) bright lamps mounted on the
fronts of automobiles; used on dry, clear nights to indicate that the driver's
brain is in a fog.  See also "Idiot Lights".


Re: performance issues for processing more than 150000

From: Justin Clift
Jakab Laszlo wrote:
> Hello Chris,
>
> I mean 150000 inserts/day here (the table grows constantly and quickly), with
> transactions and into the same table. Maybe after a significant amount of
> time we can move one year's worth of records into an archive table.
> Plus some 2-3 million selects per day.

That's no problem at all, depending on:

  + How complex are the queries you're intending on running?

  + How will the data be spread between the tables?

  + The amount of data per row also makes a difference, if it is
extremely large.


> I would like to know also some hardware related advice.

You're almost definitely going to need a SCSI or better RAID
array, plus a server with quite a few GBs of ECC memory.

If you need to get specific about hardware to the point of knowing
exactly what you're needing, you're likely best to pay a good PostgreSQL
consultant to study your proposal in depth.

Hope this helps.

Regards and best wishes,

Justin Clift


> thanks,
> Jaki

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


Re: performance issues for processing more than 150000

From: "Jakab Laszlo"
> Jakab Laszlo wrote:
> > Hello Chris,
> >
> > I mean 150000 inserts/day here (the table grows constantly and quickly),
> > with transactions and into the same table. Maybe after a significant amount
> > of time we can move one year's worth of records into an archive table.
> > Plus some 2-3 million selects per day.
>
> That's no problem at all, depending on:
>
>   + How complex are the queries you're intending on running?
>   + How will the data be spread between the tables?
>
>   + The amount of data per row also makes a difference, if it is
> extremely large.
>

The main piece of information is a barcode. The big table will have
approximately 3 fields (just id-type fields).
The queries are not so complex: there will be roughly 50 tables with all the
information fully normalized, plus around 10-20 link tables connecting them.
Usually a join starts from a much smaller table and left-joins through the big
barcode table, and perhaps other tables as well.
This barcode table (the big one, with 150000 records/day) is then linked to
other, smaller tables (one of the bigger ones takes approximately 7000
inserts/day).
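
To make the structure a bit more concrete, it is roughly like this (simplified placeholder names, not the real schema):

-- two of the smaller, fully normalized tables
CREATE TABLE products  (product_id  serial PRIMARY KEY, name text);
CREATE TABLE locations (location_id serial PRIMARY KEY, name text);

-- the big table (~150000 inserts/day): the barcode plus id-type fields only
CREATE TABLE barcodes (
    barcode_id  serial PRIMARY KEY,
    barcode     varchar(32) NOT NULL,
    product_id  integer REFERENCES products,
    location_id integer REFERENCES locations
);
CREATE INDEX barcodes_barcode_idx ON barcodes (barcode);

-- a typical query: start from a small table and left join through the big one
SELECT l.name, b.barcode, p.name
FROM locations l
LEFT JOIN barcodes b ON b.location_id = l.location_id
LEFT JOIN products p ON p.product_id  = b.product_id
WHERE l.location_id = 42;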

> > I would also appreciate some hardware-related advice.
>
> You're almost definitely going to need a SCSI or better RAID
> array, plus a server with quite a few GBs of ECC memory.

Unfortunately the hardware budget must be kept as low as possible.
I was wondering whether there could be a reliable solution based on a
dual-processor machine with ATA-133 RAID mirroring and a few gigs of memory.

Thanks,
Jaki



Re: performance issues for processing more than 150000

From: "Shridhar Daithankar"
On 21 Feb 2003 at 11:59, Jakab Laszlo wrote:
> Unfortunately the hardware budget must be kept as low as possible.
> I was wondering whether there could be a reliable solution based on a
> dual-processor machine with ATA-133 RAID mirroring and a few gigs of memory.

Gigs of memory are not as important as how much disk bandwidth you have. For
this kind of database, a gig or two more would not make as much difference as
faster disks would.

If you are hell-bent on a tight budget, I suggest you throw two Intel boxes at
it and write a custom layer that consolidates the query results from the two
boxes - essentially partitioning the data.

If your queries are simple enough to split and consolidate, try it. It might
give you the best performance.
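
Just to illustrate the idea (the table and the split rule are placeholders
only): give each box a table with a complementary CHECK constraint, insert
each record into the box whose rule it matches, and have your custom layer run
the same SELECT on both boxes and concatenate the two result sets.

-- On box A: keep barcodes whose last digit is even
CREATE TABLE barcodes (
    barcode     varchar(32) NOT NULL,
    product_id  integer,
    location_id integer,
    CHECK (substr(barcode, length(barcode), 1) IN ('0','2','4','6','8'))
);

-- On box B: the same table, but with the complementary CHECK
-- ('1','3','5','7','9').

-- The application layer inserts into whichever box matches, runs the same
-- query against both boxes, and merges the results:
SELECT barcode, product_id FROM barcodes WHERE location_id = 42;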


Bye
 Shridhar

--
Blutarsky's Axiom: Nothing is impossible for the man who will not listen to
reason.


Re: performance issues for processing more than 150000

From: Josh Berkus
Jakab,

Some simple tips, which I think is what you were really asking for:

Are you adding the records in a single overnight or downtime load batch?  If
so, the fastest way by far is to:
1) temporarily disable all triggers and constraints on the table, and drop
some or all of the indexes
2) put all the data into a COPY file (tab-delimited text; see the COPY docs)
3) load the data through a COPY statement
4) re-enable the triggers and constraints and rebuild the indexes
The degree to which you need to do 1) and 4) depends on how much punishment
your system can take; start out by dropping and rebuilding just the triggers
and move up from there until the load finishes in a satisfactory time.
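
As a very rough sketch of steps 1-4 (the table, index, and constraint names
are placeholders only):

-- 1) drop the indexes and constraints on the target table
DROP INDEX barcodes_barcode_idx;
ALTER TABLE barcodes DROP CONSTRAINT barcodes_product_id_fkey;

-- 2) + 3) load the tab-delimited file in a single COPY
COPY barcodes FROM '/tmp/barcodes_load.txt';
-- (use \copy from psql if the file lives on the client side)

-- 4) rebuild the indexes and re-add the constraints
CREATE INDEX barcodes_barcode_idx ON barcodes (barcode);
ALTER TABLE barcodes ADD CONSTRAINT barcodes_product_id_fkey
    FOREIGN KEY (product_id) REFERENCES products (product_id);

-- refresh the planner statistics after the big load
ANALYZE barcodes;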

If the records are being added on a continuous basis and not in a big batch,
then take the following precautions:
1) put as many inserts as possible into transaction batches
2) minimize your use of constraints, triggers, and indexes on the tables being
loaded
3) consider using a "buffer table" to hold records about to be loaded while
data integrity checks and similar are performed.
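
A minimal sketch of 1) and 3) combined (table names are placeholders only):

-- 1) batch many inserts into one transaction instead of one transaction each
BEGIN;
INSERT INTO barcodes_buffer (barcode, product_id, location_id)
    VALUES ('5901234123457', 1, 42);
INSERT INTO barcodes_buffer (barcode, product_id, location_id)
    VALUES ('5901234123458', 1, 42);
-- ... a few hundred more ...
COMMIT;

-- 3) the buffer table carries no indexes, triggers, or constraints; every few
--    minutes move the validated rows into the real table and empty the buffer
BEGIN;
INSERT INTO barcodes (barcode, product_id, location_id)
SELECT buf.barcode, buf.product_id, buf.location_id
FROM barcodes_buffer buf
JOIN products  p ON p.product_id  = buf.product_id   -- the integrity checks
JOIN locations l ON l.location_id = buf.location_id; -- happen here, not via FKs
DELETE FROM barcodes_buffer;
COMMIT;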

> Unfortunately the hardware budget must be kept as low as possible.
> I was wondering whether there could be a reliable solution based on a
> dual-processor machine with ATA-133 RAID mirroring and a few gigs of memory.

1 gig of RAM may be plenty.  Your main bottleneck will be your disk channel.
If I were setting up your server, I might do something like:

1) Buy a motherboard with 4 ATA controllers.
2) Lay out the disks like this:
    channel 0: 1 matched pair of disks
    channels 1 + 2: 1 matched quartet of disks
    channel 3: a single ATA disk
    (For PostgreSQL, more, smaller disks are almost always better than a few
big ones. Alternately, set up everything in one big RAID 5 array with at least
6 disks; there is argument about which is better.)
3) Format the above as a RAID 1 pair on channel 0 and a RAID 1+0 double pair
on channels 1+2 using Linux software RAID.
4) Put the Linux OS + swap on channel 0.  Put the database on channels 1+2.
Put pg_xlog (the transaction log) on channel 3.  Make sure to use a version
of Linux with kernel 2.4.19 or greater!

That's just one configuration of several possible, of course, but may serve
your purposes admirably and relatively cheaply.

--
Josh Berkus
Aglio Database Solutions
San Francisco