Thread: PGSQL with high number of database rows?

PGSQL with high number of database rows?

From: Tim Perrett
Hey all

I am possibly looking to use PostgreSQL in a project I am working on for a very
large client. The upshot of this is that the throughput of data will be pretty
massive, around 20,000 new rows per day in one of the tables. We also have to
keep this data online for a set period, so after 5 or 6 weeks it could have
nearly a million rows.

Are there any implications with possibly doing this? Will PG handle it? Are
there real-world systems using PG that have a massive amount of data in them?

All the best, thanks for any advice up front

Tim

Re: PGSQL with high number of database rows?

From: Magnus Hagander
On Tue, Apr 03, 2007 at 09:28:28AM +0100, Tim Perrett wrote:
> Hey all
>
> I am possibly looking to use PostgreSQL in a project I am working on for a very
> large client. The upshot of this is the throughput of data will be pretty
> massive, around 20,000 new rows in one of the tables per day. We also have to
> keep this data online for a set period so after 5 or 6 weeks it could have
> nearly a million rows.
>
> Are there any implications with possibly doing this? Will PG handle it? Are
> there real-world systems using PG that have a massive amount of data in them?

This is in no way massive for PG. Many millions of rows are not a problem at
all, given that you have a proper schema and indexing, and run on reasonable
hardware (hint: it might be a bit slow on your laptop). 20,000 rows/day
works out to about 14/minute, which is a very light load for a
server-grade machine to handle without any problem at all.

//Magnus


Re: PGSQL with high number of database rows?

From: Dave Page
Tim Perrett wrote:
> Hey all
>
> I am possibly looking to use PostgreSQL in a project I am working on for a very
> large client. The upshot of this is the throughput of data will be pretty
> massive, around 20,000 new rows in one of the tables per day. We also have to
> keep this data online for a set period so after 5 or 6 weeks it could have
> nearly a million rows.
>
> Are there any implications with possibly doing this? Will PG handle it? Are
> there real-world systems using PG that have a massive amount of data in them?

In all honesty that's really not that big. There are systems out there
with database sizes in the multiple terabyte range with billions of rows.

A few million shouldn't cause you any issues, unless they're
exceptionally wide.

Regards, Dave.

Re: PGSQL with high number of database rows?

From: "Albe Laurenz"
> I am possibly looking to use PostgreSQL in a project I am working on for a very
> large client. The upshot of this is the throughput of data will be pretty
> massive, around 20,000 new rows in one of the tables per day.
> We also have to keep this data online for a set period so after 5 or 6 weeks
> it could have nearly a million rows.
>
> Are there any implications with possibly doing this? Will PG
> handle it?

What do you mean, massive? A mere 1,000,000 rows?
I don't think that a small database like this will be a worry.
Try to avoid unnecessary table scans by using indexes!
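    Laurenz's point about avoiding table scans can be sketched with a tiny
self-contained example. Python's built-in sqlite3 stands in for PostgreSQL
here (the `events` table and `account` column are invented for illustration);
the same CREATE INDEX statement works in PostgreSQL, where you would inspect
the plan with EXPLAIN rather than SQLite's EXPLAIN QUERY PLAN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical log table, roughly matching the thread's scenario.
cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, account TEXT, payload TEXT)")
cur.executemany("INSERT INTO events (account, payload) VALUES (?, ?)",
                [(f"acct{i % 500}", "data") for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in column 3.
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Without an index on account, the lookup scans the whole table.
print(plan("SELECT * FROM events WHERE account = 'acct42'"))

cur.execute("CREATE INDEX idx_events_account ON events (account)")

# With the index, the planner does an index search instead of a full scan.
print(plan("SELECT * FROM events WHERE account = 'acct42'"))
```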

Yours,
Laurenz Albe

Re: PGSQL with high number of database rows?

From: Listmail
> Are there any implications with possibly doing this? will PG handle it?
> Are there realworld systems using PG that have a massive amount of data
> in them?

    It's not how much data you have, it's how you query it.

    You can have a table with 1000 rows and be dead slow if said rows are big
TEXT data and you seq-scan it in its entirety on every webpage hit your
server gets...
    You can have a terabyte table with billions of rows, and be fast if you
know what you're doing and have proper indexes.

    Learning all this is very interesting. MySQL always seemed hostile to me,
but postgres is friendly, has helpful error messages, the docs are great,
and the developer team is really nice.

    The size of your data has no importance (unless your disk is full), but
the size of your working set does.

    So, if you intend on querying your data for a website, for instance,
where the user searches data using forms, you will need to index it
properly so you only need to explore small sections of your data set in
order to be fast.

    If you intend to scan entire tables to generate reports or statistics,
you will be more interested in knowing if the size of your RAM is larger
or smaller than your data set, and about your disk throughput.
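    The RAM-versus-data-set comparison above comes down to simple arithmetic.
Taking the thread's million-row figure and an assumed average row width of
200 bytes (that width is a guess for illustration, not a figure from the
thread):

```python
rows = 1_000_000     # roughly the table size after 5-6 weeks
avg_row_bytes = 200  # assumed average row width, including per-row overhead

table_bytes = rows * avg_row_bytes
print(f"~{table_bytes / 2**20:.0f} MiB")  # about 191 MiB -- small next to even 1 GiB of RAM
```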

    So, what is your application?

Re: PGSQL with high number of database rows?

From: "Harvey, Allan AC"
Tim,
> massive, around 20,000 new rows in one of the tables per day. 
As an example...
I'm doing about 4000 inserts spread across about 1800 tables per minute.
Pisses it in with fsync off and the PC (an IBM x3650, 1 CPU, 1 GB of memory) on a UPS.

Allan
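    Insert rates like Allan's are usually reached by batching rather than
committing row by row. A minimal sketch, again using Python's built-in
sqlite3 as a stand-in for PostgreSQL (the `readings` table is invented);
with PostgreSQL you would batch via your client library's executemany or,
faster still, COPY:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE readings (sensor INTEGER, value REAL)")

# 4000 rows spread across 1800 logical sensors, echoing Allan's workload.
rows = [(i % 1800, float(i)) for i in range(4_000)]

start = time.perf_counter()
cur.executemany("INSERT INTO readings (sensor, value) VALUES (?, ?)", rows)
con.commit()  # one commit for the whole batch, not one per row
elapsed = time.perf_counter() - start

print(f"inserted {len(rows)} rows in {elapsed:.3f}s")
```

A single commit per batch avoids paying the per-transaction flush cost on
every row, which is the same effect Allan gets more bluntly by turning
fsync off.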

