Thread: Huge number of rows

Huge number of rows

From
Anton Nikiforov
Date:
Dear All!
I have a question about how PostgreSQL will manage a huge number of
rows.
I have a project where every half hour 10 million records will be
added to the database, and they should be calculated, summarized, and
managed.
I'm planning to have a few servers that will each receive something like
a million records and then store this data on the central server in
report-ready format.
I know that one million records can be managed by Postgres (I have a
database with 25 million records and it is working just fine).
But I'm worried about the central database mentioned above, which should
store 240 million records daily and collect this data for years.
I cannot even imagine the hardware needed to collect monthly statistics.
And my question is: is this a task for Postgres, or should I think
about Oracle or DB2?
I'm also thinking about replicating data between two servers for
redundancy; what could you suggest for this?
And the data migration problem is still an open issue for me: how to
migrate data from fast devices (a RAID array) to slower devices (an MO
library or something like this) while still having access to the data?

--
Best regards,
Anton Nikiforov


Re: Huge number of rows

From
Anton Nikiforov
Date:
Anton Nikiforov writes:

> Dear All!
> I have a question about how PostgreSQL will manage a huge number
> of rows.
> I have a project where every half hour 10 million records will be
> added to the database, and they should be calculated, summarized, and
> managed.
> I'm planning to have a few servers that will each receive something like
> a million records and then store this data on the central server in
> report-ready format.
> I know that one million records can be managed by Postgres (I have a
> database with 25 million records and it is working just fine).
> But I'm worried about the central database mentioned above, which should
> store 240 million records daily and collect this data for years.
> I cannot even imagine the hardware needed to collect monthly
> statistics. And my question is: is this a task for Postgres, or should
> I think about Oracle or DB2?
> I'm also thinking about replicating data between two servers for
> redundancy; what could you suggest for this?
> And the data migration problem is still an open issue for me: how
> to migrate data from fast devices (a RAID array) to slower devices
> (an MO library or something like this) while still having access to
> the data?
>
And one more question: is there something in Postgres like table
partitioning in Oracle, to store data according to some rules, like a
group of data sources (an IP network or something)?

--
Best regards,
Anton Nikiforov



Re: Huge number of rows

From
Bruno Wolff III
Date:
On Thu, Mar 18, 2004 at 11:59:41 +0300,
  Anton Nikiforov <anton@nikiforov.ru> wrote:
> And one more question: is there something in Postgres like table
> partitioning in Oracle, to store data according to some rules, like a
> group of data sources (an IP network or something)?

There isn't currently a partitioning tool. Some table space features
are being developed for 7.5.

Depending on what you really are trying to accomplish, there are some
things you can do now. Partial indexes can be used to speed up index
searches over a subset of rows in a table. You can also use symlinks
to move tables onto different file systems, but you need to be careful
when doing this.
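
For example, a partial index restricted to a single data source might look
like the sketch below; the table and column names are only hypothetical,
chosen to match the traffic-accounting scenario described above:

  -- Hypothetical table holding the collected records.
  CREATE TABLE traffic (
      src_net    cidr                     NOT NULL,  -- data source (IP network)
      collected  timestamp with time zone NOT NULL,
      bytes      bigint                   NOT NULL
  );

  -- Partial index covering only the rows from one network, so index
  -- searches restricted to that network touch a much smaller index.
  CREATE INDEX traffic_net10_idx
      ON traffic (collected)
      WHERE src_net = '10.0.0.0/8';

  -- A query whose WHERE clause matches the index predicate can use it:
  SELECT sum(bytes)
  FROM traffic
  WHERE src_net = '10.0.0.0/8'
    AND collected >= '2004-02-01'
    AND collected <  '2004-03-01';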

Re: Huge number of rows

From
Anton Nikiforov
Date:
Hello Bruno, and thanks for your reply.
But how could PostgreSQL deal with 1E11 (a hundred billion) records in a table?
Do you know how inserts will be processed (the slow-insert problem)?
I already have a database with 50 million records (I just doubled my
25 million database) to check how Postgres behaves as it fills up, and it
is just fine with inserts, even updates.
But my database is 1000 times smaller than the one planned.
Maybe someone has tested the database with such a huge number of records,
or somebody from the developers could estimate the situation?
I have searched almost the entire internet :) to find someone with such
an amount of data in a database, but did not succeed.

Regards,
Anton

Bruno Wolff III writes:

>On Thu, Mar 18, 2004 at 11:59:41 +0300,
>  Anton Nikiforov <anton@nikiforov.ru> wrote:
>
>>And one more question: is there something in Postgres like table
>>partitioning in Oracle, to store data according to some rules, like a
>>group of data sources (an IP network or something)?
>
>There isn't currently a partitioning tool. Some table space features
>are being developed for 7.5.
>
>Depending on what you really are trying to accomplish, there are some
>things you can do now. Partial indexes can be used to speed up index
>searches over a subset of rows in a table. You can also use symlinks
>to move tables onto different file systems, but you need to be careful
>when doing this.


--
Best regards,
Anton Nikiforov

Tel.: +7 095 7814200
Fax : +7 095 7814201
Cell: +7 905 7245310



Re: Huge number of rows

From
Francisco Reyes
Date:
On Thu, 18 Mar 2004, Anton Nikiforov wrote:

> But I'm worried about the central database mentioned above, which should
> store 240 million records daily and collect this data for years.

I have not worked with anything even remotely so big.
A few thoughts..
I think this is more of a hardware issue than a PostgreSQL issue. I think
a good disk subsystem will be a must. Last time I was looking at large
disk subsystems for my ex-employer, I think the one we were leaning
towards was an IBM disk subsystem. I think it was in the $100,000 range.

Regardless of architecture (i.e. PC, Sun, etc.), SMP may be of help if you
have concurrent users. Lots and lots of memory will help too.

> And the data migration problem is still an open issue for me: how to
> migrate data from fast devices (a RAID array) to slower devices (an MO
> library or something like this) while still having access to the data?

I don't follow you there. Do you mean backup?
You can make a pg_dump of the data while the DB is running and then back
that up.

Or were you talking about something else, like storing different data on
media of different speeds? (Like Hierarchical Storage Management.)
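
If the goal is just to archive individual tables rather than a whole
database, a plain COPY also works while the server stays online; the table
name and file path below are only examples (COPY to a server-side file
needs superuser rights):

  -- Write one table out to a flat file; readers are not blocked meanwhile.
  COPY traffic TO '/archive/traffic-2004-03.copy';

  -- The file can later be reloaded elsewhere with:
  -- COPY traffic FROM '/archive/traffic-2004-03.copy';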

Re: Huge number of rows

From
Anton Nikiforov
Date:
Francisco Reyes writes:

>On Thu, 18 Mar 2004, Anton Nikiforov wrote:
>
>>But I'm worried about the central database mentioned above, which should
>>store 240 million records daily and collect this data for years.
>
>I have not worked with anything even remotely so big.
>A few thoughts..
>I think this is more of a hardware issue than a PostgreSQL issue. I think
>a good disk subsystem will be a must. Last time I was looking at large
>disk subsystems for my ex-employer, I think the one we were leaning
>towards was an IBM disk subsystem. I think it was in the $100,000 range.
>
>Regardless of architecture (i.e. PC, Sun, etc.), SMP may be of help if you
>have concurrent users. Lots and lots of memory will help too.
>
>>And the data migration problem is still an open issue for me: how to
>>migrate data from fast devices (a RAID array) to slower devices (an MO
>>library or something like this) while still having access to the data?
>
>I don't follow you there. Do you mean backup?
>You can make a pg_dump of the data while the DB is running and then back
>that up.
>
>Or were you talking about something else, like storing different data on
>media of different speeds? (Like Hierarchical Storage Management.)
I do not exactly know how to deal with such a huge amount of data. A good
disk subsystem is a must, and I do understand this. An SMP architecture is
a must as well.
I was asking whether there is any way for data to migrate automatically
from a fast disk subsystem to slower but reliable storage. Like in Novell
NetWare (I used to work with it 7-8 years ago), you could tell the system
that if a file is untouched for a month it should be moved from disk to
magneto-optical or tape, but if the file is requested the OS could move it
back to the operational volume.

Anton



Re: Huge number of rows

From
Richard Huxton
Date:
On Friday 19 March 2004 08:10, Anton Nikiforov wrote:
>
> I do not exactly know how to deal with such a huge amount of data. A good
> disk subsystem is a must, and I do understand this. An SMP architecture is
> a must as well. I was asking whether there is any way for data to migrate
> automatically from a fast disk subsystem to slower but reliable storage.
> Like in Novell NetWare (I used to work with it 7-8 years ago), you could
> tell the system that if a file is untouched for a month it should be moved
> from disk to magneto-optical or tape, but if the file is requested the OS
> could move it back to the operational volume.

That won't work for a database - it would have to sit there waiting while
someone went to the vault, got a tape out and restored a few GB of data.

You could separate out old data, say monthly, into archive tables and then
back up and drop those.
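
A rough sketch of that approach; the table, column and month used here are
only illustrative:

  -- Move one month of old rows into an archive table.
  CREATE TABLE traffic_2004_02 AS
      SELECT * FROM traffic
      WHERE collected >= '2004-02-01' AND collected < '2004-03-01';

  DELETE FROM traffic
      WHERE collected >= '2004-02-01' AND collected < '2004-03-01';

  -- After the DELETE, "VACUUM traffic;" lets the main table reuse the
  -- space freed by the deleted rows.

  -- Back up the archive table (from the shell, something like
  -- "pg_dump -t traffic_2004_02 mydb > traffic_2004_02.sql"), verify the
  -- dump, and then drop the table:
  DROP TABLE traffic_2004_02;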

--
  Richard Huxton
  Archonet Ltd

Re: Huge number of rows

From
Francisco Reyes
Date:
On Fri, 19 Mar 2004, Anton Nikiforov wrote:

> >Or were you talking about something else, like storing different data on
> >media of different speeds? (Like Hierarchical Storage Management.)
> >
> I do not exactly know how to deal with such a huge amount of data. A good
> disk subsystem is a must, and I do understand this. An SMP architecture is
> a must as well.
> I was asking whether there is any way for data to migrate automatically
> from a fast disk subsystem to slower but reliable storage. Like in Novell
> NetWare (I used to work with it 7-8 years ago), you could tell the system
> that if a file is untouched for a month it should be moved from disk to
> magneto-optical or tape, but if the file is requested the OS could move it
> back to the operational volume.

OK, so you were talking about HSM (Hierarchical Storage Management). I
don't believe anything like that is possible with PostgreSQL at this time.
I also don't think, personally, that anything like that will ever be
implemented. The main object of a database is to allow quick access to
data; HSM is a way to keep the cost of storage low.

There are, however, ways you could do this with programs.
If there is data that your users will not need often, you could write
programs to move that particular data from one server to another.
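
As a minimal sketch of that kind of job, assuming a hypothetical traffic
table and a second, slower archive server that the flat file can be carried
to:

  -- On the main (fast) server: collect rows older than a cutoff date and
  -- write them to a file (the path and date are just examples).
  CREATE TABLE traffic_old AS
      SELECT * FROM traffic WHERE collected < '2003-01-01';
  COPY traffic_old TO '/archive/traffic-pre-2003.copy';
  DELETE FROM traffic WHERE collected < '2003-01-01';
  DROP TABLE traffic_old;

  -- On the archive (slow) server: create a table with the same definition
  -- and load the file into it.
  -- COPY traffic FROM '/archive/traffic-pre-2003.copy';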