Re: Mass-Data question - Mailing list pgsql-general

From	Boris Köster
Subject	Re: Mass-Data question
Date	April 16, 2002 05:45:53
Msg-id	1135213696.20020416114535@x-itec.de
In response to	Re: Mass-Data question (Curt Sampson <cjs@cynic.net>)
Responses	Re: Mass-Data question Re: Mass-Data question
List	pgsql-general

Hello Curt,

Tuesday, April 16, 2002, 5:25:25 AM, you wrote:

>> Hmm, interesting. I have similar needs.

CS> As do I. Unfortunately, I'm not a guru. But I'll be testing out
CS> something like this in the next few weeks if all goes well. I was
CS> planning to do some fairly simple data partitioning. My initial
CS> plan is to drop the data into multiple tables across multiple
CS> servers, partitioned by date, and have a master table indicating
CS> the names of the various tables and the date ranges they cover.

Aha, interesting.

CS> The application will then deal with determining which tables the
CS> query will be spread across, construct and submit the appropriate
CS> queries (eventually in parallel, if I'm getting a lot of queries
CS> crossing multiple tables), and collate the results.
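
To make the master-table idea concrete, here is a minimal C++ sketch of the lookup step: given a date range, find the tables it touches and build one query per table. Everything named here is an assumption for illustration only (the `Partition` struct, the table names, the `day` column, dates encoded as YYYYMMDD integers); it is not Curt's actual design.

```cpp
#include <string>
#include <vector>

// Hypothetical row of the "master table": one entry per partition table.
struct Partition {
    std::string server;  // host holding the table (assumed naming)
    std::string table;   // table name on that server
    int first_day;       // inclusive start, encoded as YYYYMMDD
    int last_day;        // inclusive end, encoded as YYYYMMDD
};

// Return every partition whose date range overlaps [from, to].
std::vector<Partition> partitions_for_range(const std::vector<Partition>& master,
                                            int from, int to) {
    std::vector<Partition> hits;
    for (const auto& p : master) {
        if (p.first_day <= to && p.last_day >= from) hits.push_back(p);
    }
    return hits;
}

// Build one SELECT per matching partition. The "day" column is made up.
std::vector<std::string> build_queries(const std::vector<Partition>& parts,
                                       int from, int to) {
    std::vector<std::string> queries;
    for (const auto& p : parts) {
        queries.push_back("SELECT * FROM " + p.table +
                          " WHERE day BETWEEN " + std::to_string(from) +
                          " AND " + std::to_string(to) + " ORDER BY day");
    }
    return queries;
}
```

Each query carries `ORDER BY day` so that the per-table results arrive sorted, which makes collating them afterwards much cheaper.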

Parallel querying sounds very interesting to me. My current plan was
to do parallel writing, because the hard drives are not fast enough to
collect all the data. Your idea of parallel reading is very
interesting.

I have written a C++ library to access MySQL and PostgreSQL databases.
My OS is FreeBSD, but it should work with other OSes too, I think.

Parallelized reading/writing does not sound very complex in itself,
but getting the results back in the right order is a problem. Maybe I
could collect the data in parallel from several machines via threads
and write the content to a (new) machine (?), as long as the number
of rows is not higher than x rows, to avoid disk overrun. The
advantage would be that, if this works, it is possible to use that
feature with both pgsql and mysql.
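
A rough C++ sketch of the ordering problem, under some assumptions: each machine returns its rows already sorted (for example via ORDER BY), a row is reduced to a single sort key, and the `Source` callback stands in for the real database call. None of these names come from an existing library. The fetches run in parallel via `std::async`, and a k-way merge with a small heap restores the global order and stops at the row limit x.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <queue>
#include <utility>
#include <vector>

// A row reduced to its sort key (e.g. a timestamp); real rows carry more.
using Row = long;

// Stand-in for "run the query on one server". Each source must return
// its rows already sorted ascending.
using Source = std::function<std::vector<Row>()>;

// Fetch from all sources in parallel, then k-way merge into global order.
// Stops after max_rows rows so the collecting machine is not overrun.
std::vector<Row> fetch_merged(const std::vector<Source>& sources,
                              std::size_t max_rows) {
    std::vector<std::future<std::vector<Row>>> pending;
    for (const auto& s : sources)
        pending.push_back(std::async(std::launch::async, s));

    std::vector<std::vector<Row>> parts;
    for (auto& f : pending) parts.push_back(f.get());

    // Heap entry: (value, (part index, position within that part)).
    using Entry = std::pair<Row, std::pair<std::size_t, std::size_t>>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < parts.size(); ++i)
        if (!parts[i].empty()) heap.push({parts[i][0], {i, std::size_t{0}}});

    std::vector<Row> out;
    while (!heap.empty() && out.size() < max_rows) {
        Entry e = heap.top();
        heap.pop();
        out.push_back(e.first);
        std::size_t part = e.second.first;
        std::size_t next = e.second.second + 1;
        if (next < parts[part].size())
            heap.push({parts[part][next], {part, next}});
    }
    return out;
}
```

This version still holds every partial result in memory before merging; only the merged output is capped. A real implementation would probably stream rows from each connection instead. Depending on the platform, the threads may need `-pthread` at link time.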

----------   ----------
rdbms1       rdbms[n]
----------   ----------
    |             |
    |             |
    ---------------
           |
           |distributed writing for logfiles or similar into databases
           |
           |         ----------
           |-------- rdbms-tmp  temporary db-server (?)
           |         ---------- to analyze the data for parallelized
           |              |      reading like a temporary space... ?
           |              |
           |              |---- > Customer-Access for analyzing
    --------------
     Machine with Memory-Queue implementation for fast reading/writing
     "Collector for writing and distributing the content"
    --------------
           |
           |
        Internet
----------   ----------
client1      client[n]
----------   ----------
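
The collector box in the diagram could look roughly like the following C++ sketch: a bounded in-memory queue that client threads push into, plus a function that drains it and spreads the lines round-robin over the database servers. The class and function names are hypothetical, the round-robin policy is just one possible choice, and the actual INSERTs are left out.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Bounded in-memory queue: producers (clients) block while it is full,
// so a burst of log lines cannot exhaust the collector's memory.
class MemoryQueue {
public:
    explicit MemoryQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(std::string line) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(line));
        not_empty_.notify_one();
    }

    // Blocks until a line is available.
    std::string pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        std::string line = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return line;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    std::size_t capacity_;
    std::deque<std::string> items_;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
};

// Drain `count` lines and spread them round-robin over `backends` buckets,
// one bucket per database server (rdbms1 .. rdbms[n] in the diagram).
// `backends` must be greater than zero.
std::vector<std::vector<std::string>> distribute(MemoryQueue& queue,
                                                 std::size_t count,
                                                 std::size_t backends) {
    std::vector<std::vector<std::string>> buckets(backends);
    for (std::size_t i = 0; i < count; ++i)
        buckets[i % backends].push_back(queue.pop());
    return buckets;
}
```

Each bucket would then be written to its server by a separate writer thread, which is where the parallel writing would come from.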

What do the GURUs think about this? I need this functionality within
the next 1-2 months and I could try to code it as a C++ library. If the
concept is not bogus, the only question left is whether I should give
out the source for free or not; this is no solution for a home user *gg
I have no idea.

--
Best regards,
 Boris Köster                           mailto:koester@x-itec.de
