Re: Mass-Data question - Mailing list pgsql-general
From | Boris Köster |
---|---|
Subject | Re: Mass-Data question |
Date | |
Msg-id | 1135213696.20020416114535@x-itec.de |
In response to | Re: Mass-Data question (Curt Sampson <cjs@cynic.net>) |
Responses |
Re: Mass-Data question
Re: Mass-Data question |
List | pgsql-general |
Hello Curt,

Tuesday, April 16, 2002, 5:25:25 AM, you wrote:

>> Hmm, interesting. I have similar needs.

CS> As do I. Unfortunately, I'm not a guru. But I'll be testing out
CS> something like this in the next few weeks if all goes well. I was
CS> planning to do some fairly simple data partitioning. My initial
CS> plan is to drop the data into multiple tables across multiple
CS> servers, partitioned by date, and have a master table indicating
CS> the names of the various tables and the date ranges they cover.

Aha, interesting.

CS> The application will then deal with determining which tables the
CS> query will be spread across, construct and submit the appropriate
CS> queries (eventually in parallel, if I'm getting a lot of queries
CS> crossing multiple tables), and collate the results.

Parallel querying sounds very interesting to me. My current plan was
to do parallel writing, because the hard drives are not fast enough to
collect all the data; your idea of parallel reading is very
interesting.

I have written a C++ library to access MySQL and PostgreSQL databases.
My OS is FreeBSD, but I think it should work with other OSes, too.

Parallelized reading and writing does not sound very complex in
itself; the problem is getting the results back in the right order.
Maybe I could collect data in parallel from several machines via
threads and write the content to a (new) machine (?), provided the
number of rows is not higher than x, to avoid overrunning the disk.
The advantage would be that, if this works, the feature could be used
with both pgsql and mysql.

    ----------     ----------
      rdbms1        rdbms[n]
    ----------     ----------
        |              |
        |              |
        ---------------
        |
        |distributed writing for logfiles or similar into databases
        |
        |         ----------
        |-------- rdbms-tmp   temporary db-server (?)
        |         ----------  to analyze the data for parallelized
        |             |       reading like a temporary space... ?
        |             |
        |             |---- > Customer-Access for analyzing
    --------------
    Machine with Memory-Queue implementation for fast reading/writing
    "Collector for writing and distributing the content"
    --------------
        |
        |
     Internet

    ----------     ----------
     client1        client[n]
    ----------     ----------

What do the gurus think about this? I need this functionality within
the next 1-2 months, and I could try to code it as a C++ library.

If the concept is not bogus, the only question left is whether I
should give out the source for free or not; this is no solution for a
home user *gg. I have no idea.

--
Best regards,
 Boris Köster                          mailto:koester@x-itec.de
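The read path discussed in this message (look up which date-partitioned tables a query touches, fetch from them in parallel, then collate the rows in the right order) could be sketched roughly as below. This is only an illustration under stated assumptions, not the C++ library mentioned above: it is written in Python for brevity, the table names and the `MASTER` and `PARTITIONS` structures are hypothetical, and in-memory lists stand in for the separate database servers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical master table: (partition name, first day, last day), inclusive.
MASTER = [
    ("log_2002_02", date(2002, 2, 1), date(2002, 2, 28)),
    ("log_2002_03", date(2002, 3, 1), date(2002, 3, 31)),
    ("log_2002_04", date(2002, 4, 1), date(2002, 4, 30)),
]

# Stand-in for the per-server tables (assumption: a real version would
# send a SQL query to the server holding each partition instead).
PARTITIONS = {
    "log_2002_02": [(date(2002, 2, 10), "a"), (date(2002, 2, 27), "b")],
    "log_2002_03": [(date(2002, 3, 5), "c"), (date(2002, 3, 30), "d")],
    "log_2002_04": [(date(2002, 4, 2), "e"), (date(2002, 4, 16), "f")],
}


def partitions_for(start, end):
    """Return the names of partitions whose range overlaps [start, end]."""
    return [name for name, lo, hi in MASTER if lo <= end and hi >= start]


def fetch(name, start, end):
    """Fetch one partition's rows within the range, sorted by date."""
    rows = [row for row in PARTITIONS[name] if start <= row[0] <= end]
    return sorted(rows)


def query(start, end):
    """Query all relevant partitions in parallel and merge the results."""
    names = partitions_for(start, end)
    if not names:
        return []
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map keeps the input order, so parts[i] belongs to names[i].
        parts = list(pool.map(lambda n: fetch(n, start, end), names))
    # Each part is already sorted; heapq.merge yields one globally sorted stream.
    return list(heapq.merge(*parts))


if __name__ == "__main__":
    for day, payload in query(date(2002, 2, 20), date(2002, 4, 5)):
        print(day, payload)
```

With strictly date-partitioned tables, the partitions do not overlap, so concatenating the per-table results in partition order would already be sorted. The merge step is used here because it also stays correct if rows were distributed across servers by some other key.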