Thread: Sending a select to multiple servers.
I would like to know if the following kind of database client exists: I need a 'select' query to be sent to say 10 db serverssimultaneously in parallel (using threading), the results should be re-sorted and returned. For example I have a query:'select * from table where parent_clname = 'parent' order by name limit 10'. Now this query has to be sent to 10 servers,and the maximum number of results would be 100. Now this 100 result set has to be re-sorted, out of which 90 hasto be discarded, and the 10 has to be returned. Does such a solution exist now. To me this appears to be in entirety of what should constitute a database cluster. Onlythe search needs to be done on all the servers simultaneously at the low level. Once you get the results, the writingcan be determined by the upper level logic (which can even be in a scripting language). But the search across manyservers has to be done using proper threading, and the re-sorting also needs to be done fast. Thanks a lot in advance. -- :: Ligesh :: http://ligesh.com
On Fri, 26 Aug 2005 20:54:09 +0530 Ligesh <gxlists@gmail.com> wrote: > > I would like to know if the following kind of database client exists: > I need a 'select' query to be sent to say 10 db servers > simultaneously in parallel (using threading), the results should be > re-sorted and returned. For example I have a query: 'select * from > table where parent_clname = 'parent' order by name limit 10'. Now > this query has to be sent to 10 servers, and the maximum number of > results would be 100. Now this 100 result set has to be re-sorted, > out of which 90 has to be discarded, and the 10 has to be returned. > > Does such a solution exist now. To me this appears to be in entirety > of what should constitute a database cluster. Only the search needs > to be done on all the servers simultaneously at the low level. Once > you get the results, the writing can be determined by the upper level > logic (which can even be in a scripting language). But the search > across many servers has to be done using proper threading, and the > re-sorting also needs to be done fast. This is typically handled by the application layer, not a standard client. Mostly because every situation is different, you may have 10 servers and need 10 rows of results, others may need something entirely different. This isn't really a "cluster" either. In a clustered environment you would send the one query to any of the 10 servers and it would return the proper results. But like I said this type of application is fairly trivial to write in most scripting or higher level languages. --------------------------------- Frank Wiles <frank@wiles.org> http://www.wiles.org ---------------------------------
> Does such a solution exist now. To me this appears to be in entirety of > what should constitute a database cluster. Only the search needs to be > done on all the servers simultaneously at the low level. Once you get the > results, the writing can be determined by the upper level logic (which can > even be in a scripting language). But the search across many servers has > to be done using proper threading, and the re-sorting also needs to be > done fast. Well the fastest way would be to write a libpq wrapper, personally I would choose C++ for extreme performance. STL bring super fast sorting to the table and will make dealing with ExecParams/ExecPrepared a little bit easier. To make available from scripting languages you need to make C wrappers for interface functions and build in a shared library. You could use any of a number of high level scripting languages but performance will not be as good. YMMV. Antother interesting take on this problem would be to use dblink contrib. module. Check that out and see if it can meet your needs. Merlin
On Fri, Aug 26, 2005 at 11:04:59AM -0500, Frank Wiles wrote: > On Fri, 26 Aug 2005 20:54:09 +0530 > This is typically handled by the application layer, not a standard > client. Mostly because every situation is different, you may have > 10 servers and need 10 rows of results, others may need something > entirely different. > > This isn't really a "cluster" either. In a clustered environment > you would send the one query to any of the 10 servers and it would > return the proper results. > > But like I said this type of application is fairly trivial to write > in most scripting or higher level languages. > The cluster logic is sort of implemented by this client library. If you write this at higher level the scalability becomesan issue. For 10 servers it is alright. If you want to retrieve 100 rows, and there are 100 servers, you will havea total result set of 10,000, which should be re-sorted and the 9900 of them should be droped, and only the 100 shouldbe returned. Anyway, what I want to know is, if there is such a functionality offered by some C library. Thanks. -- :: Ligesh :: http://ligesh.com
On Fri, Aug 26, 2005 at 11:04:59AM -0500, Frank Wiles wrote: > On Fri, 26 Aug 2005 20:54:09 +0530 > Ligesh <gxlists@gmail.com> wrote: > Mostly because every situation is different, you may have > 10 servers and need 10 rows of results, others may need something > entirely different. > No. I have say 'm' number of servers, and I need 'n' rows. To get the results, you need to run the query against all the'm' servers, which will return 'm x n' results, then you have to re-sort it and drop the 'm x n - n' rows and return onlythe 'n'. So this is like retrieving the 'n' rows amongst ALL the servers, that satisfy your search criteria. Once you retrieve the data, you will know which server each row belongs to, and you can do the writes yourself at the higherlevel. Thanks. -- :: Ligesh :: http://ligesh.com