Thread: Sending a select to multiple servers.

Sending a select to multiple servers.

From

Ligesh

Date:

26 August 2005, 15:43:22

 I would like to know if the following kind of database client exists: I need a 'select' query to be sent to say 10 db
serverssimultaneously in parallel (using threading), the results should be re-sorted and returned. For example I have a
query:'select * from table where parent_clname = 'parent' order by name limit 10'. Now this query has to be sent to 10
servers,and the maximum number of results would be 100. Now this 100 result set has to be re-sorted, out of which 90
hasto be discarded, and the 10 has to be returned. 

 Does such a solution exist now. To me this appears to be in entirety of what should constitute a database cluster.
Onlythe search needs to be done on all the servers simultaneously at the low level. Once you get the results, the
writingcan be determined by the upper level logic (which can even be in a scripting language). But the search across
manyservers has to be done using proper threading, and the re-sorting also needs to be done fast.  

 Thanks a lot in advance.


--
:: Ligesh :: http://ligesh.com

Re: Sending a select to multiple servers.

From

Frank Wiles

Date:

26 August 2005, 16:06:21

On Fri, 26 Aug 2005 20:54:09 +0530
Ligesh <gxlists@gmail.com> wrote:

>
>  I would like to know if the following kind of database client exists:
>  I need a 'select' query to be sent to say 10 db servers
>  simultaneously in parallel (using threading), the results should be
>  re-sorted and returned. For example I have a query: 'select * from
>  table where parent_clname = 'parent' order by name limit 10'. Now
>  this query has to be sent to 10 servers, and the maximum number of
>  results would be 100. Now this 100 result set has to be re-sorted,
>  out of which 90 has to be discarded, and the 10 has to be returned.
>
>  Does such a solution exist now. To me this appears to be in entirety
>  of what should constitute a database cluster. Only the search needs
>  to be done on all the servers simultaneously at the low level. Once
>  you get the results, the writing can be determined by the upper level
>  logic (which can even be in a scripting language). But the search
>  across many servers has to be done using proper threading, and the
>  re-sorting also needs to be done fast.

  This is typically handled by the application layer, not a standard
  client.  Mostly because every situation is different, you may have
  10 servers and need 10 rows of results, others may need something
  entirely different.

  This isn't really a "cluster" either.  In a clustered environment
  you would send the one query to any of the 10 servers and it would
  return the proper results.

  But like I said this type of application is fairly trivial to write
  in most scripting or higher level languages.

 ---------------------------------
   Frank Wiles <frank@wiles.org>
   http://www.wiles.org
 ---------------------------------

Re: Sending a select to multiple servers.

From

"Merlin Moncure"

Date:

26 August 2005, 16:12:48

>  Does such a solution exist now. To me this appears to be in entirety
of
> what should constitute a database cluster. Only the search needs to be
> done on all the servers simultaneously at the low level. Once you get
the
> results, the writing can be determined by the upper level logic (which
can
> even be in a scripting language). But the search across many servers
has
> to be done using proper threading, and the re-sorting also needs to be
> done fast.

Well the fastest way would be to write a libpq wrapper, personally I
would choose C++ for extreme performance.  STL bring super fast sorting
to the table and will make dealing with ExecParams/ExecPrepared a little
bit easier.  To make available from scripting languages you need to make
C wrappers for interface functions and build in a shared library.

You could use any of a number of high level scripting languages but
performance will not be as good.  YMMV.

Antother interesting take on this problem would be to use dblink
contrib. module.  Check that out and see if it can meet your needs.

Merlin

Re: Sending a select to multiple servers.

From

Ligesh

Date:

26 August 2005, 22:35:44

On Fri, Aug 26, 2005 at 11:04:59AM -0500, Frank Wiles wrote:
> On Fri, 26 Aug 2005 20:54:09 +0530
>   This is typically handled by the application layer, not a standard
>   client.  Mostly because every situation is different, you may have
>   10 servers and need 10 rows of results, others may need something
>   entirely different.
>
>   This isn't really a "cluster" either.  In a clustered environment
>   you would send the one query to any of the 10 servers and it would
>   return the proper results.
>
>   But like I said this type of application is fairly trivial to write
>   in most scripting or higher level languages.
>

 The cluster logic is sort of implemented by this client library. If you write this at higher level the scalability
becomesan issue. For 10 servers it is alright. If you want to retrieve 100 rows, and there are 100 servers, you will
havea total result set of 10,000, which should be re-sorted and the 9900 of them should be droped, and only the 100
shouldbe returned. 

 Anyway, what I want to know is, if there is such a functionality offered by some C library.

 Thanks.

--
:: Ligesh :: http://ligesh.com

Re: Sending a select to multiple servers.

From

Ligesh

Date:

26 August 2005, 22:50:39

On Fri, Aug 26, 2005 at 11:04:59AM -0500, Frank Wiles wrote:
> On Fri, 26 Aug 2005 20:54:09 +0530
> Ligesh <gxlists@gmail.com> wrote:

>   Mostly because every situation is different, you may have
>   10 servers and need 10 rows of results, others may need something
>   entirely different.
>

 No. I have say 'm' number of servers, and I need 'n' rows. To get the results, you need to run the query against all
the'm' servers, which will return 'm x n' results, then you have to re-sort it and drop the 'm x n - n' rows and return
onlythe 'n'. So this is like retrieving the 'n' rows amongst ALL the servers, that satisfy your search criteria.  

 Once you retrieve the data, you will know which server each row belongs to, and you can do the writes yourself at the
higherlevel. 

 Thanks.

--
:: Ligesh :: http://ligesh.com