Re: [rfc,patch] PL/Proxy in core - Mailing list pgsql-hackers

From Marko Kreen
Subject Re: [rfc,patch] PL/Proxy in core
Msg-id e51f66da0805172244v3f4df204te5a26852323a2d79@mail.gmail.com
In response to Re: [rfc,patch] PL/Proxy in core  (Steve Singer <ssinger_pg@sympatico.ca>)
List pgsql-hackers
On 5/18/08, Steve Singer <ssinger_pg@sympatico.ca> wrote:
> On Sat, 17 May 2008, Marko Kreen wrote:
> > On 5/17/08, Steve Singer <ssinger_pg@sympatico.ca> wrote:
> > > Somewhat unrelated, I can see use-cases for replacing the call to random()
> > > with something that allows user-defined policies for RUN ON ANY.
> >
> > Well, that's why RUN ON userfunc(..); exists.  Also notice that the function
> > can tag more than one partition for execution.
> >
> > Or did you mean something else than partition selection by "user
> > defined policy"?
>
>  I see RUN ON userfunc() as being for partitioning where the correctness
> requires that the query be run on the result of userfunc. I see RUN ON ANY
> as being for load-balancing.

Here you see it wrong.  You should see RUN ON ANY simply as a shortcut
for RUN ON random();  The actual random() would not work as-is, since it
returns a float, but an equivalent random() returning an integer would.

So if you want a smarter ANY, just implement your own function.  I don't see
any need for a tunable ANY.
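To make the "RUN ON ANY is just RUN ON random()" point concrete, here is a minimal Python sketch of the selection semantics. It is not PL/Proxy code. It assumes the partition count is a power of two and that the integer returned by the RUN ON expression is reduced to a partition index by masking with (count - 1); `select_partition`, `run_on_any` and `RoundRobin` are hypothetical names. Under that assumption a round-robin "smarter ANY" is simply another integer-returning function.

```python
import itertools
import random


def select_partition(tag: int, partition_count: int) -> int:
    """Map the integer from a RUN ON expression to a partition index.

    Assumption: partition_count is a power of two and the low bits of
    the tag select the partition (a bitmask, not a modulo).
    """
    if partition_count <= 0 or partition_count & (partition_count - 1):
        raise ValueError("partition_count must be a power of two")
    return tag & (partition_count - 1)


def run_on_any(partition_count: int, rng: random.Random) -> int:
    # RUN ON ANY viewed as RUN ON <random integer>.
    return select_partition(rng.getrandbits(31), partition_count)


class RoundRobin:
    """Hypothetical user function implementing a round-robin ANY policy."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def __call__(self) -> int:
        # Each call returns the next integer: 0, 1, 2, ...
        return next(self._counter)


if __name__ == "__main__":
    rr = RoundRobin()
    print([select_partition(rr(), 4) for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
    print(run_on_any(4, random.Random(42)))
```

Since each backend runs its own PL instance, a real round-robin function would need its counter to live in the database, for example a sequence read with nextval(), rather than in per-backend memory.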

>                               You might want to RUN ON ANY with a round
> robin balancing, or maybe consider the load of servers for doing the
> balancing.
>
>  In the case of RUN ON ANY it seems that the database the query gets sent to
> doesn't matter.  It might make sense for plproxy to try the next database if
> it can't connect to the first one it picks.  You wouldn't want this for
> partitioning queries.  If plproxy knows whether you mean 'the query has to be run
> on these partitions' versus 'run the query on any partition, using method x
> to choose', then that type of thing would be possible.

OK, here are two feature requests that we have thought about ourselves too:

RUN ON LEAST LOADED;
 Sorry, this is unimplementable with the current PL/Proxy design, as the per-backend PL instances do not coordinate their usage. And this is deliberate.
 
 If you want to implement this, the design would have to look exactly like PL/Proxy 1: each PL makes a special connection to a special pooler that is responsible for partition selection and thus has information about partition usage.  And the complexity went through the roof...
 
 You may achieve the same effect with a smart TCP proxy, or failing that, you could write a load-balancing feature with load checks for PgBouncer.
 

RUN ON ANY PICK NEXT ON ERROR;
 This is implementable.  But we have not found an actual need for it ourselves.  So I have not bothered to implement it, as otherwise plproxy would have yet another "implementable" and "maybe nice to have" feature without an actual reason, like CONNECT, SELECT and get_cluster_config() turned out to be.
 
 OTOH, here we don't use read-only load balancing much.  And such a feature does not make sense when partitioning is used.  But it indeed makes sense for load balancing.  So I'm not against adding it.
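RUN ON ANY PICK NEXT ON ERROR is only a proposal here, so the following is a minimal Python sketch of the behaviour as described in this thread, not PL/Proxy code. It assumes that only connection failures trigger a retry and that partitions are tried in order, wrapping around, starting from a randomly chosen one; `run_on_any_pick_next`, the `run` callback and the connection strings are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def run_on_any_pick_next(
    partitions: Sequence[str],
    run: Callable[[str], T],
    rng: random.Random,
) -> T:
    """Start at a random partition; on connection error try the next one."""
    if not partitions:
        raise ValueError("no partitions configured")
    start = rng.randrange(len(partitions))
    last_error: Optional[Exception] = None
    for offset in range(len(partitions)):
        conn_str = partitions[(start + offset) % len(partitions)]
        try:
            return run(conn_str)
        except ConnectionError as exc:
            # Remember the failure and move on to the next partition.
            last_error = exc
    raise ConnectionError("all partitions failed") from last_error


if __name__ == "__main__":
    down = {"dbname=part0"}

    def fake_run(conn_str: str) -> str:
        # Simulated query: part0 is unreachable, the others answer.
        if conn_str in down:
            raise ConnectionError(conn_str)
        return f"ok from {conn_str}"

    parts = ["dbname=part0", "dbname=part1"]
    print(run_on_any_pick_next(parts, fake_run, random.Random(0)))  # ok from dbname=part1
```

The retry is only safe because, for load balancing, every partition can answer the query. With partitioned data the same retry would silently run the query against the wrong partition, which is why this makes no sense when partitioning is used.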
 

-- 
marko

