Thread: [rfc,patch] PL/Proxy in core
Please consider this mail as submission of PL/Proxy into core. Documentation: https://developer.skype.com/SkypeGarage/DbProjects/PlProxy Previous submission with overview: http://archives.postgresql.org/pgsql-hackers/2007-03/msg01763.php - The code has changed rather little compared to first submission, biggest changes have been select()->poll() conversionand making scanner work with flex 2.5.4. - My interest in replacing dblink has decreased, it is for -hackers to decide if this is interesting goal or not. The patch itself: http://plproxy.projects.postgresql.org/plproxy-v1.diff.gz I have not much effort into patch, except made it compile in with main source. I'd like to get general consensus that the idea is acceptable first. Following are major parts that need work before committing: - SGML documentation. - Makefile review. - Integrate regression tests into main test framework. There are 2 things that could be migrated in to core code, but this can be done later: - Compat poll() function based on select() - can be used to get rid of #ifdef HAVE_POLL in various places in core code. - rowstamp.h - all the PL-s could use it to detect row changes. There are few code beautification ideas on which I'd like to get feedback from wider audience: - Drop CONNECT statement. This was added to make creating simple dblink style calls easier, but actually its not maintainablecompared to CLUSTER, it can be used only for occasional hacks. OTOH, if we want to deprecate dblink and replaceit with PL/Proxy, it should probably stay. - Drop SELECT statement. This was added as it was simple to do and also to be straightforward replacement for dblink. Butit's conceptually ugly. Also it gives impression that there will be UPDATE, DELETE and IF... which will not happen. The good developement strategy when using plproxy is: 1. Create regular database with function-based API, keeping an eye that interfaces would support partitioning. 2. If load grows, split database to several pieces, directing clients to db where all functions are plproxy ones which will rediect to correct partition. Having the SELECT available discourages good design where all partitions are functional on their own. -- marko Diffstat for the patch:src/include/catalog/pg_pltemplate.h | 1src/pl/Makefile | 2src/pl/plproxy/Makefile | 97 +++src/pl/plproxy/cluster.c | 542 ++++++++++++++++++src/pl/plproxy/execute.c | 750 +++++++++++++++++++++++++src/pl/plproxy/expected/plproxy_clustermap.out| 71 ++src/pl/plproxy/expected/plproxy_errors.out | 66 ++src/pl/plproxy/expected/plproxy_init.out | 1src/pl/plproxy/expected/plproxy_many.out | 116 +++src/pl/plproxy/expected/plproxy_select.out | 37 +src/pl/plproxy/expected/plproxy_test.out | 266 ++++++++src/pl/plproxy/function.c | 408 +++++++++++++src/pl/plproxy/main.c | 218 +++++++src/pl/plproxy/parser.y | 190 ++++++src/pl/plproxy/plproxy.h | 314 ++++++++++src/pl/plproxy/poll_compat.c | 140 ++++src/pl/plproxy/poll_compat.h | 58 +src/pl/plproxy/query.c | 299+++++++++src/pl/plproxy/result.c | 222 +++++++src/pl/plproxy/rowstamp.h | 27src/pl/plproxy/scanner.l | 329 ++++++++++src/pl/plproxy/sql/plproxy_clustermap.sql | 56+src/pl/plproxy/sql/plproxy_errors.sql | 63 ++src/pl/plproxy/sql/plproxy_init.sql | 63 ++src/pl/plproxy/sql/plproxy_many.sql | 66 ++src/pl/plproxy/sql/plproxy_select.sql | 37 +src/pl/plproxy/sql/plproxy_test.sql | 164 +++++src/pl/plproxy/type.c | 308 ++++++++++ 28 files changed, 4909 insertions(+), 2 modifications(!)
On Wednesday 14 May 2008 13:29, Marko Kreen wrote: > - SGML documentation. > - Makefile review. > - Integrate regression tests into main test framework. Has PL/proxy been tested on other OSes? FreeBSD/Solaris/Windows? -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
On 5/15/08, Josh Berkus <josh@agliodbs.com> wrote: > On Wednesday 14 May 2008 13:29, Marko Kreen wrote: > > - SGML documentation. > > - Makefile review. > > - Integrate regression tests into main test framework. > > Has PL/proxy been tested on other OSes? FreeBSD/Solaris/Windows? It definitely works on Linux and MacOS. I've seen ports for *BSD. I think any unix-like OS-es should work fine. In fact, only syscalls it does on its own are gettimeofday() and poll() [or select()], otherwise it calls either core or libpq functions. So I see no reason why it shouldnt already work on Windows. The biggest portability problem thus far has been scanner.l, which at the beginning was written for flex 2.5.33+, but 2.5.4 is still pretty widespread. But this I fixed in 2.0.3. Hmm.. Now that I think about it, in my effort to remove malloc() calls in both scanner and parser I told bison to use alloca(). Is it portability concern? Easy to fix, just need to be careful not to create memleaks. -- marko
"Marko Kreen" <markokr@gmail.com> writes: > Hmm.. Now that I think about it, in my effort to remove malloc() calls > in both scanner and parser I told bison to use alloca(). Is it portability > concern? Yes. regards, tom lane
On 5/15/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Marko Kreen" <markokr@gmail.com> writes: > > Hmm.. Now that I think about it, in my effort to remove malloc() calls > > in both scanner and parser I told bison to use alloca(). Is it portability > > concern? > > Yes. How about following patch? I have bison 2.3 and it seems not to do global allocation, so it should be fine. There may be early exit with elog(ERRROR) inside so I'd like to avoid malloc() itself. Is there some older bison that keeps allocations around? They would need bit more work... --- src/parser.y 14 May 2008 12:25:00 -0000 1.7 +++ src/parser.y 15 May 2008 07:34:53 -0000 @@ -24,7 +24,9 @@void plproxy_yy_scan_bytes(const char *bytes, int len); /* avoid permanent allocations */ -#define YYSTACK_USE_ALLOCA 1 +#define YYMALLOC palloc +#define YYFREE pfree +/* remove unused code */#define YY_LOCATION_PRINT(File, Loc) (0)#define YY_(x) (x) I will roll new full patch when more comments have appeared. -- marko
On 5/15/08, Marko Kreen <markokr@gmail.com> wrote: > On 5/15/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Marko Kreen" <markokr@gmail.com> writes: > > > Hmm.. Now that I think about it, in my effort to remove malloc() calls > > > in both scanner and parser I told bison to use alloca(). Is it portability > > > concern? > > > > Yes. > > > How about following patch? I have bison 2.3 and it seems not to do > global allocation, so it should be fine. There may be early exit > with elog(ERRROR) inside so I'd like to avoid malloc() itself. > > Is there some older bison that keeps allocations around? > They would need bit more work... > > --- src/parser.y 14 May 2008 12:25:00 -0000 1.7 > +++ src/parser.y 15 May 2008 07:34:53 -0000 > @@ -24,7 +24,9 @@ > void plproxy_yy_scan_bytes(const char *bytes, int len); > > /* avoid permanent allocations */ > -#define YYSTACK_USE_ALLOCA 1 > +#define YYMALLOC palloc > +#define YYFREE pfree > + > /* remove unused code */ > #define YY_LOCATION_PRINT(File, Loc) (0) > #define YY_(x) (x) > > I will roll new full patch when more comments have appeared. Checked bison 1.875, and it does not use YYMALLOC/YYFREE. But luckily its allocation pattern seems sane, so following should work: --- src/parser.y 14 May 2008 12:25:00 -0000 1.7 +++ src/parser.y 15 May 2008 08:18:14 -0000 @@ -24,7 +24,9 @@void plproxy_yy_scan_bytes(const char *bytes, int len); /* avoid permanent allocations */ -#define YYSTACK_USE_ALLOCA 1 +#define malloc palloc +#define free pfree +/* remove unused code */#define YY_LOCATION_PRINT(File, Loc) (0)#define YY_(x) (x) -- marko
"Marko Kreen" <markokr@gmail.com> writes: > How about following patch? I have bison 2.3 and it seems not to do > global allocation, so it should be fine. There may be early exit > with elog(ERRROR) inside so I'd like to avoid malloc() itself. None of our other parsers fool with bison's memory allocation; why does yours need to? regards, tom lane
On 5/15/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Marko Kreen" <markokr@gmail.com> writes: > > How about following patch? I have bison 2.3 and it seems not to do > > global allocation, so it should be fine. There may be early exit > > with elog(ERRROR) inside so I'd like to avoid malloc() itself. > > None of our other parsers fool with bison's memory allocation; > why does yours need to? Because that way I can be sure I understand their allocation behaviour. Eg. how does src/backend/parser/gram.c not leak memory on syntax error? I don't understand it. But if I force them use palloc(), always, I can be sure memore is freed. -- marko
"Marko Kreen" <markokr@gmail.com> writes: > Eg. how does src/backend/parser/gram.c not leak memory on syntax error? It's not a leak because the memory can be re-used during the next command. I believe you'll find that trying to make it use palloc is a failure because it keeps static pointers that it expects will stay valid across calls. regards, tom lane
On 5/15/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Marko Kreen" <markokr@gmail.com> writes: > > Eg. how does src/backend/parser/gram.c not leak memory on syntax error? > > It's not a leak because the memory can be re-used during the next > command. I may be blind, but I don't see any static variables there. > I believe you'll find that trying to make it use palloc is a failure > because it keeps static pointers that it expects will stay valid across > calls. Thats true, I need to drop the redefines if the allocations may be reused. -- marko
"Marko Kreen" <markokr@gmail.com> writes: > On 5/15/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Marko Kreen" <markokr@gmail.com> writes: >>> Eg. how does src/backend/parser/gram.c not leak memory on syntax error? >> >> It's not a leak because the memory can be re-used during the next >> command. > I may be blind, but I don't see any static variables there. Sorry, I was confusing bison with flex --- there are static variables pointing at buffers within a flex scanner. For bison it looks like defining YYSTACK_ALLOC/YYSTACK_FREE as palloc/pfree might be a sane thing to do, but I haven't tested it. regards, tom lane
On 5/15/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Marko Kreen" <markokr@gmail.com> writes: > > On 5/15/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> "Marko Kreen" <markokr@gmail.com> writes: > >>> Eg. how does src/backend/parser/gram.c not leak memory on syntax error? > >> > >> It's not a leak because the memory can be re-used during the next > >> command. > > > I may be blind, but I don't see any static variables there. > > Sorry, I was confusing bison with flex --- there are static variables > pointing at buffers within a flex scanner. > > For bison it looks like defining YYSTACK_ALLOC/YYSTACK_FREE as > palloc/pfree might be a sane thing to do, but I haven't tested it. Ok, so parser.y is now fine. Now I must admit I do the same hack in scanner.l, but because it keeps static references, there is always call to plproxy_yylex_destroy() in places that throw exceptions (all of flex/bison/plproxy exceptions go through single function). Reason for that is again the fact that I could not wrap my brain around flex memory handling. And the hacks in src/backend/parser/scan.l are also somethat mystery to me. When using palloc() I can be sure of the flow, and if something goes wrong it crashes instead leaking, so it can be fixed immidately. But now that I think about it, the scheme fails if palloc() itself throws exception. It can be fixed by calling following function on parser entry: void plproxy_yylex_startup(void) { #if FLXVER < 2005031(YY_CURRENT_BUFFER) = NULL; #else(yy_buffer_stack) = NULL; #endifplproxy_yylex_destroy(); } This is pretty hairy, but anything near flex is hairy. Such function also would drop the requirement that plproxy_yylex_destroy() must always be called. Then we could keep current simple always-from-scratch allocation pattern but more robust. Or we could go back to default malloc usage. I somewhat doubt it will be much cleaner, it needs lot more faith in sanity of flex which I dont have. OTOH, considering that now here the possibility of others reviewing the result is lot higher than before it can be attempted. -- marko
On Thu, 15 May 2008, Marko Kreen wrote: > On 5/15/08, Josh Berkus <josh@agliodbs.com> wrote: >> Has PL/proxy been tested on other OSes? FreeBSD/Solaris/Windows? > > It definitely works on Linux and MacOS. I've seen ports for *BSD. > I think any unix-like OS-es should work fine. I've compiled it with a minor Makefile modification on Solaris and it seems to work (it passes all the unit tests but the one involving random; we might want to rethink that test so 'passes' on platforms with different implementations of random()). However, I haven't deployed it into a production environment. Somewhat unrelated, I can see use-cases for replacing the call to random() with something that allows user defined polices for RUN ON ANY. > > In fact, only syscalls it does on its own are gettimeofday() and poll() > [or select()], otherwise it calls either core or libpq functions. > So I see no reason why it shouldnt already work on Windows. > > The biggest portability problem thus far has been scanner.l, which at > the beginning was written for flex 2.5.33+, but 2.5.4 is still pretty > widespread. But this I fixed in 2.0.3. > > Hmm.. Now that I think about it, in my effort to remove malloc() calls > in both scanner and parser I told bison to use alloca(). Is it portability > concern? Easy to fix, just need to be careful not to create memleaks. > > -- > marko > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Steve, > I've compiled it with a minor Makefile modification on Solaris and it > seems to work (it passes all the unit tests but the one involving > random; we might want to rethink that test so 'passes' on platforms with > different implementations of random()). However, I haven't deployed it > into a production environment. Confirmed on my copy of Nevada 70 (opensolaris) as well. Looks like we're good on OS support. Now time to start playing with the semantics ... --Josh
On 5/17/08, Steve Singer <ssinger_pg@sympatico.ca> wrote: > Somewhat unrelated, I can see use-cases for replacing the call to random() > with something that allows user defined polices for RUN ON ANY. Well, thats why the RUN ON userfunc(..); exists. Also notice the function can tag more that one partition for execution. Or did you mean something else than partition selection by "user defined policy"? -- marko
On Sat, 17 May 2008, Marko Kreen wrote: > On 5/17/08, Steve Singer <ssinger_pg@sympatico.ca> wrote: >> Somewhat unrelated, I can see use-cases for replacing the call to random() >> with something that allows user defined polices for RUN ON ANY. > > Well, thats why the RUN ON userfunc(..); exists. Also notice the function > can tag more that one partition for execution. > > Or did you mean something else than partition selection by "user > defined policy"? I see RUN ON userfunc() as being for partitioning where the correctness requires that the query be run on the result of userfunc. I see RUN ON ANY as being for load-balancing. You might want to RUN ON ANY with a round robin balancing, or maybe consider the load of servers for doing the balancing. In the case of RUN ON ANY it seems that the database the query gets sent to doesn't matter. It might make sense for plproxy to try the next database if it can't connect to the first one it picks. You wouldn't want this for partitioning queries. If plproxy knows if you mean 'the query has to be run on these partitions' versus 'run the query on any partition, using method x to choose' then that type of things would be possible. Steve > > -- > marko > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On 5/18/08, Steve Singer <ssinger_pg@sympatico.ca> wrote: > On Sat, 17 May 2008, Marko Kreen wrote: > > On 5/17/08, Steve Singer <ssinger_pg@sympatico.ca> wrote: > > > Somewhat unrelated, I can see use-cases for replacing the call to > random() > > > with something that allows user defined polices for RUN ON ANY. > > > > Well, thats why the RUN ON userfunc(..); exists. Also notice the function > > can tag more that one partition for execution. > > > > Or did you mean something else than partition selection by "user > > defined policy"? > > I see RUN ON userfunc() as being for partitioning where the correctness > requires that the query be run on the result of userfunc. I see RUN ON ANY > as being for load-balancing. Here you see wrong. You should see RUN ON ANY simply as a shortcut for RUN ON random(); The actual random() would not work as it returns floats, but equivalent integer random(); So if you want smarter ANY, just implement your function. I don't see any need for tunable ANY. > You might want to RUN ON ANY with a round > robin balancing, or maybe consider the load of servers for doing the > balancing. > > In the case of RUN ON ANY it seems that the database the query gets sent to > doesn't matter. It might make sense for plproxy to try the next database if > it can't connect to the first one it picks. You wouldn't want this for > partitioning queries. If plproxy knows if you mean 'the query has to be run > on these partitions' versus 'run the query on any partition, using method x > to choose' then that type of things would be possible. Ok, here are 2 feature requests, that we have thought ourselves too: RUN ON LEAST LOADED; Sorry, this is unimplementable with current PL/Proxy design, as the per-backend PL-s do not coordinate their usage. Andthis is deliberate. If you want to implement this the design should look exactly like PL/Proxy 1 - each PL does special connection to specialpooler that is responsible for partition selection and thus has information about partition usage. And the complexitywent through the roof... You may achieve the same effect with smart tcp proxy or if not you can write load-balancing feature with load check forPgBouncer. RUN ON ANY PICK NEXT ON ERROR; This is implementable. But we have not found an actual need for it ourselves. So I have bothered to implement it as otherwiseplproxy would have another "implementable" and "maybe nice to have" feature without actual reason like CONNECT,SELECT and get_cluster_config() turned out to be. OTOH, here we don't use read-only load balancing much. And such feature does not make sense when partitioning is used. But it indeed makes sense for load-balancing. So I'm not against adding it. -- marko
On Wed, 2008-05-14 at 23:29 +0300, Marko Kreen wrote: > There are few code beautification ideas on which I'd like to > get feedback from wider audience: > > - Drop CONNECT statement. This was added to make creating > simple dblink style calls easier, but actually its not > maintainable compared to CLUSTER, it can be used only > for occasional hacks. OTOH, if we want to deprecate > dblink and replace it with PL/Proxy, it should probably stay. > > - Drop SELECT statement. This was added as it was simple to do > and also to be straightforward replacement for dblink. But > it's conceptually ugly. Also it gives impression that there > will be UPDATE, DELETE and IF... which will not happen. I'd also suggest one feature request - Add COPY FROM/TO support The way It could be done is similar to (now deprecated) SELECT CREATE FUNCTION copy_users_to_partitions() RETURNS VOID AS $$ CLUSTER 'userdb'; RUN ON hashtext(text::$1); COPY users FROM stdin; $$ LANGUAGE plproxy; and it should be used like COPY is currently proxydb# SELECT copy_users_to_partitions(); bob bobspwd bob@email.com ben benspwd ben@email.com ... amy amyspwd amy@email.com .copy_users_to_partitions -------------------------- (1 row) I am not sure how easy it is to get access to "stdin" from inside a function, but the version of COPY which copies from filesystem can surely be done. On slaves it will always be plain COPY with STDIN/STDOUT. As this reintroduces "direct access to partition tables" feature removed with SELECT, it should be a feature that can be used by superusers only. Notice that field type should be given, as it can't be deduced from arguments. COPY .. TO would usually (but not neccessarily) be RUN ON ALL -------------------- Hannu