From: Jan Urbański
Subject: pl/python improvements
Msg-id: 4CFE8865.7050506@wulczer.org
List: pgsql-hackers
Hi,

No, no patch(es) yet. I'm going through plpython.c trying as best I can to improve things there. I'll have a patch (or patches) ready for the January commitfest, but I thought I'd open up a discussion already, to spare myself having to redo features because the way I attacked the problem turned out to be a dead end. Be warned, this ended up being a very long email...

The code is on https://github.com/wulczer/postgres, in the plpython branch. I'll be rebasing it regularly, so don't be surprised by commit hashes changing.

Currently the improvements are:

* added a validator, so syntax errors are caught on function creation and you don't pay the compilation overhead when first calling the function
* use HTABs instead of Python dictionaries to cache procedures, maintaining two separate HTABs for procedures and triggers (this way you can use the procedure's OID as the hash key)
* get rid of the global PLy_error_in_progress; hopefully things that cause errors are now reported immediately
* use palloc in TopMemoryContext instead of malloc() for things that need to live until the end of the session
* execute SPI calls in a subtransaction, reporting errors back to Python as exceptions that can be caught, etc.
* do not prefix error messages with "PL/Python", just use the error string as errmsg
* if a function has composite-type inputs, check whether the underlying types have changed when calling the function and recompile it if necessary (fixes a TODO item)
* support tracebacks and make them look just like Python tracebacks (as much as possible, that is)
* fix plpy.error and plpy.fatal to actually raise a plpy.Error/plpy.Fatal exception that can be caught, etc.
* remove volatile modifiers from variables (AIUI they are completely unnecessary anywhere in plpython.c)
* general small refactorings, removing unnecessary code and splitting big procedures into smaller ones

Planned improvements include:

* explicit subxact starting, so you can issue consecutive plpy.execute calls atomically
* split plpython.c into smaller modules, like plpython_traceback.c, plpython_type_conversion.c, plpython_plpy.c, etc.
* variadic argument handling (is this possible for a non-plpgsql PL?)
* multiple OUT parameters support (a TODO item; I think "table function support" meant to say that)
* add a DB-API interface on top of SPI (a TODO item)

Not sure if I'll make it in time with all of them; I have put them in more-or-less priority order.

Now that you're excited, the questions.

Q1

To check for composite types changing after the procedure's I/O functions have been cached, I store the relid of the base relation for the type (typeidTypeRelid(desc->tdtypeid), where desc is the type's TupleDesc), as well as the xmin and tid of the relation's tuple from pg_class. Then when the function is invoked, for each composite parameter I fetch the relation's tuple from pg_class again and compare xmin and tid. This means that functions with composite parameters are slower, because there's a syscache search for each parameter on each invocation. I roughly measured the slowdown for a function with one composite param to be 4%.
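For illustration only, the values being compared are roughly what this query returns for the base relation of the composite type (here the 'employee' table from the example below); the actual code fetches them through the syscache rather than via SQL:

SELECT xmin, ctid
FROM pg_class
WHERE oid = 'employee'::regclass;
-- if either value differs from the ones cached when the function was
-- compiled, the pg_class row has changed, so the composite type may
-- have changed too, and the function gets recompiled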
OTOH this works now:

CREATE TABLE employee (
    name text,
    basesalary integer,
    bonus integer
);

INSERT INTO employee VALUES ('John', 100, 10), ('Mary', 200, 10);

CREATE OR REPLACE FUNCTION test_composite_table_input(e employee) RETURNS integer AS $$
return e['basesalary'] + e['bonus']
$$ LANGUAGE plpythonu;

SELECT name, test_composite_table_input(employee.*) FROM employee;

ALTER TABLE employee DROP bonus;
ALTER TABLE employee ADD bonus integer;
UPDATE employee SET bonus = 10;

-- will recompile the function and recache I/O functions
SELECT name, test_composite_table_input(employee.*) FROM employee;

PL/Perl has the same problem, BTW. The question is: am I going the right way about checking whether the composite type is up to date? The TODO item says: (is this possible, easy, quick?). My answer is: I guess, kinda, dunno.

Q2

The format of error messages changes, because the "PL/Python" prefix gets removed (I want raise plpy.Error("foo") to result in "ERROR: plpy.Error: foo", not in "PL/Python: plpy.Error: foo") and because of other things. For instance, since plpy.error(msg) now just raises a plpy.Error exception, you will get "ERROR: plpy.Error: msg" and not just "ERROR: msg". This potentially breaks clients that parse error messages, but I don't really buy that argument. Still, is it acceptable?
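To make the new behaviour concrete, here is the sort of code the two changes (SPI calls in subtransactions, plpy.error raising an exception) are meant to allow. This is a sketch of the intended behaviour, not a test from the patch; the function and table names are made up:

CREATE FUNCTION trap_errors() RETURNS text AS $$
try:
    plpy.execute("insert into no_such_table values (1)")
except plpy.SPIError:
    # the SPI call ran in a subtransaction, so the error is
    # trappable and execution can continue
    pass

try:
    plpy.error("boom")
except plpy.Error, e:
    # plpy.error() now raises plpy.Error instead of erroring
    # out directly, so it can be caught like any exception
    return "caught: %s" % e

return "not reached"
$$ LANGUAGE plpythonu;

SELECT trap_errors();  -- should return 'caught: boom'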
Q3

Tracebacks! The way I implemented them (based on the patch linked from the TODO item), they look like this:

CREATE FUNCTION nested_error() RETURNS text AS
'def fun1():
    plpy.error("boom")

def fun2():
    fun1()

def fun3():
    fun2()

fun3()
return "not reached"
' LANGUAGE plpythonu;

SELECT nested_error();
ERROR:  plpy.Error: boom
DETAIL:  Traceback (most recent call last):
  PL/Python function "nested_error", line 10, in <module>
    fun3()
  PL/Python function "nested_error", line 8, in fun3
    fun2()
  PL/Python function "nested_error", line 5, in fun2
    fun1()
  PL/Python function "nested_error", line 2, in fun1
    plpy.error("boom")
CONTEXT:  PL/Python function "nested_error"

Tracebacks are added to every error, even if their depth is 1, making them a bit redundant, so you get things like:

CREATE FUNCTION sql_syntax_error() RETURNS text AS
'plpy.execute("syntax error")'
LANGUAGE plpythonu;

SELECT sql_syntax_error();
ERROR:  plpy.SPIError: syntax error at or near "syntax"
DETAIL:  Traceback (most recent call last):
  PL/Python function "sql_syntax_error", line 1, in <module>
    plpy.execute("syntax error")
CONTEXT:  PL/Python function "sql_syntax_error"

Notice that they look almost exactly like a traceback Python would give you. I had to cheat a bit there, to hide the ugly generated Python function name and to remove the topmost frame of the traceback, which was pointing at the toplevel interpreter module. Another oddity is that Python tracebacks have line numbers, so I am storing the entire text of the function in the cache in order to print the corresponding source line for each frame, just like the real Python traceback.py module does. I guess that's negligible overhead? Or should I refetch prosrc every time a traceback is printed? Note that traceback computation costs some cycles, but I think that's OK, since we're obviously not optimizing for that case. The question is: is this how you'd like plpython tracebacks to look?

Q4

Splitting into smaller modules - should I do it? I find a ~4k LOC module too big, but then again, if I do the split, the resulting diff will be difficult to review. Or won't it?

Q5

Passing function arguments as globals is IMHO ugly. I'm not sure what the TODO item wants to say by "Passing them as globals means functions cannot be called recursively." - I tried, and recursion worked just fine. But it's still quite ugly IMHO... The problem is that I don't know how to fix it. Creating a Python function with a signature mimicking that of the SQL function is tricky because of unnamed arguments. Also, passing arguments as real arguments would break code that uses the 'global' trick suggested in the manual (see the example below). Even though the manual says you should not rely on it, I still feel bad about breaking people's code. Does anyone have any bright ideas on how to improve argument passing in plpython?
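For reference, the trick in question looks more or less like this (adapted from the example in the PL/Python docs; since arguments are injected as globals, reassigning one inside the function body requires a global declaration):

CREATE FUNCTION pystrip(x text) RETURNS text AS $$
global x
# without the global declaration above, the assignment below would
# make x local to the function body, and the reference on the
# right-hand side would fail with UnboundLocalError
x = x.strip()
return x
$$ LANGUAGE plpythonu;

If arguments were passed as real function parameters instead, a global declaration naming a parameter would be a Python syntax error, so functions written this way would stop working.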
Q6 (last one)

Does anyone have anything more on her/his PL/Python Christmas wishlist? plpy.make_coffee()?

Thanks for reading all the way here,
Jan