Thread: BUG #13960: plpython fails with certain function names
The following bug has been logged on the website: Bug reference: 13960 Logged by: Jim Nasby Email address: Jim.Nasby@BlueTreble.com PostgreSQL version: 9.5.0 Operating system: OS X Description: If a Postgres function contains characters that are illegal for python identifiers, compilation fails. Error message is not very helpful either: ~@decina.local/21474# CREATE FUNCTION "test[]"() RETURNS text LANGUAGE plpythonu AS $$return 'test'$$; ERROR: could not compile PL/Python function "test[]" DETAIL: SyntaxError: invalid syntax (<string>, line 1) ~@decina.local/21474# CREATE FUNCTION "test"() RETURNS text LANGUAGE plpythonu AS $$return 'test'$$; CREATE FUNCTION ~@decina.local/21474# One possibility is to simply strip out invalid characters[1]. What's valid expands in 3.5, but I don't think that really matters. Another possibility would be to catch the python exception, but that's probably more trouble than it's worth. [1] https://docs.python.org/2/reference/lexical_analysis.html#grammar-token-identifier identifier ::= (letter|"_") (letter | digit | "_")* letter ::= lowercase | uppercase lowercase ::= "a"..."z" uppercase ::= "A"..."Z" digit ::= "0"..."9" 3.5: https://docs.python.org/3.5/reference/lexical_analysis.html#grammar-token-identifier
Jim.Nasby@BlueTreble.com writes: > If a Postgres function contains characters that are illegal for python > identifiers, compilation fails. Error message is not very helpful either: Hm, how much do we really care? The example seems kinda artificial. > One possibility is to simply strip out invalid characters[1]. No, because then you would get collisions, ie function names that look different to PG would look the same to python. Bad news. (Actually, don't we have that issue anyway because of schemas? I wonder why we are exposing the PG name of the function to python at all.) regards, tom lane
On 2/14/16 6:49 PM, Tom Lane wrote: > Jim.Nasby@BlueTreble.com writes: >> If a Postgres function contains characters that are illegal for python >> identifiers, compilation fails. Error message is not very helpful either: > > Hm, how much do we really care? The example seems kinda artificial. I did run across it creating cast functions programmatically, roughly as format( 'CREATE FUNCTION "ndarray_to_%1s"', input_type::regtype ) which blows up when you feed it something like float. The actual example was doing something similar but for an array. I agree it's not exactly a high priority thing to support, but the error text is completely useless unless you happen to know that plpython is creating an actual python function. >> One possibility is to simply strip out invalid characters[1]. > > No, because then you would get collisions, ie function names that look > different to PG would look the same to python. Bad news. IIRC we append the regproc OID to ensure it's unique. > (Actually, don't we have that issue anyway because of schemas? I wonder > why we are exposing the PG name of the function to python at all.) Not sure. Maybe useful in a call stack inside python? I can't really think of what else you'd use it for. (I assume you're suggesting just call the function names something like plpython_<regproc::oid>.) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@bluetreble.com> writes: > On 2/14/16 6:49 PM, Tom Lane wrote: >> No, because then you would get collisions, ie function names that look >> different to PG would look the same to python. Bad news. > IIRC we append the regproc OID to ensure it's unique. Ah, okay, problem solved. >> (Actually, don't we have that issue anyway because of schemas? I wonder >> why we are exposing the PG name of the function to python at all.) > Not sure. Maybe useful in a call stack inside python? I can't really > think of what else you'd use it for. (I assume you're suggesting just > call the function names something like plpython_<regproc::oid>.) Yeah, that's what I was thinking about. But yes, if we append the OID anyway then we might as well just strip all non-alphanumeric chars from the name. Safe and you still get some debuggability. regards, tom lane
On 2/14/16 7:09 PM, Tom Lane wrote: > Yeah, that's what I was thinking about. But yes, if we append the OID > anyway then we might as well just strip all non-alphanumeric chars > from the name. Safe and you still get some debuggability. Attached. I opted not to modify the name in-place. If it's safe to modify in place, might want to just replace invalid characters with '_' instead of making a copy. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Attachment
Jim Nasby <Jim.Nasby@bluetreble.com> writes: > On 2/14/16 7:09 PM, Tom Lane wrote: >> Yeah, that's what I was thinking about. But yes, if we append the OID >> anyway then we might as well just strip all non-alphanumeric chars >> from the name. Safe and you still get some debuggability. > Attached. I opted not to modify the name in-place. If it's safe to > modify in place, might want to just replace invalid characters with '_' > instead of making a copy. I like the idea of replacing invalid characters with '_'. It's definitely not safe to scribble on the pg_proc tuple, but we could get the same result with a few wasted cycles by rescanning the procName string after building it, as per attached. regards, tom lane diff --git a/src/pl/plpython/expected/plpython_test.out b/src/pl/plpython/expected/plpython_test.out index 7b76faf..f8270a7 100644 *** a/src/pl/plpython/expected/plpython_test.out --- b/src/pl/plpython/expected/plpython_test.out *************** select stupidn(); *** 16,23 **** zarkon (1 row) ! -- test multiple arguments ! CREATE FUNCTION argument_test_one(u users, a1 text, a2 text) RETURNS text AS 'keys = list(u.keys()) keys.sort() --- 16,23 ---- zarkon (1 row) ! -- test multiple arguments and odd characters in function name ! CREATE FUNCTION "Argument test #1"(u users, a1 text, a2 text) RETURNS text AS 'keys = list(u.keys()) keys.sort() *************** for key in keys: *** 27,34 **** words = a1 + " " + a2 + " => {" + ", ".join(out) + "}" return words' LANGUAGE plpythonu; ! select argument_test_one(users, fname, lname) from users where lname = 'doe' order by 1; ! argument_test_one ----------------------------------------------------------------------- jane doe => {fname: jane, lname: doe, userid: 1, username: j_doe} john doe => {fname: john, lname: doe, userid: 2, username: johnd} --- 27,34 ---- words = a1 + " " + a2 + " => {" + ", ".join(out) + "}" return words' LANGUAGE plpythonu; ! select "Argument test #1"(users, fname, lname) from users where lname = 'doe' order by 1; ! Argument test #1 ----------------------------------------------------------------------- jane doe => {fname: jane, lname: doe, userid: 1, username: j_doe} john doe => {fname: john, lname: doe, userid: 2, username: johnd} diff --git a/src/pl/plpython/plpy_procedure.c b/src/pl/plpython/plpy_procedure.c index e1f5620..a0d0792 100644 *** a/src/pl/plpython/plpy_procedure.c --- b/src/pl/plpython/plpy_procedure.c *************** PLy_procedure_create(HeapTuple procTup, *** 146,151 **** --- 146,152 ---- MemoryContext cxt; MemoryContext oldcxt; int rv; + char *ptr; procStruct = (Form_pg_proc) GETSTRUCT(procTup); rv = snprintf(procName, sizeof(procName), *************** PLy_procedure_create(HeapTuple procTup, *** 155,160 **** --- 156,170 ---- if (rv >= sizeof(procName) || rv < 0) elog(ERROR, "procedure name would overrun buffer"); + /* Replace any not-legal-in-Python-names characters with '_' */ + for (ptr = procName; *ptr; ptr++) + { + if (!((*ptr >= 'A' && *ptr <= 'Z') || + (*ptr >= 'a' && *ptr <= 'z') || + (*ptr >= '0' && *ptr <= '9'))) + *ptr = '_'; + } + cxt = AllocSetContextCreate(TopMemoryContext, procName, ALLOCSET_DEFAULT_MINSIZE, diff --git a/src/pl/plpython/sql/plpython_test.sql b/src/pl/plpython/sql/plpython_test.sql index c8d5ef5..3a76104 100644 *** a/src/pl/plpython/sql/plpython_test.sql --- b/src/pl/plpython/sql/plpython_test.sql *************** CREATE FUNCTION stupidn() RETURNS text A *** 11,18 **** select stupidn(); ! -- test multiple arguments ! CREATE FUNCTION argument_test_one(u users, a1 text, a2 text) RETURNS text AS 'keys = list(u.keys()) keys.sort() --- 11,18 ---- select stupidn(); ! -- test multiple arguments and odd characters in function name ! CREATE FUNCTION "Argument test #1"(u users, a1 text, a2 text) RETURNS text AS 'keys = list(u.keys()) keys.sort() *************** words = a1 + " " + a2 + " => {" + ", ".j *** 23,29 **** return words' LANGUAGE plpythonu; ! select argument_test_one(users, fname, lname) from users where lname = 'doe' order by 1; -- check module contents --- 23,29 ---- return words' LANGUAGE plpythonu; ! select "Argument test #1"(users, fname, lname) from users where lname = 'doe' order by 1; -- check module contents
On 2/16/16 7:49 PM, Tom Lane wrote: > I like the idea of replacing invalid characters with '_'. It's definitely > not safe to scribble on the pg_proc tuple, but we could get the same > result with a few wasted cycles by rescanning the procName string after > building it, as per attached. Heck, I didn't even think about that. Yeah, it's going to scan another 20 bytes or so, but this certainly isn't performance critical. BTW, I didn't bother checking this with python 3.5, but I can't fathom how that would matter here. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@bluetreble.com> writes: > BTW, I didn't bother checking this with python 3.5, but I can't fathom > how that would matter here. Me either, but the buildfarm will soon tell us if it does. regards, tom lane
On 2/17/16 8:36 PM, Pedro Gimeno wrote: > Tom Lane wrote, On 2016-02-17 02:49: > >> + /* Replace any not-legal-in-Python-names characters with '_' */ >> + for (ptr = procName; *ptr; ptr++) >> + { >> + if (!((*ptr >= 'A' && *ptr <= 'Z') || >> + (*ptr >= 'a' && *ptr <= 'z') || >> + (*ptr >= '0' && *ptr <= '9'))) >> + *ptr = '_'; >> + } >> + > > That may blow on names that start with a digit. Maybe > > + (*ptr >= '0' && *ptr <= '9' && ptr != procName))) > > Making the testcase start with a digit sounds like a good idea. Postgres prefaces the name of the python function with a fixed string, so it will never start with a digit. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Tom Lane wrote, On 2016-02-17 02:49: > + /* Replace any not-legal-in-Python-names characters with '_' */ > + for (ptr = procName; *ptr; ptr++) > + { > + if (!((*ptr >= 'A' && *ptr <= 'Z') || > + (*ptr >= 'a' && *ptr <= 'z') || > + (*ptr >= '0' && *ptr <= '9'))) > + *ptr = '_'; > + } > + That may blow on names that start with a digit. Maybe + (*ptr >= '0' && *ptr <= '9' && ptr != procName))) Making the testcase start with a digit sounds like a good idea.