Thread: Re: [BUGS] BUG #2683: spi_exec_query in plperl returns column names which are not marked as UTF8
Re: [BUGS] BUG #2683: spi_exec_query in plperl returns column names which are not marked as UTF8
From
Tom Lane
Date:
"Vitali Stupin" <Vitali.Stupin@ria.ee> writes: > If database uses UTF8 encoding, then spi_exec_query in plperl should return > query results in UTF8 encoding. But unfortunately only data is marked as > UTF8, while column names are not. It looks to me like basically everywhere in plperl.c that does newSVpv() should follow it with #if PERL_BCDVERSION >= 0x5006000L if (GetDatabaseEncoding() == PG_UTF8) SvUTF8_on(sv); #endif whereas currently there are only a couple of places that do that. I'm tempted to consolidate this into a function on the order of newSVstring(const char *) or some such. Comments? regards, tom lane
Re: [BUGS] BUG #2683: spi_exec_query in plperl returns column names which are not marked as UTF8
From
Tom Lane
Date:
I wrote: > It looks to me like basically everywhere in plperl.c that does newSVpv() > should follow it with > > #if PERL_BCDVERSION >= 0x5006000L > if (GetDatabaseEncoding() == PG_UTF8) > SvUTF8_on(sv); > #endif Experimentation proved that this was insufficient to fix Vitali's problem --- the string he's unhappy about is actually a hash key entry, and there's no documented way to mark the second argument of hv_store() as being a UTF-8 string. Some digging in the Perl source code found that since at least Perl 5.8.0, hv_fetch and hv_store recognize a negative key length as meaning a UTF-8 key (ick!!), so I used that hack. I am not sure there is any reasonable fix available in Perl 5.6.x. Attached patch applied to HEAD, but I'm not going to risk back-patching it without some field testing. regards, tom lane *** src/pl/plperl/plperl.c.orig Tue Oct 3 23:17:16 2006 --- src/pl/plperl/plperl.c Sun Oct 15 14:47:27 2006 *************** *** 114,119 **** --- 114,122 ---- static SV *plperl_hash_from_tuple(HeapTuple tuple, TupleDesc tupdesc); static void plperl_init_shared_libs(pTHX); static HV *plperl_spi_execute_fetch_result(SPITupleTable *, int, int); + static SV *newSVstring(const char *str); + static SV **hv_store_string(HV *hv, const char *key, SV *val); + static SV **hv_fetch_string(HV *hv, const char *key); /* * This routine is a crock, and so is everyplace that calls it. The problem *************** *** 471,531 **** ) ); ! hv_store(hv, "name", 4, newSVpv(tdata->tg_trigger->tgname, 0), 0); ! hv_store(hv, "relid", 5, newSVpv(relid, 0), 0); if (TRIGGER_FIRED_BY_INSERT(tdata->tg_event)) { event = "INSERT"; if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) ! hv_store(hv, "new", 3, ! plperl_hash_from_tuple(tdata->tg_trigtuple, tupdesc), ! 0); } else if (TRIGGER_FIRED_BY_DELETE(tdata->tg_event)) { event = "DELETE"; if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) ! hv_store(hv, "old", 3, ! plperl_hash_from_tuple(tdata->tg_trigtuple, tupdesc), ! 0); } else if (TRIGGER_FIRED_BY_UPDATE(tdata->tg_event)) { event = "UPDATE"; if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) { ! hv_store(hv, "old", 3, ! plperl_hash_from_tuple(tdata->tg_trigtuple, tupdesc), ! 0); ! hv_store(hv, "new", 3, ! plperl_hash_from_tuple(tdata->tg_newtuple, tupdesc), ! 0); } } else event = "UNKNOWN"; ! hv_store(hv, "event", 5, newSVpv(event, 0), 0); ! hv_store(hv, "argc", 4, newSViv(tdata->tg_trigger->tgnargs), 0); if (tdata->tg_trigger->tgnargs > 0) { AV *av = newAV(); for (i = 0; i < tdata->tg_trigger->tgnargs; i++) ! av_push(av, newSVpv(tdata->tg_trigger->tgargs[i], 0)); ! hv_store(hv, "args", 4, newRV_noinc((SV *) av), 0); } ! hv_store(hv, "relname", 7, ! newSVpv(SPI_getrelname(tdata->tg_relation), 0), 0); ! hv_store(hv, "table_name", 10, ! newSVpv(SPI_getrelname(tdata->tg_relation), 0), 0); ! hv_store(hv, "table_schema", 12, ! newSVpv(SPI_getnspname(tdata->tg_relation), 0), 0); if (TRIGGER_FIRED_BEFORE(tdata->tg_event)) when = "BEFORE"; --- 474,534 ---- ) ); ! hv_store_string(hv, "name", newSVstring(tdata->tg_trigger->tgname)); ! hv_store_string(hv, "relid", newSVstring(relid)); if (TRIGGER_FIRED_BY_INSERT(tdata->tg_event)) { event = "INSERT"; if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) ! hv_store_string(hv, "new", ! plperl_hash_from_tuple(tdata->tg_trigtuple, ! tupdesc)); } else if (TRIGGER_FIRED_BY_DELETE(tdata->tg_event)) { event = "DELETE"; if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) ! hv_store_string(hv, "old", ! plperl_hash_from_tuple(tdata->tg_trigtuple, ! tupdesc)); } else if (TRIGGER_FIRED_BY_UPDATE(tdata->tg_event)) { event = "UPDATE"; if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) { ! hv_store_string(hv, "old", ! plperl_hash_from_tuple(tdata->tg_trigtuple, ! tupdesc)); ! hv_store_string(hv, "new", ! plperl_hash_from_tuple(tdata->tg_newtuple, ! tupdesc)); } } else event = "UNKNOWN"; ! hv_store_string(hv, "event", newSVstring(event)); ! hv_store_string(hv, "argc", newSViv(tdata->tg_trigger->tgnargs)); if (tdata->tg_trigger->tgnargs > 0) { AV *av = newAV(); for (i = 0; i < tdata->tg_trigger->tgnargs; i++) ! av_push(av, newSVstring(tdata->tg_trigger->tgargs[i])); ! hv_store_string(hv, "args", newRV_noinc((SV *) av)); } ! hv_store_string(hv, "relname", ! newSVstring(SPI_getrelname(tdata->tg_relation))); ! hv_store_string(hv, "table_name", ! newSVstring(SPI_getrelname(tdata->tg_relation))); ! hv_store_string(hv, "table_schema", ! newSVstring(SPI_getnspname(tdata->tg_relation))); if (TRIGGER_FIRED_BEFORE(tdata->tg_event)) when = "BEFORE"; *************** *** 533,539 **** when = "AFTER"; else when = "UNKNOWN"; ! hv_store(hv, "when", 4, newSVpv(when, 0), 0); if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) level = "ROW"; --- 536,542 ---- when = "AFTER"; else when = "UNKNOWN"; ! hv_store_string(hv, "when", newSVstring(when)); if (TRIGGER_FIRED_FOR_ROW(tdata->tg_event)) level = "ROW"; *************** *** 541,547 **** level = "STATEMENT"; else level = "UNKNOWN"; ! hv_store(hv, "level", 5, newSVpv(level, 0), 0); return newRV_noinc((SV *) hv); } --- 544,550 ---- level = "STATEMENT"; else level = "UNKNOWN"; ! hv_store_string(hv, "level", newSVstring(level)); return newRV_noinc((SV *) hv); } *************** *** 567,573 **** tupdesc = tdata->tg_relation->rd_att; ! svp = hv_fetch(hvTD, "new", 3, FALSE); if (!svp) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_COLUMN), --- 570,576 ---- tupdesc = tdata->tg_relation->rd_att; ! svp = hv_fetch_string(hvTD, "new"); if (!svp) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_COLUMN), *************** *** 741,749 **** } ! /* Uses mksafefunc/mkunsafefunc to create an anonymous sub whose text is ! * supplied in s, and returns a reference to the closure. */ ! static SV * plperl_create_sub(char *s, bool trusted) { --- 744,753 ---- } ! /* ! * Uses mksafefunc/mkunsafefunc to create an anonymous sub whose text is ! * supplied in s, and returns a reference to the closure. ! */ static SV * plperl_create_sub(char *s, bool trusted) { *************** *** 761,768 **** ENTER; SAVETMPS; PUSHMARK(SP); ! XPUSHs(sv_2mortal(newSVpv("our $_TD; local $_TD=$_[0]; shift;", 0))); ! XPUSHs(sv_2mortal(newSVpv(s, 0))); PUTBACK; /* --- 765,772 ---- ENTER; SAVETMPS; PUSHMARK(SP); ! XPUSHs(sv_2mortal(newSVstring("our $_TD; local $_TD=$_[0]; shift;"))); ! XPUSHs(sv_2mortal(newSVstring(s))); PUTBACK; /* *************** *** 900,910 **** tmp = OutputFunctionCall(&(desc->arg_out_func[i]), fcinfo->arg[i]); ! sv = newSVpv(tmp, 0); ! #if PERL_BCDVERSION >= 0x5006000L ! if (GetDatabaseEncoding() == PG_UTF8) ! SvUTF8_on(sv); ! #endif XPUSHs(sv_2mortal(sv)); pfree(tmp); } --- 904,910 ---- tmp = OutputFunctionCall(&(desc->arg_out_func[i]), fcinfo->arg[i]); ! sv = newSVstring(tmp); XPUSHs(sv_2mortal(sv)); pfree(tmp); } *************** *** 965,971 **** tg_trigger = ((TriggerData *) fcinfo->context)->tg_trigger; for (i = 0; i < tg_trigger->tgnargs; i++) ! XPUSHs(sv_2mortal(newSVpv(tg_trigger->tgargs[i], 0))); PUTBACK; /* Do NOT use G_KEEPERR here */ --- 965,971 ---- tg_trigger = ((TriggerData *) fcinfo->context)->tg_trigger; for (i = 0; i < tg_trigger->tgnargs; i++) ! XPUSHs(sv_2mortal(newSVstring(tg_trigger->tgargs[i]))); PUTBACK; /* Do NOT use G_KEEPERR here */ *************** *** 1256,1262 **** HeapTuple procTup; Form_pg_proc procStruct; char internal_proname[64]; - int proname_len; plperl_proc_desc *prodesc = NULL; int i; SV **svp; --- 1256,1261 ---- *************** *** 1277,1288 **** else sprintf(internal_proname, "__PLPerl_proc_%u_trigger", fn_oid); - proname_len = strlen(internal_proname); - /************************************************************ * Lookup the internal proc name in the hashtable ************************************************************/ ! svp = hv_fetch(plperl_proc_hash, internal_proname, proname_len, FALSE); if (svp) { bool uptodate; --- 1276,1285 ---- else sprintf(internal_proname, "__PLPerl_proc_%u_trigger", fn_oid); /************************************************************ * Lookup the internal proc name in the hashtable ************************************************************/ ! svp = hv_fetch_string(plperl_proc_hash, internal_proname); if (svp) { bool uptodate; *************** *** 1484,1491 **** internal_proname); } ! hv_store(plperl_proc_hash, internal_proname, proname_len, ! newSVuv(PTR2UV(prodesc)), 0); } ReleaseSysCache(procTup); --- 1481,1488 ---- internal_proname); } ! hv_store_string(plperl_proc_hash, internal_proname, ! newSVuv(PTR2UV(prodesc))); } ReleaseSysCache(procTup); *************** *** 1512,1547 **** char *outputstr; Oid typoutput; bool typisvarlena; - int namelen; - SV *sv; if (tupdesc->attrs[i]->attisdropped) continue; attname = NameStr(tupdesc->attrs[i]->attname); - namelen = strlen(attname); attr = heap_getattr(tuple, i + 1, tupdesc, &isnull); if (isnull) { /* Store (attname => undef) and move on. */ ! hv_store(hv, attname, namelen, newSV(0), 0); continue; } /* XXX should have a way to cache these lookups */ - getTypeOutputInfo(tupdesc->attrs[i]->atttypid, &typoutput, &typisvarlena); outputstr = OidOutputFunctionCall(typoutput, attr); ! sv = newSVpv(outputstr, 0); ! #if PERL_BCDVERSION >= 0x5006000L ! if (GetDatabaseEncoding() == PG_UTF8) ! SvUTF8_on(sv); ! #endif ! hv_store(hv, attname, namelen, sv, 0); pfree(outputstr); } --- 1509,1535 ---- char *outputstr; Oid typoutput; bool typisvarlena; if (tupdesc->attrs[i]->attisdropped) continue; attname = NameStr(tupdesc->attrs[i]->attname); attr = heap_getattr(tuple, i + 1, tupdesc, &isnull); if (isnull) { /* Store (attname => undef) and move on. */ ! hv_store_string(hv, attname, newSV(0)); continue; } /* XXX should have a way to cache these lookups */ getTypeOutputInfo(tupdesc->attrs[i]->atttypid, &typoutput, &typisvarlena); outputstr = OidOutputFunctionCall(typoutput, attr); ! hv_store_string(hv, attname, newSVstring(outputstr)); pfree(outputstr); } *************** *** 1627,1636 **** result = newHV(); ! hv_store(result, "status", strlen("status"), ! newSVpv((char *) SPI_result_code_string(status), 0), 0); ! hv_store(result, "processed", strlen("processed"), ! newSViv(processed), 0); if (status > 0 && tuptable) { --- 1615,1624 ---- result = newHV(); ! hv_store_string(result, "status", ! newSVstring(SPI_result_code_string(status))); ! hv_store_string(result, "processed", ! newSViv(processed)); if (status > 0 && tuptable) { *************** *** 1644,1651 **** row = plperl_hash_from_tuple(tuptable->vals[i], tuptable->tupdesc); av_push(rows, row); } ! hv_store(result, "rows", strlen("rows"), ! newRV_noinc((SV *) rows), 0); } SPI_freetuptable(tuptable); --- 1632,1639 ---- row = plperl_hash_from_tuple(tuptable->vals[i], tuptable->tupdesc); av_push(rows, row); } ! hv_store_string(result, "rows", ! newRV_noinc((SV *) rows)); } SPI_freetuptable(tuptable); *************** *** 1811,1817 **** if (portal == NULL) elog(ERROR, "SPI_cursor_open() failed:%s", SPI_result_code_string(SPI_result)); ! cursor = newSVpv(portal->name, 0); /* Commit the inner transaction, return to outer xact context */ ReleaseCurrentSubTransaction(); --- 1799,1805 ---- if (portal == NULL) elog(ERROR, "SPI_cursor_open() failed:%s", SPI_result_code_string(SPI_result)); ! cursor = newSVstring(portal->name); /* Commit the inner transaction, return to outer xact context */ ReleaseCurrentSubTransaction(); *************** *** 2065,2073 **** * Insert a hashtable entry for the plan and return * the key to the caller. ************************************************************/ ! hv_store(plperl_query_hash, qdesc->qname, strlen(qdesc->qname), newSVuv(PTR2UV(qdesc)), 0); ! return newSVpv(qdesc->qname, strlen(qdesc->qname)); } HV * --- 2053,2061 ---- * Insert a hashtable entry for the plan and return * the key to the caller. ************************************************************/ ! hv_store_string(plperl_query_hash, qdesc->qname, newSVuv(PTR2UV(qdesc))); ! return newSVstring(qdesc->qname); } HV * *************** *** 2098,2104 **** /************************************************************ * Fetch the saved plan descriptor, see if it's o.k. ************************************************************/ ! sv = hv_fetch(plperl_query_hash, query, strlen(query), 0); if (sv == NULL) elog(ERROR, "spi_exec_prepared: Invalid prepared query passed"); if (*sv == NULL || !SvOK(*sv)) --- 2086,2092 ---- /************************************************************ * Fetch the saved plan descriptor, see if it's o.k. ************************************************************/ ! sv = hv_fetch_string(plperl_query_hash, query); if (sv == NULL) elog(ERROR, "spi_exec_prepared: Invalid prepared query passed"); if (*sv == NULL || !SvOK(*sv)) *************** *** 2118,2124 **** limit = 0; if (attr != NULL) { ! sv = hv_fetch(attr, "limit", 5, 0); if (*sv && SvIOK(*sv)) limit = SvIV(*sv); } --- 2106,2112 ---- limit = 0; if (attr != NULL) { ! sv = hv_fetch_string(attr, "limit"); if (*sv && SvIOK(*sv)) limit = SvIV(*sv); } *************** *** 2239,2245 **** /************************************************************ * Fetch the saved plan descriptor, see if it's o.k. ************************************************************/ ! sv = hv_fetch(plperl_query_hash, query, strlen(query), 0); if (sv == NULL) elog(ERROR, "spi_query_prepared: Invalid prepared query passed"); if (*sv == NULL || !SvOK(*sv)) --- 2227,2233 ---- /************************************************************ * Fetch the saved plan descriptor, see if it's o.k. ************************************************************/ ! sv = hv_fetch_string(plperl_query_hash, query); if (sv == NULL) elog(ERROR, "spi_query_prepared: Invalid prepared query passed"); if (*sv == NULL || !SvOK(*sv)) *************** *** 2301,2307 **** elog(ERROR, "SPI_cursor_open() failed:%s", SPI_result_code_string(SPI_result)); ! cursor = newSVpv(portal->name, 0); /* Commit the inner transaction, return to outer xact context */ ReleaseCurrentSubTransaction(); --- 2289,2295 ---- elog(ERROR, "SPI_cursor_open() failed:%s", SPI_result_code_string(SPI_result)); ! cursor = newSVstring(portal->name); /* Commit the inner transaction, return to outer xact context */ ReleaseCurrentSubTransaction(); *************** *** 2353,2359 **** void *plan; plperl_query_desc *qdesc; ! sv = hv_fetch(plperl_query_hash, query, strlen(query), 0); if (sv == NULL) elog(ERROR, "spi_exec_freeplan: Invalid prepared query passed"); if (*sv == NULL || !SvOK(*sv)) --- 2341,2347 ---- void *plan; plperl_query_desc *qdesc; ! sv = hv_fetch_string(plperl_query_hash, query); if (sv == NULL) elog(ERROR, "spi_exec_freeplan: Invalid prepared query passed"); if (*sv == NULL || !SvOK(*sv)) *************** *** 2375,2378 **** --- 2363,2422 ---- free(qdesc); SPI_freeplan(plan); + } + + /* + * Create a new SV from a string assumed to be in the current database's + * encoding. + */ + static SV * + newSVstring(const char *str) + { + SV *sv; + + sv = newSVpv(str, 0); + #if PERL_BCDVERSION >= 0x5006000L + if (GetDatabaseEncoding() == PG_UTF8) + SvUTF8_on(sv); + #endif + return sv; + } + + /* + * Store an SV into a hash table under a key that is a string assumed to be + * in the current database's encoding. + */ + static SV ** + hv_store_string(HV *hv, const char *key, SV *val) + { + int32 klen = strlen(key); + + /* + * This seems nowhere documented, but under Perl 5.8.0 and up, + * hv_store() recognizes a negative klen parameter as meaning + * a UTF-8 encoded key. It does not appear that hashes track + * UTF-8-ness of keys at all in Perl 5.6. + */ + #if PERL_BCDVERSION >= 0x5008000L + if (GetDatabaseEncoding() == PG_UTF8) + klen = -klen; + #endif + return hv_store(hv, key, klen, val, 0); + } + + /* + * Fetch an SV from a hash table under a key that is a string assumed to be + * in the current database's encoding. + */ + static SV ** + hv_fetch_string(HV *hv, const char *key) + { + int32 klen = strlen(key); + + /* See notes in hv_store_string */ + #if PERL_BCDVERSION >= 0x5008000L + if (GetDatabaseEncoding() == PG_UTF8) + klen = -klen; + #endif + return hv_fetch(hv, key, klen, 0); }
Tom Lane wrote: > I wrote: >> It looks to me like basically everywhere in plperl.c that does newSVpv() >> should follow it with >> >> #if PERL_BCDVERSION >= 0x5006000L >> if (GetDatabaseEncoding() == PG_UTF8) >> SvUTF8_on(sv); >> #endif > > Experimentation proved that this was insufficient to fix Vitali's > problem --- the string he's unhappy about is actually a hash key entry, > and there's no documented way to mark the second argument of hv_store() > as being a UTF-8 string. Some digging in the Perl source code found > that since at least Perl 5.8.0, hv_fetch and hv_store recognize a > negative key length as meaning a UTF-8 key (ick!!), so I used that hack. > I am not sure there is any reasonable fix available in Perl 5.6.x. > > Attached patch applied to HEAD, but I'm not going to risk back-patching > it without some field testing. > Hmm. That negative pointer hack is mighty ugly. I am also wondering, now that it's been raised, if we need to issue a "use utf8;" in the startup code, so that literals in the code get the right encoding. cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > I am also wondering, now that it's been raised, if we need to issue a "use > utf8;" in the startup code, so that literals in the code get the right > encoding. Good question. I took care to ensure that the code strings passed to Perl are marked as UTF8; perhaps that makes it happen implicitly? If not, are there any downsides to issuing "use utf8"? regards, tom lane
On Sun, Oct 15, 2006 at 04:50:15PM -0500, Andrew Dunstan wrote: > Tom Lane wrote: > > I wrote: > >> It looks to me like basically everywhere in plperl.c that does > >> newSVpv() should follow it with > >> > >> #if PERL_BCDVERSION >= 0x5006000L > >> if (GetDatabaseEncoding() == PG_UTF8) > >> SvUTF8_on(sv); > >> #endif > > > > Experimentation proved that this was insufficient to fix Vitali's > > problem --- the string he's unhappy about is actually a hash key > > entry, and there's no documented way to mark the second argument > > of hv_store() as being a UTF-8 string. Some digging in the Perl > > source code found that since at least Perl 5.8.0, hv_fetch and > > hv_store recognize a negative key length as meaning a UTF-8 key > > (ick!!), so I used that hack. I am not sure there is any > > reasonable fix available in Perl 5.6.x. > > > > Attached patch applied to HEAD, but I'm not going to risk > > back-patching it without some field testing. > > Hmm. That negative pointer hack is mighty ugly. > > I am also wondering, now that it's been raised, if we need to issue > a "use utf8;" in the startup code, so that literals in the code get > the right encoding. That would be a reason to go to 5.8, as 'use utf8;' is tricky at best in 5.6. Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
On Sun, Oct 15, 2006 at 06:15:27PM -0400, Tom Lane wrote: > "Andrew Dunstan" <andrew@dunslane.net> writes: > > I am also wondering, now that it's been raised, if we need to issue a "use > > utf8;" in the startup code, so that literals in the code get the right > > encoding. > > Good question. I took care to ensure that the code strings passed to > Perl are marked as UTF8; perhaps that makes it happen implicitly? > If not, are there any downsides to issuing "use utf8"? What "use utf8" does is allow the *source* to be in utf8, thus affecting what's a valid identifier and such. It doesn't affect the data, for that you need "use encoding 'utf8'". It's clear whether you actually want to allow people to put utf8 characters directly into their source (especially if the database is not in utf8 encoding anyway). There is always the \u{xxxx} escape. The perlunicode man page describe it better, though I only have perl5.8. In know the perl5.6 model was different and somewhat more awkward to use. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > It's clear whether you actually want to allow people to put utf8 > characters directly into their source (especially if the database is > not in utf8 encoding anyway). There is always the \u{xxxx} escape. Well, if the database encoding isn't utf8 then we'd not issue any such command anyway. But if it is, then AFAICS the text of pg_proc entries could be expected to be utf8 too. regards, tom lane