Thread: quoting psql varible as identifier
Hello I am working on new patch. I see a problem with copy quote_ident on client side. This function call ScanKeywordLookup function. const ScanKeyword *keyword = ScanKeywordLookup(ident, ScanKeywords, NumScanKeywords); so we cannot simply implement quote_ident on client side :(. So we have to use a real query. It is acceptable for you? Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes: > so we cannot simply implement quote_ident on client side :(. So we > have to use a real query. > It is acceptable for you? No, certainly not --- that adds a boatload of failure conditions. Just quote it unconditionally. regards, tom lane
On Dec 29, 2009, at 8:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hello > > I am working on new patch. I see a problem with copy quote_ident on > client side. This function call ScanKeywordLookup function. > > const ScanKeyword *keyword = ScanKeywordLookup(ident, > > ScanKeywords, > > NumScanKeywords); > > so we cannot simply implement quote_ident on client side :(. So we > have to use a real query. > > It is acceptable for you? No. It has to be client-side. ...Robert
2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> so we cannot simply implement quote_ident on client side :(. So we >> have to use a real query. > >> It is acceptable for you? > > No, certainly not --- that adds a boatload of failure conditions. > Just quote it unconditionally. ok this function have to live in libpq - it needs some not exported functionality. But, it would not be a problem. Pavel > > regards, tom lane >
Pavel Stehule escribió: > 2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: > > Pavel Stehule <pavel.stehule@gmail.com> writes: > >> so we cannot simply implement quote_ident on client side :(. So we > >> have to use a real query. > > > >> It is acceptable for you? > > > > No, certainly not --- that adds a boatload of failure conditions. > > Just quote it unconditionally. > > ok > > this function have to live in libpq - it needs some not exported > functionality. But, it would not be a problem. Can we use a trick similar to pg_dump's? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: > Pavel Stehule escribió: >> 2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: >> > Pavel Stehule <pavel.stehule@gmail.com> writes: >> >> so we cannot simply implement quote_ident on client side :(. So we >> >> have to use a real query. >> > >> >> It is acceptable for you? >> > >> > No, certainly not --- that adds a boatload of failure conditions. >> > Just quote it unconditionally. >> >> ok >> >> this function have to live in libpq - it needs some not exported >> functionality. But, it would not be a problem. > > Can we use a trick similar to pg_dump's? ?? Pavel > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support >
2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: > Pavel Stehule escribió: >> 2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: >> > Pavel Stehule <pavel.stehule@gmail.com> writes: >> >> so we cannot simply implement quote_ident on client side :(. So we >> >> have to use a real query. >> > >> >> It is acceptable for you? >> > >> > No, certainly not --- that adds a boatload of failure conditions. >> > Just quote it unconditionally. >> >> ok >> >> this function have to live in libpq - it needs some not exported >> functionality. But, it would not be a problem. > > Can we use a trick similar to pg_dump's? > I see it - we can move function (content) fmtId from dumputils.c to libpq. Pavel > -- > Alvaro Herrera http://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support >
Pavel Stehule escribió: > 2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: > > Pavel Stehule escribió: > >> 2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: > >> > Pavel Stehule <pavel.stehule@gmail.com> writes: > >> >> so we cannot simply implement quote_ident on client side :(. So we > >> >> have to use a real query. > >> > > >> >> It is acceptable for you? > >> > > >> > No, certainly not --- that adds a boatload of failure conditions. > >> > Just quote it unconditionally. > >> > >> ok > >> > >> this function have to live in libpq - it needs some not exported > >> functionality. But, it would not be a problem. > > > > Can we use a trick similar to pg_dump's? > > ?? See src/bin/pg_dump/keywords.c and fmtId in dumputils.c -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: > Pavel Stehule escribió: >> 2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: >> > Pavel Stehule escribió: >> >> 2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: >> >> > Pavel Stehule <pavel.stehule@gmail.com> writes: >> >> >> so we cannot simply implement quote_ident on client side :(. So we >> >> >> have to use a real query. >> >> > >> >> >> It is acceptable for you? >> >> > >> >> > No, certainly not --- that adds a boatload of failure conditions. >> >> > Just quote it unconditionally. >> >> >> >> ok >> >> >> >> this function have to live in libpq - it needs some not exported >> >> functionality. But, it would not be a problem. >> > >> > Can we use a trick similar to pg_dump's? >> >> ?? > > See src/bin/pg_dump/keywords.c and fmtId in dumputils.c I found it. It is maybe too complex for using in psql. Pavel > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support >
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: >> Can we use a trick similar to pg_dump's? > I see it - we can move function (content) fmtId from dumputils.c to libpq. This is not a good idea. pg_dump can be expected to be up-to-date with the backend's keyword list, but libpq cannot. Just quote the thing unconditionally. It's not worth working harder than that anyway. regards, tom lane
2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: >>> Can we use a trick similar to pg_dump's? > >> I see it - we can move function (content) fmtId from dumputils.c to libpq. > > This is not a good idea. pg_dump can be expected to be up-to-date with > the backend's keyword list, but libpq cannot. > > Just quote the thing unconditionally. It's not worth working harder > than that anyway. I see it. Pavel > > regards, tom lane >
hello here is patch pavel@postgres:5432=# \set foo 'hello world' pavel@postgres:5432=# SELECT :'foo' AS :"foo"; hello world ------------- hello world (1 row) Regards Pavel 2009/12/29 Pavel Stehule <pavel.stehule@gmail.com>: > 2009/12/29 Tom Lane <tgl@sss.pgh.pa.us>: >> Pavel Stehule <pavel.stehule@gmail.com> writes: >>> 2009/12/29 Alvaro Herrera <alvherre@commandprompt.com>: >>>> Can we use a trick similar to pg_dump's? >> >>> I see it - we can move function (content) fmtId from dumputils.c to libpq. >> >> This is not a good idea. pg_dump can be expected to be up-to-date with >> the backend's keyword list, but libpq cannot. >> >> Just quote the thing unconditionally. It's not worth working harder >> than that anyway. > > I see it. > > Pavel > >> >> regards, tom lane >> >
Attachment
On Tue, Dec 29, 2009 at 3:19 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > here is patch The error handling in quote_literal() doesn't look right to me. The documentation for PQescapeStringConn says that it stores an error message in the conn object, but your code ignores that and prints out a generic message instead. That doesn't seem right: but then it further goes on to call exit(1), which seems like a considerable overreaction to an encoding violation, which is apparently the only class of error PQescapeStringConn() is documented to throw. ...Robert
2009/12/30 Robert Haas <robertmhaas@gmail.com>: > On Tue, Dec 29, 2009 at 3:19 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> here is patch > > The error handling in quote_literal() doesn't look right to me. The > documentation for PQescapeStringConn says that it stores an error > message in the conn object, but your code ignores that and prints out > a generic message instead. That doesn't seem right: but then it > further goes on to call exit(1), which seems like a considerable > overreaction to an encoding violation, which is apparently the only > class of error PQescapeStringConn() is documented to throw. Actually I am not sure about the adequate solution. Now, the lexer doesn't return any error. Any handled errors are fatal, and lexer (and psql) is finished with exit(1) - so there are not checking status after any lexer call. It is question if we have to do it. Because error will be raised in next stage: "Presently the only possible error conditions involve invalid multibyte encoding in the source string. The output string is still generated on error, but it can be expected that the server will reject it as malformed. On error, a suitable message is stored in the conn object, whether or not error is NULL." http://www.postgresql.org/docs/8.4/static/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING so probably - we can quietly ignore this error without security or functionality issues. I see two solution: a) print correct message and exit(1) b) quietly ignore this error - it's more warning than error, because error will be raised immediately Third variant, checking lexer status over every call is maybe non adequate to frequency of this error and an importance of this patch. I am for a. Regards, and happy new year Pavel > > ...Robert >
Pavel Stehule <pavel.stehule@gmail.com> writes: > a) print correct message and exit(1) Which part of "no, you're not doing that" wasn't clear to you? The error handling in this function should be no different from that in the existing escaping functions. exit(1) is completely unacceptable. regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > here is patch I looked at this patch a bit, and I think the real problem with it is that it's not multibyte safe. You've copied backend code that is allowed to assume it's in a safe encoding (ie, one where multibyte characters can't contain non-high-bit-set bytes). This is not okay on the client side, see SJIS and similar encodings. Where you need to start out is by cloning PQescapeStringConn, which does a similar type of transformation correctly even in unsafe encodings. I think we'd agreed upthread that libpq should provide PQescapeIdentifier functionality anyhow. Once you've actually read that code, you'll realize that it's okay to treat the error result as a warning, which resolves the other point of concern. Just print the message and use the result anyway. regards, tom lane
2010/1/2 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> a) print correct message and exit(1) > > Which part of "no, you're not doing that" wasn't clear to you? > > The error handling in this function should be no different from that > in the existing escaping functions. exit(1) is completely unacceptable. > Are we talking we about error handling in psql lexer? What I know, in psql there are no any escaping function now. Yes, exit(1) is maybe too hard - but it is safe. I didn't write PQescapeStringConn function, and I am not able to speak, we can ignore this error result. Full handling of this possible error, means to add some CHECK over any lexer call. Pavel > regards, tom lane >
2010/1/2 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> here is patch > > I looked at this patch a bit, and I think the real problem with it is > that it's not multibyte safe. You've copied backend code that is > allowed to assume it's in a safe encoding (ie, one where multibyte > characters can't contain non-high-bit-set bytes). This is not okay > on the client side, see SJIS and similar encodings. > this code is taken from pg_dump, so if I understand it well, this is littl bit different case. > Where you need to start out is by cloning PQescapeStringConn, which does > a similar type of transformation correctly even in unsafe encodings. > I think we'd agreed upthread that libpq should provide > PQescapeIdentifier functionality anyhow. > ok > Once you've actually read that code, you'll realize that it's okay to > treat the error result as a warning, which resolves the other point > of concern. Just print the message and use the result anyway. ok. Pavel > > regards, tom lane >
hello 2010/1/2 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> here is patch > > I looked at this patch a bit, and I think the real problem with it is > that it's not multibyte safe. You've copied backend code that is > allowed to assume it's in a safe encoding (ie, one where multibyte > characters can't contain non-high-bit-set bytes). This is not okay > on the client side, see SJIS and similar encodings. > > Where you need to start out is by cloning PQescapeStringConn, which does > a similar type of transformation correctly even in unsafe encodings. > I think we'd agreed upthread that libpq should provide > PQescapeIdentifier functionality anyhow. > I am looking on psql directory. Now I found, so in this directory is linked dumputil.c - It could little bit to help us. I have one question. If I understand well, the function fmtId isn't multibyte safe? So why is possible to use it in pg_dump? Pavel > Once you've actually read that code, you'll realize that it's okay to > treat the error result as a warning, which resolves the other point > of concern. Just print the message and use the result anyway. > > regards, tom lane >
Hello I talked with Hitoshi Harada, and fmtId function is safe (minimally for Japanese case). This function working without any errors, so we must not duplicate a code. Pavel 2010/1/4 Pavel Stehule <pavel.stehule@gmail.com>: > hello > > 2010/1/2 Tom Lane <tgl@sss.pgh.pa.us>: >> Pavel Stehule <pavel.stehule@gmail.com> writes: >>> here is patch >> >> I looked at this patch a bit, and I think the real problem with it is >> that it's not multibyte safe. You've copied backend code that is >> allowed to assume it's in a safe encoding (ie, one where multibyte >> characters can't contain non-high-bit-set bytes). This is not okay >> on the client side, see SJIS and similar encodings. >> >> Where you need to start out is by cloning PQescapeStringConn, which does >> a similar type of transformation correctly even in unsafe encodings. >> I think we'd agreed upthread that libpq should provide >> PQescapeIdentifier functionality anyhow. >> > > I am looking on psql directory. Now I found, so in this directory is > linked dumputil.c - It could little bit to help us. > > I have one question. If I understand well, the function fmtId isn't > multibyte safe? So why is possible to use it in pg_dump? > > Pavel > >> Once you've actually read that code, you'll realize that it's okay to >> treat the error result as a warning, which resolves the other point >> of concern. Just print the message and use the result anyway. >> >> regards, tom lane >> >
Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes: > I have one question. If I understand well, the function fmtId isn't > multibyte safe? So why is possible to use it in pg_dump? pg_dump is only guaranteed to work correctly in the server encoding. If you force it to use a client_encoding different from the server's, it might or might not work, for reasons far beyond that one --- the big problem usually is data containing characters that have no equivalent in the client encoding. So I'm not particularly excited about whether fmtId is multibyte safe within pg_dump. If we were to try to use it in more general contexts, it would probably need more work. regards, tom lane
2010/1/4 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> I have one question. If I understand well, the function fmtId isn't >> multibyte safe? So why is possible to use it in pg_dump? > > pg_dump is only guaranteed to work correctly in the server encoding. > If you force it to use a client_encoding different from the server's, > it might or might not work, for reasons far beyond that one --- the > big problem usually is data containing characters that have no > equivalent in the client encoding. So I'm not particularly excited > about whether fmtId is multibyte safe within pg_dump. If we were to try > to use it in more general contexts, it would probably need more work. I could agree with this explanation for quote_identifier function, but not in 100% for fmtId function. We can change encoding for pg_dump (option -E). I don't have a problem to write second and safe fmtId function (with technique used in dumputils don't need to modify libpq), although fmtId do exactly what I need. I would to understand to behave. Pavel > > regards, tom lane >
On Mon, Jan 4, 2010 at 2:51 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/4 Tom Lane <tgl@sss.pgh.pa.us>: >> Pavel Stehule <pavel.stehule@gmail.com> writes: >>> I have one question. If I understand well, the function fmtId isn't >>> multibyte safe? So why is possible to use it in pg_dump? >> >> pg_dump is only guaranteed to work correctly in the server encoding. >> If you force it to use a client_encoding different from the server's, >> it might or might not work, for reasons far beyond that one --- the >> big problem usually is data containing characters that have no >> equivalent in the client encoding. So I'm not particularly excited >> about whether fmtId is multibyte safe within pg_dump. If we were to try >> to use it in more general contexts, it would probably need more work. > > I could agree with this explanation for quote_identifier function, but > not in 100% for fmtId function. We can change encoding for pg_dump > (option -E). But if it's not guaranteed to work, the fact that you can do it is irrelevant. > I don't have a problem to write second and safe fmtId > function (with technique used in dumputils don't need to modify > libpq), although fmtId do exactly what I need. I would to understand > to behave. I think you mean that you would need to understand how it should behave - in which case I agree, but I think Tom spelled that out pretty clearly upthread: close PQescapeStringConn and adapt it to be PQescapeIdentifier. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 4, 2010 at 2:51 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I don't have a problem to write second and safe fmtId >> function (with technique used in dumputils don't need to modify >> libpq), although fmtId do exactly what I need. I would to understand >> to behave. > I think you mean that you would need to understand how it should > behave - in which case I agree, but I think Tom spelled that out > pretty clearly upthread: close PQescapeStringConn and adapt it to be > PQescapeIdentifier. The more important point here is that fmtId doesn't do "exactly what you need" in any case. fmtId is safe to use in pg_dump because pg_dump is only expected to work with the same or older version of the backend. It would not be safe to use it in libpq, which is expected to still work with newer backends that might have more reserved words. regards, tom lane
2010/1/5 Tom Lane <tgl@sss.pgh.pa.us>: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Jan 4, 2010 at 2:51 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> I don't have a problem to write second and safe fmtId >>> function (with technique used in dumputils don't need to modify >>> libpq), although fmtId do exactly what I need. I would to understand >>> to behave. > >> I think you mean that you would need to understand how it should >> behave - in which case I agree, but I think Tom spelled that out >> pretty clearly upthread: close PQescapeStringConn and adapt it to be >> PQescapeIdentifier. > > The more important point here is that fmtId doesn't do "exactly what you > need" in any case. fmtId is safe to use in pg_dump because pg_dump is > only expected to work with the same or older version of the backend. > It would not be safe to use it in libpq, which is expected to still work > with newer backends that might have more reserved words. So I finnaly moved to libpq PQescapeIdentConn function patch is attached. regards Pavel > > regards, tom lane >
Attachment
On Tue, Jan 5, 2010 at 8:23 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/5 Tom Lane <tgl@sss.pgh.pa.us>: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Mon, Jan 4, 2010 at 2:51 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> I don't have a problem to write second and safe fmtId >>>> function (with technique used in dumputils don't need to modify >>>> libpq), although fmtId do exactly what I need. I would to understand >>>> to behave. >> >>> I think you mean that you would need to understand how it should >>> behave - in which case I agree, but I think Tom spelled that out >>> pretty clearly upthread: close PQescapeStringConn and adapt it to be >>> PQescapeIdentifier. >> >> The more important point here is that fmtId doesn't do "exactly what you >> need" in any case. fmtId is safe to use in pg_dump because pg_dump is >> only expected to work with the same or older version of the backend. >> It would not be safe to use it in libpq, which is expected to still work >> with newer backends that might have more reserved words. > > So I finnaly moved to libpq PQescapeIdentConn function > > patch is attached. No longer applies, please rebase. Thanks, ...Robert
2010/1/11 Robert Haas <robertmhaas@gmail.com>: > On Tue, Jan 5, 2010 at 8:23 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2010/1/5 Tom Lane <tgl@sss.pgh.pa.us>: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> On Mon, Jan 4, 2010 at 2:51 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>>> I don't have a problem to write second and safe fmtId >>>>> function (with technique used in dumputils don't need to modify >>>>> libpq), although fmtId do exactly what I need. I would to understand >>>>> to behave. >>> >>>> I think you mean that you would need to understand how it should >>>> behave - in which case I agree, but I think Tom spelled that out >>>> pretty clearly upthread: close PQescapeStringConn and adapt it to be >>>> PQescapeIdentifier. >>> >>> The more important point here is that fmtId doesn't do "exactly what you >>> need" in any case. fmtId is safe to use in pg_dump because pg_dump is >>> only expected to work with the same or older version of the backend. >>> It would not be safe to use it in libpq, which is expected to still work >>> with newer backends that might have more reserved words. >> >> So I finnaly moved to libpq PQescapeIdentConn function >> >> patch is attached. > > No longer applies, please rebase. ok I'll do it today Pavel > > Thanks, > > ...Robert >
Hello > > No longer applies, please rebase. > fixed, sorry Pavel > Thanks, > > ...Robert >
Attachment
On Mon, Jan 11, 2010 at 6:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> No longer applies, please rebase. > > fixed, sorry Hmm. I think that pqEscapeIdentConn should be in a separate section of the documentation, entitled "Escaping Identifiers for Inclusion in SQL Commands". Or else we should merge the existing sections "Escaping Strings for Inclusion in SQL Commands" and "Escaping Binary Strings for Inclusion in SQL Commands" and then put this in there too. On a perhaps-related note, does anyone understand why "Escaping Strings for Inclusion in SQL Commands" is formatted in a way that is needlessly inconsistent with the preceding and following sections? I was surprised by the magnitude of the doc diff hunk in this patch, but when I looked at it it seems to read more clearly with these changes. I have yet to fully review the code but on a quick glance it looks reasonable. ...Robert
2010/1/15 Robert Haas <robertmhaas@gmail.com>: > On Mon, Jan 11, 2010 at 6:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> No longer applies, please rebase. >> >> fixed, sorry > my idea was: * string * escape_string * escape_ident * bytea * escape_bytea But I am not strong in it. Maybe this part of doc needs more love - long time there are doc to deprecated functions. It could be moved. Pavel > Hmm. I think that pqEscapeIdentConn should be in a separate section > of the documentation, entitled "Escaping Identifiers for Inclusion in > SQL Commands". Or else we should merge the existing sections > "Escaping Strings for Inclusion in SQL Commands" and "Escaping Binary > Strings for Inclusion in SQL Commands" and then put this in there too. > > On a perhaps-related note, does anyone understand why "Escaping > Strings for Inclusion in SQL Commands" is formatted in a way that is > needlessly inconsistent with the preceding and following sections? I > was surprised by the magnitude of the doc diff hunk in this patch, but > when I looked at it it seems to read more clearly with these changes. > > I have yet to fully review the code but on a quick glance it looks reasonable. > > ...Robert >
On Thu, Jan 14, 2010 at 8:46 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I have yet to fully review the code but on a quick glance it looks reasonable. On further review, it looks less reasonable. :-( The new PQescapeIdentConn function is basically a cut-up version of PQescapeStringInternal, which seems like a reasonable foundation, but it rips out a little too much - specifically: 1. the length argument, 2. the size_t return value, 3. the portion of the handling for incomplete multibyte characters that prevents us from overrunning the output buffer on a maliciously constructed (or unlucky) input string, and 4. some relevant comments. I'm inclined to think we should put all of that stuff back, but certainly #3 at a minimum. ...Robert
2010/1/16 Robert Haas <robertmhaas@gmail.com>: > On Thu, Jan 14, 2010 at 8:46 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I have yet to fully review the code but on a quick glance it looks reasonable. > > On further review, it looks less reasonable. :-( > > The new PQescapeIdentConn function is basically a cut-up version of > PQescapeStringInternal, which seems like a reasonable foundation, but > it rips out a little too much - specifically: > > 1. the length argument, > 2. the size_t return value, > 3. the portion of the handling for incomplete multibyte characters > that prevents us from overrunning the output buffer on a maliciously > constructed (or unlucky) input string, and > 4. some relevant comments. > Yes, I didn't repeat parameter's pattern from PQescapeStringConn, I would to simplify interface but I hasn't problem to modify it to same interface. I rewrote patch so now interface for PQescapeIdentConn is same as PQescapeStringConn @3. I though so the protection under incomplete multibyte chars are enought - missing bytes are replaced by space - like PQescapeStringConn does. But now - mechanism is exactly same, so this problem should be solved. Regards Pavel Stehule > I'm inclined to think we should put all of that stuff back, but > certainly #3 at a minimum. > > ...Robert >
Attachment
On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I rewrote patch so now interface for PQescapeIdentConn is same as > PQescapeStringConn > > @3. I though so the protection under incomplete multibyte chars are > enought - missing bytes are replaced by space - like > PQescapeStringConn does. That much is fine, but the output buffer is only guaranteed to be of size 2n+1. Imagine the input is two double-quotes followed by a byte for which pg_encoding_mblen() returns 4. The input is 3 characters long so the user was responsible to provide 7 bytes of output space, but you'll try to write 9 bytes to it (including the terminating NUL). > But now - mechanism is exactly same, so this > problem should be solved. This is no better. What the function does no longer matches either its comments or the documentation (which also contradict each other). Let me take a crack at this and post a patch. We're making this harder than it needs to be. ...Robert
2010/1/18 Robert Haas <robertmhaas@gmail.com>: > On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I rewrote patch so now interface for PQescapeIdentConn is same as >> PQescapeStringConn >> >> @3. I though so the protection under incomplete multibyte chars are >> enought - missing bytes are replaced by space - like >> PQescapeStringConn does. > > That much is fine, but the output buffer is only guaranteed to be of > size 2n+1. Imagine the input is two double-quotes followed by a byte > for which pg_encoding_mblen() returns 4. The input is 3 characters > long so the user was responsible to provide 7 bytes of output space, > but you'll try to write 9 bytes to it (including the terminating NUL). > I don't understand. The "length" is number of bytes, not number of chars. It is maybe bad documented only. If your input string has 6 bytes, then buffer have to allocated to 13 bytes. Nobody knows how much is chars there. >> But now - mechanism is exactly same, so this >> problem should be solved. > > This is no better. What the function does no longer matches either > its comments or the documentation (which also contradict each other). > > Let me take a crack at this and post a patch. We're making this > harder than it needs to be. > sure, please. Pavel > ...Robert >
On Mon, Jan 18, 2010 at 1:52 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/18 Robert Haas <robertmhaas@gmail.com>: >> On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> I rewrote patch so now interface for PQescapeIdentConn is same as >>> PQescapeStringConn >>> >>> @3. I though so the protection under incomplete multibyte chars are >>> enought - missing bytes are replaced by space - like >>> PQescapeStringConn does. >> >> That much is fine, but the output buffer is only guaranteed to be of >> size 2n+1. Imagine the input is two double-quotes followed by a byte >> for which pg_encoding_mblen() returns 4. The input is 3 characters >> long so the user was responsible to provide 7 bytes of output space, >> but you'll try to write 9 bytes to it (including the terminating NUL). >> > I don't understand. The "length" is number of bytes, not number of > chars. It is maybe bad documented only. If your input string has 6 > bytes, then buffer have to allocated to 13 bytes. Nobody knows how > much is chars there. Right, but the point is we can't assume that the input is validly encoded. If the input ends with a garbage character that looks like the start of a multi-byte character, we can't assume that there's enough space in the output buffer to store the required number of padding spaces. To take an extreme example, suppose there were an encoding where any time the first byte of a multi-byte character has the high-bit set, the character is 100 bytes long. Then suppose someone call PQescapeStringConn(), or this new function we're adding, with a length argument of 1, and the first byte of the input buffer has the high-bit set. The caller is only required to provide a 3-byte output buffer, and the third byte is needed for the terminating NUL. That means that after we copy that first character we only have room to insert one padding space. The way you had it coded, since we were expecting a character 100 bytes long, we'd always try to insert 99 padding spaces. >> Let me take a crack at this and post a patch. We're making this >> harder than it needs to be. > > sure, please. Working on it... ...Robert
2010/1/18 Robert Haas <robertmhaas@gmail.com>: > On Mon, Jan 18, 2010 at 1:52 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2010/1/18 Robert Haas <robertmhaas@gmail.com>: >>> On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> I rewrote patch so now interface for PQescapeIdentConn is same as >>>> PQescapeStringConn >>>> >>>> @3. I though so the protection under incomplete multibyte chars are >>>> enought - missing bytes are replaced by space - like >>>> PQescapeStringConn does. >>> >>> That much is fine, but the output buffer is only guaranteed to be of >>> size 2n+1. Imagine the input is two double-quotes followed by a byte >>> for which pg_encoding_mblen() returns 4. The input is 3 characters >>> long so the user was responsible to provide 7 bytes of output space, >>> but you'll try to write 9 bytes to it (including the terminating NUL). >>> >> I don't understand. The "length" is number of bytes, not number of >> chars. It is maybe bad documented only. If your input string has 6 >> bytes, then buffer have to allocated to 13 bytes. Nobody knows how >> much is chars there. > > Right, but the point is we can't assume that the input is validly > encoded. If the input ends with a garbage character that looks like > the start of a multi-byte character, we can't assume that there's > enough space in the output buffer to store the required number of > padding spaces. > > To take an extreme example, suppose there were an encoding where any > time the first byte of a multi-byte character has the high-bit set, > the character is 100 bytes long. Then suppose someone call > PQescapeStringConn(), or this new function we're adding, with a length > argument of 1, and the first byte of the input buffer has the high-bit > set. The caller is only required to provide a 3-byte output buffer, > and the third byte is needed for the terminating NUL. That means that > after we copy that first character we only have room to insert one > padding space. The way you had it coded, since we were expecting a > character 100 bytes long, we'd always try to insert 99 padding spaces. > do you speak about previous version? in current version is garanted new length is <= 2x original length Pavel > >> Let me take a crack at this and post a patch. We're making this >>> harder than it needs to be. >> >> sure, please. > > Working on it... > > ...Robert >
On Mon, Jan 18, 2010 at 2:20 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/18 Robert Haas <robertmhaas@gmail.com>: >> On Mon, Jan 18, 2010 at 1:52 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> 2010/1/18 Robert Haas <robertmhaas@gmail.com>: >>>> On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>>> I rewrote patch so now interface for PQescapeIdentConn is same as >>>>> PQescapeStringConn >>>>> >>>>> @3. I though so the protection under incomplete multibyte chars are >>>>> enought - missing bytes are replaced by space - like >>>>> PQescapeStringConn does. >>>> >>>> That much is fine, but the output buffer is only guaranteed to be of >>>> size 2n+1. Imagine the input is two double-quotes followed by a byte >>>> for which pg_encoding_mblen() returns 4. The input is 3 characters >>>> long so the user was responsible to provide 7 bytes of output space, >>>> but you'll try to write 9 bytes to it (including the terminating NUL). >>>> >>> I don't understand. The "length" is number of bytes, not number of >>> chars. It is maybe bad documented only. If your input string has 6 >>> bytes, then buffer have to allocated to 13 bytes. Nobody knows how >>> much is chars there. >> >> Right, but the point is we can't assume that the input is validly >> encoded. If the input ends with a garbage character that looks like >> the start of a multi-byte character, we can't assume that there's >> enough space in the output buffer to store the required number of >> padding spaces. >> >> To take an extreme example, suppose there were an encoding where any >> time the first byte of a multi-byte character has the high-bit set, >> the character is 100 bytes long. Then suppose someone call >> PQescapeStringConn(), or this new function we're adding, with a length >> argument of 1, and the first byte of the input buffer has the high-bit >> set. The caller is only required to provide a 3-byte output buffer, >> and the third byte is needed for the terminating NUL. That means that >> after we copy that first character we only have room to insert one >> padding space. The way you had it coded, since we were expecting a >> character 100 bytes long, we'd always try to insert 99 padding spaces. >> > > do you speak about previous version? Yes. > in current version is garanted new length is <= 2x original length Actually, strictly less than, but the code gets it correct. However, your latest version has some other problems. For example, you didn't update the docs to match your source-code changes. Also, I prefer an API where the escaping function does include the quotes, so I've done it that way in the attached patch. This is just the libpq changes, I figure if we can agree on this, then we can move onto the psql stuff. Comments? ...Robert
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > ... Also, I prefer an > API where the escaping function does include the quotes, so I've done > it that way in the attached patch. IMO this function should act as much like PQescapeStringConn as possible. Random differences like including or not including outer quotes don't make the user's life better. Random differences like a slightly different rule for the amount of space required are outright dangerous. Also, why is this patch changing the documentation of PQescapeStringConn? It might be only whitespace changes, but I don't particularly wish to have to determine that. regards, tom lane
On Mon, Jan 18, 2010 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> ... Also, I prefer an >> API where the escaping function does include the quotes, so I've done >> it that way in the attached patch. > > IMO this function should act as much like PQescapeStringConn as possible. Generally speaking, I agree... > Random differences like including or not including outer quotes don't > make the user's life better. Random differences like a slightly > different rule for the amount of space required are outright dangerous. I'm not sure that not including the quotes is any better. If someone escapes foo and gets back foo, are they going to realize that escaping fo"o is going to give them back fo""o rather than "fo""o"? One difference vs. PQescapeStringConn() is that if you fail to include the surrounding quotes in that case, something will almost certainly break in a noisy and highly visible fashion. Here that might not happen, or someone might call one of PQescapeStringConn() and PQescapeIdentifierConn() and then use the wrong sort of outer quotes. IMO, it's actually pretty weird that PQescapeStringConn() and quote_literal() are named differently and do incompatible things. I think it would be a plus if this new function were a little more similar to quote_ident(), but that's just MHO, of course. > Also, why is this patch changing the documentation of PQescapeStringConn? > It might be only whitespace changes, but I don't particularly wish to > have to determine that. See previous discussion upthread. http://archives.postgresql.org/pgsql-hackers/2010-01/msg01516.php ...Robert
2010/1/18 Robert Haas <robertmhaas@gmail.com>: > On Mon, Jan 18, 2010 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> ... Also, I prefer an >>> API where the escaping function does include the quotes, so I've done >>> it that way in the attached patch. >> >> IMO this function should act as much like PQescapeStringConn as possible. > > Generally speaking, I agree... > >> Random differences like including or not including outer quotes don't >> make the user's life better. Random differences like a slightly >> different rule for the amount of space required are outright dangerous. > > I'm not sure that not including the quotes is any better. If someone > escapes foo and gets back foo, are they going to realize that escaping > fo"o is going to give them back fo""o rather than "fo""o"? One > difference vs. PQescapeStringConn() is that if you fail to include the > surrounding quotes in that case, something will almost certainly break > in a noisy and highly visible fashion. Here that might not happen, or > someone might call one of PQescapeStringConn() and > PQescapeIdentifierConn() and then use the wrong sort of outer quotes. > > IMO, it's actually pretty weird that PQescapeStringConn() and > quote_literal() are named differently and do incompatible things. I > think it would be a plus if this new function were a little more > similar to quote_ident(), but that's just MHO, of course. > I am afraid so we can do nothing now with this. There are two arguments - consistency versus robustness. If you use PQescapeStringConn() without outer quotes, then you have a SQL injection problem (there could not be error) :(. When there are no escape function that add outer quotes, then can be strange for developers working with one different. I see three solution: a) use a PQescapeIdentifConn as PQescapeStringConn, b) move this functionality to psql without change of API, c) change semantic and name - maybe PQquoteIdentifierConn() Personally I am for a) and later for b). What I know - php coders needs some secure function for identifier escaping - but I dislike PHP because every function is designed different. regards Pavel >> Also, why is this patch changing the documentation of PQescapeStringConn? >> It might be only whitespace changes, but I don't particularly wish to >> have to determine that. > > See previous discussion upthread. > > http://archives.postgresql.org/pgsql-hackers/2010-01/msg01516.php > > ...Robert >
On Tue, Jan 19, 2010 at 4:13 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/18 Robert Haas <robertmhaas@gmail.com>: >> On Mon, Jan 18, 2010 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> ... Also, I prefer an >>>> API where the escaping function does include the quotes, so I've done >>>> it that way in the attached patch. >>> >>> IMO this function should act as much like PQescapeStringConn as possible. >> >> Generally speaking, I agree... >> >>> Random differences like including or not including outer quotes don't >>> make the user's life better. Random differences like a slightly >>> different rule for the amount of space required are outright dangerous. >> >> I'm not sure that not including the quotes is any better. If someone >> escapes foo and gets back foo, are they going to realize that escaping >> fo"o is going to give them back fo""o rather than "fo""o"? One >> difference vs. PQescapeStringConn() is that if you fail to include the >> surrounding quotes in that case, something will almost certainly break >> in a noisy and highly visible fashion. Here that might not happen, or >> someone might call one of PQescapeStringConn() and >> PQescapeIdentifierConn() and then use the wrong sort of outer quotes. >> >> IMO, it's actually pretty weird that PQescapeStringConn() and >> quote_literal() are named differently and do incompatible things. I >> think it would be a plus if this new function were a little more >> similar to quote_ident(), but that's just MHO, of course. >> > > I am afraid so we can do nothing now with this. There are two > arguments - consistency versus robustness. If you use > PQescapeStringConn() without outer quotes, then you have a SQL > injection problem (there could not be error) :(. When there are no > escape function that add outer quotes, then can be strange for > developers working with one different. > > I see three solution: > > a) use a PQescapeIdentifConn as PQescapeStringConn, > b) move this functionality to psql without change of API, > c) change semantic and name - maybe PQquoteIdentifierConn() > > Personally I am for a) and later for b). What I know - php coders > needs some secure function for identifier escaping - but I dislike PHP > because every function is designed different. I think what you're saying is that you agree with Tom's position that the new escaping function should not add the necessary surrounding quotes, instead leaving that to the user. Is that correct? ...Robert
2010/1/19 Robert Haas <robertmhaas@gmail.com>: > On Tue, Jan 19, 2010 at 4:13 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2010/1/18 Robert Haas <robertmhaas@gmail.com>: >>> On Mon, Jan 18, 2010 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Robert Haas <robertmhaas@gmail.com> writes: >>>>> ... Also, I prefer an >>>>> API where the escaping function does include the quotes, so I've done >>>>> it that way in the attached patch. >>>> >>>> IMO this function should act as much like PQescapeStringConn as possible. >>> >>> Generally speaking, I agree... >>> >>>> Random differences like including or not including outer quotes don't >>>> make the user's life better. Random differences like a slightly >>>> different rule for the amount of space required are outright dangerous. >>> >>> I'm not sure that not including the quotes is any better. If someone >>> escapes foo and gets back foo, are they going to realize that escaping >>> fo"o is going to give them back fo""o rather than "fo""o"? One >>> difference vs. PQescapeStringConn() is that if you fail to include the >>> surrounding quotes in that case, something will almost certainly break >>> in a noisy and highly visible fashion. Here that might not happen, or >>> someone might call one of PQescapeStringConn() and >>> PQescapeIdentifierConn() and then use the wrong sort of outer quotes. >>> >>> IMO, it's actually pretty weird that PQescapeStringConn() and >>> quote_literal() are named differently and do incompatible things. I >>> think it would be a plus if this new function were a little more >>> similar to quote_ident(), but that's just MHO, of course. >>> >> >> I am afraid so we can do nothing now with this. There are two >> arguments - consistency versus robustness. If you use >> PQescapeStringConn() without outer quotes, then you have a SQL >> injection problem (there could not be error) :(. When there are no >> escape function that add outer quotes, then can be strange for >> developers working with one different. >> >> I see three solution: >> >> a) use a PQescapeIdentifConn as PQescapeStringConn, >> b) move this functionality to psql without change of API, >> c) change semantic and name - maybe PQquoteIdentifierConn() >> >> Personally I am for a) and later for b). What I know - php coders >> needs some secure function for identifier escaping - but I dislike PHP >> because every function is designed different. > > I think what you're saying is that you agree with Tom's position that > the new escaping function should not add the necessary surrounding > quotes, instead leaving that to the user. Is that correct? yes Pavel > > ...Robert >
On Tue, Jan 19, 2010 at 12:54 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I think what you're saying is that you agree with Tom's position that >> the new escaping function should not add the necessary surrounding >> quotes, instead leaving that to the user. Is that correct? > > yes Updated patch attached. I still think this is a bizarre API. ...Robert
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > Updated patch attached. I still think this is a bizarre API. Well, if we had it to do over I'm sure we'd have done it differently, but the functions are there now and we aren't going to change them ... regards, tom lane
On Tue, Jan 19, 2010 at 1:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Updated patch attached. I still think this is a bizarre API. > > Well, if we had it to do over I'm sure we'd have done it differently, > but the functions are there now and we aren't going to change them ... I agree, but I don't feel the need to repeat the mistakes we've already made by slavishly copying them into new places. I think as long as we're adding a new function, we should make it behave sanely. We could even take the opportunity to go back and add a saner version of PQescapeStringConn. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > ... I think as > long as we're adding a new function, we should make it behave sanely. > We could even take the opportunity to go back and add a saner version > of PQescapeStringConn. Well, it's a bit late in the devel cycle to be inventing from scratch, but if we did want to do something "saner" what would it look like? I can see the following possibilities: * include boundary quotes (and E too in the literal case). This would imply telling people they should leave whitespace around the value in the constructed query ... or should we insert leading/trailing spaces to prevent that mistake too? * malloc our own result value instead of expecting caller to provide space. This adds another failure case (out of memory) but it's not really much different from OOM on the caller side. * anything else you want to change? I suggest we could use PQescapeLiteral + PQescapeIdentifier as names if we provide a new pair of functions defined with a different API. regards, tom lane
2010/1/19 Robert Haas <robertmhaas@gmail.com>: > On Tue, Jan 19, 2010 at 1:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Updated patch attached. I still think this is a bizarre API. >> >> Well, if we had it to do over I'm sure we'd have done it differently, >> but the functions are there now and we aren't going to change them ... > > I agree, but I don't feel the need to repeat the mistakes we've > already made by slavishly copying them into new places. I think as > long as we're adding a new function, we should make it behave sanely. > We could even take the opportunity to go back and add a saner version > of PQescapeStringConn. Sometimes is better do nothing :(. I understand you, my first code was same as your, but any change of basic API is the terrible thing. This function is used in PHP driver, and bilions web pages. Theoretically we can work on new libpq with richer API - there could be better support of binary form, smarter quoting functions. But actually there are no main request. Regards Pavel > > ...Robert >
On Tue, Jan 19, 2010 at 2:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> ... I think as >> long as we're adding a new function, we should make it behave sanely. >> We could even take the opportunity to go back and add a saner version >> of PQescapeStringConn. > > Well, it's a bit late in the devel cycle to be inventing from scratch, > but if we did want to do something "saner" what would it look like? > I can see the following possibilities: > > * include boundary quotes (and E too in the literal case). This would > imply telling people they should leave whitespace around the value in > the constructed query ... or should we insert leading/trailing spaces > to prevent that mistake too? Does the existing PQescapeStringConn() require E'' rather than just ''? It isn't documented so. I think inserting the whitespace would be overkill, but that's another thing that's not mentioned in the present documentation and should be, because it would not be difficult to screw it up. > * malloc our own result value instead of expecting caller to provide > space. This adds another failure case (out of memory) but it's not > really much different from OOM on the caller side. I think that would be an anti-improvement - caller might have their own malloc substitute, might want to stack-allocate, etc. Plus we'd either have to malloc the worst-case buffer size or else read the string twice. But see below... > * anything else you want to change? Instead of relying on the output buffer to be exactly 2n+1 or 2n+3 or whatever, I suggest that we add an explicit argument for the size of the output buffer. If the buffer the caller provides is too short, ideally we should let the caller know about the error and also indicate how much space would have been necessary (like sprintf). > I suggest we could use PQescapeLiteral + PQescapeIdentifier as names > if we provide a new pair of functions defined with a different API. I like those names. Maybe something like: size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char *from, size_t fromlen, int *error); size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char *from, size_t fromlen, int *error); On success, *error is 0 and size_t is the number of bytes actually written. On failure, *error is non-zero, conn->errorMessage is an appropriate error, and the return value is the shortest value of tolen adequate to hold the result, or 0 if we bombed out for some reason other than lack of output space. That way, if people are willing to incur the overhead of an extra call, they can malloc() (or whatever) exactly the right amount of space. Or they can just size the buffer according to the documentation and then they're safe. Another option would be to swap the return value and *error: int PQescapeLiteral(PGconn *conn, char *to, int tolen, char *from, int fromlen, size_t *needed); int PQescapeIdentifier(PGconn *conn, char *to, int tolen, char *from, int fromlen, size_t *needed); ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 19, 2010 at 2:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> * include boundary quotes (and E too in the literal case). �This would >> imply telling people they should leave whitespace around the value in >> the constructed query ... or should we insert leading/trailing spaces >> to prevent that mistake too? > Does the existing PQescapeStringConn() require E'' rather than just > ''? It isn't documented so. It does not expect E'', and would get it wrong if you supplied E'', which is an anti-feature because it makes it impossible to avoid escape_string_warning messages. If we were to supply quotes I think we should also supply E whenever we need to use a backslash. > I think inserting the whitespace would be overkill, but that's another > thing that's not mentioned in the present documentation and should be, > because it would not be difficult to screw it up. As long as it's documented it's okay ... probably ... note that conditionally inserting E would get us right back to the place where an unsafe usage might appear to work fine in light testing. Maybe prepend a space only if prepending E? >> * malloc our own result value instead of expecting caller to provide >> space. �This adds another failure case (out of memory) but it's not >> really much different from OOM on the caller side. > I think that would be an anti-improvement - caller might have their > own malloc substitute, might want to stack-allocate, etc. Plus we'd > either have to malloc the worst-case buffer size or else read the > string twice. But see below... We already have multiple functions in the API that do malloc(). I think that requiring the caller to estimate the size of the space is a bigger anti-feature than using malloc. > I like those names. Maybe something like: > size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char > *from, size_t fromlen, int *error); > size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char > *from, size_t fromlen, int *error); This would get a *lot* simpler if we malloced the result space. char *PQescapeXXX(PQconn *conn, const char *str, size_t len) with NULL result used to signal any failure (and a message left in the conn). > ... Or they can just size the buffer > according to the documentation and then they're safe. Wrong, one thing we would absolutely not do is document any hard guarantee about the maximum space needed. That's isomorphic to documenting the exact escaping algorithm, and doing that has already bit us more times than you know. If we're going to alter the API at all I think we absolutely *must* get rid of that design error. Keep in mind that a large part of wanting to have these functions at all is for simplicity in the calling code. Complicating their API for marginal gains doesn't seem to me to be the right design direction. regards, tom lane
On Tue, Jan 19, 2010 at 2:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > As long as it's documented it's okay ... probably ... note that > conditionally inserting E would get us right back to the place where > an unsafe usage might appear to work fine in light testing. Maybe > prepend a space only if prepending E? That's a bit of a kludge, but maybe it's liveable. >>> * malloc our own result value instead of expecting caller to provide >>> space. This adds another failure case (out of memory) but it's not >>> really much different from OOM on the caller side. > >> I think that would be an anti-improvement - caller might have their >> own malloc substitute, might want to stack-allocate, etc. Plus we'd >> either have to malloc the worst-case buffer size or else read the >> string twice. But see below... > > We already have multiple functions in the API that do malloc(). I think > that requiring the caller to estimate the size of the space is a bigger > anti-feature than using malloc. > >> I like those names. Maybe something like: > >> size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char >> *from, size_t fromlen, int *error); >> size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char >> *from, size_t fromlen, int *error); > > This would get a *lot* simpler if we malloced the result space. > > char *PQescapeXXX(PQconn *conn, const char *str, size_t len) > > with NULL result used to signal any failure (and a message left in > the conn). Well, I have to admit that the simplicity of that API is very appealing. Someone will probably eventually request PQescapeXXXButUseMyMalloc(PQConn *conn, const char *str, size_t len, void (*MyMalloc)(size_t)); ... but at least until we change the escaping algorithm again, we can always tell that person to use PQescapeStringConn() and roll their own. So, I'm sold. Do you think we should malloc 2n+X bytes all the time, or do you want to scan the string once to determine how much space is needed and then allocate exactly that much space? It seems to me that it might be sensible to do it this way: 1. Do a locale-aware scan of the input string and count the number of characters we need to escape (num_to_escape). 2. If num_to_escape is 0, fast path: allocate len+3 bytes and use memcpy to copy the input data to the output buffer. 3. Otherwise, allocate len+num_to_escape+5 bytes (space-E-quote-quote-NUL) and do a second locale-aware scan of the input string, copying and escaping as we go (or do you only want the space-E if the escapable character that we find is \ rather than '?). ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Do you think we should malloc 2n+X bytes all the time, or do you want > to scan the string once to determine how much space is needed and then > allocate exactly that much space? I'd vote for two scans, as I think we'll soon decide 2n doesn't cut it anyway. One of the issues that needs to be looked at is embedded null characters. We might fail on that for the moment, but I can foresee wanting to send \000 instead. You don't want to pay for 4x do you? > It seems to me that it might be > sensible to do it this way: > 1. Do a locale-aware scan of the input string and count the number of > characters we need to escape (num_to_escape). > 2. If num_to_escape is 0, fast path: allocate len+3 bytes and use > memcpy to copy the input data to the output buffer. > 3. Otherwise, allocate len+num_to_escape+5 bytes > (space-E-quote-quote-NUL) and do a second locale-aware scan of the > input string, copying and escaping as we go (or do you only want the > space-E if the escapable character that we find is \ rather than '?). Yeah, the fast path if no escaping is needed seems attractive. I think that E should only be inserted if we have a backslash to deal with; no reason to generate nonstandard syntax if we don't have to. That would mean keeping two bits of state (escaping-needed and malloc-size) from the initial pass, but that's pretty cheap. regards, tom lane
On Tue, Jan 19, 2010 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It seems to me that it might be >> sensible to do it this way: > >> 1. Do a locale-aware scan of the input string and count the number of >> characters we need to escape (num_to_escape). >> 2. If num_to_escape is 0, fast path: allocate len+3 bytes and use >> memcpy to copy the input data to the output buffer. >> 3. Otherwise, allocate len+num_to_escape+5 bytes >> (space-E-quote-quote-NUL) and do a second locale-aware scan of the >> input string, copying and escaping as we go (or do you only want the >> space-E if the escapable character that we find is \ rather than '?). > > Yeah, the fast path if no escaping is needed seems attractive. Cool. > I think that E should only be inserted if we have a backslash to deal > with; no reason to generate nonstandard syntax if we don't have to. > That would mean keeping two bits of state (escaping-needed and > malloc-size) from the initial pass, but that's pretty cheap. That's fine. I'd like to proceed by committing an initial patch which changes the "Escaping Strings for Inclusion in SQL Commands" to use a <variablelist> with one <varlistentry> per function (as we do in surrounding functions) and consolidates it with the following section,"Escaping Binary Strings for Inclusion in SQL Commands". Then I'll submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier() as discussed here, and the doc diff hunks will actually be readable. Objections? If so, please state how you'd like to see the docs organized because I don't see another way to do it that makes any sense. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I'd like to proceed by committing an initial patch which changes the > "Escaping Strings for Inclusion in SQL Commands" to use a > <variablelist> with one <varlistentry> per function (as we do in > surrounding functions) and consolidates it with the following section, > "Escaping Binary Strings for Inclusion in SQL Commands". Then I'll > submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier() > as discussed here, and the doc diff hunks will actually be readable. Sounds like a plan. regards, tom lane
On Tue, Jan 19, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I'd like to proceed by committing an initial patch which changes the >> "Escaping Strings for Inclusion in SQL Commands" to use a >> <variablelist> with one <varlistentry> per function (as we do in >> surrounding functions) and consolidates it with the following section, >> "Escaping Binary Strings for Inclusion in SQL Commands". Then I'll >> submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier() >> as discussed here, and the doc diff hunks will actually be readable. > > Sounds like a plan. Initial commit done, and follow-on patch attached. The docs took longer to write than the code. I spent a fair amount of time trying to make it all make sense, but suggestions are welcome. ...Robert
Attachment
On Tue, Jan 19, 2010 at 11:19 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jan 19, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> I'd like to proceed by committing an initial patch which changes the >>> "Escaping Strings for Inclusion in SQL Commands" to use a >>> <variablelist> with one <varlistentry> per function (as we do in >>> surrounding functions) and consolidates it with the following section, >>> "Escaping Binary Strings for Inclusion in SQL Commands". Then I'll >>> submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier() >>> as discussed here, and the doc diff hunks will actually be readable. >> >> Sounds like a plan. > > Initial commit done, and follow-on patch attached. The docs took > longer to write than the code. I spent a fair amount of time trying > to make it all make sense, but suggestions are welcome. Committed after fixing a couple of oversights in my doc changes. ...Robert
2010/1/21 Robert Haas <robertmhaas@gmail.com>: > On Tue, Jan 19, 2010 at 11:19 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jan 19, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> I'd like to proceed by committing an initial patch which changes the >>>> "Escaping Strings for Inclusion in SQL Commands" to use a >>>> <variablelist> with one <varlistentry> per function (as we do in >>>> surrounding functions) and consolidates it with the following section, >>>> "Escaping Binary Strings for Inclusion in SQL Commands". Then I'll >>>> submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier() >>>> as discussed here, and the doc diff hunks will actually be readable. >>> >>> Sounds like a plan. >> >> Initial commit done, and follow-on patch attached. The docs took >> longer to write than the code. I spent a fair amount of time trying >> to make it all make sense, but suggestions are welcome. > > Committed after fixing a couple of oversights in my doc changes. thank you. I actualised patch I thing, we need one libpq change more. + static void + appendLiteral(PGconn *conn, PQExpBuffer buf, const char *str) + { + char *escaped_str; + size_t len; + + len = strlen(str); + escaped_str = PQescapeLiteral(conn, str, len); + + if (escaped_str == NULL) + { + const char *error_message = PQerrorMessage(pset.db); + + if (strlen(error_message)) + psql_error("%s", error_message); + } + else + { + appendPQExpBufferStr(buf, escaped_str); + free(escaped_str); + } + } the correct result of this function (when is some error) is broken buffer. But function markPQExpBufferBroken is static. Regards Pavel Stehule > > ...Robert >
Attachment
On Thu, Jan 21, 2010 at 11:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/21 Robert Haas <robertmhaas@gmail.com>: >> On Tue, Jan 19, 2010 at 11:19 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Tue, Jan 19, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Robert Haas <robertmhaas@gmail.com> writes: >>>>> I'd like to proceed by committing an initial patch which changes the >>>>> "Escaping Strings for Inclusion in SQL Commands" to use a >>>>> <variablelist> with one <varlistentry> per function (as we do in >>>>> surrounding functions) and consolidates it with the following section, >>>>> "Escaping Binary Strings for Inclusion in SQL Commands". Then I'll >>>>> submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier() >>>>> as discussed here, and the doc diff hunks will actually be readable. >>>> >>>> Sounds like a plan. >>> >>> Initial commit done, and follow-on patch attached. The docs took >>> longer to write than the code. I spent a fair amount of time trying >>> to make it all make sense, but suggestions are welcome. >> >> Committed after fixing a couple of oversights in my doc changes. > > thank you. > > I actualised patch > > I thing, we need one libpq change more. > > + static void > + appendLiteral(PGconn *conn, PQExpBuffer buf, const char *str) > + { > + char *escaped_str; > + size_t len; > + > + len = strlen(str); > + escaped_str = PQescapeLiteral(conn, str, len); > + > + if (escaped_str == NULL) > + { > + const char *error_message = PQerrorMessage(pset.db); > + > + if (strlen(error_message)) > + psql_error("%s", error_message); > + } > + else > + { > + appendPQExpBufferStr(buf, escaped_str); > + free(escaped_str); > + } > + } > > the correct result of this function (when is some error) is broken > buffer. But function markPQExpBufferBroken is static. markPQExpBufferBroken is specifically designed to handle out of memory errors. I don't think we should try to generalize that to handle encoding violations or other things, at least not without some very careful thought about the consequences of so doing. I think we need to find some other way of signalling an error back to the caller, although it's not exactly obvious to me what the best way to do that is. ...Robert
2010/1/21 Robert Haas <robertmhaas@gmail.com>: > On Thu, Jan 21, 2010 at 11:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2010/1/21 Robert Haas <robertmhaas@gmail.com>: >>> On Tue, Jan 19, 2010 at 11:19 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> On Tue, Jan 19, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> Robert Haas <robertmhaas@gmail.com> writes: >>>>>> I'd like to proceed by committing an initial patch which changes the >>>>>> "Escaping Strings for Inclusion in SQL Commands" to use a >>>>>> <variablelist> with one <varlistentry> per function (as we do in >>>>>> surrounding functions) and consolidates it with the following section, >>>>>> "Escaping Binary Strings for Inclusion in SQL Commands". Then I'll >>>>>> submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier() >>>>>> as discussed here, and the doc diff hunks will actually be readable. >>>>> >>>>> Sounds like a plan. >>>> >>>> Initial commit done, and follow-on patch attached. The docs took >>>> longer to write than the code. I spent a fair amount of time trying >>>> to make it all make sense, but suggestions are welcome. >>> >>> Committed after fixing a couple of oversights in my doc changes. >> >> thank you. >> >> I actualised patch >> >> I thing, we need one libpq change more. >> >> + static void >> + appendLiteral(PGconn *conn, PQExpBuffer buf, const char *str) >> + { >> + char *escaped_str; >> + size_t len; >> + >> + len = strlen(str); >> + escaped_str = PQescapeLiteral(conn, str, len); >> + >> + if (escaped_str == NULL) >> + { >> + const char *error_message = PQerrorMessage(pset.db); >> + >> + if (strlen(error_message)) >> + psql_error("%s", error_message); >> + } >> + else >> + { >> + appendPQExpBufferStr(buf, escaped_str); >> + free(escaped_str); >> + } >> + } >> >> the correct result of this function (when is some error) is broken >> buffer. But function markPQExpBufferBroken is static. > > markPQExpBufferBroken is specifically designed to handle out of memory > errors. I don't think we should try to generalize that to handle > encoding violations or other things, at least not without some very > careful thought about the consequences of so doing. I think we need > to find some other way of signalling an error back to the caller, > although it's not exactly obvious to me what the best way to do that > is. BufferBroken flag is used as signal for out of memory now. But everybody can use it for his purposes. There isn't any limit (what I know). More - the behave is similar like NULL. If you have a broken buffer, then everything with this is broken. Theoretically we could to add some parser state flag and check it over every scanner call - but it is duplicate. minimally when escape function returns null because is out of memory, then breaking the buffer is semantically correct. Currently there isn't consistency in memory handling :( - from good reasons. Bad allocation from libpq is finished with raising a error message. Out of memory from psql is raising fatal error. I am not sure, if we can better join these two worlds. Maybe we can add malloc handler to libpq (maybe in future)? Still there are two kind of bugs - memory and encoding. These bugs should be handled differently. In both cases we cannot stop lexers - because we have to find end of statement, we cannot to execute broken command. my plan add to state structure field like lexer_error. This field will be checked before execution it could be ugly for metacommands, there will be lot of new checks :( Pavel > > ...Robert >
On Thu, Jan 21, 2010 at 12:53 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > add to state structure field like lexer_error. This field will be > checked before execution > it could be ugly for metacommands, there will be lot of new checks :( Eh? The only places where we should need new tests are the places that check PQExpBufferBroken() now - there are only 6 calls to that function in src/bin/psql and not all of them need to be changed. The places that do need to be changed will need to be modified to check PQExpBufferBroken() || lexer_coughed_up_a_lung. It should be possible to do this pretty simply, I think. ...Robert
2010/1/21 Robert Haas <robertmhaas@gmail.com>: > On Thu, Jan 21, 2010 at 12:53 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> add to state structure field like lexer_error. This field will be >> checked before execution >> it could be ugly for metacommands, there will be lot of new checks :( > > Eh? The only places where we should need new tests are the places > that check PQExpBufferBroken() now - there are only 6 calls to that > function in src/bin/psql and not all of them need to be changed. The > places that do need to be changed will need to be modified to check > PQExpBufferBroken() || lexer_coughed_up_a_lung. no, it is only 6 calls because we don't check psql_scan_slash_option result. When we don't allow escaping syntax in slash statement, then it could be simple. we have to do some like: if buffer_is_broken(..) report "out of memory" else if global_error != true do .... end if Pavel > > It should be possible to do this pretty simply, I think. > > ...Robert >
On Thu, Jan 21, 2010 at 2:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/21 Robert Haas <robertmhaas@gmail.com>: >> On Thu, Jan 21, 2010 at 12:53 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> add to state structure field like lexer_error. This field will be >>> checked before execution >>> it could be ugly for metacommands, there will be lot of new checks :( >> >> Eh? The only places where we should need new tests are the places >> that check PQExpBufferBroken() now - there are only 6 calls to that >> function in src/bin/psql and not all of them need to be changed. The >> places that do need to be changed will need to be modified to check >> PQExpBufferBroken() || lexer_coughed_up_a_lung. > > no, it is only 6 calls because we don't check psql_scan_slash_option result. psql_scan_slash_option() already has a way to signal errors - it can return NULL. Type any backslash command followed by a single quote... I'm not saying I love the way those errors are handled, but if we make this patch about revising the way psql does error handling, this is not going to get committed for this release... what we need to do is fit what we're trying to do into the existing model. ...Robert
2010/1/22 Robert Haas <robertmhaas@gmail.com>: > On Thu, Jan 21, 2010 at 2:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2010/1/21 Robert Haas <robertmhaas@gmail.com>: >>> On Thu, Jan 21, 2010 at 12:53 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> add to state structure field like lexer_error. This field will be >>>> checked before execution >>>> it could be ugly for metacommands, there will be lot of new checks :( >>> >>> Eh? The only places where we should need new tests are the places >>> that check PQExpBufferBroken() now - there are only 6 calls to that >>> function in src/bin/psql and not all of them need to be changed. The >>> places that do need to be changed will need to be modified to check >>> PQExpBufferBroken() || lexer_coughed_up_a_lung. >> >> no, it is only 6 calls because we don't check psql_scan_slash_option result. > > psql_scan_slash_option() already has a way to signal errors - it can > return NULL. Type any backslash command followed by a single quote... NULL means "no value". But I thing, so there is only one important - \set. For other can be enough some error message and empty value. > > I'm not saying I love the way those errors are handled, but if we make > this patch about revising the way psql does error handling, this is > not going to get committed for this release... what we need to do is > fit what we're trying to do into the existing model. > I try to find some simple and I'll send a patch. Pavel > ...Robert >
Hello here is new variant. Add scan_state flag "valid" and enhance protection against execution broken statement. Regards Pavel Stehule 2010/1/22 Pavel Stehule <pavel.stehule@gmail.com>: > 2010/1/22 Robert Haas <robertmhaas@gmail.com>: >> On Thu, Jan 21, 2010 at 2:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> 2010/1/21 Robert Haas <robertmhaas@gmail.com>: >>>> On Thu, Jan 21, 2010 at 12:53 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>>> add to state structure field like lexer_error. This field will be >>>>> checked before execution >>>>> it could be ugly for metacommands, there will be lot of new checks :( >>>> >>>> Eh? The only places where we should need new tests are the places >>>> that check PQExpBufferBroken() now - there are only 6 calls to that >>>> function in src/bin/psql and not all of them need to be changed. The >>>> places that do need to be changed will need to be modified to check >>>> PQExpBufferBroken() || lexer_coughed_up_a_lung. >>> >>> no, it is only 6 calls because we don't check psql_scan_slash_option result. >> >> psql_scan_slash_option() already has a way to signal errors - it can >> return NULL. Type any backslash command followed by a single quote... > > NULL means "no value". But I thing, so there is only one important - > \set. For other can be enough some error message and empty value. > >> >> I'm not saying I love the way those errors are handled, but if we make >> this patch about revising the way psql does error handling, this is >> not going to get committed for this release... what we need to do is >> fit what we're trying to do into the existing model. >> > > I try to find some simple and I'll send a patch. > > Pavel > >> ...Robert >> >
Attachment
On Fri, Jan 22, 2010 at 7:19 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > here is new variant. Add scan_state flag "valid" and enhance > protection against execution broken statement. This doesn't make sense to me. You've changed the way \set is handled - in a way that doesn't seem particularly appropriate to me - but most of the other backslash commands are unmodified - but then there's this hack at the bottom that overrides the return value if psql_scan_is_valid() fails. And then there's this: - /* we need not do psql_scan_reset() here */ + psql_scan_reset(scan_state); It's extremely unclear what the rationale for this change is. Basically, you need to either improve the comments in here so that someone can understand what is going on, or you need to submit it with some detailed documentation explaining the rationale behind each change and why it is right, or more than likely both. But I think the whole approach is likely off-base and you need to go back and think about a cleaner way to get this done. ...Robert
2010/1/22 Robert Haas <robertmhaas@gmail.com>: > On Fri, Jan 22, 2010 at 7:19 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> here is new variant. Add scan_state flag "valid" and enhance >> protection against execution broken statement. > > This doesn't make sense to me. You've changed the way \set is handled > - in a way that doesn't seem particularly appropriate to me - but most > of the other backslash commands are unmodified - but then there's this > hack at the bottom that overrides the return value if > psql_scan_is_valid() fails. And then there's this: > > - /* we need not do psql_scan_reset() here */ > + psql_scan_reset(scan_state); > > It's extremely unclear what the rationale for this change is. Sure, I'll enhance comments. All \command is verified on broken scanning. So when command's processing time the scanning is broken, then we can detect it on the end - flag valid is changed only to false direction. \set is different - it only one statement, that store values. And it is reason why I did more checks there. I am not sure - it is necessary now - simply we can set ERROR on the end and finish. psql_scan_reset is called on the end of any statement - so we can there reset info about scanning. I'll look on it tomorrow. Regards Pavel Stehule > > Basically, you need to either improve the comments in here so that > someone can understand what is going on, or you need to submit it with > some detailed documentation explaining the rationale behind each > change and why it is right, or more than likely both. But I think the > whole approach is likely off-base and you need to go back and think > about a cleaner way to get this done. > > ...Robert >
Hello I hope, so this version is more readable and more clean. I removed some not necessary checks. regards Pavel 2010/1/22 Robert Haas <robertmhaas@gmail.com>: > On Fri, Jan 22, 2010 at 7:19 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> here is new variant. Add scan_state flag "valid" and enhance >> protection against execution broken statement. > > This doesn't make sense to me. You've changed the way \set is handled > - in a way that doesn't seem particularly appropriate to me - but most > of the other backslash commands are unmodified - but then there's this > hack at the bottom that overrides the return value if > psql_scan_is_valid() fails. And then there's this: > > - /* we need not do psql_scan_reset() here */ > + psql_scan_reset(scan_state); > > It's extremely unclear what the rationale for this change is. > > Basically, you need to either improve the comments in here so that > someone can understand what is going on, or you need to submit it with > some detailed documentation explaining the rationale behind each > change and why it is right, or more than likely both. But I think the > whole approach is likely off-base and you need to go back and think > about a cleaner way to get this done. > > ...Robert >
Attachment
On Mon, Jan 25, 2010 at 7:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I hope, so this version is more readable and more clean. I removed > some not necessary checks. This still seems overly complicated to me. I spent a few hours today working up the attached patch. Let me know your thoughts. ...Robert
Attachment
2010/1/27 Robert Haas <robertmhaas@gmail.com>: > On Mon, Jan 25, 2010 at 7:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I hope, so this version is more readable and more clean. I removed >> some not necessary checks. > > This still seems overly complicated to me. I spent a few hours today > working up the attached patch. Let me know your thoughts. > There is serious issue. The "psql_error" only shows some message on output, but do nothing more - you don't set a result status for commands and for statements. So some potential error from parsing is pseudo quietly ignored - without respect to your setting ON_ERROR_STOP. This could be a problem for commands. Execution of broken SQL statements will raise syntax error. But for \set some variable will be a broken and the content can be used. I don't thing so it is good. It is limited. Your version is acceptable only when we don't enable escape syntax for commands. Then we don't need check it. On your version - I am not sure if it is fully compatible, and using a global variables isn't nice. I little bit modify my original code - it is more verbose (- useless using pqexpbuffer) - and more consistent with previous behave. Pavel > ...Robert >
Attachment
On Thu, Jan 28, 2010 at 4:53 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/27 Robert Haas <robertmhaas@gmail.com>: >> On Mon, Jan 25, 2010 at 7:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> I hope, so this version is more readable and more clean. I removed >>> some not necessary checks. >> >> This still seems overly complicated to me. I spent a few hours today >> working up the attached patch. Let me know your thoughts. > > There is serious issue. The "psql_error" only shows some message on > output, but do nothing more - you don't set a result status for > commands and for statements. So some potential error from parsing is > pseudo quietly ignored - without respect to your setting > ON_ERROR_STOP. This could be a problem for commands. Execution of > broken SQL statements will raise syntax error. But for \set some > variable will be a broken and the content can be used. I don't thing > so it is good. It is limited. Well, what you seem to be proposing to do is allow the command to execute (on the screwed-up data) and then afterwards pretend that it failed by overriding the return status. I think that's unacceptable. The root of the problem here is that the parsing and execution stages for backslash commands are not cleanly separated. There's no clean way for the lexer to return an error that allows the command to finish parsing normally but then doesn't execute it. Fixing that is going to require an extensive refactoring of commands.c which I don't think it makes sense to undertake at this point in the release cycle. Even if it did, it seems like material for a separate patch rather than something which has to be done before this goes in. > Your version is acceptable only when we don't enable escape syntax for > commands. Then we don't need check it. On your version - I am not sure > if it is fully compatible, and using a global variables isn't nice. I'm not adding any new global variables - I'm just using the ones that are already there to avoid duplicating the same code four times. Referencing them from within the bodies of the lexer rules doesn't make the variables not global. ...Robert
2010/1/28 Robert Haas <robertmhaas@gmail.com>: > On Thu, Jan 28, 2010 at 4:53 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2010/1/27 Robert Haas <robertmhaas@gmail.com>: >>> On Mon, Jan 25, 2010 at 7:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> I hope, so this version is more readable and more clean. I removed >>>> some not necessary checks. >>> >>> This still seems overly complicated to me. I spent a few hours today >>> working up the attached patch. Let me know your thoughts. >> >> There is serious issue. The "psql_error" only shows some message on >> output, but do nothing more - you don't set a result status for >> commands and for statements. So some potential error from parsing is >> pseudo quietly ignored - without respect to your setting >> ON_ERROR_STOP. This could be a problem for commands. Execution of >> broken SQL statements will raise syntax error. But for \set some >> variable will be a broken and the content can be used. I don't thing >> so it is good. It is limited. > > Well, what you seem to be proposing to do is allow the command to > execute (on the screwed-up data) and then afterwards pretend that it > failed by overriding the return status. I think that's unacceptable. > The root of the problem here is that the parsing and execution stages > for backslash commands are not cleanly separated. There's no clean > way for the lexer to return an error that allows the command to finish > parsing normally but then doesn't execute it. Fixing that is going to > require an extensive refactoring of commands.c which I don't think it > makes sense to undertake at this point in the release cycle. Even if > it did, it seems like material for a separate patch rather than > something which has to be done before this goes in. so I removed support for escaping from backslah commands and refactorised code. I hope so this code is more verbose and clean than previous versions. Regards Pavel > >> Your version is acceptable only when we don't enable escape syntax for >> commands. Then we don't need check it. On your version - I am not sure >> if it is fully compatible, and using a global variables isn't nice. > > I'm not adding any new global variables - I'm just using the ones that > are already there to avoid duplicating the same code four times. > Referencing them from within the bodies of the lexer rules doesn't > make the variables not global. > > ...Robert >
Attachment
On Thu, Jan 28, 2010 at 11:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/28 Robert Haas <robertmhaas@gmail.com>: >> On Thu, Jan 28, 2010 at 4:53 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> 2010/1/27 Robert Haas <robertmhaas@gmail.com>: >>>> On Mon, Jan 25, 2010 at 7:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>>> I hope, so this version is more readable and more clean. I removed >>>>> some not necessary checks. >>>> >>>> This still seems overly complicated to me. I spent a few hours today >>>> working up the attached patch. Let me know your thoughts. >>> >>> There is serious issue. The "psql_error" only shows some message on >>> output, but do nothing more - you don't set a result status for >>> commands and for statements. So some potential error from parsing is >>> pseudo quietly ignored - without respect to your setting >>> ON_ERROR_STOP. This could be a problem for commands. Execution of >>> broken SQL statements will raise syntax error. But for \set some >>> variable will be a broken and the content can be used. I don't thing >>> so it is good. It is limited. >> >> Well, what you seem to be proposing to do is allow the command to >> execute (on the screwed-up data) and then afterwards pretend that it >> failed by overriding the return status. I think that's unacceptable. >> The root of the problem here is that the parsing and execution stages >> for backslash commands are not cleanly separated. There's no clean >> way for the lexer to return an error that allows the command to finish >> parsing normally but then doesn't execute it. Fixing that is going to >> require an extensive refactoring of commands.c which I don't think it >> makes sense to undertake at this point in the release cycle. Even if >> it did, it seems like material for a separate patch rather than >> something which has to be done before this goes in. > > so I removed support for escaping from backslah commands and refactorised code. > > I hope so this code is more verbose and clean than previous versions. I don't see any advantage of this over the code that I wrote. First, you can't just remove support for the escape syntax from \d commands without some discussion of whether or not that's the right thing to do, and I don't think it is. The cases where this will potentially cause a problem are limited to those where the input is invalidly encoded, and I don't think that's important enough to justify the surprise factor of having backslash commands behave differently from everything else. Second, even if it were OK to remove support for the escape syntax from \d commands, you failed to update the documentation you cribbed from my patch to match the behavior you implemented. Third, you've reintroduced all of the code duplication that I eliminated in my version of this patch, as well as at least one bug - you've used free() where I believe you need PQfreemem(). I am also thinking that it doesn't make sense to push the result of PQescapeLiteral() or PQescapeIdentifier() as a new buffer, because we don't want to process variable expansions recursively. At first I thought this was a security hole, but on further reflection I don't think it is: it'll rescan as a quoted string anyway, so any colon-escapes will be ignored. But I believe it's unnecessary at any rate. I would like to go ahead and commit my version of this patch and will do so later today if no one else objects. ...Robert
> > First, you can't just remove support for the escape syntax from \d > commands without some discussion of whether or not that's the right > thing to do, and I don't think it is. The cases where this will > potentially cause a problem are limited to those where the input is > invalidly encoded, and I don't think that's important enough to > justify the surprise factor of having backslash commands behave > differently from everything else. > > Second, even if it were OK to remove support for the escape syntax > from \d commands, you failed to update the documentation you cribbed > from my patch to match the behavior you implemented. we can discus about programming style, but in this case I am sure. The problem is \set command. We cannot ignore error in this case. In other cases invalid escaping raises error, not in this case. So there is two ways again: a) remove escaped expansion from \command b) implement \set command differently > > Third, you've reintroduced all of the code duplication that I > eliminated in my version of this patch, as well as at least one bug - > you've used free() where I believe you need PQfreemem(). you have a true. I am also > thinking that it doesn't make sense to push the result of > PQescapeLiteral() or PQescapeIdentifier() as a new buffer, because we > don't want to process variable expansions recursively. At first I > thought this was a security hole, but on further reflection I don't > think it is: it'll rescan as a quoted string anyway, so any > colon-escapes will be ignored. But I believe it's unnecessary at any > rate. > I think so it was a back door for scripting support in psql. It can break backward compatibility! > I would like to go ahead and commit my version of this patch and will > do so later today if no one else objects. yes, I have. * your patch remove some feature without any warning and documentation * your patch doesn't solve issue with \set command Regards Pavel > > ...Robert >
Attachment
On Fri, Jan 29, 2010 at 2:08 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >> First, you can't just remove support for the escape syntax from \d >> commands without some discussion of whether or not that's the right >> thing to do, and I don't think it is. The cases where this will >> potentially cause a problem are limited to those where the input is >> invalidly encoded, and I don't think that's important enough to >> justify the surprise factor of having backslash commands behave >> differently from everything else. >> >> Second, even if it were OK to remove support for the escape syntax >> from \d commands, you failed to update the documentation you cribbed >> from my patch to match the behavior you implemented. > > we can discus about programming style, but in this case I am sure. The > problem is \set command. We cannot ignore error in this case. In other > cases invalid escaping raises error, not in this case. So there is two > ways again: > > a) remove escaped expansion from \command > b) implement \set command differently I don't view either of those solutions as appropriate or acceptable. > I am also >> thinking that it doesn't make sense to push the result of >> PQescapeLiteral() or PQescapeIdentifier() as a new buffer, because we >> don't want to process variable expansions recursively. At first I >> thought this was a security hole, but on further reflection I don't >> think it is: it'll rescan as a quoted string anyway, so any >> colon-escapes will be ignored. But I believe it's unnecessary at any >> rate. >> > I think so it was a back door for scripting support in psql. It can > break backward compatibility! I don't understand your point here. >> I would like to go ahead and commit my version of this patch and will >> do so later today if no one else objects. > > yes, I have. > > * your patch remove some feature without any warning and documentation > * your patch doesn't solve issue with \set command It does not remove any feature at all that I can see. The fact that backslash commands don't properly report errors is a problem, but I don't see why I have to solve that problem as part of this patch, and even less why I should fix \set and leave all the others alone. Another problem I've found looking at this code is that \copy does not support variable substitutions at all, which seems rather unfortunate.Still another is that the regular expression used forvariable substitutions in SQL commands is subtly different than the one used for backslash commands: the latter matches a single colon with no following characters, while the former does not. All of this stuff may be worth fixing but I'm sticking with my position that it's not material for this patch. If you want to submit a separate patch to change the error handling, go ahead. But I don't think fixing it for \set only is going to fly, and I don't think that it should be specific to handling only errors resulting from encoding violations or out of memory conditions in variable substitutions. It needs to be a general mechanism that can be applied to existing types of error conditions where it's appropriate and to new error conditions that may arise in the future. ...Robert