Thread: Open 6.4 items
Additions --------- test new cidr/IP address type(Tom Helbekkmo) complete rewrite system changes(Jan) CREATE TABLE test (x text, s serial) fails if no database creation permission regression test all platforms vacuum crash Serious Items ------------ change pg args for platforms that don't support argv changes (setproctitle()?, sendmail hack?) Docs ---- man pages/sgml synchronization generate html/postscript documentation make sure all changes are documented properly Minor items ----------- cnf-ify still can exhaust memory, make SET KSQO more generic permissions on indexes: what do they do? should it be prevented? multi-verion concurrency control - work-in-progress for 6.5 improve reporting of syntax errors by showing location of error in query use index with constants on functions allow chaining of pages to allow >8k tuples allow multiple generic operators in expressions without the use of parentheses document/trigger/rule so changes to pg_shadow create pg_pwd large objects orphanage improve group handling no min/max for oid type improve PRIMARY KEY handling generate postmaster pid file and remove flock/fcntl lock code -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > test new cidr/IP address type(Tom Helbekkmo) Looks good to me. I haven't done really heavy testing yet, and I'd also like to update the regression test -- what I included was just a very quick sequence to see that it basically worked, but we should have a more comprehensive test. No great hurry there, though: for now, I'd say it's ready for shipping, modulo the renaming of IPADDR to INET, for which I'm sending a separate patch kit. One problem though, seemingly lately introduced: It's nice to be able to input IP addresses in various formats, for compatibility with other software. One of the common formats is a network byte order hex string, like 0x12345678. Until very recently, I could check what the heck the actual address behind such a representation was, by going "select '0x12345678'::ipaddr;". This no longer works, because the system now helpfully transforms the hex into a long int or something and then tries to treat the result as an ipaddr. Uh-oh. The intuitively correct thing would be for the hex string to be read and converted as a numeric value only if the type it is being cast to is, indeed, numeric in nature. In the given case, it should be up to ipaddr_in() to make sense of the character string. Or what do you say? -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > > test new cidr/IP address type(Tom Helbekkmo) > > Looks good to me. I haven't done really heavy testing yet, and I'd > also like to update the regression test -- what I included was just a > very quick sequence to see that it basically worked, but we should > have a more comprehensive test. No great hurry there, though: for > now, I'd say it's ready for shipping, modulo the renaming of IPADDR to > INET, for which I'm sending a separate patch kit. OK. I think Thomas is adding it to the regression tests, particularly so people can see how it works. Your readme is now in the manual. > One problem though, seemingly lately introduced: It's nice to be able > to input IP addresses in various formats, for compatibility with other > software. One of the common formats is a network byte order hex > string, like 0x12345678. Until very recently, I could check what the > heck the actual address behind such a representation was, by going > "select '0x12345678'::ipaddr;". This no longer works, because the > system now helpfully transforms the hex into a long int or something > and then tries to treat the result as an ipaddr. Uh-oh. > > The intuitively correct thing would be for the hex string to be read > and converted as a numeric value only if the type it is being cast to > is, indeed, numeric in nature. In the given case, it should be up to > ipaddr_in() to make sense of the character string. Or what do you say? Thomas? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> > Looks good to me. I haven't done really heavy testing yet, and I'd > > also like to update the regression test > OK. I think Thomas is adding it to the regression tests, particularly > so people can see how it works. Your readme is now in the manual. It would be great if you could gin up a real regression test, since I'm up to my eyeballs in other Postgres projects. But if you can't, or want some help, let me know. I'd be happy to help with integration of the test... > > One problem though, seemingly lately introduced: It's nice > > to input IP addresses in various formats... > > One of the common formats is a network byte order hex > > string, like 0x12345678. Until very recently, I could check what > > the heck the actual address behind such a representation was, by > > going "select '0x12345678'::ipaddr;". This no longer works, because > > the system now helpfully transforms the hex into a long int or > > something and then tries to treat the result as an ipaddr. Uh-oh. There is a problem, but this is not the explanation. See below... > > The intuitively correct thing would be for the hex string to be read > > and converted as a numeric value only if the type it is being cast > > to is, indeed, numeric in nature. In the given case, it should be > > up to ipaddr_in() to make sense of the character string. What you describe as "no longer works" is actually a core dump on my machine: postgres=> select 'x12345678'::ipaddr; ERROR: could not parse "x12345678" postgres=> select '0x12345678'::ipaddr; pqReadData() -- backend closed the channel unexpectedly. Any single-quote-delimited string is passed to the corresponding _in() routine without change. So the system is not changing anything before ipaddr_in() sees it for input. It appears that the _in() code tries to handle strings starting with "X", but fails to do so (though I'm not exactly sure what a correct value would look like; I would have guessed that something like 'x12121212' might be legal, but it fails). It also appears that the code crashes if there is a leading zero and an otherwise hex-looking value following, even though it at least sometimes is checking that each character is valid. I haven't looked at it further, but the core dump feature should be a "must-fix" and I would put the "hex input" in the same category since the code claims to handle it but doesn't. Volunteers? - Tom
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: (Oh, and "::ipaddr" is now "::inet", of course...) > postgres=> select 'x12345678'::ipaddr; > ERROR: could not parse "x12345678" This is as it should be -- it demands the leading 0 to work, as in: > postgres=> select '0x12345678'::ipaddr; ...which should return: ?column? --------------- 18.52.86.120/32 (1 row) but in your case gives: > pqReadData() -- backend closed the channel unexpectedly. ...or a number of other, weird errors, with or without crashes, but mostly with. It also changes with time, so it looks very much like a memory corruption. Now, I've read and re-read my code, and I'm pretty damn sure that I don't write outside the palloc()ed area at all, and I do palloc() the sum of VARHDRSZ and the size of the data structure I'm storing. I also set VARSIZE(dst) to the right number (the same as the number of bytes I palloc()). Looking at debug output from the backend, it is obvious that inet_in() does the right thing: the expected 12 byte data structure looks the way it ought to, and since all written data ends up inside it (that is, it's filled with what I expect to see there), it's hard to imagine what might end up outside. It's also very strange that this happens only when the '0x12345678'::inet format is used. It would be natural from this to suspect that inet_net_pton() is at fault, but my tests of that function don't show up any problems. Weirdly, this patch fixes the problem: Index: inet.c =================================================================== RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/inet.c,v retrieving revision 1.2 diff -r1.2 inet.c 52c52 < dst = palloc(VARHDRSZ + sizeof(inet_struct)); --- > dst = palloc(VARHDRSZ + sizeof(inet_struct) + 1); Now I'm explicitly asking for at least one byte more than I need, and I'm pretty damn sure that I never touch that extra byte, but something seems to, since the problem goes away. It's arrogant of me, I know, but barring a complete misunderstanding on my part of how variable size records work (or a stupid bug that I've been staring at for hours without seeing, of course), I'd say that something outside my code is at fault. Any ideas as to how to try to find out? -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Tom Ivar Helbekkmo <tih@nhh.no> writes: > Now I'm explicitly asking for at least one byte more than I need, and > I'm pretty damn sure that I never touch that extra byte, but something > seems to, since the problem goes away. It's arrogant of me, I know, > but barring a complete misunderstanding on my part of how variable > size records work (or a stupid bug that I've been staring at for hours > without seeing, of course), I'd say that something outside my code is > at fault. Any ideas as to how to try to find out? Well, I hate to ruin your day, but coming pre-armed with the knowledge that the code is writing one byte too many, it's pretty obvious that the first loop in inet_net_pton_ipv4 does indeed do that. Specifically at else if (size-- > 0) *++dst = 0, dirty = 0; where, when size (the number of remaining destination bytes) is reduced to 0, the code nonetheless advances dst and clears the next byte. The loop logic is fundamentally faulty: you can't check for emsgsize overflow until you get a digit that is supposed to go into another byte. I'd try something like tmp = 0; ndigits = 0; // ndigits is # of hex digits seen for cur byte while (ch = next hex digit) { n = numeric equivalent of ch; assert(n >= 0 && n <= 15); tmp = (tmp << 4) | n; if (++ndigits == 2) { if (size-- <= 0) goto emsgsize; *dst++ = (u_char) tmp; tmp = 0, ndigits = 0; } } if (ndigits) goto enoent; // odd number of hex digits is bogus BTW, shouldn't this routine clear out all "size" bytes of the destination, even if the given data doesn't fill it all? A memset at the top might be worthwhile. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Well, I hate to ruin your day, You sure didn't! You made my day! :-) > but coming pre-armed with the knowledge > that the code is writing one byte too many, it's pretty obvious that the > first loop in inet_net_pton_ipv4 does indeed do that. Specifically at > else if (size-- > 0) > *++dst = 0, dirty = 0; > where, when size (the number of remaining destination bytes) is reduced > to 0, the code nonetheless advances dst and clears the next byte. You're quite right, and I should have caught this one myself, only I must have goofed when I tested the function. (It is written in a style that's just obscure enough that I chose testing it instead of studying it until I understood in detail what it did.) As far as I can tell, the only actual error in Paul Vixie's code is that the two lines you quote above should be: else if (--size > 0) *++dst = 0, dirty = 0; That is, size should be predecremented instead of postdecremented. I'm cc-ing Paul on this, as I assume he wants to get the fix into BIND, which is where inet_net_ntop() and inet_net_pton() came from. > BTW, shouldn't this routine clear out all "size" bytes of the > destination, even if the given data doesn't fill it all? A memset > at the top might be worthwhile. It does: there is code further down that handles that. Another example of the obscure programming style: this function really shows what a low-level language C is! :-) -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Tom Ivar Helbekkmo <tih@nhh.no> writes: > As far as I > can tell, the only actual error in Paul Vixie's code is that the two > lines you quote above should be: > else if (--size > 0) > *++dst = 0, dirty = 0; No, that's still wrong, because it will error out (jump to emsgsize) one byte sooner than it should. The loop is fundamentally broken because it wants to grab and zero a byte before it knows whether there are any digits for the byte. I think its behavior for an odd number of digits is wrong too, or at least pretty nonintuitive. I like my code a lot better ;-) regards, tom lane
i'll take a tiny exception to the declaration that the code is obscure -- since that just means you have not looked at the rest of bind :-). but thanks for the bug report, i'll fix this in bind 8.1.3.
Tom Lane <tgl@sss.pgh.pa.us> writes: > No, that's still wrong, [...] I'll leave that to you and Paul. Right now, the code works as I expect it to, and that's enough for me. :-) -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
I wrote: > As far as I can tell, the only actual error in Paul Vixie's code is > that the two lines you quote above should be: > > else if (--size > 0) > *++dst = 0, dirty = 0; That is, the patch to apply to fix the INET type is: Index: inet_net_pton.c =================================================================== RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/inet_net_pton.c,v retrieving revision 1.2 diff -r1.2 inet_net_pton.c 120c120 < else if (size-- > 0) --- > else if (--size > 0) -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom Ivar Helbekkmo <tih@nhh.no> writes: > > As far as I > > can tell, the only actual error in Paul Vixie's code is that the two > > lines you quote above should be: > > else if (--size > 0) > > *++dst = 0, dirty = 0; > > No, that's still wrong, because it will error out (jump to emsgsize) > one byte sooner than it should. The loop is fundamentally broken > because it wants to grab and zero a byte before it knows whether there > are any digits for the byte. Damn! You're right! I swear I tested that change at home, and found it to do what I wanted -- but now that I've put it into my production system at work, it doesn't work here. Heck, I even read the code very carefully to verify that it was all that was needed! I think I really need the holiday I'm taking next week! To keep this in sync with BIND, maybe Paul could take a look, and fix the code the way he wants it there? Whoops. It just hit me. When I tested my "quick fix" last night, I just looked at what happened to the data, and ignored the return value. Silly of me... -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Tom Lane <tgl@sss.pgh.pa.us> writes: > I think its behavior for an odd number of digits is wrong too, or at > least pretty nonintuitive. I like my code a lot better ;-) Well, I like your code better too on closer inspection. I changed it a bit, though, because I don't feel that an odd number of digits in the hex string is wrong at all -- it's just a representation of a bit string, with the representation padded to an even multiple of four bits. Anyway, here's what I ended up with for the code in question: if (ch == '0'&& (src[0] == 'x'|| src[0] == 'X') && isascii(src[1]) && isxdigit(src[1])) { /* Hexadecimal: Eat nybble string. */ if (size <= 0) goto emsgsize; tmp = 0; dirty = 0; src++; /* skip x or X. */ while ((ch = *src++) != '\0'&& isascii(ch) && isxdigit(ch)) { if (isupper(ch)) ch = tolower(ch); n = strchr(xdigits, ch) - xdigits; assert(n >= 0 && n <= 15); tmp = (tmp << 4) | n; if (++dirty == 2) { if (size-- <= 0) goto emsgsize; *dst++ = (u_char) tmp; tmp = 0, dirty = 0; } } if (dirty) { if (size-- <= 0) goto emsgsize; tmp <<= 4; *dst++ = (u_char) tmp; } } -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Applied > I wrote: > > > As far as I can tell, the only actual error in Paul Vixie's code is > > that the two lines you quote above should be: > > > > else if (--size > 0) > > *++dst = 0, dirty = 0; > > > That is, the patch to apply to fix the INET type is: > > Index: inet_net_pton.c > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/inet_net_pton.c,v > retrieving revision 1.2 > diff -r1.2 inet_net_pton.c > 120c120 > < else if (size-- > 0) > --- > > else if (--size > 0) > > -tih > -- > Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier" > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Applied > > > I wrote: > > > > > As far as I can tell, the only actual error in Paul Vixie's code is > > > that the two lines you quote above should be: > > > > > > else if (--size > 0) > > > *++dst = 0, dirty = 0; [snip] > > < else if (size-- > 0) > > --- > > > else if (--size > 0) Didn't we agree this WASN'T the fix for the problem? Something about return values? I seem to remember that this piece of code needed to be rewritten... Taral
> Didn't we agree this WASN'T the fix for the problem? Something about > return values? I seem to remember that this piece of code needed to be > rewritten... Yup. Bruce, Tom has another patch he posted which will behave better... - Tom
> > Didn't we agree this WASN'T the fix for the problem? Something about > > return values? I seem to remember that this piece of code needed to be > > rewritten... > > Yup. Bruce, Tom has another patch he posted which will behave better... > > - Tom > I have lost it. Could someone resend it, and the patch to reverse too. Thanks. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > I have lost it. Could someone resend it, and the patch to reverse too. This is the one that shouldn't have been applied: Index: inet_net_pton.c =================================================================== RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/inet_net_pton.c,v retrieving revision 1.2 diff -r1.2 inet_net_pton.c 120c120 < else if (size-- > 0) --- > else if (--size > 0) This is the one that works: Index: inet_net_pton.c =================================================================== RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/inet_net_pton.c,v retrieving revision 1.2 diff -r1.2 inet_net_pton.c 108c108,109 < *dst = 0, dirty = 0; --- > tmp = 0; > dirty = 0; 117,122c118,127 < *dst |= n; < if (!dirty++) < *dst <<= 4; < else if (size-- > 0) < *++dst = 0, dirty = 0; < else --- > tmp = (tmp << 4) | n; > if (++dirty == 2) { > if (size-- <= 0) > goto emsgsize; > *dst++ = (u_char) tmp; > tmp = 0, dirty = 0; > } > } > if (dirty) { > if (size-- <= 0) 123a129,130 > tmp <<= 4; > *dst++ = (u_char) tmp; 125,126d131 < if (dirty) < size--; -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Both applied, with the first one reverse applied. Thanks. > Bruce Momjian <maillist@candle.pha.pa.us> writes: > > > I have lost it. Could someone resend it, and the patch to reverse too. > > This is the one that shouldn't have been applied: > > Index: inet_net_pton.c > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/inet_net_pton.c,v > retrieving revision 1.2 > diff -r1.2 inet_net_pton.c > 120c120 > < else if (size-- > 0) > --- > > else if (--size > 0) > > This is the one that works: > > Index: inet_net_pton.c > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/inet_net_pton.c,v > retrieving revision 1.2 > diff -r1.2 inet_net_pton.c > 108c108,109 > < *dst = 0, dirty = 0; > --- > > tmp = 0; > > dirty = 0; > 117,122c118,127 > < *dst |= n; > < if (!dirty++) > < *dst <<= 4; > < else if (size-- > 0) > < *++dst = 0, dirty = 0; > < else > --- > > tmp = (tmp << 4) | n; > > if (++dirty == 2) { > > if (size-- <= 0) > > goto emsgsize; > > *dst++ = (u_char) tmp; > > tmp = 0, dirty = 0; > > } > > } > > if (dirty) { > > if (size-- <= 0) > 123a129,130 > > tmp <<= 4; > > *dst++ = (u_char) tmp; > 125,126d131 > < if (dirty) > < size--; > > -tih > -- > Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier" > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Wow. Tough room! I've got to admit that I don't have any test cases with hex strings in them, and had the hex portion of inet_net_pton_ipv4() ever been tested here it would have shown every error you folks found: > From: Tom Ivar Helbekkmo <tih@nhh.no> > Date: 09 Oct 1998 22:38:37 +0200 > > As far as I can tell, the only actual error in Paul Vixie's code is that > the two lines you quote above should be: > > else if (--size > 0) > *++dst = 0, dirty = 0; > > That is, size should be predecremented instead of postdecremented. > I'm cc-ing Paul on this, as I assume he wants to get the fix into > BIND, which is where inet_net_ntop() and inet_net_pton() came from. As later bespoken by someone else, this is not yet the right answer. > > BTW, shouldn't this routine clear out all "size" bytes of the > > destination, even if the given data doesn't fill it all? A memset > > at the top might be worthwhile. No. "size" is the maximum I'm allowed to fill -- but I only touch the bits I need, and I return the number of bits I touched. > It does: there is code further down that handles that. Another > example of the obscure programming style: this function really shows > what a low-level language C is! :-) C is not a language, it is a very smart macro assembler. Modula-3 is a language. I'm told that Eiffel is a language. Scheme is definitely a language. But C is not a language. It's just what we all use :-). More seriously, this function is called in the inner loop and speed was important enough to trade off some simplicity for. The amazing thing is, because I was blasting the same byte of *dst twice per loop iteration, it was actually sped up considerably by the patches suggested here. I've got egg on my face this time fershure man. And as I said, if you think this routine is obscure, that just shows that you've not looked at the rest of BIND. This routine is at least self contained and its bugs don't lean on other bugs. > Date: Fri, 09 Oct 1998 17:20:22 -0400 > From: Tom Lane <tgl@sss.pgh.pa.us> > > > As far as I can tell, the only actual error in Paul Vixie's code is > > that the two lines you quote above should be: > > > > else if (--size > 0) > > *++dst = 0, dirty = 0; > > No, that's still wrong, because it will error out (jump to emsgsize) > one byte sooner than it should. The loop is fundamentally broken > because it wants to grab and zero a byte before it knows whether there > are any digits for the byte. Yes. > From: Tom Ivar Helbekkmo <tih@nhh.no> > Date: 10 Oct 1998 20:14:51 +0200 > > > I think its behavior for an odd number of digits is wrong too, or at > > least pretty nonintuitive. I like my code a lot better ;-) > > Well, I like your code better too on closer inspection. I changed it > a bit, though, because I don't feel that an odd number of digits in > the hex string is wrong at all -- it's just a representation of a bit > string, with the representation padded to an even multiple of four > bits. I agree with this interpretation. I'm now wracking my brain trying to remember what other source file I got this code from, in what other package, in what year, done for what project, because it's possible that the bug was inherited. > Anyway, here's what I ended up with for the code in question: > > if (ch == '0'&& (src[0] == 'x'|| src[0] == 'X') > && isascii(src[1]) && isxdigit(src[1])) > { > /* Hexadecimal: Eat nybble string. */ > if (size <= 0) > goto emsgsize; > tmp = 0; > dirty = 0; > src++; /* skip x or X. */ > while ((ch = *src++) != '\0'&& > isascii(ch) && isxdigit(ch)) > { > if (isupper(ch)) > ch = tolower(ch); > n = strchr(xdigits, ch) - xdigits; > assert(n >= 0 && n <= 15); > tmp = (tmp << 4) | n; > if (++dirty == 2) { > if (size-- <= 0) > goto emsgsize; > *dst++ = (u_char) tmp; > tmp = 0, dirty = 0; > } > } > if (dirty) { > if (size-- <= 0) > goto emsgsize; > tmp <<= 4; > *dst++ = (u_char) tmp; > } > } Since that made some stylistic changes that had nothing to do with the algorythm, it caught my eye (that's a hint.) In keeping with its spirit, I came up with the following. Comments please? if (ch == '0' && (src[0] == 'x' || src[0] == 'X') && isascii(src[1]) && isxdigit(src[1])) { /* Hexadecimal: Eat nybble string. */ if (size <= 0) goto emsgsize; dirty = 0; src++; /* skip x or X. */ while ((ch = *src++) != '\0' && isascii(ch) && isxdigit(ch)) { if (isupper(ch)) ch = tolower(ch); n = strchr(xdigits, ch) - xdigits; assert(n >= 0 && n <= 15); if (dirty == 0) tmp = n; else tmp = (tmp << 4) | n; if (++dirty == 2) { if (size-- <= 0) goto emsgsize; *dst++ = (u_char) tmp; dirty = 0; } } if (dirty) { /* Odd trailing nybble? */ if (size-- <= 0) goto emsgsize; *dst++ = (u_char) (tmp << 4); } } For the record, I found my original code completely impenetrable, and I'd like to both thank those of you who slogged through it, and apologize for not testing every code path before I released it. I oughta know better, and you find folks deserve better.