Thread: BUG #5418: psql exits after using tab-completion with error message
The following bug has been logged online:

Bug reference: 5418
Logged by: Ben Madin
Email address: ben@ausvet.com.au
PostgreSQL version: 8.4.3
Operating system: Mac OS X 10.6.3
Description: psql exits after using tab-completion with error message
Details:

G'day,

this problem appears to be intermittent - in so far as I don't always notice it. It has been happening for a number of versions (since 8.3 at least) and it might work or it might not, but I can't really pick what has changed when it starts happening. Once it starts, it seems very hard to stop.

Very anecdotally, I think it only happens when the characters entered so far are ambiguous (i.e. could match more than one table) and the tables are recently added to the database.

Here is an example (I wanted the abattoir table, so I had typed \d aba and then pressed the tab key):

psql (8.4.3)
Type "help" for help.

prices=# SELECT version();
 version
----------------------------------------------------------------------------
 PostgreSQL 8.4.3 on i386-apple-darwin10.3.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1), 64-bit
(1 row)

prices=# \set VERBOSITY verbose
prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap

but if I had only typed \d a, it would have just waited (as there are two other tables starting with a).

I have tried:

1. restarting the terminal
2. restarting the pg_server
3. rebooting the system
4. a few cycles of upgrades
5. dropping and reloading the database itself.

and I don't know where else to go. I have no idea about what malloc_error_break means or where to start.

I hope this is helpful enough, please let me know if you require further information.

cheers

Ben
> prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being
> freed was not allocated
> *** set a breakpoint in malloc_error_break to debug
> Abort trap

This could be a bug in psql, a buggy/damaged readline library, etc.

For GUI apps Mac OS X makes a crash record in the system logs. I'm not sure if it does that for command line apps too - I think it does. Can you check "Console" and see if there are any crash dumps for psql?

If not, to find out what's going on it may be necessary to attach a debugger. This probably isn't installed on your computer. You would need the Developer Tools (XCode etc), which can be obtained from the Apple web site as a free download.

I don't have access to Mac OS X 10.6, but maybe someone else here does and can reproduce the issue. Even if they can, it might still be helpful to get some additional info on the crash you're having, so it'd be great if you could grab the developer tools and reply once they're installed. See:

http://developer.apple.com/technologies/xcode.html

You need a "dev center" username/password, but it's free to register and they don't spam you. See:

http://developer.apple.com/programs/register/

(click "Get Started")

--
Craig Ringer
G'day Craig, thanks for your reply.

On 13/04/2010, at 20:33 , Craig Ringer wrote:

>> prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being
>> freed was not allocated
>> *** set a breakpoint in malloc_error_break to debug
>> Abort trap
>
> This could be a bug in psql, a buggy/damaged readline library, etc.
>
> For GUI apps Mac OS X makes a crash record in the system logs. I'm not sure if it does that for command line apps too - I think it does. Can you check "Console" and see if there are any crash dumps for psql?

I have checked console, and there are many psql entries - I have attached two as they all appear fairly similar, some numbers changing in this section:

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000002  rcx: 0x00007fff5fbff448  rdx: 0x0000000000000000
  rdi: 0x0000000000002efd  rsi: 0x0000000000000006  rbp: 0x00007fff5fbff460  rsp: 0x00007fff5fbff448
   r8: 0x0000000000000e03   r9: 0x0000000000000000  r10: 0x00007fff868cf8ca  r11: 0x0000000000000202
  r12: 0x00000001000d5000  r13: 0x00000001000d2000  r14: 0x0000000000000000  r15: 0x0000000000000003
  rip: 0x00007fff868d3886  rfl: 0x0000000000000202  cr2: 0x0000000100188bd5

The ones that vary most are rdi, rcx, rbp, cr2 and r15.

> If not, to find out what's going on it may be necessary to attach a debugger. This probably isn't installed on your computer. You would need the Developer Tools (XCode etc), which can be obtained from the Apple web site as a free download.
>
> I don't have access to Mac OS X 10.6, but maybe someone else here does and can reproduce the issue. Even if they can, it might still be helpful to get some additional info on the crash you're having, so it'd be great if you could grab the developer tools and reply once they're installed. See:

I have the developer tools installed - but I think only because I needed them installed to install something ages ago.

cheers

Ben

--
Ben Madin
AusVet Animal Health Services
P.O. Box 5467
Broome WA 6725
Australia
t : +61 8 9192 5455
f : +61 8 9192 5535
m : 0448 887 220
e : ben@ausvet.com.au
AusVet's website: http://www.ausvet.com.au

This transmission is for the intended addressee only and is confidential information. If you have received this transmission in error, please delete it and notify the sender. The contents of this email are the opinion of the writer only and are not endorsed by AusVet Animal Health Services unless expressly stated otherwise. Although AusVet uses virus scanning software we do not accept liability for viruses or similar in any attachments.
Ben Madin wrote:

> Bug reference: 5418
> Logged by: Ben Madin
> Email address: ben@ausvet.com.au
> PostgreSQL version: 8.4.3
> Operating system: Mac OS X 10.6.3
> Description: psql exits after using tab-completion with error message

Lots of problems have been reported with MacOSX's libreadline -- it is said to be buggy. I think the recommendation is to install vanilla GNU libreadline and compile Postgres against that one.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Craig Ringer <craig@postnewspapers.com.au> writes:
>> prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being
>> freed was not allocated
>> *** set a breakpoint in malloc_error_break to debug
>> Abort trap

> This could be a bug in psql, a buggy/damaged readline library, etc.
> ...
> I don't have access to Mac OS X 10.6, but maybe someone else here does
> and can reproduce the issue.

It's fairly easy to reproduce in the regression database: type "\d ten<TAB>". I'm not sure what the triggering condition is exactly, because some seemingly-similar cases don't fail, for instance "\d test<TAB>" works as expected, ditto "\d t<TAB>". Stack trace looks like this:

regression=# \d tenpsql(16771) malloc: *** error for object 0xd: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

Program received signal SIGABRT, Aborted.
0x00007fff83652886 in __kill ()
(gdb) bt
#0  0x00007fff83652886 in __kill ()
#1  0x00007fff836f2eae in abort ()
#2  0x00007fff8360aa75 in free ()
#3  0x000000010009b9a8 in fn_complete ()
#4  0x00000001000a1416 in rl_complete ()
#5  0x00000001000a1428 in rl_complete ()
#6  0x000000010009fb87 in el_gets ()
#7  0x00000001000a19bf in readline ()
#8  0x00000001000083ff in gets_interactive (prompt=<value temporarily unavailable, due to optimizations>) at input.c:76
#9  0x000000010000bfdb in MainLoop (source=0x7fff705a30c0) at mainloop.c:134
#10 0x000000010000e6d4 in main (argc=<value temporarily unavailable, due to optimizations>, argv=0x7fff5fbff510) at startup.c:305

The object address is nonreproducible (varies even in seemingly identical test runs), but it's always a very small integer, 1 to 0xd or so.

Since this doesn't happen on any of my libreadline-using boxes, it seems like a fairly safe bet that it's a bug in libedit, rather than us using the library incorrectly. You can try to get Apple to take an interest, but there's not much we can do about it. I concur with Alvaro's suggestion to install GNU readline instead of depending on libedit.

regards, tom lane
On 13/04/10 21:16, Ben Madin wrote:

> I have checked console, and there are many psql entries - I have attached two as they all appear fairly similar, some numbers changing in this section:
>
> Thread 0 crashed with X86 Thread State (64-bit):
>   rax: 0x0000000000000000  rbx: 0x0000000000000002  rcx: 0x00007fff5fbff448  rdx: 0x0000000000000000
>   rdi: 0x0000000000002efd  rsi: 0x0000000000000006  rbp: 0x00007fff5fbff460  rsp: 0x00007fff5fbff448
>    r8: 0x0000000000000e03   r9: 0x0000000000000000  r10: 0x00007fff868cf8ca  r11: 0x0000000000000202
>   r12: 0x00000001000d5000  r13: 0x00000001000d2000  r14: 0x0000000000000000  r15: 0x0000000000000003
>   rip: 0x00007fff868d3886  rfl: 0x0000000000000202  cr2: 0x0000000100188bd5
>
> The ones that vary most are rdi, rcx, rbp, cr2 and r15.

Darn. There's no more information, like a numbered list of functions (stack trace), list of linked libraries, etc? Maybe OS X only generates that for GUI app crashes.

The stack trace is really what's needed. While it's possible to figure out where a program crashed based on the thread state dump as shown above, it doesn't give you any information about how it got there - and that can be rather helpful.

Is there any chance you can run psql under gdb from the developer tools and reproduce the fault that way? Then, when it crashes, get a backtrace? Since you're clearly pretty familiar with the shell, I'll just illustrate how to do it:

$ gdb --quiet --args psql [any psql params here]
(gdb) run
[ do whatever you need to do to make psql crash ]
Program received signal SIGSEGV, Segmentation fault.
0x007ca422 in __kernel_vsyscall ()
(gdb) bt
#0  0x007ca422 in __kernel_vsyscall ()
#1  0x001cddd3 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2  0x00de68c7 in rl_getc () from /lib/libreadline.so.6
#3  0x00de6ea3 in rl_read_key () from /lib/libreadline.so.6
#4  0x00dd109e in readline_internal_char () from /lib/libreadline.so.6
#5  0x00dd15ed in readline () from /lib/libreadline.so.6
#6  0x00730ff1 in ?? ()
#7  0x00733ebe in ?? ()
#8  0x00737964 in main ()
(gdb)

If you paste all the output after "run", that'd be really handy.

If for some reason you can't start psql under gdb, you can instead run psql normally and then attach gdb to psql using "gdb -p pidofpsql". Get "pidofpsql" using the "ps" command - "ps -ef" or "ps aux" depending, I don't remember which flavour Mac OS X understands - passed through "| grep psql".

> I have the developer tools installed - but I think only because I needed them installed to install something ages ago.

Great.

--
Craig Ringer
On 13/04/10 21:16, Ben Madin wrote:

> G'day Craig, thanks for your reply.

Please disregard my follow-up. I hadn't seen Tom's reply that he was able to reproduce the issue. There's no need for you to collect a backtrace now :-)

--
Craig Ringer
I wrote:
> It's fairly easy to reproduce in the regression database:
> type "\d ten<TAB>". I'm not sure what the triggering condition
> is exactly, because some seemingly-similar cases don't fail,
> for instance "\d test<TAB>" works as expected, ditto "\d t<TAB>".

It turns out that the problem occurs when there are exactly 9 + 10*N possible completions, for any N>=0. There's an off-by-one logic bug in libedit that results in a memory stomp because it forgets to enlarge an array before storing a terminating null pointer in it.

The upstream netbsd sources incorporated a fix some time ago:
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/readline.c.diff?r1=1.82&r2=1.83&sortby=date&f=h
with credit to Caleb Welton at Greenplum --- I wonder if he found it because of psql failing? Apple hasn't incorporated this fix as of OS X 10.6.3, however.

What's slightly more distressing is that the same source file (readline.c) appears to have at least two other occurrences of the same broken array-enlargement coding pattern, which were *not* fixed.

I've reported this to Apple but I'm not real sure where to file NetBSD bugs. Anybody want to yank the BSD guys' chain about the other errors?

regards, tom lane
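
The off-by-one described in the message above can be illustrated with a minimal C sketch. This is not libedit's code: the growth step of 10, the reserved slot 0, and the names GROW_STEP, capacity_after and terminator_fits are assumptions chosen so that the model fails at exactly 9 + 10*N entries. Rather than actually overrunning a buffer, it only does the slot arithmetic and reports whether the terminating null pointer would land inside the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GROW_STEP 10 /* assumed growth increment, not taken from libedit */

/*
 * Slots allocated after storing `matches` entries in slots 1..matches
 * (slot 0 is assumed to be reserved). Before each store the array grows
 * when stored + headroom >= capacity.
 */
static size_t capacity_after(size_t matches, size_t headroom)
{
    size_t capacity = 0;

    assert(headroom >= 1);
    for (size_t stored = 0; stored < matches; stored++) {
        if (stored + headroom >= capacity)
            capacity += GROW_STEP;
    }
    return capacity;
}

/*
 * True if the terminating NULL, written at slot matches + 1, is in bounds.
 * A headroom of 1 models a check that forgets the terminator; a headroom
 * of 2 reserves a slot for it.
 */
bool terminator_fits(size_t matches, size_t headroom)
{
    if (matches == 0)
        return true; /* assumption: no list is built when nothing matches */
    return matches + 1 < capacity_after(matches, headroom);
}
```

Under these assumptions, terminator_fits(9, 1) and terminator_fits(19, 1) are false, while every count passes with a headroom of 2, which is consistent with the 9 + 10*N pattern reported here.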
Ben Madin <ben@ausvet.com.au> writes:
> I also reported it to Apple, but without this information, so I hope they get the sense that it might be important enough to look at, especially if there is already a fix known.

Mine is problem ID 7866382, if you'd like to add a note to yours pointing out the duplication.

If this is biting you on a regular basis, one easy workaround would be to add a dummy table or index to change the number of possible completions.

regards, tom lane
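
The workaround comes down to simple arithmetic on the number of possible completions. A small C sketch follows; the function names are hypothetical, and the modulo test only restates the 9 + 10*N pattern reported earlier in this thread, so it says nothing about other libedit versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True when `completions` has the form 9 + 10*N (N >= 0), the counts
 * reported in this thread to trigger the crash.
 */
bool count_triggers_crash(size_t completions)
{
    return completions % 10 == 9;
}

/*
 * Number of dummy tables or indexes to add so that the completion count
 * no longer matches the failing pattern.
 */
size_t dummy_objects_needed(size_t completions)
{
    size_t extra = count_triggers_crash(completions) ? 1 : 0;

    /* One extra object is always enough to leave the pattern. */
    assert(!count_triggers_crash(completions + extra));
    return extra;
}
```

For example, a prefix with 9 or 19 candidate names would need one dummy object whose name shares that prefix, while a prefix with 10 candidates needs none.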
Thanks Tom,

I also reported it to Apple, but without this information, so I hope they get the sense that it might be important enough to look at, especially if there is already a fix known.

I also contacted William Kyngesbury, and he was sympathetic (but had never had the problem himself, but neither does he use the tab-completion), but would rather Apple fixed it because his GIS software suite is very based on using Apple OSX frameworks where they exist.

cheers

Ben

On 15/04/2010, at 12:13 , Tom Lane wrote:

> I wrote:
>> It's fairly easy to reproduce in the regression database:
>> type "\d ten<TAB>". I'm not sure what the triggering condition
>> is exactly, because some seemingly-similar cases don't fail,
>> for instance "\d test<TAB>" works as expected, ditto "\d t<TAB>".
>
> It turns out that the problem occurs when there are exactly 9 + 10*N
> possible completions, for any N>=0. There's an off-by-one logic bug
> in libedit that results in a memory stomp because it forgets to enlarge
> an array before storing a terminating null pointer in it.
>
> The upstream netbsd sources incorporated a fix some time ago:
> http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/readline.c.diff?r1=1.82&r2=1.83&sortby=date&f=h
> with credit to Caleb Welton at Greenplum --- I wonder if he found it
> because of psql failing? Apple hasn't incorporated this fix as of
> OS X 10.6.3, however.
>
> What's slightly more distressing is that the same source file
> (readline.c) appears to have at least two other occurrences of the same
> broken array-enlargement coding pattern, which were *not* fixed.
>
> I've reported this to Apple but I'm not real sure where to file NetBSD
> bugs. Anybody want to yank the BSD guys' chain about the other errors?
>
> regards, tom lane