Thread: \dS and \df crashing psql
Your name           : Nishad Prakash
Your email address  : prakashn@uci.edu


System Configuration
---------------------
  Architecture (example: Intel Pentium)          : Sun Sparc

  Operating System (example: Linux 2.0.26 ELF)   : Solaris 2.6

  PostgreSQL version (example: PostgreSQL-6.5.1) : PostgreSQL-7.0

  Compiler used (example:  gcc 2.8.0)            : gcc 2.95.2


Please enter a FULL description of your problem:
------------------------------------------------

In psql, when connected to template1 as the postgres superuser, the
\df function complains about some memory allocation problem. See the
following four examples for representative errors:

template1=# \df get
ERROR:  AllocSetFree: cannot find block containing chunk
template1=# \df get
NOTICE:  PortalHeapMemoryFree: 0x31f5b0 not in alloc set!
       List of functions
 Result |      Function       |  Arguments
--------+---------------------+-------------
 int4   | get_bit             | bytea int4
 int4   | get_byte            | bytea int4
 name   | getdatabaseencoding |
 name   | getpgusername       |
(4 rows)

template1=# \df get
NOTICE:  PortalHeapMemoryFree: 0x344350 not in alloc set!
ERROR:  AllocSetFree: cannot find block containing chunk
template1=# \df get
ERROR:  SearchSysCache: recursive use of cache 2

Note that this is before creating any of my own databases; at the time
I got these errors I had just finished the installation.

There is another problem with the \d family. I created a new db (named
can) and its tables. Then, typing \dS has the following effect:

can=# \dS
The connection to the server was lost. Attempting reset: Failed.
!# \d
You are currently not connected to a database.
!# \c can
No Postgres username specified in startup packet.
Segmentation fault

Note that this happens whether or not the tables are actually
populated; I ran a vacuum right after both steps (creating the tables
and populating them), and \dS crashed psql each time.

FWIW, my 6.5.3 installation with the same configure and build
parameters, same data, etc. ran with no problems at all.

Has anyone had similar problems with the \d functions in 7.0?

Nishad
> System Configuration
> ---------------------
>   Architecture (example: Intel Pentium)          : Sun Sparc
>
>   Operating System (example: Linux 2.0.26 ELF)   : Solaris 2.6
>
>   PostgreSQL version (example: PostgreSQL-6.5.1) : PostgreSQL-7.0
>
>   Compiler used (example:  gcc 2.8.0)            : gcc 2.95.2
>
>
> Please enter a FULL description of your problem:
> ------------------------------------------------
>
> In psql, when connected to template1 as the postgres superuser, the
> \df function complains about some memory allocation problem. See the
> following four examples for representative errors:

Neither the \df nor the \dS problem reproduces here (I have exactly the
same configuration as you). Instead, I have another problem, already
reported on the hackers list: createdb/dropdb does not work. See the
posting "Solaris 2.6 problems" in the archives.
--
Tatsuo Ishii
Nishad PRAKASH writes:

> In psql, when connected to template1 as the postgres superuser, the
> \df function complains about some memory allocation problem.

The \d series of psql commands are really just shortcuts for various SQL
queries to the system catalogs. Start psql with the -E option to see them.
Therefore it is unlikely that this behaviour is entirely localized at
these functions. Have you run the regression tests without problems?

> can=# \dS
> The connection to the server was lost. Attempting reset: Failed.

Can you show the server output? There's probably a segmentation fault or
failed assertion in the backend involved, which we'd need to see.

> !# \d
> You are currently not connected to a database.
> !# \c can
> No Postgres username specified in startup packet.
> Segmentation fault

That's certainly a psql problem. Can you show a backtrace from gdb?

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
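To make Peter's point concrete: judging from the output above, `\df get` lists the functions whose names begin with "get", and Nishad later reports that these expansions use regexps. The function below is a hypothetical, simplified stand-in for such a catalog query; the name `df_query`, the column list, and the assumption that psql matches the name as a left-anchored regex (`~ '^name'`) are all illustrative. The real SQL that psql 7.0 sends is whatever `psql -E` prints.

```python
def df_query(name_prefix: str) -> str:
    """Build a hypothetical, simplified analogue of the query behind psql's \\df <name>.

    Assumption: the name is matched as a left-anchored regex (~ '^name').
    Regex metacharacters inside name_prefix are NOT escaped here.
    """
    # Double any single quotes so the prefix forms a valid SQL string literal.
    literal = name_prefix.replace("'", "''")
    return (
        "SELECT t.typname AS result, p.proname AS function\n"
        "FROM pg_proc p, pg_type t\n"
        "WHERE p.prorettype = t.oid\n"
        f"  AND p.proname ~ '^{literal}'\n"
        "ORDER BY p.proname;"
    )


print(df_query("get"))
```

Comparing this shape with the real text from `psql -E template1` followed by `\df get`, and then pasting that real query into psql directly, is one way to check whether the failure follows the query itself rather than the `\df` shortcut.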
On Fri, 26 May 2000, Peter Eisentraut wrote:

> The \d series of psql commands are really just shortcuts for various SQL
> queries to the system catalogs. Start psql with the -E option to see them.
> Therefore it is unlikely that this behaviour is entirely localized at
> these functions. Have you run the regression tests without problems?

First of all, this was not a Postgres bug but a configuration mistake on
my part. I had been meaning to write back to the list explaining what
really happened:

I compiled 7.0 with locale support, recode, and multibyte options all
enabled. In the postgres (db superuser) .cshrc, I had set LC_CTYPE to
"en_US". This was the problem. When I would start postmaster and run
anything that involved a regexp (and the query that \dS expands to uses
regexps) on a "bytea" type field, psql would crash.

To fix this, I tried first letting the locale default to "C", then setting
LC_CTYPE to "iso_8859_1". Starting postmaster with either of these works
perfectly.

If you are still interested in server output or backtraces (perhaps to
implement a more graceful exit?), I'd be glad to send them, but I'm sure
you can replicate this pretty easily now if required.

I have never needed to mess around with locales before, so I apologize
for posting this as a bug; I didn't quite know where to look first.

By the way, I don't know what you guys have done with the optimizer, but
my previously slow queries now run VERY FAST. This prevents me from
taking cigarette breaks, coffee breaks, etc. under the "I'm running a
large query" pretext. Please do what you can to fix this problem.

Thanks for the help,

Nishad
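Both of Nishad's fixes come down to starting postmaster with a different locale environment. A minimal sketch, assuming (as this thread suggests) that the backend takes its locale from the environment postmaster is started in: build a copy of that environment with every locale variable removed and `LC_ALL` forced to "C". Under POSIX, `setlocale(category, "")` gives `LC_ALL` precedence over the individual `LC_*` variables and `LANG`. The helper name `c_locale_env` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Locale-related variables consulted by setlocale(category, "") under POSIX.
LOCALE_VARS = (
    "LANG", "LC_ALL", "LC_COLLATE", "LC_CTYPE",
    "LC_MESSAGES", "LC_MONETARY", "LC_NUMERIC", "LC_TIME",
)


def c_locale_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) pinned to the C locale."""
    env = dict(os.environ if base is None else base)
    for var in LOCALE_VARS:
        env.pop(var, None)  # drop inherited settings such as LC_COLLATE=en_US
    env["LC_ALL"] = "C"  # LC_ALL overrides every other LC_* variable and LANG
    return env


env = c_locale_env({"LC_COLLATE": "en_US", "PATH": "/usr/bin"})
print(sorted(env.items()))  # [('LC_ALL', 'C'), ('PATH', '/usr/bin')]
```

The result could then be passed as `env=` to `subprocess.Popen` when launching postmaster, e.g. `subprocess.Popen(["postmaster", "-D", datadir], env=c_locale_env())`, where `datadir` stands for your own data directory. Forcing everything to "C" is the blunter of the two fixes described above.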
Nishad PRAKASH <prakashn@uci.edu> writes:
> I compiled 7.0 with locale support, recode, and multibyte options all
> enabled. In the postgres (db superuser) .cshrc, I had set LC_CTYPE to
> "en_US". This was the problem. When I would start postmaster and run
> anything that involved a regexp (and the query that \dS expands to uses
> regexps) on a "bytea" type field, psql would crash.

> To fix this, I tried first letting the locale default to "C", then setting
> LC_CTYPE to "iso_8859_1". Starting postmaster with either of these works
> perfectly.

> If you are still interested in server output or backtraces (perhaps to
> implement a more graceful exit?), I'd be glad to send them, but I'm sure
> you can replicate this pretty easily now if required.

Hmm, news to us. It may be a platform-specific problem, so yes please
do send a backtrace.

			regards, tom lane
> First of all, this was not a Postgres bug but a configuration mistake on
> my part. I had been meaning to write back to the list explaining what
> really happened:
>
> I compiled 7.0 with locale support, recode, and multibyte options all
> enabled. In the postgres (db superuser) .cshrc, I had set LC_CTYPE to
> "en_US". This was the problem. When I would start postmaster and run
> anything that involved a regexp (and the query that \dS expands to uses
> regexps) on a "bytea" type field, psql would crash.
>
> To fix this, I tried first letting the locale default to "C", then setting
> LC_CTYPE to "iso_8859_1". Starting postmaster with either of these works
> perfectly.
>
> If you are still interested in server output or backtraces (perhaps to
> implement a more graceful exit?), I'd be glad to send them, but I'm sure
> you can replicate this pretty easily now if required.

Of course regexp should not crash in the situation above. Thanks for
the info. I will dig into the problem.
--
Tatsuo Ishii
On Thu, 25 May 2000, Tom Lane wrote:
>
> Hmm, news to us. It may be a platform-specific problem, so yes please
> do send a backtrace.
>

CAVEAT: I may just be missing something really obvious.

A high-level description of the problem is:

If postmaster is started with LC_COLLATE set to en_US in the db
superuser's environment, then working on a db created with
createdb -E LATIN1 <foo> causes strange behaviour in regexps.

If that sounds like an obviously wrong use of locale settings, you
probably don't need to read any further, but just tell me what's going on.

To replicate the problem, you need to do the following. All actions are
performed by postgres, the db superuser account.

Install postgres 7.0 with all three of --enable-locale, --enable-recode,
and --enable-multibyte specified.

Set the user postgres's LC_COLLATE env var to any of the en_* locales
available on your machine /except/ en_US.UTF-8, which doesn't seem to
cause problems. The other locale vars appear to be irrelevant;
LC_COLLATE alone will do for replication. These were my settings:

> locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE=en_US
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

What follows are the operations I performed to get psql to crash:

> createdb -E LATIN1 foo
CREATE DATABASE
> psql foo
Welcome to psql, the PostgreSQL interactive terminal.
<snip>
foo=# create table TenChrName ( somelongname varchar (100) unique);
NOTICE:  CREATE TABLE/UNIQUE will create implicit index 'tenchrname_somelongname_key' for table 'tenchrname'
CREATE
foo=# vacuum analyze;
VACUUM
foo=# \dS
The connection to the server was lost. Attempting reset: Failed.
!# \q
> kill `cat postmaster.pid`
> gdb postgres
<snip>
(gdb) run foo

/* note: the following query is the smallest part of \dS's expansion
 * that is sufficient for a crash */

backend> select * from pg_class where relname ~ '^n';
ERROR:  expression_tree_walker: Unexpected node type 0
ERROR:  expression_tree_walker: Unexpected node type 0
backend> select * from pg_class where relname ~ '^n';
NOTICE:  PortalHeapMemoryFree: 0x51c330 not in alloc set!
NOTICE:  PortalHeapMemoryFree: 0x51c330 not in alloc set!

Program received signal SIGBUS, Bus error.
0x21ddf4 in AllocSetAlloc (set=0x500ff8, size=12) at aset.c:233
233                             if (chunk->size >= size)
(gdb) bt
#0  0x21ddf4 in AllocSetAlloc (set=0x500ff8, size=12) at aset.c:233
#1  0x21f8a0 in PortalHeapMemoryAlloc (this=0x2bddc0, size=12)
    at portalmem.c:253
#2  0x21ed20 in MemoryContextAlloc (context=0x2bddc0, size=12) at mcxt.c:224
#3  0x126e84 in newNode (size=12, tag=T_List) at nodes.c:38
#4  0x127180 in lcons (obj=0x51a240, list=0x0) at list.c:112
#5  0x127220 in lappend (list=0x0, obj=0x51a240) at list.c:144
#6  0x14e6f8 in get_actual_clauses (restrictinfo_list=0x51a298)
    at restrictinfo.c:55
#7  0x144b80 in create_scan_node (root=0x5134f8, best_path=0x51be80,
    tlist=0x51b0b0) at createplan.c:152
#8  0x144ab0 in create_plan (root=0x5134f8, best_path=0x51be80)
    at createplan.c:103
#9  0x147698 in subplanner (root=0x5134f8, flat_tlist=0x51a4a0,
    qual=0x51a280, tuple_fraction=0) at planmain.c:288
#10 0x14740c in query_planner (root=0x5134f8, tlist=0x519b08,
    qual=0x51a280, tuple_fraction=0) at planmain.c:128
#11 0x14817c in union_planner (parse=0x5134f8, tuple_fraction=0)
    at planner.c:530
#12 0x147b38 in subquery_planner (parse=0x5134f8, tuple_fraction=-1)
    at planner.c:202
#13 0x147810 in planner (parse=0x5134f8) at planner.c:67
#14 0x1977c0 in pg_plan_query (querytree=0x5134f8) at postgres.c:512
#15 0x197a9c in pg_exec_query_dest (
    query_string=0x2ba070 "select * from pg_class where relname ~ '^n'; \n",
    dest=Debug, aclOverride=0 '\000') at postgres.c:646
#16 0x1978e4 in pg_exec_query (
    query_string=0x2ba070 "select * from pg_class where relname ~ '^n'; \n")
    at postgres.c:562
#17 0x1996f4 in PostgresMain (argc=2, argv=0xeffffa64, real_argc=2,
    real_argv=0xeffffa64) at postgres.c:1590
#18 0x1026d0 in main (argc=2, argv=0xeffffa64) at main.c:103

If you actually care to go through the steps above, don't leave anything
out. The vacuum analyze serves no useful purpose, but you won't get a
crash if you omit it. The table identifiers really do need to be around
10 chars long. The regexp needs to match the front of a string, so use
'^foo'; I couldn't get a crash with other types of regexps, but then I
didn't try too many.

With the locale settings described above, a query on pg_proc of the type
"select * from pg_proc where proname ~ '^n';" will /always/ produce the
following kind of error:

NOTICE:  PortalHeapMemoryFree: <addr> not in alloc set!

before printing the result (it never causes a crash, AFAICT, and always
does produce a correct result). You can get this behaviour just by
connecting to template1; perhaps other tables with bytea fields may also
do this, but pg_proc does it every single time. If you like, I'll do a
backtrace from where it produces that error, but this message is getting
too long for that.

If someone can replicate this (or even try and fail), it would help me
learn whether the error lies in Postgres, Solaris's locales, or yours
truly. It seems too quirky to be a genuine bug.

Thanks, and let me know if you have any ideas.

Nishad
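Nishad found that only patterns anchored at the front of the string (such as '^n') trigger the failure, and his backtrace dies inside the planner. PostgreSQL's planner does treat a left-anchored pattern on an indexed column specially: it can derive a range condition, roughly `relname >= 'n' AND relname < 'o'`, from the fixed prefix. Whether that code path is what breaks here is an assumption, not something this thread establishes. The sketch below uses hypothetical helper names and is written in Python rather than the backend's C. It shows the rewrite and why its upper bound is only exact under byte-wise (C locale) ordering. Under a language collation such as en_US, case often differs only at a low weight, so a string like 'N' may sort between 'n' and 'o'.

```python
import re
from typing import Optional, Tuple

# Characters that end the literal (fixed) prefix of a regex.
_META = set(".[]()*+?{}|\\^$")


def anchored_prefix(pattern: str) -> Optional[str]:
    """Fixed prefix of a left-anchored regex such as '^n', or None if there is none."""
    if not pattern.startswith("^") or "|" in pattern:
        return None  # unanchored, or an alternation could bypass the anchor
    prefix = []
    for ch in pattern[1:]:
        if ch in _META:
            # '*', '?' and '{' can make the preceding literal optional.
            if ch in "*?{" and prefix:
                prefix.pop()
            break
        prefix.append(ch)
    return "".join(prefix) or None


def greater_string(prefix: str) -> Optional[str]:
    """An upper bound above every string starting with `prefix`, in code-point order."""
    chars = list(prefix)
    while chars:
        code = ord(chars[-1])
        if code < 0x10FFFF:
            chars[-1] = chr(code + 1)
            return "".join(chars)
        chars.pop()  # cannot increment the maximum code point; shorten instead
    return None  # no finite upper bound exists


def prefix_range(pattern: str) -> Optional[Tuple[str, Optional[str]]]:
    """(lower, upper) so that `col ~ pattern` implies `col >= lower AND col < upper`."""
    prefix = anchored_prefix(pattern)
    if prefix is None:
        return None
    return prefix, greater_string(prefix)


def matches_via_range(s: str, pattern: str) -> bool:
    """Range test using Python's code-point comparison (a stand-in for the C locale)."""
    lower, upper = prefix_range(pattern)
    return lower <= s and (upper is None or s < upper)


print(prefix_range("^n"))  # ('n', 'o')
for name in ["n", "name", "numeric", "m", "o", "N"]:
    # In code-point order the range selects exactly the strings the regex matches.
    assert matches_via_range(name, "^n") == bool(re.match("^n", name))
```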