[Fwd: [HACKERS] PostgreSQL ES3.0 problems?] - Mailing list pgsql-bugs
From | Joshua D. Drake |
---|---|
Subject | [Fwd: [HACKERS] PostgreSQL ES3.0 problems?] |
Date | |
Msg-id | 406379FA.7040503@commandprompt.com Whole thread Raw |
List | pgsql-bugs |
-------- Original Message -------- Subject: [HACKERS] PostgreSQL ES3.0 problems? Date: Wed, 24 Mar 2004 17:06:23 -0800 From: Joshua D. Drake <jd@commandprompt.com> Organization: Command Prompt, Inc. To: pgsql-hackers@postgresql.org Hello, About a week ago we had a report of replicator not working correctly on ES 3.0 when running with more than one CPU. Replicator worked perfectly when utilizing only one CPU. The machine was a Quad Xeon with HyperThreading, SCSI RAID array and 4 Gig of RAM. During our testing we found issues with - nocachegetattr. Our tests showed the following output using gdb: ================== GDB LOG ====================== [postgres@es30 postgresql-7.3.6]$ gdb /usr/local/pgsql/bin/postmaster /home/postgres/data_master/base/16976/core.858 GNU gdb Red Hat Linux (5.3.90-0.20030710.40rh) Copyright 2003 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1". Core was generated by `postgres: postgres test [local] SELECT '. Program terminated with signal 11, Segmentation fault. Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4...done. Loaded symbols for /usr/lib/libreadline.so.4 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/tls/libm.so.6...done. Loaded symbols for /lib/tls/libm.so.6 Reading symbols from /lib/tls/libc.so.6...done. Loaded symbols for /lib/tls/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 #0 0x0806b729 in nocachegetattr (tuple=0x82992d8, attnum=17, tupleDesc=0x0, isnull=0xbfffae2f "") at heaptuple.c:406 406 usecache = false; (gdb) l 401 } 402 403 off = att_addlength(off, att[i]->attlen, tp + off); 404 405 if (usecache && att[i]->attlen <= 0) 406 usecache = false; 407 } 408 409 off = att_align(off, att[attnum]->attalign); 410 (gdb) Our testing was performed on: Dual Athlon MP 2800 (Barton) 2 Gig of Registered ECC ram 3Ware 0+1 IDE Array LVM The log output was thus: [postgres@es30 postgresql-7.3.6]$ /usr/local/pgsql/bin/postmaster -D ~/data_master DEBUG: /usr/local/pgsql/bin/postmaster: PostmasterMain: initial environ dump: DEBUG: ----------------------------------------- DEBUG: HOSTNAME=es30 DEBUG: TERM=xterm DEBUG: SHELL=/bin/bash DEBUG: HISTSIZE=1000 DEBUG: SSH_CLIENT=192.168.1.65 1745 22 DEBUG: SSH_TTY=/dev/pts/2 DEBUG: USER=postgres DEBUG: LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35: DEBUG: MAIL=/var/spool/mail/postgres DEBUG: PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/postgres/bin DEBUG: INPUTRC=/etc/inputrc DEBUG: PWD=/home/postgres/postgresql-7.3.6 DEBUG: LANG=en_US.UTF-8 DEBUG: SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass DEBUG: SHLVL=1 DEBUG: HOME=/home/postgres DEBUG: LOGNAME=postgres DEBUG: SSH_CONNECTION=192.168.1.65 1745 192.168.1.199 22 DEBUG: LESSOPEN=|/usr/bin/lesspipe.sh %s DEBUG: G_BROKEN_FILENAMES=1 DEBUG: _=/usr/local/pgsql/bin/postmaster DEBUG: OLDPWD=/home/postgres/src/SOURCES/postgresql-7.3.6 DEBUG: ----------------------------------------- DEBUG: FindExec: found "/usr/local/pgsql/bin/postgres" using argv[0] DEBUG: invoking IpcMemoryCreate(size=36421632) DEBUG: FindExec: found "/usr/local/pgsql/bin/postmaster" using argv[0] LOG: database system was shut down at 2004-03-19 06:48:17 PST LOG: checkpoint record is at 0/58E2B4F8 LOG: redo record is at 0/58E2B4F8; undo record is at 0/0; shutdown TRUE LOG: next transaction id: 630; next oid: 217646 LOG: database system is ready DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: reaping dead processes DEBUG: /usr/local/pgsql/bin/postmaster child[858]: starting with ( DEBUG: postgres DEBUG: -v131072 DEBUG: -p DEBUG: test DEBUG: ) DEBUG: InitPostgres DEBUG: BackendStartup: forked pid=858 socket=8 DEBUG: StartTransactionCommand DEBUG: ProcessUtility DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: ProcessQuery DEBUG: ProcessUtility DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: ProcessUtility DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: ProcessQuery DEBUG: ProcessUtility DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: ProcessQuery DEBUG: reaping dead processes DEBUG: child process (pid 858) was terminated by signal 11 LOG: server process (pid 858) was terminated by signal 11 LOG: terminating any other active server processes LOG: all server processes terminated; reinitializing shared memory and semaphores DEBUG: shmem_exit(0) DEBUG: invoking IpcMemoryCreate(size=36421632) LOG: database system was interrupted at 2004-03-19 06:52:03 PST LOG: checkpoint record is at 0/58E2B4F8 LOG: redo record is at 0/58E2B4F8; undo record is at 0/0; shutdown TRUE LOG: next transaction id: 630; next oid: 217646 LOG: database system was not properly shut down; automatic recovery in progress FATAL: The database system is starting up DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: BackendStartup: forked pid=860 socket=8 DEBUG: reaping dead processes DEBUG: child process (pid 860) exited with exit code 0 LOG: ReadRecord: record with zero length at 0/58E2B538 LOG: redo is not required LOG: database system is ready DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: reaping dead processes After chasing our tails a little bit we decided to test the community PostgreSQL 7.3.6. We were able to produce similar results with PostgreSQL 7.3.6. Again only in SMP mode, only on ES 3.0. Similar problems occured when trying to do a select * from a large table (about 2Gig in size with about 1.5 million rows). Here is a strings from a core dump that is associated: CORE CORE postmaster postgres: postgres test [local] SELECT CORE postmaster CORE FLINUX hat exists. out of memory in PortalHashTable trying to delete portal name that does not exist. ltsWriteBlock: failed to write block %ld of temporary file Perhaps out of disk space? ltsReadBlock: failed to read block %ld of temporary file LogicalTapeWrite: impossible state LogicalTapeBackspace: unexpected end of tape LogicalTapeSeek: unexpected end of tape tuplesort_puttuple: invalid state tuplesort_performsort: invalid state tuplesort_gettuple: invalid state tuplesort_gettuple: bogus tuple len in backward scan tuplesort_rescan: invalid state tuplesort_markpos: invalid state tuplesort_restorepos: invalid state tuplesort: unexpected end of tape tuplesort: unexpected end of data SelectSortFunction: cache lookup failed for operator %u Cannot create unique index. Table contains non-unique values copytup_datum() should not be called tuplestore_puttuple: invalid state tuplestore_donestoring: invalid state tuplestore_donestoring: seek(0) failed tuplestore_gettuple: invalid state tuplestore_gettuple: bogus tuple len in backward scan tuplestore_rescan: invalid state tuplestore_rescan: seek(0) failed tuplestore_markpos: invalid state tuplestore_restorepos: invalid state tuplestore: unexpected end of tape tuplestore: unexpected end of data CopyQuerySnapshot: no snapshot has been set pg_char_to_encname_struct(): encoding name too long UtfToLocal: could not convert UTF-8 (0x%04x). Ignored LocalToUtf: could not convert (0x%04x) %s to UTF-8. Ignored default conversion proc for %s to %s not found default conversion proc %u for %s to %s not found in pg_proc Invalid destination encoding name %s Invalid source encoding name %s SetDatabaseEncoding(): invalid database encoding Invalid %s character sequence found (0x Unicode >= 0x10000 is not supported MbP? @UUUUUU A. K. bool bytea char name int2 int2vector int4 regproc regclass regtype text oidvector smgr _int4 _aclitem ctid xmin cmin xmax cmax tableoid MbP?{ -infinity abstime acsst *acst aesst I tried running the pg_dump in single proc mode and the machine ran just fine. Once we put it in SMP mode and ran the dump again it trashed the machine we had to reboot. Once we rebooted we forced an FSCK and then we kernel panicked. We now have fairly extensive FS corruption ;) Any thoughts? Sincerely, Joshua D. Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Attachment
pgsql-bugs by date: