Thread: Problem with PostgreSQL 9.2.7 and make check on AIX 7.1
Hello,

I am still using PostgreSQL 8.2 for an older application.
I have compiled 8.2.11 (same as on the old server) and
8.2.23 (latest 8.2) on AIX 7.1 TL3 SP1 (latest level) with
the IBM C/C++ V12 compiler.

Unfortunately I see the following errors:

...
2014-02-24 10:07:30 CET LOG: received fast shutdown request <--- here
2014-02-24 10:07:30 CET LOG: shutting down
2014-02-24 10:07:30 CET LOG: database system is shut down
2014-02-24 10:07:36 CET LOG: database system was shut down at 2014-02-24 10:07:30 CET
2014-02-24 10:07:36 CET LOG: checkpoint record is at 0/E9F073E0
2014-02-24 10:07:36 CET LOG: redo record is at 0/E9F073E0; undo record is at 0/0; shutdown TRUE
2014-02-24 10:07:36 CET LOG: next transaction ID: 0/7161900; next OID: 39134
2014-02-24 10:07:36 CET LOG: next MultiXactId: 1; next MultiXactOffset: 0
2014-02-24 10:07:36 CET LOG: database system is ready
2014-02-24 10:08:35 CET LOG: received fast shutdown request <--- here
2014-02-24 10:08:35 CET LOG: shutting down
2014-02-24 10:08:35 CET LOG: database system is shut down
2014-02-24 10:10:45 CET LOG: database system was shut down at 2014-02-24 10:08:35 CET
2014-02-24 10:10:45 CET LOG: checkpoint record is at 0/E9F07430
2014-02-24 10:10:45 CET LOG: redo record is at 0/E9F07430; undo record is at 0/0; shutdown TRUE
2014-02-24 10:10:45 CET LOG: next transaction ID: 0/7161950; next OID: 39134
2014-02-24 10:10:45 CET LOG: next MultiXactId: 1; next MultiXactOffset: 0
2014-02-24 10:10:45 CET LOG: database system is ready

I had set a higher log level to see all client statements.
Unfortunately, the log file shows that no statements were
running when the fast shutdown was executed.

I never had a problem on AIX 5.3. Do you have any ideas?

So I have decided to give 9.2.7 a try. During the make check I have
found two failures - to be more precise, two hangs.

I have changed parallel_schedule to run most tests sequentially.
The following tests show a problem:

1. problem

plpgsql.out <- this test hangs in statement_timeout.

...
end$$ language plpgsql;
set statement_timeout to 2000;
select blockme(); <- hang

2. problem

prepared_xacts.out <- this test hangs in statement_timeout.

...
-- pxtest3 should be locked because of the pending DROP
set statement_timeout to 2000;
SELECT * FROM pxtest3; <- hang

These tests do not move forward even if I wait 10 minutes.

So my conclusion is that statement_timeout does not work as expected.

Bye
Rainer
Hello,

There is one update:
The tests plpgsql and prepared_xacts do run on AIX 7.1 with PostgreSQL 8.2.11 and 8.2.23 without a problem.

Bye
Rainer

On 24.02.2014 11:29, Rainer Tammer wrote:
> [...]
Rainer Tammer <pgsql@spg.schulergroup.com> writes:
> I have compiled 8.2.11 (same as on the old server) and
> 8.2.23 (latest 8.2) on AIX 7.1 TL3 SP1 (latest level) with
> the IBM C/C++ V12 compiler.
> Unfortunate I see the following errors:
> 2014-02-24 10:07:30 CET LOG: received fast shutdown request <--- here

If this is the log of a "make check" run, that seems as-expected.
Otherwise, something is sending the postmaster process a SIGINT.

> So I have decided to give 9.2.7 a try. During the make check I have
> found two failures - to be more precisely two hangs.

Hm. Unfortunately, you're kind of on your own to debug this; AFAIK
there are no active Postgres developers who use AIX. It's been awhile
since there was an active AIX buildfarm machine either, so that it
would not exactly be astonishing to find that we'd inadvertently
broken something for that platform. (And I'm not sure there ever
was a buildfarm member running AIX 7.1 anyway; according to
http://buildfarm.postgresql.org/cgi-bin/show_members.pl
grebe was running 5.3 when last heard from, half a year ago.)

We're still willing to support AIX, but we can't do it without help
from users of that platform. If you send in a patch for whatever
is broken, we'll almost certainly accept it (in some form). But
it would be a good idea to set up a buildfarm animal so that any
future breakage gets detected in a more timely fashion. See
http://buildfarm.postgresql.org/index.html

regards, tom lane
On Mon, Feb 24, 2014 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm. Unfortunately, you're kind of on your own to debug this; AFAIK
> there are no active Postgres developers who use AIX. It's been awhile
> since there was an active AIX buildfarm machine either, so that it
> would not exactly be astonishing to find that we'd inadvertently
> broken something for that platform. (And I'm not sure there ever
> was a buildfarm member running AIX 7.1 anyway; according to
> http://buildfarm.postgresql.org/cgi-bin/show_members.pl
> grebe was running 5.3 when last heard from, half a year ago.)

We retired the AIX boxes, and aren't adding more, so it's not likely to be
us that retrieve this support. :-(

There might be some elderly servers still kicking around, but I imagine
that new licenses would need to be purchased in order to have AIX 7.x run
on them, which might readily cost more than the hardware's worth.

I was glad to help provide buildfarm support on AIX; it certainly was
helpful to get issues dealt with, and meant that pretty much all "usual"
Postgres functionality (including contrib modules) worked well, which
wasn't the case pre-buildfarm.

> We're still willing to support AIX, but we can't do it without help
> from users of that platform. If you send in a patch for whatever
> is broken, we'll almost certainly accept it (in some form). But
> it would be a good idea to set up a buildfarm animal so that any
> future breakage gets detected in a more timely fashion. See
> http://buildfarm.postgresql.org/index.html

Indeed, if someone's keen on having AIX supported, then it's important to
be able to spare enough resources for a buildfarm node. It doesn't mean
that you'll need to devote development resources to deal with every problem
that comes along; I recall a number of cases where I saw email threads go
past where Tom (or someone else) was poking away at a problem noticed
because grebe "turned red."

Someone with useful contacts at IBM might ask if they have a server hiding
somewhere that could be used as a buildfarm animal.

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Hello,

First of all: Thanks for the reply.

1. fast shutdown

The unexpected "LOG: received fast shutdown request" is happening
on an installed instance. I have found other articles on the web which
describe the same problem (other platform) - unfortunately there was
no real solution to it.

I am not too familiar with the PostgreSQL source code, but the only
location where this message is generated is:

---[postmaster.c]-------
...
/*
 * pmdie -- signal handler for processing various postmaster signals.
 */
static void
pmdie(SIGNAL_ARGS)
{
...
        case SIGINT:
            /*
             * Fast Shutdown:
             *
             * Abort all children with SIGTERM (rollback active transactions
             * and exit) and shut down when they are gone.
             */
            if (Shutdown >= FastShutdown)
                break;
            Shutdown = FastShutdown;
            ereport(LOG,
                    (errmsg("received fast shutdown request")));   <------ HERE

            if (DLGetHead(BackendList) || AutoVacPID != 0)
            {
                if (!FatalError)
                {
                    ereport(LOG,
                            (errmsg("aborting any active transactions")));
                    SignalChildren(SIGTERM);
                    if (AutoVacPID != 0)
                        signal_child(AutoVacPID, SIGTERM);
                    /* reaper() does the rest */
                }
                break;
            }
---[postmaster.c]-------

I will try to add some code to get the source of the signal.
Would this help?

2. hang during make check

AIX 7.1 Technology Level 3
PostgreSQL 8.4.20 -> make check does finish without hang
PostgreSQL 9.0.16 -> hang
PostgreSQL 9.2.7 -> hang

As far as I can see the hang is caused by the "set statement_timeout to 2000;"
statement. Where would be a good starting point to diagnose this problem?

Would this place be a good start?
I have already checked whether HAVE_SETSID is causing the problem -
unfortunately that's not the case.

---[src/backend/storage/lmgr/proc.c]------
....
bool
enable_sig_alarm(int delayms, bool is_statement_timeout)
{
    TimestampTz fin_time;
    struct itimerval timeval;

    if (is_statement_timeout)
    {
....
static bool
CheckStatementTimeout(void)
{
    TimestampTz now;

    if (!statement_timeout_active)
        return true;    /* do nothing if not active */

    now = GetCurrentTimestamp();

    if (now >= statement_fin_time)
    {
        /* Time to die */
        statement_timeout_active = false;
        cancel_from_timeout = true;
#ifdef HAVE_SETSID
        /* try to signal whole process group */
        kill(-MyProcPid, SIGINT);
#endif
        kill(MyProcPid, SIGINT);
---[src/backend/storage/lmgr/proc.c]------

AIX 6.1 Technology Level 6
PostgreSQL 9.2.7 -> make check does finish without a hang.

That is strange. I will try to get a box with the latest AIX 6.1
Technology Level to check whether the tests pass there, too.

3. Smoker

There are several possibilities to set up a smoker.

a.) IBM does provide Power 7/7+ machines with Power Linux and AIX 6.1/7.1.
As far as I know the access is free.
This is the URL:
http://www-304.ibm.com/partnerworld/wps/servlet/ContentHandler/stg_com_sys_power-development-platform
Maybe I can help with the setup.

b.) Maybe I can set up a smoker.
Is external access to the smoker required?

Bye
Rainer Tammer

On 24.02.2014 21:21, Tom Lane wrote:
> [...]
Rainer Tammer <pgsql@spg.schulergroup.com> writes:
> 1. fast shutdown
> The unexpected "LOG: received fast shutdown request" is happening
> on an installed instance. I have found other articles on the WEB which
> describe the same problem (other platform) - unfortunately there was
> no real solution to it.

As far as we can tell, any SIGINT of the postmaster must be coming from
outside the Postgres code. There are several places where SIGINT is
generated internally, but I've just been through all of them again and
it's pretty nearly impossible to believe that they could target the
postmaster process rather than some child process. If you've got a
way to instrument it and find out where the signal came from (eg what
PID sent it), that would be interesting information.

> 2. hang during make check
> PostgreSQL 8.4.20 -> make check does finish without hang
> PostgreSQL 9.0.16 -> hang
> PostgreSQL 9.2.7 -> hang

Interesting, since AFAIR there was no major surgery on the timeout
code in 9.0. If you'd said 9.3 broke it, that wouldn't be so
surprising ...

> As far as I can see the hang is caused by the "set statement_timeout to
> 2000;"
> statement. Where would be a good start point to diagnose this problem??

Well, the point is that the timeout is failing to happen. Is the SIGALRM
signal being blocked? If it is delivered, why doesn't that spring the
process off its wait? Anyway, I see you already found enable_sig_alarm
and CheckStatementTimeout, so those are reasonable places to start
injecting some additional logging. You might also need to instrument
the backend SIGINT handler, StatementCancelHandler in postgres.c.

regards, tom lane
Hello,

I will try to get some debug code into the SIGINT handler.
In the worst case I will start a system trace.

In the meantime I have built more versions on AIX 6.1.
I can see no failure on AIX 6.1, including 9.2.7.

I have installed the same C/C++ compiler on the AIX 6.1 box as I have
on the AIX 7.1 box - still the same result. All tests are OK.
So the problem is not compiler dependent.

I am currently upgrading an AIX 7.1 test LPAR on a Power 5 box. This way
I can check whether the problem depends on AIX 7.1 or on Power 7+.
(Some years ago there was a problem with Java on the newer CPUs.)

What code path is executed if the timeout passes and
the signal is sent?
- Where exactly is the signal sent? --> Is the signal really sent?
- Where is the first entry in the handler? --> Do we receive the signal?

Bye
Rainer

P.S.: What do you think of a smoker on the IBM developer cloud?
If this is interesting I might organize the setup.

On 25.02.2014 17:35, Tom Lane wrote:
> [...]
Rainer Tammer <pgsql@spg.schulergroup.com> writes:
> What code path is executed if the timeout passes and
> the signal is send?

Well, the general idea (pre 9.3) is that at the start of the statement,
enable_sig_alarm calculates a future timeout instant and calls setitimer()
to schedule a SIGALRM signal then. When the signal is delivered,
CheckStatementTimeout should do kill(MyProcPid, SIGINT), which should
lead to ProcessInterrupts calling ereport(ERROR), which will longjmp
back to the process idle loop. We need to narrow down which of these
steps is failing to happen before we can speculate much on what's wrong.

There are scenarios in which the SIGINT handler proper won't think it's
safe to call ProcessInterrupts immediately, but will just set a flag
to make that happen later. That should not apply in these test cases,
though.

regards, tom lane
Rainer Tammer wrote:

> P.S.: What do you think of a smoker on the IBM developer cloud?
> If this is interesting I might organize the setup.

Please do get some Power buildfarm members set up if it's within your,
err, powers. See here for instructions:
http://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto

If you can get machines for all architectures across all supported
versions of AIX, that'd be great. Since these Power machines are not
very common, perhaps it'd be good if you can get one running Linux too;
we only have PowerPC running NetBSD and Mac OS X, nothing on POWER7.

There's no need for anyone to access the machines. There's a push
script that's used to upload the test results.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello,

OK, so SIGALRM will kill the worker process if the timer runs down.

tammer 7078048 10289298 0 18:43:45 pts/1 0:00 gmake -C src/test/regress check
tammer 9175106 12386340 0 18:45:31 pts/1 0:00 /daten/source/postgresql-9.2.7/src/test/regress/./tmp_check/install//usr/local/pgsql-9.2.7/bin/psql -X -a -q -d regression
tammer 10289298 20512896 0 18:43:39 pts/1 0:00 gmake check
tammer 12189838 20840526 0 18:44:37 - 0:00 postgres: autovacuum launcher process
tammer 12386340 7078048 0 18:43:45 pts/1 0:00 ../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../.. --dlpath=. --schedule=./parallel_schedule
tammer 13172918 20840526 0 18:44:37 - 0:00 postgres: wal writer process
tammer 16777336 20840526 0 18:44:37 - 0:00 postgres: checkpointer process
tammer 18874572 20840526 0 18:45:31 - 0:00 postgres: tammer regression [local] SELECT waiting <------ here
tammer 20512896 15270014 0 18:36:19 pts/1 0:00 -ksh
tammer 20644046 20840526 0 18:44:37 - 0:00 postgres: writer process
tammer 20840526 12386340 0 18:44:37 pts/1 0:00 /daten/source/postgresql-9.2.7/src/test/regress/./tmp_check/install//usr/local/pgsql-9.2.7/bin/postgres -D /daten/source/postgresql-9.2.7/src/test/regress/./tmp_check/data -F -c listen_addresses=
tammer 24182972 20840526 0 18:44:37 - 0:00 postgres: stats collector process

The worker is hanging here:

root@adsmsrv4 rc:0 # dbx -a 18874572
Waiting to attach to process 18874572 ...
Successfully attached to postgres.
warning: Directory containing postgres could not be determined.
Apply 'use' command to initialize source path.

Type 'help' for help.
reading symbolic information ...
stopped in semop at 0xd02f8df0 ($t1)
0xd02f8df0 (semop+0xb0) 80410014 lwz r2,0x14(r1)
(dbx) where
semop(??, ??, ??) at 0xd02f8df0
PGSemaphoreLock(0x32438750, 0x1000001) at 0x10060958
ProcSleep(0x20212d18, 0x20039140) at 0x10114ab8
WaitOnLock(0x20212d18, 0x201f74f0) at 0x101269c0
LockAcquireExtended(0x2ff1dc90, 0x1, 0x0, 0x0, 0x1000001) at 0x10128384
LockAcquire(0x2ff1dc90, 0x1, 0x0, 0x0) at 0x101284a0
LockRelationOid(0xa35c, 0x1) at 0x10173d50
RangeVarGetRelidExtended(0x2020f5d8, 0x1, 0x1000001, 0x0, 0x0, 0x0) at 0x1009dcf8
relation_openrv_extended(0x2020f5d8, 0x1, 0x1000001) at 0x1004e76c
heap_openrv_extended(0x2020f5d8, 0x1, 0x1000001) at 0x1004eb98
parserOpenTable(0x2020f6d8, 0x2020f5d8, 0x1) at 0x101abeb0
addRangeTableEntry(0x2020f6d8, 0x2020f5d8, 0x0, 0x1000001, 0x1000001) at 0x101ac534
transformTableEntry(0x2020f6d8, 0x2020f5d8) at 0x1027f42c
transformFromClauseItem(0x2020f6d8, 0x2020f5d8, 0x2ff1e05c, 0x2ff1e060, 0x2ff1e064, 0x2ff1e068) at 0x1027f528
transformFromClause(0x2020f6d8, 0x2020f610) at 0x102800c8
transformSelectStmt(0x2020f6d8, 0x2020f628) at 0x10313f90
transformStmt(0x2020f6d8, 0x2020f628) at 0x1031559c
transformTopLevelStmt(0x2020f6d8, 0x2020f628) at 0x10315788
parse_analyze(0x2020f628, 0x2020ec10, 0x0, 0x0) at 0x1031583c
pg_analyze_and_rewrite(0x2020f628, 0x2020ec10, 0x0, 0x0) at 0x10062528
exec_simple_query(0x2020ec10) at 0x10066b74
PostgresMain(0x1, 0x201f6048, 0x201f5ed8, 0x201f5ec8) at 0x10067e4c
BackendRun(0x20234ad8) at 0x10117e54
BackendStartup(0x20234ad8) at 0x10119618
ServerLoop() at 0x10119c08
PostmasterMain(0x6, 0x201f56c8) at 0x1011b65c
main(0x6, 0x201f56c8) at 0x10000b1c

So the SELECT is waiting for the lock.
The SIGALRM handler should send a SIGINT to this worker process to terminate the worker.

Bye
Rainer

On 25.02.2014 18:18, Tom Lane wrote:
> [...]
Hello,

OK, if no access from the outside is needed I can probably run
smoke builds on a Power5 on AIX 5.3, 6.1 and 7.1 LPAR.
And maybe we can get a Power 7+ LPAR at IBM. I will try to get a
contact within IBM.

Bye
Rainer

P.S.: Power with AIX now has a market share of ~50% compared to all other Unix systems.

On 25.02.2014 18:47, Alvaro Herrera wrote:
> [...]
Rainer Tammer <pgsql@spg.schulergroup.com> writes:
> The worker is hanging here:
> (dbx) where
> semop(??, ??, ??) at 0xd02f8df0
> PGSemaphoreLock(0x32438750, 0x1000001) at 0x10060958
> ProcSleep(0x20212d18, 0x20039140) at 0x10114ab8
> WaitOnLock(0x20212d18, 0x201f74f0) at 0x101269c0
> LockAcquireExtended(0x2ff1dc90, 0x1, 0x0, 0x0, 0x1000001) at 0x10128384
> LockAcquire(0x2ff1dc90, 0x1, 0x0, 0x0) at 0x101284a0
> LockRelationOid(0xa35c, 0x1) at 0x10173d50
> RangeVarGetRelidExtended(0x2020f5d8, 0x1, 0x1000001, 0x0, 0x0, 0x0) at
> 0x1009dcf8

Pretty much as expected. So the question is why the signal isn't getting
serviced; semop() should be interruptable. Given that you found it works
again on a more recent AIX release, maybe that's an OS bug?

It'd be worth adding some elog printouts to try to confirm whether the
signal handlers are getting entered at all. It seems possible that the
SIGALRM handler is entered but then the nested SIGINT occurrence is
not serviced for some reason.

regards, tom lane
Hello,

One note: it works on AIX 6.1 but not on 7.1. So it could be an OS problem.
I have opened a support call at IBM. Maybe they have an idea.

The semop() should be interrupted by SIGINT, right?

Bye
Rainer

On 25.02.2014 19:26, Tom Lane wrote:
> [...]
Rainer Tammer <pgsql@spg.schulergroup.com> writes:
> The semop() should be interrupted by SIGINT, right?

Yeah. Note that we're expecting the SIGINT handler to do a longjmp,
so that it doesn't matter whether or not the semop would choose to
resume waiting after a signal. But it has to execute the handler.

regards, tom lane
Hello,

sorry to bother you again...

There is a pg_sema.c in src/backend/port.
This is linked to sysv_sema.c; there is also a posix_sema.c.
How do you select one or the other?

Bye
Rainer

On 25.02.2014 19:26, Tom Lane wrote:
> [...]
Rainer Tammer <pgsql@spg.schulergroup.com> writes:
> There is a pg_sema.c in src/backend/port.
> This is linked ti sysv_sema.c, there is also a posix_sema.c.
> How do you select the one or the other?

The configure script chooses which to use for a particular platform.

Perhaps it should be making a different choice for AIX?

regards, tom lane
Hello,

Just to clarify my picture:

backend process:
tammer 12255262 13107322 0 07:52:10 - 0:00 postgres: tammer regression [local] SELECT waiting

If I send a SIGINT (-2) to the backend process, would this be
equivalent to the SIGINT sent by the SIGALRM handler?

Bye
Rainer

On 25.02.2014 19:47, Tom Lane wrote:
> [...]
Hello,

so far:

- The switch to POSIX semaphores does not solve the problem...
- It's not CPU dependent; the problem shows on Power 5 and Power 7+
- The problem shows on AIX 7.1 (at least on Technology Level 3)
- The problem does not show on AIX 6.1
- If I send a SIGINT manually to the backend process, nothing happens.

My next steps:

- Instrument the SIGALRM handler
- Instrument the SIGINT handler

Bye
Rainer

On 25.02.2014 20:29, Tom Lane wrote:
> [...]
Hello,

OK, at least the first stage is working...

src/backend/storage/lmgr/proc.c
...
static bool
CheckStatementTimeout(void)
{
    TimestampTz now;

    if (!statement_timeout_active)
        return true;    /* do nothing if not active */

    elog(FATAL, "enter: CheckStatementTimeout - after statement_timeout_active check");

    now = GetCurrentTimestamp();
...

gmake check:

prepared_xacts.out
...
-- pxtest3 should be locked because of the pending DROP
set statement_timeout to 2000;
SELECT * FROM pxtest3;
FATAL: enter: CheckStatementTimeout - after statement_timeout_active check
LINE 1: SELECT * FROM pxtest3;
                      ^
FATAL: enter: CheckStatementTimeout - after statement_timeout_active check
FATAL: enter: CheckStatementTimeout - after statement_timeout_active check
LINE 1: SELECT * FROM pxtest3;
                      ^
FATAL: enter: CheckStatementTimeout - after statement_timeout_active check
connection to server was lost
....

Now I have put the output in CheckStatementTimeout:

src/backend/storage/lmgr/proc.c
...
static bool
CheckStatementTimeout(void)
{
    TimestampTz now;

    if (!statement_timeout_active)
        return true;    /* do nothing if not active */

    now = GetCurrentTimestamp();

    if (now >= statement_fin_time)
    {
        /* Time to die */
        statement_timeout_active = false;
        cancel_from_timeout = true;
        elog(FATAL, "enter: CheckStatementTimeout - next statement send kill");
#ifdef HAVE_SETSID
        /* try to signal whole process group */
        kill(-MyProcPid, SIGINT);
#endif
        kill(MyProcPid, SIGINT);
    }
...

gmake check:

prepared_xacts.out
...
-- pxtest3 should be locked because of the pending DROP
set statement_timeout to 2000;
SELECT * FROM pxtest3;
FATAL: enter: CheckStatementTimeout - next statement send kill
LINE 1: SELECT * FROM pxtest3;
                      ^
FATAL: enter: CheckStatementTimeout - next statement send kill
LINE 1: SELECT * FROM pxtest3;
                      ^
connection to server was lost

So the next thing should be that the backend process (the one hanging in the SELECT)
should receive the SIGINT - correct?

Bye
Rainer

On 25.02.2014 20:29, Tom Lane wrote:
> [...]
Hello,
So we are getting closer (if I did not instrument the wrong code):

src/backend/tcop/postgres.c

int
PostgresMain(int argc, char *argv[], const char *dbname, const char *username)
{
...
    if (am_walsender)
        WalSndSignals();
    else
    {
        pqsignal(SIGHUP, SigHupHandler);    /* set flag to read config file */
----> register handler here <-------------------------------------
        pqsignal(SIGINT, StatementCancelHandler);    /* cancel current query */
----> register handler here <-------------------------------------
        pqsignal(SIGTERM, die);    /* cancel current query and exit */
...

void
StatementCancelHandler(SIGNAL_ARGS)
{
    int save_errno = errno;

    elog(WARNING, "StatementCancelHandler() - entered");

    /*
     * Don't joggle the elbow of proc_exit
     */
    if (!proc_exit_inprogress)
    {
...

This part is never reached. Does this mean that the signal SIGINT got lost? Or am I searching in the wrong place?

Your help is much appreciated.

Bye
Rainer

On 25.02.2014 19:47, Tom Lane wrote:
> Rainer Tammer <pgsql@spg.schulergroup.com> writes:
>> The semop() should be interrupted by SIGINT, right?
> Yeah. Note that we're expecting the SIGINT handler to do a longjmp,
> so that it doesn't matter whether or not the semop would choose to
> resume waiting after a signal. But it has to execute the handler.
>
> regards, tom lane
Rainer Tammer <pgsql@spg.schulergroup.com> writes: > void > StatementCancelHandler(SIGNAL_ARGS) > { > int save_errno = errno; > elog(WARNING, "StatementCancelHandler() - entered"); > This part is never reached. Does this mean that the signal SIGINT got lost? Yeah, looks that way. You might try changing the WARNING to FATAL, as in your previous example, just to be totally sure that it's not a problem of the message not getting flushed out or something like that. But what it seems at the moment is that the signal isn't getting delivered. It'd be an idea to add some more logging to CheckStatementTimeout to verify that (1) MyProcPid corresponds to the process's PID and (2) the kill() isn't returning an error indication. You could also try making it use the non-HAVE_SETSID logic, ie don't use the "-", just to see if that changes anything. regards, tom lane
Hello,
I can try that, but I have tested this WARNING in 9.2.7 on AIX 6.1. I can see that the output is in the log file... so somehow the signal got stuck.

I can log the PIDs. Is there a way to send something to the server log (during make check) which does not show up in the test output? With the trace text all affected tests will fail because the expected output differs from the real output.

I will also test the kill() without "-".

Bye
Rainer

On 26.02.2014 15:39, Tom Lane wrote:
> Rainer Tammer <pgsql@spg.schulergroup.com> writes:
>> void
>> StatementCancelHandler(SIGNAL_ARGS)
>> {
>> int save_errno = errno;
>> elog(WARNING, "StatementCancelHandler() - entered");
>> This part is never reached. Does this mean that the signal SIGINT got lost?
> Yeah, looks that way. You might try changing the WARNING to FATAL, as
> in your previous example, just to be totally sure that it's not a problem
> of the message not getting flushed out or something like that. But what
> it seems at the moment is that the signal isn't getting delivered.
>
> It'd be an idea to add some more logging to CheckStatementTimeout to
> verify that (1) MyProcPid corresponds to the process's PID and
> (2) the kill() isn't returning an error indication. You could also
> try making it use the non-HAVE_SETSID logic, ie don't use the "-",
> just to see if that changes anything.
>
> regards, tom lane
Rainer Tammer <pgsql@spg.schulergroup.com> writes: > I can log the PIDs. Is there a way to send something to the server log > (during make check) which does not show up on the test output? Sure, use elog(LOG, ...) regards, tom lane
Hello, Here is the code change: CheckStatementTimeout() .... if (now >= statement_fin_time) { /* Time to die */ statement_timeout_active = false; cancel_from_timeout = true; elog(LOG, "kill PID: %d", MyProcPid); #ifdef HAVE_SETSID_off /* try to signal whole process group */ kill(-MyProcPid, SIGINT); #endif kill(MyProcPid, SIGINT); } .... And here is the result: --- server.log ---- STATEMENT: FETCH 1 FROM foo; ERROR: relation "pxtest2" does not exist at character 15 STATEMENT: SELECT * FROM pxtest2; LOG: kill PID: 12255410 at character 15 --- ps -ef ---- tammer 12255410 16318698 0 16:14:01 - 0:00 postgres: tammer regression [local] SELECT waiting So the PID in CheckStatementTimeout() does match the actual backend process. There is no difference if I use kill() with "-" or without "-". Bye Rainer On 26.02.2014 15:39, Tom Lane wrote: > Rainer Tammer <pgsql@spg.schulergroup.com> writes: >> void >> StatementCancelHandler(SIGNAL_ARGS) >> { >> int save_errno = errno; >> elog(WARNING, "StatementCancelHandler() - entered"); >> This part is never reached. Does this mean that the signal SIGINT got lost? > Yeah, looks that way. You might try changing the WARNING to FATAL, as > in your previous example, just to be totally sure that it's not a problem > of the message not getting flushed out or something like that. But what > it seems at the moment is that the signal isn't getting delivered. > > It'd be an idea to add some more logging to CheckStatementTimeout to > verify that (1) MyProcPid corresponds to the process's PID and > (2) the kill() isn't returning an error indication. You could also > try making it use the non-HAVE_SETSID logic, ie don't use the "-", > just to see if that changes anything. > > regards, tom lane > >
Hello,
Here is the result of a working test on AIX 6.1:

ERROR: canceling statement due to statement timeout
STATEMENT: SELECT * FROM pxtest3;
LOG: kill PID: 4259914 at character 15   <-- the signal is sent
STATEMENT: SELECT * FROM pxtest3;
LOG: StatementCancelHandler() - entered MyProcPid: 4259914 at character 15   <-- the signal is received
STATEMENT: SELECT * FROM pxtest3;

So we can be pretty sure that the SIGINT does not interrupt semop() on AIX 7.1.
I will also send this information to IBM.

Bye
Rainer

On 26.02.2014 16:20, Tom Lane wrote:
> Rainer Tammer <pgsql@spg.schulergroup.com> writes:
>> I can log the PIDs. Is there a way to send something to the server log
>> (during make check) which does not show up on the test output?
> Sure, use elog(LOG, ...)
>
> regards, tom lane
Rainer Tammer <pgsql@spg.schulergroup.com> writes: > So we can be pretty sure that the SIGINT does not interrupt semop() on > AIX 71. Looks that way :-(. It would be interesting to add a check to see if the kill() is returning a failure indication or not. The existing code doesn't bother because there's not really anything it can do about it, but for this purpose it would be good to know. regards, tom lane
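The suggested check could look roughly like the sketch below. It is written as a standalone helper so it can be compiled on its own; kill_checked is a hypothetical name, and fprintf(stderr, ...) stands in for the elog(LOG, ...) call that would be used inside the backend, as mentioned earlier in the thread. It also prints getpid() so the caller's PID can be compared with the target PID.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `sig` to `pid`, log the outcome, and return 0 on
 * success or the errno value reported by kill() on failure.
 */
int kill_checked(pid_t pid, int sig)
{
    int ret = kill(pid, sig);
    int err = (ret == 0) ? 0 : errno;

    if (err == 0)
        fprintf(stderr, "caller %ld: kill(%ld, %d) ret=%d\n",
                (long) getpid(), (long) pid, sig, ret);
    else
        fprintf(stderr, "caller %ld: kill(%ld, %d) ret=%d errno=%d (%s)\n",
                (long) getpid(), (long) pid, sig, ret, err, strerror(err));
    return err;
}

#ifdef DEMO_MAIN                    /* build with -DDEMO_MAIN for a standalone check */
int main(void)
{
    /* Signal 0 only performs the permission/existence check. */
    assert(kill_checked(getpid(), 0) == 0);
    /* An invalid signal number must be reported as EINVAL. */
    assert(kill_checked(getpid(), -1) == EINVAL);
    return 0;
}
#endif
```

A return value of 0 only says that the kernel accepted the request; it does not show that the target's handler was run, which is the part that appears to be missing here.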
Hello,
I have checked this:

From prepared_xacts:

LOG: Send kill to PID: 11731118
LOG: Retrun value of kill PID: ret= 0

From plpgsql:

LOG: Send kill to PID: 11731120
LOG: Retrun value of kill PID: ret= 0

A kill -2 on the shell does not stop the backend process.
A kill -15 does stop the backend process.

So the kill does not return a failure.

The code in src/backend/storage/lmgr/proc.c - CheckStatementTimeout() of 8.4.20 does not look much different. The code in pqsignal.c is also not really different.

Bye
Rainer

On 26.02.2014 18:00, Tom Lane wrote:
> Rainer Tammer <pgsql@spg.schulergroup.com> writes:
>> So we can be pretty sure that the SIGINT does not interrupt semop() on
>> AIX 71.
> Looks that way :-(. It would be interesting to add a check to see if the
> kill() is returning a failure indication or not. The existing code
> doesn't bother because there's not really anything it can do about it,
> but for this purpose it would be good to know.
>
> regards, tom lane
Hello,
the problem has reached the development support at IBM.

As soon as I have a solution for the problem I will set up a build farm member (hopefully).

I can probably smoke HEAD on AIX 7.1 and 6.1.

Bye
Rainer
Hello, I am currently checking the hang problem with IBM. They have found some other test failures on AIX 7.1 TL2: They do not see the hang on TL2, but instead they see the following test errors. I do not see these errors on AIX 6.1 TL8 or AIX 7.1 TL3 (with prepared_xacts and plpgsql removed). Any ideas? I have attached the diff file for better readability.
*** /stuff/postgresql-9.2.7/src/test/regress/expected/inherit.out Mon Feb 17 13:38:15 2014 --- /stuff/postgresql-9.2.7/src/test/regress/results/inherit.out Tue Mar 11 16:55:26 2014 *************** *** 576,592 **** from ( select f1 from foo union all select f1+3 from foo ) ss where bar.f1 = ss.f1; select tableoid::regclass::text as relname, bar.* from bar order by 1,2; relname | f1 | f2 ---------+----+----- ! bar | 1 | 201 ! bar | 2 | 202 ! bar | 3 | 203 ! bar | 4 | 104 ! bar2 | 1 | 201 ! bar2 | 2 | 202 ! bar2 | 3 | 203 ! bar2 | 4 | 104 (8 rows) /* Test multiple inheritance of column defaults */ --- 576,593 ---- from ( select f1 from foo union all select f1+3 from foo ) ss where bar.f1 = ss.f1; + ERROR: rel 5 already exists select tableoid::regclass::text as relname, bar.* from bar order by 1,2; relname | f1 | f2 ---------+----+----- ! bar | 1 | 101 ! bar | 2 | 102 ! bar | 3 | 103 ! bar | 4 | 4 ! bar2 | 1 | 101 ! bar2 | 2 | 102 ! bar2 | 3 | 103 ! bar2 | 4 | 4 (8 rows) /* Test multiple inheritance of column defaults */ *************** *** 1391,1427 **** (SELECT unique1 AS x FROM tenk1 a UNION ALL SELECT unique2 AS x FROM tenk1 b) s; ! QUERY PLAN ! -------------------------------------------------------------------- ! Result ! InitPlan 1 (returns $0) ! -> Limit ! -> Merge Append ! Sort Key: a.unique1 ! -> Index Only Scan using tenk1_unique1 on tenk1 a ! Index Cond: (unique1 IS NOT NULL) ! -> Index Only Scan using tenk1_unique2 on tenk1 b ! Index Cond: (unique2 IS NOT NULL) ! (9 rows) ! 
explain (costs off) SELECT min(y) FROM (SELECT unique1 AS x, unique1 AS y FROM tenk1 a UNION ALL SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s; ! QUERY PLAN ! -------------------------------------------------------------------- ! Result ! InitPlan 1 (returns $0) ! -> Limit ! -> Merge Append ! Sort Key: a.unique1 ! -> Index Only Scan using tenk1_unique1 on tenk1 a ! Index Cond: (unique1 IS NOT NULL) ! -> Index Only Scan using tenk1_unique2 on tenk1 b ! Index Cond: (unique2 IS NOT NULL) ! (9 rows) ! -- XXX planner doesn't recognize that index on unique2 is sufficiently sorted explain (costs off) SELECT x, y FROM --- 1392,1404 ---- (SELECT unique1 AS x FROM tenk1 a UNION ALL SELECT unique2 AS x FROM tenk1 b) s; ! ERROR: rel 4 already exists explain (costs off) SELECT min(y) FROM (SELECT unique1 AS x, unique1 AS y FROM tenk1 a UNION ALL SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s; ! ERROR: rel 4 already exists -- XXX planner doesn't recognize that index on unique2 is sufficiently sorted explain (costs off) SELECT x, y FROM *************** *** 1429,1445 **** UNION ALL SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s ORDER BY x, y; ! QUERY PLAN ! ------------------------------------------------------------------- ! Result ! -> Merge Append ! Sort Key: a.thousand, a.tenthous ! -> Index Only Scan using tenk1_thous_tenthous on tenk1 a ! -> Sort ! Sort Key: b.unique2, b.unique2 ! -> Index Only Scan using tenk1_unique2 on tenk1 b ! (7 rows) ! reset enable_seqscan; reset enable_indexscan; reset enable_bitmapscan; --- 1406,1412 ---- UNION ALL SELECT unique2 AS x, unique2 AS y FROM tenk1 b) s ORDER BY x, y; ! 
ERROR: rel 4 already exists reset enable_seqscan; reset enable_indexscan; reset enable_bitmapscan; ====================================================================== *** /stuff/postgresql-9.2.7/src/test/regress/expected/union.out Mon Feb 17 13:38:15 2014 --- /stuff/postgresql-9.2.7/src/test/regress/results/union.out Tue Mar 11 16:55:32 2014 *************** *** 474,489 **** UNION ALL SELECT * FROM t2) t WHERE ab = 'ab'; ! QUERY PLAN ! --------------------------------------------------- ! Result ! -> Append ! -> Index Scan using t1_ab_idx on t1 ! Index Cond: ((a || b) = 'ab'::text) ! -> Index Only Scan using t2_pkey on t2 ! Index Cond: (ab = 'ab'::text) ! (6 rows) ! explain (costs off) SELECT * FROM (SELECT a || b AS ab FROM t1 --- 474,480 ---- UNION ALL SELECT * FROM t2) t WHERE ab = 'ab'; ! ERROR: rel 4 already exists explain (costs off) SELECT * FROM (SELECT a || b AS ab FROM t1 *************** *** 510,522 **** UNION ALL SELECT 2 AS t, * FROM tenk1 b) c WHERE t = 2; ! QUERY PLAN ! --------------------------------- ! Result ! -> Append ! -> Seq Scan on tenk1 b ! (3 rows) ! -- Test that we push quals into UNION sub-selects only when it's safe explain (costs off) SELECT * FROM --- 501,507 ---- UNION ALL SELECT 2 AS t, * FROM tenk1 b) c WHERE t = 2; ! ERROR: rel 4 already exists -- Test that we push quals into UNION sub-selects only when it's safe explain (costs off) SELECT * FROM *************** *** 617,641 **** select * from (select * from t3 a union all select * from t3 b) ss join int4_tbl on f1 = expensivefunc(x); ! QUERY PLAN ! ------------------------------------------------------------ ! Nested Loop ! -> Seq Scan on int4_tbl ! -> Append ! -> Index Scan using t3i on t3 a ! Index Cond: (expensivefunc(x) = int4_tbl.f1) ! -> Index Scan using t3i on t3 b ! Index Cond: (expensivefunc(x) = int4_tbl.f1) ! (7 rows) ! select * from (select * from t3 a union all select * from t3 b) ss join int4_tbl on f1 = expensivefunc(x); ! x | f1 ! ---+---- ! 0 | 0 ! 0 | 0 ! 
(2 rows) ! drop table t3; drop function expensivefunc(int); --- 602,611 ---- select * from (select * from t3 a union all select * from t3 b) ss join int4_tbl on f1 = expensivefunc(x); ! ERROR: rel 6 already exists select * from (select * from t3 a union all select * from t3 b) ss join int4_tbl on f1 = expensivefunc(x); ! ERROR: rel 6 already exists drop table t3; drop function expensivefunc(int); ====================================================================== *** /stuff/postgresql-9.2.7/src/test/regress/expected/security_label.out Tue Mar 11 16:53:52 2014 --- /stuff/postgresql-9.2.7/src/test/regress/results/security_label.out Tue Mar 11 16:55:37 2014 *************** *** 92,110 **** SECURITY LABEL ON SCHEMA seclabel_test IS 'unclassified'; -- OK SELECT objtype, objname, provider, label FROM pg_seclabels ORDER BY objtype, objname; ! objtype | objname | provider | label ! ----------+-----------------+----------+-------------- ! column | seclabel_tbl1.a | dummy | unclassified ! domain | seclabel_domain | dummy | classified ! function | seclabel_four() | dummy | classified ! role | seclabel_user1 | dummy | classified ! role | seclabel_user2 | dummy | unclassified ! schema | seclabel_test | dummy | unclassified ! table | seclabel_tbl1 | dummy | top secret ! table | seclabel_tbl2 | dummy | classified ! view | seclabel_view1 | dummy | classified ! (9 rows) ! -- clean up objects DROP FUNCTION seclabel_four(); DROP DOMAIN seclabel_domain; --- 92,98 ---- SECURITY LABEL ON SCHEMA seclabel_test IS 'unclassified'; -- OK SELECT objtype, objname, provider, label FROM pg_seclabels ORDER BY objtype, objname; ! ERROR: rel 16 already exists -- clean up objects DROP FUNCTION seclabel_four(); DROP DOMAIN seclabel_domain; *************** *** 117,123 **** -- make sure we don't have any leftovers SELECT objtype, objname, provider, label FROM pg_seclabels ORDER BY objtype, objname; ! objtype | objname | provider | label ! ---------+---------+----------+------- ! (0 rows) ! 
--- 105,108 ---- -- make sure we don't have any leftovers SELECT objtype, objname, provider, label FROM pg_seclabels ORDER BY objtype, objname; ! ERROR: rel 16 already exists ====================================================================== *** /stuff/postgresql-9.2.7/src/test/regress/expected/foreign_data.out Mon Feb 17 13:38:15 2014 --- /stuff/postgresql-9.2.7/src/test/regress/results/foreign_data.out Tue Mar 11 16:55:46 2014 *************** *** 900,922 **** (7 rows) SELECT * FROM information_schema.usage_privileges WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! grantor | grantee | object_catalog | object_schema | object_name | object_type | privilege_type | is_grantable ! -------------------+-----------------------+----------------+---------------+-------------+----------------------+----------------+-------------- ! foreign_data_user | foreign_data_user | regression | | foo | FOREIGN DATA WRAPPER | USAGE | YES ! foreign_data_user | foreign_data_user | regression | | s6 | FOREIGN SERVER | USAGE | YES ! foreign_data_user | regress_test_indirect | regression | | foo | FOREIGN DATA WRAPPER | USAGE | NO ! foreign_data_user | regress_test_role2 | regression | | s6 | FOREIGN SERVER | USAGE | YES ! (4 rows) ! SELECT * FROM information_schema.role_usage_grants WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! grantor | grantee | object_catalog | object_schema | object_name | object_type | privilege_type | is_grantable ! -------------------+-----------------------+----------------+---------------+-------------+----------------------+----------------+-------------- ! foreign_data_user | foreign_data_user | regression | | foo | FOREIGN DATA WRAPPER | USAGE | YES ! foreign_data_user | foreign_data_user | regression | | s6 | FOREIGN SERVER | USAGE | YES ! foreign_data_user | regress_test_indirect | regression | | foo | FOREIGN DATA WRAPPER | USAGE | NO ! 
foreign_data_user | regress_test_role2 | regression | | s6 | FOREIGN SERVER | USAGE | YES ! (4 rows) ! SELECT * FROM information_schema.foreign_tables ORDER BY 1, 2, 3; foreign_table_catalog | foreign_table_schema | foreign_table_name | foreign_server_catalog | foreign_server_name -----------------------+----------------------+--------------------+------------------------+--------------------- --- 900,908 ---- (7 rows) SELECT * FROM information_schema.usage_privileges WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! ERROR: could not open relation with OID 0 SELECT * FROM information_schema.role_usage_grants WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! ERROR: could not open relation with OID 0 SELECT * FROM information_schema.foreign_tables ORDER BY 1, 2, 3; foreign_table_catalog | foreign_table_schema | foreign_table_name | foreign_server_catalog | foreign_server_name -----------------------+----------------------+--------------------+------------------------+--------------------- *************** *** 943,961 **** (5 rows) SELECT * FROM information_schema.usage_privileges WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! grantor | grantee | object_catalog | object_schema | object_name | object_type | privilege_type | is_grantable ! -------------------+-----------------------+----------------+---------------+-------------+----------------------+----------------+-------------- ! foreign_data_user | regress_test_indirect | regression | | foo | FOREIGN DATA WRAPPER | USAGE | NO ! foreign_data_user | regress_test_role2 | regression | | s6 | FOREIGN SERVER | USAGE | YES ! (2 rows) ! SELECT * FROM information_schema.role_usage_grants WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! grantor | grantee | object_catalog | object_schema | object_name | object_type | privilege_type | is_grantable ! 
-------------------+-----------------------+----------------+---------------+-------------+----------------------+----------------+-------------- ! foreign_data_user | regress_test_indirect | regression | | foo | FOREIGN DATA WRAPPER | USAGE | NO ! foreign_data_user | regress_test_role2 | regression | | s6 | FOREIGN SERVER | USAGE | YES ! (2 rows) ! DROP USER MAPPING FOR current_user SERVER t1; SET ROLE regress_test_role2; SELECT * FROM information_schema.user_mapping_options ORDER BY 1, 2, 3, 4; --- 929,937 ---- (5 rows) SELECT * FROM information_schema.usage_privileges WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! ERROR: rel 11 already exists SELECT * FROM information_schema.role_usage_grants WHERE object_type LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; ! ERROR: rel 14 already exists DROP USER MAPPING FOR current_user SERVER t1; SET ROLE regress_test_role2; SELECT * FROM information_schema.user_mapping_options ORDER BY 1, 2, 3, 4; ====================================================================== *** /stuff/postgresql-9.2.7/src/test/regress/expected/window.out Mon Feb 17 13:38:15 2014 --- /stuff/postgresql-9.2.7/src/test/regress/results/window.out Tue Mar 11 16:55:44 2014 *************** *** 957,966 **** -- with UNION SELECT count(*) OVER (PARTITION BY four) FROM (SELECT * FROM tenk1 UNION ALL SELECT * FROM tenk2)s LIMIT 0; ! count ! ------- ! (0 rows) ! -- ordering by a non-integer constant is allowed SELECT rank() OVER (ORDER BY length('abc')); rank --- 957,963 ---- -- with UNION SELECT count(*) OVER (PARTITION BY four) FROM (SELECT * FROM tenk1 UNION ALL SELECT * FROM tenk2)s LIMIT 0; ! 
ERROR: rel 4 already exists -- ordering by a non-integer constant is allowed SELECT rank() OVER (ORDER BY length('abc')); rank ====================================================================== *** /stuff/postgresql-9.2.7/src/test/regress/expected/with.out Mon Feb 17 13:38:15 2014 --- /stuff/postgresql-9.2.7/src/test/regress/results/with.out Tue Mar 11 16:55:57 2014 *************** *** 384,396 **** (SELECT 2 AS i UNION ALL SELECT 3 AS i) AS t2 JOIN t ON (t2.i = t.i+1)) SELECT * FROM t; ! i | j ! ---+--- ! 1 | 2 ! 2 | 3 ! 3 | 4 ! (3 rows) ! -- -- different tree example -- --- 384,390 ---- (SELECT 2 AS i UNION ALL SELECT 3 AS i) AS t2 JOIN t ON (t2.i = t.i+1)) SELECT * FROM t; ! ERROR: rel 6 already exists -- -- different tree example -- ======================================================================
Bye
Rainer
Attachment
Hello, it looks like these failures are caused by compiler optimizations -O2. Without optimization the random failures do not show up. Bye Rainer
> -------------------+-----------------------+----------------+---------------+-------------+----------------------+----------------+-------------- > ! foreign_data_user | foreign_data_user | regression > | | foo | FOREIGN DATA WRAPPER | USAGE | YES > ! foreign_data_user | foreign_data_user | regression > | | s6 | FOREIGN SERVER | USAGE | YES > ! foreign_data_user | regress_test_indirect | regression > | | foo | FOREIGN DATA WRAPPER | USAGE | NO > ! foreign_data_user | regress_test_role2 | regression > | | s6 | FOREIGN SERVER | USAGE | YES > ! (4 rows) > ! > SELECT * FROM information_schema.foreign_tables ORDER BY 1, 2, 3; > foreign_table_catalog | foreign_table_schema | foreign_table_name | > foreign_server_catalog | foreign_server_name > > -----------------------+----------------------+--------------------+------------------------+--------------------- > --- 900,908 ---- > (7 rows) > > SELECT * FROM information_schema.usage_privileges WHERE object_type > LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; > ! ERROR: could not open relation with OID 0 > SELECT * FROM information_schema.role_usage_grants WHERE object_type > LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; > ! ERROR: could not open relation with OID 0 > SELECT * FROM information_schema.foreign_tables ORDER BY 1, 2, 3; > foreign_table_catalog | foreign_table_schema | foreign_table_name | > foreign_server_catalog | foreign_server_name > > -----------------------+----------------------+--------------------+------------------------+--------------------- > *************** > *** 943,961 **** > (5 rows) > > SELECT * FROM information_schema.usage_privileges WHERE object_type > LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; > ! grantor | grantee | object_catalog | > object_schema | object_name | object_type | privilege_type | > is_grantable > ! 
> -------------------+-----------------------+----------------+---------------+-------------+----------------------+----------------+-------------- > ! foreign_data_user | regress_test_indirect | regression > | | foo | FOREIGN DATA WRAPPER | USAGE | NO > ! foreign_data_user | regress_test_role2 | regression > | | s6 | FOREIGN SERVER | USAGE | YES > ! (2 rows) > ! > SELECT * FROM information_schema.role_usage_grants WHERE object_type > LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; > ! grantor | grantee | object_catalog | > object_schema | object_name | object_type | privilege_type | > is_grantable > ! > -------------------+-----------------------+----------------+---------------+-------------+----------------------+----------------+-------------- > ! foreign_data_user | regress_test_indirect | regression > | | foo | FOREIGN DATA WRAPPER | USAGE | NO > ! foreign_data_user | regress_test_role2 | regression > | | s6 | FOREIGN SERVER | USAGE | YES > ! (2 rows) > ! > DROP USER MAPPING FOR current_user SERVER t1; > SET ROLE regress_test_role2; > SELECT * FROM information_schema.user_mapping_options ORDER BY 1, 2, 3, 4; > --- 929,937 ---- > (5 rows) > > SELECT * FROM information_schema.usage_privileges WHERE object_type > LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; > ! ERROR: rel 11 already exists > SELECT * FROM information_schema.role_usage_grants WHERE object_type > LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4, 5; > ! 
ERROR: rel 14 already exists > DROP USER MAPPING FOR current_user SERVER t1; > SET ROLE regress_test_role2; > SELECT * FROM information_schema.user_mapping_options ORDER BY 1, 2, 3, 4; > > ====================================================================== > > *** /stuff/postgresql-9.2.7/src/test/regress/expected/window.out Mon > Feb 17 13:38:15 2014 > --- /stuff/postgresql-9.2.7/src/test/regress/results/window.out Tue > Mar 11 16:55:44 2014 > *************** > *** 957,966 **** > > -- with UNION > SELECT count(*) OVER (PARTITION BY four) FROM (SELECT * FROM tenk1 > UNION ALL SELECT * FROM tenk2)s LIMIT 0; > ! count > ! ------- > ! (0 rows) > ! > -- ordering by a non-integer constant is allowed > SELECT rank() OVER (ORDER BY length('abc')); > rank > --- 957,963 ---- > > -- with UNION > SELECT count(*) OVER (PARTITION BY four) FROM (SELECT * FROM tenk1 > UNION ALL SELECT * FROM tenk2)s LIMIT 0; > ! ERROR: rel 4 already exists > -- ordering by a non-integer constant is allowed > SELECT rank() OVER (ORDER BY length('abc')); > rank > > ====================================================================== > > *** /stuff/postgresql-9.2.7/src/test/regress/expected/with.out Mon > Feb 17 13:38:15 2014 > --- /stuff/postgresql-9.2.7/src/test/regress/results/with.out Tue Mar > 11 16:55:57 2014 > *************** > *** 384,396 **** > (SELECT 2 AS i UNION ALL SELECT 3 AS i) AS t2 > JOIN t ON (t2.i = t.i+1)) > SELECT * FROM t; > ! i | j > ! ---+--- > ! 1 | 2 > ! 2 | 3 > ! 3 | 4 > ! (3 rows) > ! > -- > -- different tree example > -- > --- 384,390 ---- > (SELECT 2 AS i UNION ALL SELECT 3 AS i) AS t2 > JOIN t ON (t2.i = t.i+1)) > SELECT * FROM t; > ! ERROR: rel 6 already exists > -- > -- different tree example > -- > > ====================================================================== > > > > Bye > Rainer > > On 04.03.2014 17:21, Rainer Tammer wrote: >> Hello, >> the problem has reached the development support at IBM. 
>> >> As soon as I have a solution for the problem I will setup a build farm >> member (hopefully). >> >> I can probably smoke HEAD on AIX 7.1 and 6.1. >> >> Bye >> Rainer >> >> On 27.02.2014 08:21, Rainer Tammer wrote: >>> Hello, >>> I have checked this: >>> >>> From prepared_xacts: >>> >>> LOG: Send kill to PID: 11731118 >>> LOG: Retrun value of kill PID: ret= 0 >>> >>> From: plpgsql >>> >>> LOG: Send kill to PID: 11731120 >>> LOG: Retrun value of kill PID: ret= 0 >>> >>> A kill -2 on the shell does not stop the backend process. >>> A kill -15 does stop the backend process. >>> >>> So the kill does not return a failure. >>> >>> The code in /src/backend/storage/lmgr/proc.c - CheckStatementTimeout() >>> of 8.4.20 >>> does not look much different. The code in pqsignal.c is also not >>> really different. >>> >>> Bye >>> Rainer >>> >>> On 26.02.2014 18:00, Tom Lane wrote: >>>> Rainer Tammer <pgsql@spg.schulergroup.com> writes: >>>>> So we can be pretty sure that the SIGINT does not interrupt semop() on >>>>> AIX 7.1. >>>> Looks that way :-(. It would be interesting to add a check to see if the >>>> kill() is returning a failure indication or not. The existing code >>>> doesn't bother because there's not really anything it can do about it, >>>> but for this purpose it would be good to know. >>>> >>>> regards, tom lane >>>> >>>> >> > >
Rainer Tammer <pgsql@spg.schulergroup.com> writes: > it looks like these failures are caused by compiler optimizations -O2. > Without optimization the random failures do not show up. Yeah, that looked suspiciously compiler-bug-like to me. It's conceivable that it's not a compiler bug but something we're doing wherein the results are undefined per C standard ... but since we've not seen similar reports on other platforms, and since the misbehavior goes away again in TL3, I'm betting it's the compiler's fault. regards, tom lane
Hello,
I can see miscellaneous tests failing on 7.1 TL3, too, if I use -O2. With -O1 there is no problem. This problem was discovered by IBM. These are the failing / working flags:

# make check error
export CFLAGS="-qsuppress=1500-010 -O2 -qmaxmem=16384 -qsrcmsg"

# make check ok
export CFLAGS="-qsuppress=1500-010 -O1 -qmaxmem=16384 -qsrcmsg"

They have forwarded this to the compiler development team in Toronto. I think that we can ignore this here. I will post the outcome of this problem so that we can include a note in the INSTALL.

Bye
Rainer

On 12.03.2014 17:14, Tom Lane wrote:
> Rainer Tammer <pgsql@spg.schulergroup.com> writes:
>> it looks like these failures are caused by compiler optimizations -O2.
>> Without optimization the random failures do not show up.
> Yeah, that looked suspiciously compiler-bug-like to me. It's conceivable
> that it's not a compiler bug but something we're doing wherein the results
> are undefined per C standard ... but since we've not seen similar reports
> on other platforms, and since the misbehavior goes away again in TL3, I'm
> betting it's the compiler's fault.
>
> regards, tom lane
>
>
Hello, Here is a little status update: If you compile PostgreSQL 9.2.7 on AIX 7.1 TL2 and then copy the binaries to an AIX 7.1 TL3 system, everything is fine - no hangs. It does look like the C/C++ compiler (same version/patch level) is producing a working binary on AIX 7.1 < TL3 and a failing one on AIX 7.1 = TL3 (latest OS level available). Currently the IBM kernel development support is analyzing the problem. Bye Rainer
Hello, It looks like the upcoming AIX 7.1 TL3 SP3 will fix the PostgreSQL 9.x build problems. As soon as the new SP3 is released I will start setting up a build farm member. Bye Rainer On 25.03.2014 17:39, Rainer Tammer wrote: > Hello, > Here is a little status update: > > If you compile PostgreSQL 9.2.7 on AIX 7.1 TL2 and then copy the > binaries to an AIX 7.1 TL3 system, everything is fine - no hangs. > > It does look like the C/C++ compiler (same version/patch level) is > producing a working binary on AIX 7.1 < TL3 and a failing one on AIX 7.1 = > TL3 (latest OS level available). > > Currently the IBM kernel development support is analyzing the problem. > > > Bye > Rainer > >
Rainer Tammer wrote: > Hello, > It looks like the upcoming AIX 7.1 TL3 SP3 will fix the PostgreSQL 9.x > build problems. > > As soon as the new SP3 is released I will start setting up a build farm > member. Upcoming when? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hello, There is no release date yet, but AIX 7.1 TL3 SP3 is needed to exploit all Power 8 features. The Power 8 announcement was 2014-04-28. The first machines will ship in early June. So I expect a release of SP3 in the next 4 to 6 weeks. Bye Rainer On 06.05.2014 18:35, Alvaro Herrera wrote: > Rainer Tammer wrote: >> Hello, >> It looks like the upcoming AIX 7.1 TL3 SP3 will fix the PostgreSQL 9.x >> build problems. >> >> As soon as the new SP3 is released I will start setting up a build farm >> member. > Upcoming when? >
Hello, AIX 7.1 TL3 SP3 was released yesterday. At least PostgreSQL 9.2.7 now passes all regression tests (IBM XL C/C++ V11.1.0.9). I will post more information as soon as I get the details from IBM. The next step is to set up a build farm LPAR. This could take a bit, as I am currently involved in two larger migration projects. How should I post additions to ./INSTALL (format)? Bye Rainer
On 06/04/2014 09:03 AM, Rainer Tammer wrote: > How should I post additions to ./INSTALL (format)? There is no ./INSTALL file in the PostgreSQL source tree, so I'm not sure where you got that from. There are sections on platform-specific compilation instructions in the docs: http://www.postgresql.org/docs/devel/static/installation-platform-notes.html#INSTALLATION-NOTES-AIX. If you need anything added there, a diff file to doc/src/sgml/installation.sgml is preferable. Or if the changes are small, just post the changes you want to be made in free format, and I can incorporate them in the .sgml file by hand. - Heikki
Hello,
There is an INSTALL in ./ :

root@build71 rc:0 # pwd
/daten/source/postgresql-9.2.7
root@build71 rc:0 # ls -al INSTALL
-rw-r--r-- 1 tammer staff 76825 Feb 17 20:56 INSTALL

I have downloaded the source distribution and unpacked the TAR.

Bye
Rainer

On 04.06.2014 09:53, Heikki Linnakangas wrote:
> On 06/04/2014 09:03 AM, Rainer Tammer wrote:
>> How should I post additions to ./INSTALL (format)?
>
> There is no ./INSTALL file in the PostgreSQL source tree, so I'm not
> sure where you got that from.
>
> There are sections on platform-specific compilation instructions in
> the docs:
> http://www.postgresql.org/docs/devel/static/installation-platform-notes.html#INSTALLATION-NOTES-AIX.
> If you need anything added there, a diff file to
> doc/src/sgml/installation.sgml is preferable. Or if the changes are
> small, just post the changes you want to be made in free format, and I
> can incorporate them in the .sgml file by hand.
>
> - Heikki
>
>
Hello, Sorry... I suspect that the INSTALL is generated from the sgml file, right? Bye Rainer
On 06/04/2014 11:37 AM, Rainer Tammer wrote: > Hello, > Sorry... > > I suspect that the INSTALL is generated from the sgml file, right? Ah, you're right. I forgot we do that when building the tarballs :-). INSTALL is created from three sgml files: standalone-install.sgml, installation.sgml, and version.sgml. - Heikki
Hello,
This is the error description for the AIX 7.1 TL3 SP1/SP2 bug.
This bug is fixed in AIX 7.1 TL3 SP3.

IV56118: AT 7.1 TL 3, CALLING LONGJMP CAUSES SIGINT TO BE BLOCKED APPLIES TO AIX 7100-03 14/06/02 PTF PECHANGE

APAR status
Closed as program error.

Error description
If a program calls setjmp/longjmp, it will not be possible to use Ctl-C to exit the program any more, since SIGINT will be blocked.

Local fix
Using _setjmp and _longjmp can avoid the issue.

Problem summary
****************************************************************
* USERS AFFECTED:
* Systems running the 7100-03 Technology Level with the
* bos.rte.libc fileset below the 7.1.3.15 level.
****************************************************************
* PROBLEM DESCRIPTION:
* If a program calls setjmp/longjmp, it will not be
* possible to use Ctl-C to exit the program any more, since
* SIGINT will be blocked.
****************************************************************
* RECOMMENDATION:
* Install APAR IV56118.
****************************************************************

Problem conclusion
longjmp() and siglongjmp() restore the signal mask correctly.

Bye
Rainer
Hello, Just a short question regarding an AIX smoker: How will the smoke script build PostgreSQL? Does it try all build options or can I choose appropriate options? For example, there is no TCL on the machine. Bye Rainer On 25.02.2014 18:47, Alvaro Herrera wrote: > Rainer Tammer wrote: > >> P.S.: What do you think of a smoker on the IBM developer cloud? >> If this is interesting I might organize the setup. > Please do get some Power buildfarm members set up if it's within your, > err, powers. See here for instructions: > http://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto > > If you can get machines for all architectures across all supported > versions of AIX, that'd be great. Since these Power machines are not > very common, perhaps it'd be good if you can get one running Linux too; > we only have PowerPC running NetBSD and Mac OS X, nothing on POWER7. > > There's no need for anyone to access the machines. There's a push > script that's used to upload the test results. >
Rainer Tammer <pgsql@spg.schulergroup.com> writes: > Just a short question regarding an AIX smoker: > How will the smoke script build PostgreSQL? > Does it try all build options or can I choose appropriate options? > For example, there is no TCL on the machine. If you're using the standard buildfarm client (and you should be) then you can select the configure options, environment variables, etc. in its configuration file. If there's more than one configuration you want to test, create a separate instance ("animal") for each one. See http://buildfarm.postgresql.org/cgi-bin/register-form.pl and particularly the HowTo linked from there. regards, tom lane