Thread: porting question: funky uid names?
Hi pgsql-hackers, I'm currently porting 7.0.3 to the HP MPE/iX OS to join my other ports of Apache, BIND, sendmail, Perl, and others. I'm at the point where I'm trying to run the "make runcheck" regression tests, and I've just run into a problem where I need to seek the advice of pgsql-hackers. MPE is a proprietary OS with a POSIX layer on top. The concept of POSIX uids and gids has been mapped to the concept of MPE usernames and MPE accountnames. An example MPE username would be "MGR.BIXBY", and if you do a POSIX getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY". The fact that pw_name contains a period on MPE has been confusing to some previous ports I've done, and it now appears PostgreSQL is being confused too. Make runcheck is dying in the initdb phase: Creating global relations in /blah/blah/blah ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY" ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY" syntax error 25 : -> . I'm guessing that something tried to parse "MGR.BIXBY", saw the decimal point character and passed the string to pg_atoi() thinking it's a number instead of a name. This seems like a really bad omen hinting at trouble on a fundamental level. What are my options here? 1) I'm screwed; go try porting MySQL instead. ;-) 2) Somehow modify username parsing to be tolerant of the "." character? I was able to do this when I ported sendmail. Where should I be looking in the PostgreSQL source? Is this going to require language grammar changes? 3) Always specify numeric uids instead of user names. Is this even possible? Your advice will be greatly appreciated. MPE users are currently whining on their mailing list about the lack of standard databases for the platform, and I wanted to surprise them by releasing a PostgreSQL port. Thanks! -- mark@bixby.org Remainder of .sig suppressed to conserve scarce California electrons...
Mark Bixby <mark@bixby.org> writes: > MPE is a proprietary OS with a POSIX layer on top. The concept of > POSIX uids and gids has been mapped to the concept of MPE usernames > and MPE accountnames. An example MPE username would be "MGR.BIXBY", > and if you do a POSIX getpwuid(getuid()), the contents of pw_name will > be the same "MGR.BIXBY". Hm. And what is returned in pw_uid? I think you are getting burnt by initdb's attempt to assign the postgres superuser's numeric ID to be the same as the Unix userid number of the user running initdb. Look at the uses of pg_id in the initdb script, and experiment with running pg_id by hand to see what it produces. A quick and dirty experiment would be to run "initdb -i 42" (or whatever) to override the result of pg_id. If that succeeds, the real answer may be that pg_id needs a patch to behave reasonably on MPE. Let us know... regards, tom lane
Tom Lane wrote: > > Mark Bixby <mark@bixby.org> writes: > > MPE is a proprietary OS with a POSIX layer on top. The concept of > > POSIX uids and gids has been mapped to the concept of MPE usernames > > and MPE accountnames. An example MPE username would be "MGR.BIXBY", > > and if you do a POSIX getpwuid(getuid()), the contents of pw_name will > > be the same "MGR.BIXBY". > > Hm. And what is returned in pw_uid? A valid numeric uid. > I think you are getting burnt by initdb's attempt to assign the postgres > superuser's numeric ID to be the same as the Unix userid number of the > user running initdb. Look at the uses of pg_id in the initdb script, > and experiment with running pg_id by hand to see what it produces. pg_id without parameters returns uid=484(MGR.BIXBY), which matches what I get from MPE's native id command. The pg_id -n and -u options behave as expected. > A quick and dirty experiment would be to run "initdb -i 42" (or > whatever) to override the result of pg_id. If that succeeds, the > real answer may be that pg_id needs a patch to behave reasonably on MPE. I just hacked src/test/regress/run_check.sh to invoke initdb with --show. The user name/id is behaving "correctly" for an MPE machine: SUPERUSERNAME: MGR.BIXBY SUPERUSERID: 484 The initdb -i option will only override the SUPERUSERID, but it's already correct. -- mark@bixby.org Remainder of .sig suppressed to conserve scarce California electrons...
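The lookup discussed in this exchange can be sketched as follows. This is only an illustration in Python, not the C code behind pg_id: `format_id` and `current_identity` are hypothetical helper names, and the fallback for a uid with no passwd entry is an assumption added for robustness.

```python
import os
import pwd


def format_id(uid: int, name: str) -> str:
    """Format a uid/name pair like the pg_id output quoted above."""
    return f"uid={uid}({name})"


def current_identity() -> tuple[int, str]:
    """Return (uid, name) for this process, i.e. getpwuid(getuid())."""
    uid = os.getuid()
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # Assumption: fall back to the numeric uid if no passwd entry exists.
        name = str(uid)
    return uid, name


if __name__ == "__main__":
    # On Mark's MPE machine this would correspond to uid=484(MGR.BIXBY).
    print(format_id(*current_identity()))
```

The point of the check is that the numeric uid and the name are separate fields, so a dotted name such as MGR.BIXBY does not by itself make the uid invalid.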
Mark Bixby <mark@bixby.org> writes: > I just hacked src/test/regress/run_check.sh to invoke initdb with > --show. The user name/id is behaving "correctly" for an MPE machine: > SUPERUSERNAME: MGR.BIXBY > SUPERUSERID: 484 Okay, so much for that theory. Can you set a breakpoint at elog() and provide a stack backtrace so we can see where this is happening? I can't think where else in the code might be affected, but obviously the problem is somewhere else... regards, tom lane
Mark Bixby writes: > Creating global relations in /blah/blah/blah > ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY" > ERROR: pg_atoi: error in "BIXBY": can't parse "BIXBY" > syntax error 25 : -> . I'm curious about that last line. Is that the shell complaining? The offending command seems to be insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ ) in the file global1.bki.source. (This is the file that creates the global relations.) The POSTGRES and PGUID quantities are substituted when initdb runs: cat "$GLOBAL" \ | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \ -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \ | "$PGPATH"/postgres $BACKENDARGS template1 For some reason the line probably ends up being insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ ) ^ which causes the observed failure to parse BIXBY as user id. This brings us back to why the dot disappears, which seems to be related to the error message syntax error 25 : -> . ^^^ Can you try using a different sed command (e.g., GNU sed)? -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
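To see what a well-behaved sed would hand to the backend, the two substitutions can be mimicked in a few lines. This is a sketch in Python rather than the shell used by initdb; `substitute` is a hypothetical helper, and plain string replacement stands in for sed's `s/.../.../g`.

```python
TEMPLATE = "insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )"


def substitute(template: str, superuser_name: str, superuser_id: int) -> str:
    """Apply the two replacements in the same order as the sed command."""
    line = template.replace("POSTGRES", superuser_name)
    return line.replace("PGUID", str(superuser_id))


if __name__ == "__main__":
    print(substitute(TEMPLATE, "MGR.BIXBY", 484))
```

With Mark's values the result still contains the dot, as an unquoted `MGR.BIXBY`. If the real sed output matches this, the dot is being lost (or rejected) by whatever reads the line afterwards, not by the substitution itself.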
Peter Eisentraut <peter_e@gmx.net> writes: > cat "$GLOBAL" \ > | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \ > -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \ > | "$PGPATH"/postgres $BACKENDARGS template1 > For some reason the line probably ends up being > insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ ) > ^ > which causes the observed failure to parse BIXBY as user id. Good thought. Just looking at this, I wonder if we shouldn't flip the order of the sed patterns --- as is, won't it mess up if the superuser name contains PGUID? A further exercise would be to make it not foul up if the superuser name contains '/'. I'd be kind of inclined to use ':' for the pattern delimiter, since in normal Unix practice usernames can't contain colons (cf. passwd file format). Of course one doesn't generally put a slash in a username either, but I think it's physically possible to do it... But none of these fully explain Mark's problem. If we knew where the "syntax error 25 : -> ." came from, we'd be closer to an answer. regards, tom lane
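Tom's two concerns about the sed patterns can be illustrated with a small sketch. It is Python, not the initdb shell script; the superuser name `MYPGUID` and the helper names are made up for the example, and `sed_expression` only builds the expression string, it does not run sed.

```python
TEMPLATE = "insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )"


def name_then_id(template: str, name: str, uid: int) -> str:
    """Current order: replace POSTGRES first, then PGUID."""
    return template.replace("POSTGRES", name).replace("PGUID", str(uid))


def id_then_name(template: str, name: str, uid: int) -> str:
    """Flipped order: replace PGUID first, then POSTGRES."""
    return template.replace("PGUID", str(uid)).replace("POSTGRES", name)


def sed_expression(placeholder: str, value: str, delimiter: str) -> str:
    """Build an s-command; refuse values that contain the delimiter."""
    if delimiter in value:
        raise ValueError(f"{value!r} contains the delimiter {delimiter!r}")
    return f"s{delimiter}{placeholder}{delimiter}{value}{delimiter}g"


if __name__ == "__main__":
    # The name itself contains "PGUID", so the second pass rewrites it.
    print(name_then_id(TEMPLATE, "MYPGUID", 484))
    # Substituting the numeric id first leaves the name intact.
    print(id_then_name(TEMPLATE, "MYPGUID", 484))
    # A name with a slash is fine once ':' is the delimiter.
    print(sed_expression("POSTGRES", "a/b", ":"))
```

Flipping the order works because the numeric id can never contain the string POSTGRES, whereas a user name can in principle contain PGUID.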
Tom Lane wrote: > But none of these fully explain Mark's problem. If we knew where the > "syntax error 25 : -> ." came from, we'd be closer to an answer. After scanning the source for "syntax error", line 126 of backend/bootstrap/bootscanner.l seems to be the likely culprit. -- mark@bixby.org Remainder of .sig suppressed to conserve scarce California electrons...
Mark Bixby <mark@bixby.org> writes: > Tom Lane wrote: >> But none of these fully explain Mark's problem. If we knew where the >> "syntax error 25 : -> ." came from, we'd be closer to an answer. > After scanning the source for "syntax error", line 126 of > backend/bootstrap/bootscanner.l seems to be the likely culprit. Oh, of course: foo.bar is not a single token to the boot scanner. It needs to be in quotes. Try this patch (line numbers are for 7.1 but probably OK for 7.0.*) *** src/include/catalog/pg_shadow.h~ Wed Jan 24 16:01:30 2001 --- src/include/catalog/pg_shadow.h Fri Mar 9 16:57:53 2001 *************** *** 73,78 **** * user choices. * ---------------- */ ! DATA(insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )); #endif /* PG_SHADOW_H */ --- 73,78 ---- * user choices. * ---------------- */ ! DATA(insert OID = 0 ( "POSTGRES" PGUID t t t t _null_ _null_ )); #endif /* PG_SHADOW_H */ You'll need to rebuild global.bki (over in src/backend/catalog) afterwards, but the executables don't change. regards, tom lane
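Why the quotes matter can be shown with a toy tokenizer. The pattern below is an assumption made for illustration and is not copied from bootscanner.l; it only reproduces the behaviour Tom describes, where an unquoted foo.bar is not a single token but a double-quoted string is.

```python
import re

# Assumption: a toy token pattern, not the real rules from bootscanner.l.
# A double-quoted string is one token; otherwise a token is a run of
# letters, digits and underscores, and any other character stands alone.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z0-9_]+|\S')


def tokenize(line: str) -> list[str]:
    """Split a bootstrap-style line into tokens using the toy pattern."""
    return TOKEN.findall(line)


if __name__ == "__main__":
    print(tokenize("( MGR.BIXBY 484 )"))
    print(tokenize('( "MGR.BIXBY" 484 )'))
```

In the unquoted case the name falls apart into `MGR`, `.` and `BIXBY`, so the values no longer line up with the columns they were meant for. With the patched header, the substituted name arrives inside the quotes and stays in one piece.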
Tom Lane wrote: > > Mark Bixby <mark@bixby.org> writes: > > I just hacked src/test/regress/run_check.sh to invoke initdb with > > --show. The user name/id is behaving "correctly" for an MPE machine: > > > SUPERUSERNAME: MGR.BIXBY > > SUPERUSERID: 484 > > Okay, so much for that theory. > > Can you set a breakpoint at elog() and provide a stack backtrace so we > can see where this is happening? I can't think where else in the code > might be affected, but obviously the problem is somewhere else... Here's a stack trace from the native MPE debugger (we don't have gdb support yet). I'm assuming that all results after the initdb failure should be suspect, and that's possibly why pg_log wasn't created. I haven't tried troubleshooting the pg_log problem yet until after I resolve the uid names issue. =============== Initializing check database instance ================ DEBUG/iX C.25.06 DEBUG Intrinsic at: 129.0009d09c ?$START$ $1 ($4b) nmdebug > b elog added: NM [1] PROG 129.001ad7d8 elog $2 ($4b) nmdebug > c Break at: NM [1] PROG 129.001ad7d8 elog $3 ($4b) nmdebug > tr PC=129.001ad7d8 elog * 0) SP=41843ef0 RP=129.0018f7a4 pg_atoi+$b4 1) SP=41843ef0 RP=129.00182994 int4in+$14 2) SP=41843e70 RP=129.0018296c ?int4in+$8 export stub: 129.001aed28 $CODE$+$138 3) SP=41843e30 RP=129.001af428 fmgr+$98 4) SP=41843db0 RP=129.000c3354 InsertOneValue+$264 5) SP=41843cf0 RP=129.000c05d4 Int_yyparse+$924 6) SP=41843c70 RP=129.00000000 (endof NM stack) $4 ($4b) nmdebug > c =============== Starting regression postmaster ================ Regression postmaster is running - PID=125239393 PGPORT=65432 =============== Creating regression database... 
================ NOTICE: mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory NOTICE: mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory psql: FATAL 1: cannot open relation pg_log createdb: database creation failed createdb failed make: *** [runcheck] Error 1 -- mark@bixby.org Remainder of .sig suppressed to conserve scarce California electrons...
Tom Lane wrote: > Oh, of course: foo.bar is not a single token to the boot scanner. > It needs to be in quotes. Try this patch (line numbers are for 7.1 > but probably OK for 7.0.*) > ...snip... > --- src/include/catalog/pg_shadow.h Fri Mar 9 16:57:53 2001 ...snip... > ! DATA(insert OID = 0 ( "POSTGRES" PGUID t t t t _null_ _null_ )); > > #endif /* PG_SHADOW_H */ > > You'll need to rebuild global.bki (over in src/backend/catalog) > afterwards, but the executables don't change. I modified pg_shadow.h as instructed and ran a make from src, and that rebuilt global1.bki.source in src/backend/catalog. However, when I did make runtest, it appears to install from src/backend/global1.bki.source which was still the old version. I modified that old version by hand and reran make runtest. The uid name error has been solved. Thanks! So why is there a backend/global1.bki.source *and* a backend/catalog/global1.bki.source? But now runcheck dies during the install of PL/pgSQL, with createlang complaining about a missing lib/plpgsql.sl. I did do an MPE implementation of dynloader.c, but I was under the dim impression this was only used for user-added functions, not core functionality. Am I mistaken? Are you dynaloading core functionality too? It seems that plpgsql.sl didn't get built. Might be an autoconf issue, since quite frequently config scripts don't know about shared libraries on MPE. I will investigate this further. -- mark@bixby.org Remainder of .sig suppressed to conserve scarce California electrons...
Mark Bixby wrote: > It seems that plpgsql.sl didn't get built. Might be an autoconf issue, since > quite frequently config scripts don't know about shared libraries on MPE. I > will investigate this further. Ah. I found src/Makefile.shlib and added the appropriate stuff. Woohoo! We have test output! The regression README was clear about how some platform dependent errors can be expected, and how to code for these differences in the expected outputs. Now I'm off to examine the individual failures.... MULTIBYTE=;export MULTIBYTE; \ /bin/sh ./run_check.sh hppa1.0-hp-mpeix =============== Removing old ./tmp_check directory ... ================ =============== Create ./tmp_check directory ================ =============== Installing new build into ./tmp_check ================ =============== Initializing check database instance ================ =============== Starting regression postmaster ================ Regression postmaster is running - PID=125042790 PGPORT=65432 =============== Creating regression database... ================ CREATE DATABASE =============== Installing PL/pgSQL... ================ =============== Running regression queries... ================ parallel group1 (12 tests) ...boolean text name oid float4 varchar char int4 int2 float8 int8 numeric test boolean ... ok test char ... ok test name ... ok test varchar ... ok test text ... ok test int2 ... ok test int4 ... ok test int8 ... ok test oid ... ok test float4 ... ok test float8 ... FAILED test numeric ... ok sequential test strings ... ok sequential test numerology ... ok parallel group2 (15 tests) ...comments path polygon lseg point box reltime interval tinterval circle inet timestamp type_sanity opr_sanity oidjoins test point ... ok test lseg ... ok test box ... ok test path ... ok test polygon ... ok test circle ... ok test interval ... FAILED test timestamp ... FAILED test reltime ... ok test tinterval ... ok test inet ... ok test comments ... ok test oidjoins ... ok test type_sanity ... 
ok test opr_sanity ... ok sequential test abstime ... ok sequential test geometry ... FAILED sequential test horology ... FAILED sequential test create_function_1 ... ok sequential test create_type ... ok sequential test create_table ... ok sequential test create_function_2 ... ok sequential test copy ... ok parallel group3 (6 tests) ...create_aggregate create_operator triggers constraints create_misc create_index test constraints ... ok test triggers ... ok test create_misc ... ok test create_aggregate ... ok test create_operator ... ok test create_index ... ok sequential test create_view ... ok sequential test sanity_check ... ok sequential test errors ... ok sequential test select ... ok parallel group4 (16 tests) ...arrays union select_having transactions portals join select_implicit select_distinct_on subselect case random select_distinct select_into aggregates hash_index btree_index test select_into ... ok test select_distinct ... ok test select_distinct_on ... ok test select_implicit ... ok test select_having ... ok test subselect ... ok test union ... ok test case ... ok test join ... ok test aggregates ... ok test transactions ... ok test random ... ok test portals ... ok test arrays ... ok test btree_index ... ok test hash_index ... ok sequential test misc ... ok parallel group5 (5 tests) ...portals_p2 foreign_key rules alter_table select_views test select_views ... ok test alter_table ... ok test portals_p2 ... ok test rules ... ok test foreign_key ... ok parallel group6 (3 tests) ...temp limit plpgsql test limit ... ok test plpgsql ... FAILED test temp ... ok =============== Terminating regression postmaster ================ ACTUAL RESULTS OF REGRESSION TEST ARE NOW IN FILES run_check.out AND regress.out To run the optional big test(s) too, type 'make bigcheck' These big tests can take over an hour to complete These actually are: numeric_big -- mark@bixby.org Remainder of .sig suppressed to conserve scarce California electrons...
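With output this long, it helps to pull out just the failures before examining them one by one. The sketch below assumes the `test NAME ... ok` / `test NAME ... FAILED` layout shown in the run above; `failed_tests` is a hypothetical helper, not part of the regression driver.

```python
import re

# Matches "test NAME ... ok" and "test NAME ... FAILED", including the
# lines that start with "sequential test".
RESULT = re.compile(r"test\s+(\S+)\s+\.\.\.\s+(ok|FAILED)")


def failed_tests(output: str) -> list[str]:
    """Return the names of tests reported as FAILED, in order of appearance."""
    return [name for name, status in RESULT.findall(output) if status == "FAILED"]


if __name__ == "__main__":
    sample = (
        "test float4 ... ok test float8 ... FAILED test numeric ... ok "
        "sequential test strings ... ok sequential test geometry ... FAILED"
    )
    print(failed_tests(sample))
```

The same function could be pointed at the contents of run_check.out once the full suite has finished.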
Mark Bixby <mark@bixby.org> writes: > So why is there a backend/global1.bki.source *and* a > backend/catalog/global1.bki.source? You don't want to know ;-) ... it's all cleaned up for 7.1 anyway. I think in 7.0 you have to run make install in src/backend to get the .bki files installed. > But now runcheck dies during the install of PL/pgSQL, with createlang > complaining about a missing lib/plpgsql.sl. > I did do an MPE implementation of dynloader.c, but I was under the dim > impression this was only used for user-added functions, not core > functionality. Am I mistaken? Are you dynaloading core functionality too? No, but the regress tests try to test plpgsql too ... you should be able to dike out the createlang call and have all tests except the plpgsql regress test work. regards, tom lane
Tom Lane wrote: > > But now runcheck dies during the install of PL/pgSQL, with createlang > > complaining about a missing lib/plpgsql.sl. > > > I did do an MPE implementation of dynloader.c, but I was under the dim > > impression this was only used for user-added functions, not core > > functionality. Am I mistaken? Are you dynaloading core functionality too? > > No, but the regress tests try to test plpgsql too ... you should be able > to dike out the createlang call and have all tests except the plpgsql > regress test work. Is it possible to re-run failing regression tests individually? It took somewhere between 30-45 minutes for me to run the entire suite, and if I have to run the whole thing every time when I'm only trying to fix just a single test, that will get old pretty fast, and so will I. ;-) Thanks. -- mark@bixby.org Remainder of .sig suppressed to conserve scarce California electrons...
Mark Bixby <mark@bixby.org> writes: > Is it possible to re-run failing regression tests individually? I believe so, but it's not very convenient in the "runcheck" mode, since that normally wants to make a fresh install and start a temporary postmaster. Instead, do a real install, start a real postmaster, and do "make runtest" to create the regression DB in the real installation. Then you can basically just do "psql regression <foo.sql" --- look at the regression driver script to get the details of what switches to pass and how to do the output comparison. There are some order dependencies among the tests, but I think all the ones you were having trouble with should be able to work this way in an end-state regression DB. Also, rerunning the whole suite is much quicker this way, since you don't have to go through install/initdb/start postmaster each time. BTW, the results you posted looked good --- with the exception of plpgsql, the failing tests all seemed to be ones that are notorious for platform-dependent output. regards, tom lane
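The output comparison step Tom mentions can be sketched on its own, leaving out the psql invocation since the exact switches are in the regression driver script and are not reproduced here. `compare_output` is a hypothetical helper, and the `expected/` and `results/` labels in the diff header are assumptions used only to make the diff readable.

```python
import difflib


def compare_output(expected: str, actual: str, name: str = "test") -> list[str]:
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile=f"expected/{name}.out",  # assumed label, not a real path
            tofile=f"results/{name}.out",     # assumed label, not a real path
            lineterm="",
        )
    )


if __name__ == "__main__":
    # A typical platform-dependent difference: same value, different formatting.
    for line in compare_output("1.5\n", "1.50\n", "float8"):
        print(line)
```

Feeding it the saved expected output and the text captured from a single psql run gives a quick pass/fail check for one test without rerunning the whole suite.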