Thread: porting question: funky uid names?

porting question: funky uid names?

From
Mark Bixby
Date:
Hi pgsql-hackers,

I'm currently porting 7.0.3 to the HP MPE/iX OS to join my other ports of
Apache, BIND, sendmail, Perl, and others.  I'm at the point where I'm trying to
run the "make runcheck" regression tests, and I've just run into a problem
where I need to seek the advice of pgsql-hackers.

MPE is a proprietary OS with a POSIX layer on top.  The concept of POSIX uids
and gids has been mapped to the concept of MPE usernames and MPE accountnames. 
An example MPE username would be "MGR.BIXBY", and if you do a POSIX
getpwuid(getuid()), the contents of pw_name will be the same "MGR.BIXBY".

The fact that pw_name contains a period on MPE has been confusing to some
previous ports I've done, and it now appears PostgreSQL is being confused too. 
Make runcheck is dying in the initdb phase:

Creating global relations in /blah/blah/blah
ERROR:  pg_atoi: error in "BIXBY": can't parse "BIXBY"
ERROR:  pg_atoi: error in "BIXBY": can't parse "BIXBY"
syntax error 25 : -> .

I'm guessing that something tried to parse "MGR.BIXBY", saw the decimal point
character and passed the string to pg_atoi() thinking it's a number instead of
a name.  This seems like a really bad omen hinting at trouble on a fundamental
level.

What are my options here?

1) I'm screwed; go try porting MySQL instead.  ;-)

2) Somehow modify username parsing to be tolerant of the "." character?  I was
able to do this when I ported sendmail.  Where should I be looking in the
PostgreSQL source?  Is this going to require language grammar changes?

3) Always specify numeric uids instead of user names.  Is this even possible?

Your advice will be greatly appreciated.  MPE users are currently whining on
their mailing list about the lack of standard databases for the platform, and I
wanted to surprise them by releasing a PostgreSQL port.

Thanks!
-- 
mark@bixby.org
Remainder of .sig suppressed to conserve scarce California electrons...


Re: porting question: funky uid names?

From
Tom Lane
Date:
Mark Bixby <mark@bixby.org> writes:
> MPE is a proprietary OS with a POSIX layer on top.  The concept of
> POSIX uids and gids has been mapped to the concept of MPE usernames
> and MPE accountnames.  An example MPE username would be "MGR.BIXBY",
> and if you do a POSIX getpwuid(getuid()), the contents of pw_name will
> be the same "MGR.BIXBY".

Hm.  And what is returned in pw_uid?

I think you are getting burnt by initdb's attempt to assign the postgres
superuser's numeric ID to be the same as the Unix userid number of the
user running initdb.  Look at the uses of pg_id in the initdb script,
and experiment with running pg_id by hand to see what it produces.

A quick and dirty experiment would be to run "initdb -i 42" (or
whatever) to override the result of pg_id.  If that succeeds, the
real answer may be that pg_id needs a patch to behave reasonably on MPE.

Let us know...
        regards, tom lane


Re: porting question: funky uid names?

From
Mark Bixby
Date:

Tom Lane wrote:
> 
> Mark Bixby <mark@bixby.org> writes:
> > MPE is a proprietary OS with a POSIX layer on top.  The concept of
> > POSIX uids and gids has been mapped to the concept of MPE usernames
> > and MPE accountnames.  An example MPE username would be "MGR.BIXBY",
> > and if you do a POSIX getpwuid(getuid()), the contents of pw_name will
> > be the same "MGR.BIXBY".
> 
> Hm.  And what is returned in pw_uid?

A valid numeric uid.

> I think you are getting burnt by initdb's attempt to assign the postgres
> superuser's numeric ID to be the same as the Unix userid number of the
> user running initdb.  Look at the uses of pg_id in the initdb script,
> and experiment with running pg_id by hand to see what it produces.

pg_id without parameters returns uid=484(MGR.BIXBY), which matches what I get
from MPE's native id command.

The pg_id -n and -u options behave as expected.

> A quick and dirty experiment would be to run "initdb -i 42" (or
> whatever) to override the result of pg_id.  If that succeeds, the
> real answer may be that pg_id needs a patch to behave reasonably on MPE.

I just hacked src/test/regress/run_check.sh to invoke initdb with --show.  The
user name/id is behaving "correctly" for an MPE machine:

SUPERUSERNAME:  MGR.BIXBY
SUPERUSERID:    484

The initdb -i option will only override the SUPERUSERID, but it's already
correct.
-- 
mark@bixby.org
Remainder of .sig suppressed to conserve scarce California electrons...


Re: porting question: funky uid names?

From
Tom Lane
Date:
Mark Bixby <mark@bixby.org> writes:
> I just hacked src/test/regress/run_check.sh to invoke initdb with
> --show.  The user name/id is behaving "correctly" for an MPE machine:

> SUPERUSERNAME:  MGR.BIXBY
> SUPERUSERID:    484

Okay, so much for that theory.

Can you set a breakpoint at elog() and provide a stack backtrace so we
can see where this is happening?  I can't think where else in the code
might be affected, but obviously the problem is somewhere else...
        regards, tom lane


Re: porting question: funky uid names?

From
Peter Eisentraut
Date:
Mark Bixby writes:

> Creating global relations in /blah/blah/blah
> ERROR:  pg_atoi: error in "BIXBY": can't parse "BIXBY"
> ERROR:  pg_atoi: error in "BIXBY": can't parse "BIXBY"
> syntax error 25 : -> .

I'm curious about that last line.  Is that the shell complaining?

The offending command seems to be

insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )

in the file global1.bki.source.  (This is the file that creates the global
relations.)  The POSTGRES and PGUID quantities are substituted when initdb
runs:

cat "$GLOBAL" \
    | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \
          -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \
    | "$PGPATH"/postgres $BACKENDARGS template1

For some reason the line probably ends up being

insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ )
                    ^
which causes the observed failure to parse BIXBY as user id.  This brings
us back to why the dot disappears, which seems to be related to the error
message

syntax error 25 : -> .
                    ^^^

Can you try using a different sed command (e.g., GNU sed)?
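
One way to test the sed theory is to run the substitution by hand on the offending line. The sketch below assumes the superuser name and id reported in this thread (MGR.BIXBY and 484); the variable names template and substituted are made up for the illustration, and the result is printed instead of being piped to the backend.

```shell
# Values as reported by "initdb --show" in this thread.
POSTGRES_SUPERUSERNAME='MGR.BIXBY'
POSTGRES_SUPERUSERID=484

# The line from global1.bki.source that initdb rewrites
# ("template" is a hypothetical variable name).
template='insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )'

# Same two substitutions as initdb, captured instead of piped to postgres.
substituted=$(printf '%s\n' "$template" \
    | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \
          -e "s/PGUID/$POSTGRES_SUPERUSERID/g")

printf '%s\n' "$substituted"
```

If the printed line still shows MGR.BIXBY with the dot intact, sed is not what splits the name, and the dot must be getting lost further down the pipeline.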

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: porting question: funky uid names?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> cat "$GLOBAL" \
>     | sed -e "s/POSTGRES/$POSTGRES_SUPERUSERNAME/g" \
>           -e "s/PGUID/$POSTGRES_SUPERUSERID/g" \
>     | "$PGPATH"/postgres $BACKENDARGS template1

> For some reason the line probably ends up being

> insert OID = 0 ( MGR BIXBY 484 t t t t _null_ _null_ )
>                     ^
> which causes the observed failure to parse BIXBY as user id.

Good thought.  Just looking at this, I wonder if we shouldn't flip the
order of the sed patterns --- as is, won't it mess up if the superuser
name contains PGUID?

A further exercise would be to make it not foul up if the superuser name
contains '/'.  I'd be kind of inclined to use ':' for the pattern
delimiter, since in normal Unix practice usernames can't contain colons
(cf. passwd file format).  Of course one doesn't generally put a slash
in a username either, but I think it's physically possible to do it...
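
Both ideas can be tried in isolation. The sketch below uses a deliberately awkward, made-up superuser name (PGUID/admin) to compare the current ordering with the flipped ordering; the variable names naive and safer are hypothetical, and ':' is used as the delimiter in both commands only so that the first one runs at all with a slash in the name.

```shell
# Hypothetical worst-case name: contains both "PGUID" and a slash.
POSTGRES_SUPERUSERNAME='PGUID/admin'
POSTGRES_SUPERUSERID=484
template='insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ )'

# Current order: the name is substituted first, so the PGUID inside
# the name is then rewritten by the second pattern.
naive=$(printf '%s\n' "$template" \
    | sed -e "s:POSTGRES:$POSTGRES_SUPERUSERNAME:g" \
          -e "s:PGUID:$POSTGRES_SUPERUSERID:g")

# Flipped order: the numeric id goes in first, then the name.
safer=$(printf '%s\n' "$template" \
    | sed -e "s:PGUID:$POSTGRES_SUPERUSERID:g" \
          -e "s:POSTGRES:$POSTGRES_SUPERUSERNAME:g")

printf 'naive: %s\n' "$naive"
printf 'safer: %s\n' "$safer"
```

A name containing ':', '&' or a backslash would still need escaping; this only covers the two cases raised above.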

But none of these fully explain Mark's problem.  If we knew where the
"syntax error 25 : -> ." came from, we'd be closer to an answer.
        regards, tom lane



Re: porting question: funky uid names?

From
Mark Bixby
Date:

Tom Lane wrote:
> But none of these fully explain Mark's problem.  If we knew where the
> "syntax error 25 : -> ." came from, we'd be closer to an answer.

After scanning the source for "syntax error", line 126 of
backend/bootstrap/bootscanner.l seems to be the likely culprit.
-- 
mark@bixby.org
Remainder of .sig suppressed to conserve scarce California electrons...


Re: porting question: funky uid names?

From
Tom Lane
Date:
Mark Bixby <mark@bixby.org> writes:
> Tom Lane wrote:
>> But none of these fully explain Mark's problem.  If we knew where the
>> "syntax error 25 : -> ." came from, we'd be closer to an answer.

> After scanning the source for "syntax error", line 126 of
> backend/bootstrap/bootscanner.l seems to be the likely culprit.

Oh, of course: foo.bar is not a single token to the boot scanner.
It needs to be in quotes.  Try this patch (line numbers are for 7.1
but probably OK for 7.0.*)

*** src/include/catalog/pg_shadow.h~    Wed Jan 24 16:01:30 2001
--- src/include/catalog/pg_shadow.h     Fri Mar  9 16:57:53 2001
***************
*** 73,78 ****
   * user choices.
   * ----------------
   */
! DATA(insert OID = 0 ( POSTGRES PGUID t t t t _null_ _null_ ));
  
  #endif         /* PG_SHADOW_H */
--- 73,78 ----
   * user choices.
   * ----------------
   */
! DATA(insert OID = 0 ( "POSTGRES" PGUID t t t t _null_ _null_ ));
  
  #endif         /* PG_SHADOW_H */


You'll need to rebuild global.bki (over in src/backend/catalog)
afterwards, but the executables don't change.
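
To see why the quotes matter, here is a toy tokenizer built from grep. It is only an approximation for illustration, not the rules in bootscanner.l: it treats a double-quoted string or a run of letters, digits and underscores as one token, and any other non-blank character as a token of its own. The helper name tokenize is made up, and grep -o is a common extension rather than POSIX.

```shell
# Toy tokenizer (NOT the real bootscanner.l rules): one token per output line.
tokenize() {
    printf '%s\n' "$1" | grep -oE '"[^"]*"|[A-Za-z0-9_]+|[^[:space:]]'
}

unquoted=$(tokenize 'insert OID = 0 ( MGR.BIXBY 484 t t t t _null_ _null_ )')
quoted=$(tokenize 'insert OID = 0 ( "MGR.BIXBY" 484 t t t t _null_ _null_ )')

# Show only the tokens that came from the user name.
echo "unquoted:"
printf '%s\n' "$unquoted" | grep -E 'MGR|BIXBY'
echo "quoted:"
printf '%s\n' "$quoted" | grep -E 'MGR|BIXBY'
```

In this model the unquoted name falls apart into separate MGR and BIXBY tokens, so BIXBY lands in the column where a numeric id is expected, while the quoted name stays in one piece.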
        regards, tom lane


Re: porting question: funky uid names?

From
Mark Bixby
Date:

Tom Lane wrote:
> 
> Mark Bixby <mark@bixby.org> writes:
> > I just hacked src/test/regress/run_check.sh to invoke initdb with
> > --show.  The user name/id is behaving "correctly" for an MPE machine:
> 
> > SUPERUSERNAME:  MGR.BIXBY
> > SUPERUSERID:    484
> 
> Okay, so much for that theory.
> 
> Can you set a breakpoint at elog() and provide a stack backtrace so we
> can see where this is happening?  I can't think where else in the code
> might be affected, but obviously the problem is somewhere else...

Here's a stack trace from the native MPE debugger (we don't have gdb support
yet).  I'm assuming that all results after the initdb failure should be
suspect, and that's possibly why pg_log wasn't created.  I haven't tried
troubleshooting the pg_log problem yet until after I resolve the uid names
issue.

=============== Initializing check database instance   ================
DEBUG/iX C.25.06 

DEBUG Intrinsic at: 129.0009d09c ?$START$
$1 ($4b) nmdebug > b elog
added: NM    [1] PROG 129.001ad7d8 elog
$2 ($4b) nmdebug > c
Break at: NM    [1] PROG 129.001ad7d8 elog
$3 ($4b) nmdebug > tr
       PC=129.001ad7d8 elog
* 0) SP=41843ef0 RP=129.0018f7a4 pg_atoi+$b4
  1) SP=41843ef0 RP=129.00182994 int4in+$14
  2) SP=41843e70 RP=129.0018296c ?int4in+$8
       export stub: 129.001aed28 $CODE$+$138
  3) SP=41843e30 RP=129.001af428 fmgr+$98
  4) SP=41843db0 RP=129.000c3354 InsertOneValue+$264
  5) SP=41843cf0 RP=129.000c05d4 Int_yyparse+$924
  6) SP=41843c70 RP=129.00000000
     (end of NM stack)
 
$4 ($4b) nmdebug > c
=============== Starting regression postmaster         ================
Regression postmaster is running - PID=125239393 PGPORT=65432
=============== Creating regression database...        ================
NOTICE:  mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory
NOTICE:  mdopen: couldn't open /BIXBY/PUB/src/postgresql-7.0.3-mpe/src/test/regress/tmp_check/data/pg_log: No such file or directory
psql: FATAL 1:  cannot open relation pg_log
createdb: database creation failed
createdb failed
make: *** [runcheck] Error 1
-- 
mark@bixby.org
Remainder of .sig suppressed to conserve scarce California electrons...


Re: porting question: funky uid names?

From
Mark Bixby
Date:

Tom Lane wrote:
> Oh, of course: foo.bar is not a single token to the boot scanner.
> It needs to be in quotes.  Try this patch (line numbers are for 7.1
> but probably OK for 7.0.*)
> 
...snip...
> --- src/include/catalog/pg_shadow.h     Fri Mar  9 16:57:53 2001
...snip...
> ! DATA(insert OID = 0 ( "POSTGRES" PGUID t t t t _null_ _null_ ));
> 
>   #endif         /* PG_SHADOW_H */
> 
> You'll need to rebuild global.bki (over in src/backend/catalog)
> afterwards, but the executables don't change.

I modified pg_shadow.h as instructed and ran a make from src, and that rebuilt
global1.bki.source in src/backend/catalog.

However, when I did make runtest, it appears to install from
src/backend/global1.bki.source which was still the old version.  I modified
that old version by hand and reran make runtest.  The uid name error has been
solved.  Thanks!

So why is there a backend/global1.bki.source *and* a
backend/catalog/global1.bki.source?

But now runcheck dies during the install of PL/pgSQL, with createlang
complaining about a missing lib/plpgsql.sl.

I did do an MPE implementation of dynloader.c, but I was under the dim
impression this was only used for user-added functions, not core
functionality.  Am I mistaken?  Are you dynaloading core functionality too?

It seems that plpgsql.sl didn't get built.  Might be an autoconf issue, since
quite frequently config scripts don't know about shared libraries on MPE.  I
will investigate this further.
--
mark@bixby.org
Remainder of .sig suppressed to conserve scarce California electrons...


Re: porting question: funky uid names?

From
Mark Bixby
Date:

Mark Bixby wrote:
> It seems that plpgsql.sl didn't get built.  Might be an autoconf issue, since
> quite frequently config scripts don't know about shared libraries on MPE.  I
> will investigate this further.

Ah.  I found src/Makefile.shlib and added the appropriate stuff.

Woohoo!  We have test output!  The regression README was clear about how some
platform dependent errors can be expected, and how to code for these
differences in the expected outputs.

Now I'm off to examine the individual failures....

MULTIBYTE=;export MULTIBYTE; \
/bin/sh ./run_check.sh hppa1.0-hp-mpeix
=============== Removing old ./tmp_check directory ... ================
=============== Create ./tmp_check directory           ================
=============== Installing new build into ./tmp_check  ================
=============== Initializing check database instance   ================
=============== Starting regression postmaster         ================
Regression postmaster is running - PID=125042790 PGPORT=65432
=============== Creating regression database...        ================
CREATE DATABASE
=============== Installing PL/pgSQL...                 ================
=============== Running regression queries...          ================
parallel group1 (12 tests)           ...boolean  text  name  oid  float4  varchar  char  int4  int2  float8  int8  numeric
          test boolean              ...  ok
          test char                 ...  ok
          test name                 ...  ok
          test varchar              ...  ok
          test text                 ...  ok
          test int2                 ...  ok
          test int4                 ...  ok
          test int8                 ...  ok
          test oid                  ...  ok
          test float4               ...  ok
          test float8               ...  FAILED
          test numeric              ...  ok
sequential test strings              ...  ok
sequential test numerology           ...  ok
parallel group2 (15 tests)           ...comments  path  polygon  lseg  point  box  reltime  interval  tinterval  circle  inet  timestamp  type_sanity  opr_sanity  oidjoins
          test point                ...  ok
          test lseg                 ...  ok
          test box                  ...  ok
          test path                 ...  ok
          test polygon              ...  ok
          test circle               ...  ok
          test interval             ...  FAILED
          test timestamp            ...  FAILED
          test reltime              ...  ok
          test tinterval            ...  ok
          test inet                 ...  ok
          test comments             ...  ok
          test oidjoins             ...  ok
          test type_sanity          ...  ok
          test opr_sanity           ...  ok
sequential test abstime              ...  ok
sequential test geometry             ...  FAILED
sequential test horology             ...  FAILED
sequential test create_function_1    ...  ok
sequential test create_type          ...  ok
sequential test create_table         ...  ok
sequential test create_function_2    ...  ok
sequential test copy                 ...  ok
parallel group3 (6 tests)            ...create_aggregate  create_operator  triggers  constraints  create_misc  create_index
          test constraints          ...  ok
          test triggers             ...  ok
          test create_misc          ...  ok
          test create_aggregate     ...  ok
          test create_operator      ...  ok
          test create_index         ...  ok
sequential test create_view          ...  ok
sequential test sanity_check         ...  ok
sequential test errors               ...  ok
sequential test select               ...  ok
parallel group4 (16 tests)           ...arrays  union  select_having  transactions  portals  join  select_implicit  select_distinct_on  subselect  case  random  select_distinct  select_into  aggregates  hash_index  btree_index
          test select_into          ...  ok
          test select_distinct      ...  ok
          test select_distinct_on   ...  ok
          test select_implicit      ...  ok
          test select_having        ...  ok
          test subselect            ...  ok
          test union                ...  ok
          test case                 ...  ok
          test join                 ...  ok
          test aggregates           ...  ok
          test transactions         ...  ok
          test random               ...  ok
          test portals              ...  ok
          test arrays               ...  ok
          test btree_index          ...  ok
          test hash_index           ...  ok
sequential test misc                 ...  ok
parallel group5 (5 tests)            ...portals_p2  foreign_key  rules  alter_table  select_views
          test select_views         ...  ok
          test alter_table          ...  ok
          test portals_p2           ...  ok
          test rules                ...  ok
          test foreign_key          ...  ok
parallel group6 (3 tests)            ...temp  limit  plpgsql
          test limit                ...  ok
          test plpgsql              ...  FAILED
          test temp                 ...  ok
=============== Terminating regression postmaster      ================
ACTUAL RESULTS OF REGRESSION TEST ARE NOW IN FILES run_check.out
AND regress.out

To run the optional big test(s) too, type 'make bigcheck'
These big tests can take over an hour to complete
These actually are: numeric_big
-- 
mark@bixby.org
Remainder of .sig suppressed to conserve scarce California electrons...


Re: porting question: funky uid names?

From
Tom Lane
Date:
Mark Bixby <mark@bixby.org> writes:
> So why is there a backend/global1.bki.source *and* a
> backend/catalog/global1.bki.source?

You don't want to know ;-) ... it's all cleaned up for 7.1 anyway.
I think in 7.0 you have to run make install in src/backend to get the
.bki files installed.

> But now runcheck dies during the install of PL/pgSQL, with createlang
> complaining about a missing lib/plpgsql.sl.

> I did do an MPE implementation of dynloader.c, but I was under the dim
> impression this was only used for user-added functions, not core
> functionality.  Am I mistaken?  Are you dynaloading core functionality too?

No, but the regress tests try to test plpgsql too ... you should be able
to dike out the createlang call and have all tests except the plpgsql
regress test work.
        regards, tom lane


Re: porting question: funky uid names?

From
Mark Bixby
Date:

Tom Lane wrote:
> > But now runcheck dies during the install of PL/pgSQL, with createlang
> > complaining about a missing lib/plpgsql.sl.
> 
> > I did do an MPE implementation of dynloader.c, but I was under the dim
> > impression this was only used for user-added functions, not core
> > functionality.  Am I mistaken?  Are you dynaloading core functionality too?
> 
> No, but the regress tests try to test plpgsql too ... you should be able
> to dike out the createlang call and have all tests except the plpgsql
> regress test work.

Is it possible to re-run failing regression tests individually?  It took
somewhere between 30-45 minutes for me to run the entire suite, and if I have
to run the whole thing every time when I'm only trying to fix just a single
test, that will get old pretty fast, and so will I.  ;-)

Thanks.
-- 
mark@bixby.org
Remainder of .sig suppressed to conserve scarce California electrons...


Re: porting question: funky uid names?

From
Tom Lane
Date:
Mark Bixby <mark@bixby.org> writes:
> Is it possible to re-run failing regression tests individually?

I believe so, but it's not very convenient in the "runcheck" mode, since
that normally wants to make a fresh install and start a temporary
postmaster.  Instead, do a real install, start a real postmaster, and
do "make runtest" to create the regression DB in the real installation.
Then you can basically just do "psql regression <foo.sql" --- look at
the regression driver script to get the details of what switches to
pass and how to do the output comparison.
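
A minimal sketch of that manual loop, written as a shell function. The function name run_one_test, the PSQL variable, and the sql/, expected/ and results/ directory layout are assumptions made for the illustration, and no psql switches are shown because the ones the regression driver uses are not quoted in this thread; check the driver script before relying on it.

```shell
# Hypothetical helper: run one regression script and compare its output.
# PSQL may be overridden; the directory names are assumptions.
PSQL=${PSQL:-psql}

run_one_test() {
    name=$1
    mkdir -p results
    # Feed the test script to the existing "regression" database.
    $PSQL regression < "sql/$name.sql" > "results/$name.out" 2>&1
    # Compare against the expected output and keep the diff for inspection.
    if diff "expected/$name.out" "results/$name.out" > "results/$name.diff"; then
        echo "$name ... ok"
    else
        echo "$name ... FAILED"
    fi
}
```

It would be called from the regression directory as, for example, run_one_test float8, after the regression database has been created in the real installation.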

There are some order dependencies among the tests, but I think all the
ones you were having trouble with should be able to work this way in
an end-state regression DB.

Also, rerunning the whole suite is much quicker this way, since you
don't have to go through install/initdb/start postmaster each time.

BTW, the results you posted looked good --- with the exception of
plpgsql, the failing tests all seemed to be ones that are notorious
for platform-dependent output.
        regards, tom lane