Thread: 7.1 on DEC/Alpha

7.1 on DEC/Alpha

From
Brent Verner
Date:
Hi, I saw the thread from a few days ago about Linux/Alpha and 7.1. I
believe I'm seeing the same problems with DEC/Alpha (Tru64Unix 4.0D).

I noticed the following in the postmaster.log, which occurs, as the
Linux/Alpha bug report states, during the misc regression test.
  DEBUG:  copy: line 293, XLogWrite: had to create new log file - you probably should do checkpoints more often
  Server process (pid 24954) exited with status 139 at Fri Dec 22 17:15:48 2000
  Terminating any active server processes...
  Server processes were terminated at Fri Dec 22 17:15:48 2000
  Reinitializing shared memory and semaphores
  DEBUG:  starting up
  DEBUG:  database system was interrupted at 2000-12-22 17:15:47
  DEBUG:  CheckPoint record at (0, 316624)
  DEBUG:  Redo record at (0, 316624); Undo record at (0, 0); Shutdown TRUE

the full src/test/regress/log/postmaster.log can be snagged from
http://www.rcfile.org/postmaster.log

in addition to this, compiling on DEC/Alpha with gcc does not work,
without some shameful hackery :) as __INTERLOCKED_TESTBITSS_QUAD() is 
a builtin that gcc does not know about. The DEC cc builds pg properly.
either way pg is built the test results are much the same, esp the
FAILURE of misc regression test.

If there is anything else I can do to help get this working, please
let me know.
 Brent Verner


Re: 7.1 on DEC/Alpha

From
Brent Verner
Date:
On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:

observation:
 commenting out the queries with 'FROM person* p' causes the misc regression test to pass.
   SELECT p.name, p.hobbies.name FROM person* p;
 Brent

| Hi,
|   I saw the thread from a few days ago about Linux/Alpha and 7.1. I
| believe I'm seeing the same problems with DEC/Alpha (Tru64Unix 4.0D).
| 
| I noticed the following in the postmaster.log, which occurs, as the
| Linux/Alpha bug report states, during the misc regression test.
| 
|   DEBUG:  copy: line 293, XLogWrite: had to create new log file - you probably should do checkpoints more often
|   Server process (pid 24954) exited with status 139 at Fri Dec 22 17:15:48 2000
|   Terminating any active server processes...
|   Server processes were terminated at Fri Dec 22 17:15:48 2000
|   Reinitializing shared memory and semaphores
|   DEBUG:  starting up
|   DEBUG:  database system was interrupted at 2000-12-22 17:15:47
|   DEBUG:  CheckPoint record at (0, 316624)
|   DEBUG:  Redo record at (0, 316624); Undo record at (0, 0); Shutdown TRUE
| 
| the full src/test/regress/log/postmaster.log can be snagged from
| http://www.rcfile.org/postmaster.log
| 
| in addition to this, compiling on DEC/Alpha with gcc does not work,
| without some shameful hackery :) as __INTERLOCKED_TESTBITSS_QUAD() is 
| a builtin that gcc does not know about. The DEC cc builds pg properly.
| either way pg is built the test results are much the same, esp the
| FAILURE of misc regression test.
| 
| If there is anything else I can do to help get this working, please
| let me know.
| 
|   Brent Verner


Re: 7.1 on DEC/Alpha

From
Brent Verner
Date:
On 22 Dec 2000 at 21:58 (-0500), Brent Verner wrote:
| On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:
| 
| observation:
| 
|   commenting out the queries with 'FROM person* p' causes the misc
|   regression test to pass.

that's not what I meant to say. the misc test still FAILS, but it 
no longer causes pg to die.
 b


Re: 7.1 on DEC/Alpha

From
Brent Verner
Date:
here's a post-mortem.

#0  0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
    econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
#1  0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
    isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#2  0x1200cdd74 in ExecEvalFuncArgs (fcache=0x14016aa70, argList=0x14016a030,
    econtext=0x14016a030) at execQual.c:603
#3  0x1200cde54 in ExecMakeFunctionResult (fcache=0x14016aa70,
    arguments=0x1401616d0, econtext=0x14016a030, isNull=0x11fffdf88 "",
    isDone=0x0) at execQual.c:654
#4  0x1200ce224 in ExecEvalOper (opClause=0x1401615f0, econtext=0x14016a030,
    isNull=0x11fffdf88 "", isDone=0x0) at execQual.c:841
#5  0x1200cea24 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
    isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1204
#6  0x1200cec54 in ExecQual (qual=0x14016a1a0, econtext=0x14016a030)
    at execQual.c:1356
#7  0x1200cf2a8 in ExecScan (node=0x14016a1d0, accessMtd=0x1200d8320 <SeqNext>)
    at execScan.c:129
#8  0x1200d846c in ExecSeqScan (node=0x1401615f0) at nodeSeqscan.c:138
#9  0x1200cc280 in ExecProcNode (node=0x14016a1d0, parent=0x14016a1d0)
    at execProcnode.c:284
#10 0x1200ca8c0 in ExecutePlan (estate=0x14016a310, plan=0x14016a1d0,
    numberTuples=1, direction=ForwardScanDirection, destfunc=0x140020c20)
    at execMain.c:959
#11 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14016a310,
    count=0) at execMain.c:199
#12 0x1200d1140 in postquel_getnext (es=0x140160630) at functions.c:324
#13 0x1200d1300 in postquel_execute (es=0x140160630, fcinfo=0x1401604a0,
    fcache=0x140160590) at functions.c:417
#14 0x1200d14d8 in fmgr_sql (fcinfo=0x1401604a0) at functions.c:542
#15 0x1200ce09c in ExecMakeFunctionResult (fcache=0x140160480,
    arguments=0x14015e810, econtext=0x140119cd0, isNull=0x140160350 "",
    isDone=0x11fffe258) at execQual.c:712
#16 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0,
    isNull=0x140160350 "", isDone=0x11fffe258) at execQual.c:883
#17 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
    isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#18 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030,
    isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0)
    at execFlatten.c:56
#19 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
    isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#20 0x1200cdd74 in ExecEvalFuncArgs (fcache=0x140160290, argList=0x14016a030,
    econtext=0x140119cd0) at execQual.c:603
#21 0x1200cde54 in ExecMakeFunctionResult (fcache=0x140160290,
    arguments=0x14015e840, econtext=0x140119cd0, isNull=0x11fffe3a0 "",
    isDone=0x11fffe468) at execQual.c:654
#22 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0,
    isNull=0x11fffe3a0 "", isDone=0x11fffe468) at execQual.c:883
#23 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
    isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#24 0x1200ce574 in ExecEvalFieldSelect (fselect=0x14015e720,
    econtext=0x14016a030, isNull=0x11fffe3a0 "", isDone=0x0) at execQual.c:1091
#25 0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
    isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#26 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030,
    isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0)
    at execFlatten.c:56
#27 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
    isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#28 0x1200ceea4 in ExecTargetList (targetlist=0x14015e870,
    targettype=0x140160000, values=0x140160260, econtext=0x140119cd0,
    isDone=0x11fffe5a8) at execQual.c:1528
#29 0x1200cf1a8 in ExecProject (projInfo=0x0, isDone=0x1) at execQual.c:1751
#30 0x1200d8074 in ExecResult (node=0x14015e5b0) at nodeResult.c:167
#31 0x1200cc238 in ExecProcNode (node=0x14015e5b0, parent=0x14015e5b0)
    at execProcnode.c:272
#32 0x1200ca8c0 in ExecutePlan (estate=0x14015eab0, plan=0x14015e5b0,
    numberTuples=0, direction=ForwardScanDirection, destfunc=0x1401603a0)
    at execMain.c:959
#33 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14015eab0,
    count=0) at execMain.c:199
#34 0x12013e5c0 in ProcessQuery (parsetree=0x14015ea80, plan=0x140160000)
    at pquery.c:305
#35 0x12013c568 in pg_exec_query_string (
    query_string=0x140115310 "SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person* p;",
    parse_context=0x1400c5c60) at postgres.c:817
#36 0x12013dd10 in PostgresMain (argv=0x11fffe9a8, real_argv=0x11ffffae8,
    username=0x1400b72f9 "pgadmin") at postgres.c:1827
#37 0x12011aef0 in DoBackend (port=0x1400b7080) at postmaster.c:2021
#38 0x12011a888 in BackendStartup (port=0x1400b7080) at postmaster.c:1798
#39 0x12011938c in ServerLoop () at postmaster.c:957
#40 0x120118c10 in PostmasterMain (argv=0x11ffffae8) at postmaster.c:664
#41 0x1200e5980 in main (argv=0x11ffffae8) at main.c:138



Re: Re: 7.1 on DEC/Alpha

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> here's a post-mortem.

> #0  0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0, 
>     econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096

Looks reasonable as far as it goes.  Evidently the crash is in the
heap_getattr macro call at line 1096 of src/backend/executor/execQual.c.
We need to look at the data structures that macro uses.
What do you get from

p *fselect

p *econtext

p *resSlot->val

p *resSlot->ttc_tupleDescriptor

BTW, if you didn't configure with --enable-cassert, it'd be a good idea
to go back and try it that way...
        regards, tom lane


Re: Re: 7.1 on DEC/Alpha

From
Brent Verner
Date:
On 24 Dec 2000 at 01:00 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > here's a post-mortem.
| 
| > #0  0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0, 
| >     econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
| 
| Looks reasonable as far as it goes.  Evidently the crash is in the
| heap_getattr macro call at line 1096 of src/backend/executor/execQual.c.
| We need to look at the data structures that macro uses.
| What do you get from
| 
| p *fselect

$1 = {type = T_FieldSelect, arg = 0x140169d40, fieldnum = 1, resulttype = 25,  resulttypmod = -1}

| p *econtext

$2 = {type = T_ExprContext, ecxt_scantuple = 0x14016a568,
  ecxt_innertuple = 0x0, ecxt_outertuple = 0x0,
  ecxt_per_query_memory = 0x1400c5df0, ecxt_per_tuple_memory = 0x1400c6670,
  ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x140141760,
  ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}

| p *resSlot->val

Error accessing memory address 0x40141838: Invalid argument.
| p *resSlot->ttc_tupleDescriptor

Error accessing memory address 0x40141848: Invalid argument.


additionally:

(gdb) p result
$4 = 1075058736

(gdb) p *resSlot
Error accessing memory address 0x40141830: Invalid argument.


| BTW, if you didn't configure with --enable-cassert, it'd be a good idea
| to go back and try it that way...

will reconfig/rebuild shortly.
 brent


Re: 7.1 on DEC/Alpha

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> (gdb) p *resSlot
> Error accessing memory address 0x40141830: Invalid argument.

Oooh.  resSlot has been truncated to 32 bits --- judging by the other
nearby pointer values, it almost certainly should have been 0x140141830.
Now we have a lead.

I am guessing that the truncation happened somewhere in
executor/functions.c, but don't see it right away...
        regards, tom lane
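
The arithmetic behind this diagnosis can be checked with a few lines of C. This is only a sketch added for illustration, not code from the PostgreSQL tree; the helper names low_32_bits and check_values_from_thread are made up here. It takes the pointer value guessed above (0x140141830), keeps its low 32 bits, and compares the result with the address gdb complained about and with the decimal value gdb printed for "p result" earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit value, which is what is left
 * when an 8-byte pointer is squeezed through a 4-byte field. */
static uint64_t low_32_bits(uint64_t value)
{
    return value & UINT64_C(0xFFFFFFFF);
}

/* Compare the values quoted in the thread against each other. */
static void check_values_from_thread(void)
{
    const uint64_t expected_ptr = UINT64_C(0x140141830); /* the guessed full pointer */
    const uint64_t seen_ptr     = UINT64_C(0x40141830);  /* address gdb could not read */
    const uint64_t seen_result  = UINT64_C(1075058736);  /* gdb's "p result", in decimal */

    assert(low_32_bits(expected_ptr) == seen_ptr);
    assert(seen_result == seen_ptr);
}
```

Both assertions hold, so the decimal "result" and the unreadable resSlot address are the same 32-bit remnant of one 64-bit pointer.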


Re: Re: 7.1 on DEC/Alpha

From
Brent Verner
Date:
On 24 Dec 2000 at 00:47 (-0500), Tom Lane wrote:
|
| > I'll send the patch that allows me to
| > cleanly build with gcc. right now, s_lock.h does the wrong thing
| > when compiling on Alpha/OSF with gcc.
|
| Roger, we want to build with either.

The attached patch _seems_ to do the right thing. could someone
who knows Alpha assembly check it out (please).

for more info on Alpha assembly, this link may help.
http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/APS31DTE/TITLE.HTM

  brent 'who learned too much today'

Attachment
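
The attachment itself is not reproduced here. As a rough, portable illustration of what a TAS() primitive has to do (and explicitly not the Alpha assembly from the patch), here is a minimal sketch using C11 atomics. The names slock_sketch_t, tas_sketch, lock_sketch and unlock_sketch are invented for this example, and the return convention (0 means the caller got the lock) is an assumption about how s_lock.h uses TAS().

```c
#include <assert.h>
#include <stdatomic.h>

typedef atomic_flag slock_sketch_t;   /* hypothetical stand-in for slock_t */

/* Atomically set the flag and report its previous state.
 * Returns 0 if the lock was free (caller now holds it),
 * nonzero if somebody else already held it. */
static int tas_sketch(slock_sketch_t *lock)
{
    return atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

/* Release the lock so the next test-and-set can succeed. */
static void unlock_sketch(slock_sketch_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Spin until the test-and-set reports that the lock was free. */
static void lock_sketch(slock_sketch_t *lock)
{
    assert(lock != 0);
    while (tas_sketch(lock))
        ;   /* busy-wait; real code would back off or sleep */
}
```

A lock declared as `slock_sketch_t lock = ATOMIC_FLAG_INIT;` starts out free. A hand-written Alpha version would need the same two properties: the test and the set happen as one atomic step, and the memory ordering keeps protected data from being read before the lock is actually held.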

Re: 7.1 on DEC/Alpha

From
Brent Verner
Date:
On 24 Dec 2000 at 01:19 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > (gdb) p *resSlot
| > Error accessing memory address 0x40141830: Invalid argument.
| 
| Oooh.  resSlot has been truncated to 32 bits --- judging by the other
| nearby pointer values, it almost certainly should have been 0x140141830.
| Now we have a lead.

FWIW, saying 'set econtext->ecxt_param_list_info->value 0x14014183' in
gdb allows the process to not SEGV where it _was_ destined to do so,
though it does SEGV in a later return to the function. I've tried to
determine where this value is originating, and where it is subsequently
modified, but have not been able to do so. lost in gdb.

Q: I tried doing 'watch <address>', but this (appeared) to just hang. is
there some trick to using 'watch' on addresses that I might be overlooking?

| I am guessing that the truncation happened somewhere in
| executor/functions.c, but don't see it right away...

more observations WRT sql that blows up postgres on Alpha.

works:
  SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
    FROM ONLY person p;

breaks:
  SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
    FROM person p;
  SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
    FROM person* p;

whatever it is that ONLY causes, avoids the breakage. I've spent the
past two days in a gdb-hole, going in circles. I just think I don't know
enough (about gdb or postgres) to make any further progress. anyway,
if someone could tell me what difference the ONLY keyword makes WRT
pg internally, it might help me quit running in circles.

thanks. brent



Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> more observations WRT sql that blows up postgres on Alpha.
> works:
>   SELECT p.hobbies.equipment.name, p.hobbies.name, p.name 
>     FROM ONLY person p;
> breaks:
>   SELECT p.hobbies.equipment.name, p.hobbies.name, p.name 
>     FROM person p;
>   SELECT p.hobbies.equipment.name, p.hobbies.name, p.name 
>     FROM person* p;

OK, I see the problem.  The breakage actually is present in 7.0.* and
prior versions as well, it just doesn't happen to be exposed by the
regress tests --- until now.

The trouble is the way that entire-tuple function arguments are handled.
Tuple types are declared in pg_type as being the same size as Oid, ie,
4 bytes.  This reflects situations where a tuple value is represented by
an Oid reference to a row in a table.  (I am not sure whether there is
any code left that depends on that ... in any case I'm nervous about
changing it during beta.)  But the expression evaluator's implementation
of a tuple argument is that the Datum value contains a pointer to a
TupleTableSlot.  This works fine as long as the Datum is just passed
around as a Datum, but if anyone tries to form a tuple containing that
Datum, only 4 bytes get stored into the tuple.  Result: failure on
machines where pointers are wider than 4 bytes.

The reason this shows up in this particular regression test now, and
not before, is that 7.1 does the function evaluations at the top of
the Append plan that implements inheritance union, whereas 7.0 did it
at the bottom.  That means that in 7.1, the TupleTableSlot Datum gets
inserted into a tuple that becomes part of the Append output before
it gets to the function execution.  7.0 would still show the bug
under the right circumstances --- a join would do it, for example.

I think that there may still be cases where an Oid is the correct
representation of a tuple type; anyway I'm afraid to foreclose that
possibility.  What I'm thinking about doing is setting typmod of
an entire-tuple function argument to sizeof(Pointer), rather than
the default -1, to indicate that a pointer representation is being
used.  Comments, hackers?
        regards, tom lane
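
To make the failure mode concrete, here is a small self-contained sketch of "only 4 bytes get stored into the tuple". It does not use PostgreSQL's real tuple-forming code; DatumSketch, store_att_sketch and fetch_att_sketch are hypothetical stand-ins, and the Datum is modelled as a fixed 64-bit integer so the behaviour is the same on any machine. The value 0x140141830 is the pointer value discussed earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t DatumSketch;   /* stand-in for an 8-byte Datum */

/* Store a datum into tuple storage using the column's declared length. */
static void store_att_sketch(unsigned char *dest, DatumSketch value, int attlen)
{
    assert(attlen == 4 || attlen == 8);
    if (attlen == 4) {
        uint32_t v = (uint32_t) value;   /* upper 32 bits are dropped here */
        memcpy(dest, &v, sizeof v);
    } else {
        memcpy(dest, &value, sizeof value);
    }
}

/* Read the datum back, again trusting the declared length. */
static DatumSketch fetch_att_sketch(const unsigned char *src, int attlen)
{
    assert(attlen == 4 || attlen == 8);
    if (attlen == 4) {
        uint32_t v;
        memcpy(&v, src, sizeof v);
        return (DatumSketch) v;
    } else {
        DatumSketch v;
        memcpy(&v, src, sizeof v);
        return v;
    }
}
```

Storing 0x140141830 with a declared length of 4 and fetching it again yields 0x40141830, while a declared length of 8 round-trips the value unchanged. On a machine with 4-byte pointers the two cases coincide, which fits the bug going unnoticed there.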


Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Brent Verner
Date:
On 26 Dec 2000 at 14:41 (-0500), Tom Lane wrote:
| I wrote:
| > ... What I'm thinking about doing is setting typmod of
| > an entire-tuple function argument to sizeof(Pointer), rather than
| > the default -1, to indicate that a pointer representation is being
| > used.  Comments, hackers?
|
| Here is a patch to current sources along this line.  I have not
| committed it, since I'm not sure it does the job.  It doesn't break
| the regress tests on my machine, but does it fix them on Alphas?
| Please apply it locally and let me know what you find.

what I'm seeing now is much the same. FWIW, it looks like we're picking
up the cruft around

  functions.c:354    paramLI->value = fcinfo->arg[paramLI->id - 1];

(both of which are type Datum)

i've been in circles trying to figure out where fcinfo->arg is filled.
can you point me toward that?

thanks for your help.
  brent

Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> | Please apply it locally and let me know what you find.

> what I'm seeing now is much the same.

Drat.  More to do, then.

> i've been in circles trying to figure out where fcinfo->arg is filled.
> can you point me toward that?

See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c.
But fmgr is probably only the carrier of disease, not the source...

            regards, tom lane

Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Brent Verner
Date:
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
| 
| > what I'm seeing now is much the same.

sorry, I sent the previous email w/o the details of the different 
behavior. Inside ExecEvalFieldSelect(), result is now 303, instead
of 110599844 (...or whatever it was). I'm not sure if this gives
you any additional clues.

thanks. brent


Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Brent Verner
Date:
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
| 
| > what I'm seeing now is much the same.
| 
| Drat.  More to do, then.
| 
| > i've been in circles trying to figure out where fcinfo->arg is filled.
| > can you point me toward that?
| 
| See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c.
| But fmgr is probably only the carrier of disease, not the source...

ok, I've tracked this further (in the right direction I hope:).

these are the steps leading up to the assignment of the fscked
fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually
blow up ExecEvalFieldSelect.


Breakpoint 4, ExecMakeFunctionResult (fcache=0x14014e700,
    arguments=0x14014c850, econtext=0x140127ae0, isNull=0x14014e390 "",
    isDone=0x11fffde78) at execQual.c:652
652             if (fcache->fcinfo.nargs > 0 && !fcache->argsValid)
(gdb) print fcache->fcinfo
$56 = {flinfo = 0x14014e700, context = 0x0, resultinfo = 0x14014e7d0,
  isnull = 0 '\000', nargs = 1, arg = {0 <repeats 16 times>},
  argnull = '\000' <repeats 15 times>}
(gdb) cont
Breakpoint 6, ExecEvalVar (variable=0x14014c820, econtext=0x140127ae0,    isNull=0x14014e7c0 "") at execQual.c:298
298             switch (variable->varno)
(gdb) print *variable
$57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220,
  vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}
(gdb) print *econtext
$58 = {type = T_ExprContext, ecxt_scantuple = 0x14014cc58,
  ecxt_innertuple = 0x0, ecxt_outertuple = 0x14014cc58,
  ecxt_per_query_memory = 0x1400e6370, ecxt_per_tuple_memory = 0x1400e66a0,
  ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x0,
  ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}
(gdb) break 313
(gdb) cont
(gdb) print *slot
$60 = {type = T_TupleTableSlot, val = 0x14014e430, ttc_shouldFree = 0 '\000',
  ttc_descIsNew = 1 '\001', ttc_tupleDescriptor = 0x14014ded0, ttc_buffer = 0}
(gdb) break 353
(gdb) cont
(gdb) print *heapTuple
$73 = {t_len = 48, t_self = {ip_blkid = {bi_hi = 65535, bi_lo = 65535},
    ip_posid = 0}, t_tableOid = 0, t_datamcxt = 0x1400e6370,
  t_data = 0x14014e450}
(gdb) print attnum
$74 = 1
(gdb) print *tuple_type
$75 = {natts = 2, attrs = 0x14014df00, constr = 0x0}
(gdb) print isNull
$76 = (bool *) 0x14014e7c0 ""
(gdb) break 359
(gdb) cont
# after heap_getattr, we have the smashed value.
(gdb) print result
$79 = 303


is this nearing the problem, or still simply witnessing symptoms?
 brent 'delirious from sleep dep.'



Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Brent Verner
Date:
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat.  More to do, then.

after hours in the gdb-hole, I see this... maybe a clue? :)

src/backend/access/common/heaptuple.c:

    {

      /*
       * Fix me when going to a machine with more than a four-byte
       * word!
       */
      off = att_align(off, att[j]->attlen, att[j]->attalign);

      att[j]->attcacheoff = off;

      off = att_addlength(off, att[j]->attlen, tp + off);
    }

I'm pretty sure I don't know best how to fix this, but I've got some
randomly entered code compiling now :)  If it passes the regression
tests I'll send it along.

  brent 'glad the coffee shop in the backyard is open now :)'
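
For readers following along, the loop quoted above does two things per attribute: round the running offset up to the attribute's alignment, remember it, then advance by the attribute's length. A minimal sketch of that pattern for fixed-length attributes follows. It is not the PostgreSQL macros themselves; AttrSketch, align_up and compute_offsets are names invented here, alignments are assumed to be powers of two, and variable-length attributes are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Round `off` up to the next multiple of `align` (a power of two). */
static size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + (align - 1)) & ~(align - 1);
}

/* Hypothetical, simplified attribute description: fixed length only. */
typedef struct {
    size_t len;     /* bytes occupied by the value */
    size_t align;   /* required alignment in bytes */
} AttrSketch;

/* Record the aligned start offset of each attribute in offsets[]
 * and return the total length of the data area. */
static size_t compute_offsets(const AttrSketch *atts, size_t natts,
                              size_t *offsets)
{
    size_t off = 0;
    for (size_t j = 0; j < natts; j++) {
        off = align_up(off, atts[j].align);   /* plays the role of att_align */
        offsets[j] = off;                     /* plays the role of attcacheoff */
        off += atts[j].len;                   /* plays the role of att_addlength */
    }
    return off;
}
```

With attributes of (len 4, align 4), (len 8, align 8) and (len 2, align 2), the offsets come out as 0, 8 and 16 and the total as 18. Nothing in this arithmetic depends on the machine word size as long as each attribute's declared length is right.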


Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> after hours in the gdb-hole, I see this... maybe a clue? :)

I don't think that comment means anything.  Possibly it's a leftover
from a time when there was something unportable there.  But if att_align
were broken on Alphas, you'd have a lot worse problems than what you're
seeing.

            regards, tom lane

Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> these are the steps leading up the the assignment of the fscked
> fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually
> blow up ExecEvalFieldSelect.

That looks OK as far as it goes.  Inside ExecEvalVar, you need to look
at the tuple_type data structure in more detail, specifically

p *tuple_type->attrs[0]

p *tuple_type->attrs[1]

(I think the leading * is correct here, try omitting it if gdb gets
unhappy.)

> (gdb) print *variable
> $57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220, 
>   vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}

That part looks promising --- vartypmod is sizeof(Pointer) not -1,
so the front-end part of my patch seems to be working.  What I suspect
we'll find is that the tupledesc doesn't show sizeof the first field to
be 8 the way we want.  Which would imply that I missed a place (or
multiple places :-() that needs to know about the convention for typmod
of a tuple datatype.
        regards, tom lane
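
The convention being discussed can be pictured with a tiny sketch. This is only an illustration of the idea as described in this thread, not the actual patch; tuple_datum_length_sketch and OID_SIZE_SKETCH are hypothetical names, and where such a check would live in the backend is not something the thread settles.

```c
#include <assert.h>

enum { OID_SIZE_SKETCH = 4 };   /* tuple types are declared Oid-sized */

/* Decide how many bytes a tuple-typed value needs when stored.
 * typmod == sizeof(pointer) flags the pointer representation;
 * the default typmod of -1 leaves the declared 4-byte length alone. */
static int tuple_datum_length_sketch(int typlen, int typmod)
{
    assert(typlen == OID_SIZE_SKETCH);
    if (typmod == (int) sizeof(void *))
        return (int) sizeof(void *);
    return typlen;
}
```

Any code path that builds a tuple descriptor without consulting typmod in this way would keep the 4-byte length, which is the kind of missed place suspected above.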


Brent Verner <brent@rcfile.org> writes:
> | Hm.  I thought I'd fixed that.  Are you up to date on
> | src/backend/utils/adt/oid.c ?  Current CVS has rev 1.42.

> yup. got that version -- 1.42 2000/12/22 21:36:09 tgl

You're right, it was still broken :-(.  I think I've got it now, though.

Oliver Elphick was kind enough to arrange access to an Alpha running
Debian Linux, and I find that current-as-of-this-moment sources pass
all regression tests in either serial or parallel test mode on that
system.  Curiously, however, the system fails when you try to shut
it down:

Smart Shutdown request at Thu Dec 28 02:41:49 2000
DEBUG:  shutting down
FATAL 2:  Checkpoint lock is busy while data base is shutting down
Shutdown failed - abort

I have no idea why this should be.  Evidently there's something wrong
with the TAS() macro --- yet it seems to work fine elsewhere.  Ideas
anyone?
        regards, tom lane


Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
Brent Verner
Date:
On 27 Dec 2000 at 21:45 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Hm.  I thought I'd fixed that.  Are you up to date on
| > | src/backend/utils/adt/oid.c ?  Current CVS has rev 1.42.
| 
| > yup. got that version -- 1.42 2000/12/22 21:36:09 tgl
| 
| You're right, it was still broken :-(.  I think I've got it now, though.

i'll check it tomorrow.

| Oliver Elphick was kind enough to arrange access to an Alpha running
| Debian Linux, and I find that current-as-of-this-moment sources pass
| all regression tests in either serial or parallel test mode on that
| system.  Curiously, however, the system fails when you try to shut
| it down:

good. I'm glad you guys linked up :)

| Smart Shutdown request at Thu Dec 28 02:41:49 2000
| DEBUG:  shutting down
| FATAL 2:  Checkpoint lock is busy while data base is shutting down
| Shutdown failed - abort

I'm not seeing this with my latest revision of the TAS() asm.

Smart Shutdown request at Wed Dec 27 19:25:45 2000
DEBUG:  shutting down
DEBUG:  MoveOfflineLogs: remove 0000000000000000
DEBUG:  database system is shut down

| I have no idea why this should be.  Evidently there's something wrong
| with the TAS() macro --- yet it seems to work fine elsewhere.  Ideas
| anyone?

re-evaluating the asm stuff now.

thanks. brent


Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
"Oliver Elphick"
Date:
Tom Lane wrote:
  ...
  >system.  Curiously, however, the system fails when you try to shut
  >it down:
  >
  >Smart Shutdown request at Thu Dec 28 02:41:49 2000
  >DEBUG:  shutting down
  >FATAL 2:  Checkpoint lock is busy while data base is shutting down
  >Shutdown failed - abort
  >
  >I have no idea why this should be.  Evidently there's something wrong
  >with the TAS() macro --- yet it seems to work fine elsewhere.  Ideas
  >anyone?

It's not just on Alpha; I've seen that on my i386 Linux system.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47  6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
========================================
     "For God shall bring every work into judgment,
      with every secret thing, whether it be good, or
      whether it be evil."               Ecclesiastes 12:14




"Oliver Elphick" <olly@lfix.co.uk> writes:
>> Smart Shutdown request at Thu Dec 28 02:41:49 2000
>> DEBUG:  shutting down
>> FATAL 2:  Checkpoint lock is busy while data base is shutting down
>> Shutdown failed - abort
> It's not just on Alpha; I've seen that on my i386 Linux system.

Oooh, that's interesting.  I was just blindly assuming that it was
a problem with the Alpha spinlock code (we've sure heard plenty of
discussion of same).  But maybe there's an actual logic bug in the
checkpoint code.  I don't see one in a quick scan though.

FWIW, I do *not* see this behavior on HPUX.  It seems perfectly
reproducible on the Debian Alpha box.  Is it reproducible on your
i386 box, or only sometimes?

Vadim, any ideas?
        regards, tom lane


Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

From
"Oliver Elphick"
Date:
Tom Lane wrote:
  >"Oliver Elphick" <olly@lfix.co.uk> writes:
  >>> FATAL 2:  Checkpoint lock is busy while data base is shutting down
  >> It's not just on Alpha; I've seen that on my i386 Linux system.
  >
  >FWIW, I do *not* see this behavior on HPUX.  It seems perfectly
  >reproducible on the Debian Alpha box.  Is it reproducible on your
  >i386 box, or only sometimes?


Hmm. I'm just waking up a bit more.  Now I'm thinking slightly more
clearly, I saw the problem yesterday when I was doing an Alpha build
on faure.debian.org; so I think it was actually on Alpha, not i386 after
all.  Sorry for the red herring.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47  6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
========================================
     "For God shall bring every work into judgment,
      with every secret thing, whether it be good, or
      whether it be evil."               Ecclesiastes 12:14