Re: [HACKERS] Re: SIGBUS in AllocSetAlloc & jdbc - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [HACKERS] Re: SIGBUS in AllocSetAlloc & jdbc
Date
Msg-id 199905090312.MAA00466@ext16.sra.co.jp
Whole thread Raw
In response to Re: [HACKERS] Re: SIGBUS in AllocSetAlloc & jdbc  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses Re: [HACKERS] Re: SIGBUS in AllocSetAlloc & jdbc  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] Re: SIGBUS in AllocSetAlloc & jdbc  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> > Hmm.  The documentation does say somewhere that LO object handles are
> > only good within a transaction ... so it's amazing this worked reliably
> > under 6.4.x.
> > 
> > Is there any way we could improve the backend's LO functions to defend
> > against this sort of misuse, rather than blindly accepting a stale
> > filehandle?
> 
> It should not be very difficult. We could explicitly close LO
> filehandles on commits.
> 
> But I'm now not confident on this. From comments in be-fsstubs.c:
> 
> >Builtin functions for open/close/read/write operations on large objects.
> >These functions operate in the current portal variable context, which
> >means the large object descriptors hang around between transactions and
> >are not deallocated until explicitly closed, or until the portal is
> >closed.
> 
> If above is true, LO filehandles should be able to survive between
> transactions.
> 
> Following data are included in them. My question is: Can these data
> survive between transactions? I guess not.
> 
> typedef struct LargeObjectDesc
> {
>     Relation    heap_r;        /* heap relation */
>     Relation    index_r;    /* index relation on seqno attribute */
>     IndexScanDesc iscan;        /* index scan we're using */
>     TupleDesc    hdesc;        /* heap relation tuple desc */
>     TupleDesc    idesc;        /* index relation tuple desc */
> 
>     [snip]

The answer was yes. Since these data are allocated in
GlobalMemoryContext, they could survive after commit. So next question
is why the backend crashes if LO calls are not in a transaction?

At the commit time _lo_commit() is called under GlobalMemoryContext to
destroy IndexScanDesc. So it seems the IndexScanDesc is supposed to be
created under GlobalMemoryContext.  But actually lo_read/lo_write,
they might create IndexScanDesc also, may not be called under the
context since no context switching is made with them(I don't know why
since other LO calls make context switching).  As a result it's
possible that IndexScanDesc might be freed under a wrong context. Too
bad.

This would not happen if lo_seek is done (it's executed under
GlobalMemoryContext) then lo_read/lo_write gets called(they reuse
IndexScanDesc created by inv_seek) *AND* no committing is done before
lo_read/lo_write. This is why we do not observe the backend crash with
ImageViewer running in a transaction.

But I must say other apps may not be as lucky as ImageViewer since
it's not always the case that lo_seek is called prior to
lo_read/lo_write.

[ BTW, ImageViewer seems to make calls to following set of LOs *twice*
to display an image. Why?

lo_open
lo_tell
lo_lseek
lo_lseek
lo_read
lo_close
]

Possible solutions might be:

(1) do a context switching in lo_read/lo_write

(2) ask apps not to make LO calls between transactions

(3) close LOs fd at commit

(2) is the current situation but not very confortable for us. Also for
a certain app this is not a solution as I mentioned above. (3) seems
reasonable but people might be surprised to find their existing apps
won't run any more. Moreover, changings might not be trivial and it
make me nervous since we don't have enough time before 6.5 is
out. With (1) modifications would be minimum, and we can keep the
backward compatibility for apps.  So my conclusion is that (1) is the
best. If there's no objection, I will commit the change for (1).
---
Tatsuo Ishii



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] NUMERIC type conversions leave much to be desired
Next
From: Thomas Lockhart
Date:
Subject: Re: [HACKERS] NUMERIC type conversions leave much to be desired