Re: Question about MemoryContextRegisterResetCallback - Mailing list pgsql-general

From Michel Pelletier
Subject Re: Question about MemoryContextRegisterResetCallback
Date
Msg-id CACxu=v+o-qo6vFf9xV+o-PgaDAzYgyZx2kchPBS2EVmStCu81Q@mail.gmail.com
In response to Re: Question about MemoryContextRegisterResetCallback  (Michel Pelletier <pelletier.michel@gmail.com>)
Responses Re: Question about MemoryContextRegisterResetCallback  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
After absorbing some of the code you've pointed out, I have a couple of questions about my understanding before I start hacking on expanded matrices.

Serializing sparse matrices can be done with the _expand/_build functions, and the size is known, so I can implement the EOM_get_flat_size_method and EOM_flatten_into_method callbacks. From the array examples, it looks like accessor functions are responsible for detecting a flattened argument and expanding it themselves, so I think I've got that understood.
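For reference, the get_flat_size/flatten_into contract can be modeled outside PostgreSQL: the first callback reports the exact byte count, and the second fills a caller-allocated buffer of that size with a self-contained copy. The sketch below is a standalone toy, not PostgreSQL or GraphBLAS code. SparseMatrix, FlatHeader, sm_get_flat_size, sm_flatten_into and sm_expand are hypothetical names, the matrix is assumed to be stored as coordinate (row, col, value) triples, and a real type would start its flat blob with a varlena length word instead of FlatHeader.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "expanded" sparse matrix in coordinate form (toy model). */
typedef struct {
    uint64_t nrows, ncols, nvals;
    uint64_t *rows, *cols;   /* coordinates of each stored entry */
    int64_t  *vals;          /* value of each stored entry */
} SparseMatrix;

/* Header at the start of the flat blob (a real type would use a varlena header). */
typedef struct {
    uint64_t nrows, ncols, nvals;
} FlatHeader;

/* Analogue of get_flat_size: exact number of bytes the flat form needs. */
static size_t sm_get_flat_size(const SparseMatrix *m) {
    return sizeof(FlatHeader) +
           m->nvals * (2 * sizeof(uint64_t) + sizeof(int64_t));
}

/* Analogue of flatten_into: write a contiguous, pointer-free copy of m
 * into a buffer of exactly sm_get_flat_size(m) bytes. */
static void sm_flatten_into(const SparseMatrix *m, void *result, size_t allocated_size) {
    assert(allocated_size == sm_get_flat_size(m));
    char *p = result;
    FlatHeader h = { m->nrows, m->ncols, m->nvals };
    memcpy(p, &h, sizeof h);                         p += sizeof h;
    memcpy(p, m->rows, m->nvals * sizeof(uint64_t)); p += m->nvals * sizeof(uint64_t);
    memcpy(p, m->cols, m->nvals * sizeof(uint64_t)); p += m->nvals * sizeof(uint64_t);
    memcpy(p, m->vals, m->nvals * sizeof(int64_t));
}

/* What an accessor would do when handed the flat form: rebuild the
 * expanded representation by copying the arrays back out of the blob. */
static SparseMatrix *sm_expand(const void *flat) {
    const char *p = flat;
    FlatHeader h;
    memcpy(&h, p, sizeof h); p += sizeof h;
    SparseMatrix *m = malloc(sizeof *m);
    m->nrows = h.nrows; m->ncols = h.ncols; m->nvals = h.nvals;
    m->rows = malloc(h.nvals * sizeof(uint64_t));
    m->cols = malloc(h.nvals * sizeof(uint64_t));
    m->vals = malloc(h.nvals * sizeof(int64_t));
    memcpy(m->rows, p, h.nvals * sizeof(uint64_t)); p += h.nvals * sizeof(uint64_t);
    memcpy(m->cols, p, h.nvals * sizeof(uint64_t)); p += h.nvals * sizeof(uint64_t);
    memcpy(m->vals, p, h.nvals * sizeof(int64_t));
    return m;
}
```

Because the flat form holds only plain bytes, any byte-wise copy of it stays valid, which is what lets generic code move it between memory contexts.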

expandeddatum.h says: "The format appearing on disk is called the data type's 'flattened' representation, since it is required to be a contiguous blob of bytes -- but the type can have an expanded representation that is not. Data types must provide means to translate an expanded representation back to flattened form."

It mentions "on disk"; does this mean the flattened representation must be binary-compatible with what matrix_send emits? They will likely be the same for now, so I can see this as a convenience, but is it a requirement? Future matrix_send implementations may use some form of compressed sparse row format, which would be inefficient for in-memory copies.

Thanks again,

-Michel

On Sun, Jan 13, 2019 at 10:51 AM Michel Pelletier <pelletier.michel@gmail.com> wrote:
On Sun, Jan 13, 2019 at 9:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I suppose what you're doing is returning a pointer to a GraphBLAS object
as a Datum (or part of a pass-by-ref Datum)?  If so, that's not going
to work terribly well, because it ignores the problem that datatype-
independent code is going to assume it can copy Datum values using
datumCopy() or equivalent logic.  More often than not, such copying
is done to move the value into a different memory context in preparation
for freeing the original context.  If you delete the GraphBLAS object
when the original context is deleted, you now have a dangling pointer
in the copy.
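The dangling-pointer problem can be shown with a small standalone model. Here toy_datum_copy is a hypothetical stand-in for what datumCopy() does with a pass-by-reference value (copy a known number of bytes, with no idea what they mean), and ExternalObject stands in for a GraphBLAS matrix. This is an illustration under those assumptions, not PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stands in for an externally managed object such as a GraphBLAS matrix. */
typedef struct {
    int64_t values[4];
} ExternalObject;

/* A pass-by-reference "datum" whose payload is only a pointer. */
typedef struct {
    uint32_t len;          /* total length, loosely like a varlena header */
    ExternalObject *obj;   /* points outside the datum's own bytes */
} PointerDatum;

/* A self-contained datum: the values live inside its own bytes. */
typedef struct {
    uint32_t len;
    int64_t values[4];
} FlatDatum;

/* Toy model of datumCopy() for a pass-by-reference type: copy len bytes
 * into fresh memory, without interpreting them. */
static void *toy_datum_copy(const void *src, size_t len) {
    void *dst = malloc(len);
    memcpy(dst, src, len);
    return dst;
}
```

Copying a PointerDatum this way yields a struct whose obj field equals the original's, so both refer to the one external object; once that object is freed along with the original context, dereferencing the copy's obj is undefined behaviour. Copying a FlatDatum yields an independent value that survives the original being freed.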

We did invent some infrastructure a while ago that could potentially
handle this sort of situation: it's the "expanded datum" stuff.
The idea here would be that your representation involving a GraphBLAS
pointer would be an efficient-to-operate-on expanded object.  You
would need to be able to serialize and deserialize that representation
into plain self-contained Datums (probably varlena blobs), but hopefully
GraphBLAS is capable of going along with that.  You'd still need a
memory context reset callback attached to each expanded object, to
free the associated GraphBLAS object --- but expanded objects are
explicitly aware of which context they're in, so at least in principle
that should work.  (I'm not sure anyone's actually tried to build
an expanded-object representation that has external resources, so
we might find there are some bugs to fix there.)
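In PostgreSQL the real pieces are a MemoryContextCallback (fields func, arg, next) registered with MemoryContextRegisterResetCallback(context, &cb); for an expanded object, the natural context is its own eoh_context. The standalone sketch below only models those semantics so they can be run outside the server: callbacks fire newest-first on reset and are consumed, so each fires once. ToyContext, ToyExpandedMatrix, toy_register_reset_callback, toy_context_reset and free_external are hypothetical names, and a calloc'd array stands in for a GraphBLAS handle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*CallbackFunc)(void *arg);

/* Modeled on MemoryContextCallback: function, argument, list link. */
typedef struct Callback {
    CallbackFunc func;
    void *arg;
    struct Callback *next;
} Callback;

typedef struct {
    Callback *reset_cbs;   /* newest registration first */
} ToyContext;

static void toy_register_reset_callback(ToyContext *cxt, Callback *cb) {
    cb->next = cxt->reset_cbs;
    cxt->reset_cbs = cb;
}

/* Run and forget every registered callback, newest first. A real context
 * would release its own memory chunks afterwards. */
static void toy_context_reset(ToyContext *cxt) {
    Callback *cb;
    while ((cb = cxt->reset_cbs) != NULL) {
        cxt->reset_cbs = cb->next;   /* unlink first so each fires once */
        cb->func(cb->arg);
    }
}

/* Hypothetical expanded object that owns a non-context (external) resource. */
typedef struct {
    Callback cb;        /* embedded, as it might be in the object's own memory */
    int64_t *external;  /* stands in for a GraphBLAS matrix handle */
} ToyExpandedMatrix;

static int external_frees = 0;

static void free_external(void *arg) {
    ToyExpandedMatrix *em = arg;
    free(em->external);
    em->external = NULL;
    external_frees++;
}

/* Allocate the external resource and tie its lifetime to cxt. */
static void toy_expanded_init(ToyExpandedMatrix *em, ToyContext *cxt, size_t n) {
    em->external = calloc(n, sizeof(int64_t));
    em->cb.func = free_external;
    em->cb.arg = em;
    toy_register_reset_callback(cxt, &em->cb);
}
```

The design point this mirrors is that the object registers its cleanup on the context it lives in, so whoever resets or deletes that context also releases the external resource, without needing to know the resource exists.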

Take a look at

src/include/utils/expandeddatum.h
src/backend/utils/adt/expandeddatum.c
src/backend/utils/adt/array_expanded.c
src/backend/utils/adt/expandedrecord.c


Ah, I see; the water is much deeper here. Thanks for the detailed explanation. expandeddatum.h was very helpful, and I see now how array_expanded works. If I run into any problems registering my callback on the expanded object's context, I'll post back.

Thanks Tom!

-Michel
 
                        regards, tom lane
