Expanded Object Header and Flat Cache

From: Paul Ramsey
I've been working through the expanded object code to try and get a
demonstration of it working with PostGIS (still having some problems,
but it's a learning experience). On an unrelated note, I noticed that
the expanded array code maintains its own cache of the flat
representation and the flat size:


https://github.com/postgres/postgres/blob/cf7dfbf2d6c5892747cd6fca399350d86c16f00f/src/backend/utils/adt/array_expanded.c#L247-L253
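
For illustration, here is a minimal sketch of the pattern those lines
follow. The names (ExpandedThingHeader, compute_flat_size, and so on)
are made up; only the shape mirrors the array code:

#include "postgres.h"
#include "utils/expandeddatum.h"

/*
 * Hypothetical expanded-object header for an extension type; only the
 * caching fields matter here.  flat_size is kept valid whenever
 * flat_value is.
 */
typedef struct ExpandedThingHeader
{
	ExpandedObjectHeader hdr;	/* standard expanded-object header */

	void	   *flat_value;		/* cached flat datum, or NULL */
	Size		flat_size;		/* cached flat size, or 0 if unknown */

	/* ... the type's actual expanded fields would go here ... */
} ExpandedThingHeader;

/* type-specific size computation, not shown */
static Size compute_flat_size(ExpandedThingHeader *ethp);

static Size
ET_get_flat_size(ExpandedObjectHeader *eohptr)
{
	ExpandedThingHeader *ethp = (ExpandedThingHeader *) eohptr;

	/* If we still hold a valid flat copy, its size is authoritative */
	if (ethp->flat_value)
		return ethp->flat_size;

	/* If the size was computed earlier and nothing changed, reuse it */
	if (ethp->flat_size)
		return ethp->flat_size;

	/* Otherwise do the real work and remember it for flatten_into */
	ethp->flat_size = compute_flat_size(ethp);
	return ethp->flat_size;
}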

This seems like generic code that any implementor is going to end up
copying. (After seeing it, and seeing how often the flat-size callback
gets hit while debugging, it's an obvious next thing to add to my own
expanded representation once I get things working.)

Why isn't caching the flat representation and size (and
short-circuiting when the cache is already filled) part of the generic
functionality PgSQL provides? Should it be? I guess it would imply a
required function to dirty the EOH cache when changes are made to the
in-memory data, but that seems no worse as part of the generic API
than in all the client code.
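
To make that concrete, the invalidation hook need not be more than
something like this (same made-up ExpandedThingHeader as in the sketch
above; nothing of the sort exists in core today):

static void
ET_dirty_flat_cache(ExpandedThingHeader *ethp)
{
	/*
	 * Just forget the cached flat form.  Assuming it was allocated in
	 * the object's private context (as the array code does), the memory
	 * is reclaimed when the object is; clearing the pointers is what
	 * matters for correctness.
	 */
	ethp->flat_value = NULL;
	ethp->flat_size = 0;
}

Every operation that mutates the in-memory data would be required to
call it before touching anything, so a later get_flat_size or
flatten_into can't hand back a stale result.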

P.



Re: Expanded Object Header and Flat Cache

From: Tom Lane
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> I've been working through the expanded object code to try and get a
> demonstration of it working with PostGIS (still having some problems,
> but it's a learning experience). On an unrelated note, I noticed that
> the expanded array code maintains its own cache of the flat
> representation and the flat size:

> https://github.com/postgres/postgres/blob/cf7dfbf2d6c5892747cd6fca399350d86c16f00f/src/backend/utils/adt/array_expanded.c#L247-L253

> This seems like generic code that any implementor is going to end up
> copying. (After seeing it, and seeing how often the flat-size callback
> gets hit while debugging, it's an obvious next thing to add to my own
> expanded representation once I get things working.)

Well, mumble, I'm not so sure.  In the array code that's not really a
cache, but a halfway point between the flat representation and a fully
deconstructed representation, which is useful mainly because it saves
cycles while un-flattening.  Being able to return it as-is while
flattening is just a nice bonus AFAICS; I doubt that's a big win in
practical use.  And in other datatypes it might be much less feasible to
do that.  Before leaving Salesforce I did another expanded-object
implementation for them, of a datatype that was basically much like hstore
(key-value thing).  The expanded representation was a dynahash hashtable,
and there was no equivalent of the array code's internal flat values
because it didn't map to the dynahash form at all.  Besides which their
use-case was such that the flattened representation was hardly ever
needed.

Caching the flat size does seem like a near universally-useful
optimization, because otherwise you have to compute it twice while
flattening.  But there's not enough there to justify trying to share code,
AFAICS; most of the trickiness is in knowing when you've invalidated the
cached flat size, and that's going to be pretty operation-specific.
        regards, tom lane
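
For reference, a sketch of how the cached size pays off at flatten
time. Callers obtain the size from the get_flat_size callback and pass
it to flatten_into as allocated_size, so a flatten_into shaped like the
array code's version (same made-up names as the sketches above) gets
the size for free and can hand back a cached flat copy directly:

/* type-specific serialization into a caller-provided buffer, not shown */
static void serialize_thing(ExpandedThingHeader *ethp,
							void *result, Size allocated_size);

static void
ET_flatten_into(ExpandedObjectHeader *eohptr,
				void *result, Size allocated_size)
{
	ExpandedThingHeader *ethp = (ExpandedThingHeader *) eohptr;

	/* the caller got this number from ET_get_flat_size */
	Assert(allocated_size == ethp->flat_size);

	if (ethp->flat_value)
	{
		/* still holding a valid flat copy: just hand it back */
		memcpy(result, ethp->flat_value, allocated_size);
		return;
	}

	/* otherwise serialize the in-memory form into the caller's buffer */
	serialize_thing(ethp, result, allocated_size);
}

If the in-memory form has been modified since the cache was filled, the
dirty hook sketched earlier is what keeps this path from returning a
stale copy.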