Re: About Custom Aggregates, C Extensions and Memory - Mailing list pgsql-hackers

From Marthin Laubscher
Subject Re: About Custom Aggregates, C Extensions and Memory
Date
Msg-id C73DDB9E-5AE1-48FC-867A-58BB0EF3EA34@lobeshare.co.za
In response to Re: About Custom Aggregates, C Extensions and Memory  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Well, yeah, that's the problem. You can certainly maintain your own persistent data structure somewhere, but then
> it's entirely on your head to manage it and avoid memory leakage/bloating as you process more and more data. The
> mechanisms I pointed you at provide a structure that makes sure space gets reclaimed when there's no longer a reference
> to it, but if you go the roll-your-own route then it's a lot messier.
 

Of course, I'm sole owner and admin of every instance of PostgreSQL involved, but a great many things have always been
and will remain on my head if I "gots it wrongly", so yes, rolling my own would be an option of last resort. I was able
to "see" how the aggregate memory mechanism worked but confess that I must still connect the dots in the expanded datum
case. I know you're seeing something there I don't yet recognise, so I'll look again.
 

> A mechanism that might work well enough is a transaction-lifespan hash table. You could look at, for example,
> uncommitted_enum_types in pg_enum.c for sample code.
 

And at that, naturally.

Trying hard to not get too far ahead of myself here, perhaps I should try running into the foreseen performance issue
before addressing it. How about I make a trivial implementation (skipping over the complex optimised processing,
compression and advanced logic ensuring canonical compressed values) of the aggregate and user defined type the
aggregate would calculate, and put that up for a review first.
 

If all goes to script, the result ought to be that the aggregate would nicely reuse the decoded version of the
aggregate in memory and forget it existed when the final function has been run. But then each user defined type function
getting called would decode the stored byte array, do its bit on the data, and encode to a byte array again.
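
A minimal sketch of that decode, operate, encode round trip might look like the following. Everything in it is an
assumption made for illustration: the type is a hypothetical set of 32-bit integers stored as sorted little-endian
bytes, and the names DecodedSet, StoredSet, decode, encode, set_add and set_contains are invented. It uses plain
malloc so it stands alone; in an actual extension the allocations would go through palloc and the stored form would
be a varlena value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded form: a sorted array of distinct members. */
typedef struct DecodedSet {
    size_t    count;
    uint32_t *members;   /* always has room for count + 1 entries */
} DecodedSet;

/* Hypothetical stored form: count * 4 bytes, little-endian, sorted. */
typedef struct StoredSet {
    size_t   len;        /* length in bytes */
    uint8_t *bytes;
} StoredSet;

/* Decode the stored byte array into the in-memory form. */
DecodedSet decode(const StoredSet *s)
{
    DecodedSet d;

    assert(s->len % 4 == 0);
    d.count = s->len / 4;
    d.members = malloc((d.count + 1) * sizeof(uint32_t));
    assert(d.members != NULL);
    for (size_t i = 0; i < d.count; i++) {
        const uint8_t *p = s->bytes + 4 * i;
        d.members[i] = (uint32_t) p[0] | (uint32_t) p[1] << 8 |
                       (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
    }
    return d;
}

/* Encode the in-memory form back into a freshly allocated byte array. */
StoredSet encode(const DecodedSet *d)
{
    StoredSet s;

    s.len = d->count * 4;
    s.bytes = malloc(s.len > 0 ? s.len : 1);
    assert(s.bytes != NULL);
    for (size_t i = 0; i < d->count; i++) {
        uint32_t v = d->members[i];
        uint8_t *p = s.bytes + 4 * i;
        p[0] = (uint8_t) (v & 0xff);
        p[1] = (uint8_t) ((v >> 8) & 0xff);
        p[2] = (uint8_t) ((v >> 16) & 0xff);
        p[3] = (uint8_t) ((v >> 24) & 0xff);
    }
    return s;
}

/* A "UDT function": decode, add one member, encode again. */
StoredSet set_add(const StoredSet *in, uint32_t value)
{
    DecodedSet d = decode(in);
    size_t pos = 0;

    while (pos < d.count && d.members[pos] < value)
        pos++;
    if (pos == d.count || d.members[pos] != value) {
        /* Shift the tail up by one and insert, keeping the array sorted. */
        memmove(&d.members[pos + 1], &d.members[pos],
                (d.count - pos) * sizeof(uint32_t));
        d.members[pos] = value;
        d.count++;
    }
    StoredSet out = encode(&d);
    free(d.members);
    return out;
}

/* Another "UDT function": pays the full decode cost for one membership test. */
int set_contains(const StoredSet *in, uint32_t value)
{
    DecodedSet d = decode(in);
    int found = 0;

    for (size_t i = 0; i < d.count && !found; i++)
        found = (d.members[i] == value);
    free(d.members);
    return found;
}
```

Every call pays for a full decode, and every modifying call for an encode as well, which is exactly the cost a cached
decoded version would later avoid.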
 

It won't be optimal, but it will be simple, and go a long way towards ensuring I got the whole type and aggregate
ecosystem set up as it should be. It will also settle any doubts as to whether membership tests would use "IN" or "ANY"
semantics.

The next step would be to use, say, the transaction memory context to retain the decoded version in memory between calls
to different (non-aggregate) UDT functions (made in the same transaction). Beyond finding the appropriate context to
use, the challenge, I understand, would be to identify the particular instance of the UDT in memory, which is where
you're suggesting using a hash table. With fewer outstanding vagaries and uncertainties, the way forward might be quite
obvious by then, but from where I stand right now I can only think of worse ways than hashing, so it will most likely
be hashing.
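
To make the lookup concrete, here is a standalone sketch of a cache that maps a stored byte array to its decoded
version. It is not PostgreSQL code: in the server the table would presumably be built with hash_create using
HASH_CONTEXT and TopTransactionContext, the way uncommitted_enum_types is in pg_enum.c. The names DecodedCache,
cache_get, cache_put and cache_reset, the fixed capacity and the FNV-1a hash are all assumptions, with cache_reset
standing in for what the end of the transaction would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 64          /* hypothetical capacity; must be a power of two */

typedef struct CacheEntry {
    int      in_use;
    uint32_t hash;
    size_t   key_len;
    uint8_t *key;               /* private copy of the stored byte array */
    void    *decoded;           /* decoded version, owned by the cache */
} CacheEntry;

typedef struct DecodedCache {
    CacheEntry slots[CACHE_SLOTS];
} DecodedCache;

void cache_init(DecodedCache *c)
{
    memset(c, 0, sizeof(*c));
}

/* 32-bit FNV-1a over the stored bytes. */
uint32_t fnv1a32(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Linear probing: return the matching entry, else the first free slot,
 * else NULL when the table is full. A hash match is always confirmed
 * with a full byte comparison. */
CacheEntry *cache_probe(DecodedCache *c, const uint8_t *key, size_t len)
{
    uint32_t h = fnv1a32(key, len);

    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        CacheEntry *e = &c->slots[(h + i) & (CACHE_SLOTS - 1)];

        if (!e->in_use)
            return e;
        if (e->hash == h && e->key_len == len &&
            (len == 0 || memcmp(e->key, key, len) == 0))
            return e;
    }
    return NULL;
}

/* Return the cached decoded version, or NULL if there is none. */
void *cache_get(DecodedCache *c, const uint8_t *key, size_t len)
{
    CacheEntry *e = cache_probe(c, key, len);

    return (e != NULL && e->in_use) ? e->decoded : NULL;
}

/* Remember a decoded version. Returns 1 if stored (the cache now owns
 * `decoded`), 0 if the key was already present or the table is full. */
int cache_put(DecodedCache *c, const uint8_t *key, size_t len, void *decoded)
{
    CacheEntry *e = cache_probe(c, key, len);

    if (e == NULL || e->in_use)
        return 0;
    e->key = malloc(len > 0 ? len : 1);
    assert(e->key != NULL);
    if (len > 0)
        memcpy(e->key, key, len);
    e->key_len = len;
    e->hash = fnv1a32(key, len);
    e->decoded = decoded;
    e->in_use = 1;
    return 1;
}

/* Stand-in for transaction end: drop every key and decoded version. */
void cache_reset(DecodedCache *c)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].in_use) {
            free(c->slots[i].key);
            free(c->slots[i].decoded);
        }
    }
    cache_init(c);
}
```

A UDT function would call cache_get with the stored bytes first and fall back to decoding plus cache_put only on a
miss.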


That said, the two most important reality checks we'd have to consider at that point would be:

a) whether there are ever enough, as in more than three, of these values present in the same memory context to warrant
the hash calculation overhead since a full comparison will be required anyway, and
 

b) whether the special characteristic of my UDT values, where identical values have, by definition, identical structure
in memory, offers enough opportunity to use a simple reference count to decide whether a value can/should be changed
in place, whether the changed value should be written to its own fresh slot, or whether it just needs an increased
reference count on memory that already represents that value.
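
The reference-count idea in (b) could be sketched roughly as follows, again standalone and with invented names
(SharedValue, value_new, value_share, value_release, value_make_writable). It covers only the first two outcomes:
change in place when there is a single reference, otherwise copy to a fresh slot. Finding existing memory that
already represents the changed value would need a lookup on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory value with a reference count in its header. */
typedef struct SharedValue {
    unsigned refcount;
    size_t   len;
    uint8_t  bytes[];           /* flexible array member holding the value */
} SharedValue;

/* Create a value with a single reference. */
SharedValue *value_new(const uint8_t *src, size_t len)
{
    SharedValue *v = malloc(sizeof(SharedValue) + len);

    assert(v != NULL);
    v->refcount = 1;
    v->len = len;
    if (len > 0)
        memcpy(v->bytes, src, len);
    return v;
}

/* Take another reference to the same memory. */
SharedValue *value_share(SharedValue *v)
{
    v->refcount++;
    return v;
}

/* Drop a reference; free the memory when the last one goes away. */
void value_release(SharedValue *v)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0)
        free(v);
}

/* Return a value the caller may modify. With a single reference that is
 * the value itself (change in place); otherwise the caller's reference
 * is exchanged for a private copy in a fresh slot. */
SharedValue *value_make_writable(SharedValue *v)
{
    if (v->refcount == 1)
        return v;

    SharedValue *copy = value_new(v->bytes, v->len);

    v->refcount--;              /* the caller no longer references the original */
    return copy;
}
```

The caller swaps its pointer for whatever value_make_writable returns and only then modifies the bytes, so other
holders of the original never see the change.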
 

Once we've cracked the intra-function nut, we could look for a way to share memory between the aggregate and other
UDT functions.

Either way, I've got a boat load more reading and writing to do. Thank you ever so much for your time and attention. It
is a great kindness, well appreciated.
 

Regards,
Marthin Laubscher
 




