Re: varlena beyond 1GB and matrix - Mailing list pgsql-hackers
From: Craig Ringer
Subject: Re: varlena beyond 1GB and matrix
Msg-id: CAMsr+YEe0T8MMbn=1NwRMRAVgQ_K1dghpEGHWswx7tcP3VC9rA@mail.gmail.com
In response to: Re: varlena beyond 1GB and matrix (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses: Re: varlena beyond 1GB and matrix
List: pgsql-hackers
On 8 December 2016 at 12:01, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> At a higher level, I don't understand exactly where such giant
>> ExpandedObjects would come from.  (As you point out, there's certainly
>> no easy way for a client to ship over the data for one.)  So this feels
>> like a very small part of a useful solution, if indeed it's part of a
>> useful solution at all, which is not obvious.
>>
> I expect an aggregate function that consumes millions of rows as source
> of a large matrix larger than 1GB. Once it is formed to a variable, it is
> easy to deliver as an argument of PL functions.

You might be interested in how Java has historically dealt with similar
issues. For a long time the JVM had quite low limits on the maximum amount
of RAM it could manage, in the single gigabytes, even for the 64-bit JVM.
Once those limits were lifted, the garbage collector still placed a low
practical ceiling on how much RAM it could cope with effectively. If you
were doing scientific computing with Java, big image/video work, GPGPU
work, large scale caching, etc., this rapidly became a major pain point.

So people introduced external memory mappings to Java, where objects could
reference and manage memory outside the main JVM heap. The best known is
probably BigMemory (https://www.terracotta.org/products/bigmemory), but
there are many others. These libraries expose the external memory store via
small opaque handle objects that you interact with through library
functions.

It might make a lot of sense to apply the same principle to PostgreSQL,
since it's much less intrusive than true 64-bit VARLENA. Rather than
extending all of PostgreSQL to handle special-case split-up VARLENA
expanded objects, have your interim representation be a simple opaque value
that points to externally mapped memory. Your operators for the type, etc.,
know how to work with it. You probably don't need a full suite of normal
operators, since you'll be interacting with the data in a limited set of
ways.

The main issue would presumably be resource management, since we currently
assume we can just copy a Datum around without telling anybody about it or
doing any special management. You'd need to know when to clobber your
external segment, when to copy(!) it if necessary, etc.

This probably makes sense for working with GPGPUs anyway, since they like
dealing with big contiguous chunks of memory (or used to; that may have
improved).

It sounds like only code specifically intended to work with the oversized
type should be doing anything with it other than passing it around as an
opaque handle, right?

Do you need to serialize this type to/from disk at all? Or just exchange it
in chunks with a client? If you do need to, can you do TOAST-like or
pg_largeobject-like storage, where you split it up for on-disk storage and
reassemble it for use?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
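
PS. To make the opaque-handle idea a little more concrete, here is a rough,
untested sketch of what such a small fixed-size handle type might look like
if it were built on top of the existing dsm_* facilities. All the matrix_*
names are invented purely for illustration, and the lifetime/refcounting
questions discussed above are waved away, so treat this as a sketch of the
shape rather than a working implementation:

/*
 * Sketch only: a small fixed-length datatype whose Datum is just a
 * reference to a matrix kept in a dynamic shared memory segment.
 * Everything named matrix_* here is invented for illustration; only
 * the dsm_* calls are existing PostgreSQL infrastructure.
 */
#include "postgres.h"
#include "storage/dsm.h"

/*
 * The value actually stored in a tuple / passed as a Datum: small and
 * fixed length, no matter how large the external payload is.
 */
typedef struct matrix_handle
{
	dsm_handle	seg_handle;		/* which DSM segment holds the payload */
	int64		nbytes;			/* logical size of the external payload */
} matrix_handle;

/*
 * Allocate 'nbytes' of externally mapped memory and return a handle to
 * it.  An aggregate transition function could fill the segment with the
 * matrix as it consumes input rows.
 */
static matrix_handle *
matrix_handle_create(int64 nbytes)
{
	dsm_segment *seg = dsm_create(nbytes, 0);
	matrix_handle *h = palloc(sizeof(matrix_handle));

	/*
	 * Keep the mapping for the backend's lifetime; real code would need a
	 * cleanup/refcounting scheme instead (the "when to clobber the external
	 * segment" problem above).
	 */
	dsm_pin_mapping(seg);

	h->seg_handle = dsm_segment_handle(seg);
	h->nbytes = nbytes;
	return h;
}

/*
 * Resolve a handle back into a usable pointer, e.g. inside an operator
 * or a PL function that understands the matrix layout.
 */
static void *
matrix_handle_map(const matrix_handle *h)
{
	dsm_segment *seg = dsm_find_mapping(h->seg_handle);

	/* Attach only if this backend doesn't already have the segment mapped. */
	if (seg == NULL)
		seg = dsm_attach(h->seg_handle);

	return dsm_segment_address(seg);
}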