Thread: varlena beyond 1GB and matrix

varlena beyond 1GB and matrix

From
Kohei KaiGai
Date:
This is a design proposal for matrix data type which can be larger
than 1GB. Not only a new data type support, it also needs a platform
enhancement because existing varlena has a hard limit (1GB).
We had a discussion about this topic on the developer unconference
at Tokyo/Akihabara, the day before PGconf.ASIA. The design proposal
below stands on the overall consensus on the discussion.

============
BACKGROUND
============
The varlena format has either a short (1-byte) or long (4-byte) header.
We use the long header for the in-memory structure, which is referenced
by VARSIZE()/VARDATA(), or for the on-disk structure when it is larger
than 126 bytes but less than the TOAST threshold. Otherwise, the short
format is used when the varlena is less than 126 bytes or is stored
externally in the toast relation. No varlena representation supports a
datum larger than 1GB.
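To make that ceiling concrete, here is a minimal sketch of why a 4-byte header caps a varlena at just under 1GB: two bits of the 32-bit header are reserved for flags, leaving 30 bits for the length. The names below are illustrative, not the actual PostgreSQL macros.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: in the 4-byte varlena header, 2 of the 32 bits are
 * reserved for flags, leaving 30 bits for the total length.  Hence the
 * hard limit of 2^30 - 1 bytes, i.e. just under 1GB. */
#define VARLENA_MAX_BYTES   ((1U << 30) - 1)    /* 0x3FFFFFFF */

static int fits_in_varlena(uint64_t nbytes)
{
    return nbytes <= VARLENA_MAX_BYTES;
}
```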
On the other hand, some use cases that handle (relatively) big data
in the database are interested in variable-length datums larger than 1GB.

One example is what I presented at PGconf.SV.
A PL/CUDA function takes two arguments (2D-arrays of int4 in place of
matrices), then returns the top-N combinations of the chemical compounds
according to similarity, like so:
    SELECT knn_gpu_similarity(5, Q.matrix, D.matrix)
      FROM (SELECT array_matrix(bitmap) matrix
              FROM finger_print_query
             WHERE tag LIKE '%abc%') Q,
           (SELECT array_matrix(bitmap) matrix
              FROM finger_print_10m) D;

array_matrix() is an aggregate function that generates a 2D-array
containing the entire input relation stream.
It works fine as long as the data size is less than 1GB. Once it
exceeds that boundary, the user has to split the 2D-array manually,
even though it is not uncommon for recent GPU models to have more
than 10GB of RAM.

It is a real problem when we cannot split the mathematical problem
into several portions appropriately. And it is not comfortable for
users who cannot use the full capability of their GPU device.

=========
PROBLEM
=========
Our problem is that the varlena format does not permit a variable-length
datum larger than 1GB, even though our "matrix" type wants to move
a bunch of data larger than 1GB.
So we need to solve this restriction of the varlena format prior to
implementing the matrix type.

In the developer unconference, people discussed three ideas for the
problem. The overall consensus favored the idea of a special data type
that can contain multiple indirect references to other data chunks.
The main part and each referenced data chunk are all less than 1GB,
but the total amount of data they can represent is more than 1GB.

For example, even if we have a large matrix of around 8GB, its
sub-matrices, separated into 9 portions (3x3), are individually less
than 1GB.
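As a quick sanity check on that decomposition (simple arithmetic, not PostgreSQL code):

```c
#include <assert.h>
#include <stdint.h>

/* An 8GB matrix split into a 3x3 grid yields sub-matrices of roughly
 * 8GB / 9 ~= 0.89GB each, which individually fit under the 1GB varlena
 * limit even though the whole matrix does not. */
static uint64_t sub_matrix_bytes(uint64_t total_bytes, int grid_x, int grid_y)
{
    return total_bytes / ((uint64_t)grid_x * grid_y);
}
```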
Saving a matrix that contains indirect references to its sub-matrices
is problematic, because toast_save_datum() writes out the flat portion
just after the varlena header onto a tuple or a toast relation as-is.
If the main portion of the matrix contains pointers (indirect
references), that is obviously broken. We need an infrastructure to
serialize the indirect references prior to saving.

BTW, the other ideas, which got less acknowledgement, were a 64-bit
varlena header and the use of large objects. The former changes the
current data format, and thus would have unexpected side effects on
existing code in PostgreSQL core and extensions. The latter requires
users to construct a large object beforehand, which makes it
impossible to use the interim result of a sub-query and leads to
unnecessary I/O for preparation.

==========
SOLUTION
==========
I would like to propose a new optional type handler 'typserialize'
to serialize an in-memory varlena structure (one that can have
indirect references) to the on-disk format.
If present, it would be invoked at the head of
toast_insert_or_update(), so that indirect references are transformed
into something that is safe to save. (My expectation is that the
'typserialize' handler first saves the indirectly referenced chunks
to the toast relation, then puts toast pointers in their place.)
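A minimal sketch of what such a handler might do. Everything here is hypothetical: 'typserialize' does not exist in PostgreSQL, and the types below are simplified stand-ins for Datum and toast pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: an in-memory value holds raw pointers to chunks;
 * "serialization" replaces each pointer with a toast value id, so the
 * flat image written by toast_insert_or_update() contains no pointers. */
typedef struct Chunk {
    unsigned int va_valueid;   /* toast pointer (0 = not yet saved) */
    void        *ptr;          /* in-memory chunk (NULL once saved)  */
} Chunk;

/* Stand-in for saving one chunk to the toast relation; returns a fake
 * value id here.  A real handler would call the toast machinery. */
static unsigned int save_chunk_to_toast(void *ptr)
{
    static unsigned int next_valueid = 1000;
    (void) ptr;
    return next_valueid++;
}

/* The proposed 'typserialize' behavior, in miniature. */
static void typserialize_sketch(Chunk *chunks, size_t nchunks)
{
    for (size_t i = 0; i < nchunks; i++)
    {
        if (chunks[i].ptr != NULL)
        {
            chunks[i].va_valueid = save_chunk_to_toast(chunks[i].ptr);
            chunks[i].ptr = NULL;   /* flat image is now pointer-free */
        }
    }
}
```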

On the other hand, it is uncertain whether we need a symmetric
'typdeserialize' handler. Because all the functions/operators that
support this special data type must know its internal format, they
can load the indirectly referenced data chunks on demand.
That is beneficial from a performance perspective when functions/
operators touch only part of the large structure, because the rest
of the structure need not be loaded into memory.

One thing I'm not certain about is whether we can update the datum
supplied as an argument to functions/operators.
Let me assume the data structure below:
    struct {
        int32   varlena_head;
        Oid     element_id;
        int     matrix_type;
        int     blocksz_x;   /* horizontal size of each block */
        int     blocksz_y;   /* vertical size of each block */
        int     gridsz_x;    /* horizontal # of blocks */
        int     gridsz_y;    /* vertical # of blocks */
        struct {
            Oid     va_valueid;
            Oid     va_toastrelid;
            void   *ptr_block;
        } blocks[FLEXIBLE_ARRAY_MEMBER];
    };

When this structure is fetched from the tuple, each @ptr_block is
initialized to NULL. Once it is supplied to a function that
references part of the blocks, type-specific code can load a
sub-matrix from the toast relation, then update @ptr_block so that
the sub-matrix is not loaded from the toast relation multiple times.
I'm not certain whether that is acceptable behavior.
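The lazy-loading idea could look roughly like this (hypothetical code; load_block_from_toast() stands in for the real detoasting call):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-block slot, as in the struct above. */
typedef struct BlockRef {
    unsigned int va_valueid;     /* which toast value holds this block */
    void        *ptr_block;      /* NULL until first access */
} BlockRef;

static int load_count = 0;       /* counts actual toast fetches */

/* Stand-in for fetching one sub-matrix from the toast relation. */
static void *load_block_from_toast(unsigned int va_valueid)
{
    (void) va_valueid;
    load_count++;
    return malloc(16);           /* pretend this is the sub-matrix */
}

/* On-demand access: load once, then reuse the cached pointer so the
 * same sub-matrix is never fetched from toast twice. */
static void *get_block(BlockRef *ref)
{
    if (ref->ptr_block == NULL)
        ref->ptr_block = load_block_from_toast(ref->va_valueid);
    return ref->ptr_block;
}
```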

If that is OK, it seems to me the path to supporting matrices larger
than 1GB is all green.


Your comments are welcome.

====================
FOR YOUR REFERENCE
====================
* Beyond the 1GB limitation of varlena
http://kaigai.hatenablog.com/entry/2016/12/04/223826

* PGconf.SV 2016 and PL/CUDA
http://kaigai.hatenablog.com/entry/2016/11/17/070708

* PL/CUDA slides at PGconf.ASIA (English)
http://www.slideshare.net/kaigai/pgconfasia2016-plcuda-en

Best regards,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: varlena beyond 1GB and matrix

From
Jim Nasby
Date:
On 12/7/16 5:50 AM, Kohei KaiGai wrote:
> When this structure is fetched from the tuple, each @ptr_block is
> initialized to NULL. Once it is supplied to a function that
> references part of the blocks, type-specific code can load a
> sub-matrix from the toast relation, then update @ptr_block so that
> the sub-matrix is not loaded from the toast relation multiple times.
> I'm not certain whether that is acceptable behavior.

I'm glad you're looking into this. The 1G limit is becoming a larger 
problem every day.

Have you considered using ExpandedObjects to accomplish this? I don't 
think the API would work as-is, but I suspect there's other places where 
we'd like to be able to have this capability (arrays and JSONB come to 
mind).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: varlena beyond 1GB and matrix

From
Robert Haas
Date:
On Wed, Dec 7, 2016 at 8:50 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> I would like to propose a new optional type handler 'typserialize'
> to serialize an in-memory varlena structure (one that can have
> indirect references) to the on-disk format.
> If present, it would be invoked at the head of
> toast_insert_or_update(), so that indirect references are transformed
> into something that is safe to save. (My expectation is that the
> 'typserialize' handler first saves the indirectly referenced chunks
> to the toast relation, then puts toast pointers in their place.)

This might not work.  The reason is that we have important bits of
code that expect that they can figure out how to do some operation on
a datum (find its length, copy it, serialize it) based only on typlen
and typbyval.  See src/backend/utils/adt/datum.c for assorted
examples.  Note also the lengthy comment near the top of the file,
which explains that typlen > 0 indicates a fixed-length type, typlen
== -1 indicates a varlena, and typlen == -2 indicates a cstring.  I
think there's probably room for other typlen values; for example, we
could have typlen == -3 indicate some new kind of thing -- a
super-varlena that has a higher length limit, or some other kind of
thing altogether.
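The typlen convention described above can be summarized in code; the -3 case is the hypothetical extension, while the others match the documented convention in datum.c:

```c
#include <assert.h>
#include <string.h>

/* Classify a type by its typlen, per the convention in
 * src/backend/utils/adt/datum.c:
 * typlen > 0  : fixed-length type (typlen is the size in bytes)
 * typlen == -1: varlena (length limited to just under 1GB)
 * typlen == -2: cstring (NUL-terminated)
 * typlen == -3: hypothetical "super-varlena" with a 64-bit length */
static const char *classify_typlen(int typlen)
{
    if (typlen > 0)
        return "fixed";
    switch (typlen)
    {
        case -1: return "varlena";
        case -2: return "cstring";
        case -3: return "super-varlena";   /* proposed, does not exist */
        default: return "invalid";
    }
}
```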

Now, you can imagine trying to treat what you're talking about as a
new type of TOAST pointer, but I think that's not going to help,
because at some point the TOAST pointer gets de-toasted into a varlena
... which is still limited to 1GB.  So that's not really going to
work.  And it brings out another point, which is that if you define a
new typlen code, like -3, for super-big things, they won't be
varlenas, which means they won't work with the existing TOAST
interfaces.  Still, you might be able to fix that.  You would probably
have to do some significant surgery on the wire protocol, per the
commit message for fa2fa995528023b2e6ba1108f2f47558c6b66dcd.

I think it's probably a mistake to conflate objects with substructure
with objects > 1GB.  Those are two somewhat orthogonal needs.  As Jim
also points out, expanded objects serve the first need.  Of course,
those assume that we're dealing with a varlena, so if we made a
super-varlena, we'd still need to create an equivalent.  But perhaps
it would be a fairly simple adaptation of what we've already got.
Handling objects >1GB at all seems like the harder part of the
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: varlena beyond 1GB and matrix

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 7, 2016 at 8:50 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> I would like to propose a new optional type handler 'typserialize'
>> to serialize an in-memory varlena structure (one that can have
>> indirect references) to the on-disk format.

> I think it's probably a mistake to conflate objects with substructure
> with objects > 1GB.  Those are two somewhat orthogonal needs.

Maybe.  I think where KaiGai-san is trying to go with this is being
able to turn an ExpandedObject (which could contain very large amounts
of data) directly into a toast pointer or vice versa.  There's nothing
really preventing a TOAST OID from having more than 1GB of data
attached, and if you had a side channel like this you could transfer
the data without ever having to form a larger-than-1GB tuple.

The hole in that approach, to my mind, is that there are too many places
that assume that they can serialize an ExpandedObject into part of an
in-memory tuple, which might well never be written to disk, or at least
not written to disk in a table.  (It might be intended to go into a sort
or hash join, for instance.)  This design can't really work for that case,
and unfortunately I think it will be somewhere between hard and impossible
to remove all the places where that assumption is made.

At a higher level, I don't understand exactly where such giant
ExpandedObjects would come from.  (As you point out, there's certainly
no easy way for a client to ship over the data for one.)  So this feels
like a very small part of a useful solution, if indeed it's part of a
useful solution at all, which is not obvious.

FWIW, ExpandedObjects themselves are far from a fully fleshed out
concept, one of the main problems being that they don't have very long
lifespans except in the case that they're the value of a plpgsql
variable.  I think we would need to move things along quite a bit in
that area before it would get to be useful to think in terms of
ExpandedObjects containing multiple GB of data.  Otherwise, the
inevitable flattenings and re-expansions are just going to kill you.

Likewise, the need for clients to be able to transfer data in chunks
gets pressing well before you get to 1GB.  So there's a lot here that
really should be worked on before we try to surmount that barrier.
        regards, tom lane



Re: varlena beyond 1GB and matrix

From
Tom Lane
Date:
I wrote:
> Maybe.  I think where KaiGai-san is trying to go with this is being
> able to turn an ExpandedObject (which could contain very large amounts
> of data) directly into a toast pointer or vice versa.  There's nothing
> really preventing a TOAST OID from having more than 1GB of data
> attached, and if you had a side channel like this you could transfer
> the data without ever having to form a larger-than-1GB tuple.

BTW, you could certainly imagine attaching such infrastructure for
direct-to-TOAST-table I/O to ExpandedObjects today, independently
of any ambitions about larger-than-1GB values.  I'm not entirely sure
how often it would get exercised, which is the key subtext of what
I wrote before, but it's clearly a possible optimization of what
we do now.
        regards, tom lane



Re: varlena beyond 1GB and matrix

From
Kohei KaiGai
Date:
2016-12-08 8:04 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Dec 7, 2016 at 8:50 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> I would like to propose a new optional type handler 'typserialize'
>> to serialize an in-memory varlena structure (one that can have
>> indirect references) to the on-disk format.
>> If present, it would be invoked at the head of
>> toast_insert_or_update(), so that indirect references are transformed
>> into something that is safe to save. (My expectation is that the
>> 'typserialize' handler first saves the indirectly referenced chunks
>> to the toast relation, then puts toast pointers in their place.)
>
> This might not work.  The reason is that we have important bits of
> code that expect that they can figure out how to do some operation on
> a datum (find its length, copy it, serialize it) based only on typlen
> and typbyval.  See src/backend/utils/adt/datum.c for assorted
> examples.  Note also the lengthy comment near the top of the file,
> which explains that typlen > 0 indicates a fixed-length type, typlen
> == -1 indicates a varlena, and typlen == -2 indicates a cstring.  I
> think there's probably room for other typlen values; for example, we
> could have typlen == -3 indicate some new kind of thing -- a
> super-varlena that has a higher length limit, or some other kind of
> thing altogether.
>
> Now, you can imagine trying to treat what you're talking about as a
> new type of TOAST pointer, but I think that's not going to help,
> because at some point the TOAST pointer gets de-toasted into a varlena
> ... which is still limited to 1GB.  So that's not really going to
> work.  And it brings out another point, which is that if you define a
> new typlen code, like -3, for super-big things, they won't be
> varlenas, which means they won't work with the existing TOAST
> interfaces.  Still, you might be able to fix that.  You would probably
> have to do some significant surgery on the wire protocol, per the
> commit message for fa2fa995528023b2e6ba1108f2f47558c6b66dcd.
>
Hmm... The reason why I didn't pursue the idea of a 64-bit varlena
format is that the approach seemed too invasive for the existing
PostgreSQL core and extensions, because I assumed this "long"
variable-length datum would utilize/enhance the existing varlena
infrastructure.
However, once we have an infrastructure completely independent from
the existing varlena, we would not run the risk of mixing the two
code paths. That seems to me an advantage.

My concern about ExpandedObject is that its flattened format still
needs a 32-bit varlena header, which restricts the total length to
1GB, so it cannot represent a large data chunk.
So I didn't think the current ExpandedObject was a solution for us.

Regarding the protocol stuff, I want to consider support for large
records after the internal data format, because my expected primary
usage is internal, for in-database analytics, where the user gets
the results from complicated logic described in a PL function.

> I think it's probably a mistake to conflate objects with substructure
> with objects > 1GB.  Those are two somewhat orthogonal needs.  As Jim
> also points out, expanded objects serve the first need.  Of course,
> those assume that we're dealing with a varlena, so if we made a
> super-varlena, we'd still need to create an equivalent.  But perhaps
> it would be a fairly simple adaptation of what we've already got.
> Handling objects >1GB at all seems like the harder part of the
> problem.
>
I could mostly get your point. Does the last line above refer to
the amount of data in an object >1GB, even if the "super-varlena"
format allows a 64-bit length?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: varlena beyond 1GB and matrix

From
Kohei KaiGai
Date:
2016-12-08 8:36 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Dec 7, 2016 at 8:50 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> I would like to propose a new optional type handler 'typserialize'
>>> to serialize an in-memory varlena structure (one that can have
>>> indirect references) to the on-disk format.
>
>> I think it's probably a mistake to conflate objects with substructure
>> with objects > 1GB.  Those are two somewhat orthogonal needs.
>
> Maybe.  I think where KaiGai-san is trying to go with this is being
> able to turn an ExpandedObject (which could contain very large amounts
> of data) directly into a toast pointer or vice versa.  There's nothing
> really preventing a TOAST OID from having more than 1GB of data
> attached, and if you had a side channel like this you could transfer
> the data without ever having to form a larger-than-1GB tuple.
>
> The hole in that approach, to my mind, is that there are too many places
> that assume that they can serialize an ExpandedObject into part of an
> in-memory tuple, which might well never be written to disk, or at least
> not written to disk in a table.  (It might be intended to go into a sort
> or hash join, for instance.)  This design can't really work for that case,
> and unfortunately I think it will be somewhere between hard and impossible
> to remove all the places where that assumption is made.
>
Regardless of ExpandedObject, does the flattened format need to
contain fully flattened data chunks?
If a data type internally contains multiple toast pointers, like an
array, its flattened image is likely small enough to store using the
existing varlena mechanism.
One problem is that VARSIZE() would never tell us the exact total
length of the data when it references multiple GB-scale chunks.

> At a higher level, I don't understand exactly where such giant
> ExpandedObjects would come from.  (As you point out, there's certainly
> no easy way for a client to ship over the data for one.)  So this feels
> like a very small part of a useful solution, if indeed it's part of a
> useful solution at all, which is not obvious.
>
I expect an aggregate function that consumes millions of rows as the
source of a matrix larger than 1GB. Once it is formed into a variable,
it is easy to deliver as an argument to PL functions.

> FWIW, ExpandedObjects themselves are far from a fully fleshed out
> concept, one of the main problems being that they don't have very long
> lifespans except in the case that they're the value of a plpgsql
> variable.  I think we would need to move things along quite a bit in
> that area before it would get to be useful to think in terms of
> ExpandedObjects containing multiple GB of data.  Otherwise, the
> inevitable flattenings and re-expansions are just going to kill you.
>
> Likewise, the need for clients to be able to transfer data in chunks
> gets pressing well before you get to 1GB.  So there's a lot here that
> really should be worked on before we try to surmount that barrier.
>
You are pointing out the problem around the client<->server protocol,
aren't you? We likely need that enhancement eventually; I agree.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: varlena beyond 1GB and matrix

From
Craig Ringer
Date:
On 8 December 2016 at 07:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Likewise, the need for clients to be able to transfer data in chunks
> gets pressing well before you get to 1GB.  So there's a lot here that
> really should be worked on before we try to surmount that barrier.

Yeah. I tend to agree with Tom here. Allowing >1GB varlena-like
objects, when we can barely cope with our existing ones in
dump/restore, in clients, etc, doesn't strike me as quite the right
direction to go in.

I understand it solves a specific, niche case you're dealing with when
exchanging big blobs of data with a GPGPU. But since the client
doesn't actually see that large blob (it's split up into objects
that work with the current protocol and interfaces), why is it
necessary to have instances of a single data type with >1GB values,
rather than taking a TOAST-like / pg_largeobject-like approach and
splitting it up for storage?

I'm concerned that this adds a special case format that will create
maintenance burden and pain down the track, and it won't help with
pain points users face like errors dumping/restoring rows with big
varlena objects, problems efficiently exchanging them on the wire
protocol, etc.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: varlena beyond 1GB and matrix

From
Craig Ringer
Date:
On 8 December 2016 at 12:01, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

>> At a higher level, I don't understand exactly where such giant
>> ExpandedObjects would come from.  (As you point out, there's certainly
>> no easy way for a client to ship over the data for one.)  So this feels
>> like a very small part of a useful solution, if indeed it's part of a
>> useful solution at all, which is not obvious.
>>
> I expect an aggregate function that consumes millions of rows as the
> source of a matrix larger than 1GB. Once it is formed into a variable,
> it is easy to deliver as an argument to PL functions.

You might be interested in how Java has historically dealt with similar issues.

For a long time the JVM had quite low limits on the maximum amount of
RAM it could manage, in the single gigabytes for a long time. Even for
the 64-bit JVM. Once those limitations were lifted, the garbage
collector algorithm placed a low practical limit on how much RAM it
could cope with effectively.

If you were doing scientific computing with Java, lots of big
image/video work, using GPGPUs, doing large scale caching, etc, this
rapidly became a major pain point. So people introduced external
memory mappings to Java, where objects could reference and manage
memory outside the main JVM heap. The most well known is probably
BigMemory (https://www.terracotta.org/products/bigmemory), but there
are many others. They exposed this via small opaque handle objects
that you used to interact with the external memory store via library
functions.

It might make a lot of sense to apply the same principle to
PostgreSQL, since it's much less intrusive than true 64-bit VARLENA.
Rather than extending all of PostgreSQL to handle special-case
split-up VARLENA extended objects, have your interim representation be
a simple opaque value that points to externally mapped memory. Your
operators for the type, etc, know how to work with it. You probably
don't need a full suite of normal operators, you'll be interacting
with the data in a limited set of ways.
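A rough sketch of the opaque-handle idea (hypothetical code; the names and the reference-counting scheme are illustrative, not existing PostgreSQL infrastructure):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque handle: a small fixed-size value that points to a
 * large externally managed buffer, with a reference count so copies of
 * the handle can share the buffer safely. */
typedef struct ExternalBlob {
    void   *data;
    size_t  nbytes;
    int     refcount;
} ExternalBlob;

static ExternalBlob *blob_create(size_t nbytes)
{
    ExternalBlob *b = malloc(sizeof(ExternalBlob));
    b->data = malloc(nbytes);
    b->nbytes = nbytes;
    b->refcount = 1;
    return b;
}

static ExternalBlob *blob_copy(ExternalBlob *b)
{
    b->refcount++;              /* copying the handle is cheap */
    return b;
}

static void blob_release(ExternalBlob *b)
{
    if (--b->refcount == 0)     /* last reference frees the big buffer */
    {
        free(b->data);
        free(b);
    }
}
```

As noted below, the hard part is knowing when to release or copy the external segment, which this toy refcount glosses over.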

The main issue would presumably be one of resource management, since
we currently assume we can just copy a Datum around without telling
anybody about it or doing any special management. You'd need to know
when to clobber your external segment, when to copy(!) it if
necessary, etc. This probably makes sense for working with GPGPUs
anyway, since they like dealing with big contiguous chunks of memory
(or used to, may have improved?).

It sounds like only code specifically intended to work with the
oversized type should be doing much with it except passing it around
as an opaque handle, right?

Do you need to serialize this type to/from disk at all? Or just
exchange it in chunks with a client? If you do need to, can you
possibly do TOAST-like or pg_largeobject-like storage where you split
it up for on disk storage then reassemble for use?



-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: varlena beyond 1GB and matrix

From
Kohei KaiGai
Date:
2016-12-08 16:11 GMT+09:00 Craig Ringer <craig@2ndquadrant.com>:
> On 8 December 2016 at 12:01, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>
>>> At a higher level, I don't understand exactly where such giant
>>> ExpandedObjects would come from.  (As you point out, there's certainly
>>> no easy way for a client to ship over the data for one.)  So this feels
>>> like a very small part of a useful solution, if indeed it's part of a
>>> useful solution at all, which is not obvious.
>>>
>> I expect an aggregate function that consumes millions of rows as the
>> source of a matrix larger than 1GB. Once it is formed into a variable,
>> it is easy to deliver as an argument to PL functions.
>
> You might be interested in how Java has historically dealt with similar issues.
>
> For a long time the JVM had quite low limits on the maximum amount of
> RAM it could manage, in the single gigabytes for a long time. Even for
> the 64-bit JVM. Once those limitations were lifted, the garbage
> collector algorithm placed a low practical limit on how much RAM it
> could cope with effectively.
>
> If you were doing scientific computing with Java, lots of big
> image/video work, using GPGPUs, doing large scale caching, etc, this
> rapidly became a major pain point. So people introduced external
> memory mappings to Java, where objects could reference and manage
> memory outside the main JVM heap. The most well known is probably
> BigMemory (https://www.terracotta.org/products/bigmemory), but there
> are many others. They exposed this via small opaque handle objects
> that you used to interact with the external memory store via library
> functions.
>
> It might make a lot of sense to apply the same principle to
> PostgreSQL, since it's much less intrusive than true 64-bit VARLENA.
> Rather than extending all of PostgreSQL to handle special-case
> split-up VARLENA extended objects, have your interim representation be
> a simple opaque value that points to externally mapped memory. Your
> operators for the type, etc, know how to work with it. You probably
> don't need a full suite of normal operators, you'll be interacting
> with the data in a limited set of ways.
>
> The main issue would presumably be one of resource management, since
> we currently assume we can just copy a Datum around without telling
> anybody about it or doing any special management. You'd need to know
> when to clobber your external segment, when to copy(!) it if
> necessary, etc. This probably makes sense for working with GPGPUs
> anyway, since they like dealing with big contiguous chunks of memory
> (or used to, may have improved?).
>
> It sounds like only code specifically intended to work with the
> oversized type should be doing much with it except passing it around
> as an opaque handle, right?
>
Thanks for your suggestion. Its characteristics look like a large
object, but a volatile one (because there is no backing storage).
As you mentioned, I also think resource management is the core of the
issues. We have no reference counting mechanism, so it is uncertain
when we should release an orphaned memory chunk; otherwise we have to
accept the memory consumption until the relevant memory context is
cleaned up.
Moreover, we have to pay attention to the scenario where this opaque
identifier is delivered to a background worker; it then needs to be
valid in the other process's context. (Or should we prohibit
exchanging it at the planner stage?)

> Do you need to serialize this type to/from disk at all? Or just
> exchange it in chunks with a client? If you do need to, can you
> possibly do TOAST-like or pg_largeobject-like storage where you split
> it up for on disk storage then reassemble for use?
>
Even though I don't expect this big chunk to be entirely exported
to/imported from the client at once, it makes sense to save/load it
to/from disk, because constructing a big in-memory structure consumes
far more CPU cycles than a simple memory copy from buffers.
In other words, it does not need valid typinput/typoutput handlers,
but serialization/deserialization for disk I/O is helpful.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: [HACKERS] varlena beyond 1GB and matrix

From
Robert Haas
Date:
On Wed, Dec 7, 2016 at 10:44 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> Handling objects >1GB at all seems like the harder part of the
>> problem.
>>
> I could mostly get your point. Does the last line above refer to
> the amount of data in an object >1GB, even if the "super-varlena"
> format allows a 64-bit length?

Sorry, I can't understand your question about what I wrote.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] varlena beyond 1GB and matrix

From
Robert Haas
Date:
On Wed, Dec 7, 2016 at 11:01 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Regardless of ExpandedObject, does the flattened format need to
> contain fully flattened data chunks?

I suspect it does, and I think that's why this isn't going to get very
far without a super-varlena format.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] varlena beyond 1GB and matrix

From
Kohei KaiGai
Date:
2016-12-23 8:23 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Dec 7, 2016 at 10:44 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> Handling objects >1GB at all seems like the harder part of the
>>> problem.
>>>
>> I could mostly get your point. Does the last line above refer to
>> the amount of data in an object >1GB, even if the "super-varlena"
>> format allows a 64-bit length?
>
> Sorry, I can't understand your question about what I wrote.
>
I thought you were pointing out that handling a large amount of data
is always the harder part, even if the data format allows >1GB. Right?

-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: [HACKERS] varlena beyond 1GB and matrix

From
Kohei KaiGai
Date:
2016-12-23 8:24 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Dec 7, 2016 at 11:01 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> Regardless of ExpandedObject, does the flattened format need to
>> contain fully flattened data chunks?
>
> I suspect it does, and I think that's why this isn't going to get very
> far without a super-varlena format.
>
Yep, I'm now investigating how to implement the typlen == -3
approach. It will likely be the most straightforward infrastructure
for potential use cases beyond matrix/vector.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: [HACKERS] varlena beyond 1GB and matrix

From
Robert Haas
Date:
On Thu, Dec 22, 2016 at 8:44 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> 2016-12-23 8:23 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
>> On Wed, Dec 7, 2016 at 10:44 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>>> Handling objects >1GB at all seems like the harder part of the
>>>> problem.
>>>>
>>> I could mostly get your point. Does the last line above refer to
>>> the amount of data in an object >1GB, even if the "super-varlena"
>>> format allows a 64-bit length?
>>
>> Sorry, I can't understand your question about what I wrote.
>>
> I thought you were pointing out that handling a large amount of data
> is always the harder part, even if the data format allows >1GB. Right?

I *think* we are agreeing.  But I'm still not 100% sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company