Re: [HACKERS] GSOC - TOAST'ing in slices - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] GSOC - TOAST'ing in slices
Date
Msg-id CA+TgmoZOTjzt4qCHQJNtvJ-EksUqdHNhT_WvEwaL5ZjpAyz3Eg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] GSOC - TOAST'ing in slices  (George Papadrosou <gpapadrosou@gmail.com>)
Responses Re: [HACKERS] GSOC - TOAST'ing in slices  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
<gpapadrosou@gmail.com> wrote:
> The project’s idea is implement different slicing approaches according to
> the value’s datatype. For example a text field could be split upon character
> boundaries while a JSON document would be split in a way that allows fast
> access to it’s keys or values.

Hmm.  So if you had a long text field containing multibyte characters,
and you split it after, say, every 1024 characters rather than after
every N bytes, then you could do substr() without detoasting the whole
field.  On the other hand, my guess is that you'd waste a fair amount
of space in the TOAST table, because it's unlikely that the chunks
would be exactly the right size to fill every page of the table
completely.  On balance it seems like you'd be worse off, because
substr() probably isn't all that common an operation.

Now, in contrast, slicing JSON is a very common operation, so a
smarter slicing scheme might well pay off, but the question is - what
kind of a splitting method would actually allow fast access to the
keys or values?  It strikes me that this might be a difficult problem.
Tabula raza, you could design a serialization format that was aware
that it might get toasted and was constructed in such a way that as to
contain boundaries that are actually referenced from within the
format, so that, say, after reading the toplevel keys and values, you
could know that you next need chunk #103.  But unless the existing
jsonb binary format was designed with that in mind, it doesn't seem
likely to end up being true just by chance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Remove obsolete text from hash/README
Next
From: Arthur Zakirov
Date:
Subject: Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USERMAPPING statements