RE: Best way to keep track of a sliced TOAST - Mailing list pgsql-hackers
From: Bruno Hass
Subject: RE: Best way to keep track of a sliced TOAST
Msg-id: BL0PR07MB4065471221B9A74A3070B4CF91440@BL0PR07MB4065.namprd07.prod.outlook.com
In response to: Re: Best way to keep track of a sliced TOAST (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Best way to keep track of a sliced TOAST
List: pgsql-hackers
> It seems to me that making this overly pluggable is likely to be a net
> negative, because there probably aren't really that many different
> ways of doing this that are useful, and because having to store more
> identifying information will make the toasted datum larger. One idea
> is to let the datatype divide the datum up into variable-sized chunks
> and then have the on-disk format store a list of chunk lengths in
> chunk 0 (and following, if there are lots of chunks?) followed by the
> chunks themselves. The data would all go into the TOAST table as it
> does today, and the TOASTed data could be read without knowing
> anything about the data type. However, code that knows how the data
> was chunked at TOAST time could try to speed things up by operating
> directly on the compressed data if it can figure out which chunk it
> needs without fetching everything.
This idea is what I was hoping to achieve. Could we speed up deTOASTing just by storing the chunk lengths in chunk 0? Also, wouldn't dedicating a whole chunk (possibly more) to such metadata break existing functions?
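
For a rough sense of the overhead being asked about: with the default 8 kB block size, TOAST_MAX_CHUNK_SIZE works out to 1996 bytes, so a chunk-0 directory of 4-byte lengths could describe roughly 500 chunks, i.e. about 1 MB of data, before spilling into a second metadata chunk. A back-of-the-envelope sketch (the constant and names are assumptions for illustration, not PostgreSQL source):

#include <stdint.h>
#include <stdio.h>

#define ASSUMED_TOAST_MAX_CHUNK_SIZE 1996   /* default for 8 kB pages */

int
main(void)
{
    uint32_t header_bytes = 8;  /* hypothetical chunk count + flags */
    uint32_t entries =
        (ASSUMED_TOAST_MAX_CHUNK_SIZE - header_bytes) / sizeof(uint32_t);

    printf("length entries per metadata chunk: %u\n", (unsigned) entries);
    printf("data addressable per metadata chunk: ~%u bytes\n",
           (unsigned) (entries * ASSUMED_TOAST_MAX_CHUNK_SIZE));
    return 0;
}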
From: Robert Haas <robertmhaas@gmail.com>
Sent: Tuesday, March 12, 2019 14:34
To: Bruno Hass
Cc: pgsql-hackers
Subject: Re: Best way to keep track of a sliced TOAST
On Mon, Mar 11, 2019 at 9:27 AM Bruno Hass <bruno_hass@live.com> wrote:
> I've been reading about TOASTing and would like to modify how the slicing works by taking into consideration the type of the varlena field. These changes would support future implementations of type specific optimized TOAST'ing functions. The first step would be to add information to the TOAST so we know if it is sliced or not and by which function it was sliced and TOASTed. This information should not break the current on disk format of TOASTs. I had the idea of putting this information on the varattrib struct va_header, perhaps adding more bit layouts to represent sliced TOASTs. This idea, however, was pointed to me to be a rather naive approach. What would be the best way to do this?
Well, you can't really use va_header, because every possible bit
pattern for va_header means something already. The first byte tells
us what kind of varlena we have:
* Bit layouts for varlena headers on big-endian machines:
*
* 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
* 01xxxxxx 4-byte length word, aligned, *compressed* data (up to 1G)
* 10000000 1-byte length word, unaligned, TOAST pointer
* 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 126b)
*
* Bit layouts for varlena headers on little-endian machines:
*
* xxxxxx00 4-byte length word, aligned, uncompressed data (up to 1G)
* xxxxxx10 4-byte length word, aligned, *compressed* data (up to 1G)
* 00000001 1-byte length word, unaligned, TOAST pointer
* xxxxxxx1 1-byte length word, unaligned, uncompressed data (up to 126b)
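
To make those rules concrete, here is a standalone sketch of the little-endian decoding logic (PostgreSQL itself implements this via the VARATT_IS_* macros in postgres.h; the enum and function names below are illustrative only):

#include <stdint.h>

typedef enum
{
    VARLENA_4B_UNCOMPRESSED,    /* xxxxxx00: 4-byte header, plain data */
    VARLENA_4B_COMPRESSED,      /* xxxxxx10: 4-byte header, compressed */
    VARLENA_1B_EXTERNAL,        /* 00000001: TOAST pointer follows */
    VARLENA_1B_INLINE           /* xxxxxxx1: short inline datum */
} varlena_kind;

static varlena_kind
classify_header_byte(uint8_t b)
{
    if ((b & 0x01) == 0)        /* even first byte: 4-byte length word */
        return ((b & 0x03) == 0x02) ? VARLENA_4B_COMPRESSED
                                    : VARLENA_4B_UNCOMPRESSED;
    if (b == 0x01)              /* exactly 0x01: external TOAST pointer */
        return VARLENA_1B_EXTERNAL;
    return VARLENA_1B_INLINE;   /* any other odd byte: 1-byte header */
}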
All of the bits other than the ones that tell us what kind of varlena
we've got are part of the length word itself; you couldn't use any bit
pattern for some other purpose without breaking on-disk compatibility
with existing releases. What you could possibly do is add a new
possible value of vartag_external, which tells us what "kind" of
toasted datum we've got. Currently, toasted datums stored on disk are
always type 18, but there's no reason that I know of why we couldn't
have more than one possibility there.
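
For reference, the existing tags in src/include/postgres.h are the following; the sliced variant added at the end is purely hypothetical, to show where such a tag would slot in:

/* Existing values from src/include/postgres.h; the last entry is a
 * hypothetical illustration, not part of PostgreSQL. */
typedef enum vartag_external
{
    VARTAG_INDIRECT = 1,
    VARTAG_EXPANDED_RO = 2,
    VARTAG_EXPANDED_RW = 3,
    VARTAG_ONDISK = 18,         /* today's on-disk TOAST pointer */
    VARTAG_ONDISK_SLICED = 19   /* hypothetical: type-aware sliced datum */
} vartag_external;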
However, I think you might want to discuss on this mailing list a bit
more about what you are hoping to achieve before you do too much
development, at least if you aspire to get something committed. A
project like the one you are proposing sounds like something not for
the faint of heart, and it's not really clear what benefits you
anticipate. I think there has been previous discussion of this topic
at least for jsonb, so you might also want to search the archives for
those discussions. I wouldn't go so far as to say that this idea
can't work or wouldn't have any value, but it does seem like the kind
of thing where you could spend a lot of time going down a dead end,
and discussion on the list might help you avoid some of those dead
ends.
It seems to me that making this overly pluggable is likely to be a net
negative, because there probably aren't really that many different
ways of doing this that are useful, and because having to store more
identifying information will make the toasted datum larger. One idea
is to let the datatype divide the datum up into variable-sized chunks
and then have the on-disk format store a list of chunk lengths in
chunk 0 (and following, if there are lots of chunks?) followed by the
chunks themselves. The data would all go into the TOAST table as it
does today, and the TOASTed data could be read without knowing
anything about the data type. However, code that knows how the data
was chunked at TOAST time could try to speed things up by operating
directly on the compressed data if it can figure out which chunk it
needs without fetching everything.
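
A sketch of what the reader-side lookup could look like, assuming a chunk-0 directory of lengths has already been fetched (all names here are hypothetical):

/* Given the list of chunk lengths from chunk 0, find which chunk holds a
 * target byte offset without fetching the preceding chunks. Illustrative
 * only; not PostgreSQL source. */
#include <stddef.h>
#include <stdint.h>

static int32_t
find_chunk_for_offset(const uint32_t *chunk_lengths, size_t nchunks,
                      uint64_t offset, uint64_t *offset_in_chunk)
{
    uint64_t start = 0;

    for (size_t i = 0; i < nchunks; i++)
    {
        if (offset < start + chunk_lengths[i])
        {
            *offset_in_chunk = offset - start;
            return (int32_t) i;
        }
        start += chunk_lengths[i];
    }
    return -1;                  /* offset is past the end of the datum */
}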
But that is just an idea, and it might turn out to suck.
Nice name, by the way, if an inferior spelling. :-)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company