RE: Best way to keep track of a sliced TOAST - Mailing list pgsql-hackers

From Bruno Hass
Subject RE: Best way to keep track of a sliced TOAST
Date
Msg-id BL0PR07MB4065DE68DBDCAF316F0145BF91420@BL0PR07MB4065.namprd07.prod.outlook.com
In response to Re: Best way to keep track of a sliced TOAST  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Best way to keep track of a sliced TOAST
List pgsql-hackers
I would like to optimize the jsonb key access operations. I could not find the discussion you've mentioned, but I am giving some thought to the idea. 

Instead of storing lengths, could we dedicate the first chunk of the TOASTed jsonb to store where each key is located? Would it be a good idea?

You've mentioned that the current jsonb format is byte-oriented. Does that imply that a single jsonb key value might be split between multiple chunks?
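
As a rough illustration of why that could happen, here is a minimal sketch assuming uncompressed, externally stored data cut into fixed-size chunks. The chunk size of 1996 bytes and the function name `chunks_for_range` are assumptions for the example, not taken from the jsonb or TOAST code.

```python
# Assumed chunk size for illustration only; the real value depends on the build.
CHUNK_SIZE = 1996


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Return (first_chunk, last_chunk) covering bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return first_chunk, last_chunk


# A 10-byte value starting 6 bytes before a chunk boundary spans two chunks.
print(chunks_for_range(1990, 10))  # (0, 1)
```

Because the cut points depend only on byte offsets, nothing in this scheme prevents a key or value from straddling a chunk boundary.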


Bruno Hass


From: Robert Haas <robertmhaas@gmail.com>
Sent: Friday, March 15, 2019 12:22
To: Bruno Hass
Cc: pgsql-hackers
Subject: Re: Best way to keep track of a sliced TOAST
 
On Fri, Mar 15, 2019 at 7:37 AM Bruno Hass <bruno_hass@live.com> wrote:
> This idea is what I was hoping to achieve. Would we be able to make optimizations on deTOASTing just by storing the chunk lengths in chunk 0?

I don't know. I guess we could also NOT store the chunk lengths and
just say that if you don't know which chunk you want by chunk number,
your only other alternative is to read the chunks in order.  The
problem with that is that you can no longer index by byte position
without fetching every chunk prior to that byte position, but maybe
that's not important enough to justify the overhead of a list of chunk
lengths.  Or maybe it depends on what you want to do with it.
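
The tradeoff can be sketched roughly as follows, assuming variable-length chunks. The names `build_offsets`, `locate`, and `locate_sequential` are hypothetical and do not correspond to anything in the PostgreSQL source; the lengths list stands in for whatever would be stored in chunk 0.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(chunk_lengths):
    """Starting byte offset of every chunk, with the total length appended."""
    return [0] + list(accumulate(chunk_lengths))


def locate(offsets, byte_pos):
    """With a length directory: binary search, no chunk data is read."""
    if not 0 <= byte_pos < offsets[-1]:
        raise IndexError("byte position out of range")
    chunk_no = bisect_right(offsets, byte_pos) - 1
    return chunk_no, byte_pos - offsets[chunk_no]


def locate_sequential(chunk_lengths, byte_pos):
    """Without a directory: walk the chunks in order, counting those visited."""
    start = 0
    for chunk_no, length in enumerate(chunk_lengths):
        if start <= byte_pos < start + length:
            return chunk_no, byte_pos - start, chunk_no + 1
        start += length
    raise IndexError("byte position out of range")


lengths = [100, 250, 75]  # made-up chunk lengths
offsets = build_offsets(lengths)
print(locate(offsets, 349))             # (1, 249)
print(locate_sequential(lengths, 349))  # (1, 249, 2)
```

The directory costs one extra entry per chunk; the sequential variant stores nothing extra but has to visit every earlier chunk to learn its length.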

Again, stuff like what you are suggesting here has been suggested
before.  I think the problem is if someone did the work to invent such
an infrastructure, that wouldn't actually do anything by itself.  We'd
then need to find an application of it where it brought us some clear
advantage.  As I said in my previous email, jsonb seems like a
promising candidate, but I don't think it's a slam dunk.  What would
the design look like, exactly?  Which operations would get faster, and
could we really make them work?  The existing format is, I think,
designed with a byte-oriented format in mind, and a chunk-oriented
format might have different design constraints.  It seems like an idea
with potential, but there's a lot of daylight between a directional
idea with potential and a specific idea accompanied by a high-quality
implementation thereof.

> Also, wouldn't it break existing functions by dedicating a whole chunk (possibly more) to such metadata?

Anybody writing such a patch would have to be prepared to fix any such
breakage that occurred, at least as regards core code.  I would guess
that this could be done without breaking too much third-party code,
but I guess it depends on exactly what the author of this hypothetical
patch ends up changing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
