Hello all,
thank you for your replies. I agree with Alexander Korotkov that it is important to have a quality patch at the end of the summer.
Stephen, you mentioned PostGIS, but the conversation seems to lean towards JSONB. What are your thoughts?
Also, if I am to include some ideas/approaches in the proposal, it seems I should really focus on understanding how a specific data type is used, queried and indexed, which is a lot of exploring for a newcomer to the Postgres code.
In the meantime, I am trying to find out how jsonb is indexed and queried. Once I grasp the current situation, I will be able to think about new approaches.
Regards,
George
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
<gpapadrosou@gmail.com> wrote:
The project’s idea is to implement different slicing approaches according to
the value’s datatype. For example, a text field could be split on character
boundaries, while a JSON document would be split in a way that allows fast
access to its keys or values.
Hmm. So if you had a long text field containing multibyte characters,
and you split it after, say, every 1024 characters rather than after
every N bytes, then you could do substr() without detoasting the whole
field. On the other hand, my guess is that you'd waste a fair amount
of space in the TOAST table, because it's unlikely that the chunks
would be exactly the right size to fill every page of the table
completely. On balance it seems like you'd be worse off, because
substr() probably isn't all that common an operation.
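The trade-off Robert describes can be sketched with a little arithmetic. This is not PostgreSQL code; the 1024-character chunk size and the two-byte test character are illustrative assumptions, but they show both halves of the argument: substr() can locate its chunks without detoasting the whole value, while the chunks' byte sizes vary and so cannot pack TOAST pages evenly.

```python
# Hypothetical character-boundary chunking: every chunk holds exactly
# CHARS_PER_CHUNK characters, regardless of how many bytes those take.
CHARS_PER_CHUNK = 1024  # assumed chunk size, for illustration only

def chunks_for_substr(start_char, length):
    """Chunk indices substr(text, start_char, length) would need to
    fetch, given 1-based character offsets and fixed-character chunks."""
    first = (start_char - 1) // CHARS_PER_CHUNK
    last = (start_char - 1 + length - 1) // CHARS_PER_CHUNK
    return list(range(first, last + 1))

# substr() over characters 1500..1599 needs only chunk 1, no matter
# how many bytes the preceding characters occupy on disk.
print(chunks_for_substr(1500, 100))

# But the byte size of a chunk depends on its content: 1024 ASCII
# characters encode to 1024 bytes, 1024 two-byte UTF-8 characters to
# 2048 bytes, so chunks cannot be sized to fill every page exactly.
print(len(("a" * CHARS_PER_CHUNK).encode("utf8")),
      len(("α" * CHARS_PER_CHUNK).encode("utf8")))
```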
Keep in mind also that slicing on "interesting" boundaries rather than
with the current procrustean-bed approach could save you at most one or
two chunk fetches per access. So the upside seems limited. Moreover,
how are you going to know whether a given toast item has been stored
according to your newfangled approach? I doubt we're going to accept
forcing a dump/reload for this.

IMO, the real problem here is to be able to predict which chunk(s) to
fetch at all, and I'd suggest focusing on that part of the problem rather
than changes to physical storage. It's hard to see how to do anything
very smart for text (except in the single-byte-encoding case, which is
already solved). But the JSONB format was designed with some thought
to this issue, so you might be able to get some traction there.

regards, tom lane
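Tom's "already solved" single-byte case can be sketched in a few lines: with fixed-size byte chunks, a requested byte range maps directly to chunk numbers, so only those chunks need fetching. The chunk size below is an assumption standing in for TOAST_MAX_CHUNK_SIZE; the hard part he points at is that for multibyte text or JSONB keys there is no such direct offset computation without inspecting the data.

```python
# Fixed byte-size chunks: the offset alone determines which chunks hold
# a byte range, with no need to detoast the preceding data.
CHUNK_SIZE = 2000  # illustrative stand-in for TOAST_MAX_CHUNK_SIZE

def chunks_for_byte_range(offset, length):
    """Chunk numbers covering bytes [offset, offset + length)."""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))

# Fetching bytes 5000..6999 of a large value touches only chunks 2 and 3.
print(chunks_for_byte_range(5000, 2000))
```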