Re: [PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers
From: Aleksander Alekseev
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date:
Msg-id: CAJ7c6TOx878pUmmj2ibRiF-JaOYLvYHAA27ebdANgTDOs58PvA@mail.gmail.com
In response to: Re: [PATCH] Compression dictionaries for JSONB (Nikita Malakhov <hukutoc@gmail.com>)
List: pgsql-hackers
Hi Nikita,

Thanks for your feedback!

> Aleksander, I've carefully gone over the discussion and still have some questions to ask -
>
> 1) Is there any means of measuring the overhead of dictionaries over the vanilla implementation? IMO it is a must because
> JSON is a widely used functionality. Also, as it was mentioned before, to check the dictionary the value must be detoasted;

Not sure what overhead you have in mind. The patch doesn't affect the vanilla JSONB implementation.

> 2) Storing dictionaries in one table. As I wrote before, this will surely lead to locks and waits while inserting and updating
> dictionaries, and could cause serious performance issues. And vacuuming this table will lead to locks for all tables using
> dictionaries until vacuum is complete;

I believe this is true to some degree. But doesn't the same generally apply to the rest of the catalog tables? I'm not that concerned about inserting/updating since this is a rare operation. Vacuuming shouldn't be such a problem unless the user creates/deletes dictionaries all the time. Am I missing something?

> 3) JSON documents in production environments could be very complex and use thousands of keys, so creating a dictionary
> directly in an SQL statement is not a very good approach, so it's another reason to have means for creating dictionaries as
> separate tables and/or passing them as files or so;

Yes, it was proposed to update dictionaries automatically, e.g. during VACUUM of the table that contains the compressed documents. This is simply out of scope of this particular patch. It was also argued that manual updates should be supported, which is what this patch implements.

> 4) The suggested mechanics, if put on top of TOAST, could not benefit from knowledge of the internal JSON structure, which
> is seen as an important drawback in spite of the extensive research work done on working with JSON schema (storing, validating,
> etc.), and also it cannot recognize and help to compress duplicated parts of a JSON document;

Could you please elaborate on this a bit and/or maybe give an example?

...
> In Pluggable TOAST we suggest that as an improvement compression should be put inside the Toaster as an option,
> thus the Toaster could have maximum benefit from knowledge of the data's internal structure (and in the future use JSON Schema).
...

The current implementation doesn't use knowledge of the JSONB format, that's true. This is because we previously agreed there is no "one size fits all" compression method, so several are going to be supported eventually. The current algorithm was chosen merely as one that works well enough for any data type, not just JSONB. Nothing prevents an alternative compression method from using knowledge of the JSONB structure. As, I believe, Matthias pointed out above, only partial decompression would be a challenge; that is indeed something that would be better implemented closer to the TOAST level. Other than that, I'm not sure what you mean.

> 5) A small test issue - what if a dictionaried JSON has a key which is equal to an OID used in the dictionary for some other key?

Again, I'm having difficulties understanding the case you are describing. Could you give a specific example? (A toy sketch of how I read the question is further down in this message.)

> For use in a special Toaster for the JSON datatype, compression dictionaries seem to be a very valuable addition, but now I
> have to agree that this feature in its current state is competing with Pluggable TOAST.

I disagree with the word "competing" here. Again, Matthias had a very good point about this above.
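Coming back to (5): to show how I read the question, here is a toy sketch in Python. It is not the patch's code, and the key names and numeric codes are invented; it only illustrates the general kind of key-to-code substitution a compression dictionary might perform. In this toy model a code is only ever written in key position, so a value that happens to equal one of the codes is left untouched and nothing becomes ambiguous.

# Toy illustration (NOT the patch's code; key names and codes are made up):
# a compression dictionary maps frequently used JSONB keys to small numeric
# codes. Codes are substituted only in key position, so a *value* that
# happens to equal some code is left alone.

import json

DICTIONARY = {"customer_id": 1, "order_items": 2, "shipping_address": 3}
REVERSE = {code: key for key, code in DICTIONARY.items()}

def compress(doc):
    # Replace known keys with their codes, recursing into nested objects/arrays.
    if isinstance(doc, dict):
        return {DICTIONARY.get(k, k): compress(v) for k, v in doc.items()}
    if isinstance(doc, list):
        return [compress(v) for v in doc]
    return doc

def decompress(doc):
    # Restore the original keys from their codes.
    if isinstance(doc, dict):
        return {REVERSE.get(k, k): decompress(v) for k, v in doc.items()}
    if isinstance(doc, list):
        return [decompress(v) for v in doc]
    return doc

original = {"customer_id": 42, "note": 1, "order_items": [{"customer_id": 1}]}
compressed = compress(original)            # {1: 42, 'note': 1, 2: [{1: 1}]}
assert decompress(compressed) == original  # round-trips; the value 1 is untouched
print(json.dumps(compressed))

If the scenario you have in mind is different from this, a concrete document demonstrating the problem would help.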
In short, Pluggable TOAST is a low-level internal mechanism, but it doesn't provide a good interface for the end user and has several open issues. The most important one, IMO, is how it is supposed to work with pluggable AMs in the general case. "Compression dictionaries" have a good user interface, and the implementation is not that important. The current implementation uses casts, as the only option available at the moment. But nothing prevents it from using Pluggable TOAST if that produces cleaner code (I believe it will) and allows partial decompression (this is yet to be figured out).

-- 
Best regards,
Aleksander Alekseev