Re: [HACKERS] Proposal: generic WAL compression - Mailing list pgsql-hackers
From | Oleg Ivanov |
---|---|
Subject | Re: [HACKERS] Proposal: generic WAL compression |
Date | |
Msg-id | 5A87DA4C.6050509@postgrespro.ru Whole thread Raw |
In response to | Re: [HACKERS] Proposal: generic WAL compression (Antonin Houska <ah@cybertec.at>) |
List | pgsql-hackers |
Hello everyone! Unfortunately, I faced the use case with RUM index, in which my patch produced enormously large time overhead (queries execution time is about 2 or 3 times slower). Only now I finally managed to limit this overhead by 20% or even make it statistically insignificant on my HDD. Nevertheless, SSD evaluation is still required. So I attach new test totat_test.sh script which includes the bad RUM use case and the new version of the patch. Thanks everybody for review, it improved the patch a lot. The majority of the listed drawbacks were fixed, the others are discussed below. On 11/16/2017 08:31 PM, Antonin Houska wrote: > * writeToPtr() and readFromPtr() are applied to the existing code. I think > this is a reason for a separate diff, to be applied before the actual patch. Removed the existing code refactoring, that should be a separate patch indeed. > * What's the reason for changing FRAGMENT_HEADER_SIZE ? I see no change in > the data layout that writeFragment() produces. Diff delta has its own header which consists of one char and one int. To make this point more clear, I defined DIFF_DELTA_HEADER_SIZE and added 2 * DIFF_DELTA_HEADER_SIZE to MAX_DELTA_SIZE. > ** "An O(ND) Difference Algorithm and Its Variations" - I think the reference > should contain more information, e.g. name of the author(s). I think I even > saw URLs to scientific papers in the PG source but I'm not 100% sure right > now. The reference to the paper contains much more information now. > ** As for the algorithm itself: Although I didn't have enough patience (and > time) to follow it in every detail for this first review, I think I can > imagine what it is about. Yet I think reading would be easier if the > concepts of alignment and "diff delta" were depicted in the comments, > perhaps using some sort of "ASCII graphics". I briefly described the main idea of the dynamical programming algorithm in comments of alignRegions. It can be further refined if you think it is still unclear now, but I'm afraid the detailed description will result in just copying the referred paper into the comments. > ** DiffDeltaOperations enumeration: the individual values should be described. > > > * computeRegionDiffDelta() > > ** Does "previous delta" in one of the comments refer to "base delta", > i.e. the delta computation w/o your patch? If so, the comment should > mention it explicitly. > > ** If you use Min() to initialize the for-loop, it'd be more consistent if you > also used Max() when checking the limits: > > for (i = Min(validStart, targetStart); i < Max(validEnd, targetEnd); ++i) > > And similarly you can simplify the body of the loop. I found that this part of code is unnecessary and deleted it. On 11/17/2017 03:02 PM, Arthur Silva wrote: > I wonder if the algorithm could be changed to operate with units of 2 > or 4 bytes (instead of 1). > Since the algorithm speed is essentially doubled/quadrupled it could > yield a better runtime/compression trade-off. > > Does that make sense? This approach implies that we cannot detect data shift by one byte, for example. It is suitable for 2- or 4-bytes-aligned data, but looks not general enough. On 01/23/2018 12:37 AM, Stephen Frost wrote: >> * writeToPtr() and readFromPtr() are applied to the existing code. I think >> this is a reason for a separate diff, to be applied before the actual patch. > I'm not really a fan of using these, for my 2c. I'm not sure how others > feel, but having these macros which change one of the arguments to the > macro (by advancing the pointer) doesn't result in a net improvement to > readability for me. Sounds reasonable, but the patch uses such constructions as writeToPtr rather often. How do you feel about using writeToPtr(&ptr, data) which which more explicitly indicates the first argument changing? On 02/07/2018 09:02 PM, Markus Nullmeier wrote: > One general comment I can already make is that enabling compression > should be made optional, which appears to be a small and easy addition > to the generic WAL API. Now I'm working on this.I consider creation the function GenericXLogRegisterBufferEx in addition to GenericXLogRegisterBuffer. GenericXLogRegisterBufferEx will take a structure with parameters for diff delta such as maximal alignment length, to which parts of page it must be applied, etc. And GenericXLogRegisterBufferwill call GenericXLogRegisterBufferEx with default parameters. This allows using different compression settings for different indexes. What do you think about such solution?
Attachment
pgsql-hackers by date: