Re: [Proposal] Page Compression for OLTP - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: [Proposal] Page Compression for OLTP
Msg-id: alpine.DEB.2.22.394.2005210914440.2856263@pseudo
In response to: [Proposal] Page Compression for OLTP (chenhj <chjischj@163.com>)
Responses: Re: [Proposal] Page Compression for OLTP
List: pgsql-hackers
Hello,

My 0.02€, some of which may just show some misunderstanding on my part:

- You have clearly given quite a few thoughts about the what and how, which makes your message an interesting read.

- Could this be proposed as some kind of extension, provided that enough hooks are available? ISTM that foreign tables and/or an alternative storage engine (aka ACCESS METHOD) provide convenient APIs which could fit this need. Or are they not appropriate? You seem to suggest that they are not. If not, what could be done to improve the APIs to allow what you are seeking to do? Maybe you need a somewhat lower-level programmable API which does not exist yet, or at least is not exported yet, but could be specified and implemented with limited effort? Basically you would like to read/write pg pages to somewhere, and then there is the syncing issue to consider. Maybe such a "page storage" API could provide a benefit for some specialized hardware, e.g. persistent memory stores, so there would be more reason to define it anyway? I think it would be valuable to give it some thought.

- Could you maybe elaborate on how your plan differs from [4] and [5]?

- Have you considered keeping page headers and compressing tuple data only?

- I'm not sure there is a point in going below the underlying file system block size, quite often 4 KiB? Or maybe yes? Or is there a benefit to aiming at 1/4 even if most pages overflow?

- ISTM that your approach entails 3 "files". Could it be done with 2? I'd suggest that the possible overflow pointers (coa) could be part of the headers, so that when reading the 3.1 page the header would tell where to find the overflow 3.2, without requiring an additional independent structure holding very small data, most of it zeros. Possibly this is not possible, because it would require some available space in standard headers when the page is not compressible, and there is not enough.
Maybe creating a little room for that in existing headers (4 bytes could be enough?) would be a good compromise. Hmmm. Maybe the approach I suggest would only work for 1/2 compression, but not for other target ratios; still, I think it could be made to work if the pointer can refer to several blocks in the overflow table.

- If one page is split in 3 parts, could it create problems on syncing, if only 1/3 or 2/3 of the page gets written? But maybe that is manageable with WAL, as it would note that the page was not synced, and that is enough for replay.

- I'm unclear how you would manage the 2 representations of a page in memory. I'm afraid that an 8 KiB page compressed to 4 KiB would basically take 12 KiB, i.e. reduce the available memory for caching purposes. Hmmm. The current status is that a written page probably takes 16 KiB, once in shared buffers and once in the system caches, so it would be an improvement anyway.

- Maybe the compressed and overflow tables could become bloated somehow, which would require a vacuuming implementation and add to the complexity of the implementation?

- External tools should be available to allow page inspection, e.g. for debugging purposes.

-- 
Fabien.
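The overflow-pointer idea above (4 bytes reserved per stored chunk, chaining a compressed page to its continuation so that no third map file is needed) could be sketched roughly as follows. This is purely illustrative: the names (StoredChunk, chunk_store, store_compressed, CHUNK_SZ) are invented for this sketch and do not exist in PostgreSQL; a real implementation would place the field in the actual page/chunk header and write to a file rather than an in-memory array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ      8192         /* PostgreSQL default page size */
#define CHUNK_SZ    4096         /* target chunk for a 1/2 compression ratio */
#define NO_OVERFLOW UINT32_MAX   /* sentinel: page fit in one chunk */

/* Hypothetical on-disk chunk: the first 4 bytes play the role of the
 * proposed header field, i.e. the chunk number of the continuation. */
typedef struct
{
    uint32_t next_chunk;                       /* NO_OVERFLOW if none */
    uint8_t  data[CHUNK_SZ - sizeof(uint32_t)];
} StoredChunk;

static StoredChunk chunk_store[64];   /* stand-in for the compressed file */
static uint32_t    next_free = 32;    /* chunks >= 32 form the overflow area */

/* Store `len` bytes of already-compressed page data starting at `slot`,
 * chaining overflow chunks via the in-header pointer as needed.
 * Returns the number of chunks used. */
static int
store_compressed(uint32_t slot, const uint8_t *buf, size_t len)
{
    const size_t payload = CHUNK_SZ - sizeof(uint32_t);
    uint32_t cur = slot;
    int used = 0;

    for (;;)
    {
        size_t n = len < payload ? len : payload;

        memcpy(chunk_store[cur].data, buf, n);
        buf += n;
        len -= n;
        used++;
        if (len == 0)
        {
            chunk_store[cur].next_chunk = NO_OVERFLOW;
            break;
        }
        /* header points directly at the overflow chunk: no separate map */
        chunk_store[cur].next_chunk = next_free;
        cur = next_free++;
    }
    return used;
}
```

A reader following this scheme fetches chunk 3.1, and only if its `next_chunk` field is set does it issue a second read for 3.2, which matches the "2 files instead of 3" layout suggested above.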