Re: [HACKERS] Proposal: generic WAL compression - Mailing list pgsql-hackers

From Arthur Silva
Subject Re: [HACKERS] Proposal: generic WAL compression
Date
Msg-id CAO_YK0VEAMm1ChYidO+XkbEPWDdOkDdnAJR1j3QVgJdtkrP_yg@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Proposal: generic WAL compression  (Oleg Ivanov <o.ivanov@postgrespro.ru>)
List pgsql-hackers
I wonder if the algorithm could be changed to operate with units of 2 or 4 bytes (instead of 1).
Since the algorithm speed is essentially doubled/quadrupled it could yield a better runtime/compression trade-off.

Does that make sense?

--
Arthur Silva


On Wed, Nov 1, 2017 at 12:43 AM, Oleg Ivanov <o.ivanov@postgrespro.ru> wrote:
Hackers,

a few years ago generic WAL was proposed by Alexander Korotkov (https://www.postgresql.org/message-id/flat/CAPpHfdsXwZmojm6Dx%2BTJnpYk27kT4o7Ri6X_4OSWcByu1Rm%2BVA%40mail.gmail.com#CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com). and was committed into PostgreSQL 9.6.
One of the generic WAL advantages is the common interface for safe interaction with WAL for modules and extensions. The interface allows module to register the page, then change it, and then generic WAL generates and writes into xlog the algorithm of changing the old version of page into the new one. In the case of crash and recovery this algorithm may be applied.

Bloom and RUM indexes use generic WAL. The issue is that the generic algorithm of turning the old page into the new one is not optimal in the sense of record size in xlog. Now the main idea of the approach is to find all positions in the page where new version is different from the original one and write these changes into xlog. It works well if the page was rewritten completely or if only a few bytes have been changed. Nevertheless, this approach produces too large WAL record in the case of inserting or deleting a few bytes into the page. In this case there are almost no position where the old version and the new one are equal, so the record size is near the page size, but actual amount of changes in the page is small. This is exactly what happens often in RUM indexes.

In order to overcome that issue, I would like to propose the patch, which provides possibility to use another approach of the WAL record construction. If another approach fails to make a short enough record, it rolls back to the original approach. The main idea of another approach is not to perform bytewise comparison of pages, but finding the minimal editing distance between two pages and the corresponding editing algorithm. In the general case, finding editing distance takes O(N^2) time and memory. But there is also an algorithm which requires only O(ND) time and O(D^2) memory, where D is the editing distance between two sequences. Also for given D algorithm may show that the minimal editing distance between two sequences is more than D in the same amount of time and memory.

The special case of this algorithm which does not consider replace operations is described in the paper (http://www.xmailserver.org/diff2.pdf). The version of this algorithm which consumes O(ND) time and O(N) memory is used in diff console command, but for our purposes we don't need to increase the constant for the time in order to decrease memory complexity. For RUM indexes we usually have small enough editing distance (less than 64), so D^2 is not too much to store.

The results of experiments:

+------------------------------+----------------+----------------+----------------+----------------+
|             Name             |  WAL_diff(MB)  |  WAL_orig(MB) |  Time_diff(s)  |  Time_orig(s)  |
+------------------------------+----------------+----------------+----------------+----------------+
| rum: make installcheck       | 38.9           | 82.5 | 4.37           | 4.16           |
+------------------------------+----------------+----------------+----------------+----------------+
| bloom: make installcheck     | 1.0            | 1.0 | 0.66           | 0.53           |
+------------------------------+----------------+----------------+----------------+----------------+
| test00.sql                 | 20.5           | 51.0 | 1.86           | 1.41           |
+------------------------------+----------------+----------------+----------------+----------------+
| test01.sql                   | 207.1      | 219.7   | 8.06 | 6.89           |
+------------------------------+----------------+----------------+----------------+----------------+

We can see that the patch provides a little slowdown, but compresses generic WAL efficiently for RUM index. Also I'm going to do a few more experiments on this patch with another data.

The patch was tested on Lubuntu 14.04, but should not contain any platform-specific items. The patch, the files and scripts for doing the experiments and performance tests are attached.

Oleg Ivanov
Postgres Professional
The Russian PostgreSQL Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [HACKERS] Proposal: Local indexes for partitioned table
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping