Re: pglz performance - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: pglz performance
Msg-id: 20191127152818.ojgfrxfzr7pqkpic@development
In response to: Re: pglz performance (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: pglz performance; Re: pglz performance
List: pgsql-hackers
On Wed, Nov 27, 2019 at 05:47:25PM +0500, Andrey Borodin wrote:
>Hi Tomas!
>
>Thanks for benchmarking this!
>
>> On 26 Nov 2019, at 14:43, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote:
>>> On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote:
>>>> I think status Needs Review describes what is going on better. It's
>>>> not like something is awaited from my side.
>>>
>>> Indeed. You are right so I have moved the patch instead, with "Needs
>>> review". The patch status was actually incorrect in the CF app, as it
>>> was marked as waiting on author.
>>>
>>> @Tomas: updated versions of the patches have been sent by Andrey.
>>
>> I've done benchmarks on the last two patches, using the data sets from
>> the test_pglz repository [1], but using three simple queries:
>>
>> 1) prefix - first 100 bytes of the value
>>
>>    SELECT length(substr(value, 0, 100)) FROM t
>>
>> 2) infix - 100 bytes from the middle
>>
>>    SELECT length(substr(value, test_length/2, 100)) FROM t
>>
>> 3) suffix - last 100 bytes
>>
>>    SELECT length(substr(value, test_length - 100, 100)) FROM t
>>
>> See the two attached scripts implementing this benchmark. The test
>> itself did 60-second pgbench runs (single client), measuring tps on two
>> different machines.
>>
>>   patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch
>>   patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
>>
>> The results (compared to master) from the first machine (i5-2500k CPU)
>> look like this:
>>
>>                                     patch 1       |        patch 2
>> dataset                   prefix  infix  suffix   | prefix  infix  suffix
>> -------------------------------------------------------------------------
>> 000000010000000000000001    99%   134%   161%    |   100%   126%   152%
>> 000000010000000000000006    99%   260%   287%    |   100%   257%   279%
>> 000000010000000000000008   100%   100%   100%    |   100%    95%    91%
>> 16398                      100%   168%   221%    |   100%   159%   215%
>> shakespeare.txt            100%   138%   141%    |   100%   116%   117%
>> mr                          99%   120%   128%    |   100%   107%   108%
>> dickens                    100%   129%   132%    |   100%   100%   100%
>> mozilla                    100%   119%   120%    |   100%   102%   104%
>> nci                        100%   149%   141%    |   100%   143%   135%
>> ooffice                     99%   121%   123%    |   100%    97%    98%
>> osdb                       100%    99%    99%    |   100%   100%    99%
>> reymont                     99%   130%   132%    |   100%   106%   107%
>> samba                      100%   126%   132%    |   100%   105%   111%
>> sao                        100%   100%    99%    |   100%   100%   100%
>> webster                    100%   127%   127%    |   100%   106%   106%
>> x-ray                       99%    99%    99%    |   100%   100%   100%
>> xml                        100%   144%   144%    |   100%   130%   128%
>>
>> and on the other one (xeon e5-2620v4) looks like this:
>>
>>                                     patch 1       |        patch 2
>> dataset                   prefix  infix  suffix   | prefix  infix  suffix
>> -------------------------------------------------------------------------
>> 000000010000000000000001    98%   147%   170%    |    98%   132%   159%
>> 000000010000000000000006   100%   340%   314%    |    98%   334%   355%
>> 000000010000000000000008    99%   100%   105%    |    99%    99%   101%
>> 16398                      101%   153%   205%    |    99%   148%   201%
>> shakespeare.txt            100%   147%   149%    |    99%   117%   118%
>> mr                         100%   131%   139%    |    99%   112%   108%
>> dickens                    100%   143%   143%    |    99%   103%   102%
>> mozilla                    100%   122%   122%    |    99%   105%   106%
>> nci                        100%   151%   135%    |   100%   135%   125%
>> ooffice                     99%   127%   129%    |    98%   101%   102%
>> osdb                       102%   100%   101%    |   102%   100%    99%
>> reymont                    101%   142%   143%    |   100%   108%   108%
>> samba                      100%   132%   136%    |    99%   109%   112%
>> sao                         99%   101%   100%    |    99%   100%   100%
>> webster                    100%   132%   129%    |   100%   106%   106%
>> x-ray                       99%   101%   100%    |    90%   101%   101%
>> xml                        100%   147%   148%    |   100%   127%   125%
>>
>> In general, I think the results for both patches seem clearly a win,
>> but maybe patch 1 is a bit better, especially on the newer (xeon) CPU.
>> So I'd probably go with that one.
>
>From my POV there are two interesting new points in your benchmarks:
>1. They are more or less end-to-end benchmarks, with the whole system involved.
>2. They provide a per-payload breakdown.
>

Yes. I was considering using the test_pglz extension first, but in the
end I decided an end-to-end test is easier to do and more relevant.

>The prefix experiment is mostly related to reading from page cache and
>not directly connected with decompression. It's a bit strange that we
>observe 1% degradation in certain experiments, but I believe it's
>noise.
>

Yes, I agree it's probably noise - it's not always a degradation, there
are cases where it actually improves by ~1%. Perhaps more runs would
even this out, or maybe it's due to a different binary layout or
something. I should have some results from a test with longer
(10-minute) runs soon, but I don't think this is a serious issue.

>Infix and suffix results are correlated. We observe no impact of the
>patch on compressed data.
>

TBH I have not looked at which data sets are compressible etc. so I
can't really comment on this. FWIW the reason why I did the
prefix/infix/suffix tests is primarily that I was involved in some
recent patches tweaking TOAST slicing, so I wanted to see if this
happens to negatively affect it somehow. And it does not.

>test_pglz also includes slicing by 2kB and 8kB. This was done to
>imitate toasting. But as far as I understand, in your test the data
>payload will be inserted into a toast table too, won't it? If so, I
>agree that patch 1 looks like a better option.
>

Yes, the tests simply do whatever PostgreSQL would do when loading and
storing this data, including TOASTing.

>> On 27 Nov 2019, at 1:05, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Code-wise I think the patches are mostly fine, although the comments
>> might need some proof-reading.
>>
>> 1) I wasn't really sure what a "nibble" is, but maybe it's just me and
>> it's a well-known term.
>
>I took the word from the pg_lzcompress.c comments:
>
> * The offset is in the upper nibble of T1 and in T2.
> * The length is in the lower nibble of T1.

Aha, good. I hadn't noticed that word before, so I assumed it was
introduced by those patches. And the first thing I thought of was the
"Nibbles" video game [1], which obviously left me a bit puzzled ;-)
But it seems to be a well-known term, I just never heard it before.

[1] https://en.wikipedia.org/wiki/Nibbles_(video_game)

>>
>> 2) First byte use lower -> First byte uses lower
>>
>> 3) nibble contain upper -> nibble contains upper
>>
>> 4) to preven possible uncertanity -> to prevent possible uncertainty
>>
>> 5) I think we should briefly explain why memmove would be incompatible
>> with pglz, it's not quite clear to me.
>
>Here's the example:
>
>+ * Consider input: 112341234123412341234
>+ * At byte 5 here  ^ we have match with length 16 and
>+ * offset 4. 11234M(len=16, off=4)
>
>If we simply memmove() these 16 bytes we will produce
>112341234XXXXXXXXXXXX, where the series of X is 12 undefined bytes that
>were at bytes [6:18].
>

OK, thanks.

>>
>> 6) I'm pretty sure the comment in the 'while (off < len)' branch will
>> be badly mangled by pgindent.
>
>I think I can just write it without the line limit and then run
>pgindent. Will try to do it this evening. Also, I will try to write
>more about memmove.
>
>>
>> 7) The last change moving "copy" to the next line seems unnecessary.
>
>Oh, looks like I had been rewording this comment, and eventually came
>to the same text... Yes, this change is absolutely unnecessary.
>
>Thanks!
>

Good. I'll wait for an updated version of the patch and then try to get
it pushed by the end of the CF.
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services