Thread: Compressing temporary files
Hi hackers!

There's a lot of compression discussion going on these days, and that's cool! Recently Naresh Chainani shared with me, in a private discussion, the idea of compressing temporary files on disk. I was thrilled to find no evidence of an implementation of this interesting idea.

I've prototyped a Random Access Compressed File for fun [0]. The code is a very dirty proof of concept. I compress a BufFile one block at a time. There are directory pages that store the size of each compressed block. If any byte of a block is changed, the whole block is recompressed. Wasted space is never reused. If a compressed block is larger than BLCKSZ, unknown bad things will happen :)

Here are some of my observations.

0. The idea seems feasible. The API of fd.c used by buffile.c can easily be abstracted for compressed temporary files. Seeks are necessary, but they are not very frequent. It's easy to make temp file compression GUC-controlled.

1. The temp file footprint can easily be reduced. For example, the query

create unlogged table y as select random()::text t from generate_series(0,9999999) g;

uses 140000000 bytes of temp files for the TOAST index build. With the patch this value is reduced to 40841704 bytes (x3.42 smaller).

2. I have not found any evidence of performance improvement. I've only benchmarked the patch on my laptop, and RAM (the page cache) diminished any difference between writing compressed and uncompressed blocks.

What do you think: is it worth pursuing the idea? OLTP systems rarely rely on data spilled to disk.
Are there any known good random-access compressed file libraries, so that we could avoid reinventing the wheel?
Maybe someone has tried this approach before?

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/x4m/postgres_g/commit/426cd767694b88e64f5e6bee99fc653c45eb5abd
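To make the proof-of-concept layout described above a little more concrete, here is a minimal illustrative sketch of the block-directory idea. The names, the append-only file layout, and the use of LZ4 are assumptions made for illustration only; they are not taken from the actual patch.

#include <lz4.h>
#include <stdio.h>

#define BLCKSZ 8192            /* 8kB blocks, as in a default PostgreSQL build */

/* One directory entry per logical block of the temporary file. */
typedef struct CompressedBlockDirEntry
{
    long offset;               /* where the current compressed image starts */
    int  length;               /* its compressed length in bytes */
} CompressedBlockDirEntry;

/*
 * Recompress a whole logical block after any byte of it changed, and
 * append the new image at the end of the file.  The previous image
 * becomes dead space ("wasted space is never reused").
 */
static void
rewrite_block(FILE *fp, CompressedBlockDirEntry *dir, int blocknum,
              const char *rawblock)
{
    char compressed[LZ4_COMPRESSBOUND(BLCKSZ)];
    int  clen;

    /* cannot fail for an 8kB input with a COMPRESSBOUND-sized buffer */
    clen = LZ4_compress_default(rawblock, compressed, BLCKSZ,
                                (int) sizeof(compressed));

    fseek(fp, 0, SEEK_END);
    dir[blocknum].offset = ftell(fp);
    dir[blocknum].length = clen;
    fwrite(compressed, 1, clen, fp);
}

A real implementation would also have to persist the directory itself (the "directory pages" mentioned above) and deal with error handling, which this sketch omits.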
On Sat, Sep 11, 2021 at 05:31:37PM +0500, Andrey Borodin wrote:
> What do you think: is it worth pursuing the idea? OLTP systems rarely rely on data spilled to disk.
> Are there any known good random-access compressed file libraries, so that we could avoid reinventing the wheel?
> Maybe someone has tried this approach before?

Why are temporary files more useful for compression than other database files?

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.
On Sat, Sep 11, 2021 at 8:31 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I've prototyped a Random Access Compressed File for fun [0]. The code is a very dirty proof of concept.
> I compress a BufFile one block at a time. There are directory pages that store the size of each compressed block. If any byte of a block is changed, the whole block is recompressed. Wasted space is never reused. If a compressed block is larger than BLCKSZ, unknown bad things will happen :)

Just reading this description, I suppose it's also bad if the block is recompressed and the new compressed size is larger than the previous compressed size. Or do you have some way to handle that?

I think it's probably quite tricky to make this work if the temporary files can be modified after the data is first written. If you have a temporary file that's never changed after the fact, then you could compress all the blocks and maintain, on the side, an index that says where the compressed version of each block starts. That could work whether or not the blocks expand when you try to compress them, and you could even skip compression for blocks that get bigger when "compressed" or which don't compress nicely, just by including a boolean flag in your index saying whether that particular block is compressed or not. But as soon as you have a case where the blocks can get modified after they are created, then I don't see how to make it work nicely. You can't necessarily fit the new version of the block in the space allocated for the old version of the block, and putting it elsewhere could turn sequential I/O into random I/O.

Leaving all that aside, I think this feature has *some* potential, because I/O is expensive and compression could let us do less of it. The problem is that a lot of the I/O that PostgreSQL thinks it does isn't real I/O. Everybody is pretty much forced to set work_mem conservatively to avoid OOM, which means a large proportion of operations that exceed work_mem and thus spill to files don't actually result in real I/O. They end up fitting in memory after all; it's only that the memory in question belongs to the OS rather than to PostgreSQL. And for operations of that type, which I believe to be very common, compression is strictly a loss. You're doing extra CPU work to avoid I/O that isn't actually happening.

--
Robert Haas
EDB: http://www.enterprisedb.com
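For illustration, a hedged sketch of the write-once scheme described above (assumed names, LZ4 chosen arbitrarily): each block is compressed when it is first appended, the raw 8kB is stored instead when compression does not shrink it, and a boolean flag in the side index records which form was used.

#include <lz4.h>
#include <stdbool.h>
#include <stdio.h>

#define BLCKSZ 8192

/* Side index: one entry per logical block, built as the file is written. */
typedef struct BlockIndexEntry
{
    long offset;               /* where this block's image starts in the file */
    int  length;               /* stored length; equals BLCKSZ when stored raw */
    bool compressed;           /* false when compression did not help */
} BlockIndexEntry;

static void
append_block(FILE *fp, BlockIndexEntry *idx, int blocknum, const char *raw)
{
    char compressed[LZ4_COMPRESSBOUND(BLCKSZ)];
    int  clen = LZ4_compress_default(raw, compressed, BLCKSZ,
                                     (int) sizeof(compressed));

    fseek(fp, 0, SEEK_END);
    idx[blocknum].offset = ftell(fp);

    if (clen > 0 && clen < BLCKSZ)
    {
        /* compression helped: store the smaller image */
        idx[blocknum].length = clen;
        idx[blocknum].compressed = true;
        fwrite(compressed, 1, clen, fp);
    }
    else
    {
        /* incompressible block: store it verbatim and flag it as such */
        idx[blocknum].length = BLCKSZ;
        idx[blocknum].compressed = false;
        fwrite(raw, 1, BLCKSZ, fp);
    }
}

Because blocks are only ever appended, the index stays valid and writes stay sequential; the modify-after-write case described above is exactly what this scheme sidesteps.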
Hi,

On 9/11/21 2:31 PM, Andrey Borodin wrote:
> Hi hackers!
>
> There's a lot of compression discussion going on these days, and that's cool! Recently Naresh Chainani shared with me, in a private discussion, the idea of compressing temporary files on disk. I was thrilled to find no evidence of an implementation of this interesting idea.
>
> I've prototyped a Random Access Compressed File for fun [0]. The code is a very dirty proof of concept. I compress a BufFile one block at a time. There are directory pages that store the size of each compressed block. If any byte of a block is changed, the whole block is recompressed. Wasted space is never reused. If a compressed block is larger than BLCKSZ, unknown bad things will happen :)
>

Might be an interesting feature, and the approach seems reasonable too (of course, it's a PoC, so it has rough edges that'd need to be solved).

Not sure if compressing it at the 8kB block granularity is good or bad. Presumably larger compression blocks would give better compression, but that's a detail we could investigate later.

> Here are some of my observations.
>
> 0. The idea seems feasible. The API of fd.c used by buffile.c can easily be abstracted for compressed temporary files. Seeks are necessary, but they are not very frequent. It's easy to make temp file compression GUC-controlled.
>

Hmm. How much more expensive are the seeks, actually? If we compress the files block by block, then a seek means decompressing 8kB of data. Of course, that's not free, but if you compare it to doing less I/O, it may easily be a significant win.

> 1. The temp file footprint can easily be reduced. For example, the query
> create unlogged table y as select random()::text t from generate_series(0,9999999) g;
> uses 140000000 bytes of temp files for the TOAST index build. With the patch this value is reduced to 40841704 bytes (x3.42 smaller).
>

That seems a bit optimistic, really. The problem is that while random() is random, it means we're only dealing with 10 characters in the text value. That's pretty redundant, and the compression benefits from that. But then again, data produced by queries (which we may need to sort, which generates temp files) is probably redundant too.

> 2. I have not found any evidence of performance improvement. I've only benchmarked the patch on my laptop, and RAM (the page cache) diminished any difference between writing compressed and uncompressed blocks.
>

I expect the performance improvement to be less direct, showing up only when there is contention for resources (memory and I/O bandwidth). If you have multiple sessions and memory pressure, that'll force temporary files from the page cache to disk. Compression will reduce the memory pressure (because less data is written to the page cache), possibly even eliminating the need to write dirty pages to disk. And if we still have to write data to disk, it reduces the amount we have to write.

Of course, it may also reduce the disk space required for temp files, which is also nice.

> What do you think: is it worth pursuing the idea? OLTP systems rarely rely on data spilled to disk.
> Are there any known good random-access compressed file libraries, so that we could avoid reinventing the wheel?
> Maybe someone has tried this approach before?
>

I'd say it's worth investigating further. Not sure about existing solutions / libraries for this problem, but my guess is the overall approach is roughly what you implemented.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
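To put a rough shape on the cost of a seek discussed above, here is another hedged sketch (same assumed BlockIndexEntry layout and LZ4 as in the earlier sketches, not actual PostgreSQL code): a seek-and-read amounts to one index lookup, one read of the stored image, and one single-block decompression.

#include <lz4.h>
#include <stdbool.h>
#include <stdio.h>

#define BLCKSZ 8192

typedef struct BlockIndexEntry
{
    long offset;
    int  length;
    bool compressed;
} BlockIndexEntry;

/* Returns BLCKSZ on success, -1 on error. */
static int
read_block(FILE *fp, const BlockIndexEntry *idx, int blocknum, char *out)
{
    char stored[BLCKSZ];

    /* the "seek": a lookup in the in-memory index plus one fseek */
    fseek(fp, idx[blocknum].offset, SEEK_SET);

    if (!idx[blocknum].compressed)
        return (int) fread(out, 1, BLCKSZ, fp);        /* stored verbatim */

    if (fread(stored, 1, idx[blocknum].length, fp) != (size_t) idx[blocknum].length)
        return -1;

    /* the only extra CPU cost: decompress one 8kB block */
    return LZ4_decompress_safe(stored, out, idx[blocknum].length, BLCKSZ);
}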
On Sat, Sep 11, 2021, 6:01 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Hi hackers!
>
> There's a lot of compression discussion going on these days, and that's cool! Recently Naresh Chainani shared with me, in a private discussion, the idea of compressing temporary files on disk.
> I was thrilled to find no evidence of an implementation of this interesting idea.
>
> I've prototyped a Random Access Compressed File for fun [0]. The code is a very dirty proof of concept.
> I compress a BufFile one block at a time. There are directory pages that store the size of each compressed block. If any byte of a block is changed, the whole block is recompressed. Wasted space is never reused. If a compressed block is larger than BLCKSZ, unknown bad things will happen :)
>
> Here are some of my observations.
>
> 0. The idea seems feasible. The API of fd.c used by buffile.c can easily be abstracted for compressed temporary files. Seeks are necessary, but they are not very frequent. It's easy to make temp file compression GUC-controlled.
>
> 1. The temp file footprint can easily be reduced. For example, the query
> create unlogged table y as select random()::text t from generate_series(0,9999999) g;
> uses 140000000 bytes of temp files for the TOAST index build. With the patch this value is reduced to 40841704 bytes (x3.42 smaller).
>
> 2. I have not found any evidence of performance improvement. I've only benchmarked the patch on my laptop, and RAM (the page cache) diminished any difference between writing compressed and uncompressed blocks.
>
> What do you think: is it worth pursuing the idea? OLTP systems rarely rely on data spilled to disk.
> Are there any known good random-access compressed file libraries, so that we could avoid reinventing the wheel?
> Maybe someone has tried this approach before?

Are you proposing to compress the temporary files created by the postgres processes under $PGDATA/base/pgsql_tmp? Are there any other directories that postgres processes would write temporary files to? Are you proposing to compress the temporary files that get generated during the execution of queries?

IIUC, the temp files under the pgsql_tmp directory get cleaned up at the end of each txn, right? In what situations would the temporary files under the pgsql_tmp directory remain even after the txns that created them have committed/aborted? Here's one scenario: if a backend crashes while executing a huge analytic query, I can understand that the temp files would remain in pgsql_tmp, and we have the commit [1] cleaning them up on restart. Any other scenarios that fill up the pgsql_tmp directory?

[1] commit cd91de0d17952b5763466cfa663e98318f26d357
    Author: Tomas Vondra <tomas.vondra@postgresql.org>
    Date:   Thu Mar 18 16:05:03 2021 +0100

        Remove temporary files after backend crash

Regards,
Bharath Rupireddy.