Thread: Compress ReorderBuffer spill files using LZ4
Hi,

When the content of a large transaction (size exceeding logical_decoding_work_mem) and its sub-transactions has to be reordered during logical decoding, then, all the changes are written on disk in temporary files located in pg_replslot/<slot_name>. Decoding very large transactions by multiple replication slots can lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data compression/decompression of these temporary files. Each transaction change that must be written on disk (ReorderBufferDiskChange) is now compressed and encapsulated in a new structure.

Three different compression strategies are implemented:

1. LZ4 streaming compression is the preferred one and works efficiently for small individual changes.
2. LZ4 regular compression when the changes are too large for the streaming API.
3. No compression when compression fails; the change is then stored uncompressed.

When not using compression, the following case generates 1590MB of spill files:

  CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);
  INSERT INTO t SELECT i, 'Hello number n°'||i::TEXT
    FROM generate_series(1, 10000000) as i;

With LZ4 compression, it creates 653MB of spill files: 58.9% less disk space usage.

Open items:

1. The spill_bytes column from pg_stat_get_replication_slot() still returns the plain data size, not the compressed data size. Should we expose the compressed data size when compression occurs?
2. Do we want a GUC to switch compression on/off?

Regards,

JT
Attachment
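To make the three strategies above concrete, here is a rough C sketch of what per-change spill compression with the LZ4 streaming API can look like. It is only an illustration under assumptions: SpillChangeHeader, SpillCompressor, MAX_STREAMED_CHANGE and RING_BUFFER_BYTES are invented names, not identifiers from the posted patch.

/*
 * Rough sketch only -- not the actual patch code.  Illustrates the three
 * strategies described above: LZ4 streaming compression for small changes,
 * regular LZ4 block compression for larger ones, and storing the raw bytes
 * when compression fails.
 */
#include <string.h>
#include <lz4.h>

#define MAX_STREAMED_CHANGE   (8 * 1024)                 /* assumption */
#define RING_BUFFER_BYTES     (64 * 1024 + MAX_STREAMED_CHANGE)

typedef struct SpillChangeHeader
{
    int     raw_size;       /* size before compression */
    int     stored_size;    /* bytes actually written after the header */
    char    method;         /* 's' stream, 'b' block, 'n' none */
} SpillChangeHeader;

typedef struct SpillCompressor
{
    LZ4_stream_t   *stream;                  /* from LZ4_createStream() */
    char            ring[RING_BUFFER_BYTES]; /* streaming dictionary window */
    int             ring_off;
} SpillCompressor;

/* Compress one change into dst (caller sizes dst_cap >= len for the raw fallback). */
static int
spill_compress_change(SpillCompressor *cx, const char *change, int len,
                      char *dst, int dst_cap, SpillChangeHeader *hdr)
{
    int     out;

    hdr->raw_size = len;

    if (len <= MAX_STREAMED_CHANGE)
    {
        /* 1. streaming: copy into the ring buffer, compress from there */
        char   *slot = cx->ring + cx->ring_off;

        memcpy(slot, change, len);
        out = LZ4_compress_fast_continue(cx->stream, slot, dst,
                                         len, dst_cap, 1);
        if (out > 0)
        {
            cx->ring_off += len;
            if (cx->ring_off >= RING_BUFFER_BYTES - MAX_STREAMED_CHANGE)
                cx->ring_off = 0;    /* wrap; recent data keeps serving as dictionary */
            hdr->method = 's';
            hdr->stored_size = out;
            return out;
        }
    }
    else
    {
        /* 2. change too large for the ring buffer: one-shot block compression */
        out = LZ4_compress_default(change, dst, len, dst_cap);
        if (out > 0)
        {
            hdr->method = 'b';
            hdr->stored_size = out;
            return out;
        }
    }

    /* 3. compression failed or did not fit: store the change uncompressed */
    memcpy(dst, change, len);
    hdr->method = 'n';
    hdr->stored_size = len;
    return len;
}

Decompression would have to replay the same ring-buffer layout with LZ4_setStreamDecode()/LZ4_decompress_safe_continue(); that side is omitted here.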
On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: > > When the content of a large transaction (size exceeding > logical_decoding_work_mem) and its sub-transactions has to be > reordered during logical decoding, then, all the changes are written > on disk in temporary files located in pg_replslot/<slot_name>. > Decoding very large transactions by multiple replication slots can > lead to disk space saturation and high I/O utilization. > Why can't one use 'streaming' option to send changes to the client once it reaches the configured limit of 'logical_decoding_work_mem'? > > 2. Do we want a GUC to switch compression on/off? > It depends on the overhead of decoding. Did you try to measure the decoding overhead of decompression when reading compressed files? -- With Regards, Amit Kapila.
On Thu, Jun 6, 2024 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: > > > > When the content of a large transaction (size exceeding > > logical_decoding_work_mem) and its sub-transactions has to be > > reordered during logical decoding, then, all the changes are written > > on disk in temporary files located in pg_replslot/<slot_name>. > > Decoding very large transactions by multiple replication slots can > > lead to disk space saturation and high I/O utilization. > > > > Why can't one use 'streaming' option to send changes to the client > once it reaches the configured limit of 'logical_decoding_work_mem'? > > > > > 2. Do we want a GUC to switch compression on/off? > > > > It depends on the overhead of decoding. Did you try to measure the > decoding overhead of decompression when reading compressed files? I think it depends on the trade-off between the I/O savings from reducing the data size and the performance cost of compressing and decompressing the data. This balance is highly dependent on the hardware. For example, if you have a very slow disk and a powerful processor, compression could be advantageous. Conversely, if the disk is very fast, the I/O savings might be minimal, and the compression overhead could outweigh the benefits. Additionally, the effectiveness of compression also depends on the compression ratio, which varies with the type of data being compressed. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Le jeu. 6 juin 2024 à 04:13, Amit Kapila <amit.kapila16@gmail.com> a écrit :
>
> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> >
> > When the content of a large transaction (size exceeding logical_decoding_work_mem) and its sub-transactions has to be reordered during logical decoding, then, all the changes are written on disk in temporary files located in pg_replslot/<slot_name>. Decoding very large transactions by multiple replication slots can lead to disk space saturation and high I/O utilization.
> >
>
> Why can't one use 'streaming' option to send changes to the client once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right, setting subscription's option 'streaming' to 'on' moves the problem away from the publisher to the subscribers. This patch tries to improve the default situation when 'streaming' is set to 'off'.

> > 2. Do we want a GUC to switch compression on/off?
> >
>
> It depends on the overhead of decoding. Did you try to measure the decoding overhead of decompression when reading compressed files?

Quick benchmarking executed on my laptop shows 1% overhead.

Table DDL:
CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);

Data generated with:
INSERT INTO t SELECT i, 'Text number n°'||i::TEXT
  FROM generate_series(1, 10000000) as i;

Restoration duration measured using timestamps of log messages:
"DEBUG: restored XXXX/YYYY changes from disk"

HEAD:  25.54s, 25.94s, 25.516s, 26.267s, 26.11s / avg=25.874s
Patch: 26.872s, 26.311s, 25.753s, 26.003s, 25.843s / avg=26.156s

Regards,

JT
On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote: > > Le jeu. 6 juin 2024 à 04:13, Amit Kapila <amit.kapila16@gmail.com> a écrit : > > > > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: > > > > > > When the content of a large transaction (size exceeding > > > logical_decoding_work_mem) and its sub-transactions has to be > > > reordered during logical decoding, then, all the changes are written > > > on disk in temporary files located in pg_replslot/<slot_name>. > > > Decoding very large transactions by multiple replication slots can > > > lead to disk space saturation and high I/O utilization. > > > > > > > Why can't one use 'streaming' option to send changes to the client > > once it reaches the configured limit of 'logical_decoding_work_mem'? > > That's right, setting subscription's option 'streaming' to 'on' moves > the problem away from the publisher to the subscribers. This patch > tries to improve the default situation when 'streaming' is set to > 'off'. > Can we think of changing the default to 'parallel'? BTW, it would be better to use 'parallel' for the 'streaming' option, if the workload has large transactions. Is there a reason to use a default value in this case? > > > 2. Do we want a GUC to switch compression on/off? > > > > > > > It depends on the overhead of decoding. Did you try to measure the > > decoding overhead of decompression when reading compressed files? > > Quick benchmarking executed on my laptop shows 1% overhead. > Thanks. We probably need different types of data (say random data in bytea column, etc.) for this. -- With Regards, Amit Kapila.
On 2024-Jun-06, Amit Kapila wrote:

> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> >
> > When the content of a large transaction (size exceeding logical_decoding_work_mem) and its sub-transactions has to be reordered during logical decoding, then, all the changes are written on disk in temporary files located in pg_replslot/<slot_name>. Decoding very large transactions by multiple replication slots can lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding. It's not so clear to me that we only want to do so for spilling to disk; for instance, if the two nodes communicate over a slow network, it may even be beneficial to compress when streaming, so to this question:

> Why can't one use 'streaming' option to send changes to the client once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't want compression, because for some users it might be beneficial.

I think a GUC would be a good idea. Also, what if for whatever reason you want a different compression algorithm or different compression parameters? Looking at the existing compression UI we offer in pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default), "lz4:42" says to use lz4 with parameters 42 on both spilling and streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but only for spilling to disk.

(I don't mean to say that you should implement Zstd compression with this patch, only that you should choose the implementation so that adding Zstd support (or whatever) later is just a matter of adding some branches here and there. With the current #ifdef you propose, it's hard to do that. Maybe separate the parts that depend on the specific algorithm to algorithm-agnostic functions.)

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
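One way to read Álvaro's last suggestion is a small per-algorithm vtable, so reorderbuffer.c calls only algorithm-agnostic functions and adding Zstd later is just a new entry. A hedged sketch; SpillCompressorOps, the spill_*_ops tables and spill_compressor_for() are illustrative names, not from the posted patch:

#include "postgres.h"
#include <string.h>

/* Sketch only: one vtable per algorithm, algorithm-agnostic callers. */
typedef struct SpillCompressorOps
{
    const char *name;
    /* return bytes written to dst, or -1 to ask the caller to store raw data */
    int   (*compress) (void *state, const char *src, int len,
                       char *dst, int dst_cap);
    int   (*decompress) (void *state, const char *src, int len,
                         char *dst, int dst_cap);
} SpillCompressorOps;

/* hypothetical per-algorithm implementations, compiled conditionally */
extern const SpillCompressorOps spill_pglz_ops;   /* always built */
#ifdef USE_LZ4
extern const SpillCompressorOps spill_lz4_ops;
#endif
#ifdef USE_ZSTD
extern const SpillCompressorOps spill_zstd_ops;
#endif

/*
 * The only place that needs to know about specific algorithms: map the
 * configured method name (e.g. "lz4", "zstd") to an ops table.
 */
static const SpillCompressorOps *
spill_compressor_for(const char *method)
{
#ifdef USE_LZ4
    if (strcmp(method, "lz4") == 0)
        return &spill_lz4_ops;
#endif
#ifdef USE_ZSTD
    if (strcmp(method, "zstd") == 0)
        return &spill_zstd_ops;
#endif
    if (strcmp(method, "pglz") == 0)
        return &spill_pglz_ops;
    return NULL;                /* "none": caller spills uncompressed */
}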
Le jeu. 6 juin 2024 à 06:40, Amit Kapila <amit.kapila16@gmail.com> a écrit : > > On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote: > > > > Le jeu. 6 juin 2024 à 04:13, Amit Kapila <amit.kapila16@gmail.com> a écrit : > > > > > > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: > > > > > > > > When the content of a large transaction (size exceeding > > > > logical_decoding_work_mem) and its sub-transactions has to be > > > > reordered during logical decoding, then, all the changes are written > > > > on disk in temporary files located in pg_replslot/<slot_name>. > > > > Decoding very large transactions by multiple replication slots can > > > > lead to disk space saturation and high I/O utilization. > > > > > > > > > > Why can't one use 'streaming' option to send changes to the client > > > once it reaches the configured limit of 'logical_decoding_work_mem'? > > > > That's right, setting subscription's option 'streaming' to 'on' moves > > the problem away from the publisher to the subscribers. This patch > > tries to improve the default situation when 'streaming' is set to > > 'off'. > > > > Can we think of changing the default to 'parallel'? BTW, it would be > better to use 'parallel' for the 'streaming' option, if the workload > has large transactions. Is there a reason to use a default value in > this case? You're certainly right, if using the streaming API helps to avoid bad situations and there is no downside, it could be used by default. > > > > 2. Do we want a GUC to switch compression on/off? > > > > > > > > > > It depends on the overhead of decoding. Did you try to measure the > > > decoding overhead of decompression when reading compressed files? > > > > Quick benchmarking executed on my laptop shows 1% overhead. > > > > Thanks. We probably need different types of data (say random data in > bytea column, etc.) for this. Yes, good idea, will run new tests in that sense. Thank you! Regards, JT
Le jeu. 6 juin 2024 à 07:24, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit : > > On 2024-Jun-06, Amit Kapila wrote: > > > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: > > > > > > When the content of a large transaction (size exceeding > > > logical_decoding_work_mem) and its sub-transactions has to be > > > reordered during logical decoding, then, all the changes are written > > > on disk in temporary files located in pg_replslot/<slot_name>. > > > Decoding very large transactions by multiple replication slots can > > > lead to disk space saturation and high I/O utilization. > > I like the general idea of compressing the output of logical decoding. > It's not so clear to me that we only want to do so for spilling to disk; > for instance, if the two nodes communicate over a slow network, it may > even be beneficial to compress when streaming, so to this question: > > > Why can't one use 'streaming' option to send changes to the client > > once it reaches the configured limit of 'logical_decoding_work_mem'? > > I would say that streaming doesn't necessarily have to mean we don't > want compression, because for some users it might be beneficial. Interesting idea, will try to evaluate how to compress/decompress data transiting via streaming and how good the compression ratio would be. > I think a GUC would be a good idea. Also, what if for whatever reason > you want a different compression algorithm or different compression > parameters? Looking at the existing compression UI we offer in > pg_basebackup, perhaps you could add something like this: > > compress_logical_decoding = none > compress_logical_decoding = lz4:42 > compress_logical_decoding = spill-zstd:99 > > "none" says to never use compression (perhaps should be the default), > "lz4:42" says to use lz4 with parameters 42 on both spilling and > streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but > only for spilling to disk. I agree, if the server was compiled with support of multiple compression libraries, users should be able to choose which one they want to use. > (I don't mean to say that you should implement Zstd compression with > this patch, only that you should choose the implementation so that > adding Zstd support (or whatever) later is just a matter of adding some > branches here and there. With the current #ifdef you propose, it's hard > to do that. Maybe separate the parts that depend on the specific > algorithm to algorithm-agnostic functions.) Makes sense, will rework this patch in that way. Thank you! Regards, JT
On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Jun-06, Amit Kapila wrote: > > > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: > > > > > > When the content of a large transaction (size exceeding > > > logical_decoding_work_mem) and its sub-transactions has to be > > > reordered during logical decoding, then, all the changes are written > > > on disk in temporary files located in pg_replslot/<slot_name>. > > > Decoding very large transactions by multiple replication slots can > > > lead to disk space saturation and high I/O utilization. > > I like the general idea of compressing the output of logical decoding. > It's not so clear to me that we only want to do so for spilling to disk; > for instance, if the two nodes communicate over a slow network, it may > even be beneficial to compress when streaming, so to this question: > > > Why can't one use 'streaming' option to send changes to the client > > once it reaches the configured limit of 'logical_decoding_work_mem'? > > I would say that streaming doesn't necessarily have to mean we don't > want compression, because for some users it might be beneficial. +1 > I think a GUC would be a good idea. Also, what if for whatever reason > you want a different compression algorithm or different compression > parameters? Looking at the existing compression UI we offer in > pg_basebackup, perhaps you could add something like this: > > compress_logical_decoding = none > compress_logical_decoding = lz4:42 > compress_logical_decoding = spill-zstd:99 > > "none" says to never use compression (perhaps should be the default), > "lz4:42" says to use lz4 with parameters 42 on both spilling and > streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but > only for spilling to disk. > I think the compression option should be supported at the CREATE SUBSCRIPTION level instead of being controlled by a GUC. This way, we can decide on compression for each subscription individually rather than applying it to all subscribers. It makes more sense for the subscriber to control this, especially when we are planning to compress the data sent downstream. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On 2024-Jun-07, Dilip Kumar wrote: > I think the compression option should be supported at the CREATE > SUBSCRIPTION level instead of being controlled by a GUC. This way, we > can decide on compression for each subscription individually rather > than applying it to all subscribers. It makes more sense for the > subscriber to control this, especially when we are planning to > compress the data sent downstream. True. (I think we have some options that are in GUCs for the general behavior and can be overridden by per-subscription options for specific tailoring; would that make sense here? I think it does, considering that what we mostly want is to save disk space in the publisher when spilling to disk.) -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "I can't go to a restaurant and order food because I keep looking at the fonts on the menu. Five minutes later I realize that it's also talking about food" (Donald Knuth)
On Fri, Jun 7, 2024 at 2:39 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Jun-07, Dilip Kumar wrote: > > > I think the compression option should be supported at the CREATE > > SUBSCRIPTION level instead of being controlled by a GUC. This way, we > > can decide on compression for each subscription individually rather > > than applying it to all subscribers. It makes more sense for the > > subscriber to control this, especially when we are planning to > > compress the data sent downstream. > > True. (I think we have some options that are in GUCs for the general > behavior and can be overridden by per-subscription options for specific > tailoring; would that make sense here? I think it does, considering > that what we mostly want is to save disk space in the publisher when > spilling to disk.) Yeah, that makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Jun-06, Amit Kapila wrote: > > > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: > > > > > > When the content of a large transaction (size exceeding > > > logical_decoding_work_mem) and its sub-transactions has to be > > > reordered during logical decoding, then, all the changes are written > > > on disk in temporary files located in pg_replslot/<slot_name>. > > > Decoding very large transactions by multiple replication slots can > > > lead to disk space saturation and high I/O utilization. > > I like the general idea of compressing the output of logical decoding. > It's not so clear to me that we only want to do so for spilling to disk; > for instance, if the two nodes communicate over a slow network, it may > even be beneficial to compress when streaming, so to this question: > > > Why can't one use 'streaming' option to send changes to the client > > once it reaches the configured limit of 'logical_decoding_work_mem'? > > I would say that streaming doesn't necessarily have to mean we don't > want compression, because for some users it might be beneficial. > Fair enough. it would be an interesting feature if we see the wider usefulness of compression/decompression of logical changes. For example, if this can improve the performance of applying large transactions (aka reduce the apply lag for them) even when the 'streaming' option is 'parallel' then it would have a much wider impact. -- With Regards, Amit Kapila.
On Fri, Jun 7, 2024 at 2:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I think the compression option should be supported at the CREATE > SUBSCRIPTION level instead of being controlled by a GUC. This way, we > can decide on compression for each subscription individually rather > than applying it to all subscribers. It makes more sense for the > subscriber to control this, especially when we are planning to > compress the data sent downstream. > Yes, that makes sense. However, we then need to provide this option via SQL APIs as well for other plugins. -- With Regards, Amit Kapila.
On 6/6/24 16:24, Alvaro Herrera wrote: > On 2024-Jun-06, Amit Kapila wrote: > >> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote: >>> >>> When the content of a large transaction (size exceeding >>> logical_decoding_work_mem) and its sub-transactions has to be >>> reordered during logical decoding, then, all the changes are written >>> on disk in temporary files located in pg_replslot/<slot_name>. >>> Decoding very large transactions by multiple replication slots can >>> lead to disk space saturation and high I/O utilization. > > I like the general idea of compressing the output of logical decoding. > It's not so clear to me that we only want to do so for spilling to disk; > for instance, if the two nodes communicate over a slow network, it may > even be beneficial to compress when streaming, so to this question: > >> Why can't one use 'streaming' option to send changes to the client >> once it reaches the configured limit of 'logical_decoding_work_mem'? > > I would say that streaming doesn't necessarily have to mean we don't > want compression, because for some users it might be beneficial. > > I think a GUC would be a good idea. Also, what if for whatever reason > you want a different compression algorithm or different compression > parameters? Looking at the existing compression UI we offer in > pg_basebackup, perhaps you could add something like this: > > compress_logical_decoding = none > compress_logical_decoding = lz4:42 > compress_logical_decoding = spill-zstd:99 > > "none" says to never use compression (perhaps should be the default), > "lz4:42" says to use lz4 with parameters 42 on both spilling and > streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but > only for spilling to disk. > > (I don't mean to say that you should implement Zstd compression with > this patch, only that you should choose the implementation so that > adding Zstd support (or whatever) later is just a matter of adding some > branches here and there. With the current #ifdef you propose, it's hard > to do that. Maybe separate the parts that depend on the specific > algorithm to algorithm-agnostic functions.) > I haven't been following the "libpq compression" thread, but wouldn't that also do compression for the streaming case? That was my assumption, at least, and it seems like the right way - we probably don't want to patch every place that sends data over network independently, right? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6/6/24 12:58, Julien Tachoires wrote: > ... > > When compiled with LZ4 support (--with-lz4), this patch enables data > compression/decompression of these temporary files. Each transaction > change that must be written on disk (ReorderBufferDiskChange) is now > compressed and encapsulated in a new structure. > I'm a bit confused, but why tie this to having lz4? Why shouldn't this be supported even for pglz, or whatever algorithms we add in the future? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Le ven. 7 juin 2024 à 05:59, Tomas Vondra <tomas.vondra@enterprisedb.com> a écrit : > > On 6/6/24 12:58, Julien Tachoires wrote: > > ... > > > > When compiled with LZ4 support (--with-lz4), this patch enables data > > compression/decompression of these temporary files. Each transaction > > change that must be written on disk (ReorderBufferDiskChange) is now > > compressed and encapsulated in a new structure. > > > > I'm a bit confused, but why tie this to having lz4? Why shouldn't this > be supported even for pglz, or whatever algorithms we add in the future? That's right, reworking this patch in that sense. Regards, JT
Hi,

Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit :
>
> Le ven. 7 juin 2024 à 05:59, Tomas Vondra <tomas.vondra@enterprisedb.com> a écrit :
> >
> > On 6/6/24 12:58, Julien Tachoires wrote:
> > > ...
> > >
> > > When compiled with LZ4 support (--with-lz4), this patch enables data compression/decompression of these temporary files. Each transaction change that must be written on disk (ReorderBufferDiskChange) is now compressed and encapsulated in a new structure.
> >
> > I'm a bit confused, but why tie this to having lz4? Why shouldn't this be supported even for pglz, or whatever algorithms we add in the future?
>
> That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz and ZSTD. It introduces the new GUC logical_decoding_spill_compression which is used to set the compression method. In order to stay aligned with the other server side GUCs related to compression methods (wal_compression, default_toast_compression), the compression level is not exposed to users.

The last patch of this set is still in WIP, it adds the machinery required for setting the compression methods as a subscription option:

CREATE SUBSCRIPTION ... WITH (spill_compression = ...);

I think there is a major problem with this approach: the logical decoding context is tied to one replication slot, but multiple subscriptions can use the same replication slot. How should this work if 2 subscriptions want to use the same replication slot but different compression methods?

At this point, compression is only available for the changes spilled on disk. It is still not clear to me if the compression of data transiting through the streaming protocol should be addressed by this patch set or by another one. Thought ?

Regards,

JT
Attachment
- v2-0004-Compress-ReorderBuffer-spill-files-using-PGLZ.patch
- v2-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patch
- v2-0002-Add-GUC-logical_decoding_spill_compression.patch
- v2-0005-Compress-ReorderBuffer-spill-files-using-ZSTD.patch
- v2-0003-Fix-spill_bytes-counter.patch
- v2-0007-WIP-Add-the-subscription-option-spill_compression.patch
- v2-0006-Add-ReorderBuffer-ondisk-compression-tests.patch
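For readers following along, a hedged sketch of how such a GUC is typically wired up, mirroring wal_compression and default_toast_compression; the enum values and variable names are assumptions and may not match what v2 actually does:

#include "postgres.h"
#include "utils/guc.h"          /* struct config_enum_entry */

typedef enum SpillCompressionMethod
{
    SPILL_COMPRESSION_NONE = 0,
    SPILL_COMPRESSION_PGLZ,
    SPILL_COMPRESSION_LZ4,
    SPILL_COMPRESSION_ZSTD,
} SpillCompressionMethod;

/* value of the hypothetical logical_decoding_spill_compression GUC */
int logical_decoding_spill_compression = SPILL_COMPRESSION_NONE;

/* options offered to the user; methods compiled out are simply not listed */
static const struct config_enum_entry spill_compression_options[] = {
    {"none", SPILL_COMPRESSION_NONE, false},
    {"pglz", SPILL_COMPRESSION_PGLZ, false},
#ifdef USE_LZ4
    {"lz4", SPILL_COMPRESSION_LZ4, false},
#endif
#ifdef USE_ZSTD
    {"zstd", SPILL_COMPRESSION_ZSTD, false},
#endif
    {NULL, 0, false}
};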
On 7/15/24 20:50, Julien Tachoires wrote: > Hi, > > Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit : >> >> Le ven. 7 juin 2024 à 05:59, Tomas Vondra >> <tomas.vondra@enterprisedb.com> a écrit : >>> >>> On 6/6/24 12:58, Julien Tachoires wrote: >>>> ... >>>> >>>> When compiled with LZ4 support (--with-lz4), this patch enables data >>>> compression/decompression of these temporary files. Each transaction >>>> change that must be written on disk (ReorderBufferDiskChange) is now >>>> compressed and encapsulated in a new structure. >>>> >>> >>> I'm a bit confused, but why tie this to having lz4? Why shouldn't this >>> be supported even for pglz, or whatever algorithms we add in the future? >> >> That's right, reworking this patch in that sense. > > Please find a new version of this patch adding support for LZ4, pglz > and ZSTD. It introduces the new GUC logical_decoding_spill_compression > which is used to set the compression method. In order to stay aligned > with the other server side GUCs related to compression methods > (wal_compression, default_toast_compression), the compression level is > not exposed to users. > Sounds reasonable. I wonder if it might be useful to allow specifying the compression level in those places, but that's clearly not something this patch needs to do. > The last patch of this set is still in WIP, it adds the machinery > required for setting the compression methods as a subscription option: > CREATE SUBSCRIPTION ... WITH (spill_compression = ...); > I think there is a major problem with this approach: the logical > decoding context is tied to one replication slot, but multiple > subscriptions can use the same replication slot. How should this work > if 2 subscriptions want to use the same replication slot but different > compression methods? > Do we really support multiple subscriptions sharing the same slot? I don't think we do, but maybe I'm missing something. > At this point, compression is only available for the changes spilled > on disk. It is still not clear to me if the compression of data > transiting through the streaming protocol should be addressed by this > patch set or by another one. Thought ? > I'd stick to only compressing the data spilled to disk. It might be useful to compress the streamed data too, but why shouldn't we compress the regular (non-streamed) transactions too? Yeah, it's more efficient to compress larger chunks, but we can fit quite large transactions into logical_decoding_work_mem without spilling. FWIW I'd expect that to be handled at the libpq level - there's already a patch for that, but I haven't checked if it would handle this. But maybe more importantly, I think compressing streamed data might need to handle some sort of negotiation of the compression algorithm, which seems fairly complex. To conclude, I'd leave this out of scope for this patch. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Le lun. 15 juil. 2024 à 12:28, Tomas Vondra <tomas.vondra@enterprisedb.com> a écrit : > > On 7/15/24 20:50, Julien Tachoires wrote: > > The last patch of this set is still in WIP, it adds the machinery > > required for setting the compression methods as a subscription option: > > CREATE SUBSCRIPTION ... WITH (spill_compression = ...); > > I think there is a major problem with this approach: the logical > > decoding context is tied to one replication slot, but multiple > > subscriptions can use the same replication slot. How should this work > > if 2 subscriptions want to use the same replication slot but different > > compression methods? > > > > Do we really support multiple subscriptions sharing the same slot? I > don't think we do, but maybe I'm missing something. You are right, it's not supported, the following error is raised in this case: ERROR: replication slot "sub1" is active for PID 51735 I was distracted by the fact that nothing prevents the configuration of multiple subscriptions sharing the same replication slot. Thanks, JT
On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 7/15/24 20:50, Julien Tachoires wrote: > > Hi, > > > > Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit : > >> > >> Le ven. 7 juin 2024 à 05:59, Tomas Vondra > >> <tomas.vondra@enterprisedb.com> a écrit : > >>> > >>> On 6/6/24 12:58, Julien Tachoires wrote: > >>>> ... > >>>> > >>>> When compiled with LZ4 support (--with-lz4), this patch enables data > >>>> compression/decompression of these temporary files. Each transaction > >>>> change that must be written on disk (ReorderBufferDiskChange) is now > >>>> compressed and encapsulated in a new structure. > >>>> > >>> > >>> I'm a bit confused, but why tie this to having lz4? Why shouldn't this > >>> be supported even for pglz, or whatever algorithms we add in the future? > >> > >> That's right, reworking this patch in that sense. > > > > Please find a new version of this patch adding support for LZ4, pglz > > and ZSTD. It introduces the new GUC logical_decoding_spill_compression > > which is used to set the compression method. In order to stay aligned > > with the other server side GUCs related to compression methods > > (wal_compression, default_toast_compression), the compression level is > > not exposed to users. > > > > Sounds reasonable. I wonder if it might be useful to allow specifying > the compression level in those places, but that's clearly not something > this patch needs to do. > > > The last patch of this set is still in WIP, it adds the machinery > > required for setting the compression methods as a subscription option: > > CREATE SUBSCRIPTION ... WITH (spill_compression = ...); > > I think there is a major problem with this approach: the logical > > decoding context is tied to one replication slot, but multiple > > subscriptions can use the same replication slot. How should this work > > if 2 subscriptions want to use the same replication slot but different > > compression methods? > > > > Do we really support multiple subscriptions sharing the same slot? I > don't think we do, but maybe I'm missing something. > > > At this point, compression is only available for the changes spilled > > on disk. It is still not clear to me if the compression of data > > transiting through the streaming protocol should be addressed by this > > patch set or by another one. Thought ? > > > > I'd stick to only compressing the data spilled to disk. It might be > useful to compress the streamed data too, but why shouldn't we compress > the regular (non-streamed) transactions too? Yeah, it's more efficient > to compress larger chunks, but we can fit quite large transactions into > logical_decoding_work_mem without spilling. > > FWIW I'd expect that to be handled at the libpq level - there's already > a patch for that, but I haven't checked if it would handle this. But > maybe more importantly, I think compressing streamed data might need to > handle some sort of negotiation of the compression algorithm, which > seems fairly complex. > > To conclude, I'd leave this out of scope for this patch. > Your point sounds reasonable to me. OTOH, if we want to support compression for spill case then shouldn't there be a question how frequent such an option would be required? Users currently have an option to stream large transactions for parallel apply or otherwise in which case no spilling is required. I feel sooner or later we will make such behavior (streaming=parallel) as default, and then spilling should happen in very few cases. 
Is it worth adding this new option and GUC if that is true? -- With Regards, Amit Kapila.
On 7/16/24 14:52, Amit Kapila wrote: > On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> On 7/15/24 20:50, Julien Tachoires wrote: >>> Hi, >>> >>> Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit : >>>> >>>> Le ven. 7 juin 2024 à 05:59, Tomas Vondra >>>> <tomas.vondra@enterprisedb.com> a écrit : >>>>> >>>>> On 6/6/24 12:58, Julien Tachoires wrote: >>>>>> ... >>>>>> >>>>>> When compiled with LZ4 support (--with-lz4), this patch enables data >>>>>> compression/decompression of these temporary files. Each transaction >>>>>> change that must be written on disk (ReorderBufferDiskChange) is now >>>>>> compressed and encapsulated in a new structure. >>>>>> >>>>> >>>>> I'm a bit confused, but why tie this to having lz4? Why shouldn't this >>>>> be supported even for pglz, or whatever algorithms we add in the future? >>>> >>>> That's right, reworking this patch in that sense. >>> >>> Please find a new version of this patch adding support for LZ4, pglz >>> and ZSTD. It introduces the new GUC logical_decoding_spill_compression >>> which is used to set the compression method. In order to stay aligned >>> with the other server side GUCs related to compression methods >>> (wal_compression, default_toast_compression), the compression level is >>> not exposed to users. >>> >> >> Sounds reasonable. I wonder if it might be useful to allow specifying >> the compression level in those places, but that's clearly not something >> this patch needs to do. >> >>> The last patch of this set is still in WIP, it adds the machinery >>> required for setting the compression methods as a subscription option: >>> CREATE SUBSCRIPTION ... WITH (spill_compression = ...); >>> I think there is a major problem with this approach: the logical >>> decoding context is tied to one replication slot, but multiple >>> subscriptions can use the same replication slot. How should this work >>> if 2 subscriptions want to use the same replication slot but different >>> compression methods? >>> >> >> Do we really support multiple subscriptions sharing the same slot? I >> don't think we do, but maybe I'm missing something. >> >>> At this point, compression is only available for the changes spilled >>> on disk. It is still not clear to me if the compression of data >>> transiting through the streaming protocol should be addressed by this >>> patch set or by another one. Thought ? >>> >> >> I'd stick to only compressing the data spilled to disk. It might be >> useful to compress the streamed data too, but why shouldn't we compress >> the regular (non-streamed) transactions too? Yeah, it's more efficient >> to compress larger chunks, but we can fit quite large transactions into >> logical_decoding_work_mem without spilling. >> >> FWIW I'd expect that to be handled at the libpq level - there's already >> a patch for that, but I haven't checked if it would handle this. But >> maybe more importantly, I think compressing streamed data might need to >> handle some sort of negotiation of the compression algorithm, which >> seems fairly complex. >> >> To conclude, I'd leave this out of scope for this patch. >> > > Your point sounds reasonable to me. OTOH, if we want to support > compression for spill case then shouldn't there be a question how > frequent such an option would be required? Users currently have an > option to stream large transactions for parallel apply or otherwise in > which case no spilling is required. 
I feel sooner or later we will > make such behavior (streaming=parallel) as default, and then spilling > should happen in very few cases. Is it worth adding this new option > and GUC if that is true? > I don't know, but streaming is 'off' by default, and I'm not aware of any proposals to change this, so when you suggest "sooner or later" we'll change this, I'd probably bet on "later or never". I haven't been following the discussions about parallel apply very closely, but my impression from dealing with similar stuff in other tools is that it's rather easy to run into issues with some workloads, which just makes me more skeptical about "streamin=parallel" by default. But as I said, I'm out of the loop so I may be wrong ... As for whether the GUC is needed, I don't know. I guess we might do the same thing we do for streaming - we don't have a GUC to enable this, but we default to 'off' and the client has to request that when opening the replication connection. So it'd be specified at the subscription level, more or less. But then how would we specify compression for cases that invoke decoding directly by pg_logical_slot_get_changes()? Through options? BTW if we specify this at subscription level, will it be possible to change the compression method? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jul 16, 2024 at 7:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 7/16/24 14:52, Amit Kapila wrote: > > On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> > >> FWIW I'd expect that to be handled at the libpq level - there's already > >> a patch for that, but I haven't checked if it would handle this. But > >> maybe more importantly, I think compressing streamed data might need to > >> handle some sort of negotiation of the compression algorithm, which > >> seems fairly complex. > >> > >> To conclude, I'd leave this out of scope for this patch. > >> > > > > Your point sounds reasonable to me. OTOH, if we want to support > > compression for spill case then shouldn't there be a question how > > frequent such an option would be required? Users currently have an > > option to stream large transactions for parallel apply or otherwise in > > which case no spilling is required. I feel sooner or later we will > > make such behavior (streaming=parallel) as default, and then spilling > > should happen in very few cases. Is it worth adding this new option > > and GUC if that is true? > > > > I don't know, but streaming is 'off' by default, and I'm not aware of > any proposals to change this, so when you suggest "sooner or later" > we'll change this, I'd probably bet on "later or never". > > I haven't been following the discussions about parallel apply very > closely, but my impression from dealing with similar stuff in other > tools is that it's rather easy to run into issues with some workloads, > which just makes me more skeptical about "streamin=parallel" by default. > But as I said, I'm out of the loop so I may be wrong ... > It is difficult to say whether enabling it by default will have issues or not but till now we haven't seen many reports for the streaming = 'parallel' option. It could be due to the reason that not many people enable it in their workloads. We can probably find out by enabling it by default. > As for whether the GUC is needed, I don't know. I guess we might do the > same thing we do for streaming - we don't have a GUC to enable this, but > we default to 'off' and the client has to request that when opening the > replication connection. So it'd be specified at the subscription level, > more or less. > > But then how would we specify compression for cases that invoke decoding > directly by pg_logical_slot_get_changes()? Through options? > If we decide to go with this then yeah that is one way, another possibility is to make it a slot's property, so we can allow to take a new parameter in pg_create_logical_replication_slot(). We can even think of inventing a new API to alter the slot's properties if we decide to go this route. > BTW if we specify this at subscription level, will it be possible to > change the compression method? > This needs analysis but offhand I can't see the problems with it. -- With Regards, Amit Kapila.
Le mer. 17 juil. 2024 à 02:12, Amit Kapila <amit.kapila16@gmail.com> a écrit : > > On Tue, Jul 16, 2024 at 7:31 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: > > > > On 7/16/24 14:52, Amit Kapila wrote: > > > On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra > > > <tomas.vondra@enterprisedb.com> wrote: > > >> > > >> FWIW I'd expect that to be handled at the libpq level - there's already > > >> a patch for that, but I haven't checked if it would handle this. But > > >> maybe more importantly, I think compressing streamed data might need to > > >> handle some sort of negotiation of the compression algorithm, which > > >> seems fairly complex. > > >> > > >> To conclude, I'd leave this out of scope for this patch. > > >> > > > > > > Your point sounds reasonable to me. OTOH, if we want to support > > > compression for spill case then shouldn't there be a question how > > > frequent such an option would be required? Users currently have an > > > option to stream large transactions for parallel apply or otherwise in > > > which case no spilling is required. I feel sooner or later we will > > > make such behavior (streaming=parallel) as default, and then spilling > > > should happen in very few cases. Is it worth adding this new option > > > and GUC if that is true? > > > > > > > I don't know, but streaming is 'off' by default, and I'm not aware of > > any proposals to change this, so when you suggest "sooner or later" > > we'll change this, I'd probably bet on "later or never". > > > > I haven't been following the discussions about parallel apply very > > closely, but my impression from dealing with similar stuff in other > > tools is that it's rather easy to run into issues with some workloads, > > which just makes me more skeptical about "streamin=parallel" by default. > > But as I said, I'm out of the loop so I may be wrong ... > > > > It is difficult to say whether enabling it by default will have issues > or not but till now we haven't seen many reports for the streaming = > 'parallel' option. It could be due to the reason that not many people > enable it in their workloads. We can probably find out by enabling it > by default. > > > As for whether the GUC is needed, I don't know. I guess we might do the > > same thing we do for streaming - we don't have a GUC to enable this, but > > we default to 'off' and the client has to request that when opening the > > replication connection. So it'd be specified at the subscription level, > > more or less. > > > > But then how would we specify compression for cases that invoke decoding > > directly by pg_logical_slot_get_changes()? Through options? > > > > If we decide to go with this then yeah that is one way, another > possibility is to make it a slot's property, so we can allow to take a > new parameter in pg_create_logical_replication_slot(). We can even > think of inventing a new API to alter the slot's properties if we > decide to go this route. Please find a new version of this patch set. The compression method is now set on subscriber level via CREATE SUBSCRIPTION or ALTER SUBSCRIPTION and can be passed to pg_logical_slot_get_changes()/pg_logical_slot_get_binary_changes() through the option spill_compression. > > BTW if we specify this at subscription level, will it be possible to > > change the compression method? > > > > This needs analysis but offhand I can't see the problems with it. 
I didn't notice any issue: the compression method can be changed even while decoding is in progress; in this case, the replication worker restarts due to the parameter change.

JT
Attachment
- v3-0002-Fix-spill_bytes-counter.patch
- v3-0004-Compress-ReorderBuffer-spill-files-using-ZSTD.patch
- v3-0003-Compress-ReorderBuffer-spill-files-using-PGLZ.patch
- v3-0005-Add-the-subscription-option-spill_compression.patch
- v3-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patch
- v3-0006-Add-ReorderBuffer-ondisk-compression-tests.patch
Hi Tomas,

Le lun. 23 sept. 2024 à 18:13, Tomas Vondra <tomas@vondra.me> a écrit :
>
> Hi,
>
> I've spent a bit more time on this, mostly running tests to get a better idea of the practical benefits.

Thank you for your code review and testing!

> Firstly, I think there's a bug in ReorderBufferCompress() - it's legal for pglz_compress() to return -1. This can happen if the data is not compressible, and would not fit into the output buffer. The code can't just do elog(ERROR) in this case, it needs to handle that by storing the raw data. The attached fixup patch makes this work for me - I'm not claiming this is the best way to handle this, but it works.
>
> FWIW I find it strange the tests included in the patch did not trigger this. That probably means the tests are not quite sufficient.
>
> Now, to the testing. Attached are two scripts, testing different cases:
>
> test-columns.sh - Table with a variable number of 'float8' columns.
> test-toast.sh - Table with a single text column.
>
> The script always sets up a publication/subscription on two instances, generates certain amount of data (~1GB for columns, ~3.2GB for TOAST), waits for it to be replicated to the replica, and measures how much data was spilled to disk with the different compression methods (off, pglz and lz4). There's a couple more metrics, but that's irrelevant here.

It would be interesting to run the same tests with zstd: in my early testing I found that zstd was able to provide a better compression ratio than lz4, but seemed to use more CPU resources/is slower.

> For the "column" test, it looks like this (this is in MB):
>
>     rows    columns    distribution    off    pglz    lz4
>     ========================================================
>   100000       1000    compressible    778      20      9
>                        random          778     778     16
>     --------------------------------------------------------
>  1000000        100    compressible    916     116     62
>                        random          916     916     67
>
> It's very clear that for the "compressible" data (which just copies the same value into all columns), both pglz and lz4 can significantly reduce the amount of data. For 1000 columns it's 780MB -> 20MB/9MB, for 100 columns it's a bit less efficient, but still good.
>
> For the "random" data (where every column gets a random value, but rows are copied), it's a very different story - pglz does not help at all, while lz4 still massively reduces the amount of spilled data.
>
> I think the explanation is very simple - for pglz, we compress each row on it's own, there's no concept of streaming/context. If a row is compressible, it works fine, but when the row gets random, pglz can't compress it at all. For lz4, this does not matter, because with the streaming mode it still sees that rows are just repeated, and so can compress them efficiently.

That's correct.

> For TOAST test, the results look like this:
>
>     distribution    repeats    toast     off    pglz     lz4
>     ===============================================================
>     compressible      10000    lz4        14       2       1
>                                pglz       40       4       3
>                        1000    lz4        32      16       9
>                                pglz       54      17      10
>     ---------------------------------------------------------
>     random            10000    lz4      3305    3305    3157
>                                pglz     3305    3305    3157
>                        1000    lz4      3166    3162    1580
>                                pglz     3334    3326    1745
>     ----------------------------------------------------------
>     random2           10000    lz4      3305    3305    3157
>                                pglz     3305    3305    3158
>                        1000    lz4      3160    3156    3010
>                                pglz     3334    3326    3172
>
> The "repeats" value means how long the string is - it's the number of "md5" hashes added to the string. The number of rows is calculated to keep the total amount of data the same. The "toast" column tracks what compression was used for TOAST, I was wondering if it matters.
>
> This time there are three data distributions - compressible means that each TOAST value is nicely compressible, "random" means each value is random (not compressible), but the rows are just copy of the same value (so on the whole there's a lot of redundancy). And "random2" means each row is random and unique (so not compressible at all).
>
> The table shows that with compressible TOAST values, compressing the spill file is rather useless. The reason is that ReorderBufferCompress is handling raw TOAST data, which is already compressed. Yes, it may further reduce the amount of data, but it's negligible when compared to the original amount of data.
>
> For the random cases, the spill compression is rather pointless. Yes, lz4 can reduce it to 1/2 for the shorter strings, but other than that it's not very useful.

It's still interesting to confirm that data already compressed or random data cannot be significantly compressed.

> For a while I was thinking this approach is flawed, because it only sees and compressed changes one by one, and that seeing a batch of changes would improve this (e.g. we'd see the copied rows). But I realized lz4 already does that (in the streaming mode at least), and yet it does not help very much. Presumably that depends on how large the context is. If the random string is long enough, it won't help.
>
> So maybe this approach is fine, and doing the compression at a lower layer (for the whole file), would not really improve this. Even then we'd only see a limited amount of data.
>
> Maybe the right answer to this is that compression does not help cases where most of the replicated data is TOAST, and that it can help cases with wide (and redundant) rows, or repeated rows. And that lz4 is a clearly superior choice. (This also raises the question if we want to support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into this, but doesn't that behave more like pglz, i.e. no context?)

I'm working on a new version of this patch set that will include the changes you suggested in your review. About using LZ4 regular API, the goal was to use it when we cannot use the streaming API due to raw data larger than LZ4 ring buffer. But this is something I'm going to delete in the new version because I'm planning to use a similar approach as we do in astreamer_lz4.c: using frames, not blocks. LZ4 frame API looks very similar to ZSTD's streaming API.

> FWIW when doing these tests, it made me realize how useful would it be to track both the "raw" and "spilled" amounts. That is before/after compression. It'd make calculating compression ratio much easier.

Yes, that's why I tried to "fix" the spill_bytes counter.

Regards,

JT
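For reference, a hedged sketch of the kind of fallback Tomas describes for pglz: pglz_compress() returns -1 when the input cannot be compressed into the provided buffer, and that case has to keep the raw bytes rather than raising an error. The function name and caller contract are illustrative, not the actual fixup patch:

#include "postgres.h"
#include "common/pg_lzcompress.h"
#include <string.h>

/*
 * Sketch only: compress one change with pglz, or fall back to storing the
 * raw bytes.  Caller must size dst for max(len, PGLZ_MAX_OUTPUT(len)).
 */
static int
spill_compress_pglz(const char *src, int len, char *dst, bool *compressed)
{
    int32   clen;

    clen = pglz_compress(src, len, dst, PGLZ_strategy_default);
    if (clen >= 0)
    {
        *compressed = true;
        return (int) clen;
    }

    /* -1: not compressible enough; keep the raw change instead of ERROR */
    memcpy(dst, src, len);
    *compressed = false;
    return len;
}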
On 9/23/24 21:58, Julien Tachoires wrote:
> Hi Tomas,
>
> Le lun. 23 sept. 2024 à 18:13, Tomas Vondra <tomas@vondra.me> a écrit :
>>
>> Hi,
>>
>> I've spent a bit more time on this, mostly running tests to get a better idea of the practical benefits.
>
> Thank you for your code review and testing!
>
>> Firstly, I think there's a bug in ReorderBufferCompress() - it's legal for pglz_compress() to return -1. This can happen if the data is not compressible, and would not fit into the output buffer. The code can't just do elog(ERROR) in this case, it needs to handle that by storing the raw data. The attached fixup patch makes this work for me - I'm not claiming this is the best way to handle this, but it works.
>>
>> FWIW I find it strange the tests included in the patch did not trigger this. That probably means the tests are not quite sufficient.
>>
>> Now, to the testing. Attached are two scripts, testing different cases:
>>
>> test-columns.sh - Table with a variable number of 'float8' columns.
>> test-toast.sh - Table with a single text column.
>>
>> The script always sets up a publication/subscription on two instances, generates certain amount of data (~1GB for columns, ~3.2GB for TOAST), waits for it to be replicated to the replica, and measures how much data was spilled to disk with the different compression methods (off, pglz and lz4). There's a couple more metrics, but that's irrelevant here.
>
> It would be interesting to run the same tests with zstd: in my early testing I found that zstd was able to provide a better compression ratio than lz4, but seemed to use more CPU resources/is slower.

Oh, I completely forgot about zstd. I don't think it'd substantially change the conclusions, though. It might compress better/worse for some cases, but the overall behavior would remain the same.

I can't test this right now, the testmachine is busy with some other stuff. But it should not be difficult to update the test scripts I attached and get results yourself. There's a couple hard-coded paths that need to be updated, ofc.

>> For the "column" test, it looks like this (this is in MB):
>>
>>     rows    columns    distribution    off    pglz    lz4
>>     ========================================================
>>   100000       1000    compressible    778      20      9
>>                        random          778     778     16
>>     --------------------------------------------------------
>>  1000000        100    compressible    916     116     62
>>                        random          916     916     67
>>
>> It's very clear that for the "compressible" data (which just copies the same value into all columns), both pglz and lz4 can significantly reduce the amount of data. For 1000 columns it's 780MB -> 20MB/9MB, for 100 columns it's a bit less efficient, but still good.
>>
>> For the "random" data (where every column gets a random value, but rows are copied), it's a very different story - pglz does not help at all, while lz4 still massively reduces the amount of spilled data.
>>
>> I think the explanation is very simple - for pglz, we compress each row on it's own, there's no concept of streaming/context. If a row is compressible, it works fine, but when the row gets random, pglz can't compress it at all. For lz4, this does not matter, because with the streaming mode it still sees that rows are just repeated, and so can compress them efficiently.
>
> That's correct.
>
>> For TOAST test, the results look like this:
>>
>>     distribution    repeats    toast     off    pglz     lz4
>>     ===============================================================
>>     compressible      10000    lz4        14       2       1
>>                                pglz       40       4       3
>>                        1000    lz4        32      16       9
>>                                pglz       54      17      10
>>     ---------------------------------------------------------
>>     random            10000    lz4      3305    3305    3157
>>                                pglz     3305    3305    3157
>>                        1000    lz4      3166    3162    1580
>>                                pglz     3334    3326    1745
>>     ----------------------------------------------------------
>>     random2           10000    lz4      3305    3305    3157
>>                                pglz     3305    3305    3158
>>                        1000    lz4      3160    3156    3010
>>                                pglz     3334    3326    3172
>>
>> The "repeats" value means how long the string is - it's the number of "md5" hashes added to the string. The number of rows is calculated to keep the total amount of data the same. The "toast" column tracks what compression was used for TOAST, I was wondering if it matters.
>>
>> This time there are three data distributions - compressible means that each TOAST value is nicely compressible, "random" means each value is random (not compressible), but the rows are just copy of the same value (so on the whole there's a lot of redundancy). And "random2" means each row is random and unique (so not compressible at all).
>>
>> The table shows that with compressible TOAST values, compressing the spill file is rather useless. The reason is that ReorderBufferCompress is handling raw TOAST data, which is already compressed. Yes, it may further reduce the amount of data, but it's negligible when compared to the original amount of data.
>>
>> For the random cases, the spill compression is rather pointless. Yes, lz4 can reduce it to 1/2 for the shorter strings, but other than that it's not very useful.
>
> It's still interesting to confirm that data already compressed or random data cannot be significantly compressed.
>
>> For a while I was thinking this approach is flawed, because it only sees and compressed changes one by one, and that seeing a batch of changes would improve this (e.g. we'd see the copied rows). But I realized lz4 already does that (in the streaming mode at least), and yet it does not help very much. Presumably that depends on how large the context is. If the random string is long enough, it won't help.
>>
>> So maybe this approach is fine, and doing the compression at a lower layer (for the whole file), would not really improve this. Even then we'd only see a limited amount of data.
>>
>> Maybe the right answer to this is that compression does not help cases where most of the replicated data is TOAST, and that it can help cases with wide (and redundant) rows, or repeated rows. And that lz4 is a clearly superior choice. (This also raises the question if we want to support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into this, but doesn't that behave more like pglz, i.e. no context?)
>
> I'm working on a new version of this patch set that will include the changes you suggested in your review. About using LZ4 regular API, the goal was to use it when we cannot use the streaming API due to raw data larger than LZ4 ring buffer. But this is something I'm going to delete in the new version because I'm planning to use a similar approach as we do in astreamer_lz4.c: using frames, not blocks. LZ4 frame API looks very similar to ZSTD's streaming API.
>
>> FWIW when doing these tests, it made me realize how useful would it be to track both the "raw" and "spilled" amounts. That is before/after compression. It'd make calculating compression ratio much easier.
>
> Yes, that's why I tried to "fix" the spill_bytes counter.

But I think the 'fixed' counter only tracks the data after the new compression, right? I'm suggesting to have two counters - one for "raw" data (before compression) and "compressed" (after compression).

regards

--
Tomas Vondra
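A minimal sketch of what "two counters" could look like, assuming a hypothetical spillRawBytes field next to the existing spilled-bytes counter; none of these names are from the patch or the current ReorderBuffer stats:

#include <stdint.h>

/*
 * Sketch only: track bytes both before and after compression when a change
 * is serialized, so the compression ratio is spillRawBytes / spillBytes.
 */
typedef struct SpillStats
{
    int64_t spillTxns;       /* transactions spilled to disk */
    int64_t spillCount;      /* number of spill operations */
    int64_t spillRawBytes;   /* change size before compression (new) */
    int64_t spillBytes;      /* bytes actually written to disk */
} SpillStats;

static inline void
spill_stats_update(SpillStats *stats, int raw_size, int stored_size)
{
    stats->spillCount += 1;
    stats->spillRawBytes += raw_size;
    stats->spillBytes += stored_size;   /* compressed size, or raw on fallback */
}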