Re: Proposal: Adding compression of temporary files - Mailing list pgsql-hackers

From Filip Janus
Subject Re: Proposal: Adding compression of temporary files
Date
Msg-id CAFjYY+JDSpOQwYAfTQQ43=BA=d32XfcAdaPVJgHheV9fQBbLWg@mail.gmail.com
In response to Re: Proposal: Adding compression of temporary files  (Filip Janus <fjanus@redhat.com>)
Responses Re: Proposal: Adding compression of temporary files
List pgsql-hackers
Hi,
Thank you, Tomas, for the thorough and detailed review!
I'm posting an updated patch set incorporating the changes from your review.

Changes applied from review:
- Simplified BufFileCreateTemp interface
- Improved error handling in BufFileLoadBuffer/BufFileDumpBuffer
- Unified compression header format (CompressHeader struct)
- Added tuplestore integration (compression when EXEC_FLAG_BACKWARD is not required)
- Various code cleanups and comment improvements
Additional change (not from review):
- Switched from static shared buffer to per-file allocation. The shared buffer 
   provided a negligible performance benefit while keeping memory allocated for the backend's lifetime.
Future work:
- Support for additional compression methods (gzip, zstd)
- Random access and seek operations with compression


    -Filip-


On Tue, Jan 13, 2026 at 2:34 PM Filip Janus <fjanus@redhat.com> wrote:
Hi, 
Yes, it needs to be rebased. I am working on it. I will post it here soon.
 

    -Filip-


On Tue, Jan 13, 2026 at 1:51 PM lakshmi <lakshmigcdac@gmail.com> wrote:
Hi all,
I tried to replicate the temporary file compression issue by applying the two patches shared in the thread on current PostgreSQL master.
Here is what I observed:
1) Patch 1: 0001-Add-transparent-compression-for-temporary-files.patch
Applying the first patch fails due to context mismatches.

The failures I see are in the following files:
src/backend/storage/file/buffile.c
src/backend/utils/misc/guc_tables.c
src/backend/utils/misc/postgresql.conf.sample

2) The second patch, 0002-Add-regression-tests-for-temporary-file-compression.patch, applies successfully without any issues.

Does this mean that the implementation patch needs to be rebased or otherwise adjusted for the current codebase? If so, what would be the recommended way to proceed? Could you please suggest how I should apply the implementation patch in this case?
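A three-way apply can often recover from exactly this kind of context mismatch, because the patch records the blob it was generated against. Here is a self-contained sketch in a throwaway repository; the file and patch names are illustrative, not the thread's actual patches:

```shell
# Demo: a plain `git apply` fails on drifted context, but `--3way`
# merges against the preimage blob recorded in the patch.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.email demo@example.com
git config user.name demo

# Base version of a file (stand-in for buffile.c).
printf 'alpha\nbeta\n' > buffile.c
git add buffile.c
git commit -qm base

# Capture a change as a patch, then revert the working tree.
printf 'alpha\nbeta\ngamma\n' > buffile.c
git diff > compress.patch
git checkout -q -- buffile.c

# Simulate upstream drift that invalidates the patch's context lines.
printf 'ALPHA\nbeta\n' > buffile.c
git add buffile.c
git commit -qm drift

git apply compress.patch 2>/dev/null || echo "plain apply failed"
git apply --3way compress.patch 2>/dev/null && echo "3-way apply succeeded"
cat buffile.c
```

If the three-way merge hits real conflicts it leaves conflict markers to resolve by hand, which is still usually less work than re-deriving the patch; that said, when the author has announced a rebase, waiting for the refreshed patch set is the safer path.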


regards
lakshmi

On Tue, Jan 13, 2026 at 5:01 PM Filip Janus <fjanus@redhat.com> wrote:
Rebase after changes introduced in guc_tables.c

    -Filip-


On Tue, Aug 19, 2025 at 5:48 PM Filip Janus <fjanus@redhat.com> wrote:
Fix overlooked compiler warnings 

    -Filip-


On Mon, Aug 18, 2025 at 6:51 PM Filip Janus <fjanus@redhat.com> wrote:
I rebased the patch set and fixed the issue causing those failures.

    -Filip-


On Tue, Jun 17, 2025 at 4:49 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2025-04-25 23:54:00 +0200, Filip Janus wrote:
> The latest rebase.

This often seems to fail during tests:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5382

E.g.
https://api.cirrus-ci.com/v1/artifact/task/4667337632120832/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress

=== dumping /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/regression.diffs ===
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join_hash_pglz.out /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/join_hash_pglz.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join_hash_pglz.out   2025-05-26 05:04:40.686524215 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/join_hash_pglz.out   2025-05-26 05:15:00.534907680 +0000
@@ -594,11 +594,8 @@
 select count(*) from join_foo
   left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
   on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
- count
--------
-     3
-(1 row)
-
+ERROR:  could not read from temporary file: read only 8180 of 1572860 bytes
+CONTEXT:  parallel worker
 select final > 1 as multibatch
   from hash_join_batches(
 $$
@@ -606,11 +603,7 @@
     left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
     on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
 $$);
- multibatch
-------------
- t
-(1 row)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 rollback to settings;
 -- single-batch with rescan, parallel-oblivious
 savepoint settings;


Greetings,

Andres

