Thread: Increased storage size of jsonb in pg15
Hello,
We are seeing an increase in the reported storage size of our jsonb columns (using pg_column_size) since upgrading to Postgres 15.5 from version 14. The increase in size is enough to push us over the size threshold and more of our data to get written to the TOAST table.
I can't find any information online, is anyone aware of what would cause this change in storage size for essentially the same data?
I can't find any information online, is anyone aware of what would cause this change in storage size for essentially the same data?
Kind regards,
Sean
On 12/13/23 14:07, Sean Flaherty wrote: > Hello, > > We are seeing an increase in the reported storage size of our jsonb > columns (using pg_column_size) since upgrading to Postgres 15.5 from > version 14. The increase in size is enough to push us over the size > threshold and more of our data to get written to the TOAST table. More information needed: 1) Table definition. 2) An example of reported size for the 14.? and 15.5 cases. 3) Differences in the database attributes between the two instances. 4) Was this done on same machine? If not how are the machines different? 5) How the upgrade was done? > > I can't find any information online, is anyone aware of what would cause > this change in storage size for essentially the same data? > > Kind regards, > Sean -- Adrian Klaver adrian.klaver@aklaver.com
More information needed:
1) Table definition.
2) An example of reported size for the 14.? and 15.5 cases.
Since upgrading from 14.8 to 15.5, the jsonb data that was previously written in 14.8 is reporting a smaller size than the same hourly data written after the upgrade (upgrade indicated in yellow):
3) Differences in the database attributes between the two instances.
No differences.
4) Was this done on same machine? If not how are the machines different?
Same AWS RDS Postgres instance.
5) How the upgrade was done?
1) Table definition.
\d "DataRecords"
Partitioned table "public.DataRecords"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+-------------
filename | character varying(255) | | not null |
timestamp | timestamp without time zone | | not null |
data | jsonb | | not null | '{}'::jsonb
created_at | timestamp with time zone | | not null |
updated_at | timestamp with time zone | | not null |
Partition key: RANGE ("timestamp")
Indexes:
"DataRecords_pkey" PRIMARY KEY, btree (filename, "timestamp")
"DataRecords_filename_updated_at_idx" btree (filename, updated_at)
Foreign-key constraints:
"DataRecords_filename_fkey" FOREIGN KEY (filename) REFERENCES "DataFiles"(filename) ON UPDATE CASCADE ON DELETE CASCADE
Number of partitions: 97 (Use \d+ to list them.)
2) An example of reported size for the 14.? and 15.5 cases.
Since upgrading from 14.8 to 15.5, the jsonb data that was previously written in 14.8 is reporting a smaller size than the same hourly data written after the upgrade (upgrade indicated in yellow):
file | hourly_timestamp | filename_bytes | timestamp_bytes | data_filesize | created_at_bytes | updated_at_bytes |
datafile.dat | 2023-10-19 12:00:00 | 23 | 8 | 1682 | 8 | 8 |
datafile.dat | 2023-10-19 13:00:00 | 23 | 8 | 1687 | 8 | 8 |
datafile.dat | 2023-10-19 14:00:00 | 23 | 8 | 1685 | 8 | 8 |
datafile.dat | 2023-10-19 15:00:00 | 23 | 8 | 1668 | 8 | 8 |
datafile.dat | 2023-10-19 16:00:00 | 23 | 8 | 2155 | 8 | 8 |
datafile.dat | 2023-10-19 17:00:00 | 23 | 8 | 2178 | 8 | 8 |
datafile.dat | 2023-10-19 18:00:00 | 23 | 8 | 2199 | 8 | 8 |
datafile.dat | 2023-10-19 19:00:00 | 23 | 8 | 2187 | 8 | 8 |
datafile.dat | 2023-10-19 20:00:00 | 23 | 8 | 2180 | 8 | 8 |
datafile.dat | 2023-10-19 21:00:00 | 23 | 8 | 2176 | 8 | 8 |
datafile.dat | 2023-10-19 22:00:00 | 23 | 8 | 2053 | 8 | 8 |
datafile.dat | 2023-10-19 23:00:00 | 23 | 8 | 2043 | 8 | 8 |
3) Differences in the database attributes between the two instances.
No differences.
4) Was this done on same machine? If not how are the machines different?
Same AWS RDS Postgres instance.
5) How the upgrade was done?
Using the AWS RDS upgrade.
On Wed, Dec 13, 2023 at 3:21 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/13/23 14:07, Sean Flaherty wrote:
> Hello,
>
> We are seeing an increase in the reported storage size of our jsonb
> columns (using pg_column_size) since upgrading to Postgres 15.5 from
> version 14. The increase in size is enough to push us over the size
> threshold and more of our data to get written to the TOAST table.
More information needed:
1) Table definition.
2) An example of reported size for the 14.? and 15.5 cases.
3) Differences in the database attributes between the two instances.
4) Was this done on same machine? If not how are the machines different?
5) How the upgrade was done?
>
> I can't find any information online, is anyone aware of what would cause
> this change in storage size for essentially the same data?
>
> Kind regards,
> Sean
--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/13/23 15:49, Sean Flaherty wrote: > More information needed: > > 2) An example of reported size for the 14.? and 15.5 cases. > > Since upgrading from 14.8 to 15.5, the jsonb data that was previously > written in 14.8 is reporting a smaller size than the same hourly data > written after the upgrade (upgrade indicated in yellow): What is producing datafile.dat and how? > > *file* *hourly_timestamp* *filename_bytes* *timestamp_bytes* > *data_filesize* *created_at_bytes* *updated_at_bytes* > datafile.dat 2023-10-19 12:00:00 23 8 1682 8 8 > datafile.dat 2023-10-19 13:00:00 23 8 1687 8 8 > datafile.dat 2023-10-19 14:00:00 23 8 1685 8 8 > datafile.dat 2023-10-19 15:00:00 23 8 1668 8 8 > datafile.dat 2023-10-19 16:00:00 23 8 2155 8 8 > datafile.dat 2023-10-19 17:00:00 23 8 2178 8 8 > datafile.dat 2023-10-19 18:00:00 23 8 2199 8 8 > datafile.dat 2023-10-19 19:00:00 23 8 2187 8 8 > datafile.dat 2023-10-19 20:00:00 23 8 2180 8 8 > datafile.dat 2023-10-19 21:00:00 23 8 2176 8 8 > datafile.dat 2023-10-19 22:00:00 23 8 2053 8 8 > datafile.dat 2023-10-19 23:00:00 23 8 2043 8 8 -- Adrian Klaver adrian.klaver@aklaver.com
We have a process that runs once an hour to read the .dat file in csv format then a node script using the pg package version "8.8.0" to create the json objects and insert the data records as jsonb data.
None of the upload process changed during the underlying database upgrade.
On Wed, Dec 13, 2023 at 4:56 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/13/23 15:49, Sean Flaherty wrote:
> More information needed:
>
> 2) An example of reported size for the 14.? and 15.5 cases.
>
> Since upgrading from 14.8 to 15.5, the jsonb data that was previously
> written in 14.8 is reporting a smaller size than the same hourly data
> written after the upgrade (upgrade indicated in yellow):
What is producing datafile.dat and how?
>
> *file* *hourly_timestamp* *filename_bytes* *timestamp_bytes*
> *data_filesize* *created_at_bytes* *updated_at_bytes*
> datafile.dat 2023-10-19 12:00:00 23 8 1682 8 8
> datafile.dat 2023-10-19 13:00:00 23 8 1687 8 8
> datafile.dat 2023-10-19 14:00:00 23 8 1685 8 8
> datafile.dat 2023-10-19 15:00:00 23 8 1668 8 8
> datafile.dat 2023-10-19 16:00:00 23 8 2155 8 8
> datafile.dat 2023-10-19 17:00:00 23 8 2178 8 8
> datafile.dat 2023-10-19 18:00:00 23 8 2199 8 8
> datafile.dat 2023-10-19 19:00:00 23 8 2187 8 8
> datafile.dat 2023-10-19 20:00:00 23 8 2180 8 8
> datafile.dat 2023-10-19 21:00:00 23 8 2176 8 8
> datafile.dat 2023-10-19 22:00:00 23 8 2053 8 8
> datafile.dat 2023-10-19 23:00:00 23 8 2043 8 8
--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/14/23 06:48, Sean Flaherty wrote: > We have a process that runs once an hour to read the .dat file in csv > format then a node script using the pg package > <https://github.com/brianc/node-postgres> version "8.8.0" to create the > json objects and insert the data records as jsonb data. Now I am not understanding. 1) In your OP you mentioned checking size of the column storage using pg_column_size, yet what you show for increase in size are datafile.dat. 2) So how is datafile.dat related to this issue? 3) Show how you are determining that the storage in the database has increased in size. > > None of the upload process changed during the underlying database upgrade. > > > On Wed, Dec 13, 2023 at 4:56 PM Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>> wrote: > > On 12/13/23 15:49, Sean Flaherty wrote: > > More information needed: > > > > > 2) An example of reported size for the 14.? and 15.5 cases. > > > > Since upgrading from 14.8 to 15.5, the jsonb data that was > previously > > written in 14.8 is reporting a smaller size than the same hourly > data > > written after the upgrade (upgrade indicated in yellow): > > What is producing datafile.dat and how? > > > > > *file* *hourly_timestamp* *filename_bytes* > *timestamp_bytes* > > *data_filesize* *created_at_bytes* *updated_at_bytes* > > datafile.dat 2023-10-19 12:00:00 23 8 1682 8 > 8 > > datafile.dat 2023-10-19 13:00:00 23 8 1687 8 > 8 > > datafile.dat 2023-10-19 14:00:00 23 8 1685 8 > 8 > > datafile.dat 2023-10-19 15:00:00 23 8 1668 8 > 8 > > datafile.dat 2023-10-19 16:00:00 23 8 2155 8 > 8 > > datafile.dat 2023-10-19 17:00:00 23 8 2178 8 > 8 > > datafile.dat 2023-10-19 18:00:00 23 8 2199 8 > 8 > > datafile.dat 2023-10-19 19:00:00 23 8 2187 8 > 8 > > datafile.dat 2023-10-19 20:00:00 23 8 2180 8 > 8 > > datafile.dat 2023-10-19 21:00:00 23 8 2176 8 > 8 > > datafile.dat 2023-10-19 22:00:00 23 8 2053 8 > 8 > > datafile.dat 2023-10-19 23:00:00 23 8 2043 8 > 8 > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > -- Adrian Klaver adrian.klaver@aklaver.com
On Thu, Dec 14, 2023 at 7:48 AM Sean Flaherty <sflaherty@grndwork.com> wrote:
We have a process that runs once an hour to read the .dat file in csv format then a node script using the pg package version "8.8.0" to create the json objects and insert the data records as jsonb data.None of the upload process changed during the underlying database upgrade.
Basic debugging requires the existence of a self-contained reproducer. In this case ideally one that only uses psql and some static (already processed) data files, and that is known to produce the observed behaviors on non-RDS PostgreSQL.
David J.
We did a little more digging on our side, which I apologize for not doing beforehand.
We wrote a script to create a simple table with a jsonb column, inserted some json data into the column and then used pg_column_size to test the size of the jsonb data column.
We ran the script against local docker versions of postgres 14.8 and 15.5 and saw that the data in the jsonb column was the same size.
We then ran the script on AWS RDS Postgres and saw a size difference for the same json data between versions 14.8 and 15.4 (we also saw the larger size on RDS postgres 15.5 and 16.1).
Thank you for your prompt responses and assistance.
If you are curious, I can share the script we used with you.
We wrote a script to create a simple table with a jsonb column, inserted some json data into the column and then used pg_column_size to test the size of the jsonb data column.
We ran the script against local docker versions of postgres 14.8 and 15.5 and saw that the data in the jsonb column was the same size.
We then ran the script on AWS RDS Postgres and saw a size difference for the same json data between versions 14.8 and 15.4 (we also saw the larger size on RDS postgres 15.5 and 16.1).
Version size (kb) notes RDS Postgres-14.8 1587 (the same size we were seeing locally) RDS Postgres-15.4 2112
Thank you for your prompt responses and assistance.
If you are curious, I can share the script we used with you.
Kind regards,
Sean
On Thu, Dec 14, 2023 at 9:47 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Dec 14, 2023 at 7:48 AM Sean Flaherty <sflaherty@grndwork.com> wrote:We have a process that runs once an hour to read the .dat file in csv format then a node script using the pg package version "8.8.0" to create the json objects and insert the data records as jsonb data.None of the upload process changed during the underlying database upgrade.Basic debugging requires the existence of a self-contained reproducer. In this case ideally one that only uses psql and some static (already processed) data files, and that is known to produce the observed behaviors on non-RDS PostgreSQL.David J.
On 12/15/23 12:43, Sean Flaherty wrote: > We did a little more digging on our side, which I apologize for not > doing beforehand. > > We wrote a script to create a simple table with a jsonb column, inserted > some json data into the column and then used *pg_column_size* to test > the size of the jsonb data column. > > We ran the script against local docker versions of postgres 14.8 and > 15.5 and saw that the data in the jsonb column was the same size. > > We then ran the script on AWS RDS Postgres and saw a size difference for > the same json data between versions 14.8 and 15.4 (we also saw the > larger size on RDS postgres 15.5 and 16.1). This points at RDS Postgres being the issue. Since that is behind the AWS curtain, you will need to contact AWS technical support for information on what changed. > > *Version* *size (kb)* *notes * > RDS Postgres-14.8 1587 (the same size we were seeing locally) > RDS Postgres-15.4 2112 > > > Thank you for your prompt responses and assistance. > > If you are curious, I can share the script we used with you. > > > Kind regards, > Sean -- Adrian Klaver adrian.klaver@aklaver.com
Follow-up:
Working with AWS, we found that starting in RDS Postgres 15, the default_toast_compression parameter is set to use lz4 compression instead of pglz. This resulted in the increased json storage size we were seeing.
I have been able to reproduce the increased storage size on RDS Postgres and using my local docker instance of postgres 15.5 by changing the local default_toast_compression value in postgresql.conf.
I have attached the test script we use to create a table, insert some test records and a query to test the JSON data size on disk.
Working with AWS, we found that starting in RDS Postgres 15, the default_toast_compression parameter is set to use lz4 compression instead of pglz. This resulted in the increased json storage size we were seeing.
I have been able to reproduce the increased storage size on RDS Postgres and using my local docker instance of postgres 15.5 by changing the local default_toast_compression value in postgresql.conf.
I have attached the test script we use to create a table, insert some test records and a query to test the JSON data size on disk.
Kind regards,
Sean
On Fri, Dec 15, 2023 at 1:56 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/15/23 12:43, Sean Flaherty wrote:
> We did a little more digging on our side, which I apologize for not
> doing beforehand.
>
> We wrote a script to create a simple table with a jsonb column, inserted
> some json data into the column and then used *pg_column_size* to test
> the size of the jsonb data column.
>
> We ran the script against local docker versions of postgres 14.8 and
> 15.5 and saw that the data in the jsonb column was the same size.
>
> We then ran the script on AWS RDS Postgres and saw a size difference for
> the same json data between versions 14.8 and 15.4 (we also saw the
> larger size on RDS postgres 15.5 and 16.1).
This points at RDS Postgres being the issue. Since that is behind the
AWS curtain, you will need to contact AWS technical support for
information on what changed.
>
> *Version* *size (kb)* *notes *
> RDS Postgres-14.8 1587 (the same size we were seeing locally)
> RDS Postgres-15.4 2112
>
>
> Thank you for your prompt responses and assistance.
>
> If you are curious, I can share the script we used with you.
>
>
> Kind regards,
> Sean
--
Adrian Klaver
adrian.klaver@aklaver.com
Attachment
On 12/28/23 09:13, Sean Flaherty wrote: > Follow-up: > Working with AWS, we found that starting in RDS Postgres 15, the > default_toast_compression parameter is set to use lz4 compression > instead of pglz. This resulted in the increased json storage size we > were seeing. > > I have been able to reproduce the increased storage size on RDS Postgres > and using my local docker instance of postgres 15.5 by changing the > local default_toast_compression value in postgresql.conf. > > I have attached the test script we use to create a table, insert some > test records and a query to test the JSON data size on disk. I can confirm I see the same results using Postgres 16 installed from the PGDG repo on Ubuntu 22.04. That the lz4 data size is greater then the pglz data size. > > > Kind regards, > Sean > -- Adrian Klaver adrian.klaver@aklaver.com
I'm rather new to the mailing list, are there any additional steps I should take (i.e. posting to pgsql-hackers, etc.)?
On Thu, Dec 28, 2023 at 11:23 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/28/23 09:13, Sean Flaherty wrote:
> Follow-up:
> Working with AWS, we found that starting in RDS Postgres 15, the
> default_toast_compression parameter is set to use lz4 compression
> instead of pglz. This resulted in the increased json storage size we
> were seeing.
>
> I have been able to reproduce the increased storage size on RDS Postgres
> and using my local docker instance of postgres 15.5 by changing the
> local default_toast_compression value in postgresql.conf.
>
> I have attached the test script we use to create a table, insert some
> test records and a query to test the JSON data size on disk.
I can confirm I see the same results using Postgres 16 installed from
the PGDG repo on Ubuntu 22.04. That the lz4 data size is greater then
the pglz data size.
>
>
> Kind regards,
> Sean
>
--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/28/23 12:41, Sean Flaherty wrote: > I'm rather new to the mailing list, are there any additional steps I > should take (i.e. posting to pgsql-hackers, etc.)? > For what purpose? You are seeing differences in compression strategies between lz4 and pglz. The 'fix' would be to go back to pglz. > > On Thu, Dec 28, 2023 at 11:23 AM Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 12/28/23 09:13, Sean Flaherty wrote: > > Follow-up: > > Working with AWS, we found that starting in RDS Postgres 15, the > > default_toast_compression parameter is set to use lz4 compression > > instead of pglz. This resulted in the increased json storage > size we > > were seeing. > > > > I have been able to reproduce the increased storage size on RDS > Postgres > > and using my local docker instance of postgres 15.5 by changing the > > local default_toast_compression value in postgresql.conf. > > > > I have attached the test script we use to create a table, insert > some > > test records and a query to test the JSON data size on disk. > > I can confirm I see the same results using Postgres 16 installed from > the PGDG repo on Ubuntu 22.04. That the lz4 data size is greater then > the pglz data size. > > > > > > > Kind regards, > > Sean > > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > -- Adrian Klaver adrian.klaver@aklaver.com
On Fri, Dec 29, 2023 at 4:47 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > On 12/28/23 12:41, Sean Flaherty wrote: > > I'm rather new to the mailing list, are there any additional steps I > > should take (i.e. posting to pgsql-hackers, etc.)? > > > > For what purpose? You are seeing differences in compression strategies > between lz4 and pglz. The 'fix' would be to go back to pglz. Agreed, lz4 is known for its high compression speed, but lower compression ratio, this is the trade off one should bear in mind. > > > > > On Thu, Dec 28, 2023 at 11:23 AM Adrian Klaver > > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > > > On 12/28/23 09:13, Sean Flaherty wrote: > > > Follow-up: > > > Working with AWS, we found that starting in RDS Postgres 15, the > > > default_toast_compression parameter is set to use lz4 compression > > > instead of pglz. This resulted in the increased json storage > > size we > > > were seeing. > > > > > > I have been able to reproduce the increased storage size on RDS > > Postgres > > > and using my local docker instance of postgres 15.5 by changing the > > > local default_toast_compression value in postgresql.conf. > > > > > > I have attached the test script we use to create a table, insert > > some > > > test records and a query to test the JSON data size on disk. > > > > I can confirm I see the same results using Postgres 16 installed from > > the PGDG repo on Ubuntu 22.04. That the lz4 data size is greater then > > the pglz data size. > > > > > > > > > > > Kind regards, > > > Sean > > > > > > > > > -- > > Adrian Klaver > > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com > > > -- Regards Junwang Zhao
Junwang Zhao <zhjwpku@gmail.com> writes: > On Fri, Dec 29, 2023 at 4:47 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> For what purpose? You are seeing differences in compression strategies >> between lz4 and pglz. The 'fix' would be to go back to pglz. > Agreed, lz4 is known for its high compression speed, but lower > compression ratio, this is the trade off one should bear in mind. I don't know if we can make any blanket statements like that, but if we can, shouldn't there be some advice in the manual? AFAICS, right now there's exactly zip about why you should choose one over the other. regards, tom lane
What we found is that using lz4 compression on JSONB data is 20-25% larger on disk than pglz. We are running a production workload that is storing jsonb data with a focus read performance. The documented increase in write speed wasn't a large benefit, however, the increase in storage size moved the bulk of our data into TOAST and off the JSON performance cliff ("2-10× slower queries") described by Evan was impactful.
This article does a nice job describing the differences between pglz and lz4 compression for different data but does not include json or jsonb.
I believe validation of our numbers and additional documentation on the trade-offs in compression types would be very useful.
On Fri, Dec 29, 2023 at 7:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Junwang Zhao <zhjwpku@gmail.com> writes:
> On Fri, Dec 29, 2023 at 4:47 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> For what purpose? You are seeing differences in compression strategies
>> between lz4 and pglz. The 'fix' would be to go back to pglz.
> Agreed, lz4 is known for its high compression speed, but lower
> compression ratio, this is the trade off one should bear in mind.
I don't know if we can make any blanket statements like that, but
if we can, shouldn't there be some advice in the manual? AFAICS,
right now there's exactly zip about why you should choose one over
the other.
regards, tom lane
On 12/29/23 07:21, Sean Flaherty wrote: > What we found is that using lz4 compression on JSONB data is 20-25% > larger on disk than pglz. We are running a production workload that is > storing jsonb data with a focus read performance. The documented > increase in write speed wasn't a large benefit, however, the increase in > storage size moved the bulk of our data into TOAST and off the JSON > performance cliff ("2-10× slower queries") described by Evan > <https://www.evanjones.ca/postgres-large-json-performance.html> was > impactful. > > This > <https://www.postgresql.fastware.com/blog/what-is-the-new-lz4-toast-compression-in-postgresql-14> article does a nice jobdescribing the differences between pglz and lz4 compression for different data but does not include json or jsonb. > > I believe validation of our numbers and additional documentation on the > trade-offs in compression types would be very useful. Yes, that would be useful. Also per this: "Working with AWS, we found that starting in RDS Postgres 15, the default_toast_compression parameter is set to use lz4 compression instead of pglz." there is a discussion to be had with AWS about the advisability of changing defaults without testing what that does to the end user or notifying the end user. > > On Fri, Dec 29, 2023 at 7:23 AM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > Junwang Zhao <zhjwpku@gmail.com <mailto:zhjwpku@gmail.com>> writes: > > On Fri, Dec 29, 2023 at 4:47 AM Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > >> For what purpose? You are seeing differences in compression > strategies > >> between lz4 and pglz. The 'fix' would be to go back to pglz. > > > Agreed, lz4 is known for its high compression speed, but lower > > compression ratio, this is the trade off one should bear in mind. > > I don't know if we can make any blanket statements like that, but > if we can, shouldn't there be some advice in the manual? AFAICS, > right now there's exactly zip about why you should choose one over > the other. > > regards, tom lane > -- Adrian Klaver adrian.klaver@aklaver.com
I have included your suggestion to document any changes to the default Postgres settings to the Amazon RDS for PostgreSQL updates page in our ticket with AWS.
On Fri, Dec 29, 2023 at 9:43 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/29/23 07:21, Sean Flaherty wrote:
> What we found is that using lz4 compression on JSONB data is 20-25%
> larger on disk than pglz. We are running a production workload that is
> storing jsonb data with a focus read performance. The documented
> increase in write speed wasn't a large benefit, however, the increase in
> storage size moved the bulk of our data into TOAST and off the JSON
> performance cliff ("2-10× slower queries") described by Evan
> <https://www.evanjones.ca/postgres-large-json-performance.html> was
> impactful.
>
> This
> <https://www.postgresql.fastware.com/blog/what-is-the-new-lz4-toast-compression-in-postgresql-14> article does a nice job describing the differences between pglz and lz4 compression for different data but does not include json or jsonb.
>
> I believe validation of our numbers and additional documentation on the
> trade-offs in compression types would be very useful.
Yes, that would be useful.
Also per this:
"Working with AWS, we found that starting in RDS Postgres 15, the
default_toast_compression parameter is set to use lz4 compression
instead of pglz."
there is a discussion to be had with AWS about the advisability of
changing defaults without testing what that does to the end user or
notifying the end user.
>
> On Fri, Dec 29, 2023 at 7:23 AM Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
> Junwang Zhao <zhjwpku@gmail.com <mailto:zhjwpku@gmail.com>> writes:
> > On Fri, Dec 29, 2023 at 4:47 AM Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> >> For what purpose? You are seeing differences in compression
> strategies
> >> between lz4 and pglz. The 'fix' would be to go back to pglz.
>
> > Agreed, lz4 is known for its high compression speed, but lower
> > compression ratio, this is the trade off one should bear in mind.
>
> I don't know if we can make any blanket statements like that, but
> if we can, shouldn't there be some advice in the manual? AFAICS,
> right now there's exactly zip about why you should choose one over
> the other.
>
> regards, tom lane
>
--
Adrian Klaver
adrian.klaver@aklaver.com