Home > mailing lists

Re: Compressing large column by moving it to a unique table - Mailing list pgsql-general

From	Ron Johnson
Subject	Re: Compressing large column by moving it to a unique table
Date	February 13 09:16:10
Msg-id	CANzqJaBp4nVkAR+sXdQKN=8iTu6Pm4RTfTDPgPxHZ1S1JhMDgw@mail.gmail.com Whole thread Raw
In response to	Compressing large column by moving it to a unique table (Adrian Garcia Badaracco <adrian@adriangb.com>)
Responses	Re: Compressing large column by moving it to a unique table
List	pgsql-general

Tree view

On Mon, Feb 12, 2024 at 10:12 PM Adrian Garcia Badaracco <adrian@adriangb.com> wrote:

I am using Timescale so I'll be mentioning some timestamp stuff but I think this is a general postgres question for the most part.

I have a table with some fixed, small columns (id, timestamp, etc) and a large JSONB column (let's call it `attributes`). `attributes` has 1000s of schemas, but given a schema, there's a lot of duplication. Across all rows, more than 99% of the data is duplicated (as measured by `count(attributes)` vs `count(distinct attributes)`.

I can't normalize `attributes` into real columns because it is quite variable (remember 1000s of schemas).

My best idea is to make a table like `(day timestamptz, hash text, attributes jsonb)` and then in my original table replace `attributes` with a reference to `new_table`.

Meaning that there are many fewer rows in new_table?

I can then make a view that joins them `select original_table.timestamp, new_table.attributes from original join new_table on (time_bucket('1 day', timestamp) = day AND original.hash = new_table.hash)` or something like that. The idea of time bucketing into 1 day is to balance write and read speed (by relying on timescale to do efficient time partitioning, data retention, etc.).

I recognize this is essentially creating a key-value store in postgres and also janky compression, so I am cautious about it.

If my interpretation (that there are many fewer rows in new_table) is correct, then you've stumbled into the Second Normal Form of database design: https://en.wikipedia.org/wiki/Second_normal_form#Example

pgsql-general by date:

From: Adrian Garcia Badaracco
Date: 13 February, 06:11:34
Subject: Compressing large column by moving it to a unique table

From: Durumdara
Date: 13 February, 10:56:26
Subject: MAT. VIEW security problems and PG 10-11 versions?

Re: Compressing large column by moving it to a unique table - Mailing list pgsql-general

Previous

Next