Re: [GENERAL] "Shared strings"-style table - Mailing list pgsql-general

From David G. Johnston
Subject Re: [GENERAL] "Shared strings"-style table
Date
Msg-id CAKFQuwYGqEVgsZtCozBw46zNQUXtfX-W-vVcxJ89Sjvk7JSXtA@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] "Shared strings"-style table  (Seamus Abshere <seamus@abshere.net>)
Responses Re: [GENERAL] "Shared strings"-style table
List pgsql-general
On Fri, Oct 13, 2017 at 9:29 AM, Seamus Abshere <seamus@abshere.net> wrote:
> On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote
> > Theoretically / blue sky, could there be a table or column type that
> > transparently handles "shared strings" like this, reducing size on disk
> > at the cost of lookup overhead for all queries?
> > (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> > only for large objects?)

On Fri, Oct 13, 2017, at 01:12 PM, David G. Johnston wrote:
> Row-independence is baked into PostgreSQL pretty deeply...

Could you say more about that?

​Not intelligibly...basically as far as PostgreSQL is concerned all the data to reconstruct a row from a given table is present in that table.  From a practical perspective the "TOAST table" for a table IS part of the main table since it has no practical independent use.

As an aside I was thinking along the lines of an actual compression routine which is what a spreadsheet file is able to do since a spreadsheet contains the data from every row and column in a single file and is able to compress the entire file by finding commonalities across rows and columns.  A database generally cannot do that.

As for "transparent lookup tables for text columns"...I suppose one could implement a "system-managed-enum" type with many of the same properties of an actual enum but avoiding many of its problems by not exposing the enum-ness to the user and instead just exposing the text labels...I suspect faced with prospect of doing something that complex most users would just setup a FK relationship.
What about the comparison to TOAST, which stores values off-table?

TOAST solves a technical problem related to the fact that records "on the table" have a very small size limitation (kb) while stored values can be at least as large as a GB.  TOAST does involved compression but the input to the compression algorithm is a single cell (row and column) in a table.​  As noted above I consider the TOAST table and main table to be a single logical table.

Like I said the enum type has similar properties to what you want - but Melvin is right that using it requires careful consideration of how your data might change in the future.

David J.

pgsql-general by date:

Previous
From: "Igal @ Lucee.org"
Date:
Subject: Re: [GENERAL] Permissions for Web App
Next
From: "David G. Johnston"
Date:
Subject: Re: [GENERAL] Permissions for Web App