[GENERAL] "Shared strings"-style table - Mailing list pgsql-general

From	Seamus Abshere
Subject	[GENERAL] "Shared strings"-style table
Date	October 13, 2017 18:49:21
Msg-id	1507909761.179283.1137828392.30EBE584@webmail.messagingengine.com
Responses	Re: [GENERAL] "Shared strings"-style table (three replies)
List	pgsql-general

hey,

In the spreadsheet world, there is this concept of "shared strings," a
simple way of compressing spreadsheets when the data is duplicated in
many cells.

In my database, I have a table with >200 million rows and >300 columns
(all the households in the United States). For clarity of development
and debugging, I have not made any effort to normalize its contents, so
millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
TOWNHOUSE" (yes, that whole string!) instead of some code representing
it.

Theoretically / blue sky, could there be a table or column type that
transparently handles "shared strings" like this, reducing size on disk
at the cost of lookup overhead for all queries?

(I guess maybe it's like TOAST, but content-hashed and de-duped and not
only for large objects?)
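
To make the idea concrete, here is a minimal sketch of shared-strings encoding outside the database. It is only an illustration of the concept, not an existing PostgreSQL feature: the SharedStrings class, its intern/lookup methods, and the "APARTMENT" value are all made up for this example.

```python
class SharedStrings:
    """Hypothetical interning table: each distinct string is stored once."""

    def __init__(self):
        self._codes = {}    # string -> small integer code
        self._strings = []  # integer code -> string

    def intern(self, value):
        """Return the code for value, assigning a new one on first sight."""
        code = self._codes.get(value)
        if code is None:
            code = len(self._strings)
            self._codes[value] = code
            self._strings.append(value)
        return code

    def lookup(self, code):
        """Resolve a code back to the original string (the per-query overhead)."""
        return self._strings[code]


shared = SharedStrings()

# Illustrative column values; "APARTMENT" is invented for the example.
raw_column = [
    "SINGLE FAMILY RESIDENCE / TOWNHOUSE",
    "APARTMENT",
    "SINGLE FAMILY RESIDENCE / TOWNHOUSE",
]

# What would sit in the rows: one small integer per cell.
encoded_column = [shared.intern(value) for value in raw_column]

# What a query would see after the transparent lookup.
decoded_column = [shared.lookup(code) for code in encoded_column]
```

The rows hold only integers and the long string is stored once, which is the disk saving; every read pays for the extra lookup, which is the cost I mentioned.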

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
