Re: [GENERAL] "Shared strings"-style table - Mailing list pgsql-general

From Rob Sargent
Subject Re: [GENERAL] "Shared strings"-style table
Date
Msg-id dd0a0ae0-c28c-92d7-0650-559a4a43dd7b@gmail.com
Whole thread Raw
In response to [GENERAL] "Shared strings"-style table  (Seamus Abshere <seamus@abshere.net>)
List pgsql-general

On 10/13/2017 09:49 AM, Seamus Abshere wrote:
> hey,
>
> In the spreadsheet world, there is this concept of "shared strings," a
> simple way of compressing spreadsheets when the data is duplicated in
> many cells.
>
> In my database, I have a table with >200 million rows and >300 columns
> (all the households in the United States). For clarity of development
> and debugging, I have not made any effort to normalize its contents, so
> millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
> TOWNHOUSE" (yes, that whole string!) instead of some code representing
> it.
>
> Theoretically / blue sky, could there be a table or column type that
> transparently handles "shared strings" like this, reducing size on disk
> at the cost of lookup overhead for all queries?
>
> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> only for large objects?)
>
> Thanks,
> Seamus
>
> --
> Seamus Abshere, SCEA
> https://www.faraday.io
> https://github.com/seamusabshere
> https://linkedin.com/in/seamusabshere
>
>
What data type are these columns now?  I would be tempted to map the 
full strings to an abbreviation just so I didn't have to alter all the 
columns to an "id";  Optional to place any RI on the columns to the 
abbreviation dictionary table.  Just use the translation as a last step 
in user facing reports.  If you can map/abbreviate to 4 characters, 
you've approximated the disk size of an integer.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: Seamus Abshere
Date:
Subject: [GENERAL] "Shared strings"-style table
Next
From: "David G. Johnston"
Date:
Subject: Re: [GENERAL] "Shared strings"-style table