Re: Storage for multiple variable-length attributes in a single row - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Storage for multiple variable-length attributes in a single row
Date
Msg-id 20220207171948.i7v4hn5zv4c6gr5f@jrouhaud
Whole thread Raw
In response to Re: Storage for multiple variable-length attributes in a single row  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On Mon, Feb 07, 2022 at 10:10:53AM -0700, David G. Johnston wrote:
> On Mon, Feb 7, 2022 at 9:58 AM Esteban Zimanyi <estebanzimanyi@gmail.com>
> wrote:
> 
> >
> > As suggested by David, this goes beyond the "traditional" usage of
> > PostgreSQL. Therefore my questions are
> > * What is the suggested strategy to splitting these 2K attributes into
> > vertically partitioned tables where the tables are linked by the primary
> > key (e.g. trip number in the example above). Are there any limitations/best
> > practices in the number/size of TOASTED attributes that a table should
> > contain.
> > * In each partitioned table containing N TOASTED attributes, given the
> > above requirements, are there any limitations/best practices in storing
> > them using extended storage or an alternative one such as external.
> >
> >
> Frankly, the best practice is "don't have that many columns".  Since you
> do, I posit that you are just going to have to make decisions (possibly
> with experimentation) on your own.  Or maybe ask around on a MobilityDB
> forum what people using that tool and having these kinds of data structures
> do.  From a core PostgreSQL perspective you've already deviated from the
> model structures that it was designed with in mind.
> I'm really confused that you'd want the data value itself to contain a
> timestamp that, on a per-row basis, should be the same timestamp that every
> other value on the row has.  Extracting the timestamp to it own column and
> using simpler and atomic data types is how core PostgreSQL and the
> relational model normalization recommend dealing with this situation.  Then
> you just break up the attributes of a similar nature into their own tables
> based upon their shared nature.  In almost all cases relying on "main"
> storage.

Actually looking at the original example:

>    CREATE TYPE tint (
>      internallength = variable,
>      [...]
>      storage = extended,
>      alignment = double,
>      [...]
>    );

I'm wondering if it's just some miscommunication here.  If the tint data type
only needs to hold a timestamp and an int, I don't see why it would be
varlerna at all.

So if a single tint can internally hold thousands of (int, timestamptz), a bit
like pgpointcloud, then having it by default external (so both possibly
out-of-line and compressed) seems like a good idea, as you can definitely hit
the 8k boundary, it should compress nicely and you also avoid some quite high
tuple header overhead.



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Storage for multiple variable-length attributes in a single row
Next
From: Robert Haas
Date:
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations