Re: Design strategy for table with many attributes - Mailing list pgsql-general

From Guyren Howe
Subject Re: Design strategy for table with many attributes
Date
Msg-id 2A2982A5-C93C-4A88-8ADE-6F3565D4B42A@gmail.com
Whole thread Raw
In response to Design strategy for table with many attributes  (Lok P <loknath.73@gmail.com>)
List pgsql-general
Ultimately, the way you should store the data depends on how you will use it. When you retrieve these values, what are
thedifferent ways you’ll be using them? 

Normalised representations are more flexible, and the pragmatic, performance-based consideration is that all the values
ina row are always effectively retrieved together. So if you reasonably often have a query that only accesses the
creationdate and transaction id, then it will be pretty slow if you are also always retrieving 500 other columns you
don’tneed. 

So: you might often pull all the values *other* than the attributes (creation date, creator, job information, whatever)
together.This argues that those values should be in one table, and the attributes in another. 

Will you usually be using *all* of the attributes for a particular transaction together in the same operation? It might
makesense to store them in eg an array in that case. But this significantly reduces the speed of accessing particular
attributesseparately. 

It is likely that you will want to slice things by particular named attribute across many transactions. This argues for
themore normalised form, as does the general principle of doing things in the way that is most general and flexible. 

When considering how the data will be used, please consider not only the developers of your current application(s), but
alsoeg data analysts, managers, future applications etc. 

The less specific you can be about how you want to use the data, the more you should lean toward fully normalising.

Note also that you can store your data in a normalised and maximally flexible form, but also use triggers or batch jobs
togather various permutations of the data for specific purposes. If you really do, say, both have some major part of
yourproject that uses all the attributes on a given transaction together, but you also have other uses, you may want to
storeboth the normalised/attribute table and the “all the values together” version. 

Even if you want to store “all the values together”, it may well be better to use an array, JSON or HStore, rather than
havinga super-wide table. JSON would eg let you enumerate all the column names (for example) and employ Postgres’s
reallynice JSON query features. 


> On Jul 4, 2024, at 12:37, Lok P <loknath.73@gmail.com> wrote:
>
> Hello,
> In one of the applications we are getting transactions in messages/events format and also in files and then they are
gettingparsed and stored into the relational database. The number of attributes/columns each transaction has is ~900+.
Logicallythey are part of one single transaction and should be stored in one table as one single row. There will be
~500millionsuch transactions each day coming into the system. And there will be approx ~10K peak write TPS and 5K read
TPSin target state. This system has a postgres database as a "source of truth" or OLTP store. And then data moves to
snowflakesfor the olap store. 
>
> Initially when the system was designed the number of attributes per transaction was <100 but slowly the business
wantsto keep/persist other attributes too in the current system and the number of columns keep growing. 
>
> However, as worked with some database systems , we get few suggestions from DBA's to not have many columns in a
singletable. For example in oracle they say not to go beyond ~255 columns as then row chaining and row migration type
ofthings are going to hunt us. Also we are afraid concurrent DMLS on the table may cause this as a contention point. So
Iwanted to understand , in such a situation what would be the best design approach we should use irrespective of
databases?Or say, what is the maximum number of columns per table we should restrict? Should we break the single
transactioninto multiple tables like one main table and other addenda tables with the same primary key to join and
fetchthe results wherever necessary? 
>
> Regards
> Lok




pgsql-general by date:

Previous
From: Vasu Nagendra
Date:
Subject: JSONPath operator and escaping values in query
Next
From: Murthy Nunna
Date:
Subject: psql help