Thread: Need help in database design

Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:
Hii Community,

I need to provide a support for some functionality for my application for that I need to store 50 key value pair set, so I am in a dilemma, weather I create 50 new columns of int2 data type each column will contain value of a specific key or should I go with JSONB data type with 50 key value pair, the table on which I am going to do that all contains 1 Billion rows of data and have 84 hash partitions, I have gone through multiple articles some of them mentioned it's a good approach to create 50 new columns and some states that creating one JSONB would be best that's why I need your help to move forward, also I am ready to make H-Store instead of JSONB if it provides better performance.
Please help me to comes out from that dilemma.

Regards,
Divyansh Gupta,
Database Administrator

Re: Need help in database design

From
Adrian Klaver
Date:
On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
> Hii Community,
> 
> I need to provide a support for some functionality for my application 
> for that I need to store 50 key value pair set, so I am in a dilemma, 
> weather I create 50 new columns of int2 data type each column will 

This is unclear, I am trying to figure out you go from '50 key value 
pair set' to '50 new columns of int2'.

In other words how 50 pairs turn into 50 columns?

Then there is the question of why 50 keys per row in the first place?


> contain value of a specific key or should I go with JSONB data type with 
> 50 key value pair, the table on which I am going to do that all contains 
> 1 Billion rows of data and have 84 hash partitions, I have gone through 
> multiple articles some of them mentioned it's a good approach to create 
> 50 new columns and some states that creating one JSONB would be best 
> that's why I need your help to move forward, also I am ready to make 
> H-Store instead of JSONB if it provides better performance.
> Please help me to comes out from that dilemma.
> 
> Regards,
> Divyansh Gupta,
> Database Administrator

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

As per the discussion with other team members they suggested if we store 50 values for keys in an individual column that will provide better performance as the data type is native (INT2) on the other hand if we store all the key value pair in a single JSONB column the performance will degrade even after applying a GIN index on that however the statement sounds funny but I want to take everyone openion?

On Mon, 23 Dec 2024, 10:05 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
> Hii Community,
>
> I need to provide a support for some functionality for my application
> for that I need to store 50 key value pair set, so I am in a dilemma,
> weather I create 50 new columns of int2 data type each column will

This is unclear, I am trying to figure out you go from '50 key value
pair set' to '50 new columns of int2'.

In other words how 50 pairs turn into 50 columns?

Then there is the question of why 50 keys per row in the first place?


> contain value of a specific key or should I go with JSONB data type with
> 50 key value pair, the table on which I am going to do that all contains
> 1 Billion rows of data and have 84 hash partitions, I have gone through
> multiple articles some of them mentioned it's a good approach to create
> 50 new columns and some states that creating one JSONB would be best
> that's why I need your help to move forward, also I am ready to make
> H-Store instead of JSONB if it provides better performance.
> Please help me to comes out from that dilemma.
>
> Regards,
> Divyansh Gupta,
> Database Administrator

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Also as you ask how 50 pairs turns into 50 column so each column will be a key and the value of that key will store inside their respective column for each row

On Mon, 23 Dec 2024, 10:14 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:

As per the discussion with other team members they suggested if we store 50 values for keys in an individual column that will provide better performance as the data type is native (INT2) on the other hand if we store all the key value pair in a single JSONB column the performance will degrade even after applying a GIN index on that however the statement sounds funny but I want to take everyone openion?

On Mon, 23 Dec 2024, 10:05 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
> Hii Community,
>
> I need to provide a support for some functionality for my application
> for that I need to store 50 key value pair set, so I am in a dilemma,
> weather I create 50 new columns of int2 data type each column will

This is unclear, I am trying to figure out you go from '50 key value
pair set' to '50 new columns of int2'.

In other words how 50 pairs turn into 50 columns?

Then there is the question of why 50 keys per row in the first place?


> contain value of a specific key or should I go with JSONB data type with
> 50 key value pair, the table on which I am going to do that all contains
> 1 Billion rows of data and have 84 hash partitions, I have gone through
> multiple articles some of them mentioned it's a good approach to create
> 50 new columns and some states that creating one JSONB would be best
> that's why I need your help to move forward, also I am ready to make
> H-Store instead of JSONB if it provides better performance.
> Please help me to comes out from that dilemma.
>
> Regards,
> Divyansh Gupta,
> Database Administrator

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Need help in database design

From
Adrian Klaver
Date:
On 12/23/24 08:46, Divyansh Gupta JNsThMAudy wrote:
> Also as you ask how 50 pairs turns into 50 column so each column will be 
> a key and the value of that key will store inside their respective 
> column for each row

My problem with understanding this is 50 pairs = 100 values, I don't 
understand how that can fit in 50 columns that can only handle 1 value 
each. You need to provide some example data showing what you want to 
achieve. It does not need to be the full 50, just something to show the 
process.


> 
> 
> On Mon, 23 Dec 2024, 10:14 pm Divyansh Gupta JNsThMAudy, 
> <ag1567827@gmail.com <mailto:ag1567827@gmail.com>> wrote:
> 
>     As per the discussion with other team members they suggested if we
>     store 50 values for keys in an individual column that will provide
>     better performance as the data type is native (INT2) on the other
>     hand if we store all the key value pair in a single JSONB column the
>     performance will degrade even after applying a GIN index on that
>     however the statement sounds funny but I want to take everyone openion?
> 
> 
>     On Mon, 23 Dec 2024, 10:05 pm Adrian Klaver,
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>         On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
>          > Hii Community,
>          >
>          > I need to provide a support for some functionality for my
>         application
>          > for that I need to store 50 key value pair set, so I am in a
>         dilemma,
>          > weather I create 50 new columns of int2 data type each column
>         will
> 
>         This is unclear, I am trying to figure out you go from '50 key
>         value
>         pair set' to '50 new columns of int2'.
> 
>         In other words how 50 pairs turn into 50 columns?
> 
>         Then there is the question of why 50 keys per row in the first
>         place?
> 
> 
>          > contain value of a specific key or should I go with JSONB
>         data type with
>          > 50 key value pair, the table on which I am going to do that
>         all contains
>          > 1 Billion rows of data and have 84 hash partitions, I have
>         gone through
>          > multiple articles some of them mentioned it's a good approach
>         to create
>          > 50 new columns and some states that creating one JSONB would
>         be best
>          > that's why I need your help to move forward, also I am ready
>         to make
>          > H-Store instead of JSONB if it provides better performance.
>          > Please help me to comes out from that dilemma.
>          >
>          > Regards,
>          > Divyansh Gupta,
>          > Database Administrator
> 
>         -- 
>         Adrian Klaver
>         adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Need help in database design

From
"David G. Johnston"
Date:
On Mon, Dec 23, 2024, 09:54 Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/23/24 08:46, Divyansh Gupta JNsThMAudy wrote:
> Also as you ask how 50 pairs turns into 50 column so each column will be
> a key and the value of that key will store inside their respective
> column for each row

My problem with understanding this is 50 pairs = 100 values, I don't
understand how that can fit in 50 columns that can only handle 1 value
each. You need to provide some example data showing what you want to
achieve. It does not need to be the full 50, just something to show the
process.


Because the column name is the same as a dictionary key...

David J.


Re: Need help in database design

From
Adrian Klaver
Date:
On 12/23/24 08:55, David G. Johnston wrote:
> On Mon, Dec 23, 2024, 09:54 Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 12/23/24 08:46, Divyansh Gupta JNsThMAudy wrote:
>      > Also as you ask how 50 pairs turns into 50 column so each column
>     will be
>      > a key and the value of that key will store inside their respective
>      > column for each row
> 
>     My problem with understanding this is 50 pairs = 100 values, I don't
>     understand how that can fit in 50 columns that can only handle 1 value
>     each. You need to provide some example data showing what you want to
>     achieve. It does not need to be the full 50, just something to show the
>     process.
> 
> 
> 
> Because the column name is the same as a dictionary key...

Aah, got it thanks.

> 
> David J.
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Sure , for example, I have 50 key with name as t1 , t2 ,t3 .......t50

Now each key could have values from 0 to 3 

So let suppose we have JSONB like that for row r1

{

t1: 1

t2: 2

t3 : 3

}


As if I convert it into columns so r1 will have

t1 column will contain 1

t2 column will contain 2

...... So on


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.



On Mon, 23 Dec 2024, 10:24 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 12/23/24 08:46, Divyansh Gupta JNsThMAudy wrote:
> Also as you ask how 50 pairs turns into 50 column so each column will be
> a key and the value of that key will store inside their respective
> column for each row

My problem with understanding this is 50 pairs = 100 values, I don't
understand how that can fit in 50 columns that can only handle 1 value
each. You need to provide some example data showing what you want to
achieve. It does not need to be the full 50, just something to show the
process.


>
>
> On Mon, 23 Dec 2024, 10:14 pm Divyansh Gupta JNsThMAudy,
> <ag1567827@gmail.com <mailto:ag1567827@gmail.com>> wrote:
>
>     As per the discussion with other team members they suggested if we
>     store 50 values for keys in an individual column that will provide
>     better performance as the data type is native (INT2) on the other
>     hand if we store all the key value pair in a single JSONB column the
>     performance will degrade even after applying a GIN index on that
>     however the statement sounds funny but I want to take everyone openion?
>
>
>     On Mon, 23 Dec 2024, 10:05 pm Adrian Klaver,
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>         On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
>          > Hii Community,
>          >
>          > I need to provide a support for some functionality for my
>         application
>          > for that I need to store 50 key value pair set, so I am in a
>         dilemma,
>          > weather I create 50 new columns of int2 data type each column
>         will
>
>         This is unclear, I am trying to figure out you go from '50 key
>         value
>         pair set' to '50 new columns of int2'.
>
>         In other words how 50 pairs turn into 50 columns?
>
>         Then there is the question of why 50 keys per row in the first
>         place?
>
>
>          > contain value of a specific key or should I go with JSONB
>         data type with
>          > 50 key value pair, the table on which I am going to do that
>         all contains
>          > 1 Billion rows of data and have 84 hash partitions, I have
>         gone through
>          > multiple articles some of them mentioned it's a good approach
>         to create
>          > 50 new columns and some states that creating one JSONB would
>         be best
>          > that's why I need your help to move forward, also I am ready
>         to make
>          > H-Store instead of JSONB if it provides better performance.
>          > Please help me to comes out from that dilemma.
>          >
>          > Regards,
>          > Divyansh Gupta,
>          > Database Administrator
>
>         --
>         Adrian Klaver
>         adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Need help in database design

From
Ron Johnson
Date:
How do you uniquely reference each set of 50 k/v pairs?

On Mon, Dec 23, 2024 at 12:01 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Sure , for example, I have 50 key with name as t1 , t2 ,t3 .......t50

Now each key could have values from 0 to 3 

So let suppose we have JSONB like that for row r1

{

t1: 1

t2: 2

t3 : 3

}


As if I convert it into columns so r1 will have

t1 column will contain 1

t2 column will contain 2

...... So on


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.



On Mon, 23 Dec 2024, 10:24 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 12/23/24 08:46, Divyansh Gupta JNsThMAudy wrote:
> Also as you ask how 50 pairs turns into 50 column so each column will be
> a key and the value of that key will store inside their respective
> column for each row

My problem with understanding this is 50 pairs = 100 values, I don't
understand how that can fit in 50 columns that can only handle 1 value
each. You need to provide some example data showing what you want to
achieve. It does not need to be the full 50, just something to show the
process.


>
>
> On Mon, 23 Dec 2024, 10:14 pm Divyansh Gupta JNsThMAudy,
> <ag1567827@gmail.com <mailto:ag1567827@gmail.com>> wrote:
>
>     As per the discussion with other team members they suggested if we
>     store 50 values for keys in an individual column that will provide
>     better performance as the data type is native (INT2) on the other
>     hand if we store all the key value pair in a single JSONB column the
>     performance will degrade even after applying a GIN index on that
>     however the statement sounds funny but I want to take everyone openion?
>
>
>     On Mon, 23 Dec 2024, 10:05 pm Adrian Klaver,
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>         On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
>          > Hii Community,
>          >
>          > I need to provide a support for some functionality for my
>         application
>          > for that I need to store 50 key value pair set, so I am in a
>         dilemma,
>          > weather I create 50 new columns of int2 data type each column
>         will
>
>         This is unclear, I am trying to figure out you go from '50 key
>         value
>         pair set' to '50 new columns of int2'.
>
>         In other words how 50 pairs turn into 50 columns?
>
>         Then there is the question of why 50 keys per row in the first
>         place?
>
>
>          > contain value of a specific key or should I go with JSONB
>         data type with
>          > 50 key value pair, the table on which I am going to do that
>         all contains
>          > 1 Billion rows of data and have 84 hash partitions, I have
>         gone through
>          > multiple articles some of them mentioned it's a good approach
>         to create
>          > 50 new columns and some states that creating one JSONB would
>         be best
>          > that's why I need your help to move forward, also I am ready
>         to make
>          > H-Store instead of JSONB if it provides better performance.
>          > Please help me to comes out from that dilemma.
>          >
>          > Regards,
>          > Divyansh Gupta,
>          > Database Administrator
>
>         --
>         Adrian Klaver
>         adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Adrian Klaver
Date:
On 12/23/24 09:01, Divyansh Gupta JNsThMAudy wrote:
> Sure , for example, I have 50 key with name as t1 , t2 ,t3 .......t50
> 
> Now each key could have values from 0 to 3
> 
> So let suppose we have JSONB like that for row r1
> 
> {
> 
> t1: 1
> 
> t2: 2
> 
> t3 : 3
> 
> }
> 
> 
> As if I convert it into columns so r1 will have
> 
> t1 column will contain 1
> 
> t2 column will contain 2
> 
> ...... So on
> 
> 
> So here my question is considering one JSONB column is perfect or 
> considering 50 columns will be more optimised.
> 

1) How are you going to use the key:value pairs in queries?

2) I would recommend creating a sample dataset that you import into 
tables that are built with either the column method or the JSON method 
and test a representative set of queries.

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Each row have a PK (gdid) that will uniquely refrence 50 k/v pair set

On Mon, 23 Dec 2024, 10:35 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
How do you uniquely reference each set of 50 k/v pairs?

On Mon, Dec 23, 2024 at 12:01 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Sure , for example, I have 50 key with name as t1 , t2 ,t3 .......t50

Now each key could have values from 0 to 3 

So let suppose we have JSONB like that for row r1

{

t1: 1

t2: 2

t3 : 3

}


As if I convert it into columns so r1 will have

t1 column will contain 1

t2 column will contain 2

...... So on


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.



On Mon, 23 Dec 2024, 10:24 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 12/23/24 08:46, Divyansh Gupta JNsThMAudy wrote:
> Also as you ask how 50 pairs turns into 50 column so each column will be
> a key and the value of that key will store inside their respective
> column for each row

My problem with understanding this is 50 pairs = 100 values, I don't
understand how that can fit in 50 columns that can only handle 1 value
each. You need to provide some example data showing what you want to
achieve. It does not need to be the full 50, just something to show the
process.


>
>
> On Mon, 23 Dec 2024, 10:14 pm Divyansh Gupta JNsThMAudy,
> <ag1567827@gmail.com <mailto:ag1567827@gmail.com>> wrote:
>
>     As per the discussion with other team members they suggested if we
>     store 50 values for keys in an individual column that will provide
>     better performance as the data type is native (INT2) on the other
>     hand if we store all the key value pair in a single JSONB column the
>     performance will degrade even after applying a GIN index on that
>     however the statement sounds funny but I want to take everyone openion?
>
>
>     On Mon, 23 Dec 2024, 10:05 pm Adrian Klaver,
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>         On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
>          > Hii Community,
>          >
>          > I need to provide a support for some functionality for my
>         application
>          > for that I need to store 50 key value pair set, so I am in a
>         dilemma,
>          > weather I create 50 new columns of int2 data type each column
>         will
>
>         This is unclear, I am trying to figure out you go from '50 key
>         value
>         pair set' to '50 new columns of int2'.
>
>         In other words how 50 pairs turn into 50 columns?
>
>         Then there is the question of why 50 keys per row in the first
>         place?
>
>
>          > contain value of a specific key or should I go with JSONB
>         data type with
>          > 50 key value pair, the table on which I am going to do that
>         all contains
>          > 1 Billion rows of data and have 84 hash partitions, I have
>         gone through
>          > multiple articles some of them mentioned it's a good approach
>         to create
>          > 50 new columns and some states that creating one JSONB would
>         be best
>          > that's why I need your help to move forward, also I am ready
>         to make
>          > H-Store instead of JSONB if it provides better performance.
>          > Please help me to comes out from that dilemma.
>          >
>          > Regards,
>          > Divyansh Gupta,
>          > Database Administrator
>
>         --
>         Adrian Klaver
>         adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
"David G. Johnston"
Date:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.

Re: Need help in database design

From
Adrian Klaver
Date:
On 12/23/24 09:07, Divyansh Gupta JNsThMAudy wrote:
> Each row have a PK (gdid) that will uniquely refrence 50 k/v pair set
> 

Are the 50 key/value pairs there to just define a PK or are they going 
to be used in some other combination in queries?


-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Need help in database design

From
Ron Johnson
Date:
As I expected.

Next:
1) Will all 50 of those k/v pairs be populated when you insert the record?
2) Will there be updates?
3) In each row, will some of those 50 pairs stay empty?
4) When querying a row, will the app care about all 50 k/v pairs at the same time, or just one -- or even some -- of them?


On Mon, Dec 23, 2024 at 12:07 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Each row have a PK (gdid) that will uniquely refrence 50 k/v pair set

On Mon, 23 Dec 2024, 10:35 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
How do you uniquely reference each set of 50 k/v pairs?

On Mon, Dec 23, 2024 at 12:01 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Sure , for example, I have 50 key with name as t1 , t2 ,t3 .......t50

Now each key could have values from 0 to 3 

So let suppose we have JSONB like that for row r1

{

t1: 1

t2: 2

t3 : 3

}


As if I convert it into columns so r1 will have

t1 column will contain 1

t2 column will contain 2

...... So on


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.



On Mon, 23 Dec 2024, 10:24 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 12/23/24 08:46, Divyansh Gupta JNsThMAudy wrote:
> Also as you ask how 50 pairs turns into 50 column so each column will be
> a key and the value of that key will store inside their respective
> column for each row

My problem with understanding this is 50 pairs = 100 values, I don't
understand how that can fit in 50 columns that can only handle 1 value
each. You need to provide some example data showing what you want to
achieve. It does not need to be the full 50, just something to show the
process.


>
>
> On Mon, 23 Dec 2024, 10:14 pm Divyansh Gupta JNsThMAudy,
> <ag1567827@gmail.com <mailto:ag1567827@gmail.com>> wrote:
>
>     As per the discussion with other team members they suggested if we
>     store 50 values for keys in an individual column that will provide
>     better performance as the data type is native (INT2) on the other
>     hand if we store all the key value pair in a single JSONB column the
>     performance will degrade even after applying a GIN index on that
>     however the statement sounds funny but I want to take everyone openion?
>
>
>     On Mon, 23 Dec 2024, 10:05 pm Adrian Klaver,
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>         On 12/23/24 07:53, Divyansh Gupta JNsThMAudy wrote:
>          > Hii Community,
>          >
>          > I need to provide a support for some functionality for my
>         application
>          > for that I need to store 50 key value pair set, so I am in a
>         dilemma,
>          > weather I create 50 new columns of int2 data type each column
>         will
>
>         This is unclear, I am trying to figure out you go from '50 key
>         value
>         pair set' to '50 new columns of int2'.
>
>         In other words how 50 pairs turn into 50 columns?
>
>         Then there is the question of why 50 keys per row in the first
>         place?
>
>
>          > contain value of a specific key or should I go with JSONB
>         data type with
>          > 50 key value pair, the table on which I am going to do that
>         all contains
>          > 1 Billion rows of data and have 84 hash partitions, I have
>         gone through
>          > multiple articles some of them mentioned it's a good approach
>         to create
>          > 50 new columns and some states that creating one JSONB would
>         be best
>          > that's why I need your help to move forward, also I am ready
>         to make
>          > H-Store instead of JSONB if it provides better performance.
>          > Please help me to comes out from that dilemma.
>          >
>          > Regards,
>          > Divyansh Gupta,
>          > Database Administrator
>
>         --
>         Adrian Klaver
>         adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Empty JSONB by mistake, I wrote array sorry about that

On Mon, 23 Dec 2024, 10:59 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.

Re: Need help in database design

From
Ron Johnson
Date:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Adrian Klaver
Date:
On 12/23/24 09:49, Divyansh Gupta JNsThMAudy wrote:
> I have one confusion with this design if I opt to create 50 columns I 
> need to create 50 index which will work with userid index in Bitmap on 
> the other hand if I create a JSONB column I need to create a single index ?
> 

There still is the part where you plan to partition this data on some as 
not yet fully specified hash.


-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Need help in database design

From
Ron Johnson
Date:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:
Ron here is the entire table schema FYI, userid is the mandate column on which filter is always applies:

CREATE TABLE dbo.googledocs_tbl (

gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,

userid int8 NOT NULL,

useremail varchar(600) NOT NULL,

title public.citext NULL,

authorname varchar(600) NULL,

authoremail varchar(600) NULL,

updated varchar(300) NOT NULL,

entryid varchar(2000) NOT NULL,

lastmodifiedby varchar(600) NULL,

lastmodifiedbyemail varchar(600) NULL,

"size" varchar(300) NULL,

contenttype varchar(250) NULL,

fileextension varchar(50) NULL,

docfoldername public.citext NULL,

folderresourceid public.citext NULL,

filesize int8 DEFAULT 0 NOT NULL,

retentionstatus int2 DEFAULT 0 NOT NULL,

docfileref int8 NULL,

usid int4 NULL,

archivepath varchar(500) NULL,

createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL,

zipfilename varchar(100) NULL,

oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL,

onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,

startsnapshot int4 DEFAULT 0 NOT NULL,

currentsnapshot int4 DEFAULT 0 NOT NULL,

dismiss int2 DEFAULT 0 NOT NULL,

checksum varchar NULL,

typeoffile int2 GENERATED ALWAYS AS (

CASE

WHEN authoremail::text = useremail::text THEN 0::smallint

ELSE 1::smallint

END) STORED NOT NULL,

parquetfilename varchar(100) NULL,

metadata_vector tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, (((((COALESCE(title::character varying, ''::text::character varying)::text || ' '::text) || (COALESCE(docfoldername::text, ''::text) || ' '::text)) || (COALESCE(authorname, ''::text::character varying)::text || ' '::text)) || (COALESCE(fileextension, ''::text::character varying)::text || ' '::text)) || (COALESCE(lastmodifiedby, ''::text::character varying)::text || ' '::text)) || COALESCE(contenttype, ''::character varying::text::character varying)::text)) STORED NULL,

isfileencrypted int4 DEFAULT 0 NULL,

addons_json jsonb DEFAULT '{}'::jsonb NULL,

CONSTRAINT googledocs_tbl_clone_pkey PRIMARY KEY (gdid, userid),

CONSTRAINT fk_googledocs_tbl_clone_users_tbl FOREIGN KEY (userid) REFERENCES dbo.users_tbl(uid) ON DELETE CASCADE

)

PARTITION BY HASH (userid);


On Mon, Dec 23, 2024 at 11:53 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Ron Johnson
Date:

1. I bet you'd get better performance using RANGE partitioning.
2. Twenty million rows per userid is a LOT.  No subdivisions (like date range)?

On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Ron Johnson
Date:
Where are the 50 "t* columns?

On Mon, Dec 23, 2024 at 1:26 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Ron here is the entire table schema FYI, userid is the mandate column on which filter is always applies:

CREATE TABLE dbo.googledocs_tbl (

gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,

userid int8 NOT NULL,

useremail varchar(600) NOT NULL,

title public.citext NULL,

authorname varchar(600) NULL,

authoremail varchar(600) NULL,

updated varchar(300) NOT NULL,

entryid varchar(2000) NOT NULL,

lastmodifiedby varchar(600) NULL,

lastmodifiedbyemail varchar(600) NULL,

"size" varchar(300) NULL,

contenttype varchar(250) NULL,

fileextension varchar(50) NULL,

docfoldername public.citext NULL,

folderresourceid public.citext NULL,

filesize int8 DEFAULT 0 NOT NULL,

retentionstatus int2 DEFAULT 0 NOT NULL,

docfileref int8 NULL,

usid int4 NULL,

archivepath varchar(500) NULL,

createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL,

zipfilename varchar(100) NULL,

oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL,

onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,

startsnapshot int4 DEFAULT 0 NOT NULL,

currentsnapshot int4 DEFAULT 0 NOT NULL,

dismiss int2 DEFAULT 0 NOT NULL,

checksum varchar NULL,

typeoffile int2 GENERATED ALWAYS AS (

CASE

WHEN authoremail::text = useremail::text THEN 0::smallint

ELSE 1::smallint

END) STORED NOT NULL,

parquetfilename varchar(100) NULL,

metadata_vector tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, (((((COALESCE(title::character varying, ''::text::character varying)::text || ' '::text) || (COALESCE(docfoldername::text, ''::text) || ' '::text)) || (COALESCE(authorname, ''::text::character varying)::text || ' '::text)) || (COALESCE(fileextension, ''::text::character varying)::text || ' '::text)) || (COALESCE(lastmodifiedby, ''::text::character varying)::text || ' '::text)) || COALESCE(contenttype, ''::character varying::text::character varying)::text)) STORED NULL,

isfileencrypted int4 DEFAULT 0 NULL,

addons_json jsonb DEFAULT '{}'::jsonb NULL,

CONSTRAINT googledocs_tbl_clone_pkey PRIMARY KEY (gdid, userid),

CONSTRAINT fk_googledocs_tbl_clone_users_tbl FOREIGN KEY (userid) REFERENCES dbo.users_tbl(uid) ON DELETE CASCADE

)

PARTITION BY HASH (userid);


On Mon, Dec 23, 2024 at 11:53 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,

[snip] 
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Range partition can help when you applies filter for a specific range but in my case I need to apply filter on userid always, however I have date columns but there is less variation in timestamp which I have that's why didn't go for range partition.

On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:

1. I bet you'd get better performance using RANGE partitioning.
2. Twenty million rows per userid is a LOT.  No subdivisions (like date range)?

On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Currently I haven't created those columns , I have created addons_json column which is a JSONB column yet in a discussion weather I should create or consider only one JSONB column.

On Tue, 24 Dec 2024, 12:00 am Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:

Range partition can help when you applies filter for a specific range but in my case I need to apply filter on userid always, however I have date columns but there is less variation in timestamp which I have that's why didn't go for range partition.

On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:

1. I bet you'd get better performance using RANGE partitioning.
2. Twenty million rows per userid is a LOT.  No subdivisions (like date range)?

On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Arnold Morein
Date:
I would like to make a suggestion, if I may. Granted, I do not understand the underlying task at hand, but:


A table with multiple columns of the same type smacks of designs that harken back to the days of mainframes. (STOP THAT!) The data described is a non-normalized array of integers that is meaningless outside of code. Table structures should be at least a LITTLE self-descriptive.

It is also not flexible (what if you suddenly need t51? how long would that table space adjustment take in production?) and space is wasted if not all 50 columns are populated.

Use a design that is basically a storage area for name/value pairs:

create table dbo.googledocs_tbl (
    id long identity primary key, — easy way to access a single record
    owner_id integer/long not null, — fk to owning parent record in other table such as user
    owner_type char(2), — optional field, identifies the owing table, makes this table even more generic
    property_name varchar(n) not null, — required unique name for property, not an array reference (t1, t4, t50)
    — the names are controlled by the developer but should be human interpretable which can then be used in queries
    property_value int4 not null — the important value in question
);

The fields owner_id, owner_type, property_name become a tertiary key that can never be changed, are unique and easily accessible via index lookup.

Add a timestamp if need be

You could then partition the record by owner_type or owner_id or whatever else comes to mind.

Then you just have to figure out the best way to index this monster for optimized queries.



On Dec 23, 2024, at 12:31 PM, Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Currently I haven't created those columns , I have created addons_json column which is a JSONB column yet in a discussion weather I should create or consider only one JSONB column.

On Tue, 24 Dec 2024, 12:00 am Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:

Range partition can help when you applies filter for a specific range but in my case I need to apply filter on userid always, however I have date columns but there is less variation in timestamp which I have that's why didn't go for range partition.

On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:

1. I bet you'd get better performance using RANGE partitioning.
2. Twenty million rows per userid is a LOT.  No subdivisions (like date range)?

On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Ron Johnson
Date:
Are these columns really unique for all 20M rows that a userid can have in the table?  I'm dubious.

Split a LOT of those columns out into a separate table named "user" with PK userid.  It'll save a huge amount of disk space, and speed up queries by not having to fetch it all every time.

useremail varchar(600) NOT NULL,
title public.citext NULL,
authorname varchar(600) NULL,
authoremail varchar(600) NULL,
updated varchar(300) NOT NULL,
entryid varchar(2000) NOT NULL,
lastmodifiedby varchar(600) NULL,
lastmodifiedbyemail varchar(600) NULL,
"size" varchar(300) NULL,
contenttype varchar(250) NULL,
fileextension varchar(50) NULL,
docfoldername public.citext NULL,
folderresourceid public.citext NULL,
filesize int8 DEFAULT 0 NOT NULL,
retentionstatus int2 DEFAULT 0 NOT NULL,
docfileref int8 NULL,
usid int4 NULL,
archivepath varchar(500) NULL,
createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL,
zipfilename varchar(100) NULL,
oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
startsnapshot int4 DEFAULT 0 NOT NULL,
currentsnapshot int4 DEFAULT 0 NOT NULL,
dismiss int2 DEFAULT 0 NOT NULL,
checksum varchar NULL,
typeoffile int2 GENERATED ALWAYS AS (



On Mon, Dec 23, 2024 at 1:32 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Currently I haven't created those columns , I have created addons_json column which is a JSONB column yet in a discussion weather I should create or consider only one JSONB column.

On Tue, 24 Dec 2024, 12:00 am Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:

Range partition can help when you applies filter for a specific range but in my case I need to apply filter on userid always, however I have date columns but there is less variation in timestamp which I have that's why didn't go for range partition.

On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:

1. I bet you'd get better performance using RANGE partitioning.
2. Twenty million rows per userid is a LOT.  No subdivisions (like date range)?

On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

Thank you everyone for giving your valuable responses, I am glad that everyone understands my concern. I got some good ideas about the database design that I am following after going through some stress testing I will implement the same.

Thank you so much Everyone

On Tue, 24 Dec 2024, 12:09 am Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Are these columns really unique for all 20M rows that a userid can have in the table?  I'm dubious.

Split a LOT of those columns out into a separate table named "user" with PK userid.  It'll save a huge amount of disk space, and speed up queries by not having to fetch it all every time.

useremail varchar(600) NOT NULL,
title public.citext NULL,
authorname varchar(600) NULL,
authoremail varchar(600) NULL,
updated varchar(300) NOT NULL,
entryid varchar(2000) NOT NULL,
lastmodifiedby varchar(600) NULL,
lastmodifiedbyemail varchar(600) NULL,
"size" varchar(300) NULL,
contenttype varchar(250) NULL,
fileextension varchar(50) NULL,
docfoldername public.citext NULL,
folderresourceid public.citext NULL,
filesize int8 DEFAULT 0 NOT NULL,
retentionstatus int2 DEFAULT 0 NOT NULL,
docfileref int8 NULL,
usid int4 NULL,
archivepath varchar(500) NULL,
createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL,
zipfilename varchar(100) NULL,
oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
startsnapshot int4 DEFAULT 0 NOT NULL,
currentsnapshot int4 DEFAULT 0 NOT NULL,
dismiss int2 DEFAULT 0 NOT NULL,
checksum varchar NULL,
typeoffile int2 GENERATED ALWAYS AS (



On Mon, Dec 23, 2024 at 1:32 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Currently I haven't created those columns , I have created addons_json column which is a JSONB column yet in a discussion weather I should create or consider only one JSONB column.

On Tue, 24 Dec 2024, 12:00 am Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:

Range partition can help when you applies filter for a specific range but in my case I need to apply filter on userid always, however I have date columns but there is less variation in timestamp which I have that's why didn't go for range partition.

On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:

1. I bet you'd get better performance using RANGE partitioning.
2. Twenty million rows per userid is a LOT.  No subdivisions (like date range)?

On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian, Please check this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
If your queries all reference userid, then you only need indices on gdid and userid.

On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?

On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
Given what you just wrote, I'd stick with 50 separate t* columns.  Simplifies queries, simplifies updates, and eliminates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Values can be updated based on customer actions

All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.

On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> wrote:
Let's make it more understandable, here is the table schema with 50 columns in it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every time when i query I will query it along with userid 
Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key filters if customer applies 

On the other hand if I create a single jsonb column the schema will look like :

CREATE TABLE dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}'::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would be like 
where userid = 12345678 and ((addons_json @> {t1:1}) or  (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
more key filters if customer applies 



On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.

The relational database engine is designed around the column-based approach.  Especially if the columns are generally unchanging, combined with using fixed-width data types.

David J.



--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Need help in database design

From
thiemo@gelassene-pferde.biz
Date:

Just out of curiosity, not suggestion this is the solution. Why save json in PostgreSQL and not in a DB specialised on JSON like MongoDB?

Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> escribió:

Thank you everyone for giving your valuable responses, I am glad that everyone understands my concern. I got some good ideas about the database design that I am following after going through some stress testing I will implement the same.

Thank you so much Everyone

Re: Need help in database design

From
Divyansh Gupta JNsThMAudy
Date:

The current application does not have support for that, If I go for that, it will require a lot of effort from the team which Is not possible right now.

On Tue, 24 Dec 2024, 12:47 am , <thiemo@gelassene-pferde.biz> wrote:

Just out of curiosity, not suggestion this is the solution. Why save json in PostgreSQL and not in a DB specialised on JSON like MongoDB?

Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> escribió:

Thank you everyone for giving your valuable responses, I am glad that everyone understands my concern. I got some good ideas about the database design that I am following after going through some stress testing I will implement the same.

Thank you so much Everyone

Re: Need help in database design

From
"David G. Johnston"
Date:
On Mon, Dec 23, 2024 at 11:26 AM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,

userid int8 NOT NULL,

useremail varchar(600) NOT NULL,

title public.citext NULL,

authorname varchar(600) NULL,

authoremail varchar(600) NULL,

updated varchar(300) NOT NULL,

entryid varchar(2000) NOT NULL,

lastmodifiedby varchar(600) NULL,

lastmodifiedbyemail varchar(600) NULL,

"size" varchar(300) NULL,

contenttype varchar(250) NULL,

fileextension varchar(50) NULL,

docfoldername public.citext NULL,

folderresourceid public.citext NULL,

filesize int8 DEFAULT 0 NOT NULL,

retentionstatus int2 DEFAULT 0 NOT NULL,

docfileref int8 NULL,

usid int4 NULL,

archivepath varchar(500) NULL,

createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL,

zipfilename varchar(100) NULL,

oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL,

onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,

startsnapshot int4 DEFAULT 0 NOT NULL,

currentsnapshot int4 DEFAULT 0 NOT NULL,

dismiss int2 DEFAULT 0 NOT NULL,

checksum varchar NULL,

typeoffile int2 GENERATED ALWAYS AS (

CASE

WHEN authoremail::text = useremail::text THEN 0::smallint

ELSE 1::smallint

END) STORED NOT NULL,

parquetfilename varchar(100) NULL,

metadata_vector tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, (((((COALESCE(title::character varying, ''::text::character varying)::text || ' '::text) || (COALESCE(docfoldername::text, ''::text) || ' '::text)) || (COALESCE(authorname, ''::text::character varying)::text || ' '::text)) || (COALESCE(fileextension, ''::text::character varying)::text || ' '::text)) || (COALESCE(lastmodifiedby, ''::text::character varying)::text || ' '::text)) || COALESCE(contenttype, ''::character varying::text::character varying)::text)) STORED NULL,

isfileencrypted int4 DEFAULT 0 NULL,

addons_json jsonb DEFAULT '{}'::jsonb NULL,

CONSTRAINT googledocs_tbl_clone_pkey PRIMARY KEY (gdid, userid),

CONSTRAINT fk_googledocs_tbl_clone_users_tbl FOREIGN KEY (userid) REFERENCES dbo.users_tbl(uid) ON DELETE CASCADE

)

PARTITION BY HASH (userid);



There are a lot of unconventional choices for data types and column naming there...and it doesn't seem as normalized as it could be.

Besides that, consider that this probably should be at least two tables.  Put columns that you expect to change whenever the user makes a typical edit on one table.  Put those columns that can never change, or at least would rarely do so, on another.  Narrower tables is usually a win and with this division you aren't throwing away and replacing all of the static data each time the volatile data changes.

David J.

Re: Need help in database design

From
Greg Sabino Mullane
Date:
You might also look into using a bitmap, for some or all of those fields. It depends on how many distinct values each can have, of course, and also on how exactly they are accessed, but bitmaps can save you quite a bit of space.

Cheers,
Greg