Re: How to do faster DML - Mailing list pgsql-general

From veem v
Subject Re: How to do faster DML
Msg-id CAB+=1TVjov+y-8=d3ey-B7-Wj-23KPAKQQAejkjXcS50_KZ=pg@mail.gmail.com
In response to Re: How to do faster DML  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses Re: How to do faster DML  (Dominique Devienne <ddevienne@gmail.com>)
Re: How to do faster DML  (Greg Sabino Mullane <htamfids@gmail.com>)
List pgsql-general
Thank you.

On Mon, 12 Feb 2024 at 03:52, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2024-02-11 12:08:47 -0500, Ron Johnson wrote:
> On Sun, Feb 11, 2024 at 11:54 AM veem v <veema0000@gmail.com> wrote:
>     When you said "you would normally prefer those over numeric" I was
>     thinking the opposite. As you mentioned, integer is a fixed-length data
>     type and will occupy 4 bytes whether you store 15 or 99999999. But a
>     variable-length type like Number or numeric will resize itself based
>     on the actual data. So is there any downside to going with a
>     variable-length data type like numeric?
>
>
> Consider a table with a bunch of NUMERIC fields.  One of those records has
> small values (i.e., three bytes each).  It fits neatly in 2KiB.
>
> And then you update all those NUMERIC fields to big numbers that take 15
> bytes.  Suddenly (or eventually, if you update them at different times), the
> record no longer fits in 2KiB, and so must be moved to its own page.  That
> causes extra IO.

I think that's not much of a concern with PostgreSQL, because you can't
update a row in place anyway because of MVCC. So in any case you're
writing a new row. If you're lucky there is enough free space in the same
page and you can do a HOT update, but that's quite independent of
whether the row changes size.
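(As a side note on the integer-versus-numeric point above: the size
difference is easy to check with pg_column_size(). A minimal sketch with
arbitrary values; the exact byte count for numeric depends on the number
of digits stored:)

-- integer is fixed-width: always 4 bytes, whatever the value
SELECT pg_column_size(15::integer), pg_column_size(99999999::integer);

-- numeric is variable-width: its size grows with the number of digits
SELECT pg_column_size(15::numeric), pg_column_size(99999999::numeric);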



Good to know. So it means that here in Postgres there is no concern like "row chaining" or "row migration", which we normally have in a non-MVCC database (such as Oracle). There it is not advisable to have more than ~255 columns in a table, even though it is technically possible, and if such a requirement arises we normally break the table into two tables that share a common key column so they can be joined.
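(On the HOT updates mentioned above: a quick way to see whether updates on a
table are actually HOT is the statistics view pg_stat_user_tables. A small
sketch; 'some_table' is a placeholder name:)

-- n_tup_hot_upd counts updates whose new row version stayed on the same
-- page and needed no index maintenance; compare it against n_tup_upd
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  relname = 'some_table';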


So we were thinking that adding many columns to a table should be fine in Postgres (we have a use case in which the total number of columns may reach ~500). But then, accessing a column towards the end of a row takes more time than accessing a column at the start of the row; as you mentioned, accessing the 100th column may cost 4 to 5 times as much as accessing the first column. So is it advisable here to take the similar approach of breaking the table into two once the total number of columns for a table crosses a certain threshold?
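(To gauge the effect before deciding, something like the following rough
sketch could be used; the table and column names are made up, and the actual
timings will vary with hardware and data:)

-- build a hypothetical 100-column table of integers (col_1 .. col_100)
DO $$
BEGIN
  EXECUTE 'CREATE TABLE wide_t AS SELECT '
          || (SELECT string_agg('g AS col_' || i, ', ' ORDER BY i)
              FROM generate_series(1, 100) AS i)
          || ' FROM generate_series(1, 100000) AS g';
END
$$;

-- tuple deforming walks attributes left to right, so extracting col_100
-- has to step over the 99 columns before it; compare the run times
EXPLAIN (ANALYZE, TIMING OFF) SELECT sum(col_1)   FROM wide_t;
EXPLAIN (ANALYZE, TIMING OFF) SELECT sum(col_100) FROM wide_t;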

Regards
Veem
