Re: How to do faster DML - Mailing list pgsql-general

From Laurenz Albe
Subject Re: How to do faster DML
Date
Msg-id 150064d6bf41b16498d8266324d6fbd8626f701c.camel@cybertec.at
Whole thread Raw
In response to Re: How to do faster DML  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-general
On Sun, 2024-02-11 at 13:25 +0530, veem v wrote:
> On Sun, 2024-02-11 at 01:25 +0100, Peter J. Holzer wrote:
> > On 2024-02-06 11:25:05 +0530, veem v wrote:
> > > With respect to the storage part:- In Oracle database we were supposed to keep
> > > the frequently accessed column first and less frequently accessed columns
> > > towards last in the table while creating table DDL. This used to help the query
> > > performance while accessing those columns as it's a row store database. Are
> > > there any such performance implications exists in postgres? And there the data
> > > types like varchar used to only occupy the space which the real data/value
> > > contains. 
> > >
> > > But here in postgres, it seems a bit different as it has significant
> > > overhead in storage, as these seem like being fixed length data types and will
> > > occupy the space no matter what the read data holds.
> >
> > Yes. Numbers in Oracle are variable length, so most Oracle tablesbetween
> > wouldn't contain many fixed length columns. In PostgreSQL must numeric
> > types are fixed length, so you'll have quite a lot of them.
>
> So it means , say in other databases like (oracle database), we were careless
> choosing the data length , say for example Varchar2(4000), if the real data which
> is inserted into the table holds a varchar string of length 20 bytes then Oracle
> trimmed it to occupy the 20 bytes length only in the storage. but in postgre here
> we need to be cautious and define the length as what the data attribute can max
> contains , because that amount of fixed space is allocated to every value which
> is inserted into the table for that attribute/data element. Similarly for
> Number/Numeric data type.  Please correct if my understanding is wrong.

Your understanding is wrong.  Oracle and PostgreSQL are not too different about
storing values.  The data type that Oracle calls "number", and that is called
"numeric" in PostgreSQL and in the SQL standard, is stored a variable length data
type in both Oracle and PostgreSQL.

It is just that Oracle has very few data types (I saw them gush about adding
"boolean" as a great new feature in version 23).  So, for example, there are
no integer data types in Oracle, and you have to store them as a variable
length data type.  PostgreSQL has integer data types, which are fixed length
(2, 4 or 8 bytes) and provide much more efficient storage for integers.

"character varying" is also pretty similar in Oracle and PostgreSQL, except
that Oracle calls it "varchar2".

The only fixed-length character data type is "character", but that is always
a bad choice, in Oracle as in PostgreSQL.

About your initial question: in PostgreSQL there is also a certain performance
gain if you store frequently used columns first, since the database has to
skip fewer columns to get to the data.  If the previous columns are fixed
length data types like integers, that is cheaper, because we don't have to
look at the data to know how long they are.

Another thing to consider is padding.  Each fixed-width data type has certain
alignment requirements (imposed by CPU processing) that you can find in
"pg_type.typalign".  This can lead to wasted space in the form of
"padding bytes".  For example, if a "bigint" follows a ASCII single-character
"text" or "varchar" value (which occupies two bytes), there will be six
padding bytes between them to align the "bigint" at a storage address that is
a multiple of eight.

But although both of these considerations (skipping over previous columns and
padding) are relevant for performance, they are often a micro-optimization
that you won't be able to measure, and you shouldn't lose too much sleep
over them.

>
> > So there's a bit of a tradeoff between minimizing alignment overhead and
> > arranging columns for fastest access.

Precisely.

Yours,
Laurenz Albe



pgsql-general by date:

Previous
From: Karsten Hilbert
Date:
Subject: Re: How should we design our tables and indexes
Next
From: "Peter J. Holzer"
Date:
Subject: Re: How to do faster DML