Re: Fixed length data types issue - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Fixed length data types issue
Date
Msg-id 87irk0c5rg.fsf@enterprisedb.com
Whole thread Raw
In response to Re: Fixed length data types issue  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Fixed length data types issue  (Gregory Stark <stark@enterprisedb.com>)
Re: Fixed length data types issue  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Martijn van Oosterhout <kleptog@svana.org> writes:

> So you end up storing the typmod in the Datum itself, which brings you
> right back to varlena.

Not really since the Datum doesn't actually end up on disk in the case of
pass-by-reference.

which leads us to:

> Well, the root of the problem depends on your perspective. If the
> purpose behind all of this is to save disk space, perhaps the root of
> the problem is that disk representation and memory representation are
> intimately tied?

Indeed.

Consider this real table definition I found in a few moments searching for
schemas on google:
   PRVDR_CTRL_TYPE_CD: CHAR(2)   PRVDR_NUM: CHAR(6)   NPI: NUMBER   RPT_STUS_CD: CHAR(1)   FY_BGN_DT: DATE   FY_END_DT:
DATE  PROC_DT: DATE   INITL_RPT_SW: CHAR(1)   LAST_RPT_SW: CHAR(1)   TRNSMTL_NUM: CHAR(2)   FI_NUM: CHAR(5)
ADR_VNDR_CD:CHAR(1)   FI_CREAT_DT: DATE   UTIL_CD: CHAR(1)   NPR_DT: DATE   SPEC_IND: CHAR(1)   FI_RCPT_DT: DATE
 

By my count postgres would use 154 bytes for this record. Whereas in fact
there's no need for it to take more than 87 bytes. Almost 100% overhead for
varattlen headers and the padding they necessitate.

This is not a pathological example. This is a very common style of database
schema definition. Many many database tables in the real world are a 1-1
translations of existing flat file databases which have lots of short fixed
length ascii codes. Any database interacting with any old school inventory
management systems, financial databases, marketing database, etc is likely to
be of this form.

So it seems what has to happen here is we need a way of defining a data type
that has a different on-disk representation from its in-memory definition.
That means a lot more cpu overhead since I imagine it will mean pallocing the
in-memory representation before you can actually do anything with the data.

The disk reader and writer functions could probably use the typmod but it
seems what they really want to have access to is the attlen because what they
really want to know is the length of the object that their pointer refers to.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: "Nimesh Satam"
Date:
Subject: Template0 age is increasing speedily.
Next
From: Gregory Stark
Date:
Subject: Re: Fixed length data types issue