Re: Fixed length data types issue - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Fixed length data types issue
Msg-id 450477CF.4020401@markdilger.com
In response to Re: Fixed length data types issue  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Fixed length data types issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Martijn van Oosterhout wrote:
> On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote:
>>> Well, it is unless you are willing to give up support of non-Intel CPUs;
>>> most other popular chips are strict about alignment, and will fail an
>>> attempt to do a nonaligned fetch.
>> Intel CPUs are detectable at compile time, right?  Do we use less 
>> padding in the layout for tables on Intel-based servers?  If not, could we?
> 
> Intel CPUs may not complain about unaligned reads, but they're still
> inefficient. Internally the CPU does two aligned reads and rearranges
> the bytes. On other architectures the OS can emulate that, but postgres
> doesn't use that for obvious reasons.

This gets back to the CPU vs. I/O bound issue, right?  Might not some 
people (with heavily taxed disks but lightly taxed CPU) prefer that 
trade-off?
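
To make the trade-off concrete, here is a minimal C sketch. It is not PostgreSQL code, and the names padded_row, packed_row_store and packed_row_value are hypothetical. It stores a 1-byte field followed by a 4-byte integer with no padding, and reads the integer back with memcpy, which is safe on strict-alignment CPUs at the cost of a few extra instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aligned layout: the compiler usually pads after `flag`
 * so that `value` starts on a 4-byte boundary (commonly 8 bytes total). */
struct padded_row {
    uint8_t flag;
    int32_t value;
};

/* Hypothetical packed layout: 1 byte + 4 bytes, no padding at all. */
#define PACKED_ROW_SIZE (sizeof(uint8_t) + sizeof(int32_t))

/* Write both fields back to back into buf (PACKED_ROW_SIZE bytes). */
static void packed_row_store(unsigned char *buf, uint8_t flag, int32_t value)
{
    buf[0] = flag;
    /* memcpy avoids a misaligned store; buf + 1 is not 4-byte aligned. */
    memcpy(buf + 1, &value, sizeof value);
}

/* Read the integer back without ever dereferencing a misaligned pointer. */
static int32_t packed_row_value(const unsigned char *buf)
{
    int32_t value;
    memcpy(&value, buf + 1, sizeof value);
    return value;
}
```

The packed form spends CPU on the copy to save bytes on disk, which is the trade an I/O-bound installation might accept.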

>> For the example schema which started this thread, a contrib extension 
>> for ascii fields could be written, with types like ascii1, ascii2, 
>> ascii3, and ascii4, each with implicit upcasts to text.  A contrib for 
>> int1 and uint1 could be written to store single byte integers in a 
>> single byte, performing math on them correctly, etc.
> 
> The problem is that for each of those ascii types, to actually use them
> they would have to be converted, which would amount to allocating some
> memory, copying and adding a length header. At some point you have to
> wonder whether you're actually saving anything.
> 
> Have a nice day,

I'm not sure what you mean by "actually use them".  The types could have
their own comparison operators.  So you could use them for sorting and
indexing, and use them in WHERE clauses with these comparisons without 
any conversion to/from text.  I mentioned implicit upcasts to text 
merely to handle other cases, such as using them in a LIKE or ILIKE, or 
concatenation, etc., where the work of providing this functionality for 
each contrib datatype would not really be justified.
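
A minimal sketch of what such a comparison could look like, written as plain C rather than against the real PostgreSQL function-manager API. The ascii4 struct and both function names are hypothetical, and the sketch assumes values shorter than four bytes are padded with NUL bytes.

```c
#include <assert.h>
#include <string.h>

#define ASCII4_LEN 4

/* Hypothetical fixed-length type: exactly four bytes, no length header. */
typedef struct {
    char data[ASCII4_LEN];
} ascii4;

/* Build a value from a C string; input longer than 4 bytes is truncated,
 * shorter input is padded with NUL bytes (an assumption of this sketch). */
static ascii4 ascii4_from_cstring(const char *s)
{
    ascii4 v;
    size_t n = strlen(s);

    if (n > ASCII4_LEN)
        n = ASCII4_LEN;
    memset(v.data, 0, ASCII4_LEN);
    memcpy(v.data, s, n);
    return v;
}

/* Three-way comparison done directly on the stored bytes:
 * no allocation, no copy, no conversion to a text value. */
static int ascii4_cmp(const ascii4 *a, const ascii4 *b)
{
    return memcmp(a->data, b->data, ASCII4_LEN);
}
```

A comparison of this shape is all that sorting, btree indexing, and simple WHERE clauses would need; only operations such as LIKE or concatenation would go through the cast to text.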

I'm not personally as interested in the aforementioned ascii types as I 
am in the int1 and int3 types, but the argument in favor of each is 
about the same.  If a person has a large table made of small data, it 
seems really nuts to have 150% - 400% bloat on that table, when such a 
small amount of work is needed to write the contrib datatypes necessary 
to store the data compactly.  The argument made upthread that a 
quadratic number of conversion operators is necessitated doesn't seem 
right to me, given that each type could upcast to the canonical built-in
type.  (int1 => smallint, int3 => integer, ascii1 => text, ascii2 => 
text, ascii3 => text, etc.)  Operations on data of differing type can be 
done in the canonical type, but the common case for many users would be 
operations between data of the same type, for which no conversion is 
required.
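
The upcast idea can be sketched in plain C as follows. This is an illustration under assumptions, not an implementation: the int1 and int3 representations, the little-endian byte order chosen for int3, and every function name here are hypothetical. Each small type gets one widening function to its canonical type, same-type comparison needs no conversion, and a mixed comparison is done after widening both sides.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t int1;                          /* hypothetical 1-byte integer */
typedef struct { unsigned char b[3]; } int3;  /* hypothetical 3-byte integer */

/* Same-type comparison: works directly on the stored value. */
static int int1_cmp(int1 a, int1 b)
{
    return (a > b) - (a < b);
}

/* One upcast per type, to its canonical built-in type. */
static int16_t int1_to_int16(int1 v)
{
    return (int16_t) v;
}

/* Caller must ensure -8388608 <= v <= 8388607. */
static int3 int3_from_int32(int32_t v)
{
    uint32_t u = (uint32_t) v;
    int3 r;

    r.b[0] = (unsigned char) (u & 0xFFu);
    r.b[1] = (unsigned char) ((u >> 8) & 0xFFu);
    r.b[2] = (unsigned char) ((u >> 16) & 0xFFu);
    return r;
}

static int32_t int3_to_int32(int3 v)
{
    uint32_t u = (uint32_t) v.b[0]
               | ((uint32_t) v.b[1] << 8)
               | ((uint32_t) v.b[2] << 16);

    /* Sign-extend from bit 23 without relying on out-of-range casts. */
    return (u & 0x800000u) ? (int32_t) u - 0x1000000 : (int32_t) u;
}

static int int32_cmp(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}

/* Mixed-type comparison: widen both sides, compare in the canonical type. */
static int int1_int3_cmp(int1 a, int3 b)
{
    return int32_cmp((int32_t) int1_to_int16(a), int3_to_int32(b));
}
```

With N small types this needs N upcasts rather than a conversion for every pair of types, since any mixed pair meets in a built-in type.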

Am I missing something that would prevent this approach from working?  I
am seriously considering writing these contrib datatypes for use either
on pgfoundry or in the contrib/ subdirectory for the 8.3 release, but am
looking for advice in case I am really off-base.

Thanks,

mark


