Re: Fixed length data types issue - Mailing list pgsql-hackers

From: Mark Dilger
Subject: Re: Fixed length data types issue
Msg-id: 45098656.3050607@markdilger.com
In response to: Re: Fixed length data types issue (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers

My apologies if you are seeing this twice.  I posted it last night, but 
it still does not appear to have made it to the group.

Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql@markdilger.com> writes:
>>> Tom Lane wrote:
>>>> Please provide a stack trace --- AFAIK there shouldn't be any reason
>>>> why a pass-by-ref 3-byte type wouldn't work.
>>
>>> (gdb) bt
>>> #0  0xb7e01d45 in memcpy () from /lib/libc.so.6
>>> #1  0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, 
>>> values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", 
>>> infomask=0x83c2ef0, bit=0x0)
>>>      at heaptuple.c:181
>>
>> Hm, are you sure you provided a valid pointer (not the integer value
>> itself) as the Datum output from int3_in?
>>
>> (Looks at patch ... ) Um, I think you didn't, although that coding
>> is far too cute to be actually readable ...
>>
>>             regards, tom lane
> 
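Tom's question about returning a valid pointer rather than the integer value can be illustrated with a small standalone sketch. This is not the code from the patch: the Datum typedef below is a simplified stand-in, malloc stands in for PostgreSQL's palloc, and make_by_ref_datum and store_by_ref are hypothetical names chosen for illustration. The point is only that a by-reference store copies from whatever address the Datum holds, so a Datum carrying the raw integer would make memcpy read from a bogus address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a pointer-sized Datum. */
typedef uintptr_t Datum;

/*
 * Hypothetical input-side helper: copy the value into freshly allocated
 * storage and return the ADDRESS of that storage as the Datum.
 * (malloc is used here so the sketch runs outside the backend.)
 */
static Datum make_by_ref_datum(const unsigned char *bytes, size_t len)
{
    unsigned char *buf = malloc(len);

    assert(buf != NULL);
    memcpy(buf, bytes, len);
    return (Datum) buf;         /* a pointer, not the integer value */
}

/*
 * Hypothetical, simplified by-reference store: it treats the Datum as an
 * address and copies len bytes from it. If the Datum held the integer
 * value itself, this memcpy would dereference an invalid pointer.
 */
static void store_by_ref(unsigned char *dest, Datum d, size_t len)
{
    memcpy(dest, (const void *) d, len);
}
```

Under these assumptions, a 3-byte value passed through make_by_ref_datum and then store_by_ref arrives intact in the destination buffer, and the caller remains responsible for freeing the allocation.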
> Ok, I have it working on my Intel architecture machine.  Here are some 
> of my findings.  Disk usage is calculated by running 'du -b' in 
> /usr/local/pgsql/data before and after loading the table, and taking the 
> difference.  That directory is deleted, recreated, and initdb rerun 
> between each test.  The host system is a dual processor, dual core 2.4 
> GHz system, 2 GB DDR400 memory, 10,000 RPM SCSI ultra160 hard drive with 
> the default postgresql.conf file as created by initdb.  The code is the 
> stock postgresql-8.1.4 release tarball compiled with gcc and configured 
> without debug or cassert options enabled.
> 
> 
> INT3 VS INT4
> ------------
> Using a table of 8 integers per row and 16777216 rows, I can drop the 
> disk usage from 1.2 GB down to 1.0 GB by defining those integers as int3 
> rather than int4.  (It works out to about 70.5 bytes per row vs. 62.5 
> bytes per row.)  However, the load time actually increases, probably due 
> to CPU/memory usage.  The time increased from 197 seconds to 213 
> seconds.  Note that int3 is defined pass-by-reference due to a 
> limitation in the code that prevents pass-by-value for any datasize 
> other than 1, 2, or 4 bytes.
> 
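As a rough illustration of what a 3-byte integer involves, the following standalone sketch stores a value in three bytes and widens it back to 32 bits with sign extension. It is an assumption-laden example, not the patch's implementation: the names int3_pack and int3_unpack are hypothetical, and the least-significant-byte-first order is simply a choice made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_MIN (-8388608)     /* -2^23 */
#define INT3_MAX 8388607        /*  2^23 - 1 */

/* Hypothetical helper: store value in 3 bytes, least significant first. */
static void int3_pack(int32_t value, unsigned char out[3])
{
    assert(value >= INT3_MIN && value <= INT3_MAX);

    uint32_t u = (uint32_t) value;

    out[0] = (unsigned char) (u & 0xFF);
    out[1] = (unsigned char) ((u >> 8) & 0xFF);
    out[2] = (unsigned char) ((u >> 16) & 0xFF);
}

/* Hypothetical helper: rebuild the 32-bit value, extending the sign bit. */
static int32_t int3_unpack(const unsigned char in[3])
{
    int32_t v = (int32_t) ((uint32_t) in[0]
                           | ((uint32_t) in[1] << 8)
                           | ((uint32_t) in[2] << 16));

    /* Bit 23 is the sign bit of a 24-bit two's-complement value. */
    if (v & 0x800000)
        v -= 0x1000000;
    return v;
}
```

The extra masking and shifting on every read and write is one plausible contributor to the CPU cost mentioned above, although the sketch makes no claim about where the measured 16 seconds actually went.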
> Using a table of only one integer per row, the table size is exactly the 
> same (down to the byte) whether I use int3 or int4.  I suspect this is 
> due to data alignment for the row being on at least a 4 byte boundary.
> 
> Creating an index on a single column of the 8-integer-per-row table, the 
> index size is exactly the same whether the integers are int3 or int4. 
> Once again, I suspect that data alignment is eliminating the space savings.
> 
> I haven't tested this, but I suspect that if the column following an 
> int3 is aligned on a 4 or 8 byte boundary, the int3 column will be 
> padded with an extra byte and hence will save no space.
> 
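The alignment effects suspected above can be modelled with a short standalone sketch. This is a simplified model rather than PostgreSQL's tuple-forming code: it ignores the tuple header and page overhead, assumes int3 has 1-byte alignment (the patch's actual alignment is not stated in this thread), and uses the hypothetical names align_up and row_data_size.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of alignment. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

/*
 * Hypothetical model of the data area of one row: each column starts at
 * an offset aligned for its type, and the row as a whole is padded out
 * to row_align bytes.
 */
static size_t row_data_size(const size_t *lens, const size_t *aligns,
                            size_t ncols, size_t row_align)
{
    size_t off = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        off = align_up(off, aligns[i]);   /* padding before the column */
        off += lens[i];                   /* the column's own bytes */
    }
    return align_up(off, row_align);      /* padding at the end of the row */
}
```

With a 4-byte row alignment, this model gives 4 bytes for a single int3 column and 4 bytes for a single int4 column, which matches the identical table sizes reported for the one-column case. Eight int3 columns come to 24 bytes against 32 for eight int4 columns, an 8-byte difference that agrees with the reported 70.5 vs. 62.5 bytes per row. An int3 followed by a 4-byte-aligned column occupies 8 bytes, the same as two int4 columns.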
> 
> INT1 VS INT2
> ------------
> Once again using a table of 8 integers per row and 16777216 rows, I can 
> drop the disk usage from 909 MB down to 774 MB by defining those 
> integers as int1 rather than int2.  (54 bytes per row vs 46 bytes per 
> row.)  The load time also drops, from 179 seconds to 159 seconds.  Note 
> that int1 is defined pass-by-value.
> 
> 
> mark
