Re: Fixed length data types issue - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: Fixed length data types issue |
Date | |
Msg-id | 45098656.3050607@markdilger.com |
In response to | Re: Fixed length data types issue (Bruce Momjian <bruce@momjian.us>) |
List | pgsql-hackers |

My apologies if you are seeing this twice. I posted it last night, but it
still does not appear to have made it to the group.

Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql@markdilger.com> writes:
>>> Tom Lane wrote:
>>>> Please provide a stack trace --- AFAIK there shouldn't be any reason
>>>> why a pass-by-ref 3-byte type wouldn't work.
>>
>>> (gdb) bt
>>> #0 0xb7e01d45 in memcpy () from /lib/libc.so.6
>>> #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7,
>>>    values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "",
>>>    infomask=0x83c2ef0, bit=0x0)
>>>    at heaptuple.c:181
>>
>> Hm, are you sure you provided a valid pointer (not the integer value
>> itself) as the Datum output from int3_in?
>>
>> (Looks at patch ... ) Um, I think you didn't, although that coding
>> is far too cute to be actually readable ...
>>
>> regards, tom lane
>
> Ok, I have it working on my intel architecture machine. Here are some
> of my findings. Disk usage is calculated by running 'du -b' in
> /usr/local/pgsql/data before and after loading the table, and taking the
> difference. That directory is deleted, recreated, and initdb rerun
> between each test. The host system is a dual processor, dual core 2.4
> GHz system, 2 GB DDR400 memory, 10,000 RPM SCSI ultra160 hard drive with
> the default postgresql.conf file as created by initdb. The code is the
> stock postgresql-8.1.4 release tarball compiled with gcc and configured
> without debug or cassert options enabled.
>
>
> INT3 VS INT4
> ------------
> Using a table of 8 integers per row and 16777216 rows, I can drop the
> disk usage from 1.2 GB down to 1.0 GB by defining those integers as int3
> rather than int4. (It works out to about 70.5 bytes per row vs. 62.5
> bytes per row.) However, the load time actually increases, probably due
> to CPU/memory usage. The time increased from 197 seconds to 213
> seconds. Note that int3 is defined pass-by-reference due to a
> limitation in the code that prevents pass-by-value for any datasize
> other than 1, 2, or 4 bytes.
>
> Using a table of only one integer per row, the table size is exactly the
> same (down to the byte) whether I use int3 or int4. I suspect this is
> due to data alignment for the row being on at least a 4 byte boundary.
>
> Creating an index on a single column of the 8-integer-per-row table, the
> index size is exactly the same whether the integers are int3 or int4.
> Once again, I suspect that data alignment is eliminating the space savings.
>
> I haven't tested this, but I suspect that if the column following an
> int3 is aligned on 4 or 8 byte boundaries, that the int3 column will
> have an extra byte padded and hence will have no performance gain.
>
>
> INT1 VS INT2
> ------------
> Once again using a table of 8 integers per row and 16777216 rows, I can
> drop the disk usage from 909 MB down to 774 MB by defining those
> integers as int1 rather than int2. (54 bytes per row vs 46 bytes per
> row.) The load time also drops, from 179 seconds to 159 seconds. Note
> that int1 is defined pass-by-value.
>
>
> mark
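
The per-row figures and the alignment suspicions in the message can be checked with a small model. What follows is only a minimal sketch in Python, not PostgreSQL's actual tuple-layout code. It rests on several assumptions that the message does not confirm: int3 and int1 are byte-aligned, int2 is 2-byte aligned, int4 is 4-byte aligned, each row's column data is padded to a 4-byte boundary, and the fixed per-row overhead is a multiple of that boundary, so it cancels out when two layouts are compared. The function names are hypothetical.

```python
ROWS = 16_777_216  # row count used in both tests in the message


def bytes_per_row(total_bytes, rows=ROWS):
    """Average on-disk bytes per row for a measured disk-usage difference."""
    return total_bytes / rows


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def row_data_size(widths, alignments, row_alignment=4):
    """Column-data size of one row under the assumed alignment rules.

    Each column starts at the next offset that satisfies its own
    alignment, and the row total is padded to row_alignment (assumed).
    The per-row header is deliberately left out of the model.
    """
    offset = 0
    for width, alignment in zip(widths, alignments):
        offset = align_up(offset, alignment) + width
    return align_up(offset, row_alignment)


# Eight columns per row: one byte saved per column, eight bytes per row.
int4_row = row_data_size([4] * 8, [4] * 8)   # 32
int3_row = row_data_size([3] * 8, [1] * 8)   # 24
int2_row = row_data_size([2] * 8, [2] * 8)   # 16
int1_row = row_data_size([1] * 8, [1] * 8)   # 8

# One column per row: row padding absorbs the byte that int3 saves.
single_int4 = row_data_size([4], [4])        # 4
single_int3 = row_data_size([3], [1])        # 4

# An int3 followed by a 4-byte-aligned column gains nothing either.
int3_then_int4 = row_data_size([3, 4], [1, 4])  # 8
int4_then_int4 = row_data_size([4, 4], [4, 4])  # 8
```

Under these assumptions the model gives an 8-byte difference per row in both eight-column tests, which matches the reported 70.5 vs. 62.5 and 54 vs. 46 bytes per row. It also gives identical sizes for the single-column table, which is consistent with the observation that the table size did not change there.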