Thread: Row Size and Indexing

Row Size and Indexing

From
"tschak"
Date:
Hi guys,

I have some questions about postgres and would be glad if you could
help me...
Here ist what I am planning to do:
Build a database with 1 Billion (10^9) rows describing geometric
properties:
create table steric_descriptor(
distance0 bit(4),
distance1 bit(4,
distance2 bit(4,
distance3 bit(4),
distance4 bit(4),
distance5 bit(4),
distance6 bit(4),
distance7 bit(4),
comp_desc BLOB);

Question Nr ONE --- Concerning row size
Is it correct that I"ll need:
27 Byte (rowdescriptor) +4Byte (OID) + 4 Byte (for the distances) +
BLOB_SIze (say 30 BYTE)
=> 65 Byte per Tupel dending on the storage space of the BLOB (in
BLOB_Space or the tupel itself)?
Is it possible to index the row if I disable the ROWID to save storage
space in order to get more tupels per page?

QUESTION NR. TWO --- Indexing
My query will look like this:
SELECT p from steric_descriptor WHERE
p.distance0 <= threshold0
AND ... ... ... ...
AND p.distance7 <= threshold7;
Acutally I was planning to use 8 distinct Bitmap-Indices with 15 Bits
(2Byte) each per row (16 Byte in total) but as far as I know postgres
does not offer a real disk-resident Bitmap-Index... Is that true? If
yes, I would have to use composite B*trees which do not perform well
for composite range queries. This leads to the new Bitmap-Scan of
PostgreSQL => 8 B*Trees with a 4Bit key and blocksize/rowsize +1
pointer (4Byte each). This means: approx. twice the size for the
B*Tree-Version just to use the same functionality as a real
Bitmap-Index would offer...
Is there a way to avoid this or are my calculations somewhat wrong
(maybe even totally)?

I would appreciate it if someone could verify my thoughts or give me a
hint how to proceed... I know that it will be quite a big database and
I am not even sure if PostgreSQL can handle such large tables
efficiently.

Question Nr. THREE:
If I do need OIDs for indexing and I have 10^9 Rows in my DB, do the
Indices need Oids as well... If yes I might get problems with the 4
Billion size limit of the Oids, is that correct? I don"t know how the
physical structure of an Index looks like, so I might be wrong on this
as well...


Thanks in advance,

Chuck


Re: Row Size and Indexing

From
Martijn van Oosterhout
Date:
On Fri, Jan 13, 2006 at 06:48:28AM -0800, tschak wrote:
> Hi guys,
>
> I have some questions about postgres and would be glad if you could
> help me...
> Here ist what I am planning to do:
> Build a database with 1 Billion (10^9) rows describing geometric
> properties:

<snip>

> Question Nr ONE --- Concerning row size
> Is it correct that I"ll need:
> 27 Byte (rowdescriptor) +4Byte (OID) + 4 Byte (for the distances) +
> BLOB_SIze (say 30 BYTE)
> => 65 Byte per Tupel dending on the storage space of the BLOB (in
> BLOB_Space or the tupel itself)?
> Is it possible to index the row if I disable the ROWID to save storage
> space in order to get more tupels per page?

If you don't need an OID, don't include it. IIRC, bit(4) will be 8
bytes due to being variable length. If you want to store small numbers,
maybe smallint (2 bytes) is more useful.

You can index anything and everything. PostgreSQL doesn't have a
ROWID so I'm not sure what you're referring to here.

> QUESTION NR. TWO --- Indexing
> My query will look like this:
> SELECT p from steric_descriptor WHERE
> p.distance0 <= threshold0
> AND ... ... ... ...
> AND p.distance7 <= threshold7;
> Acutally I was planning to use 8 distinct Bitmap-Indices with 15 Bits
> (2Byte) each per row (16 Byte in total) but as far as I know postgres
> does not offer a real disk-resident Bitmap-Index... Is that true? If
> yes, I would have to use composite B*trees which do not perform well
> for composite range queries. This leads to the new Bitmap-Scan of
> PostgreSQL => 8 B*Trees with a 4Bit key and blocksize/rowsize +1
> pointer (4Byte each). This means: approx. twice the size for the
> B*Tree-Version just to use the same functionality as a real
> Bitmap-Index would offer...
> Is there a way to avoid this or are my calculations somewhat wrong
> (maybe even totally)?

Well, PostgreSQL allows you to create your own index types if you don't
like b-trees. However, b-tree should be able to do what you want. I
have no idea what a "real disk-resident Bitmap-Index" would look like
so I can't comment on that.

BTW, you using <= on a bit type, which seems wierd to me. Shouldn't
they be numbers?

> Question Nr. THREE:
> If I do need OIDs for indexing and I have 10^9 Rows in my DB, do the
> Indices need Oids as well... If yes I might get problems with the 4
> Billion size limit of the Oids, is that correct? I don"t know how the
> physical structure of an Index looks like, so I might be wrong on this
> as well...

If you can't think of a reason why you need OIDs, don't use them. The
latest release of PostgreSQL phases them out for user tables anyway
since they're not actually useful 99% of the time.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

this is what i meant

From
"tschak"
Date:
> If you don't need an OID, don't include it. IIRC, bit(4) will be 8
> bytes due to being variable length. If you want to store small numbers,
> maybe smallint (2 bytes) is more useful.

What I acually meant was the OID... Is smallint(2) the smalles size I
can allocate?  If I cannot allocate 4Bit or less directly this implies
8 * 2Byte in comparison to just 8 *4 Bit space requirements which is
HUGE difference in my scenario! So the question is:
Can I explicitly store 4 Bit (FIXED lenght) in a column without
"wasting" space?

> You can index anything and everything. PostgreSQL doesn't have a
> ROWID so I'm not sure what you're referring to here.
I am currently thinking about the database size. Therefore I nee the
exact rowsize and the size of the index to design my database. Even if
postgres does not use explicit rowids, there has to be some sort of
reference for each key in the index to the row it refers to. In Oracle
this is done using a rowid and I need the size for such a reference in
Postgres.

> Well, PostgreSQL allows you to create your own index types if you don't
> like b-trees. However, b-tree should be able to do what you want. I
> have no idea what a "real disk-resident Bitmap-Index" would look like
> so I can't comment on that.

A disc resident BitIndex (as Oracle offers it) is stored in the
dababase, which means, that you need as many bits for an index entry as
there are different attribute values for the indexed column (in my case
15). If the attribute value in column j is set to i, then the ith bit
is set to one and all the other ones are zero (actually these are ALL
possible Bitvectors --- Postgres produces one of those "columns"
dependend on the query). In my case this would save storage space for
the index, because I would only need 2 Bytes for each index entry. In
case of a BTree I need space for the key (4Bit) AND for the pointer(s)
(4 Byte) in the index' tree-structure.

> BTW, you using <= on a bit type, which seems wierd to me. Shouldn't
> they be numbers?
Actually the bittypes just represent hashbins coding real distances of
a distance descriptor (80 different distances) from molecular biology.
Once again this is done to reduce the space complexity (from float down
to 4 Bit --- which still represent the situation detailed enough). This
needs to be done because of the massive amount of data.


> If you can't think of a reason why you need OIDs, don't use them. The
> latest release of PostgreSQL phases them out for user tables anyway
> since they're not actually useful 99% of the time.

Alright... so I do not need the OIDs! But as for as I know I need OIDs
if I want to use a BLOB for the complete information, which I would
like to do once again to reduce the size of each row in the table (from
80 *4 Bits down to just 1 Pointer to the BLOB).

Thanks for any suggestions and/or further information


Re: this is what i meant

From
Tom Lane
Date:
"tschak" <jochen.schlosser@gmail.com> writes:
> Can I explicitly store 4 Bit (FIXED lenght) in a column without
> "wasting" space?

No.  There is no provision in Postgres for data types occupying less
than a byte.  You'll need to think about merging those columns together.

> Alright... so I do not need the OIDs! But as for as I know I need OIDs
> if I want to use a BLOB for the complete information, which I would
> like to do once again to reduce the size of each row in the table (from
> 80 *4 Bits down to just 1 Pointer to the BLOB).

You are confusing OIDs with BLOBs ... they are not the same thing.
There are a couple different ways of handling BLOBs in Postgres: bytea
values and "large objects".  For the most part I'd suggest bytea, though
if you need cheap access to sub-sections of a BLOB, large objects would
be the way to go.

            regards, tom lane