Home > mailing lists

Re: BTree index - Mailing list pgsql-hackers

From	Alvaro Herrera
Subject	Re: BTree index
Date	November 9, 2003 20:58:44
Msg-id	20031109215728.GA29601@dcc.uchile.cl Whole thread Raw
In response to	BTree index (vjanand@uwm.edu)
List	pgsql-hackers

Tree view

On Wed, Nov 05, 2003 at 09:08:31AM -0600, vjanand@uwm.edu wrote:

> I am trying to find information regarding creation of B-tree index in postgres 
> for variable length character data (Char/varchar type). Specifically, what 
> pagination policy is used, does it use prefix BTree, or any other form of 
> compression (encoding)? 

I was very surprised while writing this answer:

The whole key is stored (no prefix, no pagination).  If the key is too
big, it won't fit into the index and the insertion will be rejected:

regression=# create table test (a text);
CREATE TABLE
regression=# create index test_idx on test(a);
CREATE INDEX
regression=# insert into test values ('hello world');
INSERT 17115 1
regression=# insert into test select repeat(a,10) from test;
INSERT 17116 1
regression=# insert into test select repeat(a,10) from test;
INSERT 0 2
regression=# insert into test select repeat(a,10) from test;
INSERT 0 4
regression=# insert into test select repeat(a,10) from test;
INSERT 0 8
regression=# insert into test select repeat(a,10) from test;
ERROR:  fila de índice requiere 12624 bytes, tamaño máximo es 8191
-- oops
regression=# set lc_messages to 'C';
SET
regression=# insert into test select repeat(a,10) from test;
ERROR:  index row requires 12624 bytes, maximum size is 8191

So, what size were the tuples inserted:
regression=# select max(length(a)) from test; max   
--------110000
(1 fila)

What!? 110000 bytes?  I have always had the idea that the tuples were
uncompressed, so how can 110000 bytes be stored in 8191 bytes?  After
tracking into the sourcecode, I found that in
src/backend/access/common/indextuple.c the index_formtuple routine seems
to compress the key before insertion.  In src/include/tuptoaster.h there
is a symbol for activation of this feature that is set at least on my
sources (TOAST_INDEX_HACK).

So, there you are: the compression used is the same "lousy fast" LZ
algorithm used elsewhere in the TOAST code (toast_compress_datum()).

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
FOO MANE PADME HUM

pgsql-hackers by date:

From: Hannu Krosing
Date: 09 November 2003, 20:57:25
Subject: Re: UPPER()/LOWER() and UTF-8

From: Larry Rosenman
Date: 09 November 2003, 21:19:19
Subject: Re: OSR5: Passes without the setsockopt() calls...

Re: BTree index - Mailing list pgsql-hackers

Previous

Next