Re: why is gist index taking so much space on the disc - Mailing list pgsql-hackers

From Kevin McArthur
Subject Re: why is gist index taking so much space on the disc
Date
Msg-id 002701c5eece$fa2ec550$0701a8c0@kdesktop
Whole thread Raw
In response to why is gist index taking so much space on the disc  (Grzegorz Jaskiewicz <gj@pointblue.com.pl>)
List pgsql-hackers
Take the query.

select a,b from dupa where b::text in (select b::text from dupa group by
b::text having count(b) > 2);

This is acceptable to create a unique constraint, however, we cannot mark
the column unique, without defining btree operators, which clearly are not
possible for sorting. Is there any way to base the operators based on the
text representation of the type for strict equality (not to be confused with
same or equivilent) and thus use that not as an ordering method, but as a
simple equality for uniqueness.

Kevin McArthur

----- Original Message ----- 
From: "Grzegorz Jaskiewicz" <gj@pointblue.com.pl>
To: <pgsql-hackers@postgresql.org>
Sent: Monday, November 21, 2005 7:58 AM
Subject: [HACKERS] why is gist index taking so much space on the disc


> Hi folks
>
> my conquers with Gist index for custom type are nearly finished. It  is 
> working as it is now, but there are few problems here and there.
> One of em, being amount of disc space index it self takes. The type 
> stucture it self takes 160bytes. Adding 100.000 rows into table -  CREATE 
> TABLE blah (a serial, b customType);
> with my gist index takes around 2GB on disc ! 100.000 is a large  number, 
> but the purpose of having gist in first place is defeated if  that machine 
> can't handle fast I/O or has at least 3GB of ram, first  to hold index in 
> cache, secondly to operate postgres caching (shared  memory).
> Is it normal that index is so hudge ? Even tho my type has built in  masks 
> (element that can match few different values), and %. up front  the string 
> (which behaves just like the sql % in b ~ '%.something').  And both are 
> used to build "unions" for pick-split, and other  operations. Is it 
> because of pick-split it self ? It does good work  in splitting up table 
> of elements into two separate ones, by sorting  them first, than creating 
> common "mask" for L and P. And by scanning  whole table again, and putting 
> elements matching into L or P. L and P  elements sometimes overlap, but so 
> far I can't find better solution.  Having to iterate 10 or 20 times using 
> k-means (the type holds tree a  like structure) isn't going to boost 
> efficiency either.
> This index works, and it is very fast, but still large.
>
> So final question, what should I do to make that index much smaller  on 
> the disc.
>
> -- 
> GJ
>
> "If we knew what we were doing, it wouldn't be called Research, would 
> it?" - AE
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
> 



pgsql-hackers by date:

Previous
From: Grzegorz Jaskiewicz
Date:
Subject: Re: why is gist index taking so much space on the disc
Next
From: Marc Munro
Date:
Subject: Re: [pgsql-hackers] Daily digest v1.5568 (24 messages)