> I have somewhere around 3M rows in the image table, and 37K rows in the
> ancestry table. The following is representative of some of the common
> queries I issue:
>
> select * from image natural join ancestry where ancestorid=1000000 and
> (state & 7::bigint) = 0::bigint;
>
> When I ask postgres to EXPLAIN it, I get the following:
>
> Merge Join (cost=81858.22..81900.60 rows=124 width=49)
> -> Sort (cost=81693.15..81693.15 rows=16288 width=41)
> -> Seq Scan on image (cost=0.00..80279.17 rows=16288 width=41)
> -> Sort (cost=165.06..165.06 rows=45 width=8)
> -> Index Scan using ancestry_ancestorid_key on ancestry
> (cost=0.00..163.83 rows=45 width=8)
>
> It appears to me that the query executes as follows:
>
> 1. Scan every row in the image table to find those where (state &
> 7::bigint) = 0::bigint
> 2. Sort the results
> 3. Use an index on ancestry to find rows where ancestorid=1000000
> 4. Sort the results
> 5. Join the two
FWIW, I use INTs as bit vectors for options in various applications
and have run into this in a few cases. In the database, I only care
about a few bits in the options INT, so what I did was create a
function for each of the bits that I care about and then a function
index. Between the two, I've managed to solve my performance
problems.
CREATE FUNCTION app_option_foo_is_set(INT)
RETURNS BOOL
IMMUTABLE
AS '
BEGIN
IF $1 & 7::INT THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;
' LANGUAGE 'plpgsql';
CREATE INDEX app_option_foo_fidx ON app_option_tbl (app_option_foo_is_set(options));
VACUUM ANALYZE;
Just make sure that you set your function to be IMMUTABLE. -sc
PS It'd be slick if PostgreSQL would collapse adjacent booleans into a
bit in a byte: it'd save some apps a chunk of space. 32 options ==
32 bytes with the type BOOL, but if adjacent BOOLs were collapsed,
it'd only be 4 bytes on disk and maybe some page header data.
--
Sean Chittenden