explicit casting required for index use - Mailing list pgsql-performance
From | Reece Hart |
---|---|
Subject | explicit casting required for index use |
Date | |
Msg-id | 1067104140.16297.125.camel@tallac Whole thread Raw |
Responses |
Re: explicit casting required for index use
|
List | pgsql-performance |
Here's the basic issue: PostgreSQL doesn't use indexes unless a query criterion is of exactly the same type as the index type. This occurs even when a cast would enable the use of an index and greatly improve performance. I understand that casting is needed to use an index and will therefore affect performance -- the part I don't understand is why postgresql doesn't automatically cast query arguments to the column type, thereby enabling indexes on that column.
I have a table that looks like this (extra cols, indexes, and fk constraints removed):
Without an explicit cast of the svm criterion:
Note two things above: 1) The explicit cast greatly reduces the predicted (and actual) cost. 2) The uncasted query eventually casts svm to double precision, which seems odd since the column itself is real (that is, it eventually does cast, but to the "wrong" type).
For small queries (returning ~10 rows), this is worth 100x in speed (9ms v. 990ms... in absolute terms, no big deal). For larger result sets (~200 rows), I've seen more like 1000x speed increases by using an explicit cast. For the larger queries, this can mean seconds versus many minutes.
Having to explicitly cast criterion is very non-intuitive. Moreover, it seems quite straightforward that PostgreSQL might incorporate casts (and perhaps even function calls like upper() for functional indexes) into its query strategy optimization. (I suppose functional indexes would apply only to immutable fx only, but that's fine.)
Thanks,
Reece
I have a table that looks like this (extra cols, indexes, and fk constraints removed):
I often search for pseq_ids based on all of pmodel_id, run_id, and svm threshold as below, hence the multi-column index.unison@csb=# \d paprospect2 Table "unison.paprospect2" Column | Type | Modifiers -------------+---------+-------------------------------------------------------------------pseq_id | integer | not nullrun_id | integer | not nullpmodel_id | integer | not nullsvm | real | Indexes: paprospect2_search1 btree (pmodel_id, run_id, svm),
Without an explicit cast of the svm criterion:
And with an explicit cast to real (the same as the column type and indexed type):unison@csb=# explain select pseq_id from paprospect2 where pmodel_id=8210 and run_id=1 and svm>=11;Index Scan using paprospect2_search2 on paprospect2 (cost=0.00..43268.93 rows=2 width=4) Index Cond: ((pmodel_id = 8210) AND (run_id = 1)) Filter: (svm >= 11::double precision)
unison@csb=# explain select pseq_id from paprospect2 where pmodel_id=8210 and run_id=1 and svm>=11::real;Index Scan using paprospect2_search1 on paprospect2 (cost=0.00..6.34 rows=2 width=4) Index Cond: ((pmodel_id = 8210) AND (run_id = 1) AND (svm >= 11::real))
Note two things above: 1) The explicit cast greatly reduces the predicted (and actual) cost. 2) The uncasted query eventually casts svm to double precision, which seems odd since the column itself is real (that is, it eventually does cast, but to the "wrong" type).
For small queries (returning ~10 rows), this is worth 100x in speed (9ms v. 990ms... in absolute terms, no big deal). For larger result sets (~200 rows), I've seen more like 1000x speed increases by using an explicit cast. For the larger queries, this can mean seconds versus many minutes.
Having to explicitly cast criterion is very non-intuitive. Moreover, it seems quite straightforward that PostgreSQL might incorporate casts (and perhaps even function calls like upper() for functional indexes) into its query strategy optimization. (I suppose functional indexes would apply only to immutable fx only, but that's fine.)
Thanks,
Reece
-- Reece Hart, Ph.D. rkh@gene.com, http://www.gene.com/ Genentech, Inc. 650/225-6133 (voice), -5389 (fax) Bioinformatics and Protein Engineering 1 DNA Way, MS-93 http://www.in-machina.com/~reece/ South San Francisco, CA 94080-4990 reece@in-machina.com, GPG: 0x25EC91A0 |
pgsql-performance by date: