Re: Can I do better than this heapscan and sort? - Mailing list pgsql-performance
From | Merlin Moncure |
---|---|
Subject | Re: Can I do better than this heapscan and sort? |
Date | |
Msg-id | CAHyXU0zVGH_MsXjJu5-++V0ZYQkEBju1JZY41nqEVddrtUWoYw@mail.gmail.com Whole thread Raw |
In response to | Can I do better than this heapscan and sort? (Andy Halsall <halsall_andy@hotmail.com>) |
Responses |
Re: Can I do better than this heapscan and sort?
|
List | pgsql-performance |
On Thu, Jun 21, 2012 at 3:07 PM, Andy Halsall <halsall_andy@hotmail.com> wrote: > I have two tables node and relationship. Each relationship record connects > two nodes and has an application keys (unfortunately named) that can be used > by the application to look-up a relationship and get from one node to the > other. > > My query uses a node id and a description of a relationship from the node, > and must find the "next" relationship that the node has. It does this by > finding all the relationships that could be "next", ordering them and then > getting the first. > > Details are below but I end up with 6896 candidates for "next". > > If I'm reading the output correctly it takes 13.509 ms to apply the filter > and another 7 ms or so to do the sort of the remaining 6896 nodes. > > Have tried many index combinations to try and improve the results. I suspect > that with so many nodes to sort, postgresql will opt for heap scan rather > than index. But why does it not use the IDX_order_sort_down_2 index for the > sort? > > Thanks, > Andy > > > Details.......... > > Version > ------- > PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.5.2, > 64-bit > > Tables > ------ > CREATE TABLE node ( > node_id bigint NOT NULL, > node_type int4 NOT NULL, > c_state int4 NOT NULL, > d_state int4 NOT NULL, > sort_key bigint NOT NULL, > permissions bytea NOT NULL, > audit bytea NOT NULL, > pkg_id bytea NULL, > created timestamp NOT NULL > ); > > CREATE TABLE relationship ( > rel_id bigint NOT NULL, > rel_type integer NOT NULL, > s_t_n bigint NOT NULL, > t_s_n bigint NOT NULL, > state integer NOT NULL, > control integer NOT NULL, > sort_key bigint NOT NULL, > prime_key bytea NULL, > prime_key_len integer NOT NULL, > sec_key bytea NULL, > sec_key_len integer NOT NULL, > up_sort_key bigint NOT NULL, > up_prime_key bytea NULL, > up_prime_key_len integer NOT NULL, > up_sec_key bytea NULL, > up_sec_key_len integer NOT NULL, > permissions bytea NOT NULL, > t_s_n_type integer NOT NULL, > created timestamp NOT NULL > ); > > Constraints > ----------- > -- Primary keys > ALTER TABLE node ADD CONSTRAINT PK_node PRIMARY KEY (node_id); > > ALTER TABLE relationship ADD CONSTRAINT PK_relationship PRIMARY KEY > (rel_id); > > -- Foreign keys > ALTER TABLE relationship ADD CONSTRAINT FK_node_s FOREIGN KEY (s_t_n) > REFERENCES node (node_id); > > ALTER TABLE relationship ADD CONSTRAINT FK_node_n FOREIGN KEY (t_s_n) > REFERENCES node (node_id); > > > Indexes > ------- > CREATE INDEX IDX_node_type ON node (node_type ASC) TABLESPACE ds_appex_ts_10 > ; > CREATE INDEX IDX_node_sort_key ON node (sort_key ASC) TABLESPACE > ds_appex_ts_10 > ; > CREATE INDEX IDX_relationship_s_t_n ON relationship (s_t_n ASC) TABLESPACE > ds_appex_ts_10 > ; > CREATE INDEX IDX_relationship_t_s_n ON relationship (t_s_n ASC) TABLESPACE > ds_appex_ts_10 > ; > CREATE INDEX IDX_relationship_type ON relationship (rel_type ASC) TABLESPACE > ds_appex_ts_10 > ; > CREATE INDEX IDX_relationship_prime_key ON relationship (prime_key ASC) > TABLESPACE ds_appex_ts_10 > ; > CREATE INDEX IDX_relationship_u_prime_key ON relationship (up_prime_key ASC) > TABLESPACE ds_appex_ts_10 > ; > CREATE INDEX IDX_relationship_sec_key ON relationship (sec_key ASC) > TABLESPACE ds_appex_ts_10 > ; > CREATE INDEX IDX_order_first ON node(sort_key DESC, node_id DESC) TABLESPACE > ds_appex_ts_10 > ; > CREATE INDEX IDX_order_sort_down_1 ON relationship(sort_key DESC, prime_key > ASC NULLS FIRST, sec_key ASC NULLS FIRST) TABLESPACE ds_appex_ts_10 > ; > CREATE INDEX IDX_order_sort_down_2 ON relationship(sort_key DESC, prime_key > ASC NULLS FIRST, sec_key DESC NULLS FIRST) TABLESPACE ds_appex_ts_10 > ; > CREATE INDEX IDX_order_sort_up ON relationship(up_sort_key DESC, > up_prime_key ASC NULLS FIRST, up_sec_key ASC NULLS FIRST) TABLESPACE > ds_appex_ts_10 > ; > > Query > ----- > CREATE OR REPLACE FUNCTION sp_get_rel_sort_dup_sec_desc(in_rel_type1 > integer, in_rel_type2 integer, in_node_type integer, in_own_guid bigint, > in_prev_prime_key bytea, in_prev_prime_key_len integer, in_prev_sec_key > bytea, in_prev_sec_key_len integer, in_prev_sort_key bigint, in_ctrl > integer) RETURNS select_rel_holder AS > ' > declare > h select_rel_holder%rowtype; > > begin > SELECT INTO h r.rel_id, r.t_s_n, r.rel_type, r.sort_key, > r.state,r.permissions, r.control, > r.prime_key, r.prime_key_len, r.sec_key, > r.sec_key_len, > r.up_prime_key, r.up_prime_key_len, r.up_sec_key, > r.up_sec_key_len > FROM relationship r > WHERE r.s_t_n = in_own_guid AND (r.rel_type = in_rel_type1 OR > r.rel_type = in_rel_type2) > AND > ( > ( > ( > r.prime_key > in_prev_prime_key > OR > ( r.prime_key = in_prev_prime_key AND r.sec_key < > in_prev_sec_key) > ) > AND > r.sort_key = in_prev_sort_key > ) > > OR > r.sort_key < in_prev_sort_key > ) > AND t_s_n_type = in_node_type > AND r.control >= in_ctrl > ORDER BY sort_key DESC, prime_key ASC NULLS FIRST, sec_key DESC > NULLS FIRST LIMIT 1; > RETURN h; > end > ' > language 'plpgsql' STABLE; > > > EXPLAIN ANALYZE output > ------------------------------- > Limit (cost=48.90..48.90 rows=1 width=89) (actual > time=21.480..21.480 rows=1 loops=1) > Output: rel_id, t_s_n, rel_type, sort_key, state, permissions, > control, prime_key, prime_key_len, sec_key, sec_key_len, up_prime_key, > up_prime_key_l > en, up_sec_key, up_sec_key_len > > -> Sort (cost=48.90..48.90 rows=1 width=89) (actual > time=21.479..21.479 rows=1 loops=1) > Output: rel_id, t_s_n, rel_type, sort_key, state, > permissions, control, prime_key, prime_key_len, sec_key, sec_key_len, > up_prime_key, up_prime > _key_len, up_sec_key, up_sec_key_len > Sort Key: r.sort_key, r.prime_key, r.sec_key > Sort Method: top-N heapsort Memory: 25kB > > -> Bitmap Heap Scan on public.relationship r > (cost=3.39..48.89 rows=1 width=89) (actual time=1.034..13.509 rows=6986 > loops=1) > Output: rel_id, t_s_n, rel_type, sort_key, state, > permissions, control, prime_key, prime_key_len, sec_key, sec_key_len, > up_prime_key, up > _prime_key_len, up_sec_key, up_sec_key_len > Recheck Cond: (r.s_t_n = $4) > Filter: ((r.control >= $10) AND (r.t_s_n_type = $3) > AND ((r.rel_type = $1) OR (r.rel_type = $2)) AND ((((r.prime_key > $5) OR > ((r.prime_ > key = $5) AND (r.sec_key < $7))) AND (r.sort_key = $9)) OR (r.sort_key < > $9))) > > -> Bitmap Index Scan on idx_relationship_s_t_n > (cost=0.00..3.39 rows=18 width=0) (actual time=0.951..0.951 rows=6989 > loops=1) > Index Cond: (r.s_t_n = $4) Absolutely. You need to learn and master row-wise comparison. It was added for exactly this purpose :-). SELECT * FROM foo WHERE (a,b,c) > (a1,b1,c1) ORDER BY a,b,c LIMIT k; will be fully optimized if you have an index on a,b,c (a1,b1,c1 are the last ones you read off). Be advised that if there is not a lot of cardinality on 'a', you may need to disable certain index plans to get a good plan in some cases. merlin
pgsql-performance by date: