Thread: Weird performance issue with a custom function containing a FOR loop
Hello,

The following function takes forever to execute as is. I see 'Inserting original actions in temporary table' and nothing after that. If I replace orig_user_id in the FOR loop with 1811 (the same orig_user_id passed as the function parameter) it returns immediately and correctly (the table has indices, so it's very fast). I am using PostgreSQL 8.4.6 on Linux (amd64).

CREATE OR REPLACE FUNCTION validate_test_session(
    orig_user_id bigint,
    orig_start_date timestamp without time zone,
    orig_end_date timestamp without time zone)
  RETURNS boolean AS
$BODY$
DECLARE
    orig_action    RECORD;
    action_counter BIGINT;
    ignored_games  INTEGER[];
BEGIN
    ignored_games := ARRAY[1,2,7,10,17];

    /* Populate tables for test */
    /* Populate original session actions */
    RAISE NOTICE 'Inserting original actions in temporary table';
    action_counter := 0;

    FOR orig_action IN
        (SELECT game_tables.game_type_id, game_round_actions.table_id,
                game_round_actions.round_id, action_time, action_desc,
                action_area, amount, action_value, seat_id, action_id
           FROM game_round_actions
          INNER JOIN game_tables
             ON game_round_actions.table_id = game_tables.table_id
          WHERE game_round_actions.user_id = orig_user_id
            AND game_round_actions.sub_action_id = 0
            AND game_round_actions.action_time BETWEEN orig_start_date AND orig_end_date
            AND game_tables.game_type_id <> ANY(ignored_games)
          ORDER BY action_time, action_id, sub_action_id)
    LOOP
        RAISE NOTICE 'Found action %', action_counter;
        action_counter := action_counter + 1;
    END LOOP;

    RETURN TRUE;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;

-- 
Regards,

Nicos Panayides
IT Manager
Magneta Technologies Ltd
Tel: +357 22721919, 22317400
Fax: +357 22721917
Web: http://www.magneta.eu
Nicos Panayides <nicos@magneta.com.cy> writes:
> the following function takes forever to execute as is. I see 'Inserting
> original actions in temporary table' and nothing after that. If i
> replace orig_user_id in the FOR loop with 1811 (the same orig_user_id
> passed as the function parameter) it returns immediately correctly (the
> table has indices so it's very fast).

It seems likely that you're getting a different plan for the generic
case because that user id isn't representative of the overall average
for the column.  You could investigate by explaining a parameterized
query:

	PREPARE foo (bigint) AS
	  SELECT ... WHERE game_round_actions.user_id = $1 ... ;
	EXPLAIN EXECUTE foo(1811);

(To really see exactly what's happening, you'd probably need to
parameterize for each of the plpgsql variables used in the query;
I'm suspicious that the BETWEEN might be contributing to the
issue as well.)

Possibly increasing the stats target for the user id column would help,
but it's hard to be sure without knowing what its distribution is like.

			regards, tom lane
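Spelled out against the function's actual query, the fully parameterized version might look like this. This is a sketch: the select list is abbreviated to a single column, and the timestamp and array values passed to EXPLAIN EXECUTE are illustrative placeholders, not values from the report above.

```sql
-- Every plpgsql variable used by the query becomes a parameter, so the
-- resulting plan matches what the function actually gets.
PREPARE foo (bigint, timestamp, timestamp, integer[]) AS
    SELECT game_round_actions.action_id
      FROM game_round_actions
      JOIN game_tables ON game_round_actions.table_id = game_tables.table_id
     WHERE game_round_actions.user_id = $1
       AND game_round_actions.sub_action_id = 0
       AND game_round_actions.action_time BETWEEN $2 AND $3
       AND game_tables.game_type_id <> ANY ($4)
     ORDER BY action_time, action_id, sub_action_id;

-- Placeholder dates; substitute the real session window.
EXPLAIN EXECUTE foo(1811, '2010-01-01', '2010-02-01', ARRAY[1,2,7,10,17]);

DEALLOCATE foo;
```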
On 01/31/2011 08:48 PM, Tom Lane wrote:
> Nicos Panayides <nicos@magneta.com.cy> writes:
>> the following function takes forever to execute as is. [...]
>
> It seems likely that you're getting a different plan for the generic
> case because that user id isn't representative of the overall average
> for the column.  You could investigate by explaining a parameterized
> query:
>
> 	PREPARE foo (bigint) AS
> 	  SELECT ... WHERE game_round_actions.user_id = $1 ... ;
> 	EXPLAIN EXECUTE foo(1811);
>
> (To really see exactly what's happening, you'd probably need to
> parameterize for each of the plpgsql variables used in the query;
> I'm suspicious that the BETWEEN might be contributing to the
> issue as well.)
>
> Possibly increasing the stats target for the user id column would help,
> but it's hard to be sure without knowing what its distribution is like.
>
> 			regards, tom lane

I tried the prepared statement with both $1 and 1811 for user_id, and
here are the plans I got:

With user_id = $1:

Sort  (cost=51704688.71..51704689.50 rows=314 width=57)
  Sort Key: game_round_actions.action_time, game_round_actions.action_id
  ->  Nested Loop  (cost=0.00..51704675.69 rows=314 width=57)
        ->  Seq Scan on game_round_actions  (cost=0.00..51702078.26 rows=314 width=53)
              Filter: ((action_time >= $2) AND (action_time <= $3) AND (sub_action_id = 0) AND (user_id = $1))
        ->  Index Scan using "PK_game_table" on game_tables  (cost=0.00..8.26 rows=1 width=12)
              Index Cond: (game_tables.table_id = game_round_actions.table_id)
              Filter: (game_tables.game_type_id <> ANY ($4))

With user_id = 1811:

Sort  (cost=226660.58..226661.33 rows=300 width=57)
  Sort Key: game_round_actions.action_time, game_round_actions.action_id
  ->  Nested Loop  (cost=0.00..226648.24 rows=300 width=57)
        ->  Index Scan using i_session on game_round_actions  (cost=0.00..224166.97 rows=300 width=53)
              Index Cond: ((action_time >= $2) AND (action_time <= $3))
              Filter: (user_id = 1811)
        ->  Index Scan using "PK_game_table" on game_tables  (cost=0.00..8.26 rows=1 width=12)
              Index Cond: (game_tables.table_id = game_round_actions.table_id)
              Filter: (game_tables.game_type_id <> ANY ($4))

Here's the table definition:

CREATE TABLE game_round_actions
(
  table_id bigint NOT NULL,
  round_id integer NOT NULL,
  action_id integer NOT NULL,
  seat_id integer NOT NULL,
  action_desc character varying(20) NOT NULL,
  action_area character varying(100),
  amount numeric(16,6),
  action_value character varying(100),
  action_time timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP AT TIME ZONE 'UTC',
  user_id bigint,
  sub_action_id integer NOT NULL, -- Sub action id is 0 for the root actions. >0 for generated actions.
  CONSTRAINT "PK_game_round_actions" PRIMARY KEY (table_id, round_id, action_id, sub_action_id),
  CONSTRAINT fk_game_round_actions_round FOREIGN KEY (table_id, round_id)
      REFERENCES game_rounds (table_id, round_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_game_round_actions_user FOREIGN KEY (table_id, user_id)
      REFERENCES game_table_users (table_id, user_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX i_session
  ON game_round_actions
  USING btree
  (action_time)
  WHERE user_id <> 0 AND sub_action_id = 0;

The table contains 1,655,528,000 rows (estimated) and there are about
10,000 unique user_ids. The data spans about two years.

Shouldn't PostgreSQL realise that in both cases user_id is compared
against a constant value and choose the same plan? How do I increase
the stats target for the column?
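As to the last question: in 8.4 the per-column statistics target is raised with ALTER TABLE, followed by a fresh ANALYZE so the planner sees the new histogram. A sketch (the value 1000 is illustrative; the default default_statistics_target is 100):

```sql
-- Collect more detailed statistics for user_id (1000 is an illustrative
-- target), then re-analyze so the planner actually uses them.
ALTER TABLE game_round_actions ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE game_round_actions;
```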
Nicos Panayides <nicos@magneta.com.cy> writes:
> On 01/31/2011 08:48 PM, Tom Lane wrote:
>> It seems likely that you're getting a different plan for the generic
>> case because that user id isn't representative of the overall average
>> for the column.

> I tried the prepared statement with both $1 and 1811 for user_id and
> here's the plans I got:

[ bad ]
> ->  Seq Scan on game_round_actions  (cost=0.00..51702078.26 rows=314 width=53)
>       Filter: ((action_time >= $2) AND (action_time <= $3) AND (sub_action_id = 0) AND (user_id = $1))

[ good ]
> ->  Index Scan using i_session on game_round_actions  (cost=0.00..224166.97 rows=300 width=53)
>       Index Cond: ((action_time >= $2) AND (action_time <= $3))
>       Filter: (user_id = 1811)

So the question is why it won't use that index in the parameterized case ...

> CREATE INDEX i_session
>   ON game_round_actions
>   USING btree
>   (action_time)
>   WHERE user_id <> 0 AND sub_action_id = 0;

... and the answer is that it can't prove user_id <> 0 when it doesn't
know the value of the parameter equated to user_id, so it cannot build
a plan that relies on using that partial index.  (IOW, if it did use the
index, it would get the wrong answer if $1 happened to be zero.)

I don't know the reason you had for making the index partial in the
first place, but maybe you should reconsider that.  Another possibility
is to explicitly include "user_id <> 0" in the query conditions, if
you're certain that the passed-in value is never zero.

			regards, tom lane
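Applied to the function's query, the second suggestion amounts to restating the partial index's predicate verbatim, so the planner can prove the index applies even when the parameter value is unknown. A sketch of just the revised WHERE clause:

```sql
-- Repeating the index predicate (user_id <> 0) makes the partial index
-- provably applicable regardless of the parameter's runtime value.
-- Only valid if orig_user_id is never actually zero.
 WHERE game_round_actions.user_id = orig_user_id
   AND game_round_actions.user_id <> 0          -- matches i_session's predicate
   AND game_round_actions.sub_action_id = 0
   AND game_round_actions.action_time BETWEEN orig_start_date AND orig_end_date
   AND game_tables.game_type_id <> ANY (ignored_games)
```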
On 02/01/2011 06:52 PM, Tom Lane wrote:
> ... and the answer is that it can't prove user_id <> 0 when it doesn't
> know the value of the parameter equated to user_id, so it cannot build
> a plan that relies on using that partial index.  (IOW, if it did use the
> index, it would get the wrong answer if $1 happened to be zero.)
>
> I don't know the reason you had for making the index partial in the
> first place, but maybe you should reconsider that.  Another possibility
> is to explicitly include "user_id <> 0" in the query conditions, if
> you're certain that the passed-in value is never zero.
>
> 			regards, tom lane

I added an additional user_id <> 0 condition and performance is back to
normal. I didn't know that the database does not use the value of the
parameter to choose a plan; it makes sense now that you have explained
it. The reason we use a partial index is that the table is huge and we
are really only interested in those rows.
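Another workaround available on 8.4, where PL/pgSQL caches one generic plan per query for the life of the session, is to run the query as dynamic SQL: FOR ... IN EXECUTE ... USING is planned afresh with the actual values on every call, at the cost of per-call planning overhead. A sketch with the select list abbreviated:

```sql
-- Dynamic SQL is replanned on each execution, so the planner sees the
-- real user id and date range instead of opaque parameters.
FOR orig_action IN EXECUTE
    'SELECT game_round_actions.action_id
       FROM game_round_actions
       JOIN game_tables ON game_round_actions.table_id = game_tables.table_id
      WHERE game_round_actions.user_id = $1
        AND game_round_actions.sub_action_id = 0
        AND game_round_actions.action_time BETWEEN $2 AND $3
        AND game_tables.game_type_id <> ANY ($4)
      ORDER BY action_time, action_id, sub_action_id'
    USING orig_user_id, orig_start_date, orig_end_date, ignored_games
LOOP
    -- process orig_action as before
END LOOP;
```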
Actually, the condition should have used user_id IS NOT NULL instead of
<> 0. Unfortunately the index takes quite a few hours to build, so we
implemented the workaround you suggested for now.

If the condition were "user_id IS NOT NULL AND sub_action_id = 0", would
PostgreSQL always choose the index, since user_id = NULL can never be
true?

Thank you very much for the help!
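For whenever there is a maintenance window, a sketch of the revised index follows. CREATE INDEX CONCURRENTLY avoids locking out writes during the hours-long build (at the cost of a slower build); the name i_session2 is a placeholder. Since the equality operator is strict, a clause like user_id = $1 should let the planner prove user_id IS NOT NULL even for an unknown parameter, which is what makes this predicate friendlier than user_id <> 0.

```sql
-- Sketch: rebuild the partial index with the intended predicate.
-- CONCURRENTLY keeps the table writable during the build; i_session2
-- is a placeholder name.
CREATE INDEX CONCURRENTLY i_session2
    ON game_round_actions USING btree (action_time)
 WHERE user_id IS NOT NULL AND sub_action_id = 0;

-- Once the new index is valid, swap it in for the old one.
DROP INDEX i_session;
ALTER INDEX i_session2 RENAME TO i_session;
```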