Thread: Seqscan slowness and stored procedures
Hello, I have a SQL function (which I've pasted below) and while testing its code directly (outside a function), this is the "normal", default plan: http://explain.depesz.com/s/vfP (67 ms) and this is the plain with enable_seqscan turned off: http://explain.depesz.com/s/EFP (27 ms) Disabling seqscan results in almost 2.5x faster execution. However, when this code is wrapped in a function, the execution time is closer to the second case (which is great, I'm not complaining): edem=> explain analyze select * from document_content_top_voted(36); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Function Scan on document_content_top_voted (cost=0.25..10.25 rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1) Total runtime: 21.236 ms (2 rows) I assume that the difference between the function execution time and the direct plan with seqscan disabled is due to SQL parsing and planning. Since the plan is compiled-in for stored procedures, is the planner in that case already running under the assumption that seqscans must be disabled (or something to that effect)? Would tweaking enable_seqscan and other planner functions during the CREATE FUNCTION have an effect on the stored plan? Do the functions need to be re-created when the database is fully populated, to adjust their stored plans with regards to new selectivity situation on the indexes? ---- The SQL function is: -- Retrieves document chunks of a specified document which have the most votes DROP FUNCTION IF EXISTS document_content_top_voted(INTEGER); CREATE OR REPLACE FUNCTION document_content_top_voted(document_id INTEGER) RETURNS TABLE (chunk_id INTEGER, seq INTEGER, content TEXT, ctime INTEGER, log TEXT, nr_chunk_upvotes INTEGER, nr_chunk_downvotes INTEGER, nr_seq_changes INTEGER, nr_seq_comments INTEGER, user_login VARCHAR, user_public_name VARCHAR, user_email VARCHAR) AS $$ WITH documents_top_chunks AS ( SELECT (SELECT chunk_id FROM documents_chunks_votes_total WHERE documents_id=$1 AND documents_chunks_votes_total.seq=documents_seqs.seq AND votes= (SELECT max(votes) FROM documents_chunks_votes_total WHERE documents_id=$1 AND documents_chunks_votes_total.seq=documents_seqs.seq) ORDER BY chunk_id DESC LIMIT 1) AS chunk_id, seq AS doc_seq FROM documents_seqs WHERE documents_id = $1 ORDER BY seq ) SELECT chunk_id, doc_seq, content, documents_chunks.ctime, documents_chunks.log, COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE documents_chunks_id=chunk_id AND vote=1)::integer, 0) AS nr_chunk_upvotes, COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE documents_chunks_id=chunk_id AND vote=-1)::integer, 0) AS nr_chunk_downvotes, (SELECT COUNT(*) FROM documents_chunks WHERE documents_id=$1 AND seq=doc_seq)::integer AS nr_seq_changes, (SELECT COUNT(*) FROM documents_seq_comments WHERE documents_seq_comments.documents_id=$1 AND seq=doc_seq)::integer AS nr_seq_comments, users.login, users.public_name, users.email FROM documents_chunks JOIN documents_top_chunks ON documents_chunks.id = documents_top_chunks.chunk_id JOIN users ON users.id=creator_uid ORDER BY doc_seq $$ LANGUAGE SQL; (comments on improving the efficiency of the SQL code are also appreciated)
Hello 2012/5/26 Ivan Voras <ivoras@freebsd.org>: > Hello, > > I have a SQL function (which I've pasted below) and while testing its > code directly (outside a function), this is the "normal", default plan: > > http://explain.depesz.com/s/vfP (67 ms) > > and this is the plain with enable_seqscan turned off: > > http://explain.depesz.com/s/EFP (27 ms) > > Disabling seqscan results in almost 2.5x faster execution. > > However, when this code is wrapped in a function, the execution time is > closer to the second case (which is great, I'm not complaining): > see http://archives.postgresql.org/pgsql-general/2009-12/msg01189.php Regards Pavel > edem=> explain analyze select * from document_content_top_voted(36); > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------- > Function Scan on document_content_top_voted (cost=0.25..10.25 > rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1) > Total runtime: 21.236 ms > (2 rows) > > I assume that the difference between the function execution time and the > direct plan with seqscan disabled is due to SQL parsing and planning. > > Since the plan is compiled-in for stored procedures, is the planner in > that case already running under the assumption that seqscans must be > disabled (or something to that effect)? > > Would tweaking enable_seqscan and other planner functions during the > CREATE FUNCTION have an effect on the stored plan? > > Do the functions need to be re-created when the database is fully > populated, to adjust their stored plans with regards to new selectivity > situation on the indexes? > > ---- > > The SQL function is: > > -- Retrieves document chunks of a specified document which have the most > votes > > DROP FUNCTION IF EXISTS document_content_top_voted(INTEGER); > CREATE OR REPLACE FUNCTION document_content_top_voted(document_id INTEGER) > RETURNS TABLE > (chunk_id INTEGER, seq INTEGER, content TEXT, ctime INTEGER, log > TEXT, > nr_chunk_upvotes INTEGER, nr_chunk_downvotes INTEGER, > nr_seq_changes INTEGER, nr_seq_comments INTEGER, > user_login VARCHAR, user_public_name VARCHAR, user_email VARCHAR) > AS $$ > WITH documents_top_chunks AS ( > SELECT > (SELECT > chunk_id > FROM > documents_chunks_votes_total > WHERE > documents_id=$1 AND > documents_chunks_votes_total.seq=documents_seqs.seq AND votes= > (SELECT > max(votes) > FROM > documents_chunks_votes_total > WHERE > documents_id=$1 AND > documents_chunks_votes_total.seq=documents_seqs.seq) > ORDER BY > chunk_id DESC > LIMIT 1) AS chunk_id, seq AS doc_seq > FROM > documents_seqs > WHERE > documents_id = $1 > ORDER BY seq > ) SELECT > chunk_id, doc_seq, content, documents_chunks.ctime, > documents_chunks.log, > COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE > documents_chunks_id=chunk_id AND vote=1)::integer, 0) AS nr_chunk_upvotes, > COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE > documents_chunks_id=chunk_id AND vote=-1)::integer, 0) AS > nr_chunk_downvotes, > (SELECT COUNT(*) FROM documents_chunks WHERE documents_id=$1 AND > seq=doc_seq)::integer AS nr_seq_changes, > (SELECT COUNT(*) FROM documents_seq_comments WHERE > documents_seq_comments.documents_id=$1 AND seq=doc_seq)::integer AS > nr_seq_comments, > users.login, users.public_name, users.email > FROM > documents_chunks > JOIN documents_top_chunks ON documents_chunks.id = > documents_top_chunks.chunk_id > JOIN users ON users.id=creator_uid > ORDER BY doc_seq > $$ LANGUAGE SQL; > > (comments on improving the efficiency of the SQL code are also appreciated) > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
On 27 May 2012 05:28, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hello > > 2012/5/26 Ivan Voras <ivoras@freebsd.org>: >> Hello, >> >> I have a SQL function (which I've pasted below) and while testing its >> code directly (outside a function), this is the "normal", default plan: >> >> http://explain.depesz.com/s/vfP (67 ms) >> >> and this is the plain with enable_seqscan turned off: >> >> http://explain.depesz.com/s/EFP (27 ms) >> >> Disabling seqscan results in almost 2.5x faster execution. >> >> However, when this code is wrapped in a function, the execution time is >> closer to the second case (which is great, I'm not complaining): >> > > see http://archives.postgresql.org/pgsql-general/2009-12/msg01189.php Hi, Thank you for your answer, but if you read my post, you'll hopefully realize my questions are different from that in the linked post, and are not answered by the post.
2012/5/27 Ivan Voras <ivoras@freebsd.org>: > On 27 May 2012 05:28, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Hello >> >> 2012/5/26 Ivan Voras <ivoras@freebsd.org>: >>> Hello, >>> >>> I have a SQL function (which I've pasted below) and while testing its >>> code directly (outside a function), this is the "normal", default plan: >>> >>> http://explain.depesz.com/s/vfP (67 ms) >>> >>> and this is the plain with enable_seqscan turned off: >>> >>> http://explain.depesz.com/s/EFP (27 ms) >>> >>> Disabling seqscan results in almost 2.5x faster execution. >>> >>> However, when this code is wrapped in a function, the execution time is >>> closer to the second case (which is great, I'm not complaining): >>> >> >> see http://archives.postgresql.org/pgsql-general/2009-12/msg01189.php > > Hi, > > Thank you for your answer, but if you read my post, you'll hopefully > realize my questions are different from that in the linked post, and > are not answered by the post. yes, sorry, Pavel
Ivan Voras wrote: > I have a SQL function (which I've pasted below) and while testing its > code directly (outside a function), this is the "normal", default plan: > > http://explain.depesz.com/s/vfP (67 ms) > > and this is the plain with enable_seqscan turned off: > > http://explain.depesz.com/s/EFP (27 ms) > > Disabling seqscan results in almost 2.5x faster execution. > > However, when this code is wrapped in a function, the execution time is > closer to the second case (which is great, I'm not complaining): > > edem=> explain analyze select * from document_content_top_voted(36); > QUERY PLAN > ------------------------------------------------------------------------ ------------------------------ > ----------------------------- > Function Scan on document_content_top_voted (cost=0.25..10.25 > rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1) > Total runtime: 21.236 ms > (2 rows) > > I assume that the difference between the function execution time and the > direct plan with seqscan disabled is due to SQL parsing and planning. That cannot be, because SQL functions do not cache execution plans. Did you take caching of table data in the buffer cache or the filesystem cache into account? Did you run your tests several times in a row and were the actual execution times consistent? > Since the plan is compiled-in for stored procedures, is the planner in > that case already running under the assumption that seqscans must be > disabled (or something to that effect)? > > Would tweaking enable_seqscan and other planner functions during the > CREATE FUNCTION have an effect on the stored plan? No, but you can use the SET clause of CREATE FUNCTION to change enable_seqscan for this function if you know that this is the right thing. But be aware that things might be different for other function arguments or when the table data change, so this is normally considered a bad idea. > Do the functions need to be re-created when the database is fully > populated, to adjust their stored plans with regards to new selectivity > situation on the indexes? No. Even in PL/pgSQL, where plans are cached, this is only for the lifetime of the database session. The plan is generated when the function is called for the first time in a database session. Yours, Laurenz Albe
On 8 June 2012 11:58, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > Did you take caching of table data in the buffer cache or the filesystem > cache into account? Did you run your tests several times in a row and > were the actual execution times consistent? Yes, and yes. >> Would tweaking enable_seqscan and other planner functions during the >> CREATE FUNCTION have an effect on the stored plan? > > No, but you can use the SET clause of CREATE FUNCTION to change > enable_seqscan for this function if you know that this is the right > thing. > But be aware that things might be different for other function arguments > or when the table data change, so this is normally considered a bad > idea. Ok. >> Do the functions need to be re-created when the database is fully >> populated, to adjust their stored plans with regards to new > selectivity >> situation on the indexes? > > No. Even in PL/pgSQL, where plans are cached, this is only for the > lifetime of the database session. The plan is generated when the > function is called for the first time in a database session. Thanks for clearing this up for me! I thought SQL functions are also pre-planned and that the plans are static.