Thread: Seqscan slowness and stored procedures

Seqscan slowness and stored procedures

From

Ivan Voras

Date:

26 May 2012, 18:39:36

Hello,

I have a SQL function (which I've pasted below) and while testing its
code directly (outside a function), this is the "normal", default plan:

http://explain.depesz.com/s/vfP (67 ms)

and this is the plain with enable_seqscan turned off:

http://explain.depesz.com/s/EFP (27 ms)

Disabling seqscan results in almost 2.5x faster execution.

However, when this code is wrapped in a function, the execution time is
closer to the second case (which is great, I'm not complaining):

edem=> explain analyze select * from document_content_top_voted(36);
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Function Scan on document_content_top_voted  (cost=0.25..10.25
rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1)
 Total runtime: 21.236 ms
(2 rows)

I assume that the difference between the function execution time and the
direct plan with seqscan disabled is due to SQL parsing and planning.

Since the plan is compiled-in for stored procedures, is the planner in
that case already running under the assumption that seqscans must be
disabled (or something to that effect)?

Would tweaking enable_seqscan and other planner functions during the
CREATE FUNCTION have an effect on the stored plan?

Do the functions need to be re-created when the database is fully
populated, to adjust their stored plans with regards to new selectivity
situation on the indexes?

----

The SQL function is:

-- Retrieves document chunks of a specified document which have the most
votes

DROP FUNCTION IF EXISTS document_content_top_voted(INTEGER);
CREATE OR REPLACE FUNCTION document_content_top_voted(document_id INTEGER)
    RETURNS TABLE
        (chunk_id INTEGER, seq INTEGER, content TEXT, ctime INTEGER, log
TEXT,
        nr_chunk_upvotes INTEGER, nr_chunk_downvotes INTEGER,
nr_seq_changes INTEGER, nr_seq_comments INTEGER,
        user_login VARCHAR, user_public_name VARCHAR, user_email VARCHAR)
AS $$
    WITH documents_top_chunks AS (
        SELECT
            (SELECT
                chunk_id
            FROM
                documents_chunks_votes_total
            WHERE
                documents_id=$1 AND
documents_chunks_votes_total.seq=documents_seqs.seq AND votes=
                    (SELECT
                        max(votes)
                    FROM
                        documents_chunks_votes_total
                    WHERE
                        documents_id=$1 AND
documents_chunks_votes_total.seq=documents_seqs.seq)
            ORDER BY
                chunk_id DESC
            LIMIT 1) AS chunk_id, seq AS doc_seq
        FROM
            documents_seqs
        WHERE
            documents_id = $1
        ORDER BY seq
    ) SELECT
        chunk_id, doc_seq, content, documents_chunks.ctime,
documents_chunks.log,
        COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE
documents_chunks_id=chunk_id AND vote=1)::integer, 0) AS nr_chunk_upvotes,
        COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE
documents_chunks_id=chunk_id AND vote=-1)::integer, 0) AS
nr_chunk_downvotes,
        (SELECT COUNT(*) FROM documents_chunks WHERE documents_id=$1 AND
seq=doc_seq)::integer AS nr_seq_changes,
        (SELECT COUNT(*) FROM documents_seq_comments WHERE
documents_seq_comments.documents_id=$1 AND seq=doc_seq)::integer AS
nr_seq_comments,
        users.login, users.public_name, users.email
    FROM
        documents_chunks
        JOIN documents_top_chunks ON documents_chunks.id =
documents_top_chunks.chunk_id
        JOIN users ON users.id=creator_uid
    ORDER BY doc_seq
$$ LANGUAGE SQL;

(comments on improving the efficiency of the SQL code are also appreciated)

Re: Seqscan slowness and stored procedures

From

Pavel Stehule

Date:

27 May 2012, 00:29:37

Hello

2012/5/26 Ivan Voras <ivoras@freebsd.org>:
> Hello,
>
> I have a SQL function (which I've pasted below) and while testing its
> code directly (outside a function), this is the "normal", default plan:
>
> http://explain.depesz.com/s/vfP (67 ms)
>
> and this is the plain with enable_seqscan turned off:
>
> http://explain.depesz.com/s/EFP (27 ms)
>
> Disabling seqscan results in almost 2.5x faster execution.
>
> However, when this code is wrapped in a function, the execution time is
> closer to the second case (which is great, I'm not complaining):
>

see http://archives.postgresql.org/pgsql-general/2009-12/msg01189.php

Regards

Pavel

> edem=> explain analyze select * from document_content_top_voted(36);
>                                                            QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Function Scan on document_content_top_voted  (cost=0.25..10.25
> rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1)
>  Total runtime: 21.236 ms
> (2 rows)
>
> I assume that the difference between the function execution time and the
> direct plan with seqscan disabled is due to SQL parsing and planning.
>
> Since the plan is compiled-in for stored procedures, is the planner in
> that case already running under the assumption that seqscans must be
> disabled (or something to that effect)?
>
> Would tweaking enable_seqscan and other planner functions during the
> CREATE FUNCTION have an effect on the stored plan?
>
> Do the functions need to be re-created when the database is fully
> populated, to adjust their stored plans with regards to new selectivity
> situation on the indexes?
>
> ----
>
> The SQL function is:
>
> -- Retrieves document chunks of a specified document which have the most
> votes
>
> DROP FUNCTION IF EXISTS document_content_top_voted(INTEGER);
> CREATE OR REPLACE FUNCTION document_content_top_voted(document_id INTEGER)
>    RETURNS TABLE
>        (chunk_id INTEGER, seq INTEGER, content TEXT, ctime INTEGER, log
> TEXT,
>        nr_chunk_upvotes INTEGER, nr_chunk_downvotes INTEGER,
> nr_seq_changes INTEGER, nr_seq_comments INTEGER,
>        user_login VARCHAR, user_public_name VARCHAR, user_email VARCHAR)
> AS $$
>    WITH documents_top_chunks AS (
>        SELECT
>            (SELECT
>                chunk_id
>            FROM
>                documents_chunks_votes_total
>            WHERE
>                documents_id=$1 AND
> documents_chunks_votes_total.seq=documents_seqs.seq AND votes=
>                    (SELECT
>                        max(votes)
>                    FROM
>                        documents_chunks_votes_total
>                    WHERE
>                        documents_id=$1 AND
> documents_chunks_votes_total.seq=documents_seqs.seq)
>            ORDER BY
>                chunk_id DESC
>            LIMIT 1) AS chunk_id, seq AS doc_seq
>        FROM
>            documents_seqs
>        WHERE
>            documents_id = $1
>        ORDER BY seq
>    ) SELECT
>        chunk_id, doc_seq, content, documents_chunks.ctime,
> documents_chunks.log,
>        COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE
> documents_chunks_id=chunk_id AND vote=1)::integer, 0) AS nr_chunk_upvotes,
>        COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE
> documents_chunks_id=chunk_id AND vote=-1)::integer, 0) AS
> nr_chunk_downvotes,
>        (SELECT COUNT(*) FROM documents_chunks WHERE documents_id=$1 AND
> seq=doc_seq)::integer AS nr_seq_changes,
>        (SELECT COUNT(*) FROM documents_seq_comments WHERE
> documents_seq_comments.documents_id=$1 AND seq=doc_seq)::integer AS
> nr_seq_comments,
>        users.login, users.public_name, users.email
>    FROM
>        documents_chunks
>        JOIN documents_top_chunks ON documents_chunks.id =
> documents_top_chunks.chunk_id
>        JOIN users ON users.id=creator_uid
>    ORDER BY doc_seq
> $$ LANGUAGE SQL;
>
> (comments on improving the efficiency of the SQL code are also appreciated)
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: Seqscan slowness and stored procedures

From

Ivan Voras

Date:

27 May 2012, 12:58:53

On 27 May 2012 05:28, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> 2012/5/26 Ivan Voras <ivoras@freebsd.org>:
>> Hello,
>>
>> I have a SQL function (which I've pasted below) and while testing its
>> code directly (outside a function), this is the "normal", default plan:
>>
>> http://explain.depesz.com/s/vfP (67 ms)
>>
>> and this is the plain with enable_seqscan turned off:
>>
>> http://explain.depesz.com/s/EFP (27 ms)
>>
>> Disabling seqscan results in almost 2.5x faster execution.
>>
>> However, when this code is wrapped in a function, the execution time is
>> closer to the second case (which is great, I'm not complaining):
>>
>
> see http://archives.postgresql.org/pgsql-general/2009-12/msg01189.php

Hi,

Thank you for your answer, but if you read my post, you'll hopefully
realize my questions are different from that in the linked post, and
are not answered by the post.

Re: Seqscan slowness and stored procedures

From

Pavel Stehule

Date:

27 May 2012, 13:08:26

2012/5/27 Ivan Voras <ivoras@freebsd.org>:
> On 27 May 2012 05:28, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Hello
>>
>> 2012/5/26 Ivan Voras <ivoras@freebsd.org>:
>>> Hello,
>>>
>>> I have a SQL function (which I've pasted below) and while testing its
>>> code directly (outside a function), this is the "normal", default plan:
>>>
>>> http://explain.depesz.com/s/vfP (67 ms)
>>>
>>> and this is the plain with enable_seqscan turned off:
>>>
>>> http://explain.depesz.com/s/EFP (27 ms)
>>>
>>> Disabling seqscan results in almost 2.5x faster execution.
>>>
>>> However, when this code is wrapped in a function, the execution time is
>>> closer to the second case (which is great, I'm not complaining):
>>>
>>
>> see http://archives.postgresql.org/pgsql-general/2009-12/msg01189.php
>
> Hi,
>
> Thank you for your answer, but if you read my post, you'll hopefully
> realize my questions are different from that in the linked post, and
> are not answered by the post.

yes, sorry,

Pavel

Re: Seqscan slowness and stored procedures

From

"Albe Laurenz"

Date:

08 June 2012, 06:58:51

Ivan Voras wrote:
> I have a SQL function (which I've pasted below) and while testing its
> code directly (outside a function), this is the "normal", default
plan:
>
> http://explain.depesz.com/s/vfP (67 ms)
>
> and this is the plain with enable_seqscan turned off:
>
> http://explain.depesz.com/s/EFP (27 ms)
>
> Disabling seqscan results in almost 2.5x faster execution.
>
> However, when this code is wrapped in a function, the execution time
is
> closer to the second case (which is great, I'm not complaining):
>
> edem=> explain analyze select * from document_content_top_voted(36);
>                                                             QUERY PLAN
>
------------------------------------------------------------------------
------------------------------
> -----------------------------
>  Function Scan on document_content_top_voted  (cost=0.25..10.25
> rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1)
>  Total runtime: 21.236 ms
> (2 rows)
>
> I assume that the difference between the function execution time and
the
> direct plan with seqscan disabled is due to SQL parsing and planning.

That cannot be, because SQL functions do not cache execution plans.

Did you take caching of table data in the buffer cache or the filesystem
cache into account?  Did you run your tests several times in a row and
were the actual execution times consistent?

> Since the plan is compiled-in for stored procedures, is the planner in
> that case already running under the assumption that seqscans must be
> disabled (or something to that effect)?
>
> Would tweaking enable_seqscan and other planner functions during the
> CREATE FUNCTION have an effect on the stored plan?

No, but you can use the SET clause of CREATE FUNCTION to change
enable_seqscan for this function if you know that this is the right
thing.
But be aware that things might be different for other function arguments
or when the table data change, so this is normally considered a bad
idea.

> Do the functions need to be re-created when the database is fully
> populated, to adjust their stored plans with regards to new
selectivity
> situation on the indexes?

No. Even in PL/pgSQL, where plans are cached, this is only for the
lifetime of the database session.  The plan is generated when the
function is called for the first time in a database session.

Yours,
Laurenz Albe

Re: Seqscan slowness and stored procedures

From

Ivan Voras

Date:

08 June 2012, 08:01:11

On 8 June 2012 11:58, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

> Did you take caching of table data in the buffer cache or the filesystem
> cache into account?  Did you run your tests several times in a row and
> were the actual execution times consistent?

Yes, and yes.

>> Would tweaking enable_seqscan and other planner functions during the
>> CREATE FUNCTION have an effect on the stored plan?
>
> No, but you can use the SET clause of CREATE FUNCTION to change
> enable_seqscan for this function if you know that this is the right
> thing.
> But be aware that things might be different for other function arguments
> or when the table data change, so this is normally considered a bad
> idea.

Ok.

>> Do the functions need to be re-created when the database is fully
>> populated, to adjust their stored plans with regards to new
> selectivity
>> situation on the indexes?
>
> No. Even in PL/pgSQL, where plans are cached, this is only for the
> lifetime of the database session.  The plan is generated when the
> function is called for the first time in a database session.

Thanks for clearing this up for me! I thought SQL functions are also
pre-planned and that the plans are static.