Thread: Planner Row Estimate with Function

Planner Row Estimate with Function

From
Michael Fork
Date:
I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the
restof the query planning.  The result of the function is roughly unique - there are a handful with multiple entries -
butthe planner is estimating 227,745 rows.  I re-ran ANALYZE on the table and the results did not change.  Any
suggestionson how to get more accurate planner result? 


Function definition:

CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
BEGIN
RETURN SUBSTRING($1 FROM 3 FOR 13);
END;
$_$ LANGUAGE plpgsql IMMUTABLE;

Explain output:

# explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) =
lpad(CAST('57729970'AS text), 13, '0'); 
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Index Scan using idx_event_card_id on event  (cost=0.25..468642.89 rows=227745 width=104)
   Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text)

Statistics:

# SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM
pg_statsWHERE tablename = 'idx_event_card_id'; 
 null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation |
                                       histogram_bounds 

-----------+-----------+------------+------------------+-------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 0.0935673 |        17 |         -1 |                  |                   |    0.672617 |
{0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760}
Thanks.


Michael


Re: Planner Row Estimate with Function

From
Pavel Stehule
Date:
Hello

2009/12/29 Michael Fork <mfork00@yahoo.com>:
> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the
restof the query planning.  The result of the function is roughly unique - there are a handful with multiple entries -
butthe planner is estimating 227,745 rows.  I re-ran ANALYZE on the table and the results did not change.  Any
suggestionson how to get more accurate planner result? 
>
>
> Function definition:
>
> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
> BEGIN
> RETURN SUBSTRING($1 FROM 3 FOR 13);
> END;
> $_$ LANGUAGE plpgsql IMMUTABLE;
>

try
 CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
SELECT SUBSTRING($1 FROM 3 FOR 13);
 $_$ LANGUAGE sql;

regards
Pavel Stehule

> Explain output:
>
> # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) =
lpad(CAST('57729970'AS text), 13, '0'); 
>                                        QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Index Scan using idx_event_card_id on event  (cost=0.25..468642.89 rows=227745 width=104)
>   Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text)
>
> Statistics:
>
> # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM
pg_statsWHERE tablename = 'idx_event_card_id'; 
>  null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation |                          
                                          histogram_bounds 
>
-----------+-----------+------------+------------------+-------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
>  0.0935673 |        17 |         -1 |                  |                   |    0.672617 |
{0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760}
> Thanks.
>
>
> Michael
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Planner Row Estimate with Function

From
Michael Fork
Date:
Pavel,

Thanks for the suggestion but unfortunately the planner estimate was not really affected:

                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Index Scan using idx_event_card_id on event  (cost=0.00..401311.59 rows=223890 width=103)
   Index Cond: ("substring"(reference_code, 3, 13) = '0000057729970'::text)
(2 rows)
Thanks.


Michael



----- Original Message ----
From: Pavel Stehule <pavel.stehule@gmail.com>
To: Michael Fork <mfork00@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Tue, December 29, 2009 12:18:52 PM
Subject: Re: [GENERAL] Planner Row Estimate with Function

Hello

2009/12/29 Michael Fork <mfork00@yahoo.com>:
> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the
restof the query planning.  The result of the function is roughly unique - there are a handful with multiple entries -
butthe planner is estimating 227,745 rows.  I re-ran ANALYZE on the table and the results did not change.  Any
suggestionson how to get more accurate planner result? 
>
>
> Function definition:
>
> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
> BEGIN
> RETURN SUBSTRING($1 FROM 3 FOR 13);
> END;
> $_$ LANGUAGE plpgsql IMMUTABLE;
>

try
CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
SELECT SUBSTRING($1 FROM 3 FOR 13);
$_$ LANGUAGE sql;

regards
Pavel Stehule

> Explain output:
>
> # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) =
lpad(CAST('57729970'AS text), 13, '0'); 
>                                        QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Index Scan using idx_event_card_id on event  (cost=0.25..468642.89 rows=227745 width=104)
>   Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text)
>
> Statistics:
>
> # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM
pg_statsWHERE tablename = 'idx_event_card_id'; 
>  null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation |
                                         histogram_bounds 
>
-----------+-----------+------------+------------------+-------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
>  0.0935673 |        17 |         -1 |                  |                   |    0.672617 |
{0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760}
> Thanks.
>
>
> Michael
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: Planner Row Estimate with Function

From
Pavel Stehule
Date:
2009/12/29 Michael Fork <mfork00@yahoo.com>:
> Pavel,
>
> Thanks for the suggestion but unfortunately the planner estimate was not really affected:

any string estimation are not exact.

you can use following dirty trick:


http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Using_IMMUTABLE_functions_as_hints_for_the_optimizer

Regards
Pavel
>
>                                        QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Index Scan using idx_event_card_id on event  (cost=0.00..401311.59 rows=223890 width=103)
>   Index Cond: ("substring"(reference_code, 3, 13) = '0000057729970'::text)
> (2 rows)
> Thanks.
>
>
> Michael
>
>
>
> ----- Original Message ----
> From: Pavel Stehule <pavel.stehule@gmail.com>
> To: Michael Fork <mfork00@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Sent: Tue, December 29, 2009 12:18:52 PM
> Subject: Re: [GENERAL] Planner Row Estimate with Function
>
> Hello
>
> 2009/12/29 Michael Fork <mfork00@yahoo.com>:
>> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the
restof the query planning.  The result of the function is roughly unique - there are a handful with multiple entries -
butthe planner is estimating 227,745 rows.  I re-ran ANALYZE on the table and the results did not change.  Any
suggestionson how to get more accurate planner result? 
>>
>>
>> Function definition:
>>
>> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
>> BEGIN
>> RETURN SUBSTRING($1 FROM 3 FOR 13);
>> END;
>> $_$ LANGUAGE plpgsql IMMUTABLE;
>>
>
> try
> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
> SELECT SUBSTRING($1 FROM 3 FOR 13);
> $_$ LANGUAGE sql;
>
> regards
> Pavel Stehule
>
>> Explain output:
>>
>> # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) =
lpad(CAST('57729970'AS text), 13, '0'); 
>>                                        QUERY PLAN
>> -------------------------------------------------------------------------------------------
>>  Index Scan using idx_event_card_id on event  (cost=0.25..468642.89 rows=227745 width=104)
>>   Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text)
>>
>> Statistics:
>>
>> # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM
pg_statsWHERE tablename = 'idx_event_card_id'; 
>>  null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation |                          
                                          histogram_bounds 
>>
-----------+-----------+------------+------------------+-------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  0.0935673 |        17 |         -1 |                  |                   |    0.672617 |
{0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760}
>> Thanks.
>>
>>
>> Michael
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>

Re: Planner Row Estimate with Function

From
Thomas Kellerer
Date:
Michael Fork wrote on 29.12.2009 18:08:
> I have an index scan on a custom function that is returning a wildly
> incorrect row estimate that is throwing off the rest of the query
> planning.  The result of the function is roughly unique - there are a
> handful with multiple entries - but the planner is estimating 227,745
> rows.  I re-ran ANALYZE on the table and the results did not change.

> Any suggestions on how to get more accurate planner result?

You can add the ROWS nnn option to your create statement to give the planner a hint about the number of rows:

http://www.postgresql.org/docs/current/static/sql-createfunction.html

Check out the /ROWS result_rows/ part.

Thomas



Re: Planner Row Estimate with Function

From
Tom Lane
Date:
Michael Fork <mfork00@yahoo.com> writes:
> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the
restof the query planning.  The result of the function is roughly unique - there are a handful with multiple entries -
butthe planner is estimating 227,745 rows.  I re-ran ANALYZE on the table and the results did not change.  Any
suggestionson how to get more accurate planner result? 

What PG version is this exactly?

Also, what happened to the type='CREDIT' condition in your query?  Is
that a partial index?

            regards, tom lane

Re: Planner Row Estimate with Function

From
Michael Fork
Date:
springboard_v2=# SELECT version();
                                             version
--------------------------------------------------------------------------------------------------
 PostgreSQL 8.3.7 on amd64-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD]
(1 row)

Yes, this is partial index.  I should have included the index definition earlier:

# CREATE INDEX CONCURRENTLY idx_event_card_id ON trail.event(parsecardidfromreferencecode(reference_code)) WHERE type =
'CREDIT';Thanks. 


Michael



----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Michael Fork <mfork00@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Tue, December 29, 2009 3:43:06 PM
Subject: Re: [GENERAL] Planner Row Estimate with Function

Michael Fork <mfork00@yahoo.com> writes:
> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the
restof the query planning.  The result of the function is roughly unique - there are a handful with multiple entries -
butthe planner is estimating 227,745 rows.  I re-ran ANALYZE on the table and the results did not change.  Any
suggestionson how to get more accurate planner result? 

What PG version is this exactly?

Also, what happened to the type='CREDIT' condition in your query?  Is
that a partial index?

            regards, tom lane


Re: Planner Row Estimate with Function

From
Michael Fork
Date:
I tried the trick below and the planner estimate was roughly the same:

springboard_v2=# explain select * from trail.event where type='CREDIT' and lpad(CAST('57729970' AS text), 13, '0') =
ANY(parsecardidfromreferencecode(reference_code));                                          QUERY PLAN 
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on event  (cost=1669366.06..15120311.84 rows=2178778 width=103)
   Recheck Cond: (type = 'CREDIT'::text)
   Filter: ('0000057729970'::text = ANY (parsecardidfromreferencecode(reference_code)))
   ->  Bitmap Index Scan on idx_event_card_id  (cost=0.00..1668821.37 rows=44565021 width=0)
 Thanks.


Michael



----- Original Message ----
From: Pavel Stehule <pavel.stehule@gmail.com>
To: Michael Fork <mfork00@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Tue, December 29, 2009 1:16:10 PM
Subject: Re: [GENERAL] Planner Row Estimate with Function

2009/12/29 Michael Fork <mfork00@yahoo.com>:
> Pavel,
>
> Thanks for the suggestion but unfortunately the planner estimate was not really affected:

any string estimation are not exact.

you can use following dirty trick:


http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Using_IMMUTABLE_functions_as_hints_for_the_optimizer

Regards
Pavel
>
>                                        QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Index Scan using idx_event_card_id on event  (cost=0.00..401311.59 rows=223890 width=103)
>   Index Cond: ("substring"(reference_code, 3, 13) = '0000057729970'::text)
> (2 rows)
> Thanks.
>
>
> Michael
>
>
>
> ----- Original Message ----
> From: Pavel Stehule <pavel.stehule@gmail.com>
> To: Michael Fork <mfork00@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Sent: Tue, December 29, 2009 12:18:52 PM
> Subject: Re: [GENERAL] Planner Row Estimate with Function
>
> Hello
>
> 2009/12/29 Michael Fork <mfork00@yahoo.com>:
>> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the
restof the query planning.  The result of the function is roughly unique - there are a handful with multiple entries -
butthe planner is estimating 227,745 rows.  I re-ran ANALYZE on the table and the results did not change.  Any
suggestionson how to get more accurate planner result? 
>>
>>
>> Function definition:
>>
>> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
>> BEGIN
>> RETURN SUBSTRING($1 FROM 3 FOR 13);
>> END;
>> $_$ LANGUAGE plpgsql IMMUTABLE;
>>
>
> try
> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$
> SELECT SUBSTRING($1 FROM 3 FOR 13);
> $_$ LANGUAGE sql;
>
> regards
> Pavel Stehule
>
>> Explain output:
>>
>> # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) =
lpad(CAST('57729970'AS text), 13, '0'); 
>>                                        QUERY PLAN
>> -------------------------------------------------------------------------------------------
>>  Index Scan using idx_event_card_id on event  (cost=0.25..468642.89 rows=227745 width=104)
>>   Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text)
>>
>> Statistics:
>>
>> # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM
pg_statsWHERE tablename = 'idx_event_card_id'; 
>>  null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation |
                                          histogram_bounds 
>>
-----------+-----------+------------+------------------+-------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  0.0935673 |        17 |         -1 |                  |                   |    0.672617 |
{0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760}
>> Thanks.
>>
>>
>> Michael
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: Planner Row Estimate with Function

From
Tom Lane
Date:
Michael Fork <mfork00@yahoo.com> writes:
>> Also, what happened to the type='CREDIT' condition in your query?  Is
>> that a partial index?

> Yes, this is partial index.  I should have included the index definition earlier:

Ah.  The optimizer is ignoring the index stats on the grounds that
they are not representative of the whole table.  I'm not entirely sure
offhand whether it would be safe to use them anyway if the index
predicate is known to match the query --- it seems a bit risky but on
the other hand it'd probably be better than having no stats at all.

What you can do to get a better estimate is to create a non-partial
index on parsecardidfromreferencecode(reference_code).  It's fairly
likely that if you do that, maintaining the partial index as well
is not worth your time; but that's a decision you'd have to make.

            regards, tom lane

Re: Planner Row Estimate with Function

From
Michael Fork
Date:
That solved it.
 Thanks!


Michael



----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Michael Fork <mfork00@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Tue, December 29, 2009 11:19:42 PM
Subject: Re: [GENERAL] Planner Row Estimate with Function

Michael Fork <mfork00@yahoo.com> writes:
>> Also, what happened to the type='CREDIT' condition in your query?  Is
>> that a partial index?

> Yes, this is partial index.  I should have included the index definition earlier:

Ah.  The optimizer is ignoring the index stats on the grounds that
they are not representative of the whole table.  I'm not entirely sure
offhand whether it would be safe to use them anyway if the index
predicate is known to match the query --- it seems a bit risky but on
the other hand it'd probably be better than having no stats at all.

What you can do to get a better estimate is to create a non-partial
index on parsecardidfromreferencecode(reference_code).  It's fairly
likely that if you do that, maintaining the partial index as well
is not worth your time; but that's a decision you'd have to make.

            regards, tom lane