Thread: procost for to_tsvector

procost for to_tsvector

From: Andrew Gierth
Date: 11 March 2015, 14:40:23

An issue that comes up regularly on IRC is that text search queries,
especially on relatively modest size tables or for relatively
non-selective words, often misplan as a seqscan based on the fact that
to_tsvector has procost=1.

Clearly this cost number is ludicrous.

Getting the right cost estimate would obviously mean taking the cost of
detoasting into account, but even without doing that, there's a strong
argument that it should be increased to at least the order of 100.
(With the default cpu_operator_cost that would make each to_tsvector
call cost 0.25.)
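For a sense of scale: the planner charges each call of a function procost × cpu_operator_cost in its cost units. The C sketch below reproduces the 0.25 figure and shows the extra filter charge a seqscan picks up over N rows. The helper names are hypothetical, and the sketch deliberately ignores detoasting, argument evaluation, the @@ operator itself and the rest of the scan cost.

```c
/* Default value of the cpu_operator_cost setting. */
#define DEFAULT_CPU_OPERATOR_COST 0.0025

/* Hypothetical helper: planner charge for a single call of a function. */
static double per_call_cost(double procost, double cpu_operator_cost)
{
    return procost * cpu_operator_cost;
}

/*
 * Hypothetical helper: total charge for evaluating the function once per
 * row, as a seqscan filter does, over 'rows' rows.
 */
static double filter_cost_for_rows(double rows, double procost,
                                   double cpu_operator_cost)
{
    return rows * per_call_cost(procost, cpu_operator_cost);
}
```

With procost=1, filtering 500 rows adds only about 1.25 to the plan cost; with procost=100 it adds about 125.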

(The guy I was just helping on IRC was seeing a slowdown of 100x from a
seqscan in a query that selected about 50 rows from about 500.)

-- 
Andrew (irc:RhodiumToad)

Re: procost for to_tsvector

From: Andres Freund
Date: 11 March 2015, 14:44:40

Hi,

On 2015-03-11 14:40:16 +0000, Andrew Gierth wrote:
> An issue that comes up regularly on IRC is that text search queries,
> especially on relatively modest size tables or for relatively
> non-selective words, often misplan as a seqscan based on the fact that
> to_tsvector has procost=1.

I've also seen this regularly outside IRC.

> Clearly this cost number is ludicrous.

Yea.

> Getting the right cost estimate would obviously mean taking the cost of
> detoasting into account

Well, that's not done in other cases where you could either, so there's
precedent for being inaccurate ;)

> , but even without doing that, there's a strong
> argument that it should be increased to at least the order of 100.
> (With the default cpu_operator_cost that would make each to_tsvector
> call cost 0.25.)

100 sounds good to me. IIRC that's what has been proposed before.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: procost for to_tsvector

From: Tom Lane
Date: 11 March 2015, 16:07:35

Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-03-11 14:40:16 +0000, Andrew Gierth wrote:
>> Getting the right cost estimate would obviously mean taking the cost of
>> detoasting into account

> Well, that's not done in other cases where you could either, so there's
> precedent for being inaccurate ;)

If we were to charge something for detoasting, that would be a separate
matter anyway IMO, not something to try to sneak into function costs.
(Essentially, what we ought to consider is that a Var isn't zero-cost
if it refers to a column with a large fraction of toasted entries.
But that's a matter for a different patch.)

>> , but even without doing that, there's a strong
>> argument that it should be increased to at least the order of 100.

Nyet ... at least not without you actually making that argument, with
numbers, rather than just handwaving.  We use 100 for plpgsql and suchlike
functions.  I'd be OK with making it 10 just on general principles, but
claiming that it's as expensive as a plpgsql function requires evidence.
        regards, tom lane

Re: procost for to_tsvector

From: Andres Freund
Date: 11 March 2015, 16:26:11

On 2015-03-11 12:07:20 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2015-03-11 14:40:16 +0000, Andrew Gierth wrote:
> > >> , but even without doing that, there's a strong
> >> argument that it should be increased to at least the order of 100.
> 
> Nyet ... at least not without you actually making that argument, with
> numbers, rather than just handwaving.  We use 100 for plpgsql and suchlike
> functions.  I'd be OK with making it 10 just on general principles, but
> claiming that it's as expensive as a plpgsql function requires
> evidence.

I'll note that, years back, you proposed a cost higher than 10 ;):
http://www.postgresql.org/message-id/8971.1255891843@sss.pgh.pa.us

What you said back then makes sense to me:

On 2009-10-18 14:50:43 -0400, Tom Lane wrote:
> In another case I was looking at just now, it seems that to_tsquery()
> and to_tsvector() are noticeably slower than most other built-in
> functions, which is not surprising given the amount of mechanism that
> gets invoked inside them.  It would be useful to tell the planner
> about that to discourage it from picking seqscan plans that involve
> repeated execution of these functions.

A trivial comparison against a simple plpgsql function:
CREATE FUNCTION a_simple_plpgsql_function(a text) RETURNS text LANGUAGE plpgsql AS $$BEGIN RETURN repeat(a, 3); END;$$;

SELECT a_simple_plpgsql_function('This is a long sentence in english. Or maybe not so long after all. But it includes a Metal Ümlaut. And parens: ()! Also a number: ' || g.i)
FROM generate_series(1, 10000) g(i);
Time: 32.898 ms

and
SELECT to_tsvector('english', 'This is a long sentence in english. Or maybe not so long after all. But it includes a Metal Ümlaut. And parens: ()! Also a number: ' || g.i)
FROM generate_series(1, 10000) g(i);
Time: 450.996 ms

Given that this is a short sentence and a simple text search
configuration, a factor of 10 between them doesn't sound wrong. This is
obviously completely unscientific, but ...
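The arithmetic behind "a factor of 10" can be spelled out. This C sketch (helper names are hypothetical) turns the two psql timings into microseconds per call and their ratio. Like the original test, it lumps in generate_series and the string concatenation, which both queries share, so if anything it understates the gap between the two functions themselves.

```c
/* Hypothetical helper: average microseconds per call from a total time in ms. */
static double us_per_call(double total_ms, double calls)
{
    return total_ms * 1000.0 / calls;
}

/* Hypothetical helper: how many times slower the second timing is. */
static double slowdown(double fast_ms, double slow_ms)
{
    return slow_ms / fast_ms;
}
```

For the timings above, that is about 3.3 us per call for the plpgsql function, about 45.1 us per call for to_tsvector(), and a ratio of roughly 13.7.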

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: procost for to_tsvector

From: Andrew Gierth
Date: 11 March 2015, 21:54:58

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
 Tom> Nyet ... at least not without you actually making that argument,
 Tom> with numbers, rather than just handwaving.  We use 100 for plpgsql
 Tom> and suchlike functions.  I'd be OK with making it 10 just on
 Tom> general principles, but claiming that it's as expensive as a
 Tom> plpgsql function requires evidence.

[TL/DR: 10 isn't enough, even 100 may be too low]

On a text corpus consisting of ~18 thousand blog comments + ~5% of dead
rows, median length 302 bytes, only about 3% long enough to be toasted,
and selecting a common word (~22% of the table):

explain analyze select * from comments where to_tsvector('english',message) @@ '''one'''::tsquery;
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..2406.18 rows=4140 width=792) (actual time=0.601..3946.589 rows=4056 loops=1)
   Filter: (to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery)
   Rows Removed by Filter: 14310
 Planning time: 0.270 ms
 Execution time: 3954.745 ms
(5 rows)

                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on comments  (cost=204.09..2404.30 rows=4140 width=792) (actual time=2.401..11.564 rows=4056 loops=1)
   Recheck Cond: (to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery)
   Heap Blocks: exact=1911
   ->  Bitmap Index Scan on comments_to_tsvector_idx  (cost=0.00..203.05 rows=4140 width=0) (actual time=1.974..1.974 rows=4313 loops=1)
         Index Cond: (to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery)
 Planning time: 0.278 ms
 Execution time: 17.640 ms
(7 rows)

(strangely, the seqscan plan is picked despite having a cost more than a
point higher? what's up with that?)

So for two plans with virtually identical cost, we have an execution
time difference on the order of 200x.

We can rule out the performance of the @@ by using a precalculated
tsvector:

explain analyze select * from comments where tsv @@ '''one'''::tsquery;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..2359.31 rows=4140 width=792) (actual time=0.023..47.746 rows=4056 loops=1)
   Filter: (tsv @@ '''one'''::tsquery)
   Rows Removed by Filter: 14310
 Planning time: 0.262 ms
 Execution time: 54.220 ms
(5 rows)

So we're looking at an execution time for to_tsvector on the order of
200us, which is a seriously big deal when looking at a potential
seqscan.  That's not just _as_ expensive as a plpgsql function, but more
than 50 times as expensive as a simple one like this:

create function f1(text) returns integer language plpgsql as $f$ begin return length($1); end; $f$;

select sum(length(message)) from comments;  --  89ms
select sum(f1(message)) from comments;      -- 155ms

66ms difference divided by 18366 rows = 3.6us per call
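Both per-call figures in this message come from the same recipe: subtract a baseline query and divide by the number of rows scanned (4056 returned + 14310 removed by the filter = 18366). A small C sketch of that arithmetic follows; the helper name is hypothetical, and treating the whole difference as function execution time is the same simplification the message makes.

```c
/* Rows scanned by the seqscans above: 4056 returned + 14310 filtered out. */
#define ROWS_SCANNED (4056.0 + 14310.0)

/*
 * Hypothetical helper: extra microseconds per row when a slower query is
 * compared with a baseline doing the same scan without the extra work.
 */
static double extra_us_per_row(double slow_ms, double baseline_ms, double rows)
{
    return (slow_ms - baseline_ms) * 1000.0 / rows;
}
```

extra_us_per_row(3954.745, 54.220, ROWS_SCANNED) comes to about 212 us for to_tsvector (measured against the precomputed-tsvector plan), and extra_us_per_row(155, 89, ROWS_SCANNED) to about 3.6 us for f1(), a ratio of roughly 59.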

Now, obviously the default cost for plpgsql functions is assuming that
the function is a whole lot more complex than that, so one wouldn't
argue that to_tsvector should cost 5000. But there's a strong case for
arguing that it should cost a whole lot more than 100, because even at
that value the relative costs for the first two plans in this post only
differ by 2x, compared to a 200x runtime difference.  A value of 10
would be inadequate in many cases; in this example it leaves the slower
plan with a cost only ~15% higher, which is way too close to be
comfortable.
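A rough model of why even 100 leaves only about a 2x gap in estimated cost: raising to_tsvector's procost from 1 to p adds (p - 1) × cpu_operator_cost per evaluation. The seqscan evaluates it on every row, while the bitmap heap scan is assumed here to be charged for the recheck only on the ~4140 rows it expects to fetch. The C sketch below applies that to the two plans above. The evaluation counts and helper names are assumptions for illustration; the real estimates come from the planner's costing code, not from this formula.

```c
#define CPU_OPERATOR_COST 0.0025

/* Plan costs reported above, with to_tsvector at procost = 1. */
#define SEQSCAN_COST_AT_1 2406.18
#define BITMAP_COST_AT_1  2404.30

/* Assumed number of to_tsvector evaluations each plan is charged for. */
#define SEQSCAN_EVALS 18366.0   /* every row goes through the filter */
#define BITMAP_EVALS   4140.0   /* estimated rows fetched and rechecked */

/* Hypothetical helper: a plan's cost after changing procost from 1 to p. */
static double cost_at_procost(double cost_at_1, double evals, double p)
{
    return cost_at_1 + (p - 1.0) * CPU_OPERATOR_COST * evals;
}

/* Hypothetical helper: seqscan cost divided by bitmap scan cost. */
static double seq_to_bitmap_ratio(double p)
{
    return cost_at_procost(SEQSCAN_COST_AT_1, SEQSCAN_EVALS, p) /
           cost_at_procost(BITMAP_COST_AT_1, BITMAP_EVALS, p);
}
```

Under these assumptions the ratio is about 1.13 at procost 10 and about 2.03 at procost 100: the same ballpark as the figures quoted above, and nowhere near the ~200x runtime ratio.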

(As another example, a function with a simple query in it, such as
obj_description, can have runtimes on the order of 40us, still 5x faster
than to_tsvector.)

-- 
Andrew (irc:RhodiumToad)

Re: procost for to_tsvector

From: Jeff Janes
Date: 11 March 2015, 22:22:31

On Wed, Mar 11, 2015 at 2:54 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

Seq Scan on comments (cost=0.00..2406.18 rows=4140 width=792) (actual time=0.601..3946.589 rows=4056 loops=1)

Bitmap Heap Scan on comments (cost=204.09..2404.30 rows=4140 width=792) (actual time=2.401..11.564 rows=4056 loops=1)

...

(strangely, the seqscan plan is picked despite having a cost more than a
point higher? what's up with that?)

It is probably this, from src/backend/optimizer/util/pathnode.c :

costcmp = compare_path_costs_fuzzily(new_path, old_path, 1.01,
                                     parent_rel->consider_startup);
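That fuzz factor would explain the puzzle. Below is a simplified C model of a fuzzy cost comparison, assumed to follow the same order of checks as compare_path_costs_fuzzily() (total cost first, then startup cost). It is not the pathnode.c source: the struct and function are stand-ins, and everything else add_path() weighs (pathkeys, row counts, parameterization) is left out. In this model, total costs within 1% of each other count as equal, and the tie then goes to the path with the clearly lower startup cost, which here is the seqscan (0.00 vs 204.09).

```c
typedef enum
{
    COSTS_EQUAL,      /* fuzzily the same on both costs */
    COSTS_BETTER1,    /* first path dominates */
    COSTS_BETTER2,    /* second path dominates */
    COSTS_DIFFERENT   /* neither dominates; keep both */
} CostComparison;

typedef struct
{
    double startup_cost;
    double total_cost;
} PathCost;

/* Simplified, hypothetical model of a fuzzy path-cost comparison. */
static CostComparison
compare_costs_fuzzily(PathCost p1, PathCost p2, double fuzz, int consider_startup)
{
    if (p1.total_cost > p2.total_cost * fuzz)
    {
        /* p1 clearly worse in total; kept only if clearly better at startup */
        if (consider_startup && p2.startup_cost > p1.startup_cost * fuzz)
            return COSTS_DIFFERENT;
        return COSTS_BETTER2;
    }
    if (p2.total_cost > p1.total_cost * fuzz)
    {
        if (consider_startup && p1.startup_cost > p2.startup_cost * fuzz)
            return COSTS_DIFFERENT;
        return COSTS_BETTER1;
    }
    /* Total costs are within the fuzz factor: decide on startup cost. */
    if (p1.startup_cost > p2.startup_cost * fuzz)
        return COSTS_BETTER2;
    if (p2.startup_cost > p1.startup_cost * fuzz)
        return COSTS_BETTER1;
    return COSTS_EQUAL;
}
```

With seq = {0.00, 2406.18} and bitmap = {204.09, 2404.30}, compare_costs_fuzzily(seq, bitmap, 1.01, 0) returns COSTS_BETTER1, so the seqscan wins; with a fuzz factor of exactly 1.0 the bitmap path would win on total cost instead.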

Cheers,

Jeff

Re: procost for to_tsvector

From: Bruce Momjian
Date: 01 May 2015, 01:34:08

On Wed, Mar 11, 2015 at 02:40:16PM +0000, Andrew Gierth wrote:
> An issue that comes up regularly on IRC is that text search queries,
> especially on relatively modest size tables or for relatively
> non-selective words, often misplan as a seqscan based on the fact that
> to_tsvector has procost=1.
> 
> Clearly this cost number is ludicrous.
> 
> Getting the right cost estimate would obviously mean taking the cost of
> detoasting into account, but even without doing that, there's a strong
> argument that it should be increased to at least the order of 100.
> (With the default cpu_operator_cost that would make each to_tsvector
> call cost 0.25.)
> 
> (The guy I was just helping on IRC was seeing a slowdown of 100x from a
> seqscan in a query that selected about 50 rows from about 500.)

Where are we on increasing procost for to_tsvector?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: procost for to_tsvector

From: Robert Haas
Date: 01 May 2015, 11:57:32

On Thu, Apr 30, 2015 at 9:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Mar 11, 2015 at 02:40:16PM +0000, Andrew Gierth wrote:
>> An issue that comes up regularly on IRC is that text search queries,
>> especially on relatively modest size tables or for relatively
>> non-selective words, often misplan as a seqscan based on the fact that
>> to_tsvector has procost=1.
>>
>> Clearly this cost number is ludicrous.
>>
>> Getting the right cost estimate would obviously mean taking the cost of
>> detoasting into account, but even without doing that, there's a strong
>> argument that it should be increased to at least the order of 100.
>> (With the default cpu_operator_cost that would make each to_tsvector
>> call cost 0.25.)
>>
>> (The guy I was just helping on IRC was seeing a slowdown of 100x from a
>> seqscan in a query that selected about 50 rows from about 500.)
>
> Where are we on increasing procost for to_tsvector?

We're waiting for you to commit the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: procost for to_tsvector

From: Bruce Momjian
Date: 01 May 2015, 13:14:05

On Fri, May  1, 2015 at 07:57:27AM -0400, Robert Haas wrote:
> On Thu, Apr 30, 2015 at 9:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Mar 11, 2015 at 02:40:16PM +0000, Andrew Gierth wrote:
> >> An issue that comes up regularly on IRC is that text search queries,
> >> especially on relatively modest size tables or for relatively
> >> non-selective words, often misplan as a seqscan based on the fact that
> >> to_tsvector has procost=1.
> >>
> >> Clearly this cost number is ludicrous.
> >>
> >> Getting the right cost estimate would obviously mean taking the cost of
> >> detoasting into account, but even without doing that, there's a strong
> >> argument that it should be increased to at least the order of 100.
> >> (With the default cpu_operator_cost that would make each to_tsvector
> >> call cost 0.25.)
> >>
> >> (The guy I was just helping on IRC was seeing a slowdown of 100x from a
> >> seqscan in a query that selected about 50 rows from about 500.)
> >
> > Where are we on increasing procost for to_tsvector?
>
> We're waiting for you to commit the patch.

OK, I have to write the patch first, so patch attached, using the cost
of 10.  I assume to_tsvector() is the only one needing changes.  The
patch will require a catalog bump too.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment: tsvector.diff

Re: procost for to_tsvector

From: Robert Haas
Date: 01 May 2015, 13:39:48

On Fri, May 1, 2015 at 9:13 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, May  1, 2015 at 07:57:27AM -0400, Robert Haas wrote:
>> On Thu, Apr 30, 2015 at 9:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Wed, Mar 11, 2015 at 02:40:16PM +0000, Andrew Gierth wrote:
>> >> An issue that comes up regularly on IRC is that text search queries,
>> >> especially on relatively modest size tables or for relatively
>> >> non-selective words, often misplan as a seqscan based on the fact that
>> >> to_tsvector has procost=1.
>> >>
>> >> Clearly this cost number is ludicrous.
>> >>
>> >> Getting the right cost estimate would obviously mean taking the cost of
>> >> detoasting into account, but even without doing that, there's a strong
>> >> argument that it should be increased to at least the order of 100.
>> >> (With the default cpu_operator_cost that would make each to_tsvector
>> >> call cost 0.25.)
>> >>
>> >> (The guy I was just helping on IRC was seeing a slowdown of 100x from a
>> >> seqscan in a query that selected about 50 rows from about 500.)
>> >
>> > Where are we on increasing procost for to_tsvector?
>>
>> We're waiting for you to commit the patch.
>
> OK, I have to write the patch first, so patch attached, using the cost
> of 10.  I assume to_tsvector() is the only one needing changes.  The
> patch will require a catalog bump too.

Andrew did the research to support a higher value, but even 10 should
be an improvement over what we have now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: procost for to_tsvector

From: Bruce Momjian
Date: 01 May 2015, 14:01:50

On Fri, May  1, 2015 at 09:39:43AM -0400, Robert Haas wrote:
> On Fri, May 1, 2015 at 9:13 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, May  1, 2015 at 07:57:27AM -0400, Robert Haas wrote:
> >> On Thu, Apr 30, 2015 at 9:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > On Wed, Mar 11, 2015 at 02:40:16PM +0000, Andrew Gierth wrote:
> >> >> An issue that comes up regularly on IRC is that text search queries,
> >> >> especially on relatively modest size tables or for relatively
> >> >> non-selective words, often misplan as a seqscan based on the fact that
> >> >> to_tsvector has procost=1.
> >> >>
> >> >> Clearly this cost number is ludicrous.
> >> >>
> >> >> Getting the right cost estimate would obviously mean taking the cost of
> >> >> detoasting into account, but even without doing that, there's a strong
> >> >> argument that it should be increased to at least the order of 100.
> >> >> (With the default cpu_operator_cost that would make each to_tsvector
> >> >> call cost 0.25.)
> >> >>
> >> >> (The guy I was just helping on IRC was seeing a slowdown of 100x from a
> >> >> seqscan in a query that selected about 50 rows from about 500.)
> >> >
> >> > Where are we on increasing procost for to_tsvector?
> >>
> >> We're waiting for you to commit the patch.
> >
> > OK, I have to write the patch first, so patch attached, using the cost
> > of 10.  I assume to_tsvector() is the only one needing changes.  The
> > patch will require a catalog bump too.
> 
> Andrew did the research to support a higher value, but even 10 should
> be an improvement over what we have now.

Yes, I saw that, but I didn't see him recommend an actual number.  Can
someone recommend a number now?   Tom initially recommended 10, but
Andrew's tests suggest something > 100.  Tom didn't do any tests so I
tend to favor Andrew's suggestion, if he has one.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: procost for to_tsvector

From: Robert Haas
Date: 01 May 2015, 14:03:11

On Fri, May 1, 2015 at 10:01 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> Andrew did the research to support a higher value, but even 10 should
>> be an improvement over what we have now.
>
> Yes, I saw that, but I didn't see him recommend an actual number.  Can
> someone recommend a number now?   Tom initially recommended 10, but
> Andrew's tests suggest something > 100.  Tom didn't do any tests so I
> tend to favor Andrew's suggestion, if he has one.

In the OP, he suggested "on the order of 100".  Maybe we could just go with 100.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: procost for to_tsvector

From: Bruce Momjian
Date: 01 May 2015, 14:12:03

On Fri, May  1, 2015 at 10:03:01AM -0400, Robert Haas wrote:
> On Fri, May 1, 2015 at 10:01 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> Andrew did the research to support a higher value, but even 10 should
> >> be an improvement over what we have now.
> >
> > Yes, I saw that, but I didn't see him recommend an actual number.  Can
> > someone recommend a number now?   Tom initially recommended 10, but
> > Andrew's tests suggest something > 100.  Tom didn't do any tests so I
> > tend to favor Andrew's suggestion, if he has one.
> 
> In the OP, he suggested "on the order of 100".  Maybe we could just go with 100.

OK, I will go with 100 unless I hear otherwise.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: procost for to_tsvector

From: Andres Freund
Date: 01 May 2015, 14:13:26

On 2015-05-01 10:03:01 -0400, Robert Haas wrote:
> Maybe we could just go with 100.

+1

Greetings,

Andres Freund

Re: procost for to_tsvector

From: Tom Lane
Date: 01 May 2015, 17:59:53

Robert Haas <robertmhaas@gmail.com> writes:
> In the OP, he suggested "on the order of 100".  Maybe we could just go with 100.

I'm OK with that in view of <87h9trs0zm.fsf@news-spur.riddles.org.uk> and
some experiments of my own, but I wonder why we are only thinking of
to_tsvector.  Isn't to_tsquery, for example, just about as expensive?
What of other text search functions?
        regards, tom lane

Re: procost for to_tsvector

From: Andrew Gierth
Date: 02 May 2015, 01:27:31

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
 >> In the OP, he suggested "on the order of 100".  Maybe we could just
 >> go with 100.

 Tom> I'm OK with that in view of <87h9trs0zm.fsf@news-spur.riddles.org.uk>

Note that the results from that post suggest 100 as a bare minimum,
higher values would be quite reasonable.

 Tom> and some experiments of my own, but I wonder why we are only
 Tom> thinking of to_tsvector.  Isn't to_tsquery, for example, just
 Tom> about as expensive?  What of other text search functions?

Making the same change for to_tsquery and plainto_tsquery would be
reasonable; that would help with the seqscan cost for cases like
to_tsvector('config',col) @@ to_tsquery('blah') where the non-immutable
form of to_tsquery is used. It doesn't seem to have shown up as an issue
in reports so far because the common usage patterns don't tend to have
it evaluated for each row (either the immutable form is used, or the
to_tsquery is evaluated in a different from-clause item).
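The distinction can be put in cost terms. An immutable call whose arguments are all constants, such as to_tsquery('english', 'blah'), is reduced to a constant before execution; the one-argument form is marked stable ('s') in the patch later in this thread, so it can end up being evaluated, and charged, on every row. A C sketch of the per-row filter charge, assuming the default cpu_operator_cost and procost 1 for the @@ operator's function (ts_match_vq, as in that patch); the helper is hypothetical and ignores every other part of the scan cost.

```c
#define CPU_OPERATOR_COST 0.0025
#define TS_MATCH_VQ_PROCOST 1.0   /* procost of @@ (tsvector, tsquery) */

/*
 * Hypothetical helper: per-row filter cost for
 *     to_tsvector('config', col) @@ <tsquery expression>
 * If query_is_constant is nonzero, the tsquery expression has been folded
 * to a constant before execution and adds nothing per row.
 */
static double per_row_filter_cost(double tsvector_procost,
                                  double tsquery_procost,
                                  int query_is_constant)
{
    double procost_sum = tsvector_procost + TS_MATCH_VQ_PROCOST;

    if (!query_is_constant)
        procost_sum += tsquery_procost;
    return procost_sum * CPU_OPERATOR_COST;
}
```

With both functions at procost 100, the per-row charge is about 0.2525 when the query is a folded constant and about 0.5025 when to_tsquery runs on every row.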

I don't recall seeing cases of any of the other functions figuring into
planner decisions.

-- 
Andrew (irc:RhodiumToad)

Re: procost for to_tsvector

From: Tom Lane
Date: 02 May 2015, 04:42:15

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>>> In the OP, he suggested "on the order of 100".  Maybe we could just
>>> go with 100.

>  Tom> I'm OK with that in view of <87h9trs0zm.fsf@news-spur.riddles.org.uk>

> Note that the results from that post suggest 100 as a bare minimum,
> higher values would be quite reasonable.

I'm not entirely convinced that your experiments disentangled the CPU cost
of to_tsvector itself from the costs of detoasting its input, which is an
issue that we ought to address separately.  In particular, comparing to
textlen() is unreliable for this purpose since in single-byte encodings
textlen() does not have to dereference a TOAST pointer at all.

It is possible to prove that to_tsvector() is much more expensive per-byte
than, say, md5():

regression=# select sum(length((repeat('xyzzy ', i)))) from generate_series(1,10000) i;
    sum
-----------
 300030000
(1 row)

Time: 360.423 ms
regression=# select sum(length(md5(repeat('xyzzy ', i)))) from generate_series(1,10000) i;
  sum
--------
 320000
(1 row)

Time: 1339.806 ms
regression=# select sum(length(to_tsvector(repeat('xyzzy ', i)))) from generate_series(1,10000) i;
  sum
-------
 10000
(1 row)

Time: 78564.333 ms

These numbers put md5() at about 3.3 nsec/input byte on my machine, and
to_tsvector() with the 'english' configuration at about 260 nsec/byte.
It's certainly possible that lots of repetitions of 'xyzzy ' isn't a very
representative sample of typical to_tsvector input; but at least this
test does not involve any toasted-value access.  So, as I said, I'm okay
with costing to_tsvector() at 100x the cost of md5().  I'm not convinced
that any factor above that is to_tsvector's fault.
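The per-byte figures can be reproduced from the three timings. The input is the sum over i = 1..10000 of 6·i bytes, which matches the 300030000 in the first result, and subtracting the first query's time as a baseline leaves the cost of md5() or to_tsvector() alone. A C sketch of that arithmetic, with hypothetical helper names:

```c
/* Bytes produced by repeat('xyzzy ', i) for i = 1..n: unit_len * n(n+1)/2. */
static double total_input_bytes(long n, int unit_len)
{
    return (double) unit_len * n * (n + 1) / 2.0;
}

/* Nanoseconds per input byte, after subtracting the baseline query's time. */
static double ns_per_byte(double query_ms, double baseline_ms, double bytes)
{
    return (query_ms - baseline_ms) * 1e6 / bytes;
}
```

This gives about 3.26 ns/byte for md5() and about 260.6 ns/byte for to_tsvector(), a ratio of roughly 80.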

>  Tom> and some experiments of my own, but I wonder why we are only
>  Tom> thinking of to_tsvector.  Isn't to_tsquery, for example, just
>  Tom> about as expensive?  What of other text search functions?

> Making the same change for to_tsquery and plainto_tsquery would be
> reasonable; that would help with the seqscan cost for cases like
> to_tsvector('config',col) @@ to_tsquery('blah') where the non-immutable
> form of to_tsquery is used.

Works for me.

> I don't recall seeing cases of any of the other functions figuring into
> planner decisions.

It's not so much "are they popular" as "do they involve parsing raw
text".  Once you've got the tsvector or tsquery, later steps are
(I think) much more efficient.
        regards, tom lane

Re: procost for to_tsvector

From: Tom Lane
Date: 03 May 2015, 18:05:14

I wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Tom> and some experiments of my own, but I wonder why we are only
>> Tom> thinking of to_tsvector.  Isn't to_tsquery, for example, just
>> Tom> about as expensive?  What of other text search functions?

>> Making the same change for to_tsquery and plainto_tsquery would be
>> reasonable; that would help with the seqscan cost for cases like
>> to_tsvector('config',col) @@ to_tsquery('blah') where the non-immutable
>> form of to_tsquery is used.

> Works for me.

>> I don't recall seeing cases of any of the other functions figuring into
>> planner decisions.

> It's not so much "are they popular" as "do they involve parsing raw
> text".  Once you've got the tsvector or tsquery, later steps are
> (I think) much more efficient.

I poked at this a bit more, and noted that:

* ts_headline() also parses input text, and is demonstrably at least as
expensive per-input-byte as to_tsvector.

* ts_match_tt() and ts_match_tq() invoke to_tsvector internally,
and thus should certainly have as great a cost.

* tsquery_rewrite_query() actually executes a SQL query given as a string,
with cost that is uncertain, but treating it as a unit-cost function is
surely completely silly.  Since our default cost for PL-language functions
is 100, probably setting this one to 100 as well is a reasonable proposal.

So I think we should set procost for all of these functions to 100, as
per attached.  Any objections?

            regards, tom lane

diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 55c246e..0a0b2bb 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 3625 (  tsvector_conca
*** 4494,4501 ****

  DATA(insert OID = 3634 (  ts_match_vq            PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "3614 3615" _null_ _null_ _null_ _null_ _null_ ts_match_vq _null_ _null_ _null_ ));
  DATA(insert OID = 3635 (  ts_match_qv            PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "3615 3614" _null_ _null_ _null_ _null_ _null_ ts_match_qv _null_ _null_ _null_ ));
! DATA(insert OID = 3760 (  ts_match_tt            PGNSP PGUID 12 3 0 0 0 f f f f t f s 2 0 16 "25 25" _null_ _null_ _null_ _null_ _null_ ts_match_tt _null_ _null_ _null_ ));
! DATA(insert OID = 3761 (  ts_match_tq            PGNSP PGUID 12 2 0 0 0 f f f f t f s 2 0 16 "25 3615" _null_ _null_ _null_ _null_ _null_ ts_match_tq _null_ _null_ _null_ ));

  DATA(insert OID = 3648 (  gtsvector_compress    PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ gtsvector_compress _null_ _null_ _null_ ));
  DESCR("GiST tsvector support");
--- 4494,4501 ----

  DATA(insert OID = 3634 (  ts_match_vq            PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "3614 3615" _null_ _null_ _null_ _null_ _null_ ts_match_vq _null_ _null_ _null_ ));
  DATA(insert OID = 3635 (  ts_match_qv            PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "3615 3614" _null_ _null_ _null_ _null_ _null_ ts_match_qv _null_ _null_ _null_ ));
! DATA(insert OID = 3760 (  ts_match_tt            PGNSP PGUID 12 100 0 0 0 f f f f t f s 2 0 16 "25 25" _null_ _null_ _null_ _null_ _null_ ts_match_tt _null_ _null_ _null_ ));
! DATA(insert OID = 3761 (  ts_match_tq            PGNSP PGUID 12 100 0 0 0 f f f f t f s 2 0 16 "25 3615" _null_ _null_ _null_ _null_ _null_ ts_match_tq _null_ _null_ _null_ ));

  DATA(insert OID = 3648 (  gtsvector_compress    PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ gtsvector_compress _null_ _null_ _null_ ));
  DESCR("GiST tsvector support");
*************** DESCR("show real useful query for GiST i
*** 4554,4560 ****

  DATA(insert OID = 3684 (  ts_rewrite        PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 3615 "3615 3615 3615" _null_ _null_ _null_ _null_ _null_ tsquery_rewrite _null_ _null_ _null_ ));
  DESCR("rewrite tsquery");
! DATA(insert OID = 3685 (  ts_rewrite        PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 3615 "3615 25" _null_ _null_ _null_ _null_ _null_ tsquery_rewrite_query _null_ _null_ _null_ ));
  DESCR("rewrite tsquery");

  DATA(insert OID = 3695 (  gtsquery_compress                PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ gtsquery_compress _null_ _null_ _null_ ));
--- 4554,4560 ----

  DATA(insert OID = 3684 (  ts_rewrite        PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 3615 "3615 3615 3615" _null_ _null_ _null_ _null_ _null_ tsquery_rewrite _null_ _null_ _null_ ));
  DESCR("rewrite tsquery");
! DATA(insert OID = 3685 (  ts_rewrite        PGNSP PGUID 12 100 0 0 0 f f f f t f v 2 0 3615 "3615 25" _null_ _null_ _null_ _null_ _null_ tsquery_rewrite_query _null_ _null_ _null_ ));
  DESCR("rewrite tsquery");

  DATA(insert OID = 3695 (  gtsquery_compress                PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ gtsquery_compress _null_ _null_ _null_ ));
*************** DESCR("(internal)");
*** 4644,4669 ****
  DATA(insert OID = 3741 (  thesaurus_lexize    PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ _null_ thesaurus_lexize _null_ _null_ _null_ ));
  DESCR("(internal)");

! DATA(insert OID = 3743 (  ts_headline    PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 25 "3734 25 3615 25" _null_ _null_ _null_ _null_ _null_ ts_headline_byid_opt _null_ _null_ _null_ ));
  DESCR("generate headline");
! DATA(insert OID = 3744 (  ts_headline    PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 25 "3734 25 3615" _null_ _null_ _null_ _null_ _null_ ts_headline_byid _null_ _null_ _null_ ));
  DESCR("generate headline");
! DATA(insert OID = 3754 (  ts_headline    PGNSP PGUID 12 1 0 0 0 f f f f t f s 3 0 25 "25 3615 25" _null_ _null_ _null_ _null_ _null_ ts_headline_opt _null_ _null_ _null_ ));
  DESCR("generate headline");
! DATA(insert OID = 3755 (  ts_headline    PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 25 "25 3615" _null_ _null_ _null_ _null_ _null_ ts_headline _null_ _null_ _null_ ));
  DESCR("generate headline");

! DATA(insert OID = 3745 (  to_tsvector        PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 3614 "3734 25" _null_ _null_ _null_ _null_ _null_ to_tsvector_byid _null_ _null_ _null_ ));
  DESCR("transform to tsvector");
! DATA(insert OID = 3746 (  to_tsquery        PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 3615 "3734 25" _null_ _null_ _null_ _null_ _null_ to_tsquery_byid _null_ _null_ _null_ ));
  DESCR("make tsquery");
! DATA(insert OID = 3747 (  plainto_tsquery    PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 3615 "3734 25" _null_ _null_ _null_ _null_ _null_ plainto_tsquery_byid _null_ _null_ _null_ ));
  DESCR("transform to tsquery");
! DATA(insert OID = 3749 (  to_tsvector        PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 3614 "25" _null_ _null_ _null_ _null_ _null_ to_tsvector _null_ _null_ _null_ ));
  DESCR("transform to tsvector");
! DATA(insert OID = 3750 (  to_tsquery        PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 3615 "25" _null_ _null_ _null_ _null_ _null_ to_tsquery _null_ _null_ _null_ ));
  DESCR("make tsquery");
! DATA(insert OID = 3751 (  plainto_tsquery    PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 3615 "25" _null_ _null_ _null_ _null_ _null_ plainto_tsquery _null_ _null_ _null_ ));
  DESCR("transform to tsquery");

  DATA(insert OID = 3752 (  tsvector_update_trigger            PGNSP PGUID 12 1 0 0 0 f f f f f f v 0 0 2279 "" _null_ _null_ _null_ _null_ _null_ tsvector_update_trigger_byid _null_ _null_ _null_ ));
--- 4644,4669 ----
  DATA(insert OID = 3741 (  thesaurus_lexize    PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ _null_ thesaurus_lexize _null_ _null_ _null_ ));
  DESCR("(internal)");

! DATA(insert OID = 3743 (  ts_headline    PGNSP PGUID 12 100 0 0 0 f f f f t f i 4 0 25 "3734 25 3615 25" _null_ _null_ _null_ _null_ _null_ ts_headline_byid_opt _null_ _null_ _null_ ));
  DESCR("generate headline");
! DATA(insert OID = 3744 (  ts_headline    PGNSP PGUID 12 100 0 0 0 f f f f t f i 3 0 25 "3734 25 3615" _null_ _null_ _null_ _null_ _null_ ts_headline_byid _null_ _null_ _null_ ));
  DESCR("generate headline");
! DATA(insert OID = 3754 (  ts_headline    PGNSP PGUID 12 100 0 0 0 f f f f t f s 3 0 25 "25 3615 25" _null_ _null_ _null_ _null_ _null_ ts_headline_opt _null_ _null_ _null_ ));
  DESCR("generate headline");
! DATA(insert OID = 3755 (  ts_headline    PGNSP PGUID 12 100 0 0 0 f f f f t f s 2 0 25 "25 3615" _null_ _null_ _null_ _null_ _null_ ts_headline _null_ _null_ _null_ ));
  DESCR("generate headline");

! DATA(insert OID = 3745 (  to_tsvector        PGNSP PGUID 12 100 0 0 0 f f f f t f i 2 0 3614 "3734 25" _null_ _null_ _null_ _null_ _null_ to_tsvector_byid _null_ _null_ _null_ ));
  DESCR("transform to tsvector");
! DATA(insert OID = 3746 (  to_tsquery        PGNSP PGUID 12 100 0 0 0 f f f f t f i 2 0 3615 "3734 25" _null_ _null_ _null_ _null_ _null_ to_tsquery_byid _null_ _null_ _null_ ));
  DESCR("make tsquery");
! DATA(insert OID = 3747 (  plainto_tsquery    PGNSP PGUID 12 100 0 0 0 f f f f t f i 2 0 3615 "3734 25" _null_ _null_ _null_ _null_ _null_ plainto_tsquery_byid _null_ _null_ _null_ ));
  DESCR("transform to tsquery");
! DATA(insert OID = 3749 (  to_tsvector        PGNSP PGUID 12 100 0 0 0 f f f f t f s 1 0 3614 "25" _null_ _null_ _null_ _null_ _null_ to_tsvector _null_ _null_ _null_ ));
  DESCR("transform to tsvector");
! DATA(insert OID = 3750 (  to_tsquery        PGNSP PGUID 12 100 0 0 0 f f f f t f s 1 0 3615 "25" _null_ _null_ _null_ _null_ _null_ to_tsquery _null_ _null_ _null_ ));
  DESCR("make tsquery");
! DATA(insert OID = 3751 (  plainto_tsquery    PGNSP PGUID 12 100 0 0 0 f f f f t f s 1 0 3615 "25" _null_ _null_ _null_ _null_ _null_ plainto_tsquery _null_ _null_ _null_ ));
  DESCR("transform to tsquery");

  DATA(insert OID = 3752 (  tsvector_update_trigger            PGNSP PGUID 12 1 0 0 0 f f f f f f v 0 0 2279 "" _null_ _null_ _null_ _null_ _null_ tsvector_update_trigger_byid _null_ _null_ _null_ ));