Re: BUG #16241: Degraded hash join performance - Mailing list pgsql-bugs

From Thomas Butz
Subject Re: BUG #16241: Degraded hash join performance
Date
Msg-id 1965637501.286993.1580900801394.JavaMail.zimbra@optitool.de
Whole thread Raw
In response to Re: BUG #16241: Degraded hash join performance  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
I've opened an issue: https://github.com/giggls/mapnik-german-l10n/issues/40

I suspect that the number of executed regexp_replace calls is the culprit here.
The cache of regexp.c seems to be limited to 32 entries which might be to low to keep all involved regexes cached.


> Hi,
> 
> On 2020-02-04 11:00:29 -0500, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > Interesting! The no-children one clearly shows that a lot of the the
>> > time is spent evaluating regular expressions (there's other regex
>> > functions in the profile too):
>> >     23.36%  postgres  postgres            [.] subcolor
>> 
>> Huh ...
>> 
>> > I'm not aware of any relevant regular expression evaluation changes
>> > between 11 and 12. Tom, does this trigger anything?
>> 
>> (1) Nope, I'm not either; the last non-back-patched change in that
>> code was c54159d44 in v10.
>> 
>> (2) subcolor() is part of regex compilation, not execution, which makes
>> one wonder why it's showing up at all.  Maybe the regex cache in
>> adt/regexp.c is overflowing and preventing useful caching?  But
>> that didn't change in v12 either.  Are these test cases really
>> 100% equivalent?  I'm wondering if there are a few more "hot"
>> regex patterns in the v12 data ...
> 
> They are not 100% equivalent, but the part of the plan we see is very
> similar rowcount wise. It's possible that the functions differ more
> however, there are different postgis versions involved, and apparently
> also an "osml10n" extension.
> 
> 
>> (3) Where the heck is the regex use coming from at all?  I don't
>> see any regex operators in the plan.  Maybe it's inside the
>> plpgsql function?
> 
> It definitely is. The stack shows at least two levels of plpgsql
> functions. And Thomas has since confirmed that removing the functioncall
> fixes the issue.
> 
> Based on the name I think this is somewhere around this:
> https://github.com/giggls/mapnik-german-l10n/blob/master/plpgsql/get_localized_name_from_tags.sql#L120
> The callgraph indicates that most of the cost comes from within
> textregexreplace_noopt.
> 
> Not clear why the cache isn't fixing this - there are no variables in
> the regexp_replace calls as far as I can see.
> 
> Greetings,
> 
> Andres Freund

--

Thomas Butz



pgsql-bugs by date:

Previous
From: Arseny Sher
Date:
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Next
From: Stephen Frost
Date:
Subject: Re: BUG #16234: LDAP Query