Home > mailing lists

Re: wildcard text filter switched to boolean column, performance is way worse - Mailing list pgsql-performance

From	Mike Broers
Subject	Re: wildcard text filter switched to boolean column, performance is way worse
Date	July 7, 2015 16:10:25
Msg-id	CAB9893h90mR4-HFLFp+k=HCDWFxvrbxs08vDN37k82D+gSDHAA@mail.gmail.com Whole thread
In response to	Re: wildcard text filter switched to boolean column, performance is way worse (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: wildcard text filter switched to boolean column, performance is way worse
List	pgsql-performance

Tree view

Thanks, very informative! I'll experiment with work_mem settings and report back.

On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mike Broers <mbroers@gmail.com> writes:
> I had a query that was filtering with a wildcard search of a text field for
> %SUCCESS%. The query took about 5 seconds and was running often so I wanted
> to improve it. I suggested that the engineers include a new boolean column
> for successful status. They implemented the requested field, but the query
> that filters on that new column runs very long (i kill it after letting it
> run for about an hour). Can someone help me understand why that is the
> case and how to resolve it?

It's hashing the subplan output in the first case and not the second:

> Seq Scan on lead (cost=130951.81..158059.21 rows=139957 width=369) (actual
> time=4699.619..4699.869 rows=1 loops=1)
> Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
> ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))
^^^^^^^^^^^^^^^^
vs

> Seq Scan on lead (cost=85775.78..9005687281.12 rows=139957 width=369)
> Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
> ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
^^^^^^^^^

Presumably, the new more-accurate rows count causes the planner to realize
that the hash table will exceed work_mem so it doesn't choose to hash ...
but for your situation, you'd rather it did, because what you're getting
instead is a Materialize node that spills to disk (again, because the data
involved exceeds work_mem) and that's a killer for this query. You should
be able to get back the old behavior if you raise work_mem enough.

Another idea you might think about is changing the OR'd IN conditions
to a single IN over a UNION ALL of the subselects. I'm not really sure if
that would produce a better plan, but it's worth trying if it wouldn't
require too much app-side contortion.

regards, tom lane

pgsql-performance by date:

From: Tom Lane
Date: 07 July 2015, 16:02:56
Subject: Re: wildcard text filter switched to boolean column, performance is way worse

From: Vitalii Tymchyshyn
Date: 07 July 2015, 16:28:05
Subject: Re: New server: SSD/RAID recommendations?

Re: wildcard text filter switched to boolean column, performance is way worse - Mailing list pgsql-performance

Previous

Next