Re: wildcard text filter switched to boolean column, performance is way worse - Mailing list pgsql-performance

From Marc Mamin
Subject Re: wildcard text filter switched to boolean column, performance is way worse
Date
Msg-id B6F6FD62F2624C4C9916AC0175D56D8828BF1096@jenmbs01.ad.intershop.net
Whole thread Raw
In response to Re: wildcard text filter switched to boolean column, performance is way worse  (Mike Broers <mbroers@gmail.com>)
List pgsql-performance
Hello,
> 
> 
> 
> 
> 
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Mike
Broers
> Sent: Dienstag, 7. Juli 2015 18:28
> To: Tom Lane
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] wildcard text filter switched to boolean column, performance is way worse
> 
> After bumping up work_mem from 12MB to 25MB that last materialize is indeed hashing and this cut the query time by
about60%.  Thanks, this was very helpful and gives me something else to look for when troubleshooting explains.  
 
> 
> 
> 
> On Tue, Jul 7, 2015 at 11:10 AM, Mike Broers <mbroers@gmail.com> wrote:
> Thanks, very informative! I'll experiment with work_mem settings and report back.
> 
> On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mike Broers <mbroers@gmail.com> writes:
> > I had a query that was filtering with a wildcard search of a text field for
> > %SUCCESS%. The query took about 5 seconds and was running often so I wanted
> > to improve it.  I suggested that the engineers include a new boolean column
> > for successful status.  They implemented the requested field, but the query
> > that filters on that new column runs very long (i kill it after letting it
> > run for about an hour).  Can someone help me understand why that is the
> > case and how to resolve it?
> 
> It's hashing the subplan output in the first case and not the second:
> 
> > Seq Scan on lead  (cost=130951.81..158059.21 rows=139957 width=369) (actual
> > time=4699.619..4699.869 rows=1 loops=1)
> >     Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))
>                                                    ^^^^^^^^^^^^^^^^
> vs
> 
> >   Seq Scan on lead  (cost=85775.78..9005687281.12 rows=139957 width=369)
> >     Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
>                                                    ^^^^^^^^^
> 
> Presumably, the new more-accurate rows count causes the planner to realize
> that the hash table will exceed work_mem so it doesn't choose to hash ...
> but for your situation, you'd rather it did, because what you're getting
> instead is a Materialize node that spills to disk (again, because the data
> involved exceeds work_mem) and that's a killer for this query.  You should
> be able to get back the old behavior if you raise work_mem enough.
> 
> Another idea you might think about is changing the OR'd IN conditions
> to a single IN over a UNION ALL of the subselects.  I'm not really sure if
> that would produce a better plan, but it's worth trying if it wouldn't
> require too much app-side contortion.


Hello,
you might try to use a CTE to first collect the IDs to exclude, and join them to your main table.
This should result in an anti join pattern.

Something like:

WITH IDS as (
    SELECT U1."lead_id" AS "lead_id" 
    FROM "event" U1 
    WHERE U1."event_type" ='type_1'
    UNION (
       SELECT U1."lead_id" AS "lead_id" 
       FROM "event" U1 
       WHERE U1."event_type" = 'type_2'
       INTERSECT
       SELECT U1."lead_id" AS "lead_id" 
       FROM "event" U1 
       WHERE successful
    )
)
SELECT * FROM lead LEFT OUTER JOIN IDS ON (lead.id=IDS.lead_id)
WHERE IDS.lead_id IS NULL;

regards,

Marc Mamin






>                         regards, tom lane
> 
> 
>

pgsql-performance by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Next
From: "Graeme B. Bell"
Date:
Subject: Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?