Thread: A more efficient way?

A more efficient way?

From
James Cloos
Date:
I've a third-party app which is slower than it ought to be.

It does one operation in particular by doing a query to get a list of
rows, and then iterates though them to confirm whether it actually wants
that row.  As you can imagine that is very slow.

This query gets the correct data set in one go, but is also slow:
select p from m where s = 7 and p not in (select p from m where s != 7);

The p column is not unique; the goal is the set of p for which *every*
row with a matching p also has s=7.  (s and p are both integer columns,
if that matters.)

That takes about 38 seconds with this (explain analyze) plan:
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------Index
Scanusing m_5 on m  (cost=8141.99..11519.73 rows=1 width=4) (actual time=564.689..37964.163 rows=243 loops=1)  Index
Cond:(s = 7)  Filter: (NOT (SubPlan 1))  SubPlan 1    ->  Materialize  (cost=8141.99..13586.24 rows=523955 width=4)
(actualtime=0.003..80.734 rows=523948 loops=243)          ->  Seq Scan on m  (cost=0.00..7413.34 rows=523955 width=4)
(actualtime=0.023..259.901 rows=523948 loops=1)                Filter: (s <> 7)Total runtime: 38121.781 ms
 

The index m_5 is btree (s).  

Can that be done is a way which requires only 1 loop?

My understanding is that:
select p from m where s = 7  except select p from m where s != 7;

will return the same results as a simple 'select p from m where s = 7', yes?

Is there a way to do it with a self join which requires just a single loop?

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6


Re: A more efficient way?

From
Tom Lane
Date:
James Cloos <cloos@jhcloos.com> writes:
> I've a third-party app which is slower than it ought to be.
> It does one operation in particular by doing a query to get a list of
> rows, and then iterates though them to confirm whether it actually wants
> that row.  As you can imagine that is very slow.

> This query gets the correct data set in one go, but is also slow:

>  select p from m where s = 7 and p not in (select p from m where s != 7);

See if you can recast it as a NOT EXISTS.  NOT IN is hard to optimize
because of its weird behavior with nulls.

If you're working with a less-than-current version of PG, you may
instead have to resort to a left-join-with-is-null locution, ieselect m.p from m left join(select p from m where s !=
7)m2on (m.p=m2.p)where m2.p is null and m.s = 7;
 
but it's hard to wrap your head around that sort of thing, so I'd
advise against using it if you can get decent performance with EXISTS.
        regards, tom lane