Thread: MINUS and slow 'not in'
All, I've got a small problem. Say you have tables A and B. They both have a
user_id column. Table B was previously filled with entries selected from
table A, say about 2,000 out of 40,000. Now I want to select everything from
A that isn't in B, so about 38,000 entries.

I can't seem to get MINUS to work within a select statement; all I ever get
are parse errors. Is this even implemented yet?

I then tried using a 'not in' clause:

    select * from A where user_id not in (select * from B);

This is VERY slow, and examining the explain output tells me that it will
use the user_id index for table B, but a sequential scan of A, even though A
has an index on the user_id column. Am I missing something? Does anyone have
any ideas?

Thanks for any help.

-=pierre
At 6:53 +0200 on 24/11/98, pierre wrote:

> I then tried using a 'not in' clause.
>
>     select * from A where user_id not in (select * from B);
>
> This is VERY slow, and examining the explain output tells me that it will
> use the user_id index for table B, but a sequential scan of A even though
> A has an index for the user_id column.

First, I assume you meant "select user_id from B", not "select *", or
something is very strange here.

Anyway, suppose you had two tables. How would you go about doing this while
using *both* indices? I don't think it's possible. You have a condition that
says: include each row which doesn't meet a certain criterion. The only way
to do that is to scan each row of A, get the value of its user_id, then go
to B and use its index to check whether the user_id we already have is NOT
there. You can use an index only when you have a specific value to search
for. A NOT IN clause doesn't supply a specific value, so you can't use the
outer index.

You may try to convert the NOT IN to a NOT EXISTS clause, and see if it
improves anything, but it will still require a sequential scan.

If I needed this query often, I'd try to optimize it by adding a column to
table A, marking the records that match, and then selecting all the records
which don't match. I'm not sure whether one can index a boolean field in the
current version of PostgreSQL, but if not, you can probably use a char field
instead. I suppose you can make sure this column stays up to date with
rules, or do the update as a preparatory step: update all rows to "N", then
update all the rows that match to "Y" with a join. VACUUM, ANALYZE, and then
you can start selecting.

This requires two sequential scans plus vacuums before you start selecting,
so it may not be worth it if you only select once by this criterion. In that
case I'd go with the NOT IN or NOT EXISTS solution, which gives you one
sequential scan with a minimal search over the index of table B:
SELECT * FROM A
WHERE NOT EXISTS (
    SELECT * FROM B
    WHERE B.user_id = A.user_id
);

By the way, if you have any specific criteria on A, besides the NOT EXISTS
or NOT IN, they may cause an index scan on A as well.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
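[Editor's sketch: both approaches Herouth describes, the correlated NOT EXISTS
anti-join and the flag-column workaround, can be reproduced on toy data. This
uses SQLite purely for illustration; the tables `a`/`b` and their sizes are
stand-ins for the 40k/2k tables in the thread, not the original schema.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for table A (larger) and table B (a subset of A).
cur.execute("CREATE TABLE a (user_id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE b (user_id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(10)])
cur.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(3)])

# The correlated NOT EXISTS form: scan a once, and for each row probe
# b's index for the *absence* of a matching user_id.
missing = cur.execute(
    """SELECT user_id FROM a
       WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.user_id = a.user_id)
       ORDER BY user_id"""
).fetchall()
print([r[0] for r in missing])  # -> [3, 4, 5, 6, 7, 8, 9]

# The flag-column alternative: mark matching rows once as a preparatory
# step, then select by the flag ('N' = not present in b).
cur.execute("ALTER TABLE a ADD COLUMN matched CHAR(1)")
cur.execute("UPDATE a SET matched = 'N'")
cur.execute("""UPDATE a SET matched = 'Y'
               WHERE EXISTS (SELECT 1 FROM b WHERE b.user_id = a.user_id)""")
flagged = cur.execute(
    "SELECT user_id FROM a WHERE matched = 'N' ORDER BY user_id"
).fetchall()
```

Both queries return the same rows; the flag column only pays off if the
"not in B" selection is run repeatedly between updates.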
> At 6:53 +0200 on 24/11/98, pierre wrote:
>
> > I then tried using a 'not in' clause.
> >
> >     select * from A where user_id not in (select * from B);
> >
> > This is VERY slow, and examining the explain output tells me that it
> > will use the user_id index for table B, but a sequential scan of A even
> > though A has an index for the user_id column.
>
> First, I assume you meant "select user_id from B", not "select *", or
> something is very strange here.
>
> You may try to convert the NOT IN to a NOT EXISTS clause, and see if it
> improves anything, but it will still require a sequential scan.
>
> SELECT * FROM A
> WHERE NOT EXISTS (
>     SELECT * FROM B
>     WHERE B.user_id = A.user_id
> );
>
> By the way, if you have any specific criteria on A, besides the NOT EXISTS
> or NOT IN, they may cause an index scan on A as well.

Ok... remember that I have table A with 40k rows, and B with 2k. What I
really want to get out of the query are 2k rows from A that are not
contained in B. After reading your email, I thought about using a cursor and
fetching only the first 2k rows that match the query. This helped
tremendously, in that it didn't try to return all 38k rows. However, I now
need to take the results of the fetch and dump them into table B. How can
one use FETCH to insert? I've tried:

    insert into B
    fetch 2000 from fubar;

which just gives a parser error. There is very little documentation on
cursors written up that I can find. I've even searched the email archives.
Ideas?

-=pierre
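[Editor's sketch: the client-side version of pierre's idea, open the query,
pull only the first N rows, then insert that batch in a separate statement,
can be illustrated with SQLite's DB-API cursors, where `fetchmany(n)` plays
the role of `FETCH n FROM fubar`. Table names and sizes are illustrative.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (user_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(100)])

# Open the anti-join query but do NOT pull all matching rows;
# fetchmany(20) retrieves just the first batch, as FETCH 20 would
# from a server-side cursor.
reader = conn.execute(
    """SELECT user_id FROM a
       WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.user_id = a.user_id)
       ORDER BY user_id"""
)
batch = reader.fetchmany(20)

# The insert is a second, separate statement -- the step that
# "insert into B fetch 2000 from fubar" tried (and failed) to do in one go.
conn.executemany("INSERT INTO b VALUES (?)", batch)
total = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
print(total)  # -> 20
```

In psql against PostgreSQL of that era, the equivalent would be a
`DECLARE fubar CURSOR FOR SELECT ...` inside a transaction, a `FETCH 2000
FROM fubar`, and then inserting the fetched rows from the client side.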
At 16:57 +0200 on 24/11/98, pierre@desertmoon.com wrote:

> I've tried...
>
>     insert into B
>     fetch 2000 from fubar;
>
> Which just gives a parser error. There is very little documentation on
> cursors written up that I can find. I've even searched the email archives.
> Ideas?

Well, this usage of a cursor is not supported, as far as I know. However, if
you have 6.4 (do you?), you can use SET QUERY_LIMIT to limit the number of
rows fetched by the SELECT. I suppose you can set it back after you do the
INSERT.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
> At 16:57 +0200 on 24/11/98, pierre@desertmoon.com wrote:
>
> > I've tried...
> >
> >     insert into B
> >     fetch 2000 from fubar;
> >
> > Which just gives a parser error. There is very little documentation on
> > cursors written up that I can find. I've even searched the email
> > archives. Ideas?
>
> Well, this usage of a cursor is not supported, as far as I know. However,
> if you have 6.4 (do you?), you can use SET QUERY_LIMIT to limit the number
> of rows fetched by the SELECT. I suppose you can set it back after you do
> the INSERT.

Yeah, I've got 6.4. I tried:

    set query_limit to 2000;

and got:

    ERROR:  parser: parse error at or near "2000"

Ideas?

-=pierre
At 17:31 +0200 on 24/11/98, pierre@desertmoon.com wrote:

> Yeah, I've got 6.4. I tried:
>
>     set query_limit to 2000;
>
> and got:
>
>     ERROR:  parser: parse error at or near "2000"
>
> Ideas?

Well, I don't have 6.4 as yet. However, reading the manpage, I surmise that
the value passed is a string value (to support "SET QUERY_LIMIT TO
'unlimited'"). Thus, try quoting the 2000...

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
> Yeah, I've got 6.4. I tried:
>
>     set query_limit to 2000;
>
> and got:
>
>     ERROR:  parser: parse error at or near "2000"
>
> Ideas?

I think you must use '2000' instead.

Anyway, the "set query_limit" will disappear again in v6.5 (at least)
because it potentially can break rewrite rule system semantics.

Instead you might want to use the LIMIT/OFFSET patch I've created in the
v6.4 feature patch. This is what will be in v6.5. I'll take a look at how I
can put it onto the server and drop a note here afterwards.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
LIMIT patch available (was: Re: [SQL] MINUS and slow 'not in')
From: jwieck@debis.com (Jan Wieck)
Date:
> > > Yeah, I've got 6.4. I tried:
> > >
> > >     set query_limit to 2000;
> > >
> > > and got:
> > >
> > >     ERROR:  parser: parse error at or near "2000"
> > >
> > > Ideas?
>
> I think you must use '2000' instead.
>
> Anyway, the "set query_limit" will disappear again in v6.5 (at least)
> because it potentially can break rewrite rule system semantics.
>
> Instead you might want to use the LIMIT/OFFSET patch I've created in the
> v6.4 feature patch. This is what will be in v6.5. I'll take a look at how
> I can put it onto the server and drop a note here afterwards.

Done. URL is:

    ftp://ftp.postgresql.org/pub/patches/v6.4-feature-patch.tar.gz

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
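[Editor's sketch: with LIMIT available in the SELECT (as in the patch Jan
describes, and in every later PostgreSQL release), the cursor-plus-client-
side-insert dance collapses into a single statement. Illustrated with
SQLite, which accepts the same LIMIT syntax; table names and the cap of 20
are placeholders for the thread's 2000.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (user_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(100)])

# One statement instead of DECLARE + FETCH + client-side insert:
# take the first 20 rows of a that are missing from b, capped by LIMIT,
# and feed them straight into b.
conn.execute(
    """INSERT INTO b
       SELECT user_id FROM a
       WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.user_id = a.user_id)
       ORDER BY user_id
       LIMIT 20"""
)
inserted = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
print(inserted)  # -> 20
```

Rerunning the same INSERT picks up the *next* batch of missing rows, since
the NOT EXISTS condition now excludes the rows already copied.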
Hi everybody.

I've been trying to connect a win-machine-toaster-oven to my PostgreSQL
server through psqlodbc (from Insight)... but when I get to the "File DSN"
configuration, everything fails: it tells me that I have a user
authentication problem in the settings. But my pg_hba.conf doesn't have
restrictions on the users or hosts that can connect.

I'm using version 6.3.2 with the latest patched psqlodbc. What do you think
I should do???

Do I have to get Win 98??? :)

Merry .Xmas

--
"When Microdog barks, it's because we keep on walking..."
                                                   M.C.S. et al

David Martinez Cuevas
Office 622-60-72     "Eat Linux, Drink Linux... SMOKE LINUX"
Home 565-25-17
Have you checked out:

    http://www.insightdist.com/psqlodbc/psqlodbc_faq.html#dsnsetup

Have you established any connections from a remote machine? If not, be sure
to use the "-i" option on postmaster startup. If this does not work, double
check the entries in pg_hba.conf.

As far as OSes go, Win(95)|(98)|(NT) are all verified, as well as ports to
some Unix boxes. Win3.1 is right out. (Please, no requests to port to 16
bit.)

David Martinez Cuevas wrote:
> Hi everybody.
>
> I've been trying to connect a win-machine-toaster-oven to my
> postgresql server, across the psqlodbc ( from Insight )
> ... but when I get to the "File DSN" configuration, everything
> fails: it tells me that I have a user authentication problem on
> the settings. But my pg_hba.conf doesn't have restrictions
> about the users or host that can connect.
>
> I'm using ver 6.3.2 with the latest patched psqlodbc,
> what do you think I should do ???
>
> Do I have to get Win 98 ??? : )
>
> Merry .Xmas
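[Editor's sketch of the two pieces this advice involves. The data directory
path and the client subnet below are placeholders, not values from the
thread; the pg_hba.conf line uses the old address/mask format of the 6.x
series.]

```
# Start the postmaster with TCP/IP connections enabled (-i), otherwise
# remote ODBC clients cannot connect at all:
#
#   postmaster -i -D /usr/local/pgsql/data
#
# pg_hba.conf: allow hosts on an example subnet to connect to any database
# without a password. Replace the address and mask with your client's.
host    all    192.168.1.0    255.255.255.0    trust
```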