Re: Poor performance with queries using clause: sth IN (...) - Mailing list pgsql-performance

From Gregory Stark
Subject Re: Poor performance with queries using clause: sth IN (...)
Date
Msg-id 87wszigjdl.fsf@oxford.xeocode.com
Whole thread Raw
In response to Poor performance with queries using clause: sth IN (...)  (Andrzej Zawadzki <zawadaa@wp.pl>)
List pgsql-performance
> AND '' IN ('', upper(b.nazwisko))
> AND '' IN ('', upper(b.imie))
> AND '78111104485' IN ('', b.pesel)
> AND '' IN ('', upper(trim(b.dowseria)))
> AND '' IN ('', b.dowosnr)
> AND 0 IN (0, b.typkred)
> AND k.datazwrot IS NULL;

Hum, interesting. Most of the work Postgres does with IN clauses is on the
assumption that the column you're trying to restrict is on the left hand side
of the IN clause.

1) I think you'll be much better off expanding these into OR clauses.

2) I assume the left hand sides of the IN clauses are actually parameters? I
   would recommend using bound parameters mostly for security but also for
   performance reasons in that case.

3) having upper() and trim() around the columns makes it basically impossible
   for the planner to use indexes even if it was capable of expanding the IN
   clauses into OR expressions. Your options are either

  a) use an expression index, for example
     CREATE INDEX idx_nazwisko on kredytob (upper(nazwisko))

  b) use a case-insensitive locale (which you may already be doing) in which
     case the upper() is simply unnecessary.

  c) use the citext data type (or a case insensitive indexable operator but we
     don't seem to have a case insensitive equals, only LIKE and regexp
     matches? That seems strange.)

4) You should consider using text or varchar instead of char(). char() has no
   performance advantages in Postgres and is annoying to work with.

Something like this with expression indexes on upper(nazwisko), upper(imie),
upper(trim(downseria)) would actually be optimized using indexes:

  AND (? = '' OR upper(b.nazwisko) = ?)
  AND (? = '' OR upper(b.imie) = ?)
  AND (? = '' OR b.pesel = ?)
  AND (? = '' OR upper(trim(b.downseria)) = ?)
  AND (? = '' OR b.dowosnr = ?)
  AND (? = 0  OR b.typkred = ?)
  AND k.datazwrot IS NULL

If this is the only query or a particularly important query you could consider
making all those indexes partial with "WHERE datazwrot IS NULL" as well.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


pgsql-performance by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Apparently useless bitmap scans
Next
From: Gregory Stark
Date:
Subject: Re: Nested loops overpriced