Suboptimal query plan when using expensive BCRYPT functions - Mailing list pgsql-performance

From Erik van Zijst
Subject Suboptimal query plan when using expensive BCRYPT functions
Date
Msg-id CA+69USsLsVb8U5=r7KYs9GYT4Rx9DtyaohsfHvLzVKS=fZNUEw@mail.gmail.com
Whole thread Raw
Responses Re: Suboptimal query plan when using expensive BCRYPT functions  (bricklen <bricklen@gmail.com>)
Re: Suboptimal query plan when using expensive BCRYPT functions  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-performance
Hi there,

I've got a relatively simple query that contains expensive BCRYPT
functions that gets optimized in a way that causes postgres to compute
more bcrypt hashes than necessary, thereby dramatically slowing things
down.

In a certain part of our application we need to lookup users by their
username, email address and password. Now we don't store plaintext
passwords and so the query needs to compute bcrypt hashes on the fly:

    SELECT DISTINCT u.*
    FROM auth_user u
    JOIN bb_userprofile p ON p.user_id = u.id
    JOIN bb_identity i ON i.profile_id = p.id
    WHERE
    (
      (
        u.username ILIKE 'detkin'
        OR
        i.email ILIKE 'foo@example.com'
      )
      AND
      (
        SUBSTRING(password FROM 8) = CRYPT(
          'detkin', SUBSTRING(password FROM 8))
      )
    )

These queries are generated by a parser that translates from an
external query language to SQL run on the database. This test db
contains 12 user records.

With a single bcrypt hash taking ~300ms to compute, this is a recipe
for disaster and so the app only allows queries that require only a
very small number of bcrypt computation.

E.g. the user must always "AND" the password lookup with a clause like
" username = 'foo' AND password = 'bar'" (username is unique).

However, while the query above technically only needs to compute 1
hash (there is a user 'detkin' and email 'foo@example.com' does not
exist), it instead creates a query plan that computes hashes *before*
filtering on username and email, leading to 12 hash computations and a
very slow query.

The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

The schemas for the 3 tables involved are here:
http://pgsql.privatepaste.com/f72020ad0a

As a quick experiment I tried moving the joins and email lookup into a
nested IN query, but that still generates a plan that computes hashes
for all 12 users, before picking out the 1 whose username matches.

Is there any way I can get postgres to perform the hash calculations
on the *result* of the other parts of the where clause, instead of the
other way around? Or else rewrite the query?

Cheers,
Erik


pgsql-performance by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: Getting query plan alternatives from query planner?
Next
From: bricklen
Date:
Subject: Re: Suboptimal query plan when using expensive BCRYPT functions