Thread: SQL performance

SQL performance

From

Robert DiFalco

Date:

02 June 2013, 22:39:08

I have a table called contacts. It has a BIGINT owner_id which references a record in the user table. It also has a BIGINT user_id which may be null. Additionally it has a BOOLEAN blocked column to indicate if a contact is blocked. The final detail is that multiple contacts for an owner may reference the same user.

I have a query to get all the user_ids of a non-blocked contact that is a mutual contact of the user. The important part of the table looks like this:

CREATE TABLE contacts

(

id BIGINT PRIMARY KEY NOT NULL, // generated

blocked BOOL,

owner_id BIGINT NOT NULL,

user_id BIGINT,

FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE,

FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL

);

CREATE INDEX idx_contact_owner ON contacts ( owner_id );

CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id ) WHERE user_id IS NOT NULL AND NOT blocked;

The query looks like this:

explain analyze verbose

select c.user_id

from contact_entity c

where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists (

select 1

from contact_entity c1

where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=24))

group by c.user_id;

This will get all the users for user 24 that are mutual unblocked contacts but exclude the user 24.

I have run this through explain several times and I'm out of ideas on the index. I note that I can also right the query like this:

explain analyze verbose

select distinct c.user_id

from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id

where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id <> 24

AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL

group by c.user_id;

I don't notice a big difference in the query plans. I also notice no difference if I replace the GROUP BY with DISTINCT.

My question is, can this be tightened further in a way I haven't been creative enough to try? Does it matter if I use the EXISTS versus the OUTER JOIN or the GROUP BY versus the DISTINCT.

Is there a better index and I just have not been clever enough to come up with it yet? I've tried a bunch.

Thanks in advance!!

Robert

Re: SQL performance

From

Szymon Guz

Date:

02 June 2013, 22:42:16

On 2 June 2013 21:39, Robert DiFalco <robert.difalco@gmail.com> wrote:

I have a table called contacts. It has a BIGINT owner_id which references a record in the user table. It also has a BIGINT user_id which may be null. Additionally it has a BOOLEAN blocked column to indicate if a contact is blocked. The final detail is that multiple contacts for an owner may reference the same user.

I have a query to get all the user_ids of a non-blocked contact that is a mutual contact of the user. The important part of the table looks like this:

CREATE TABLE contacts
(
id BIGINT PRIMARY KEY NOT NULL, // generated
blocked BOOL,
owner_id BIGINT NOT NULL,
user_id BIGINT,
FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE,
FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL
);
CREATE INDEX idx_contact_owner ON contacts ( owner_id );
CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id ) WHERE user_id IS NOT NULL AND NOT blocked;

The query looks like this:

explain analyze verbose
select c.user_id
from contact_entity c
where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists (
select 1
from contact_entity c1
where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=24))
group by c.user_id;

This will get all the users for user 24 that are mutual unblocked contacts but exclude the user 24.

I have run this through explain several times and I'm out of ideas on the index. I note that I can also right the query like this:

explain analyze verbose
select distinct c.user_id
from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id
where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id <> 24
AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL
group by c.user_id;

I don't notice a big difference in the query plans. I also notice no difference if I replace the GROUP BY with DISTINCT.

My question is, can this be tightened further in a way I haven't been creative enough to try? Does it matter if I use the EXISTS versus the OUTER JOIN or the GROUP BY versus the DISTINCT.

Is there a better index and I just have not been clever enough to come up with it yet? I've tried a bunch.

Thanks in advance!!

Robert

Hi Robert,

could you show us the plans?

thanks,

Szymon

Re: SQL performance

From

Robert DiFalco

Date:

02 June 2013, 23:19:26

Absolutely:

explain analyze verbose

select c.user_id

from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id

where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id != 24

AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL

group by c.user_id;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------

Group (cost=0.00..9.00 rows=1 width=8) (actual time=0.170..0.301 rows=8 loops=1)

Output: c.user_id

-> Merge Join (cost=0.00..9.00 rows=1 width=8) (actual time=0.166..0.270 rows=17 loops=1)

Output: c.user_id

Merge Cond: (c.user_id = c1.owner_id)

-> Index Scan using idx_contact_mutual on public.contact_entity c (cost=0.00..5.10 rows=2 width=16) (actual time=0.146..0.164 rows=11 loops=1)

Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id

Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))

Filter: (c.user_id <> 24)

Rows Removed by Filter: 1

-> Index Scan using idx_contact_mutual on public.contact_entity c1 (cost=0.00..6.45 rows=1 width=16) (actual time=0.012..0.049 rows=18 loops=1)

Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id

Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))

Total runtime: 0.388 ms

(14 rows)

explain analyze verbose

select c.user_id

from contact_entity c

where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists(

select 1

from contact_entity c1

where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=c.owner_id))

group by c.user_id;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------

Group (cost=0.00..9.00 rows=1 width=8) (actual time=0.048..0.159 rows=8 loops=1)

Output: c.user_id

-> Merge Semi Join (cost=0.00..9.00 rows=1 width=8) (actual time=0.044..0.137 rows=9 loops=1)

Output: c.user_id

Merge Cond: (c.user_id = c1.owner_id)

-> Index Scan using idx_contact_mutual on public.contact_entity c (cost=0.00..5.10 rows=2 width=16) (actual time=0.024..0.042 rows=11 loops=1)

Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id

Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))

Filter: (c.user_id <> 24)

Rows Removed by Filter: 1

-> Index Scan using idx_contact_mutual on public.contact_entity c1 (cost=0.00..6.45 rows=1 width=16) (actual time=0.011..0.047 rows=16 loops=1)

Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id

Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))

Total runtime: 0.224 ms

(14 rows)

The only difference I see between the EXISTS and LEFT OUTER JOIN is the Merge Join versus the Merge Semi Join. Then again, there may be a third option for this query besides those two that will be much better. But those are the only two reasonable variations I can think of.

The GROUP BY versus the DISTINCT on c.user_id makes no impact at all on the plan. They are exactly the same.

On Sun, Jun 2, 2013 at 12:42 PM, Szymon Guz <mabewlun@gmail.com> wrote:

On 2 June 2013 21:39, Robert DiFalco <robert.difalco@gmail.com> wrote:
I have a table called contacts. It has a BIGINT owner_id which references a record in the user table. It also has a BIGINT user_id which may be null. Additionally it has a BOOLEAN blocked column to indicate if a contact is blocked. The final detail is that multiple contacts for an owner may reference the same user.

I have a query to get all the user_ids of a non-blocked contact that is a mutual contact of the user. The important part of the table looks like this:

CREATE TABLE contacts
(
id BIGINT PRIMARY KEY NOT NULL, // generated
blocked BOOL,
owner_id BIGINT NOT NULL,
user_id BIGINT,
FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE,
FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL
);
CREATE INDEX idx_contact_owner ON contacts ( owner_id );
CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id ) WHERE user_id IS NOT NULL AND NOT blocked;

The query looks like this:

explain analyze verbose
select c.user_id
from contact_entity c
where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists (
select 1
from contact_entity c1
where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=24))
group by c.user_id;

This will get all the users for user 24 that are mutual unblocked contacts but exclude the user 24.

I have run this through explain several times and I'm out of ideas on the index. I note that I can also right the query like this:

explain analyze verbose
select distinct c.user_id
from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id
where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id <> 24
AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL
group by c.user_id;

I don't notice a big difference in the query plans. I also notice no difference if I replace the GROUP BY with DISTINCT.

My question is, can this be tightened further in a way I haven't been creative enough to try? Does it matter if I use the EXISTS versus the OUTER JOIN or the GROUP BY versus the DISTINCT.

Is there a better index and I just have not been clever enough to come up with it yet? I've tried a bunch.

Thanks in advance!!

Robert

Hi Robert,
could you show us the plans?

thanks,
Szymon

Re: SQL performance

From

Kevin Grittner

Date:

03 June 2013, 17:26:28

Robert DiFalco <robert.difalco@gmail.com> wrote:

> CREATE TABLE contacts
> (
>     id BIGINT PRIMARY KEY NOT NULL, // generated
>
>     blocked BOOL,
>     owner_id BIGINT NOT NULL,
>     user_id BIGINT,
>     FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE,
>
>     FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL
> );
> CREATE INDEX idx_contact_owner ON contacts ( owner_id );
> CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id ) WHERE user_id IS NOT NULL AND NOT blocked;

Well, the first thing I note is that "blocked" can be NULL.  You
exclude rows from the result where it IS NULL in either row.  That
may be what you really want, but it seems worth mentioning.  If you
don't need to support missing values there, you might want to add a
NOT NULL constraint.  If it should be NULL when user_id is, but not
otherwise, you might want a row-level constraint.  You might shave
a tiny amount off the runtime by getting rid of the redundant tests
for NOT NULL on user_id; it cannot compare as either TRUE on either
= or <> if either (or both) values are NULL.

> explain analyze verbose
> select c.user_id
> from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id
> where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id != 24
> AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL
> group by c.user_id;

> Group  (cost=0.00..9.00 rows=1 width=8) (actual time=0.170..0.301 rows=8 loops=1)
>   Output: c.user_id
>   ->  Merge Join  (cost=0.00..9.00 rows=1 width=8) (actual time=0.166..0.270 rows=17 loops=1)
>         Output: c.user_id
>         Merge Cond: (c.user_id = c1.owner_id)
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c  (cost=0.00..5.10 rows=2 width=16) (actual
time=0.146..0.164rows=11 loops=1) 
>               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
>               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
>               Filter: (c.user_id <> 24)
>               Rows Removed by Filter: 1
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c1  (cost=0.00..6.45 rows=1 width=16)
(actualtime=0.012..0.049 rows=18 loops=1) 
>               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
>               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> Total runtime: 0.388 ms

> explain analyze verbose
> select c.user_id
> from contact_entity c
> where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists(
>   select 1
>   from contact_entity c1
>   where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=c.owner_id))
> group by c.user_id;

> Group  (cost=0.00..9.00 rows=1 width=8) (actual time=0.048..0.159 rows=8 loops=1)
>   Output: c.user_id
>   ->  Merge Semi Join  (cost=0.00..9.00 rows=1 width=8) (actual time=0.044..0.137 rows=9 loops=1)
>         Output: c.user_id
>         Merge Cond: (c.user_id = c1.owner_id)
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c  (cost=0.00..5.10 rows=2 width=16) (actual
time=0.024..0.042rows=11 loops=1) 
>               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
>               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
>               Filter: (c.user_id <> 24)
>               Rows Removed by Filter: 1
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c1  (cost=0.00..6.45 rows=1 width=16)
(actualtime=0.011..0.047 rows=16 loops=1) 
>               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
>               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> Total runtime: 0.224 ms

So, it looks like you can get about 3000 to 4000 of these per
second on a single connection -- at least in terms of server-side
processing.  Were you expecting more than that?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SQL performance

From

Robert DiFalco

Date:

03 June 2013, 17:46:39

Thanks Kevin, the blocked should not be NULLABLE. I will fix that. This is with a pretty tiny dataset. I'm a little paranoid that with a large one I will have issues.

Believe it or not the query became faster when I put the tests for user_id IS NOT NULL in there (and added an index for that) then without the tests and index.

It kinda makes me wonder if (from a performance perspective) I should change the schema to pull user_id out of contacts and created a related table with {contacts.id, user_id} where user_id is never null.

On Mon, Jun 3, 2013 at 7:26 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Robert DiFalco <robert.difalco@gmail.com> wrote:

> CREATE TABLE contacts
> (
>     id BIGINT PRIMARY KEY NOT NULL, // generated
>
>     blocked BOOL,
>     owner_id BIGINT NOT NULL,
>     user_id BIGINT,
>     FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE,
>
>     FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL
> );
> CREATE INDEX idx_contact_owner ON contacts ( owner_id );
> CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id ) WHERE user_id IS NOT NULL AND NOT blocked;

Well, the first thing I note is that "blocked" can be NULL. You
exclude rows from the result where it IS NULL in either row. That
may be what you really want, but it seems worth mentioning. If you
don't need to support missing values there, you might want to add a
NOT NULL constraint. If it should be NULL when user_id is, but not
otherwise, you might want a row-level constraint. You might shave
a tiny amount off the runtime by getting rid of the redundant tests
for NOT NULL on user_id; it cannot compare as either TRUE on either
= or <> if either (or both) values are NULL.

> explain analyze verbose
> select c.user_id
> from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id
> where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id != 24
> AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL
> group by c.user_id;

> Group (cost=0.00..9.00 rows=1 width=8) (actual time=0.170..0.301 rows=8 loops=1)
>   Output: c.user_id
>   -> Merge Join (cost=0.00..9.00 rows=1 width=8) (actual time=0.166..0.270 rows=17 loops=1)
>         Output: c.user_id
>         Merge Cond: (c.user_id = c1.owner_id)
>         -> Index Scan using idx_contact_mutual on public.contact_entity c (cost=0.00..5.10 rows=2 width=16) (actual time=0.146..0.164 rows=11 loops=1)
>               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
>               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
>               Filter: (c.user_id <> 24)
>               Rows Removed by Filter: 1
>         -> Index Scan using idx_contact_mutual on public.contact_entity c1 (cost=0.00..6.45 rows=1 width=16) (actual time=0.012..0.049 rows=18 loops=1)
>               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
>               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> Total runtime: 0.388 ms

> explain analyze verbose
> select c.user_id
> from contact_entity c
> where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists(
>   select 1
>   from contact_entity c1
>   where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=c.owner_id))
> group by c.user_id;

> Group (cost=0.00..9.00 rows=1 width=8) (actual time=0.048..0.159 rows=8 loops=1)
>   Output: c.user_id
>   -> Merge Semi Join (cost=0.00..9.00 rows=1 width=8) (actual time=0.044..0.137 rows=9 loops=1)
>         Output: c.user_id
>         Merge Cond: (c.user_id = c1.owner_id)
>         -> Index Scan using idx_contact_mutual on public.contact_entity c (cost=0.00..5.10 rows=2 width=16) (actual time=0.024..0.042 rows=11 loops=1)
>               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
>               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
>               Filter: (c.user_id <> 24)
>               Rows Removed by Filter: 1
>         -> Index Scan using idx_contact_mutual on public.contact_entity c1 (cost=0.00..6.45 rows=1 width=16) (actual time=0.011..0.047 rows=16 loops=1)
>               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
>               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> Total runtime: 0.224 ms

So, it looks like you can get about 3000 to 4000 of these per
second on a single connection -- at least in terms of server-side
processing. Were you expecting more than that?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company