Re: SQL performance - Mailing list pgsql-performance

From Robert DiFalco
Subject Re: SQL performance
Date
Msg-id CAAXGW-zDR0XA7K5=xO53J45TLGGapHqYdPnCOwqJm3Jd54begA@mail.gmail.com
Whole thread Raw
In response to Re: SQL performance  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-performance
Thanks Kevin, the blocked should not be NULLABLE. I will fix that.  This is with a pretty tiny dataset. I'm a little paranoid that with a large one I will have issues.

Believe it or not the query became faster when I put the tests for user_id IS NOT NULL in there (and added an index for that) then without the tests and index.

It kinda makes me wonder if (from a performance perspective) I should change the schema to pull user_id out of contacts and created a related table with {contacts.id, user_id} where user_id is never null.




On Mon, Jun 3, 2013 at 7:26 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
Robert DiFalco <robert.difalco@gmail.com> wrote:

> CREATE TABLE contacts
> (
>     id BIGINT PRIMARY KEY NOT NULL, // generated
>
>     blocked BOOL,
>     owner_id BIGINT NOT NULL,
>     user_id BIGINT,
>     FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE,
>
>     FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL
> );
> CREATE INDEX idx_contact_owner ON contacts ( owner_id );
> CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id ) WHERE user_id IS NOT NULL AND NOT blocked;

Well, the first thing I note is that "blocked" can be NULL.  You
exclude rows from the result where it IS NULL in either row.  That
may be what you really want, but it seems worth mentioning.  If you
don't need to support missing values there, you might want to add a
NOT NULL constraint.  If it should be NULL when user_id is, but not
otherwise, you might want a row-level constraint.  You might shave
a tiny amount off the runtime by getting rid of the redundant tests
for NOT NULL on user_id; it cannot compare as either TRUE on either
= or <> if either (or both) values are NULL.

> explain analyze verbose
> select c.user_id
> from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id
> where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id != 24
> AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL
> group by c.user_id;

> Group  (cost=0.00..9.00 rows=1 width=8) (actual time=0.170..0.301 rows=8 loops=1)
>   Output: c.user_id
>   ->  Merge Join  (cost=0.00..9.00 rows=1 width=8) (actual time=0.166..0.270 rows=17 loops=1)
>         Output: c.user_id
>         Merge Cond: (c.user_id = c1.owner_id)
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c  (cost=0.00..5.10 rows=2 width=16) (actual time=0.146..0.164 rows=11 loops=1)
>               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
>               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
>               Filter: (c.user_id <> 24)
>               Rows Removed by Filter: 1
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c1  (cost=0.00..6.45 rows=1 width=16) (actual time=0.012..0.049 rows=18 loops=1)
>               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
>               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> Total runtime: 0.388 ms

> explain analyze verbose
> select c.user_id
> from contact_entity c
> where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists(
>   select 1
>   from contact_entity c1
>   where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=c.owner_id))
> group by c.user_id;

> Group  (cost=0.00..9.00 rows=1 width=8) (actual time=0.048..0.159 rows=8 loops=1)
>   Output: c.user_id
>   ->  Merge Semi Join  (cost=0.00..9.00 rows=1 width=8) (actual time=0.044..0.137 rows=9 loops=1)
>         Output: c.user_id
>         Merge Cond: (c.user_id = c1.owner_id)
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c  (cost=0.00..5.10 rows=2 width=16) (actual time=0.024..0.042 rows=11 loops=1)
>               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
>               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
>               Filter: (c.user_id <> 24)
>               Rows Removed by Filter: 1
>         ->  Index Scan using idx_contact_mutual on public.contact_entity c1  (cost=0.00..6.45 rows=1 width=16) (actual time=0.011..0.047 rows=16 loops=1)
>               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
>               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> Total runtime: 0.224 ms

So, it looks like you can get about 3000 to 4000 of these per
second on a single connection -- at least in terms of server-side
processing.  Were you expecting more than that?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-performance by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: SQL performance
Next
From: Jeff Janes
Date:
Subject: Re: Advice on optimizing select/index