Thread: Slow Query Help
We upgraded from PG 9.1 to 9.2. Since the upgrade, the number of active queries has risen significantly, especially during our peak time when lots of users are logging in. According to New Relic, this query now takes up the most time during peak activity, and my pg_stat_activity and slow-log sampling agree. We have three DB servers referenced here: production running 9.2.2, a semi-idle server running 9.2.2 (idle except for replication when I ran the test), and a completely idle server running 9.1.3 with an old dump restored.
HW: All servers have the same amount of RAM and CPU, and all our indexes fit in RAM. The disks differ, but they're all SSDs, and since the EXPLAIN ANALYZE plans are all index scans, I'm going to assume that doesn't matter much in this context.
Configuration across all three servers is the same.
The table was autovacuumed today.
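For reference, both of those claims (indexes fitting in RAM, recent autovacuum) can be checked with standard catalog queries; a minimal sketch against the users table shown below, not taken from the original thread:

-- Total size of the users table and of all its indexes
-- (pg_indexes_size() is available in 9.0+):
SELECT pg_size_pretty(pg_total_relation_size('users')) AS table_with_indexes,
       pg_size_pretty(pg_indexes_size('users'))        AS indexes_only;

-- Last (auto)vacuum / (auto)analyze and the current dead-tuple count:
SELECT relname, last_vacuum, last_autovacuum, last_autoanalyze,
       n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'users';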
Idle DB running 9.1.3
http://explain.depesz.com/s/7DZ
Semi-idle DB running 9.2.2 (transactions ran previously and replication was happening, but there was no active traffic when the query was run)
http://explain.depesz.com/s/fEE
Production DB running 9.2.2 (Incredibly Active)
http://explain.depesz.com/s/qVW
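The query being profiled is not quoted anywhere in the thread; judging from Heikki's rewrite further down, it was presumably a single lookup that ORs the two case-insensitive indexes, roughly like this (a reconstruction; the column list follows Heikki's rewrite):

-- Presumed shape of the login query (reconstructed, not taken from the thread):
SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = 'randomuser'
   OR lower(email) = 'randomuser'
LIMIT 1;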
Table "public.users"
Column | Type | Modifiers
-------------------+--------------------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
created_dt | timestamp with time zone | default now()
last_login_dt | timestamp with time zone | default now()
email | character varying | not null
verified | boolean | default false
first_name | character varying |
last_name | character varying |
username | character varying(50) | not null
location | character varying |
im_type | character varying |
im_username | character varying |
website | character varying |
phone_mobile | character varying(30) |
postal_code | character varying(10) |
language_tag | character varying(7) | default 'en'::character varying
settings | text | default ''::text
country | character(2) | default ''::bpchar
timezone | character varying(50) | not null default 'UTC'::character varying
verify_hash | character varying |
bio | character varying(160) |
twitter | character varying |
facebook | character varying |
search_updated_dt | timestamp with time zone | not null default now()
source | character varying |
auto_verified | boolean |
password | character(60) |
google | character varying |
gender | smallint | not null default 0
birthdate | date |
weibo | character varying |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"u_auth_email_index" UNIQUE, btree (lower(email::text))
"u_auth_uname_index" UNIQUE, btree (lower(username::text))
"u_auth_verify_hash_idx" UNIQUE, btree (verify_hash)
"users_search_updated_dt_idx" btree (search_updated_dt DESC)
On 05.02.2013 05:45, Will Platnick wrote:
> We upgraded from PG 9.1 to 9.2. Since the upgrade, the number of active
> queries has risen significantly, especially during our peak time when lots
> of users are logging in. According to New Relic, this query now takes up
> the most time during peak activity, and my pg_stat_activity and slow-log
> sampling agree. We have three DB servers referenced here: production
> running 9.2.2, a semi-idle server running 9.2.2 (idle except for
> replication when I ran the test), and a completely idle server running
> 9.1.3 with an old dump restored.

The only thing that stands out is that it always checks both indexes for matches. Since you only want a single row as a result, it seems like it would be better to first check one index, and only check the other one if there's no match. Rewriting the query with UNION should do that:

SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = 'randomuser'
UNION ALL
SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(email) = 'randomuser'
LIMIT 1;

Also, if you can assume that email addresses always contain the @-character, you could take advantage of that and only do the lower(email) = 'randomuser' search if there is one.

- Heikki
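A minimal sketch of that @-character idea folded into the UNION ALL rewrite, assuming a $1 parameter in place of the literal used above:

SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = lower($1)
UNION ALL
SELECT id, username, password, email, verified, timezone
FROM users
WHERE $1 LIKE '%@%'              -- skip the email branch entirely when the input cannot be an address
  AND lower(email) = lower($1)
LIMIT 1;

Because the LIMIT 1 sits above the UNION ALL, the executor stops pulling rows as soon as the username branch produces a match, so the email index is only visited on a miss.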
On Tue, Feb 5, 2013 at 9:15 AM, Will Platnick <wplatnick@gmail.com> wrote:
> We upgraded from PG 9.1 to 9.2. Since the upgrade, the number of active
> queries has risen significantly, especially during our peak time when lots
> of users are logging in. According to New Relic, this query now takes up
> the most time during peak activity, and my pg_stat_activity and slow-log
> sampling agree. We have three DB servers referenced here: production
> running 9.2.2, a semi-idle server running 9.2.2 (idle except for
> replication when I ran the test), and a completely idle server running
> 9.1.3 with an old dump restored.

The only thing that stands out is: on your production server I see "Total runtime: 7.515 ms", but the top node in EXPLAIN ANALYZE shows actual time as 0.179 ms. Not sure where that additional time is being spent though. It could be ExecutorStart/End, but I have no idea why they should take so long.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
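One way to see whether real production executions, rather than a hand-run EXPLAIN ANALYZE, are losing those extra milliseconds inside the executor is the auto_explain contrib module. A sketch of a superuser test session follows (the 5 ms threshold is illustrative; for all sessions the module would instead go into shared_preload_libraries, and log_analyze adds timing overhead to every instrumented query):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5ms';  -- log plans for statements slower than this
SET auto_explain.log_analyze = true;        -- include actual times and row counts in the logged plan
-- Each logged entry shows the statement's total duration next to the per-node
-- actual times, so the gap Pavan describes becomes visible for live traffic.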
On Wednesday, February 6, 2013 at 3:51 AM, Pavan Deolasee wrote:
> The only thing that stands out is: on your production server I see
> "Total runtime: 7.515 ms", but the top node in EXPLAIN ANALYZE shows
> actual time as 0.179 ms. Not sure where that additional time is being
> spent though. It could be ExecutorStart/End, but I have no idea why they
> should take so long.

Any ideas on how to troubleshoot this delay?
Will Platnick <wplatnick@gmail.com> wrote:
> Pavan Deolasee wrote:
>> The only thing that stands out is: on your production server I see
>> "Total runtime: 7.515 ms", but the top node in EXPLAIN ANALYZE shows
>> actual time as 0.179 ms. Not sure where that additional time is being
>> spent though. It could be ExecutorStart/End, but I have no idea why they
>> should take so long.
>
> Any ideas on how to troubleshoot this delay?

Is the client which is running the query on the same machine as the server? If not, what's the ping time between them?

-Kevin
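A quick way to answer that, and to split client and network time from the server-side numbers, is to run the statement under psql's \timing from both the database host and the application host; a sketch using the query shape reconstructed earlier:

\timing on
EXPLAIN ANALYZE
SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = 'randomuser'
   OR lower(email) = 'randomuser'
LIMIT 1;
-- "Total runtime" in the plan is measured inside the server; the "Time:" line
-- printed by \timing also includes parse/plan, protocol and network round trips,
-- so the difference between the two bounds the client-side overhead Kevin is
-- asking about.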
On Wednesday, February 6, 2013 at 11:22 AM, Kevin Grittner wrote:
> Will Platnick <wplatnick@gmail.com> wrote:
>> Any ideas on how to troubleshoot this delay?
>
> Is the client which is running the query on the same machine as the
> server? If not, what's the ping time between them?
>
> -Kevin
On Wed, Feb 6, 2013 at 9:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Will Platnick <wplatnick@gmail.com> wrote:
>> Any ideas on how to troubleshoot this delay?
>
> Is the client which is running the query on the same machine as the
> server? If not, what's the ping time between them?

I don't think the network latency can cause that. The "Total runtime" is calculated on the server side itself - see ExplainOnePlan().

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee