Thread: Slow Query Help
We upgraded from PG 9.1 to 9.2. Since the upgrade, the number of active queries has risen significantly, especially during our peak time when lots of users are logging in. According to New Relic, this query now takes up the most time during peak activity, and my pg_stat_activity and slow-log sampling agree. We have three DB servers referenced here: production running 9.2.2, a semi-idle server running 9.2.2 (idle except for replication when I ran the test), and a completely idle server running 9.1.3 with an old dump restored.
HW: All servers have the same amount of RAM and CPU, and all our indexes fit in RAM. The disks differ, but they're all SSDs, and since the EXPLAIN ANALYZE plans are all index scans, I'm going to assume that doesn't matter much in this context.
Configuration across all three servers is the same.
The table was autovacuumed today.
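For reference, both of those claims (indexes fitting in RAM, recent autovacuum) can be checked with standard catalog queries; a minimal sketch against the users table shown below, not taken from the original thread:

-- Total size of the users table and of all its indexes
-- (pg_indexes_size() is available in 9.0+):
SELECT pg_size_pretty(pg_total_relation_size('users')) AS table_with_indexes,
       pg_size_pretty(pg_indexes_size('users'))        AS indexes_only;

-- Last (auto)vacuum / (auto)analyze and the current dead-tuple count:
SELECT relname, last_vacuum, last_autovacuum, last_autoanalyze,
       n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'users';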
Idle DB running 9.1.3
http://explain.depesz.com/s/7DZ
Semi-idle DB running 9.2.2 (transactions ran previously and replication was happening, but there was no active traffic when the query was run)
http://explain.depesz.com/s/fEE
Production DB running 9.2.2 (Incredibly Active)
http://explain.depesz.com/s/qVW
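The query being profiled is not quoted anywhere in the thread; judging from Heikki's rewrite further down, it was presumably a single lookup that ORs the two case-insensitive indexes, roughly like this (a reconstruction; the column list follows Heikki's rewrite):

-- Presumed shape of the login query (reconstructed, not taken from the thread):
SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = 'randomuser'
   OR lower(email) = 'randomuser'
LIMIT 1;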
Table "public.users"
Column | Type | Modifiers
-------------------+--------------------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
created_dt | timestamp with time zone | default now()
last_login_dt | timestamp with time zone | default now()
email | character varying | not null
verified | boolean | default false
first_name | character varying |
last_name | character varying |
username | character varying(50) | not null
location | character varying |
im_type | character varying |
im_username | character varying |
website | character varying |
phone_mobile | character varying(30) |
postal_code | character varying(10) |
language_tag | character varying(7) | default 'en'::character varying
settings | text | default ''::text
country | character(2) | default ''::bpchar
timezone | character varying(50) | not null default 'UTC'::character varying
verify_hash | character varying |
bio | character varying(160) |
twitter | character varying |
facebook | character varying |
search_updated_dt | timestamp with time zone | not null default now()
source | character varying |
auto_verified | boolean |
password | character(60) |
google | character varying |
gender | smallint | not null default 0
birthdate | date |
weibo | character varying |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"u_auth_email_index" UNIQUE, btree (lower(email::text))
"u_auth_uname_index" UNIQUE, btree (lower(username::text))
"u_auth_verify_hash_idx" UNIQUE, btree (verify_hash)
"users_search_updated_dt_idx" btree (search_updated_dt DESC)
On 05.02.2013 05:45, Will Platnick wrote:
> We upgraded from PG 9.1 to 9.2. Since the upgrade, the number of active
> queries has risen significantly, especially during our peak time when lots
> of users are logging in. According to New Relic, this query now takes up
> the most time during peak activity, and my pg_stat_activity and slow-log
> sampling agree. We have three DB servers referenced here: production
> running 9.2.2, a semi-idle server running 9.2.2 (idle except for
> replication when I ran the test), and a completely idle server running
> 9.1.3 with an old dump restored.

The only thing that stands out is that it always checks both indexes for matches. Since you only want a single row as a result, it seems like it would be better to first check one index, and only check the other one if there's no match. Rewriting the query with UNION should do that:

SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = 'randomuser'
UNION ALL
SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(email) = 'randomuser'
LIMIT 1;

Also, if you can assume that email addresses always contain the @-character, you could take advantage of that and only do the lower(email) = 'randomuser' search if there is one.

- Heikki
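A minimal sketch of that @-character idea folded into the UNION ALL rewrite, assuming a $1 parameter in place of the literal used above:

SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = lower($1)
UNION ALL
SELECT id, username, password, email, verified, timezone
FROM users
WHERE $1 LIKE '%@%'              -- skip the email branch entirely when the input cannot be an address
  AND lower(email) = lower($1)
LIMIT 1;

Because the LIMIT 1 sits above the UNION ALL, the executor stops pulling rows as soon as the username branch produces a match, so the email index is only visited on a miss.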
On Tue, Feb 5, 2013 at 9:15 AM, Will Platnick <wplatnick@gmail.com> wrote:
> We upgraded from PG 9.1 to 9.2. Since the upgrade, the number of active
> queries has risen significantly, especially during our peak time when lots
> of users are logging in. According to New Relic, this query now takes up
> the most time during peak activity, and my pg_stat_activity and slow-log
> sampling agree. We have three DB servers referenced here: production
> running 9.2.2, a semi-idle server running 9.2.2 (idle except for
> replication when I ran the test), and a completely idle server running
> 9.1.3 with an old dump restored.

The only thing that stands out is: on your production server I see "Total runtime: 7.515 ms", but the top node in EXPLAIN ANALYZE shows actual time as 0.179 ms. Not sure where that additional time is being spent though. It could be ExecutorStart/End, but I have no idea why they should take so long.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
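One way to see whether real production executions, rather than a hand-run EXPLAIN ANALYZE, are losing those extra milliseconds inside the executor is the auto_explain contrib module. A sketch of a superuser test session follows (the 5 ms threshold is illustrative; for all sessions the module would instead go into shared_preload_libraries, and log_analyze adds timing overhead to every instrumented query):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5ms';  -- log plans for statements slower than this
SET auto_explain.log_analyze = true;        -- include actual times and row counts in the logged plan
-- Each logged entry shows the statement's total duration next to the per-node
-- actual times, so the gap Pavan describes becomes visible for live traffic.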
On Wednesday, February 6, 2013 at 3:51 AM, Pavan Deolasee wrote:
> The only thing that stands out is: on your production server I see
> "Total runtime: 7.515 ms", but the top node in EXPLAIN ANALYZE shows
> actual time as 0.179 ms. Not sure where that additional time is being
> spent though. It could be ExecutorStart/End, but I have no idea why they
> should take so long.

Any ideas on how to troubleshoot this delay?
Will Platnick <wplatnick@gmail.com> wrote:
> Pavan Deolasee wrote:
>> The only thing that stands out is: on your production server I see
>> "Total runtime: 7.515 ms", but the top node in EXPLAIN ANALYZE shows
>> actual time as 0.179 ms. Not sure where that additional time is being
>> spent though. It could be ExecutorStart/End, but I have no idea why they
>> should take so long.
>
> Any ideas on how to troubleshoot this delay?

Is the client which is running the query on the same machine as the server? If not, what's the ping time between them?

-Kevin
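A quick way to answer that, and to split client and network time from the server-side numbers, is to run the statement under psql's \timing from both the database host and the application host; a sketch using the query shape reconstructed earlier:

\timing on
EXPLAIN ANALYZE
SELECT id, username, password, email, verified, timezone
FROM users
WHERE lower(username) = 'randomuser'
   OR lower(email) = 'randomuser'
LIMIT 1;
-- "Total runtime" in the plan is measured inside the server; the "Time:" line
-- printed by \timing also includes parse/plan, protocol and network round trips,
-- so the difference between the two bounds the client-side overhead Kevin is
-- asking about.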
On Wednesday, February 6, 2013 at 11:22 AM, Kevin Grittner wrote:
> Will Platnick <wplatnick@gmail.com> wrote:
>> Any ideas on how to troubleshoot this delay?
>
> Is the client which is running the query on the same machine as the
> server? If not, what's the ping time between them?
>
> -Kevin
On Wed, Feb 6, 2013 at 9:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Will Platnick <wplatnick@gmail.com> wrote:
>> Any ideas on how to troubleshoot this delay?
>
> Is the client which is running the query on the same machine as the
> server? If not, what's the ping time between them?

I don't think the network latency can cause that. The "Total runtime" is calculated on the server side itself - see ExplainOnePlan().

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee