Home > mailing lists

Re: Checkpoint_segments optimal value - Mailing list pgsql-general

From	Prabhjot Sheena
Subject	Re: Checkpoint_segments optimal value
Date	July 18, 2014 02:26:26
Msg-id	CAOf_bQbFEM6oT24sLrpw+-LKHGXvR=VomFQH7GSGV0jVtrCLOg@mail.gmail.com Whole thread Raw
In response to	Re: Checkpoint_segments optimal value (Potentialtech <wmoran@potentialtech.com>)
Responses	Re: Checkpoint_segments optimal value (Prabhjot Sheena <prabhjot.sheena@rivalwatch.com>) Re: Checkpoint_segments optimal value (John R Pierce <pierce@hogranch.com>)
List	pgsql-general

Tree view

Here is the explain plan. There query time went backup up to 2 to 3 minutes from 3 second in just 2 hrs. Can anyone suggest something on how to fix this or why this is happening

explain SELECT account.id, account.organization_id, run.application_id, work_unit.script, work_unit.id, work_unit.start_time, run.id, work_unit.priority FROM work_unit, run, account WHERE work_unit.status = 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status = 1 AND run.account_id = account.id;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..24.84 rows=1 width=64)
   Join Filter: (run.id = work_unit.run_id)
   -> Nested Loop (cost=0.00..16.55 rows=1 width=16)
         -> Index Scan using run_status_1 on run (cost=0.00..8.27 rows=1 width=12)
               Index Cond: (status = 1)
         -> Index Scan using account_pkey on account (cost=0.00..8.27 rows=1 width=8)
               Index Cond: (account.id = run.account_id)
   -> Index Scan using idx_work_unit_1_status_3_new on work_unit (cost=0.00..8.28 rows=1 width=52)
         Index Cond: (work_unit.status = 3)

Table Structure

\d work_unit
               Table "public.work_unit"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
id         | integer                     | not null
run_id     | integer                     |
status     | integer                     |
script     | character varying           |
parameters | character varying           |
input      | character varying           |
start_time | timestamp without time zone |
stop_time | timestamp without time zone |
priority   | integer                     |
type       | integer                     |
lineitems | integer                     |
outputs    | integer                     |
sub_type   | integer                     |
site_code | text                        |
Indexes:
    "work_unit_pkey" PRIMARY KEY, btree (id)
    "idx_work_unit_1_partial" btree (run_id, start_time) WHERE status >= 3
    "idx_work_unit_1_run_id" btree (run_id)
    "idx_work_unit_1_script" btree (script)
    "idx_work_unit_1_site_code" btree (site_code)
    "idx_work_unit_1_starttime" btree (start_time)
    "idx_work_unit_1_status_3_new" btree (status, type) WHERE status = 3 AND type <> 1
    "idx_work_unit_1_status_5" btree (status) WHERE status >= 4
    "idx_work_unit_1_status_part_new" btree (status) WHERE status < 4
    "idx_work_unit_1_stop_time" btree (stop_time)

\d run
                       Table "public.run"
       Column        |           Type           |   Modifiers
---------------------+--------------------------+---------------
id                  | integer                  | not null
job_id              | integer                  |
start_time          | timestamp with time zone | not null
status              | integer                  | not null
job_name            | character varying        |
account_id          | integer                  |
application_id      | integer                  |
postproc_script     | character varying        |
upload_url          | character varying        |
complete_time       | timestamp with time zone |
email               | character varying        |
size                | integer                  |
errors              | integer                  |
raw_count           | integer                  |
munge               | integer                  |
job_details         | character varying        |
user_settings       | character varying        |
run_type            | integer                  |
stop_after          | timestamp with time zone |
total_work_time     | integer                  |
flags               | integer                  |
cluster_info_id     | integer                  |
skiplineitems_count | integer                  |
last_update_ts      | timestamp with time zone |
result_exists       | boolean                  | default false
abort_type          | integer                  |
Indexes:
    "run_pkey" PRIMARY KEY, btree (id)
    "idx_run_acc_stat_comp" btree (account_id, status, complete_time)
    "idx_run_app" btree (application_id)
    "idx_run_complete_time_2" btree (complete_time)
    "idx_run_job_name" btree (job_name)
    "run_application_account_idx" btree (application_id, account_id, status)
    "run_job" btree (job_id)
    "run_result_exists" btree (result_exists)
    "run_start_time" btree (start_time)
    "run_status" btree (status)
    "run_status_1" btree (status) WHERE status = 1
    "run_status_part_idx" btree (status) WHERE status < 3
Triggers:
    run_upd_ts BEFORE INSERT OR UPDATE ON run FOR EACH ROW EXECUTE PROCEDURE set_last_update_ts()
    tr_proc_update_job_summary AFTER UPDATE ON run FOR EACH ROW EXECUTE PROCEDURE tr_proc_update_job_summary()

On Thu, Jul 17, 2014 at 12:00 PM, Potentialtech <wmoran@potentialtech.com> wrote:

On Thu, 17 Jul 2014 11:28:04 -0700
Prabhjot Sheena <prabhjot.sheena@rivalwatch.com> wrote:

> Thanks i have changed that to 64 and reloaded it.
>
> When i had load issue today there was this exact same query that hits the
> db like 50 to 60 times from different machines in 3 to 4 minutes and was
> taking long time to execute and was holding up the database. i did recreate
> an index and it started performing better. My question is why it is not
> fetching the result from the memory since its the same query that runs
> again and again.
>
> This is the actual query i m taking about:
>
> SELECT account.id, account.organization_id, run.application_id,
> work_unit.script, work_unit.id, work_unit.start_time, run.id,
> work_unit.priority FROM work_unit, run, account WHERE work_unit.status =
> 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status = 1
> AND run.account_id = account.id
>
> Pls suggest if i can do something to fix this

Provide a lot more information if you want anyone on the list to be able
to help: such as explain output while the problem is happening, and some
information about the makeup of the tables (column types/indexes/# rows).

Guessing, based on the little information you've provided, it's likely
that you have something else going on at the same time that you're not
aware of, and this particular query is only a symptom. I'm saying that
because SELECTs don't generally create any WAL traffic, so there were
probably some INSERT/UPDATE/DELETE running at the same time that both
pushed those 3 tables out of memory and/or saturated disk activity to
the point that accessing everything becomes slow for a short while, and
it's just those queries that you noticed.

Are you making the mistake where you set log_min_duration to 1s and only
worry about queries that take longer than 1s? Because I've seen (on
multiple occasions) where many 1000s of queries, each running less than
1s, are the actual cause of the problem. pgBadger is particularly helpful
in tracking down situations like that.

> On Thu, Jul 17, 2014 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Potentialtech <wmoran@potentialtech.com> writes:
> > > If the warning isn't happening too often, I would try increasing it only
> > a
> > > little and see if it helps. If it's not enough you can then increase it
> > some
> > > more. Various sources around the Internet suggest that you don't want
> > to go
> > > much larger than 256 for this (if only because it's uncommon to do so
> > and is
> > > probably indicative of other tuning that you need to do).
> > Unfortunatley, you
> > > need to restart PG for the change to take effect, so you have to balance
> > > experimenting with your tuning against how often you can get away with a
> > server
> > > restart.
> >
> > Huh? You don't need a restart, just a reload (SIGHUP) to change that.
> >
> > regards, tom lane
> >

--
Potentialtech <wmoran@potentialtech.com>

pgsql-general by date:

From: AI Rumman
Date: 18 July 2014, 02:08:54
Subject: Re: Whats is lock type transactionid?

From: Prabhjot Sheena
Date: 18 July 2014, 02:39:50
Subject: Re: Checkpoint_segments optimal value

Re: Checkpoint_segments optimal value - Mailing list pgsql-general

Previous

Next