Re: Checkpoint_segments optimal value - Mailing list pgsql-general

From Prabhjot Sheena
Subject Re: Checkpoint_segments optimal value
Date
Msg-id CAOf_bQbFEM6oT24sLrpw+-LKHGXvR=VomFQH7GSGV0jVtrCLOg@mail.gmail.com
Whole thread Raw
In response to Re: Checkpoint_segments optimal value  (Potentialtech <wmoran@potentialtech.com>)
Responses Re: Checkpoint_segments optimal value  (Prabhjot Sheena <prabhjot.sheena@rivalwatch.com>)
Re: Checkpoint_segments optimal value  (John R Pierce <pierce@hogranch.com>)
List pgsql-general
Here is the explain plan. There query time went backup up to 2 to 3 minutes from 3 second  in just 2 hrs.  Can anyone suggest something on how to fix this or why this is happening

 explain SELECT account.id, account.organization_id, run.application_id, work_unit.script, work_unit.id, work_unit.start_time, run.id, work_unit.priority  FROM work_unit, run, account  WHERE  work_unit.status = 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status = 1 AND run.account_id = account.id;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..24.84 rows=1 width=64)
   Join Filter: (run.id = work_unit.run_id)
   ->  Nested Loop  (cost=0.00..16.55 rows=1 width=16)
         ->  Index Scan using run_status_1 on run  (cost=0.00..8.27 rows=1 width=12)
               Index Cond: (status = 1)
         ->  Index Scan using account_pkey on account  (cost=0.00..8.27 rows=1 width=8)
               Index Cond: (account.id = run.account_id)
   ->  Index Scan using idx_work_unit_1_status_3_new on work_unit  (cost=0.00..8.28 rows=1 width=52)
         Index Cond: (work_unit.status = 3)

Table Structure

 \d work_unit
               Table "public.work_unit"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 id         | integer                     | not null
 run_id     | integer                     |
 status     | integer                     |
 script     | character varying           |
 parameters | character varying           |
 input      | character varying           |
 start_time | timestamp without time zone |
 stop_time  | timestamp without time zone |
 priority   | integer                     |
 type       | integer                     |
 lineitems  | integer                     |
 outputs    | integer                     |
 sub_type   | integer                     |
 site_code  | text                        |
Indexes:
    "work_unit_pkey" PRIMARY KEY, btree (id)
    "idx_work_unit_1_partial" btree (run_id, start_time) WHERE status >= 3
    "idx_work_unit_1_run_id" btree (run_id)
    "idx_work_unit_1_script" btree (script)
    "idx_work_unit_1_site_code" btree (site_code)
    "idx_work_unit_1_starttime" btree (start_time)
    "idx_work_unit_1_status_3_new" btree (status, type) WHERE status = 3 AND type <> 1
    "idx_work_unit_1_status_5" btree (status) WHERE status >= 4
    "idx_work_unit_1_status_part_new" btree (status) WHERE status < 4
    "idx_work_unit_1_stop_time" btree (stop_time)


 \d run
                       Table "public.run"
       Column        |           Type           |   Modifiers
---------------------+--------------------------+---------------
 id                  | integer                  | not null
 job_id              | integer                  |
 start_time          | timestamp with time zone | not null
 status              | integer                  | not null
 job_name            | character varying        |
 account_id          | integer                  |
 application_id      | integer                  |
 postproc_script     | character varying        |
 upload_url          | character varying        |
 complete_time       | timestamp with time zone |
 email               | character varying        |
 size                | integer                  |
 errors              | integer                  |
 raw_count           | integer                  |
 munge               | integer                  |
 job_details         | character varying        |
 user_settings       | character varying        |
 run_type            | integer                  |
 stop_after          | timestamp with time zone |
 total_work_time     | integer                  |
 flags               | integer                  |
 cluster_info_id     | integer                  |
 skiplineitems_count | integer                  |
 last_update_ts      | timestamp with time zone |
 result_exists       | boolean                  | default false
 abort_type          | integer                  |
Indexes:
    "run_pkey" PRIMARY KEY, btree (id)
    "idx_run_acc_stat_comp" btree (account_id, status, complete_time)
    "idx_run_app" btree (application_id)
    "idx_run_complete_time_2" btree (complete_time)
    "idx_run_job_name" btree (job_name)
    "run_application_account_idx" btree (application_id, account_id, status)
    "run_job" btree (job_id)
    "run_result_exists" btree (result_exists)
    "run_start_time" btree (start_time)
    "run_status" btree (status)
    "run_status_1" btree (status) WHERE status = 1
    "run_status_part_idx" btree (status) WHERE status < 3
Triggers:
    run_upd_ts BEFORE INSERT OR UPDATE ON run FOR EACH ROW EXECUTE PROCEDURE set_last_update_ts()
    tr_proc_update_job_summary AFTER UPDATE ON run FOR EACH ROW EXECUTE PROCEDURE tr_proc_update_job_summary()




On Thu, Jul 17, 2014 at 12:00 PM, Potentialtech <wmoran@potentialtech.com> wrote:
On Thu, 17 Jul 2014 11:28:04 -0700
Prabhjot Sheena <prabhjot.sheena@rivalwatch.com> wrote:

> Thanks i have changed that to 64 and reloaded it.
>
> When i had load issue today there was this exact same query that hits the
> db like 50 to 60 times from different machines in 3 to 4 minutes and was
> taking long time to execute and was holding up the database. i did recreate
> an index and it started performing better. My question is why it is not
> fetching the result from the memory since  its the same query that runs
> again and again.
>
> This is the actual query i m taking about:
>
> SELECT account.id, account.organization_id, run.application_id,
> work_unit.script, work_unit.id, work_unit.start_time, run.id,
> work_unit.priority  FROM work_unit, run, account  WHERE  work_unit.status =
> 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status = 1
> AND run.account_id = account.id
>
> Pls suggest if i can do something to fix this

Provide a lot more information if you want anyone on the list to be able
to help: such as explain output while the problem is happening, and some
information about the makeup of the tables (column types/indexes/# rows).

Guessing, based on the little information you've provided, it's likely
that you have something else going on at the same time that you're not
aware of, and this particular query is only a symptom.  I'm saying that
because SELECTs don't generally create any WAL traffic, so there were
probably some INSERT/UPDATE/DELETE running at the same time that both
pushed those 3 tables out of memory and/or saturated disk activity to
the point that accessing everything becomes slow for a short while, and
it's just those queries that you noticed.

Are you making the mistake where you set log_min_duration to 1s and only
worry about queries that take longer than 1s?  Because I've seen (on
multiple occasions) where many 1000s of queries, each running less than
1s, are the actual cause of the problem.  pgBadger is particularly helpful
in tracking down situations like that.

> On Thu, Jul 17, 2014 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Potentialtech <wmoran@potentialtech.com> writes:
> > > If the warning isn't happening too often, I would try increasing it only
> > a
> > > little and see if it helps.  If it's not enough you can then increase it
> > some
> > > more.  Various sources around the Internet suggest that you don't want
> > to go
> > > much larger than 256 for this (if only because it's uncommon to do so
> > and is
> > > probably indicative of other tuning that you need to do).
> >  Unfortunatley, you
> > > need to restart PG for the change to take effect, so you have to balance
> > > experimenting with your tuning against how often you can get away with a
> > server
> > > restart.
> >
> > Huh?  You don't need a restart, just a reload (SIGHUP) to change that.
> >
> >                         regards, tom lane
> >


--
Potentialtech <wmoran@potentialtech.com>

pgsql-general by date:

Previous
From: AI Rumman
Date:
Subject: Re: Whats is lock type transactionid?
Next
From: Prabhjot Sheena
Date:
Subject: Re: Checkpoint_segments optimal value