Thread: Set of related slow queries
Hi All,

I'm having issues with a set of fairly related queries in my application. EXPLAIN ANALYZE is showing them all to be getting stuck performing roughly the same operation:

  ->  Bitmap Heap Scan on logparser_entry  (cost=4119.06..21520.55 rows=68787 width=8) (actual time=107.032..444.864 rows=16168 loops=1)
        Recheck Cond: ((event_type)::text = ANY ('{Attack,"DoT Tick","Critical Attack"}'::text[]))
        Filter: ((((target_relation)::text <> ALL ('{Other,N/A}'::text[])) OR (NOT (target_relation IS NOT NULL))) AND (log_id = 2))
        ->  Bitmap Index Scan on logparser_entry_event_type_like  (cost=0.00..4101.86 rows=217733 width=0) (actual time=46.392..46.392 rows=237151 loops=1)
              Index Cond: ((event_type)::text = ANY ('{Attack,"DoT Tick","Critical Attack"}'::text[]))
  ->  Hash  (cost=196.49..196.49 rows=9749 width=23) (actual time=19.606..19.606 rows=9749 loops=1)

All the queries are being generated by the Django ORM, so they are not particularly well optimized. I'd prefer to stay with the ORM: a lot of the queries are highly variable depending on the request parameters, so unless there are huge gains to be had by falling back to raw SQL, it will save me a lot of development time to stay with the ORM.

The table in question (logparser_entry) currently has 815,000 records, but that is only a very small amount compared to what the production server would have to handle - it represents only 2 log objects, and I would expect easily 100 or more logs to be uploaded per day. Nulls should be rare in the fields.

This was being run on an AWS High-CPU Medium instance. Obviously not enough for a production system, but I would hope it would be more than adequate for testing when I'm the only one using the app. I opted for High-CPU because the system doesn't seem to be I/O bound even on a micro instance (nearly 0 wait time according to top) and barely touches the RAM even when tuned to be aggressive with memory usage. At the same time it's running at 100% CPU usage.

My server config:

             name             |                                                   current_setting
------------------------------+---------------------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit
 checkpoint_completion_target | 0.9
 effective_cache_size         | 1044MB
 external_pid_file            | /var/run/postgresql/8.4-main.pid
 fsync                        | on
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 listen_addresses             | *
 log_line_prefix              | %t
 log_min_duration_statement   | 250ms
 max_connections              | 25
 max_stack_depth              | 2MB
 port                         | 5432
 random_page_cost             | 4
 server_encoding              | UTF8
 shared_buffers               | 16MB
 synchronous_commit           | off
 TimeZone                     | UTC
 unix_socket_directory        | /var/run/postgresql
 work_mem                     | 250MB
(20 rows)

To try to make reading the queries easier I've attached a text file with the queries and links to EXPLAIN ANALYZE outputs, as well as copied them below. I've tried a lot to tune these queries, but nothing seems to work; they always spend a large amount of time in the same place. Is there something I'm missing that could improve these, or even a way to rework my schema to speed things up?
Thanks,
John


SELECT "logparser_entry"."id", "logparser_entry"."log_id", "logparser_entry"."encounter_id",
       "logparser_entry"."entry_order", "logparser_entry"."timestamp",
       "logparser_entry"."seconds_since_start", "logparser_entry"."event_type",
       "logparser_entry"."actor_id", "logparser_entry"."actor_relation",
       "logparser_entry"."target_id", "logparser_entry"."target_relation",
       "logparser_entry"."pet_owner_id", "logparser_entry"."pet_owner_relation",
       "logparser_entry"."pet_target_owner_id", "logparser_entry"."pet_target_owner_relation",
       "logparser_entry"."ability_id", "logparser_entry"."effective_value",
       "logparser_entry"."blocked", "logparser_entry"."absorbed",
       "logparser_entry"."overkill", "logparser_entry"."overheal",
       "logparser_entry"."total_value"
FROM "logparser_entry"
WHERE ( "logparser_entry"."log_id" = 2
        AND NOT ( "logparser_entry"."actor_relation" IN (E'Other', E'N/A')
                  AND "logparser_entry"."actor_relation" IS NOT NULL )
        AND "logparser_entry"."event_type" IN (E'Attack', E'DoT Tick', E'Critical Attack') )
ORDER BY "logparser_entry"."entry_order" ASC
LIMIT 1

http://explain.depesz.com/s/vEx

SELECT (ROUND(logparser_entry.seconds_since_start / 42)) AS "interval",
       SUM("logparser_entry"."effective_value") AS "effective_value__sum"
FROM "logparser_entry"
WHERE ( "logparser_entry"."log_id" = 2
        AND NOT ( "logparser_entry"."actor_relation" IN (E'Other', E'N/A')
                  AND "logparser_entry"."actor_relation" IS NOT NULL )
        AND "logparser_entry"."event_type" IN (E'Attack', E'DoT Tick', E'Critical Attack') )
GROUP BY (ROUND(logparser_entry.seconds_since_start / 42)),
         ROUND(logparser_entry.seconds_since_start / 42)
ORDER BY "interval" ASC

http://explain.depesz.com/s/Rhb

SELECT (ROUND(logparser_entry.seconds_since_start / 45)) AS "interval",
       SUM("logparser_entry"."effective_value") AS "effective_value__sum"
FROM "logparser_entry"
WHERE ( "logparser_entry"."log_id" = 2
        AND NOT ( ( "logparser_entry"."target_relation" IN (E'Other', E'N/A')
                    AND "logparser_entry"."target_relation" IS NOT NULL )
                  AND ( "logparser_entry"."actor_relation" IN (E'Other', E'N/A')
                        AND "logparser_entry"."actor_relation" IS NOT NULL ) )
        AND "logparser_entry"."event_type" IN (E'Heal', E'Heal Critical') )
GROUP BY (ROUND(logparser_entry.seconds_since_start / 45)),
         ROUND(logparser_entry.seconds_since_start / 45)
ORDER BY "interval" ASC

http://explain.depesz.com/s/JUo

SELECT "units_ability"."ability_name", "units_ability"."damage_type",
       SUM("logparser_entry"."total_value") AS "total"
FROM "logparser_entry"
LEFT OUTER JOIN "units_ability" ON ( "logparser_entry"."ability_id" = "units_ability"."id" )
WHERE ( "logparser_entry"."log_id" = 2
        AND NOT ( "logparser_entry"."target_relation" IN (E'Other', E'N/A')
                  AND "logparser_entry"."target_relation" IS NOT NULL )
        AND "logparser_entry"."event_type" IN (E'Attack', E'DoT Tick', E'Critical Attack') )
GROUP BY "units_ability"."ability_name", "units_ability"."damage_type",
         "units_ability"."ability_name", "units_ability"."damage_type"
HAVING NOT ( SUM("logparser_entry"."total_value") = 0 )
ORDER BY "total" DESC

http://explain.depesz.com/s/VZA

                                    Table "public.logparser_entry"
          Column           |          Type          |                           Modifiers
---------------------------+------------------------+---------------------------------------------------------------
 id                        | integer                | not null default nextval('logparser_entry_id_seq'::regclass)
 log_id                    | integer                | not null
 encounter_id              | integer                |
 entry_order               | integer                | not null
 timestamp                 | time without time zone | not null
 seconds_since_start       | integer                | not null
 event_type                | character varying(64)  | not null
 actor_id                  | integer                | not null
 actor_relation            | character varying(24)  |
 target_id                 | integer                |
 target_relation           | character varying(24)  |
 pet_owner_id              | integer                |
 pet_owner_relation        | character varying(24)  |
 pet_target_owner_id       | integer                |
 pet_target_owner_relation | character varying(32)  |
 ability_id                | integer                |
 effective_value           | integer                | not null
 blocked                   | integer                | not null
 absorbed                  | integer                | not null
 overkill                  | integer                | not null
 overheal                  | integer                | not null
 total_value               | integer                | not null
Indexes:
    "logparser_entry_pkey" PRIMARY KEY, btree (id)
    "logparser_entry_ability_id" btree (ability_id)
    "logparser_entry_actor_id" btree (actor_id)
    "logparser_entry_actor_relation" btree (actor_relation)
    "logparser_entry_actor_relation_like" btree (actor_relation varchar_pattern_ops)
    "logparser_entry_encounter_id" btree (encounter_id)
    "logparser_entry_event_type" btree (event_type)
    "logparser_entry_event_type_like" btree (event_type varchar_pattern_ops)
    "logparser_entry_log_id" btree (log_id)
    "logparser_entry_pet_owner_id" btree (pet_owner_id)
    "logparser_entry_pet_target_owner_id" btree (pet_target_owner_id)
    "logparser_entry_target_id" btree (target_id)
    "logparser_entry_target_relation" btree (target_relation)
    "logparser_entry_target_relation_like" btree (target_relation varchar_pattern_ops)
Foreign-key constraints:
    "logparser_entry_ability_id_fkey" FOREIGN KEY (ability_id) REFERENCES units_ability(id) DEFERRABLE INITIALLY DEFERRED
    "logparser_entry_actor_id_fkey" FOREIGN KEY (actor_id) REFERENCES units_unit(id) DEFERRABLE INITIALLY DEFERRED
    "logparser_entry_encounter_id_fkey" FOREIGN KEY (encounter_id) REFERENCES logparser_encounter(id) DEFERRABLE INITIALLY DEFERRED
    "logparser_entry_log_id_fkey" FOREIGN KEY (log_id) REFERENCES logparser_log(id) DEFERRABLE INITIALLY DEFERRED
    "logparser_entry_pet_owner_id_fkey" FOREIGN KEY (pet_owner_id) REFERENCES units_unit(id) DEFERRABLE INITIALLY DEFERRED
    "logparser_entry_pet_target_owner_id_fkey" FOREIGN KEY (pet_target_owner_id) REFERENCES units_unit(id) DEFERRABLE INITIALLY DEFERRED
    "logparser_entry_target_id_fkey" FOREIGN KEY (target_id) REFERENCES units_unit(id) DEFERRABLE INITIALLY DEFERRED
On 8/06/2011 10:58 AM, John Williams wrote:
>   ->  Bitmap Heap Scan on logparser_entry  (cost=4119.06..21520.55 rows=68787 width=8) (actual time=107.032..444.864 rows=16168 loops=1)
>         Recheck Cond: ((event_type)::text = ANY ('{Attack,"DoT Tick","Critical Attack"}'::text[]))
>         Filter: ((((target_relation)::text <> ALL ('{Other,N/A}'::text[])) OR (NOT (target_relation IS NOT NULL))) AND (log_id = 2))
>         ->  Bitmap Index Scan on logparser_entry_event_type_like  (cost=0.00..4101.86 rows=217733 width=0) (actual time=46.392..46.392 rows=237151 loops=1)
>               Index Cond: ((event_type)::text = ANY ('{Attack,"DoT Tick","Critical Attack"}'::text[]))
>   ->  Hash  (cost=196.49..196.49 rows=9749 width=23) (actual time=19.606..19.606 rows=9749 loops=1)

Thanks for including explain analyze output.

Is there any chance you can pop the full explains (not just excerpts) in here:

http://explain.depesz.com/

? Big query plans tend to get mangled into unreadable garbage by mail clients, unfortunately.

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
> Thanks for including explain analyze output.
>
> Is there any chance you can pop the full explains (not just excerpts) in here:
>
> http://explain.depesz.com/
>
> ?

I believe he already did that - there's a link below each query.

Tomas
> ------------------------------+---------------------------------------------
> shared_buffers               | 16MB
> work_mem                     | 250MB

This seems a bit suspicious. Are you sure you want to keep the shared_buffers so small and work_mem so large at the same time? There probably are workloads where this is the right thing to do, but I doubt this is the case. Why have you set it like this?

I don't have much experience with running Pg on AWS, but I'd try to increase the shared buffers to say 512MB and decrease the work_mem to 16MB (or something like that).

Undersized shared_buffers might actually be part of the problem - to access a row, the page needs to be loaded into shared_buffers. Even though the I/O is very fast (or the page is already in the filesystem page cache), there's some locking etc. that needs to be done. When the cache is small (e.g. 16MB) then the pages need to be removed and read again frequently. This might be one of the reasons why the CPU is 100% utilized.

> SELECT "logparser_entry"."id", "logparser_entry"."log_id", "logparser_entry"."encounter_id",
>        "logparser_entry"."entry_order", "logparser_entry"."timestamp",
>        "logparser_entry"."seconds_since_start", "logparser_entry"."event_type",
>        "logparser_entry"."actor_id", "logparser_entry"."actor_relation",
>        "logparser_entry"."target_id", "logparser_entry"."target_relation",
>        "logparser_entry"."pet_owner_id", "logparser_entry"."pet_owner_relation",
>        "logparser_entry"."pet_target_owner_id", "logparser_entry"."pet_target_owner_relation",
>        "logparser_entry"."ability_id", "logparser_entry"."effective_value",
>        "logparser_entry"."blocked", "logparser_entry"."absorbed",
>        "logparser_entry"."overkill", "logparser_entry"."overheal", "logparser_entry"."total_value"
> FROM "logparser_entry"
> WHERE ( "logparser_entry"."log_id" = 2
>         AND NOT ( "logparser_entry"."actor_relation" IN (E'Other', E'N/A')
>                   AND "logparser_entry"."actor_relation" IS NOT NULL )
>         AND "logparser_entry"."event_type" IN (E'Attack', E'DoT Tick', E'Critical Attack') )
> ORDER BY "logparser_entry"."entry_order" ASC
> LIMIT 1
> http://explain.depesz.com/s/vEx

Well, the problem with this is that it needs to evaluate the whole result set, sort it by "entry_order" and then get the 1st row. And there's no index on entry_order, so it has to evaluate the whole result set and then perform a traditional sort.

Try to create an index on the "entry_order" column - that might push it towards an index scan (to be honest I don't know if PostgreSQL knows it can do it this way, so maybe it won't work).

> SELECT (ROUND(logparser_entry.seconds_since_start / 42)) AS "interval",
>        SUM("logparser_entry"."effective_value") AS "effective_value__sum"
> FROM "logparser_entry"
> WHERE ( "logparser_entry"."log_id" = 2
>         AND NOT ( "logparser_entry"."actor_relation" IN (E'Other', E'N/A')
>                   AND "logparser_entry"."actor_relation" IS NOT NULL )
>         AND "logparser_entry"."event_type" IN (E'Attack', E'DoT Tick', E'Critical Attack') )
> GROUP BY (ROUND(logparser_entry.seconds_since_start / 42)),
>          ROUND(logparser_entry.seconds_since_start / 42)
> ORDER BY "interval" ASC
> http://explain.depesz.com/s/Rhb

Hm, this is probably the best plan possible - not sure how to make it faster. I'd expect better performance with larger shared_buffers.

> http://explain.depesz.com/s/JUo

Same as above. Good plan, maybe increase shared_buffers?

> http://explain.depesz.com/s/VZA

Same as above. Good plan, maybe increase shared_buffers.

regards
Tomas
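For reference, the index Tomas suggests might look something like this (untested sketch; the index name is made up, and whether the 8.4 planner actually uses it for the ORDER BY ... LIMIT 1 query depends on the rest of the WHERE clause):

    -- single-column index on the sort key, as suggested above
    CREATE INDEX logparser_entry_entry_order ON logparser_entry (entry_order);

The shared_buffers/work_mem figures above are postgresql.conf starting points rather than measured values; on 8.4 a shared_buffers change needs a server restart, and on most Linux setups a larger value also means raising the kernel's SHMMAX.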
On 06/08/2011 07:08 PM, tv@fuzzy.cz wrote:
>> Thanks for including explain analyze output.
>>
>> Is there any chance you can pop the full explains (not just excerpts) in here:
>>
>> http://explain.depesz.com/
>>
>> ?
>
> I believe he already did that - there's a link below each query.

Gah, I'm blind. Thanks.

--
Craig Ringer
On 06/08/2011 06:30 AM, tv@fuzzy.cz wrote:
>> shared_buffers               | 16MB
>> work_mem                     | 250MB
>
> This seems a bit suspicious. Are you sure you want to keep the shared_buffers so small and work_mem so large at the same time? There probably are workloads where this is the right thing to do, but I doubt this is the case. Why have you set it like this?

I must concur in this case. I can't imagine any scenario where this makes sense. work_mem is allocated on a per-sort basis, not just per session or transaction. So a large query could allocate several of these, run your system out of memory, and cause the OOM killer to start causing trouble.

> I don't have much experience with running Pg on AWS, but I'd try to increase the shared buffers to say 512MB and decrease the work_mem to 16MB (or something like that).

Easily good minimums. But it looks like your AWS instance only has 1GB of RAM (based on your effective_cache_size), so you may only want to increase it to 256MB. That said, reduce your work_mem to 8MB to start, and increase it in 4MB increments if it's still too low.

With a setting of 16MB, it has to load data in and out of memory constantly. Even if the host OS has cached every single block you'll ever use, that's only the raw table contents. Processing hundreds of thousands of rows still takes time; you just saved yourself the effort of fetching them from disk. shared_buffers is still necessary to do actual work.

Now... I have some issues with your queries, which are likely the fault of the Django ORM, but still consider your analyze:

> http://explain.depesz.com/s/vEx

Your bitmap index scan on logparser is hilarious. The estimates are fine: 237k rows in 47ms when it expected 217k. If your table really does have 815k rows in it, that's not very selective at all. Then it adds a heap scan for the remaining where conditions, and you end up with 100k rows it then has to sort. That's never going to be fast. 600ms actually isn't terrible for this many rows, and it also explains your high CPU.

Then your next one:

> http://explain.depesz.com/s/Rhb

700ms, mostly because of the HashAggregate caused by grouping by round(seconds_since_start / 42). You're aggregating by a calculation on 100k rows. Again, this will never be "fast", and 700ms is not terrible considering all the extra work the engine's doing. Again, your index scan returning everything and the kitchen sink is the root cause. Which also is evidenced here:

> http://explain.depesz.com/s/JUo

And here:

> http://explain.depesz.com/s/VZA

Everything is being caused because it's always using the logparser_entry_event_type_like index to fetch the initial 200k rows. The only way to make this faster is to restrict the rows coming back. For instance, since you know these values are coming in every day, why search through all of history every time?

Why not get your timestamp column involved? Maybe you only need to look at Attack, DoT Tick, and Critical Attack event types for the last day, or week, or even month. That alone should drastically reduce your row counts and give the engine a much smaller data set to aggregate and sort.

The thing is, the way your queries are currently written, as you get more data, this is just going to get worse and worse. Grabbing a quarter of a table that just gets bigger every day and then getting aggregates (group by, etc.) is going to get slower every day unless you can restrict the result set with more where clauses. If you need reports on a lot of this data on a regular basis, consider running a nightly or hourly batch to insert them into a reporting table you can check later.

There's a lot you can do here.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
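To make the reporting-table suggestion above concrete, a rough sketch might look like the following (hypothetical table name, columns and bucket size - the real aggregates would have to match whatever the views actually display):

    -- one row per log, event type and time bucket, refreshed by a batch job
    CREATE TABLE logparser_entry_summary (
        log_id              integer     NOT NULL,
        event_type          varchar(64) NOT NULL,
        interval_start      integer     NOT NULL,  -- e.g. seconds_since_start / 60
        effective_value_sum bigint      NOT NULL,
        total_value_sum     bigint      NOT NULL,
        PRIMARY KEY (log_id, event_type, interval_start)
    );

    INSERT INTO logparser_entry_summary
    SELECT log_id,
           event_type,
           seconds_since_start / 60 AS interval_start,
           SUM(effective_value),
           SUM(total_value)
    FROM logparser_entry
    GROUP BY log_id, event_type, seconds_since_start / 60;

A nightly or hourly job could re-fill this (or insert rows only for newly uploaded logs), so the per-request queries would hit a few thousand pre-aggregated rows instead of hundreds of thousands of raw entries.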
Hi All,

Let me clarify this a bit.

The memory values are ridiculous; you're completely correct. I've since fixed that (it had no effect, sadly). I've adjusted the shared buffers to about 400MB, as per the tuning guide's advice to set it to around 1/4 of your system memory (the AWS instance in question has 1.7GB). I didn't have the shared buffers set correctly to start with because, truthfully, I had no idea how to increase shmmax and had to look that up.

The work_mem is very high for the system it's running on, admittedly. I'm OK with leaving that, though, because currently I'm the only one on the machine at all (this isn't a production setup, it's a testing setup). Realistically it's only that high because someone suggested trying a much higher value (I had already personally set it to 50MB, about 30% larger than the largest sort I found) to see if that improved the situation (it didn't).

Several of the implications of my current set of data make things look a little wrong, so let me clarify the issue a bit. The table is composed of data coming from a game's combat log. Each log represents about 400k entries. Since I only really care to look at data from the perspective of each log, the log_id is in fact going to be the most selective portion of the query. Right now the table only has two logs in it, making this hard to see, but it does mean the situation shouldn't get worse over time: I will basically never be looking at more than a 400-500k record portion of my entries table at a time.

This stuff gets really painful because I can't very well predict the queries, so I can't pre-calculate, and the data isn't like a system log; I could be accepting uploads of hundreds of such logs per day. The actual queries that are run are a function of what the user wants to see. There are roughly 5 or so different data views, each of which takes 15-25 separate queries to calculate all the various graphs and aggregates. Frequently I won't be looking at the "overall" entire log (composed of 400k entries); instead I'll be looking at smaller slices of the data, adding something like:

WHERE seconds_since_start >= 1500 AND seconds_since_start <= 4000

with very arbitrary min and max.

Now I should say I've seen almost this exact same work done before for a different game. So I can't help but feel I must be missing something really important, either in how I'm setting up my data or how I'm processing it.

Thanks,
John

---
John Williams
42nd Design
Email: jwilliams@42nddesign.com
Skype: druidjaidan
Phone: (520) 440-7239


On Wed, Jun 8, 2011 at 6:36 AM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 06/08/2011 06:30 AM, tv@fuzzy.cz wrote:
>>> shared_buffers               | 16MB
>>> work_mem                     | 250MB
>>
>> This seems a bit suspicious. Are you sure you want to keep the shared_buffers so small and work_mem so large at the same time? There probably are workloads where this is the right thing to do, but I doubt this is the case. Why have you set it like this?
>
> I must concur in this case. I can't imagine any scenario where this makes sense. work_mem is allocated on a per-sort basis, not just per session or transaction. So a large query could allocate several of these, run your system out of memory, and cause the OOM killer to start causing trouble.
>
>> I don't have much experience with running Pg on AWS, but I'd try to increase the shared buffers to say 512MB and decrease the work_mem to 16MB (or something like that).
>
> Easily good minimums. But it looks like your AWS instance only has 1GB of RAM (based on your effective_cache_size), so you may only want to increase it to 256MB. That said, reduce your work_mem to 8MB to start, and increase it in 4MB increments if it's still too low.
>
> With a setting of 16MB, it has to load data in and out of memory constantly. Even if the host OS has cached every single block you'll ever use, that's only the raw table contents. Processing hundreds of thousands of rows still takes time; you just saved yourself the effort of fetching them from disk. shared_buffers is still necessary to do actual work.
>
> Now... I have some issues with your queries, which are likely the fault of the Django ORM, but still consider your analyze:
>
>> http://explain.depesz.com/s/vEx
>
> Your bitmap index scan on logparser is hilarious. The estimates are fine: 237k rows in 47ms when it expected 217k. If your table really does have 815k rows in it, that's not very selective at all. Then it adds a heap scan for the remaining where conditions, and you end up with 100k rows it then has to sort. That's never going to be fast. 600ms actually isn't terrible for this many rows, and it also explains your high CPU.
>
> Then your next one:
>
>> http://explain.depesz.com/s/Rhb
>
> 700ms, mostly because of the HashAggregate caused by grouping by round(seconds_since_start / 42). You're aggregating by a calculation on 100k rows. Again, this will never be "fast", and 700ms is not terrible considering all the extra work the engine's doing. Again, your index scan returning everything and the kitchen sink is the root cause. Which also is evidenced here:
>
>> http://explain.depesz.com/s/JUo
>
> And here:
>
>> http://explain.depesz.com/s/VZA
>
> Everything is being caused because it's always using the logparser_entry_event_type_like index to fetch the initial 200k rows. The only way to make this faster is to restrict the rows coming back. For instance, since you know these values are coming in every day, why search through all of history every time?
>
> Why not get your timestamp column involved? Maybe you only need to look at Attack, DoT Tick, and Critical Attack event types for the last day, or week, or even month. That alone should drastically reduce your row counts and give the engine a much smaller data set to aggregate and sort.
>
> The thing is, the way your queries are currently written, as you get more data, this is just going to get worse and worse. Grabbing a quarter of a table that just gets bigger every day and then getting aggregates (group by, etc.) is going to get slower every day unless you can restrict the result set with more where clauses. If you need reports on a lot of this data on a regular basis, consider running a nightly or hourly batch to insert them into a reporting table you can check later.
>
> There's a lot you can do here.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> sthomas@peak6.com
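Given that log_id is the most selective predicate and the slices filter on event_type plus a seconds_since_start range, one thing that might be worth testing (untested sketch; the index names are made up) is replacing some of the single-column indexes with composites that lead with log_id:

    -- matches "log_id = ? AND event_type IN (...)" with an optional
    -- range condition on seconds_since_start
    CREATE INDEX logparser_entry_log_event_secs
        ON logparser_entry (log_id, event_type, seconds_since_start);

    -- matches the ORDER BY entry_order ... LIMIT 1 pattern within one log
    CREATE INDEX logparser_entry_log_entry_order
        ON logparser_entry (log_id, entry_order);

Whether the planner prefers these over the existing logparser_entry_event_type_like index would need to be confirmed with EXPLAIN ANALYZE after running ANALYZE on the table.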