Thread: Surprising no use of indexes - low performance

Surprising no use of indexes - low performance

From: Nicolas Charles
Date:
Hello,

I've been struggling to understand what's happening on my
databases/query for several days, and I'm turning to higher minds for a
logical answer.

I'm dealing with a fairly large database containing log information,
which I crunch to extract data, with several indexes on it that I hoped
were sensible.

\d ruddersysevents
                                  Table "public.ruddersysevents"
       Column        |           Type           |                  Modifiers
--------------------+--------------------------+--------------------------------------------------
  id                 | integer                  | not null default nextval('serial'::regclass)
  executiondate      | timestamp with time zone | not null
  nodeid             | text                     | not null
  directiveid        | text                     | not null
  ruleid             | text                     | not null
  serial             | integer                  | not null
  component          | text                     | not null
  keyvalue           | text                     |
  executiontimestamp | timestamp with time zone | not null
  eventtype          | character varying(64)    |
  policy             | text                     |
  msg                | text                     |
Indexes:
     "ruddersysevents_pkey" PRIMARY KEY, btree (id)
     "component_idx" btree (component)
     "configurationruleid_idx" btree (ruleid)
     "executiontimestamp_idx" btree (executiontimestamp)
     "keyvalue_idx" btree (keyvalue)
     "nodeid_idx" btree (nodeid)
Check constraints:
     "ruddersysevents_component_check" CHECK (component <> ''::text)
     "ruddersysevents_configurationruleid_check" CHECK (ruleid <> ''::text)
     "ruddersysevents_nodeid_check" CHECK (nodeid <> ''::text)
     "ruddersysevents_policyinstanceid_check" CHECK (directiveid <> ''::text)


It contains 11018592 entries, with the following patterns:
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial

Related table/index size are
                 relation                |  size
----------------------------------------+---------
  public.ruddersysevents                 | 3190 MB
  public.nodeid_idx                      | 614 MB
  public.configurationruleid_idx         | 592 MB
  public.ruddersysevents_pkey            | 236 MB
  public.executiontimestamp_idx          | 236 MB


I'm crunching the data by looking for each
nodeid/ruleid/directiveid/serial with an executiontimestamp in an interval:

explain analyze select executiondate, nodeid, ruleid, directiveid,
serial, component, keyValue, executionTimeStamp, eventtype, policy, msg
from RudderSysEvents where 1=1  and nodeId =
'31264061-5ecb-4891-9aa4-83824178f43d'  and ruleId =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and
executiontimestamp between to_timestamp('2012-11-22 16:00:16.005',
'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467',
'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
  Sort  (cost=293125.41..293135.03 rows=3848 width=252) (actual
time=28628.922..28647.952 rows=62403 loops=1)
    Sort Key: executiontimestamp
    Sort Method:  external merge  Disk: 17480kB
    ->  Bitmap Heap Scan on ruddersysevents  (cost=74359.66..292896.27
rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1)
          Recheck Cond: ((nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
          Filter: ((serial = 10) AND (executiontimestamp >=
to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)))
          ->  BitmapAnd  (cost=74359.66..74359.66 rows=90079 width=0)
(actual time=1228.610..1228.610 rows=0 loops=1)
                ->  Bitmap Index Scan on nodeid_idx
(cost=0.00..25795.17 rows=716237 width=0) (actual time=421.365..421.365
rows=690503 loops=1)
                      Index Cond: (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text)
                ->  Bitmap Index Scan on configurationruleid_idx
(cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490
rows=1381391 loops=1)
                      Index Cond: (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)
  Total runtime: 28657.352 ms



I'm surprised that the executiontimestamp index is not used, since it
seems to be where most of the query time is spent.

For all my tests, I removed all the incoming logs, so that this table
has only selects and no writes

I'm using Postgres 8.4, on a quite smallish VM with some processes
running, with the following non-default configuration:
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
checkpoint_segments = 6

Increasing shared_buffers to 384MB, 1GB or 1500MB didn't improve
performance (less than 10%). I would have expected it to, since the
indexes would all fit in RAM.
Every explain analyze after a modification of configuration or indexes
was done after a complete batch of log crunching by our app, to be sure
the stats were correct.

I tested further with the indexes, and reached these results:
1/ Dropping the individual indexes "configurationruleid_idx" btree
(ruleid), "executiontimestamp_idx" btree (executiontimestamp),
"keyvalue_idx" btree (keyvalue) and "nodeid_idx" btree (nodeid) and
replacing them with a single composite index lowered performance:

create index composite_idx on ruddersysevents (executiontimestamp,
ruleid, serial, nodeid);

Index Scan using composite_idx on ruddersysevents  (cost=0.01..497350.22
rows=3729 width=252) (actual time=7.989..83717.901 rows=62403 loops=1)
    Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))


2/ Removing nodeid from the index lowered performance again

create index composite2_idx on ruddersysevents (executiontimestamp,
ruleid, serial);

  Index Scan using composite2_idx on ruddersysevents
(cost=0.01..449948.98 rows=3729 width=252) (actual
time=23.507..84888.349 rows=62403 loops=1)
    Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10))
    Filter: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)

3/ Removing executiontimestamp from the composite index makes the query
perform better at the beginning (around 17 seconds), but over time it
degrades (I'm logging queries longer than 20 seconds; they are very rare
in the first half of the batch, and get more and more common towards the
end) to what is below:

create index composite3_idx on ruddersysevents (ruleid, serial, nodeid);
   Sort  (cost=24683.44..24693.07 rows=3849 width=252) (actual
time=60454.558..60474.013 rows=62403 loops=1)
    Sort Key: executiontimestamp
    Sort Method:  external merge  Disk: 17480kB
    ->  Bitmap Heap Scan on ruddersysevents  (cost=450.12..24454.23
rows=3849 width=252) (actual time=146.065..60249.143 rows=62403 loops=1)
          Recheck Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
          Filter: ((executiontimestamp >= to_timestamp('2012-11-22
16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
(executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)))
          ->  Bitmap Index Scan on composite3_idx  (cost=0.00..449.15
rows=6635 width=0) (actual time=129.102..129.102 rows=62403 loops=1)
                Index Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
  Total runtime: 60484.022 ms


And across all these tests, the difference in total database processing
time with each index is within a 10% error margin (2h10 to 2h25), even
though most of the time is spent on the SQL query.

So my question is:
"Why does *not* indexing the column which is not used make the query
slower over time, while not slowing the application?"

If you have some clues, I'm really interested.

Best regards
Nicolas



Re: Surprising no use of indexes - low performance

From: Ireneusz Pluta
Date:
On 2013-02-14 16:35, Nicolas Charles wrote:
> I'm crunching the data by looking for each nodeid/ruleid/directiveid/serial with an
> executiontimestamp in an interval:
>
> explain analyze select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue,
> executionTimeStamp, eventtype, policy, msg from RudderSysEvents where 1=1  and nodeId =
> '31264061-5ecb-4891-9aa4-83824178f43d'  and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd' and
> serial = 10 and executiontimestamp between to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD
> HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY
> executionTimeStamp asc;
>  Sort  (cost=293125.41..293135.03 rows=3848 width=252) (actual time=28628.922..28647.952
> rows=62403 loops=1)
>    Sort Key: executiontimestamp
>    Sort Method:  external merge  Disk: 17480kB
>    ->  Bitmap Heap Scan on ruddersysevents (cost=74359.66..292896.27 rows=3848 width=252) (actual
> time=1243.150..28338.927 rows=62403 loops=1)
>          Recheck Cond: ((nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
>          Filter: ((serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22
> 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
> to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
>          ->  BitmapAnd  (cost=74359.66..74359.66 rows=90079 width=0) (actual
> time=1228.610..1228.610 rows=0 loops=1)
>                ->  Bitmap Index Scan on nodeid_idx (cost=0.00..25795.17 rows=716237 width=0)
> (actual time=421.365..421.365 rows=690503 loops=1)
>                      Index Cond: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)
>                ->  Bitmap Index Scan on configurationruleid_idx  (cost=0.00..48562.32 rows=1386538
> width=0) (actual time=794.490..794.490 rows=1381391 loops=1)
>                      Index Cond: (ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)
>  Total runtime: 28657.352 ms
>
>
>
> I'm surprised that the executiontimestamp index is not used, since it seems to be where most of
> the query time is spent.

This usage pattern is quite similar to one I used to have a problem
with. The key issue here is that the planner wants to BitmapAnd indexes
that are spread across the whole table, over all timestamp values,
regardless of the fact that you are interested in only a narrow
timestamp window, and it is quite aggressive about using the bitmap scan
feature. So the planner needs to be directed more precisely.

You could try the above again with:

SET enable_bitmapscan TO off;

It helped in my case.

You may also try enclosing the timestamp condition in a "preselecting"
CTE, and doing the rest of the finer filtering outside of it, like:

with p as (
  select * from RudderSysEvents
  where executiontimestamp between '2012-11-22 16:00:16.005'
                               and '2013-01-25 18:53:52.467'
)
select executiondate, nodeid, ruleid, directiveid, serial, component,
       keyValue, executionTimeStamp, eventtype, policy, msg
from p
where nodeId = '31264061-5ecb-4891-9aa4-83824178f43d'
  and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
  and serial = 10;

As a side note, I think that all your indexes, except the timestamp one,
are unnecessary because of the low cardinality of their values and, as
you see, the confusion they cause the planner.

Alternatively, you may add one of the columns as the second column of a
two-column index together with the timestamp, choosing one which is
always used for filtering, and move its filter inside the CTE part.
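A minimal sketch of that suggestion (the index name and the choice of
ruleid as the second column are illustrative assumptions, not from the
thread):

```sql
-- Hypothetical two-column index: the timestamp plus one low-cardinality
-- column that is always used for filtering.
create index exects_ruleid_idx
    on ruddersysevents (executiontimestamp, ruleid);

-- Move that column's filter inside the "preselecting" CTE so both index
-- columns narrow the preselection; the finer filtering stays outside.
with p as (
  select * from RudderSysEvents
  where executiontimestamp between '2012-11-22 16:00:16.005'
                               and '2013-01-25 18:53:52.467'
    and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
)
select executiondate, nodeid, ruleid, directiveid, serial, component,
       keyValue, executionTimeStamp, eventtype, policy, msg
from p
where nodeId = '31264061-5ecb-4891-9aa4-83824178f43d'
  and serial = 10;
```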

HTH,
Irek.


Re: Surprising no use of indexes - low performance

From: Jeff Janes
Date:
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
<nicolas.charles@normation.com> wrote:
>
> It contains 11018592 entries, with the followinf patterns :
> 108492 distinct executiontimestamp
> 14 distinct nodeid
> 59 distinct directiveid
> 26 distinct ruleid
> 35 distinct serial

How many entries fall within a typical query interval of executiontimestamp?

...
>
> I'm surprised that the executiontimestamp index is not used, since it seems
> to be where most of the query time is spent.

I do not draw that conclusion from your posted information.  Can you
highlight the parts of it that lead you to this conclusion?

> For all my tests, I removed all the incoming logs, so that this table has
> only selects and no writes
>
> I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,

A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out.  What is smallish?

> with the following non default configuration
> shared_buffers = 112MB
> work_mem = 8MB
> maintenance_work_mem = 48MB
> max_stack_depth = 3MB
> wal_buffers = 1MB
> effective_cache_size = 128MB

effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.

> checkpoint_segments = 6
>
> Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
> performances (less than 10%). I would have expected it to improve, since the
> indexes would all fit in RAM

If the indexes fit in RAM, they fit in RAM.  If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM.  If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.

>
> create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
> serial, nodeid);

I wouldn't expect this to work well for this particular query.  Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure.  You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.
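A sketch of that ordering (the index name is illustrative):

```sql
-- Equality-tested columns lead, the range-tested column trails: the btree
-- is entered at one (nodeid, ruleid, serial) group, and the timestamp
-- range is then a contiguous run of index entries within that group.
create index composite_eq_first_idx
    on ruddersysevents (nodeid, ruleid, serial, executiontimestamp);
```

With this ordering, rows within each equality group also come back
already sorted by executiontimestamp, which may let the planner skip the
external merge sort seen in the plans above.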


>
> 2/ Removing nodeid from the index did lower again the perf
> create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
> serial);


I doubt that 84888.349 vs 83717.901 is really a meaningful difference.

> 3/ Removing executiontimestamp from the composite index makes the query
> performs better at the begining of its uses (around 17 secondes), but over
> time it degrades (I'm logging query longer than 20 secondes, and there are
> very rare in the first half of the batch, and getting more and more common
> at the end) to what is below

If the batch processing adds data, it is not surprising the query
slows down.  It looks like it is still faster at the end than the
previous two cases, right?


> So my question is :
> "Why *not* indexing the column which is not used makes the query slower over
> time, while not slowing the application?"

I don't know what column you are referring to here.  But it sounds
like you think that dropping the leading column from an index is a
minor change.  It is not.  It makes a fundamentally different index.

Cheers,

Jeff


Re: Surprising no use of indexes - low performance

From: Nicolas Charles
Date:
On 14/02/2013 20:27, Jeff Janes wrote:
> On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
> <nicolas.charles@normation.com>  wrote:
>> It contains 11018592 entries, with the followinf patterns :
>> 108492 distinct executiontimestamp
>> 14 distinct nodeid
>> 59 distinct directiveid
>> 26 distinct ruleid
>> 35 distinct serial
> How many entries fall within a typical query interval of executiontimestamp?

Around 65,000 entries.
>> I'm surprised that the executiontimestamp index is not used, since it seems
>> to be where most of the query time is spent.
> I do not draw that conclusion from your posted information.  Can you
> highlight the parts of it that lead you to this conclusion?
The index scans are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp.
Or am I misreading the output?

>> For all my tests, I removed all the incoming logs, so that this table has
>> only selects and no writes
>>
>> I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
> A lot of improvements have been made since 8.4 which would make this
> kind of thing easier to figure out.  What is smallish?

A VM with 1 core, 2 GB RAM, a single hard drive

>> with the following non default configuration
>> shared_buffers = 112MB
>> work_mem = 8MB
>> maintenance_work_mem = 48MB
>> max_stack_depth = 3MB
>> wal_buffers = 1MB
>> effective_cache_size = 128MB
> effective_cache_size seems small unless you expect to have a lot of
> this type of query running simultaneously, assuming you have at least
> 4GB of RAM, which I'm guessing you do based on your next comments.

For the sake of the test, the VM's memory was increased, with no
significant change.

>> checkpoint_segments = 6
>>
>> Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
>> performances (less than 10%). I would have expected it to improve, since the
>> indexes would all fit in RAM
> If the indexes fit in RAM, they fit in RAM.  If anything, increasing
> shared_buffers could make it harder to fit them entirely in RAM.  If
> your shared buffers undergo a lot of churn, then the OS cache and the
> shared buffers tend to uselessly mirror each other, meaning there is
> less space for non-redundant pages.
Oh!
So I completely misunderstood the meaning of shared_buffers; I figured
that it was somehow the place where the data (like indexes) would be
stored by postgres.


>
>> create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
>> serial, nodeid);
> I wouldn't expect this to work well for this particular query.  Since
> the leading column is used in a range test, the following columns
> cannot be used efficiently in the index structure.  You should put the
> equality-tested columns at the front of the index and the range-tested
> one at the end of it.
>
>
>> 2/ Removing nodeid from the index did lower again the perf
>> create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
>> serial);
>
> I doubt that 84888.349 vs 83717.901 is really a meaningful difference.
>
>> 3/ Removing executiontimestamp from the composite index makes the query
>> performs better at the begining of its uses (around 17 secondes), but over
>> time it degrades (I'm logging query longer than 20 secondes, and there are
>> very rare in the first half of the batch, and getting more and more common
>> at the end) to what is below
> If the batch processing adds data, it is not surprising the query
> slows down.  It looks like it is still faster at the end then the
> previous two cases, right?

Actually, the batch reads data from this table and writes into another,
so the content of the table doesn't change at all.
And yes, it is faster than the two previous cases.

>> So my question is :
>> "Why *not* indexing the column which is not used makes the query slower over
>> time, while not slowing the application?"
> I don't know what column you are referring to here.  But it sounds
> like you think that dropping the leading column from an index is a
> minor change.  It is not.  It makes a fundamentally different index.

I was referring to the executionTimeStamp column. I know it is a HUGE
change, but it's clearly not behaving the way I thought.
With your remark I understand a little better what is going on, and I
can better test what I'm doing.

Thank you !
Nicolas


Re: Surprising no use of indexes - low performance

From: Jeff Janes
Date:
On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
<nicolas.charles@normation.com> wrote:
> On 14/02/2013 20:27, Jeff Janes wrote:
>>
>> On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
>> <nicolas.charles@normation.com>  wrote:
>>>
>>> It contains 11018592 entries, with the followinf patterns :
>>> 108492 distinct executiontimestamp
>>> 14 distinct nodeid
>>> 59 distinct directiveid
>>> 26 distinct ruleid
>>> 35 distinct serial
>>
>> How many entries fall within a typical query interval of
>> executiontimestamp?
>
>
> Around 65 000 entries
>
> .
>>>
>>> I'm surprised that the executiontimestamp index is not used, since it
>>> seems
>>> to be where most of the query time is spent.
>>
>> I do not draw that conclusion from your posted information.  Can you
>> highlight the parts of it that lead you to this conclusion?
>
> The index scan are on nodeid_idx and configurationruleid_idx, not on
> executiontimestamp
> Or am I misreading the output ?

You are correct about the use of nodeid_idx and
configurationruleid_idx.  But the part about most of the time going
into filtering out rows that fail to match executiontimestamp is the
part that I don't think is supported by the explained plan.  The
information given by the explain plan is not sufficient to decide that
one way or the other.  Some more recent improvements in "explain
(analyze, buffers)" would make it easier (especially with
track_io_timing on) but it would still be hard to draw a definitive
conclusion.  The most definitive thing would be to do the experiment
of adding executiontimestamp as a *trailing* column to the end of one
of the existing indexes and see what happens.
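A sketch of that experiment (which existing index to extend, and the new
name, are assumptions):

```sql
-- Rebuild nodeid_idx with executiontimestamp as a trailing column: the
-- equality lookup on nodeid is unchanged, but the timestamp bounds become
-- part of the index condition instead of a filter applied after heap
-- visits, which is what the experiment is meant to measure.
drop index nodeid_idx;
create index nodeid_exects_idx
    on ruddersysevents (nodeid, executiontimestamp);
```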


>> If the indexes fit in RAM, they fit in RAM.  If anything, increasing
>> shared_buffers could make it harder to fit them entirely in RAM.  If
>> your shared buffers undergo a lot of churn, then the OS cache and the
>> shared buffers tend to uselessly mirror each other, meaning there is
>> less space for non-redundant pages.
>
> Oh !
> So I completely misunderstood the meaning of shared_buffer; I figured that
> it was somehow the place where the data would be stored by postgres (like
> indexes)

That is correct, it is the space used by postgres to cache data.  But,
the rest of RAM (beyond shared_buffers) will also be used to cache
data; but by the OS rather than by postgres.  On a dedicated server,
the OS will generally choose to (or at least attempt to) use this
space for the benefit of postgres anyway.  If shared_buffers > RAM/2,
it won't be very successful at this, but it will still try.  The
kernel and postgres do not have intimate knowledge of each other, so
it is hard to arrange that all pages show up in just one place or the
other but not both.

Cheers,

Jeff


Re: Surprising no use of indexes - low performance

From: Merlin Moncure
Date:
On Thu, Feb 14, 2013 at 9:35 AM, Nicolas Charles
<nicolas.charles@normation.com> wrote:
> Hello,
>
> I've been struggling to understand what's happening on my databases/query
> for several days, and I'm turning to higher mind for a logical answer.
>
> I'm dealing with a fairly large database, containing logs informations, that
> I crunch to get data out of it, with several indexes on them that I hoped
> were logical
>
> \d ruddersysevents
>                                  Table « public.ruddersysevents »
>       Colonne       |           Type           |
> Modificateurs
> --------------------+--------------------------+--------------------------------------------------
>  id                 | integer                  | non NULL Par défaut,
> nextval('serial'::regclass)
>  executiondate      | timestamp with time zone | non NULL
>  nodeid             | text                     | non NULL
>  directiveid        | text                     | non NULL
>  ruleid             | text                     | non NULL
>  serial             | integer                  | non NULL
>  component          | text                     | non NULL
>  keyvalue           | text                     |
>  executiontimestamp | timestamp with time zone | non NULL
>  eventtype          | character varying(64)    |
>  policy             | text                     |
>  msg                | text                     |
> Index :
>     "ruddersysevents_pkey" PRIMARY KEY, btree (id)
>     "component_idx" btree (component)
>     "configurationruleid_idx" btree (ruleid)
>     "executiontimestamp_idx" btree (executiontimestamp)
>     "keyvalue_idx" btree (keyvalue)
>     "nodeid_idx" btree (nodeid)
> Contraintes de vérification :
>     "ruddersysevents_component_check" CHECK (component <> ''::text)
>     "ruddersysevents_configurationruleid_check" CHECK (ruleid <> ''::text)
>     "ruddersysevents_nodeid_check" CHECK (nodeid <> ''::text)
>     "ruddersysevents_policyinstanceid_check" CHECK (directiveid <> ''::text)
>
>
> It contains 11018592 entries, with the followinf patterns :
> 108492 distinct executiontimestamp
> 14 distinct nodeid
> 59 distinct directiveid
> 26 distinct ruleid
> 35 distinct serial
>
> Related table/index size are
>                 relation                |  size
> ----------------------------------------+---------
>  public.ruddersysevents                 | 3190 MB
>  public.nodeid_idx                      | 614 MB
>  public.configurationruleid_idx         | 592 MB
>  public.ruddersysevents_pkey            | 236 MB
>  public.executiontimestamp_idx          | 236 MB
>
>
> I'm crunching the data by looking for each nodeid/ruleid/directiveid/serial
> with an executiontimestamp in an interval:
>
> explain analyze select executiondate, nodeid, ruleid, directiveid, serial,
> component, keyValue, executionTimeStamp, eventtype, policy, msg from
> RudderSysEvents where 1=1  and nodeId =
> '31264061-5ecb-4891-9aa4-83824178f43d'  and ruleId =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and
> executiontimestamp between to_timestamp('2012-11-22 16:00:16.005',
> 'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467',
> 'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
>  Sort  (cost=293125.41..293135.03 rows=3848 width=252) (actual
> time=28628.922..28647.952 rows=62403 loops=1)
>    Sort Key: executiontimestamp
>    Sort Method:  external merge  Disk: 17480kB
>    ->  Bitmap Heap Scan on ruddersysevents  (cost=74359.66..292896.27
> rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1)
>          Recheck Cond: ((nodeid =
> '31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
>          Filter: ((serial = 10) AND (executiontimestamp >=
> to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD
> HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25
> 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
>          ->  BitmapAnd  (cost=74359.66..74359.66 rows=90079 width=0) (actual
> time=1228.610..1228.610 rows=0 loops=1)
>                ->  Bitmap Index Scan on nodeid_idx  (cost=0.00..25795.17
> rows=716237 width=0) (actual time=421.365..421.365 rows=690503 loops=1)
>                      Index Cond: (nodeid =
> '31264061-5ecb-4891-9aa4-83824178f43d'::text)
>                ->  Bitmap Index Scan on configurationruleid_idx
> (cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490
> rows=1381391 loops=1)
>                      Index Cond: (ruleid =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)
>  Total runtime: 28657.352 ms
>
>
>
> I'm surprised that the executiontimestamp index is not used, since it seems
> to be where most of the query time is spent.
>
> For all my tests, I removed all the incoming logs, so that this table has
> only selects and no writes
>
> I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
> with the following non default configuration
> shared_buffers = 112MB
> work_mem = 8MB
> maintenance_work_mem = 48MB
> max_stack_depth = 3MB
> wal_buffers = 1MB
> effective_cache_size = 128MB
> checkpoint_segments = 6
>
> Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
> performances (less than 10%). I would have expected it to improve, since the
> indexes would all fit in RAM
> Every explain analyze made after modification of configuration and indexes
> where done after a complete batch of crunch of logs by our apps, to be sure
> the stats were corrects
>
> I tested further with the indexes, and reached these results :
> 1/ dropping the unique index  "configurationruleid_idx" btree (ruleid),
> "executiontimestamp_idx" btree (executiontimestamp),  "keyvalue_idx" btree
> (keyvalue),  "nodeid_idx" btree (nodeid) to replace them by a unique index
> did lower the perfs :
>
> create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
> serial, nodeid);
>
> Index Scan using composite_idx on ruddersysevents  (cost=0.01..497350.22
> rows=3729 width=252) (actual time=7.989..83717.901 rows=62403 loops=1)
>    Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
> 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
> (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
> 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (nodeid
> = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
>
>
> 2/ Removing nodeid from the index did lower again the perf
>
> create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
> serial);
>
>  Index Scan using composite2_idx on ruddersysevents  (cost=0.01..449948.98
> rows=3729 width=252) (actual time=23.507..84888.349 rows=62403 loops=1)
>    Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22
> 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
> (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
> 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10))
>    Filter: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)
>
> 3/ Removing executiontimestamp from the composite index makes the query
> performs better at the begining of its uses (around 17 secondes), but over
> time it degrades (I'm logging query longer than 20 secondes, and there are
> very rare in the first half of the batch, and getting more and more common
> at the end) to what is below
>
> create index composite3_idx on ruddersysevents (ruleid, serial, nodeid);
>   Sort  (cost=24683.44..24693.07 rows=3849 width=252) (actual
> time=60454.558..60474.013 rows=62403 loops=1)
>    Sort Key: executiontimestamp
>    Sort Method:  external merge  Disk: 17480kB
>    ->  Bitmap Heap Scan on ruddersysevents  (cost=450.12..24454.23 rows=3849
> width=252) (actual time=146.065..60249.143 rows=62403 loops=1)
>          Recheck Cond: ((ruleid =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (nodeid
> = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
>          Filter: ((executiontimestamp >= to_timestamp('2012-11-22
> 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND
> (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text,
> 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
>          ->  Bitmap Index Scan on composite3_idx  (cost=0.00..449.15
> rows=6635 width=0) (actual time=129.102..129.102 rows=62403 loops=1)
>                Index Cond: ((ruleid =
> '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (nodeid
> = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
>  Total runtime: 60484.022 ms
>
>
> And across all these tests, the whole database processing time with each
> index is within a 10% error margin (2h10 to 2h25), even though most of
> the time is spent on the SQL query.
>
> So my question is:
> "Why does *not* indexing the column which is not used make the query
> slower over time, while not slowing down the application?"
>
> If you have some clues, I'm really interested.

This query can be optimized to zero (assuming data in cache and not
too many dead rows).

create one index on nodeId, ruleId, serial, executiontimestamp columns and do:

SELECT
  executiondate, nodeid, ruleid, directiveid, serial, component,
keyValue, executionTimeStamp, eventtype, policy, msg
FROM RudderSysEvents
  WHERE (nodeId, ruleId, serial, executiontimestamp) >= (
    '31264061-5ecb-4891-9aa4-83824178f43d',
    '61713ff1-aa6f-4c86-b3cb-7012bee707dd',
    10, to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD HH24:MI:SS.MS'))
   AND (nodeId, ruleId, serial, executiontimestamp) <= (
    '31264061-5ecb-4891-9aa4-83824178f43d',
    '61713ff1-aa6f-4c86-b3cb-7012bee707dd',
    10, to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS'))
ORDER BY nodeId, ruleId, serial, executiontimestamp
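
Spelled out, the single four-column index this query relies on would be
something like the following (the index name is illustrative, not from the
thread):

```sql
-- One index whose column order matches both the row comparisons and the
-- ORDER BY above, so the rows come back as a single, already-sorted
-- index range and no separate sort step is needed:
CREATE INDEX composite4_idx
    ON ruddersysevents (nodeid, ruleid, serial, executiontimestamp);
```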

merlin


Re: Surprising no use of indexes - low performance

From
Nicolas Charles
Date:
On 15/02/2013 19:04, Jeff Janes wrote:
> On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
> <nicolas.charles@normation.com> wrote:
>> On 14/02/2013 20:27, Jeff Janes wrote:
>>>> I'm surprised that the executiontimestamp index is not used, since it
>>>> seems
>>>> to be where most of the query time is spent.
>>> I do not draw that conclusion from your posted information.  Can you
>>> highlight the parts of it that lead you to this conclusion?
>> The index scan are on nodeid_idx and configurationruleid_idx, not on
>> executiontimestamp
>> Or am I misreading the output ?
> You are correct about the use of nodeid_idx and
> configurationruleid_idx.  But the part about most of the time going
> into filtering out rows that fail to match executiontimestamp is the
> part that I don't think is supported by the explained plan.  The
> information given by the explain plan is not sufficient to decide that
> one way or the other.  Some more recent improvements in "explain
> (analyze, buffers)" would make it easier (especially with
> track_io_timing on) but it would still be hard to draw a definitive
> conclusion.  The most definitive thing would be to do the experiment
> of adding executiontimestamp as a *trailing* column to the end of one
> of the existing indexes and see what happens.
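
The "explain (analyze, buffers)" form mentioned in the quote looks like
this (the query itself is only an illustration):

```sql
-- With track_io_timing enabled, the plan also reports time spent
-- reading and writing blocks, which helps show where the seconds go:
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ruddersysevents
 WHERE ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
   AND serial = 10;
```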
I added this index
"composite_idx" btree (ruleid, serial, executiontimestamp)
and lowered shared_buffers to 54 MB.

For this specific query, it is indeed a bit better (23 s versus 28 s):

  Sort  (cost=43449.44..43459.07 rows=3854 width=252) (actual
time=23375.247..23394.704 rows=62403 loops=1)
    Sort Key: executiontimestamp
    Sort Method:  external merge  Disk: 17480kB
    ->  Bitmap Heap Scan on ruddersysevents (cost=28884.44..43219.89
rows=3854 width=252) (actual time=1528.704..23155.991 rows=62403 loops=1)
          Recheck Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)) AND (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text))
          ->  BitmapAnd  (cost=28884.44..28884.44 rows=3854 width=0)
(actual time=1517.310..1517.310 rows=0 loops=1)
                ->  Bitmap Index Scan on composite_idx
(cost=0.00..3062.16 rows=59325 width=0) (actual time=775.418..775.418
rows=811614 loops=1)
                      Index Cond: ((ruleid =
'61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND
(executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text,
'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <=
to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD
HH24:MI:SS.MS'::text)))
                ->  Bitmap Index Scan on nodeid_idx (cost=0.00..25820.11
rows=717428 width=0) (actual time=714.254..714.254 rows=690503 loops=1)
                      Index Cond: (nodeid =
'31264061-5ecb-4891-9aa4-83824178f43d'::text)
  Total runtime: 23419.411 ms
(11 rows)

But since there were a lot of "Sort Method: external merge" sorts, and a
lot of object instantiations in our batch, we tried to split the job into
5-day intervals, so that the average number of lines returned is close to 5000.
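
As an aside, the on-disk sorts themselves can often be avoided on the
server side: "external merge  Disk: 17480kB" means the sort exceeded
work_mem and spilled to disk. A minimal sketch, with an illustrative
value not taken from the thread:

```sql
-- Raise the per-sort memory budget above the ~17 MB spill size so the
-- sort can run in memory instead of on disk (the value is an assumption):
SET work_mem = '32MB';
```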

Load/IO of the system was *significantly* lower during the batch
processing with this index than with the previous attempt (load around
0.7 instead of 1.8)

And batch execution time is now 1h08 rather than 2h20

>>> If the indexes fit in RAM, they fit in RAM.  If anything, increasing
>>> shared_buffers could make it harder to fit them entirely in RAM.  If
>>> your shared buffers undergo a lot of churn, then the OS cache and the
>>> shared buffers tend to uselessly mirror each other, meaning there is
>>> less space for non-redundant pages.
>> Oh !
>> So I completely misunderstood the meaning of shared_buffer; I figured that
>> it was somehow the place where the data would be stored by postgres (like
>> indexes)
> That is correct, it is the space used by postgres to cache data.  But,
> the rest of RAM (beyond shared_buffers) will also be used to cache
> data; but by the OS rather than by postgres.  On a dedicated server,
> the OS will generally choose to (or at least attempt to) use this
> space for the benefit of postgres anyway.  If shared_buffers > RAM/2,
> it won't be very successful at this, but it will still try.  The
> kernel and postgres do not have intimate knowledge of each other, so
> it is hard to arrange that all pages show up in just one place or the
> other but not both.
>
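
To make the sizing advice above concrete, here is a postgresql.conf
sketch for a dedicated server (the figures are a common rule of thumb,
not taken from the thread):

```ini
# Assumes, say, 8 GB of RAM; keeping shared_buffers well below RAM/2
# leaves most memory to the OS page cache instead of mirroring it.
shared_buffers = 2GB          # ~25% of RAM is a common starting point
effective_cache_size = 6GB    # planner hint: total cache (PG + OS)
```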


Thank you very much for your explanations!

Nicolas