Re: Surprising no use of indexes - low performance - Mailing list pgsql-performance

From Nicolas Charles
Subject Re: Surprising no use of indexes - low performance
Date
Msg-id 511DF928.8070101@normation.com
Whole thread Raw
In response to Re: Surprising no use of indexes - low performance  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Surprising no use of indexes - low performance
List pgsql-performance
On 14/02/2013 20:27, Jeff Janes wrote:
> On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
> <nicolas.charles@normation.com>  wrote:
>> It contains 11018592 entries, with the followinf patterns :
>> 108492 distinct executiontimestamp
>> 14 distinct nodeid
>> 59 distinct directiveid
>> 26 distinct ruleid
>> 35 distinct serial
> How many entries fall within a typical query interval of executiontimestamp?

Around 65 000 entries
.
>> I'm surprised that the executiontimestamp index is not used, since it seems
>> to be where most of the query time is spent.
> I do not draw that conclusion from your posted information.  Can you
> highlight the parts of it that lead you to this conclusion?
The index scan are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp
Or am I misreading the output ?

>> For all my tests, I removed all the incoming logs, so that this table has
>> only selects and no writes
>>
>> I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
> A lot of improvements have been made since 8.4 which would make this
> kind of thing easier to figure out.  What is smallish?

A VM with 1 core, 2 GB RAM, a single hard drive

>> with the following non default configuration
>> shared_buffers = 112MB
>> work_mem = 8MB
>> maintenance_work_mem = 48MB
>> max_stack_depth = 3MB
>> wal_buffers = 1MB
>> effective_cache_size = 128MB
> effective_cache_size seems small unless you expect to have a lot of
> this type of query running simultaneously, assuming you have at least
> 4GB of RAM, which I'm guessing you do based on your next comments.

For the sake of the test, the VM got its memory increased, with no
significant changes

>> checkpoint_segments = 6
>>
>> Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
>> performances (less than 10%). I would have expected it to improve, since the
>> indexes would all fit in RAM
> If the indexes fit in RAM, they fit in RAM.  If anything, increasing
> shared_buffers could make it harder to fit them entirely in RAM.  If
> your shared buffers undergo a lot of churn, then the OS cache and the
> shared buffers tend to uselessly mirror each other, meaning there is
> less space for non-redundant pages.
Oh !
So I completely misunderstood the meaning of shared_buffer; I figured
that it was somehow the place where the data would be stored by postgres
(like indexes)


>
>> create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
>> serial, nodeid);
> I wouldn't expect this to work well for this particular query.  Since
> the leading column is used in a range test, the following columns
> cannot be used efficiently in the index structure.  You should put the
> equality-tested columns at the front of the index and the range-tested
> one at the end of it.
>
>
>> 2/ Removing nodeid from the index did lower again the perf
>> create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
>> serial);
>
> I doubt that 84888.349 vs 83717.901 is really a meaningful difference.
>
>> 3/ Removing executiontimestamp from the composite index makes the query
>> performs better at the begining of its uses (around 17 secondes), but over
>> time it degrades (I'm logging query longer than 20 secondes, and there are
>> very rare in the first half of the batch, and getting more and more common
>> at the end) to what is below
> If the batch processing adds data, it is not surprising the query
> slows down.  It looks like it is still faster at the end then the
> previous two cases, right?

Actually, the batch reads data from this table, and writes into another.
So the content of the table doesn't change at all
And yes, it is faster than the two previous case

>> So my question is :
>> "Why *not* indexing the column which is not used makes the query slower over
>> time, while not slowing the application?"
> I don't know what column you are referring to here.  But it sounds
> like you think that dropping the leading column from an index is a
> minor change.  It is not.  It makes a fundamentally different index.

I was refering to the executionTimeStamp column. I know it is a HUGE
change, but it's clearly not behavin the way I thought
With your remark I understand a little better what is going on, and I
can test better what I'm doing.

Thank you !
Nicolas


pgsql-performance by date:

Previous
From: Ali Pouya
Date:
Subject: Re: Partition insert trigger using C language
Next
From: Florian Schröck
Date:
Subject: Very slow update statement on 40mio rows