Re: Surprising no use of indexes - low performance - Mailing list pgsql-performance
| From | Nicolas Charles |
|---|---|
| Subject | Re: Surprising no use of indexes - low performance |
| Date | |
| Msg-id | 511F508A.9080004@normation.com |
| In response to | Re: Surprising no use of indexes - low performance (Jeff Janes <jeff.janes@gmail.com>) |
| List | pgsql-performance |
On 15/02/2013 19:04, Jeff Janes wrote:
> On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
> <nicolas.charles@normation.com> wrote:
>> On 14/02/2013 20:27, Jeff Janes wrote:
>>>> I'm surprised that the executiontimestamp index is not used, since it
>>>> seems to be where most of the query time is spent.
>>> I do not draw that conclusion from your posted information. Can you
>>> highlight the parts of it that lead you to this conclusion?
>> The index scans are on nodeid_idx and configurationruleid_idx, not on
>> executiontimestamp.
>> Or am I misreading the output?
> You are correct about the use of nodeid_idx and
> configurationruleid_idx. But the part about most of the time going
> into filtering out rows that fail to match executiontimestamp is the
> part that I don't think is supported by the explained plan. The
> information given by the explain plan is not sufficient to decide that
> one way or the other. Some more recent improvements in "explain
> (analyze, buffers)" would make it easier (especially with
> track_io_timing on), but it would still be hard to draw a definitive
> conclusion. The most definitive thing would be to do the experiment
> of adding executiontimestamp as a *trailing* column to the end of one
> of the existing indexes and see what happens.

I added this index:
    "composite_idx" btree (ruleid, serial, executiontimestamp)
and lowered shared_buffers to 54 MB.

For this specific query it is indeed a bit better (23s against 28s):

 Sort  (cost=43449.44..43459.07 rows=3854 width=252) (actual time=23375.247..23394.704 rows=62403 loops=1)
   Sort Key: executiontimestamp
   Sort Method: external merge  Disk: 17480kB
   ->  Bitmap Heap Scan on ruddersysevents  (cost=28884.44..43219.89 rows=3854 width=252) (actual time=1528.704..23155.991 rows=62403 loops=1)
         Recheck Cond: ((ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
         ->  BitmapAnd  (cost=28884.44..28884.44 rows=3854 width=0) (actual time=1517.310..1517.310 rows=0 loops=1)
               ->  Bitmap Index Scan on composite_idx  (cost=0.00..3062.16 rows=59325 width=0) (actual time=775.418..775.418 rows=811614 loops=1)
                     Index Cond: ((ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
               ->  Bitmap Index Scan on nodeid_idx  (cost=0.00..25820.11 rows=717428 width=0) (actual time=714.254..714.254 rows=690503 loops=1)
                     Index Cond: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)
 Total runtime: 23419.411 ms
(11 rows)

But since there were a lot of "Sort Method: external merge" sorts spilling to disk, and a lot of object instantiations in our batch, we tried to split the job into 5-day intervals, so that the average number of rows returned is close to 5000.

Load/IO of the system was *significantly* lower during the batch processing with this index than with the previous attempt (load around 0.7 instead of 1.8), and batch execution time is now 1h08 rather than 2h20.
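For reference, the composite_idx above corresponds to a definition along these lines (a sketch: the table and column names are the ones visible in the plan, the exact DDL may have differed slightly):

    CREATE INDEX composite_idx
        ON ruddersysevents (ruleid, serial, executiontimestamp);

And the 5-day split is just the same query run with a narrower executiontimestamp window, roughly like this (a sketch: the SELECT list and the upper bound, five days after the lower one, are illustrative; the actual bounds are computed by the batch job):

    SELECT *
      FROM ruddersysevents
     WHERE ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
       AND serial = 10
       AND nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'
       AND executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD HH24:MI:SS.MS')
       AND executiontimestamp <  to_timestamp('2012-11-27 16:00:16.005', 'YYYY-MM-DD HH24:MI:SS.MS')  -- 5-day window
     ORDER BY executiontimestamp;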
>>> If the indexes fit in RAM, they fit in RAM. If anything, increasing
>>> shared_buffers could make it harder to fit them entirely in RAM. If
>>> your shared buffers undergo a lot of churn, then the OS cache and the
>>> shared buffers tend to uselessly mirror each other, meaning there is
>>> less space for non-redundant pages.
>> Oh!
>> So I completely misunderstood the meaning of shared_buffers; I figured that
>> it was somehow the place where the data would be stored by postgres (like
>> indexes)
> That is correct, it is the space used by postgres to cache data. But
> the rest of RAM (beyond shared_buffers) will also be used to cache
> data, just by the OS rather than by postgres. On a dedicated server,
> the OS will generally choose to (or at least attempt to) use this
> space for the benefit of postgres anyway. If shared_buffers > RAM/2,
> it won't be very successful at this, but it will still try. The
> kernel and postgres do not have intimate knowledge of each other, so
> it is hard to arrange that all pages show up in just one place or the
> other but not both.

Thank you very much for your explanations!

Nicolas
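The shared_buffers change discussed above is a single postgresql.conf setting (a sketch; 54 MB is the value quoted in this message, and the parameter only takes effect after a server restart):

    # postgresql.conf
    shared_buffers = 54MB    # keep this modest; the OS page cache handles the rest of RAM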