Thread: [GENERAL] explain analyze showed improved results without changes, why?

[GENERAL] explain analyze showed improved results without changes, why?

From
Chris Richards
Date:
Howdy. I was noticing a significant problem with a query on one of my tables. I tried recreating the problem and wasn't able to do so on a different install, and so a few days later I ran the same query on the problem table. Lo' and behold, there wasn't a problem anymore. I'm at a loss to why. The indices and two explains follow below. Thanks in advance for the help.

Cheers,
Chris

Indexes:
    "blocks_pkey" PRIMARY KEY, btree (cloudidx, blkid)
    "blocks_blkid_idx" btree (blkid)
    "blocks_cloudidx_idx" btree (cloudidx)
    "blocks_off_sz_idx" btree (off, sz)

mdb=> explain analyze SELECT * FROM blocks
 WHERE cloudidx=98038 AND off+sz >= 0 AND state='seeded'
 ORDER BY off LIMIT 1 FOR UPDATE;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..587.83 rows=1 width=100) (actual time=4814.579..4814.579 rows=1 loops=1)
   ->  LockRows  (cost=0.43..1358633.99 rows=2313 width=100) (actual time=4814.577..4814.577 rows=1 loops=1)
         ->  Index Scan using blocks_off_sz_idx on blocks  (cost=0.43..1358610.86 rows=2313 width=100) (actual time=4813.498..4814.384 rows=2 loops=1)
               Filter: ((cloudidx = 98038) AND (state = 'seeded'::block_state) AND ((off + sz) >= 0))
               Rows Removed by Filter: 6935023
 Total runtime: 4814.619 ms
(6 rows)

mdb=> select count(*) from blocks;
  count
----------
 11052135
(1 row)

mdb=> select count (distinct (cloudidx)) from blocks;
 count
-------
  4549
(1 row)

And here's the second. Notice that even though there are more rows, it was much faster and the "rows removed by filter" were significantly reduced by several orders of magnitude.

mdb=> explain analyze SELECT * FROM blocks
 WHERE cloudidx=98038 AND off+sz >= 0 AND state='seeded'
 ORDER BY off LIMIT 1 FOR UPDATE;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..584.57 rows=1 width=100) (actual time=0.071..0.071 rows=1 loops=1)
   ->  LockRows  (cost=0.43..1390825.21 rows=2381 width=100) (actual time=0.070..0.070 rows=1 loops=1)
         ->  Index Scan using blocks_off_sz_idx on blocks  (cost=0.43..1390801.40 rows=2381 width=100) (actual time=0.055..0.055 rows=1 loops=1)
               Filter: ((cloudidx = 98038) AND (state = 'seeded'::block_state) AND ((off + sz) >= 0))
               Rows Removed by Filter: 26
 Total runtime: 0.114 ms
(6 rows)

mdb=> select count(*) from blocks;
  count
----------
 11328801
(1 row)

mdb=> select count (distinct (cloudidx)) from blocks;
 count
-------
  4613
(1 row)

Re: [GENERAL] explain analyze showed improved results withoutchanges, why?

From
"Peter J. Holzer"
Date:
On 2016-12-23 10:35:26 -0600, Chris Richards wrote:
> Howdy. I was noticing a significant problem with a query on one of my tables. I
> tried recreating the problem and wasn't able to do so on a different install,
> and so a few days later I ran the same query on the problem table. Lo' and
> behold, there wasn't a problem anymore. I'm at a loss to why.
[...]
>     "blocks_off_sz_idx" btree (off, sz)
>
> mdb=> explain analyze SELECT * FROM blocks
>  WHERE cloudidx=98038 AND off+sz >= 0 AND state='seeded'
>  ORDER BY off LIMIT 1 FOR UPDATE;
>                                                                       QUERY
> PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.43..587.83 rows=1 width=100) (actual time=4814.579..4814.579 rows=1 loops=1)
>    ->  LockRows  (cost=0.43..1358633.99 rows=2313 width=100) (actual time= 4814.577..4814.577 rows=1 loops=1)
>          ->  Index Scan using blocks_off_sz_idx on blocks  (cost= 0.43..1358610.86 rows=2313 width=100) (actual
time=4813.498..4814.384rows=2 loops=1) 
>                Filter: ((cloudidx = 98038) AND (state = 'seeded'::block_state) AND ((off + sz) >= 0))
>                Rows Removed by Filter: 6935023
>  Total runtime: 4814.619 ms
> (6 rows)

This scans the table in ascending (off, sz) order until it finds one row
matching the filter. Apparently at the time of the query there were
6935023 rows in the table before the matching row.

[...]
> And here's the second. Notice that even though there are more rows, it was much
> faster and the "rows removed by filter" were significantly reduced by several
> orders of magnitude.
>
>
> mdb=> explain analyze SELECT * FROM blocks
>  WHERE cloudidx=98038 AND off+sz >= 0 AND state='seeded'
>  ORDER BY off LIMIT 1 FOR UPDATE;
>                                                                    QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.43..584.57 rows=1 width=100) (actual time=0.071..0.071 rows=1 loops=1)
>    ->  LockRows  (cost=0.43..1390825.21 rows=2381 width=100) (actual time=0.070..0.070 rows=1 loops=1)
>          ->  Index Scan using blocks_off_sz_idx on blocks  (cost=0.43..1390801.40 rows=2381 width=100) (actual
time=0.055..0.055rows=1 loops=1) 
>                Filter: ((cloudidx = 98038) AND (state = 'seeded'::block_state) AND ((off + sz) >= 0))
>                Rows Removed by Filter: 26
>  Total runtime: 0.114 ms
> (6 rows)

The plan here is exactly the same, but only 26 rows are discarded. My
guess is that between those two queries a row was inserted with a really
low (off, sz) value which matches the query. So now the query can return
after checking only a handful of rows.

LIMIT, EXISTS, etc. are awful when you want predictable performance. You
may be lucky and the rows you are looking for are just at the start or
you may be unlucky and you have to scan through the whole table to find
them. The optimizer (usually) doesn't have enough information and
assumes they are spread randomly through the table.

        hp

--
   _  | Peter J. Holzer    | A coding theorist is someone who doesn't
|_|_) |                    | think Alice is crazy.
| |   | hjp@hjp.at         | -- John Gordon
__/   | http://www.hjp.at/ |    http://downlode.org/Etext/alicebob.html

Attachment