Thread: Confusion and Questions about blocks read

Confusion and Questions about blocks read

From

"Alex Turner"

Date:

22 September 2006, 15:35:12

The query expain analyze looks like this:

click-counter=# explain analyze select count(*) as count, to_char(date_trunc('day',c.datestamp),'DD-Mon') as day from impression c, url u, handle h where c.url_id=u.url_id and c.handle_id=h.handle_id and h.handle like '10000.19%' group by date_trunc('day',c.datestamp) order by date_trunc('day',c.datestamp);
                                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=530282.76..530283.04 rows=113 width=8) (actual time= 191887.059..191887.131 rows=114 loops=1)
   Sort Key: date_trunc('day'::text, c.datestamp)
   -> HashAggregate (cost=530276.65..530278.91 rows=113 width=8) (actual time=191886.081..191886.509 rows=114 loops=1)
         -> Hash Join (cost=128.41..518482.04 rows=2358921 width=8) (actual time=17353.281..190568.890 rows=625212 loops=1)
               Hash Cond: ("outer".handle_id = "inner".handle_id)
               -> Merge Join (cost=0.00..444641.52 rows=5896746 width=12) (actual time=34.582..183154.561 rows=5896746 loops=1)
                     Merge Cond: ("outer".url_id = "inner".url_id)
                     -> Index Scan using url_pkey on url u (cost=0.00..106821.10 rows=692556 width=8) (actual time=0.078..83432.380 rows=692646 loops=1)
                     -> Index Scan using impression_url_i on impression c (cost= 0.00..262546.95 rows=5896746 width=16) (actual time=34.473..86701.410 rows=5896746 loops=1)
               -> Hash (cost=123.13..123.13 rows=2115 width=4) (actual time=40.159..40.159 rows=2706 loops=1)
                     -> Bitmap Heap Scan on handle h (cost= 24.69..123.13 rows=2115 width=4) (actual time=20.362..36.819 rows=2706 loops=1)
                           Filter: (handle ~~ '10000.19%'::text)
                           -> Bitmap Index Scan on handles_i (cost= 0.00..24.69 rows=2115 width=0) (actual time=20.264..20.264 rows=2706 loops=1)
                                 Index Cond: ((handle >= '10000.19'::text) AND (handle < '10000.1:'::text))
Total runtime: 191901.868 ms

(looks like it sped up a bit the second time I did it)

When I query relpages for the tables involved:

click-counter=# select relpages from pg_class where relname='impression';
relpages
----------
    56869
(1 row)

click-counter=# select relpages from pg_class where relname='url';
relpages
----------
    66027
(1 row)

click-counter=# select relpages from pg_class where relname='handle';
relpages
----------
       72
(1 row)

click-counter=#

they only total 122968.

Home come the query statistics showed that 229066 blocks where read given that all the blocks in all the tables put together only total 122968?

LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
        !       218.630786 elapsed 24.160000 user 13.930000 system sec
        !       [261.000000 user 85.610000 sys total]
        !       0/0 [0/0] filesystem blocks in/out
        !       65/47 [20176/99752] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [0/0] voluntary/involuntary context switches
        ! buffer usage stats:
        !       Shared blocks:     229066 read,          2 written, buffer hit rate = 55.61%
        !       Local blocks:          0 read,          0 written, buffer hit rate = 0.00%
        !       Direct blocks:          0 read,          0 written

Alex.

Re: Confusion and Questions about blocks read

From

Tom Lane

Date:

22 September 2006, 16:31:30

"Alex Turner" <armtuk@gmail.com> writes:
> Home come the query statistics showed that 229066 blocks where read given
> that all the blocks in all the tables put together only total 122968?

You forgot to count the indexes.  Also, the use of indexscans in the
mergejoins probably causes multiple re-reads of some table blocks,
depending on just what the physical ordering of the rows is.

            regards, tom lane

Re: Confusion and Questions about blocks read

From

"Alex Turner"

Date:

22 September 2006, 17:46:24

ahh.... good point

Thanks

On 9/22/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Alex Turner" <armtuk@gmail.com> writes:
> Home come the query statistics showed that 229066 blocks where read given
> that all the blocks in all the tables put together only total 122968?

You forgot to count the indexes. Also, the use of indexscans in the
mergejoins probably causes multiple re-reads of some table blocks,
depending on just what the physical ordering of the rows is.

regards, tom lane

Re: Confusion and Questions about blocks read

From

"Alex Turner"

Date:

22 September 2006, 21:38:29

Ok - so I have another mystery:

I insert virtually the same rows into two different tables:

trend=# insert into fish select 2, nextval('result_entry_order_seq'), property_id from property;
INSERT 0 59913
trend=# insert into result_entry select 0, nextval('result_entry_order_seq'), property_id from property;
INSERT 0 59913
trend=#

but the stats show one as having written 20x as many blocks:

LOG: statement: insert into fish select 2, nextval('result_entry_order_seq'), property_id from property;
LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
         !       2.098067 elapsed 0.807877 user 1.098833 system sec
         !       [23.875370 user 27.789775 sys total]
         !       0/0 [0/0] filesystem blocks in/out
         !       0/1 [5/62269] page faults/reclaims, 0 [0] swaps
         !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
         !       72/6 [18464/1126] voluntary/involuntary context switches
         ! buffer usage stats:
         !       Shared blocks:      79106 read,        420 written, buffer hit rate = 79.39%
         !       Local blocks:          0 read,          0 written, buffer hit rate = 0.00%
         !       Direct blocks:          0 read,          0 written

LOG: statement: insert into result_entry select 0, nextval('result_entry_order_seq'), property_id from property;
LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
!       16.963729 elapsed 3.533463 user 1.706740 system sec
!       [27.408833 user 29.497515 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/1186 [5/63455] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       59/139 [18525/1265] voluntary/involuntary context switches
! buffer usage stats:
!       Shared blocks:     100744 read,       7352 written, buffer hit rate = 89.71%
!       Local blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written

I understand the read blocks difference, the second had to check indexes matching the foreign keys.

The table definitions are given below:

trend=# \d fish
           Table "public.fish"
       Column       | Type   | Modifiers
--------------------+---------+-----------
result_id          | bigint |
result_entry_order | bigint |
property_id        | integer |
Indexes:
    "fish_pkey" UNIQUE, btree (result_id, result_entry_order)

trend=# \d result_Entry
                                 Table "public.result_entry "
       Column       | Type   |                          Modifiers
--------------------+---------+-----------
result_id          | bigint |
result_entry_order | bigint |
property_id        | integer |
Indexes:
    "fish_pkey" UNIQUE, btree (result_id, result_entry_order)

The explain analyzes are kind of interesting:

trend=# explain analyze insert into fish select 2, nextval('result_entry_order_seq'), property_id from property;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on property (cost= 0.00..79295.70 rows=59913 width=8) (actual time=0.275..1478.681 rows=59913 loops=1)
Total runtime: 2178.600 ms
(2 rows)

trend=# explain analyze insert into result_entry select 0, nextval('result_entry_order_seq'), property_id from property;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on property (cost= 0.00..79295.70 rows=59913 width=8) (actual time=0.118..1473.352 rows=59913 loops=1)
Trigger for constraint result_entry_result_fk: time=2037.351 calls=59913
Trigger for constraint result_entry_property_fk: time=8622.260 calls=59913
Total runtime: 12959.716 ms
(4 rows)

I don't understand the time for the FK check given the size of the tables they are checking against (and I understand it's the indexes, not the tables that the actualy check is made):

trend=# select count(*) from result_cache;
count
-------
     8
(1 row)

trend=#

trend=# select count(*) from property;
count
-------
59913
(1 row)

trend=#

The database was just re-indexed, and no changes beyond this insert were made in that time and result_entry has recently been vacuumed.

Any insight would be greatly appreciated

Alex

On 9/22/06, Alex Turner <armtuk@gmail.com> wrote:

ahh.... good point

Thanks

On 9/22/06, Tom Lane < tgl@sss.pgh.pa.us> wrote:
"Alex Turner" <armtuk@gmail.com> writes:
> Home come the query statistics showed that 229066 blocks where read given
> that all the blocks in all the tables put together only total 122968?

You forgot to count the indexes. Also, the use of indexscans in the
mergejoins probably causes multiple re-reads of some table blocks,
depending on just what the physical ordering of the rows is.

regards, tom lane

Re: Confusion and Questions about blocks read

From

Markus Schaber

Date:

23 September 2006, 12:19:50

Hi, Tom,

Tom Lane wrote:
> "Alex Turner" <armtuk@gmail.com> writes:
>> Home come the query statistics showed that 229066 blocks where read given
>> that all the blocks in all the tables put together only total 122968?
>
> You forgot to count the indexes.  Also, the use of indexscans in the
> mergejoins probably causes multiple re-reads of some table blocks,
> depending on just what the physical ordering of the rows is.

As far as I understand, Index Bitmap Scans improve this behaviour, by
ensuring that every table block is read only once.

Btw, would it be feasible to enhance normal index scans by looking at
all rows in the current table block whether they meet the query
criteria, fetch them all, and blacklist the block for further revisiting
during the same index scan?

I think that, for non-sorted cases, this could improve index scans a
little, but I don't know whether it's worth the effort, given that
bitmap indidex scans exist.

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: Confusion and Questions about blocks read

From

Jim Nasby

Date:

27 September 2006, 03:46:25

On Sep 23, 2006, at 8:19 AM, Markus Schaber wrote:
> Btw, would it be feasible to enhance normal index scans by looking at
> all rows in the current table block whether they meet the query
> criteria, fetch them all, and blacklist the block for further
> revisiting
> during the same index scan?
>
> I think that, for non-sorted cases, this could improve index scans a
> little, but I don't know whether it's worth the effort, given that
> bitmap indidex scans exist.

The trade-off is you'd burn a lot more CPU on those pages. What might
be interesting would be collapsing bitmap scan data down to a page
level when certain conditions were met, such as if you're getting a
significant number of hits for a given page. There's probably other
criteria that could be used as well. One issue would be considering
the effects of other bitmap index operations; if you're ANDing a
bunch of scans together, you're likely to have far fewer tuples per
page coming out the backside, which means you probably wouldn't want
to burn the extra CPU to do full page scans.

BTW, I remember discussion at some point about ordering the results
of a bitmap scan by page/tuple ID, which would essentially do what
you're talking about. I don't know if it actually happened or not,
though.

If this is something that interests you, I recommend taking a look at
the code; it's generally not too hard to read through thanks to all
the comments.
--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)