Thread: Wrong actual number of rows in the Query Plan

Wrong actual number of rows in the Query Plan

From
Vahe Evoyan
Date:
Hello,

The Query Plan for the query below shows a large number in its actual rows count by an unknown reason. As a result Merge Join works on a large enough data to slow down the query.

The table which I query has the following description:


                                                      Table "public.qor_value"
   Column    |          Type          |                              Modifiers                              | Storage  | Description
-------------+------------------------+---------------------------------------------------------------------+----------+-------------
 value_id    | integer                | not null default nextval('qor_value_denorm_value_id_seq'::regclass) | plain    |
 run_id      | integer                | not null                                                            | plain    |
 dft_id      | integer                | not null                                                            | plain    |
 stat_id     | integer                | not null                                                            | plain    |
 key         | character varying(128) |                                                                     | extended |
 value       | numeric(22,10)         |                                                                     | main     |
 line_number | integer                | not null default nextval('qor_value_line_numbering'::regclass)      | plain    |
 file_number | integer                | not null                                                            | plain    |
Indexes:
    "qor_value_cluster" btree (run_id, stat_id) CLUSTER INVALID
    "qor_value_filtered_self_join" btree (run_id, stat_id, key, dft_id, line_number) INVALID
    "qor_value_self_join" btree (run_id, stat_id, dft_id, key, line_number)


Here is the query:

EXPLAIN ANALYZE
SELECT *
FROM "qor_value" V1
INNER JOIN "qor_value" V2
USING ("dft_id", "stat_id", "key")
WHERE
V1."stat_id" = 342 AND
V1."run_id" = 60807 AND
V2."run_id" = 60875;


                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2513.96 rows=1 width=72) (actual time=127.361..473.687 rows=66460 loops=1)
   Merge Cond: ((v1.dft_id = v2.dft_id) AND ((v1.key)::text = (v2.key)::text))
   ->  Index Scan using qor_value_self_join on qor_value v1  (cost=0.00..1255.60 rows=275 width=51) (actual time=89.549..97.045 rows=1388 loops=1)
         Index Cond: ((run_id = 60807) AND (stat_id = 342))
   ->  Index Scan using qor_value_self_join on qor_value v2  (cost=0.00..1255.60 rows=275 width=51) (actual time=37.796..134.286 rows=66343 loops=1)
         Index Cond: ((run_id = 60875) AND (stat_id = 342))
 Total runtime: 544.646 ms
(7 rows)


Note that the second Index Scan has 66343 rows in place of 1388. Here is the query which proves that:

SELECT COUNT(*) FROM "qor_value" WHERE run_id = 60875 AND stat_id = 342;
 count
-------
  1388


Please help me to figure out where the problem is.
Thanks in advance,
Vahe

Re: Wrong actual number of rows in the Query Plan

From
Tom Lane
Date:
Vahe Evoyan <vahe.evoyan@gmail.com> writes:
>  Merge Join  (cost=0.00..2513.96 rows=1 width=72) (actual
> time=127.361..473.687 rows=66460 loops=1)
>    Merge Cond: ((v1.dft_id = v2.dft_id) AND ((v1.key)::text =
> (v2.key)::text))
>    ->  Index Scan using qor_value_self_join on qor_value v1
> (cost=0.00..1255.60 rows=275 width=51) (actual time=89.549..97.045
> rows=1388 loops=1)
>          Index Cond: ((run_id = 60807) AND (stat_id = 342))
>    ->  Index Scan using qor_value_self_join on qor_value v2
> (cost=0.00..1255.60 rows=275 width=51) (actual time=37.796..134.286
> rows=66343 loops=1)
>          Index Cond: ((run_id = 60875) AND (stat_id = 342))
>  Total runtime: 544.646 ms
> (7 rows)

> Note that the second Index Scan has 66343 rows in place of 1388.

That's not a bug.  That's a result of rescanning portions of the inner
relation's output due to duplicate mergejoin keys in the outer relation.
The EXPLAIN ANALYZE machinery counts the re-fetches as if they were new
rows, though in some sense they're not.

            regards, tom lane