Re: BUG #13908: Query returns too few rows - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #13908: Query returns too few rows
Date
Msg-id 20160202223135.GA157331@alvherre.pgsql
Whole thread Raw
In response to BUG #13908: Query returns too few rows  (seth-p@outlook.com)
List pgsql-bugs
seth-p@outlook.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13908
> Logged by:          Seth
> Email address:      seth-p@outlook.com
> PostgreSQL version: 9.5.0
> Operating system:   Windows 10
> Description:
>
> Below is a copy of a long-ish email I wrote detailing the problem. If it
> doesn't come through in a reasonable format, please let me know where I can
> email it. Thanks --Seth

Here's the copy Seth sent with better formatting.

--------------------------

I've only recently started using Postgresql, and have encountered what I believe to be a bug in Postgresql 9.5 on
Windows10. 
Below are 6 queries that are identical except for (a) the date range in the WHERE clause, and (b) whether or not they
includeDISTINCT. This is as simple as I could get the queries while still reproducing the bug. The following table
summarizesthe number of rows returned by each query: 


-----------------------------------------------------------------------------------------------------------------------------------
--       Date Range         | # rows without DISTINCT | # rows with DISTINCT  | Notes
         | 

------------------------------------------------------------------------------|----------------------------------------------------
-- 2005-03-01 - 2005-07-30  |             (A) 415,983 |         (A-D) 416,075 | DISTINCT *increases* the number of rows
returned!| 
-- 2005-03-01 - 2005-04-30  |             (B) 168,886 |         (B-D) 168,886 | DISTINCT has no effect
         | 
-- 2005-05-01 - 2005-07-30  |             (C) 247,189 |         (C-D) 247,189 | DISTINCT has no effect
         | 
-- sum of sub-range queries |       (B) + (C) 416,075 | (B-D) + (C-D) 416,075 | In both cases, sum of 2 queries equals
(A-D)     | 

-----------------------------------------------------------------------------------------------------------------------------------

Query (A-D) (with DISTINCT) should not return more rows than query (A) (the identical query without DISTINCT), so
clearlysomething is wrong there. 
Looking at the results of the queries over the two sub-ranges strongly suggests that it is query (A) that is returning
toofew rows. 

Beneath the queries below I also show the EXPLAIN results for queries (A), (A-D), (B), and (B-D). Notice that the plan
forquery (A) differs from the other three -- it appears to have an extra "Hash" (highlighted with asterisks) that the
othersdo not. 

I'm afraid I am unable to share the data, but can provide table statistics, index descriptions, or anything else that
mightbe useful. Any suggestions for how to proceed? 

Seth

-- (A) 2005-03-01 - 2005-07-30 returns 415,983 rows
select count(*) from
(
SELECT
    rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
    ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
    ON uuu.universe_hash = 5188205190738336870 AND
           uuu.barrid = rrr.barrid
WHERE
    fff.file_name_model_ver = '100' AND
    fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
) a

-- (A-D) 2005-03-01 - 2005-07-30 DISTNICT returns 416,075 rows
select count(*) from
(
SELECT DISTINCT
    rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
    ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
    ON uuu.universe_hash = 5188205190738336870 AND
           uuu.barrid = rrr.barrid
WHERE
    fff.file_name_model_ver = '100' AND
    fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
) ad

-- (B) 2005-03-01 - 2005-04-30 returns 168,886 rows
select count(*) from
(
SELECT
    rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
    ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
    ON uuu.universe_hash = 5188205190738336870 AND
           uuu.barrid = rrr.barrid
WHERE
    fff.file_name_model_ver = '100' AND
    fff.file_name_date BETWEEN '2005-03-01' AND '2005-04-30'
) b

-- (B-D) 2005-03-01 - 2005-04-30 DISTINCT returns 168,886 rows
select count(*) from
(
SELECT DISTINCT
    rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
    ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
    ON uuu.universe_hash = 5188205190738336870 AND
           uuu.barrid = rrr.barrid
WHERE
    fff.file_name_model_ver = '100' AND
    fff.file_name_date BETWEEN '2005-03-01' AND '2005-04-30'
) bd

-- (C) 2005-05-01 - 2005-07-30 returns 247,189 rows
select count(*) from
(
SELECT
    rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
    ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
    ON uuu.universe_hash = 5188205190738336870 AND
           uuu.barrid = rrr.barrid
WHERE
    fff.file_name_model_ver = '100' AND
    fff.file_name_date BETWEEN '2005-05-01' AND '2005-07-30'
) c

-- (C-D) 2005-05-01 - 2005-07-30 DISTINCT returns 247,189 rows
select count(*) from
(
SELECT DISTINCT
    rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
    ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
    ON uuu.universe_hash = 5188205190738336870 AND
           uuu.barrid = rrr.barrid
WHERE
    fff.file_name_model_ver = '100' AND
    fff.file_name_date BETWEEN '2005-05-01' AND '2005-07-30'
) cd



EPXLAIN of queries

(A)
"Aggregate  (cost=1821781.18..1821781.19 rows=1 width=0)"
"  ->  Hash Join  (cost=1811365.12..1821096.53 rows=273861 width=0)"
"        Hash Cond: ((uuu.barrid)::text = (rrr.barrid)::text)"
"        ->  Bitmap Heap Scan on temp_universe_instruments uuu  (cost=231.25..932.60 rows=8108 width=8)"
"              Recheck Cond: (universe_hash = '5188205190738336870'::bigint)"
"              ->  Bitmap Index Scan on pk_temp_universe_instruments  (cost=0.00..229.23 rows=8108 width=0)"
"                    Index Cond: (universe_hash = '5188205190738336870'::bigint)"
"        ->  Hash  (cost=1796474.21..1796474.21 rows=893492 width=8)"
**********************************************************
"              ->  Hash Join  (cost=2341.15..1796474.21 rows=893492 width=8)"
"                    Hash Cond: (rrr.barra_file_idx = fff.idx)"
"                    ->  Seq Scan on models_direct_row_asset_data rrr  (cost=0.00..1518763.74 rows=71049174 width=12)"
"                    ->  Hash  (cost=2329.00..2329.00 rows=972 width=4)"
"                          ->  Bitmap Heap Scan on models_direct_file fff  (cost=35.33..2329.00 rows=972 width=4)"
"                                Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-07-30'::date))"
"                                Filter: ((file_name_model_ver)::text = '100'::text)"
"                                ->  Bitmap Index Scan on ix_models_direct_file_file_name_date  (cost=0.00..35.08
rows=1479width=0)" 
"                                      Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-07-30'::date))"

(A-D)
"Aggregate  (cost=1886439.23..1886439.24 rows=1 width=0)"
"  ->  Unique  (cost=1875484.79..1883015.97 rows=273861 width=64)"
"        ->  Sort  (cost=1875484.79..1876169.44 rows=273861 width=64)"
"              Sort Key: rrr.idx, rrr.row_number, rrr.barrid, rrr.yield_pct, rrr.total_risk_pct, rrr.spec_risk_pct,
rrr.hist_beta,rrr.pred_beta, rrr.data_date, rrr.barra_file_idx" 
"              ->  Hash Join  (cost=3375.10..1840453.92 rows=273861 width=64)"
"                    Hash Cond: ((rrr.barrid)::text = (uuu.barrid)::text)"
"                    ->  Hash Join  (cost=2341.15..1796474.21 rows=893492 width=64)"
"                          Hash Cond: (rrr.barra_file_idx = fff.idx)"
"                          ->  Seq Scan on models_direct_row_asset_data rrr  (cost=0.00..1518763.74 rows=71049174
width=64)"
"                          ->  Hash  (cost=2329.00..2329.00 rows=972 width=4)"
"                                ->  Bitmap Heap Scan on models_direct_file fff  (cost=35.33..2329.00 rows=972
width=4)"
"                                      Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-07-30'::date))"
"                                      Filter: ((file_name_model_ver)::text = '100'::text)"
"                                      ->  Bitmap Index Scan on ix_models_direct_file_file_name_date  (cost=0.00..35.08
rows=1479width=0)" 
"                                            Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-07-30'::date))"
"                    ->  Hash  (cost=932.60..932.60 rows=8108 width=8)"
"                          ->  Bitmap Heap Scan on temp_universe_instruments uuu  (cost=231.25..932.60 rows=8108
width=8)"
"                                Recheck Cond: (universe_hash = '5188205190738336870'::bigint)"
"                                ->  Bitmap Index Scan on pk_temp_universe_instruments  (cost=0.00..229.23 rows=8108
width=0)"
"                                      Index Cond: (universe_hash = '5188205190738336870'::bigint)"

(B)
"Aggregate  (cost=1809368.84..1809368.85 rows=1 width=0)"
"  ->  Hash Join  (cost=2484.83..1809086.39 rows=112981 width=0)"
"        Hash Cond: ((rrr.barrid)::text = (uuu.barrid)::text)"
"        ->  Hash Join  (cost=1450.88..1790335.13 rows=368611 width=8)"
"              Hash Cond: (rrr.barra_file_idx = fff.idx)"
"              ->  Seq Scan on models_direct_row_asset_data rrr  (cost=0.00..1518763.74 rows=71049174 width=12)"
"              ->  Hash  (cost=1445.87..1445.87 rows=401 width=4)"
"                    ->  Bitmap Heap Scan on models_direct_file fff  (cost=14.49..1445.87 rows=401 width=4)"
"                          Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-04-30'::date))"
"                          Filter: ((file_name_model_ver)::text = '100'::text)"
"                          ->  Bitmap Index Scan on ix_models_direct_file_file_name_date  (cost=0.00..14.39 rows=610
width=0)"
"                                Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-04-30'::date))"
"        ->  Hash  (cost=932.60..932.60 rows=8108 width=8)"
"              ->  Bitmap Heap Scan on temp_universe_instruments uuu  (cost=231.25..932.60 rows=8108 width=8)"
"                    Recheck Cond: (universe_hash = '5188205190738336870'::bigint)"
"                    ->  Bitmap Index Scan on pk_temp_universe_instruments  (cost=0.00..229.23 rows=8108 width=0)"
"                          Index Cond: (universe_hash = '5188205190738336870'::bigint)"

(B-D)
"Aggregate  (cost=1827336.97..1827336.98 rows=1 width=0)"
"  ->  Unique  (cost=1822817.73..1825924.71 rows=112981 width=64)"
"        ->  Sort  (cost=1822817.73..1823100.18 rows=112981 width=64)"
"              Sort Key: rrr.idx, rrr.row_number, rrr.barrid, rrr.yield_pct, rrr.total_risk_pct, rrr.spec_risk_pct,
rrr.hist_beta,rrr.pred_beta, rrr.data_date, rrr.barra_file_idx" 
"              ->  Hash Join  (cost=2484.83..1809086.39 rows=112981 width=64)"
"                    Hash Cond: ((rrr.barrid)::text = (uuu.barrid)::text)"
"                    ->  Hash Join  (cost=1450.88..1790335.13 rows=368611 width=64)"
"                          Hash Cond: (rrr.barra_file_idx = fff.idx)"
"                          ->  Seq Scan on models_direct_row_asset_data rrr  (cost=0.00..1518763.74 rows=71049174
width=64)"
"                          ->  Hash  (cost=1445.87..1445.87 rows=401 width=4)"
"                                ->  Bitmap Heap Scan on models_direct_file fff  (cost=14.49..1445.87 rows=401
width=4)"
"                                      Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-04-30'::date))"
"                                      Filter: ((file_name_model_ver)::text = '100'::text)"
"                                      ->  Bitmap Index Scan on ix_models_direct_file_file_name_date  (cost=0.00..14.39
rows=610width=0)" 
"                                            Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-04-30'::date))"
"                    ->  Hash  (cost=932.60..932.60 rows=8108 width=8)"
"                          ->  Bitmap Heap Scan on temp_universe_instruments uuu  (cost=231.25..932.60 rows=8108
width=8)"
"                                Recheck Cond: (universe_hash = '5188205190738336870'::bigint)"
"                                ->  Bitmap Index Scan on pk_temp_universe_instruments  (cost=0.00..229.23 rows=8108
width=0)"
"                                      Index Cond: (universe_hash = '5188205190738336870'::bigint)"






--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-bugs by date:

Previous
From: seth-p@outlook.com
Date:
Subject: BUG #13908: Query returns too few rows
Next
From: Tom Lane
Date:
Subject: Re: BUG #13888: pg_dump write error