Thread: Obvious data mismatch in View2 which basically SELECT * from View1
Dear list, Recently I am getting feedback, data in my analytic report is not repeatable. From time to time they get different data for the same time span. (but IIRC previously it was OK). Therefore I started debuging the View chain for that report, during which I bumped into this issue/phenomenon. In a over -simplified version: CREATE VIEW2 AS SELECT * FROM VIEW1; SELECT col1 FROM VIEW2 WHERE cond1=True; SELECT col1 FROM VIEW1 WHERE cond1=True; Now col1 from both views looks different. I don't know where to start to solve this problem. The actual situation is a bit more than that, the following is the actual query: -- trying to audit utlog weighed stat with t as ( select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02 0:0:0'::timestamp t1 ) --select * from t; select * -- from utlog.cache_stats_per_shift_per_reason_weighed_stats -- from utlog.stats_per_shift_filtered_per_reason from utlog.stats_per_shift_filtered (let's call it #View2 for short) -- from utlog.stats_per_shift_filtered_b0206 (let's call it #View1 for short) -- from utlog.stats_per_shift cross join t where wline = 'F02' and wts >= t.t0 and wts < t.t1 and wsft ='D' limit 100 ; The Result for #View2 wts | wsft | wspan | wstate | wline | rcodes --------------------+------+--------+--------+-------+------- 2020-07-01 08:00:00 | D | 0 | S00 | F02 | {PDCB} 2020-07-01 09:50:01 | D | 12.533 | S00 | F02 | {PDCB} 2020-07-01 11:35:46 | D | 12.217 | S00 | F02 | {CDSO} 2020-07-01 13:22:58 | D | 5.15 | S00 | F02 | {PDCB} 2020-07-01 14:57:38 | D | 6.8 | S00 | F02 | {PDCB} INDEX | COLUMN_NAME | DATA_TYPE ------+-------------+------------ 1 | wts | timestamptz 3 | wsft | varchar 4 | wspan | float8 5 | wstate | varchar 6 | wline | varchar 7 | rcodes | text[] Same query, the Result for #View1 wts | wsft | wspan | wstate | wline | rcodes --------------------+------+-------+--------+-------+------- 2020-07-01 08:00:00 | D | 5 | S00 | F02 | {PDCB} 2020-07-01 09:50:01 | D | 13 | S00 | F02 | {PDCB} 2020-07-01 11:35:46 | D | 12 | S00 | F02 | {CDSO} 2020-07-01 13:22:58 | D | 5 | S00 | F02 | {PDCB} 2020-07-01 14:57:38 | D | 7 | S00 | F02 | {PDCB} INDEX | COLUMN_NAME | DATA_TYPE ------+-------------+------------ 1 | wts | timestamptz 3 | wsft | varchar 4 | wspan | float8 5 | wstate | varchar 6 | wline | varchar 7 | rcodes | varchar[] Reuslts in `wspan` column is inaccurate while both type are float8. Most weird thing is the 5 to 0 change. for Row 1. The `_b0206`(#View1) is just a version of `stats_per_shift_filtered`(#View2) from past revisions. I am sure the original CREATE statement for (#View2) is `CREATE VIEW ... AS SELECT * FROM ...._b0206` Definition of View2 in SQLWorkbench/J generated schema: CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2) ( wts, wdate, wsft, wspan, wstate, wline, rcodes ) AS SELECT stats_per_shift_filtered_u0206.wts, stats_per_shift_filtered_u0206.wsft::character varying AS wsft, stats_per_shift_filtered_u0206.wspan, stats_per_shift_filtered_u0206.wstate, stats_per_shift_filtered_u0206.wline, stats_per_shift_filtered_u0206.rcodes FROM utlog.stats_per_shift_filtered_u0206; (as #View1 in this post) It feels like the utlog.stats_per_shift_filtered_u0206 in utlog.stats_per_shift_filtered definition is a different object from utlog.stats_per_shift_filtered_u0206? I am totally out of clues. Any help would be appreciated. Thanks. Regards, Ben
On Wed, Sep 16, 2020 at 9:26 AM Ben <bentenzha@outlook.com> wrote:
Dear list,
Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.
In a over -simplified version:
CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT col1 FROM VIEW2 WHERE cond1=True;
SELECT col1 FROM VIEW1 WHERE cond1=True;
Now col1 from both views looks different. I don't know where to start to
solve this problem.
The actual situation is a bit more than that, the following is the
actual query:
-- trying to audit utlog weighed stat
with t as (
select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
)
--select * from t;
select *
-- from utlog.cache_stats_per_shift_per_reason_weighed_stats
-- from utlog.stats_per_shift_filtered_per_reason
from utlog.stats_per_shift_filtered (let's call
it #View2 for short)
-- from utlog.stats_per_shift_filtered_b0206 (let's call it
#View1 for short)
-- from utlog.stats_per_shift
cross join t
where wline = 'F02' and wts >= t.t0 and wts < t.t1 and wsft ='D'
limit 100
;
Not sure if it might be something lost in your simplification here, but you have a LIMIT with no ORDER BY there. That basically means "give me 100 random rows" (but not with a very good random level). It does not return rows in a consistent/predictable order. So as long as that query is part of what you're doing, you should not be surprised if you get the rows in an inconsistent/unpredictable order, with whatever follow-on effects that might have. (And it can lead to weird follow-on effects like the ones you're talking about when used in larger query structures)
On 9/15/20 10:40 PM, Ben wrote: > Dear list, > > Recently I am getting feedback, data in my analytic report is not > repeatable. From time to time they get different data for the same time span. > (but IIRC previously it was OK). Therefore I started debuging the View > chain for that report, during which I bumped into this issue/phenomenon. > > In a over -simplified version: > > CREATE VIEW2 AS SELECT * FROM VIEW1; > SELECT col1 FROM VIEW2 WHERE cond1=True; > SELECT col1 FROM VIEW1 WHERE cond1=True; > > Now col1 from both views looks different. I don't know where to start to > solve this problem. > > The actual situation is a bit more than that, the following is the actual > query: > > > -- trying to audit utlog weighed stat > with t as ( > select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02 > 0:0:0'::timestamp t1 > ) > --select * from t; > select * > -- from utlog.cache_stats_per_shift_per_reason_weighed_stats > -- from utlog.stats_per_shift_filtered_per_reason > from utlog.stats_per_shift_filtered (let's call it > #View2 for short) > -- from utlog.stats_per_shift_filtered_b0206 (let's call it > #View1 for short) > -- from utlog.stats_per_shift > cross join t > where wline = 'F02' and wts >= t.t0 and wts < t.t1 and wsft ='D' > limit 100 > ; > > The Result for #View2 > > wts | wsft | wspan | wstate | wline | rcodes > --------------------+------+--------+--------+-------+------- > 2020-07-01 08:00:00 | D | 0 | S00 | F02 | {PDCB} > 2020-07-01 09:50:01 | D | 12.533 | S00 | F02 | {PDCB} > 2020-07-01 11:35:46 | D | 12.217 | S00 | F02 | {CDSO} > 2020-07-01 13:22:58 | D | 5.15 | S00 | F02 | {PDCB} > 2020-07-01 14:57:38 | D | 6.8 | S00 | F02 | {PDCB} > > INDEX | COLUMN_NAME | DATA_TYPE > ------+-------------+------------ > 1 | wts | timestamptz > 3 | wsft | varchar > 4 | wspan | float8 > 5 | wstate | varchar > 6 | wline | varchar > 7 | rcodes | text[] > > > Same query, the Result for #View1 > > wts | wsft | wspan | wstate | wline | rcodes > --------------------+------+-------+--------+-------+------- > 2020-07-01 08:00:00 | D | 5 | S00 | F02 | {PDCB} > 2020-07-01 09:50:01 | D | 13 | S00 | F02 | {PDCB} > 2020-07-01 11:35:46 | D | 12 | S00 | F02 | {CDSO} > 2020-07-01 13:22:58 | D | 5 | S00 | F02 | {PDCB} > 2020-07-01 14:57:38 | D | 7 | S00 | F02 | {PDCB} > > INDEX | COLUMN_NAME | DATA_TYPE > ------+-------------+------------ > 1 | wts | timestamptz > 3 | wsft | varchar > 4 | wspan | float8 > 5 | wstate | varchar > 6 | wline | varchar > 7 | rcodes | varchar[] > > Reuslts in `wspan` column is inaccurate while both type are float8. Most > weird thing is the 5 to 0 change. for Row 1. > > The `_b0206`(#View1) is just a version of > `stats_per_shift_filtered`(#View2) from past revisions. > I am sure the original CREATE statement for (#View2) is `CREATE VIEW ... > AS SELECT * FROM ...._b0206` > > Definition of View2 in SQLWorkbench/J generated schema: > > > CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2) > ( > wts, > wdate, > wsft, > wspan, > wstate, > wline, > rcodes > ) > AS > SELECT stats_per_shift_filtered_u0206.wts, > stats_per_shift_filtered_u0206.wsft::character varying AS wsft, > stats_per_shift_filtered_u0206.wspan, > stats_per_shift_filtered_u0206.wstate, > stats_per_shift_filtered_u0206.wline, > stats_per_shift_filtered_u0206.rcodes > FROM utlog.stats_per_shift_filtered_u0206; (as #View1 in this post) > > > It feels like the utlog.stats_per_shift_filtered_u0206 in > utlog.stats_per_shift_filtered definition is a different object from > utlog.stats_per_shift_filtered_u0206? > > I am totally out of clues. Any help would be appreciated. Thanks. Try the queries in a serializable read only transaction. That should any possible changes in the underlying data. START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY; SELECT col1 FROM VIEW2 WHERE cond1=True; SELECT col1 FROM VIEW1 WHERE cond1=True; COMMIT; -- Angular momentum makes the world go 'round.
Hi Magnus,
Thanks for the heads up.
For the data posted in this email, the listed result is the full result set for that query.
I have checked the data in view2 and view1 with various conditions.
Both view has many difference in column wspan::float8.
The condition in the shown query is intended to show just a small set of them.
But you are right, I should be more cautious. Thanks for the headsup.
Regards,
Ben
On 9/16/20 3:35 PM, Magnus Hagander wrote:
--On Wed, Sep 16, 2020 at 9:26 AM Ben <bentenzha@outlook.com> wrote:Dear list,
Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.
In a over -simplified version:
CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT col1 FROM VIEW2 WHERE cond1=True;
SELECT col1 FROM VIEW1 WHERE cond1=True;
Now col1 from both views looks different. I don't know where to start to
solve this problem.
The actual situation is a bit more than that, the following is the
actual query:
-- trying to audit utlog weighed stat
with t as (
select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
)
--select * from t;
select *
-- from utlog.cache_stats_per_shift_per_reason_weighed_stats
-- from utlog.stats_per_shift_filtered_per_reason
from utlog.stats_per_shift_filtered (let's call
it #View2 for short)
-- from utlog.stats_per_shift_filtered_b0206 (let's call it
#View1 for short)
-- from utlog.stats_per_shift
cross join t
where wline = 'F02' and wts >= t.t0 and wts < t.t1 and wsft ='D'
limit 100
;Not sure if it might be something lost in your simplification here, but you have a LIMIT with no ORDER BY there. That basically means "give me 100 random rows" (but not with a very good random level). It does not return rows in a consistent/predictable order. So as long as that query is part of what you're doing, you should not be surprised if you get the rows in an inconsistent/unpredictable order, with whatever follow-on effects that might have. (And it can lead to weird follow-on effects like the ones you're talking about when used in larger query structures)
Dear List,
Some further investigation.
Creating a fresh View3 on View1 gives exactly the same result as View1.
The View1 View2 are both years old in a production database, in use for quite some time. (The database is production duty but not hosted in server room with UPS. It's like a edge PC in industry monitoring. Now am more concerned with its data integrity)
The problem with the final report is reported recently. I am not sure what's broken in the database.
I haven't replaced the broken View2 yet. Hope someone can point me to some further investigation.
My concern is that if there are other views inside that database having similar integrity issue, how can I find them all (if any).
It's beyond my regular SQL ability. I guess I really need help from people with maintenance experience.
Any help will be appreciated, thanks in advance.
Ben
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Some further investigation.
Creating a fresh View3 on View1 gives exactly the same result as View1.
The View1 View2 are both years old in a production database, in use for quite some time. (The database is production duty but not hosted in server room with UPS. It's like a edge PC in industry monitoring. Now am more concerned with its data integrity)
The problem with the final report is reported recently. I am not sure what's broken in the database.
I haven't replaced the broken View2 yet. Hope someone can point me to some further investigation.
My concern is that if there are other views inside that database having similar integrity issue, how can I find them all (if any).
It's beyond my regular SQL ability. I guess I really need help from people with maintenance experience.
Any help will be appreciated, thanks in advance.
Ben
On September 16, 2020 3:40:34 AM UTC, Ben <bentenzha@outlook.com> wrote:
Dear list,
Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.
In a over -simplified version:
CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT col1 FROM VIEW2 WHERE cond1=True;
SELECT col1 FROM VIEW1 WHERE cond1=True;
Now col1 from both views looks different. I don't know where to start to
solve this problem.
The actual situation is a bit more than that, the following is the
actual query:
-- trying to audit utlog weighed stat
with t as (
select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
)
--select * from t;
select *
-- from utlog.cache_stats_per_shift_per_reason_weighed_stats
-- from utlog.stats_per_shift_filtered_per_reason
from utlog.stats_per_shift_filtered (let's call
it #View2 for short)
-- from utlog.stats_per_shift_filtered_b0206 (let's call it
#View1 for short)
-- from utlog.stats_per_shift
cross join t
where wline = 'F02' and wts >= t.t0 and wts < t.t1 and wsft ='D'
limit 100
;
The Result for #View2
wts | wsft | wspan | wstate | wline | rcodes
--------------------+------+--------+--------+-------+-------
2020-07-01 08:00:00 | D | 0 | S00 | F02 | {PDCB}
2020-07-01 09:50:01 | D | 12.533 | S00 | F02 | {PDCB}
2020-07-01 11:35:46 | D | 12.217 | S00 | F02 | {CDSO}
2020-07-01 13:22:58 | D | 5.15 | S00 | F02 | {PDCB}
2020-07-01 14:57:38 | D | 6.8 | S00 | F02 | {PDCB}
INDEX | COLUMN_NAME | DATA_TYPE
------+-------------+------------
1 | wts | timestamptz
3 | wsft | varchar
4 | wspan | float8
5 | wstate | varchar
6 | wline | varchar
7 | rcodes | text[]
Same query, the Result for #View1
wts | wsft | wspan | wstate | wline | rcodes
--------------------+------+-------+--------+-------+-------
2020-07-01 08:00:00 | D | 5 | S00 | F02 | {PDCB}
2020-07-01 09:50:01 | D | 13 | S00 | F02 | {PDCB}
2020-07-01 11:35:46 | D | 12 | S00 | F02 | {CDSO}
2020-07-01 13:22:58 | D | 5 | S00 | F02 | {PDCB}
2020-07-01 14:57:38 | D | 7 | S00 | F02 | {PDCB}
INDEX | COLUMN_NAME | DATA_TYPE
------+-------------+------------
1 | wts | timestamptz
3 | wsft | varchar
4 | wspan | float8
5 | wstate | varchar
6 | wline | varchar
7 | rcodes | varchar[]
Reuslts in `wspan` column is inaccurate while both type are float8. Most
weird thing is the 5 to 0 change. for Row 1.
The `_b0206`(#View1) is just a version of
`stats_per_shift_filtered`(#View2) from past revisions.
I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
AS SELECT * FROM ...._b0206`
Definition of View2 in SQLWorkbench/J generated schema:
CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
(
wts,
wdate,
wsft,
wspan,
wstate,
wline,
rcodes
)
AS
SELECT stats_per_shift_filtered_u0206.wts,
stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
stats_per_shift_filtered_u0206.wspan,
stats_per_shift_filtered_u0206.wstate,
stats_per_shift_filtered_u0206.wline,
stats_per_shift_filtered_u0206.rcodes
FROM utlog.stats_per_shift_filtered_u0206; (as #View1 in this post)
It feels like the utlog.stats_per_shift_filtered_u0206 in
utlog.stats_per_shift_filtered definition is a different object from
utlog.stats_per_shift_filtered_u0206?
I am totally out of clues. Any help would be appreciated. Thanks.
Regards,
Ben
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
>I haven't replaced the broken View2 yet. >Hope someone can point me to some >further investigation.
Did you look at the actual definition of view2. Like all RDBMS PG materializes the Ddl as it existed at the time of creation and converts it into an in line sql.
There is a possibility it may be different than what you think.
Ben <bentenzha@outlook.com> writes: > Dear List, > > Some further investigation. > > Creating a fresh View3 on View1 gives exactly the same result as > View1. > > The View1 View2 are both years old in a production database, in use > for quite some time. (The database is production duty but not hosted > in server room with UPS. It's like a edge PC in industry monitoring. > Now am more concerned with its data integrity) > > The problem with the final report is reported recently. I am not sure > what's broken in the database. > > I haven't replaced the broken View2 yet. Hope someone can point me to > some further investigation. Already mentioned downthread, but have a look at the view definitions by... select pg_get_viewdef('$your-view'); Or... pg_dump --table $your-view Expect to see something different if you repeat the above for the old view giving undesired results and the new correct version. HTH > My concern is that if there are other views inside that database > having similar integrity issue, how can I find them all (if any). > > It's beyond my regular SQL ability. I guess I really need help from > people with maintenance experience. > > Any help will be appreciated, thanks in advance. > > Ben > > > > On September 16, 2020 3:40:34 AM UTC, Ben <bentenzha@outlook.com> > wrote: > > Dear list, > > Recently I am getting feedback, data in my analytic report is not > repeatable. From time to time they get different data for the same time > span. > (but IIRC previously it was OK). Therefore I started debuging the View > chain for that report, during which I bumped into this issue/phenomenon. > > In a over -simplified version: > > CREATE VIEW2 AS SELECT * FROM VIEW1; > SELECT col1 FROM VIEW2 WHERE cond1=True; > SELECT col1 FROM VIEW1 WHERE cond1=True; > > Now col1 from both views looks different. I don't know where to start to > solve this problem. > > The actual situation is a bit more than that, the following is the > actual query: > > > -- trying to audit utlog weighed stat > with t as ( > select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02 > 0:0:0'::timestamp t1 > ) > --select * from t; > select * > -- from utlog.cache_stats_per_shift_per_reason_weighed_stats > -- from utlog.stats_per_shift_filtered_per_reason > from utlog.stats_per_shift_filtered (let's call > it #View2 for short) > -- from utlog.stats_per_shift_filtered_b0206 (let's call it > #View1 for short) > -- from utlog.stats_per_shift > cross join t > where wline = 'F02' and wts >= t.t0 and wts < t.t1 and wsft ='D' > limit 100 > ; > > The Result for #View2 > > wts | wsft | wspan | wstate | wline | rcodes > --------------------+------+--------+--------+-------+------- > 2020-07-01 08:00:00 | D | 0 | S00 | F02 | {PDCB} > 2020-07-01 09:50:01 | D | 12.533 | S00 | F02 | {PDCB} > 2020-07-01 11:35:46 | D | 12.217 | S00 | F02 | {CDSO} > 2020-07-01 13:22:58 | D | 5.15 | S00 | F02 | {PDCB} > 2020-07-01 14:57:38 | D | 6.8 | S00 | F02 | {PDCB} > > INDEX | COLUMN_NAME | DATA_TYPE > ------+-------------+------------ > 1 | wts | timestamptz > 3 | wsft | varchar > 4 | wspan | float8 > 5 | wstate | varchar > 6 | wline | varchar > 7 | rcodes | text[] > > > Same query, the Result for #View1 > > wts | wsft | wspan | wstate | wline | rcodes > --------------------+------+-------+--------+-------+------- > 2020-07-01 08:00:00 | D | 5 | S00 | F02 | {PDCB} > 2020-07-01 09:50:01 | D | 13 | S00 | F02 | {PDCB} > 2020-07-01 11:35:46 | D | 12 | S00 | F02 | {CDSO} > 2020-07-01 13:22:58 | D | 5 | S00 | F02 | {PDCB} > 2020-07-01 14:57:38 | D | 7 | S00 | F02 | {PDCB} > > INDEX | COLUMN_NAME | DATA_TYPE > ------+-------------+------------ > 1 | wts | timestamptz > 3 | wsft | varchar > 4 | wspan | float8 > 5 | wstate | varchar > 6 | wline | varchar > 7 | rcodes | varchar[] > > Reuslts in `wspan` column is inaccurate while both type are float8. Most > weird thing is the 5 to 0 change. for Row 1. > > The `_b0206`(#View1) is just a version of > `stats_per_shift_filtered`(#View2) from past revisions. > I am sure the original CREATE statement for (#View2) is `CREATE VIEW ... > AS SELECT * FROM ...._b0206` > > Definition of View2 in SQLWorkbench/J generated schema: > > > CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2) > ( > wts, > wdate, > wsft, > wspan, > wstate, > wline, > rcodes > ) > AS > SELECT stats_per_shift_filtered_u0206.wts, > stats_per_shift_filtered_u0206.wsft::character varying AS wsft, > stats_per_shift_filtered_u0206.wspan, > stats_per_shift_filtered_u0206.wstate, > stats_per_shift_filtered_u0206.wline, > stats_per_shift_filtered_u0206.rcodes > FROM utlog.stats_per_shift_filtered_u0206; (as #View1 in this post) > > > It feels like the utlog.stats_per_shift_filtered_u0206 in > utlog.stats_per_shift_filtered definition is a different object from > utlog.stats_per_shift_filtered_u0206? > > I am totally out of clues. Any help would be appreciated. Thanks. > > > Regards, > > Ben > > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > > -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net
Hi Sievers, Krishna You are right none of them is what I originally used to create them which should be `select * from utlog.stats_per_shift_filtered_b0206`, but they do look different, maybe the prev version is created before a pg update ? b0206 could be 20190206 or 20180206... the output: lets_db=# select pg_get_viewdef('utlog.stats_per_shift_filtered'); pg_get_viewdef ----------------------------------------------------------------------- SELECT stats_per_shift_filtered_u0206.wts, + stats_per_shift_filtered_u0206.wdate, + (stats_per_shift_filtered_u0206.wsft)::character varying AS wsft,+ stats_per_shift_filtered_u0206.wspan, + stats_per_shift_filtered_u0206.wstate, + stats_per_shift_filtered_u0206.wline, + stats_per_shift_filtered_u0206.rcodes + FROM utlog.stats_per_shift_filtered_u0206; (1 row) (result for the same query, wspan is the column in question: wts | wdate | wsft | wspan | wstate | wlin ------------------------+------------+------+------------------+--------+----- 2020-07-01 14:57:38+08 | 2020-07-01 | D | 6.8 | S00 | F02 2020-07-01 13:22:58+08 | 2020-07-01 | D | 5.15 | S00 | F02 2020-07-01 11:35:46+08 | 2020-07-01 | D | 12.2166666666667 | S00 | F02 2020-07-01 09:50:01+08 | 2020-07-01 | D | 12.5333333333333 | S00 | F02 2020-07-01 08:00:00+08 | 2020-07-01 | D | 0 | S00 | F02 (5 rows) ) lets_db=# select pg_get_viewdef('utlog.view_test1'); pg_get_viewdef ----------------------------------------------- SELECT stats_per_shift_filtered_b0206.wts, + stats_per_shift_filtered_b0206.wdate, + stats_per_shift_filtered_b0206.wsft, + stats_per_shift_filtered_b0206.wspan, + stats_per_shift_filtered_b0206.wstate, + stats_per_shift_filtered_b0206.wline, + stats_per_shift_filtered_b0206.rcodes + FROM utlog.stats_per_shift_filtered_b0206; (1 row) (the result for the same query: wts | wdate | wsft | wspan | wstate | wline | rcodes ------------------------+------------+------+-------+--------+-------+----------------+--------------------- 2020-07-01 09:50:01+08 | 2020-07-01 | D | 13 | S00 | F02 | {PDCB} 2020-07-01 11:35:46+08 | 2020-07-01 | D | 12 | S00 | F02 | {CDSO} 2020-07-01 14:57:38+08 | 2020-07-01 | D | 7 | S00 | F02 | {PDCB} 2020-07-01 08:00:00+08 | 2020-07-01 | D | 5 | S00 | F02 | {PDCB} 2020-07-01 13:22:58+08 | 2020-07-01 | D | 5 | S00 | F02 | {PDCB} (5 rows) ) The result in returned column looks different but definition of the column in question (wspan::float8) looks identical in both case. Regards, Ben On 9/17/20 10:41 PM, Jerry Sievers wrote: > Ben <bentenzha@outlook.com> writes: > >> Dear List, >> >> Some further investigation. >> >> Creating a fresh View3 on View1 gives exactly the same result as >> View1. >> >> The View1 View2 are both years old in a production database, in use >> for quite some time. (The database is production duty but not hosted >> in server room with UPS. It's like a edge PC in industry monitoring. >> Now am more concerned with its data integrity) >> >> The problem with the final report is reported recently. I am not sure >> what's broken in the database. >> >> I haven't replaced the broken View2 yet. Hope someone can point me to >> some further investigation. > Already mentioned downthread, but have a look at the view definitions > by... > > select pg_get_viewdef('$your-view'); > > Or... > > pg_dump --table $your-view > > Expect to see something different if you repeat the above for the old > view giving undesired results and the new correct version. > > HTH > > > >> My concern is that if there are other views inside that database >> having similar integrity issue, how can I find them all (if any). >> >> It's beyond my regular SQL ability. I guess I really need help from >> people with maintenance experience. >> >> Any help will be appreciated, thanks in advance. >> >> Ben >> >> >> >> On September 16, 2020 3:40:34 AM UTC, Ben <bentenzha@outlook.com> >> wrote: >> >> Dear list, >> >> Recently I am getting feedback, data in my analytic report is not >> repeatable. From time to time they get different data for the same time >> span. >> (but IIRC previously it was OK). Therefore I started debuging the View >> chain for that report, during which I bumped into this issue/phenomenon. >> >> In a over -simplified version: >> >> CREATE VIEW2 AS SELECT * FROM VIEW1; >> SELECT col1 FROM VIEW2 WHERE cond1=True; >> SELECT col1 FROM VIEW1 WHERE cond1=True; >> >> Now col1 from both views looks different. I don't know where to start to >> solve this problem. >> >> The actual situation is a bit more than that, the following is the >> actual query: >> >> >> -- trying to audit utlog weighed stat >> with t as ( >> select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02 >> 0:0:0'::timestamp t1 >> ) >> --select * from t; >> select * >> -- from utlog.cache_stats_per_shift_per_reason_weighed_stats >> -- from utlog.stats_per_shift_filtered_per_reason >> from utlog.stats_per_shift_filtered (let's call >> it #View2 for short) >> -- from utlog.stats_per_shift_filtered_b0206 (let's call it >> #View1 for short) >> -- from utlog.stats_per_shift >> cross join t >> where wline = 'F02' and wts >= t.t0 and wts < t.t1 and wsft ='D' >> limit 100 >> ; >> >> The Result for #View2 >> >> wts | wsft | wspan | wstate | wline | rcodes >> --------------------+------+--------+--------+-------+------- >> 2020-07-01 08:00:00 | D | 0 | S00 | F02 | {PDCB} >> 2020-07-01 09:50:01 | D | 12.533 | S00 | F02 | {PDCB} >> 2020-07-01 11:35:46 | D | 12.217 | S00 | F02 | {CDSO} >> 2020-07-01 13:22:58 | D | 5.15 | S00 | F02 | {PDCB} >> 2020-07-01 14:57:38 | D | 6.8 | S00 | F02 | {PDCB} >> >> INDEX | COLUMN_NAME | DATA_TYPE >> ------+-------------+------------ >> 1 | wts | timestamptz >> 3 | wsft | varchar >> 4 | wspan | float8 >> 5 | wstate | varchar >> 6 | wline | varchar >> 7 | rcodes | text[] >> >> >> Same query, the Result for #View1 >> >> wts | wsft | wspan | wstate | wline | rcodes >> --------------------+------+-------+--------+-------+------- >> 2020-07-01 08:00:00 | D | 5 | S00 | F02 | {PDCB} >> 2020-07-01 09:50:01 | D | 13 | S00 | F02 | {PDCB} >> 2020-07-01 11:35:46 | D | 12 | S00 | F02 | {CDSO} >> 2020-07-01 13:22:58 | D | 5 | S00 | F02 | {PDCB} >> 2020-07-01 14:57:38 | D | 7 | S00 | F02 | {PDCB} >> >> INDEX | COLUMN_NAME | DATA_TYPE >> ------+-------------+------------ >> 1 | wts | timestamptz >> 3 | wsft | varchar >> 4 | wspan | float8 >> 5 | wstate | varchar >> 6 | wline | varchar >> 7 | rcodes | varchar[] >> >> Reuslts in `wspan` column is inaccurate while both type are float8. Most >> weird thing is the 5 to 0 change. for Row 1. >> >> The `_b0206`(#View1) is just a version of >> `stats_per_shift_filtered`(#View2) from past revisions. >> I am sure the original CREATE statement for (#View2) is `CREATE VIEW ... >> AS SELECT * FROM ...._b0206` >> >> Definition of View2 in SQLWorkbench/J generated schema: >> >> >> CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2) >> ( >> wts, >> wdate, >> wsft, >> wspan, >> wstate, >> wline, >> rcodes >> ) >> AS >> SELECT stats_per_shift_filtered_u0206.wts, >> stats_per_shift_filtered_u0206.wsft::character varying AS wsft, >> stats_per_shift_filtered_u0206.wspan, >> stats_per_shift_filtered_u0206.wstate, >> stats_per_shift_filtered_u0206.wline, >> stats_per_shift_filtered_u0206.rcodes >> FROM utlog.stats_per_shift_filtered_u0206; (as #View1 in this post) >> >> >> It feels like the utlog.stats_per_shift_filtered_u0206 in >> utlog.stats_per_shift_filtered definition is a different object from >> utlog.stats_per_shift_filtered_u0206? >> >> I am totally out of clues. Any help would be appreciated. Thanks. >> >> >> Regards, >> >> Ben >> >> >> >> >> -- >> Sent from my Android device with K-9 Mail. Please excuse my brevity. >> >>
Ben <bentenzha@outlook.com> <MWHPR06MB2400E1FA4AAD12440EA96B79B93E0@MWHPR06MB2400.namprd06.prod.outlook.com> writes: > You are right none of them is what I originally used to create them > which should be > `select * from utlog.stats_per_shift_filtered_b0206`, but they do look > different, This discrepancy could explain a lot: > (stats_per_shift_filtered_u0206.wsft)::character varying AS wsft, vs > stats_per_shift_filtered_b0206.wsft, Your problem query has a WHERE clause testing wsft, so this difference is subtly affecting what that WHERE clause means. That could result in a plan change, thereby changing the row output order, which as already noted would be sufficient to explain the inconsistency. If I had to bet I'd suspect that the extra cast is blocking use of an index on wsft. regards, tom lane