Re: [GENERAL] Using 'WITH SELECT' Results Do Not Match 'SELECT FROM' Results - Mailing list pgsql-general
From Adrian Klaver
Subject Re: [GENERAL] Using 'WITH SELECT' Results Do Not Match 'SELECT FROM' Results
Date
Msg-id 71a59bcf-000a-c56c-2748-fc6c66b29038@aklaver.com
Whole thread Raw
In response to [GENERAL] Using 'WITH SELECT' Results Do Not Match 'SELECT FROM 'Results  (Jerry Regan <jerry.regan@concertoglobalresources.com>)
Responses Re: [GENERAL] Using 'WITH SELECT' Results Do Not Match 'SELECT FROM ' Results
List pgsql-general
On 07/02/2017 10:33 AM, Jerry Regan wrote:
> For reasons beyond my control, I am using Postgresql 9.4 on a MacBookPro
> (development system). I use pgadminIII and psql for clients (I tried and
> didn’t like the pgadmin4 UI; pg_dump, pg_restore also seem to be dumbed
> down).
>
> My question:
> I have some performance test results in table cor_duration_report. One
> column, c_entered_ion is of type timestamptz, another, c_scenario, is of
> type text. I want to calculate the difference between succeeding
> c_entered_ion rows to learn the rate at which entry events occur. In
> cor_duration_report, c_entered_ion columns are NOT in ascending sort order.
>
> For a first attempt, I created another table cor_temp_gap as:
>
>     CREATE TABLE cor_temp_gap
>     (
>        c_id serial NOT NULL,
>        c_entered_ion timestamp with time zone NOT NULL,
>        c_scenario text NOT NULL
>     )
>     WITH (
>        OIDS=FALSE
>     );
>
> and loaded it with:
>
>     INSERT into cor_temp_gap (c_entered_ion, c_scenario) SELECT
>     c_entered_ion, c_scenario from cor_duration_report order by
>     c_entered_ion;
>
> The c_id column is loaded with the default value - the next sequence value.
>
> I then generated my report with:
>
>     select count( gap ) as gaps, sum(gap) as sum,
>             mode() within group (order by gap) as mode,
>             percentile_disc(0.5) within group (order by gap) as  median,
>             avg( gap::integer ) as mean,
>             min( gap ) as min,
>             max( gap ) as max
>     from ( select extract( epoch from ( f.c_entered_ion -
>     s.c_entered_ion)::interval) * 1000 as gap
>             from cor_temp_gap s, cor_temp_gap f
>             where s.c_scenario = '20170628tc04'
>             and s.c_id+1 = f.c_id ) vals;
>
>
> This seems to give me the results I want:
>
>       gaps     |   sum      | mode    | median |         mean
>                  | min |  max
>     ------+---------+------+--------+-----------------------+-----+--------
>       307412 | 6872207 |    1       |      8      | 22.3550381897908995
>         |   0 | 10846
>
>
> The min value of zero is accurate. The mode value of 1 is reasonable, as
> is the median value of 8. Using a totally different method, the mean
> value is accurate, as is gaps (there are 307,413 rows in the table).
>
> I do know enough sql to believe my cor_temp_gap table could probably be
> replace by a ‘WITH SELECT….’
>
> I attempted this:
>
>     with cor_entry_time as ( select nextval('cor_temp_select_c_id_seq')
>     as c_id, c_entered_ion, c_scenario
>     from cor_duration_report where c_scenario = '20170628tc04' order by
>     c_entered_ion )
>     select count( gap ) as gaps,
>             sum(gap::integer) as sum,
>             mode() within group (order by gap) as mode,
>             percentile_disc(0.5) within group (order by gap) as  median,
>             avg( gap::integer ) as mean,
>             min( gap::integer ) as min,
>             max( gap::integer ) as max
>     from ( select extract( epoch from ( f.c_entered_ion -
>     s.c_entered_ion)::interval) * 1000 as gap
>             from cor_entry_time s, cor_entry_time f
>             where s.c_id+1 = f.c_id ) vals;

I used this site to reformat the above:

http://sqlformat.darold.net/

WITH cor_entry_time AS (
     SELECT
         nextval('cor_temp_select_c_id_seq') AS c_id,
         c_entered_ion,
         c_scenario
     FROM
         cor_duration_report
     WHERE
         c_scenario = '20170628tc04'
     ORDER BY
         c_entered_ion
)
SELECT
     count(gap) AS gaps,
     sum(gap::INTEGER) AS SUM,
     MODE ()
     WITHIN
GROUP (
ORDER BY
     gap) AS MODE,
percentile_disc (0.5)
WITHIN
GROUP (
ORDER BY
     gap) AS median,
avg(gap::INTEGER) AS mean,
min(gap::INTEGER) AS MIN,
max(gap::INTEGER) AS MAX
FROM (
     SELECT
         extract(EPOCH
         FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
     cor_entry_time s,
     cor_entry_time f
WHERE
     s.c_id + 1 = f.c_id) vals;


Still have not figured out everything that is going on above, but it
gave me a fighting chance:)



>
>
> which returned:
>
>        gaps    |   sum      | mode | median |        mean
>              |   min       |  max
>     --------+---------+------+--------+---------------------+----------+-------
>       307412 | 6867802 |    0     |    461   | 22.3407088857949592   |
>     -6871881 | 59791
>
> The results should match but obviously they don’t. the ‘mode’, ‘median’,
> ‘min’ and ‘max’ columns are clearly different. The ‘sum’ and ‘mean’
> columns are similar but are also different. Only ‘gaps’ is the same.
> There should be no negative numbers at all, assuming my c_entered_ion
> column is in ascending order. Wouldn’t the 'order by c_entered_ion’ in
> the ‘WITH’ select do that?

I believe you are going to have to move the ORDER BY to outside the
WITH. If I am following correctly:

FROM (
     SELECT
         extract(EPOCH
         FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
     cor_entry_time s,
     cor_entry_time f
WHERE
     s.c_id + 1 = f.c_id
ORDER BY
     s.c__entered_ion
) vals;

You can remove the:

ORDER BY
         c_entered_ion

in the WITH.

>
> To me, the differences between loading ‘cor_temp_gaps’ and my ‘WITH’
> select should not cause the difference. The differences are in how
> ‘c_id’ is generated and the fact that selecting only ‘c_scenario =
> ‘20170628tc04’ has been moved from calculating the interval to the
> ‘WITH’ select. I have also tried the ‘WITH SELECT’ approach without
> moving that test and received the same results.
>
> My suspicion is that in the failing approach, my sequence is being
> assigned before the sort whereas when I load ‘cor_temp_gap’, and c_id
> defaults to a sequence then c_id is generated AFTER c_entered_ion is put
> in sort order.
>
> If my suspicion is right, how do accomplish the same thing in the ‘WITH
> SELECT’ case?
>
> If it is wrong, what am I doing wrong?
>
> Thanks for any insights you may be able to provide!
>
> /s/jr
> Consultant
> Concerto GR
> Mobile: 612.208.6601
>
> Concerto - a composition for orchestra and a soloist
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: rajan
Date:
Subject: [GENERAL] Re: have trouble understanding xmin and xmax with update operationsfrom two different sessions
Next
From: Jerry Regan
Date:
Subject: Re: [GENERAL] Using 'WITH SELECT' Results Do Not Match 'SELECT FROM ' Results