Re: BUG #6131: Query Returning Incorrect Results - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #6131: Query Returning Incorrect Results |
Date | |
Msg-id | 19689.1311723714@sss.pgh.pa.us Whole thread Raw |
In response to | BUG #6131: Query Returning Incorrect Results ("David Johnston" <polobo@yahoo.com>) |
Responses |
Re: BUG #6131: Query Returning Incorrect Results
|
List | pgsql-bugs |
"David Johnston" <polobo@yahoo.com> writes: > The embedded script exhibits some strange behavior. When the query is run > the [adjustment_paid] column for [technician] "6338B" should show +/- 25.00. > Instead, if I run the last query immediately after creating the schema and > inserting the data the results I get for that technician is 0.00 for both > records. However, if I continue to run the query, or explicitly run ANALYZE > , I then get the correct results. Interesting example. I don't believe it's a Postgres bug though, but rather indeterminism in your query. If you pull out the subquery that is being fed to the window aggregate (row_number()): SELECT s_id, date_reference, accountnumber, technician, rate_paid, COUNT(*) AS rate_count FROM laborwip_payroll_ticket GROUP BY s_id, date_reference, accountnumber, technician, rate_paid ORDER BY s_id, date_reference, accountnumber, technician, rate_count DESC; you will find that it produces slightly different output row ordering before and after the ANALYZE. I get s_id | date_reference | accountnumber | technician | rate_paid | rate_count -----------------+----------------+---------------+------------+-----------+------------ [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 500A | 15.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 500B | 15.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6016 | 10.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6120 | 10.00 | 3 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6124 | 15.00 | 5 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6126 | 20.00 | 9 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6126 | 25.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6287 | 10.00 | 4 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338 | 25.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338B | 0.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338B | 25.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338B | 10.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 7001 | 10.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 7002 | 10.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 7003 | 10.00 | 2 [TESTSG]STORE02 | 1976-06-15 | LABORWIP | 600A | 10.00 | 4 [TESTSG]STORE02 | 1976-06-15 | LABORWIP | 600A | 0.00 | 2 [TESTSG]STORE02 | 1976-06-15 | LABORWIP | 600B | 0.00 | 1 (18 rows) versus s_id | date_reference | accountnumber | technician | rate_paid | rate_count -----------------+----------------+---------------+------------+-----------+------------ [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 500A | 15.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 500B | 15.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6016 | 10.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6120 | 10.00 | 3 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6124 | 15.00 | 5 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6126 | 20.00 | 9 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6126 | 25.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6287 | 10.00 | 4 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338 | 25.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338B | 25.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338B | 0.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 6338B | 10.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 7001 | 10.00 | 1 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 7002 | 10.00 | 2 [TESTSG]STORE01 | 1976-06-15 | LABORWIP | 7003 | 10.00 | 2 [TESTSG]STORE02 | 1976-06-15 | LABORWIP | 600A | 10.00 | 4 [TESTSG]STORE02 | 1976-06-15 | LABORWIP | 600A | 0.00 | 2 [TESTSG]STORE02 | 1976-06-15 | LABORWIP | 600B | 0.00 | 1 (18 rows) Note in particular that the first two rows for technician 6338B are presented in different orders. This is not a bug because they are not distinct so far as the ORDER BY clause is concerned. However, when you feed them to the window aggregate stuff: SELECT ..., (row_number() OVER (PARTITION BY s_id, date_reference, accountnumber, technician ORDER BY rate_count DESC)) AS rate_index FROM the-query-above WHERE rate_index = 1 ... you get only the first one of those two rows. And then your upper query's results vary depending on which rate_paid you got. So basically you need to add more columns to the window aggregate PARTITION/ORDER BY clauses to make the result more deterministic. BTW, the reason ANALYZE causes the row order to change is that it affects the size of the hash table that the HashAggregate step uses. That's not tremendously relevant to fixing the problem, but just in case you were wondering. regards, tom lane
pgsql-bugs by date: