Re: analyzing query results - Mailing list pgsql-novice
From | Lonni J Friedman |
---|---|
Subject | Re: analyzing query results |
Date | |
Msg-id | 7c1574a90708131249p7605b07fv2d01bf0e3fec3516@mail.gmail.com Whole thread Raw |
In response to | Re: analyzing query results (Jon Sime <jsime@mediamatters.org>) |
List | pgsql-novice |
On 8/10/07, Jon Sime <jsime@mediamatters.org> wrote: > Lonni J Friedman wrote: > > The problem here is that occasionally the same subtest runs more than > > once in a 24 hour period (although not often), and not all of the test > > runs will have the same result (current_status). To make matters more > > complicated when this happens, the additional runs will have a > > current_status of PASSED, yet my automated report still grabs the > > older FAILED test result, and this confuses the hell out of the people > > reviewing the report, since its technically passed already. > > DDL and some sample data would be useful to help out, but if I > understood what you were describing, here's a simple mockup I tossed > into my local playground: > > create table testresults ( > id serial not null primary key, > subtest text not null, > os text not null, > arch text not null, > info text not null, > current_status text not null, > date_created date not null); > > insert into testresults > (subtest, os, arch, info, current_status, date_created) > values > ('a','linux','x86','info','PASSED',now()), > ('b','linux','x86','info','PASSED',now()), > ('c','linux','x86','info','FAILED',now()), > ('c','linux','x86','info','FAILED','2007-07-01'), > ('c','linux','x86','info','PASSED','2007-07-01'), > ('d','linux','x86','info','FAILED',now()), > ('d','linux','x86','info','PASSED',now()), > ('e','linux','x86','info','FAILED','2007-07-01'); > > select a.* > from testresults a > left join testresults b on (b.id <> a.id and > b.date_created >= a.date_created and > b.current_status = 'PASSED' and > row(b.subtest, b.os, b.arch) > = row(a.subtest, a.os, a.arch)) > where a.current_status = 'FAILED' and b.id is null and > a.date_created >= now() - interval '24 hours'; > > id | subtest | os | arch | info | current_status | date_created > ----+---------+-------+------+------+----------------+-------------- > 8 | c | linux | x86 | info | FAILED | 2007-08-10 > (1 row) > > The row constructors within the join should be adjusted to match the set > of columns within the table that would adequately match two separate > rows as having been the same subtest on a different run (I just picked > the three that stood out from your original query as possibly being > identifying). > > This may well not be the best way to do it. I've obviously only tested > it against a very tiny set of data. And without any DDL or sample data, > I may have misunderstood the problem. First, thanks much for your input. I'm sorry that I didn't provide sample data up front. I'll go into more detail now. In my database, I have one column whose content is always the same, for each iteration of a subtest. That column is named 'cudacode'. Thus, if a given subtest runs multiple times in any 24 hour window, each inserted row will always have the same cudacode for that subtest. So cudacode is the unique identifier. That said, here's some example data that illustrates the problem i'm having: id | date_created | cudacode | current_status ----+----------------------------+-----------------------------------------------------------------------+---------------------------------------- 13 | 2007-08-13 11:00:18.782425 | 2007-05-13.cuda-linux32-test0.CUBLAS.debug.hostEmu.CheckedIn_Compiler | FAILED 14 | 2007-08-13 11:00:39.917609 | 2007-05-13.cuda-linux32-test0.CUBLAS.debug.hostEmu.CheckedIn_Compiler | PASSED 15 | 2007-08-13 11:46:09.770971 | 2007-05-13.cuda-linux64-test0.CUBLAS.debug.hostEmu.CheckedIn_Compiler | FAILED In the above data, id 13 & 14 are two iterations of the same subtest (they have the same cudacode), yet the first has a current_status of FAILED, and the second is PASSED. What I want is a query that will detect this and return nothing because the later row (per the date_created field) has PASSED, since all I care about is when the last iteration has FAILED. id 15 has a different cudacode, and therefore should be returned from the query as well. In other words, I'd like to see the content of row 15 returned, since its the only one that meets my criteria. Or as a starting point, just get rows 14 & 15 returned, since 14 & 15 are the newest unique instances of cudacode, and then I can check the current_status in those as a secondary query. I tried to give that a whirl below: select a.* from cudaapps a left join cudaapps b on (b.cudacode <> a.cudacode and b.date_created >= a.date_created and b.current_status = 'PASSED' and row(b.subtest, b.os, b.arch) = row(a.subtest, a.os, a.arch)) where a.current_status = 'FAILED' and b.id is null and a.date_created >= now() - interval '24 hours'; but for reasons that aren't clear to me, I'm getting id 13 & 15 returned instead of 14 & 15, and I can't figure out why. Separately, I tried to use Tom's advice to call DISTINCT ON, with the following query: SELECT DISTINCT ON(date_created, cudacode) id,current_status FROM cudaapps WHERE (SELECT now() - interval '24 hours' < date_created::timestamp)='t' ORDER BY date_created; but that too is returning id 13, 14 & 15. If I strip out the date_created in the DISTINCT ON and ORDER BY sections: SELECT DISTINCT ON(cudacode) id,current_status FROM cudaapps WHERE (SELECT now() - interval '24 hours' < date_created::timestamp)='t' ; then I get id 13 & 15 all over again, rather than 14 & 15. I'm guessing that I'm missing something really obvious here, but I can't quite figure out what. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ L. Friedman netllama@gmail.com LlamaLand http://netllama.linux-sxs.org
pgsql-novice by date: