Thread: need to join successive log entries into one
I have a table like: run | seq | start | done 1415| 261| 2001-01-29 12:36:55| 2001-02-07 13:02:38 1415| 263| 2001-02-1407:40:04| 2001-02-15 16:05:04 1415| 264| 2001-02-16 16:05:05| 2001-03-08 16:34:03 1415| 265| 2001-03-08 16:34:04| 1747| 257| 2001-02-15 09:14:39| 2001-03-01 08:58:03 1747| 258| 2001-03-01 08:58:04|2001-03-01 09:01:27 1747| 260| 2001-03-01 09:01:39| 2001-03-01 09:02:39 1747| 262| 2001-03-05 13:13:58| 1954| 218| 2001-02-02 20:55:39| 2001-02-08 12:44:48 1954| 219| 2001-02-08 12:44:49| 2001-02-0812:47:36 1954| 220| 2001-02-08 12:47:36| 2001-02-08 16:50:33 1954| 221| 2001-02-08 16:50:33| 2001-02-08 16:50:451954| 222| 2001-02-08 16:50:46| 2001-02-12 14:36:41 1954| 223| 2001-02-12 14:36:41| 2001-03-02 10:17:15 This is a log of some operations done on some runs. 'seq' is the step within the run. I need to produce a new table that coalesces immediately successive operations on a run into one, e.g.: run | start | done 1415| 2001-01-29 12:36:55| 2001-02-07 13:02:38 1415| 2001-02-14 07:40:04| 1747| 2001-02-1509:14:39| 2001-03-01 09:02:39 1747| 2001-03-05 13:13:58| 1954| 2001-02-02 20:55:39| 2001-03-02 10:17:15 i.e. where a run has one or more steps with succesive seq values, or equivalently, with abutting start/end values, then I want the new table to have only one entry representing the full span of time. Null 'done' just means it's not done yet. Unfortunately, the start time of a 'succesive' op is sometimes 1 second later that the 'done' time of the previous one, so maybe using the seq field is simpler. Can anyone think of a way I can do this in postgres? -- George Young, Rm. L-204 gry@ll.mit.edu MIT Lincoln Laboratory 244 Wood St. Lexington, Massachusetts 02420-9108 (781) 981-2756
George Young wrote: ... >I need to produce a new table that coalesces immediately successive >operations on a run into one, e.g.: > >run | start | done > 1415| 2001-01-29 12:36:55| 2001-02-07 13:02:38 > 1415| 2001-02-14 07:40:04| > 1747| 2001-02-15 09:14:39| 2001-03-01 09:02:39 > 1747| 2001-03-05 13:13:58| > 1954| 2001-02-02 20:55:39| 2001-03-02 10:17:15> >i.e. where a run has one or more steps with succesive seq values, >or equivalently, with abutting start/end values,then I want >the new table to have only one entry representing the full span of >time. Null 'done' just means it'snot done yet. ... >Can anyone think of a way I can do this in postgres? I think you would be better off feeding the log out of the database into a perl script. SQL is a set-oriented language that can't do this sort of thing without some appalling contortions, if at all. (I can't think of a way to do it.) So, store the data in PostgreSQL, but do procedural processing in a more appropriate language. -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47 6B 7E 39 CC 56 E4 C1 47 GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C ======================================== "Let your light so shine before men, that they may see your good works, andglorify your Father which is in heaven." Matthew 5:16
>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<< On 3/14/01, 5:24:12 PM, George Young <gry@ll.mit.edu> wrote regarding [SQL] need to join successive log entries into one: > I have a table like: > run | seq | start | done > 1415| 261| 2001-01-29 12:36:55| 2001-02-07 13:02:38 > 1415| 263| 2001-02-14 07:40:04| 2001-02-15 16:05:04 > 1415| 264| 2001-02-16 16:05:05| 2001-03-08 16:34:03 > 1415| 265| 2001-03-08 16:34:04| > This is a log of some operations done on some runs. 'seq' is the step > within the run. > I need to produce a new table that coalesces immediately successive > operations on a run into one, e.g.: > run | start | done > 1415| 2001-01-29 12:36:55| 2001-02-07 13:02:38 > 1415| 2001-02-14 07:40:04| > 1747| 2001-02-15 09:14:39| 2001-03-01 09:02:39 > 1747| 2001-03-05 13:13:58| > 1954| 2001-02-02 20:55:39| 2001-03-02 10:17:15 Try: select run,min(start),max(done) from mytable group by run; - Richard Huxton
On Wed, 14 Mar 2001, you wrote: > On 3/14/01, 5:24:12 PM, George Young <gry@ll.mit.edu> wrote regarding [SQL] > I need to join successive log entries into one: > > I have a table like: > > > run | seq | start | done > > 1415| 261| 2001-01-29 12:36:55| 2001-02-07 13:02:38 > > 1415| 263| 2001-02-14 07:40:04| 2001-02-15 16:05:04 > > 1415| 264| 2001-02-16 16:05:05| 2001-03-08 16:34:03 > > 1415| 265| 2001-03-08 16:34:04| > > > This is a log of some operations done on some runs. 'seq' is the step > > within the run. > > > I need to produce a new table that coalesces immediately successive > > operations on a run into one, e.g.: > > > run | start | done > > 1415| 2001-01-29 12:36:55| 2001-02-07 13:02:38 > > 1415| 2001-02-14 07:40:04| > > 1747| 2001-02-15 09:14:39| 2001-03-01 09:02:39 > > 1747| 2001-03-05 13:13:58| > > 1954| 2001-02-02 20:55:39| 2001-03-02 10:17:15 > > Try: > > select run,min(start),max(done) from mytable group by run; Alas, this combines *all* entries for a given run, not just those that are imediately adjacent (in time, or by 'seq' number)... -- George Young, Rm. L-204 gry@ll.mit.edu MIT Lincoln Laboratory 244 Wood St. Lexington, Massachusetts 02420-9108 (781) 981-2756
On Wed, Mar 14, 2001 at 09:17:33PM +0000, Richard H wrote: > > > I need to produce a new table that coalesces immediately successive > > operations on a run into one, e.g.: > > > run | start | done > > 1415| 2001-01-29 12:36:55| 2001-02-07 13:02:38 > > 1415| 2001-02-14 07:40:04| > > 1747| 2001-02-15 09:14:39| 2001-03-01 09:02:39 > > 1747| 2001-03-05 13:13:58| > > 1954| 2001-02-02 20:55:39| 2001-03-02 10:17:15 > > Try: > > select run,min(start),max(done) from mytable group by run; select run,min(start),max(done) from mytable group by runUNION ALLselect run,start,NULL from mytable where done is null; Assuming that any given run has only one open step. Ross
George Young wrote: > > On Wed, 14 Mar 2001, you wrote: > > On 3/14/01, 5:24:12 PM, George Young <gry@ll.mit.edu> wrote regarding [SQL] > > I need to join successive log entries into one: > > > I have a table like: > > > > > run | seq | start | done > > > 1415| 261| 2001-01-29 12:36:55| 2001-02-07 13:02:38 > > > 1415| 263| 2001-02-14 07:40:04| 2001-02-15 16:05:04 > > > 1415| 264| 2001-02-16 16:05:05| 2001-03-08 16:34:03 > > > 1415| 265| 2001-03-08 16:34:04| > > > > Try: > > > > select run,min(start),max(done) from mytable group by run; > > Alas, this combines *all* entries for a given run, not just those that > are imediately adjacent (in time, or by 'seq' number)... I thought it was complicated, then I thought it was easy. Looks like I was right first time. I was thinking that some huge self-join might do it, but I can't see how to go beyond a run of two adjacent entries. The only thing I can think of is to add a "batch" column and build a trigger to set it as data is inserted. I'm assuming the entries are put in one at a time and in order. That way you just need to look at the last entry to determine if the new one is in the same batch. Any use? - Richard Huxton > -- > George Young, Rm. L-204 gry@ll.mit.edu > MIT Lincoln Laboratory > 244 Wood St. > Lexington, Massachusetts 02420-9108 (781) 981-2756