Thread: Need some help with a query (uniq -c)
Hello! I have a table (think of it as a table of log messages) time | message ----------------------- 1 | a 2 | b 3 | b 4 | b 5 | a the three 'b' are the same message, so I would like to write a query that would give me a result that is similar to what the unix command "uniq -c" would give: first | message | last | count -------------------------------------- 1 | a | 1 | 1 2 | b | 4 | 3 <--- here it squeezes similar consecutive messages into a single row 5 | a | 5 | 1 How do I write such a command? I could of course write a plpgsql function that loops but that is not as interesting as finding out if this can be done in a single simple command. Perhaps it would be best to regularly delete neighbouring similar rows and keeping a "count" value would reduce the number of rows and make it more efficient if the query would be run many times and the number of duplicate messages would be large.
On Mon, Apr 12, 2010 at 12:22 PM, A B <gentosaker@gmail.com> wrote: > Hello! > > I have a table (think of it as a table of log messages) > > time | message > ----------------------- > 1 | a > 2 | b > 3 | b > 4 | b > 5 | a > > the three 'b' are the same message, so I would like to write a query > that would give me a result that is similar to what the unix command > "uniq -c" would give: > > first | message | last | count > -------------------------------------- > 1 | a | 1 | 1 > 2 | b | 4 | 3 <--- here it squeezes > similar consecutive messages into a single row > 5 | a | 5 | 1 > > How do I write such a command? Pretty straight ahead: select min(t), message, max(t), count(*) from table group by message.
On 12 April 2010 22:22, A B <gentosaker@gmail.com> wrote: > > first | message | last | count > -------------------------------------- > 1 | a | 1 | 1 > 2 | b | 4 | 3 <--- here it squeezes > similar consecutive messages into a single row > 5 | a | 5 | 1 > > How do I write such a command? select min(time) as first, max(time) as last, message, count(*) from log_table group by message order by 1; -- Sergey Konoplev Blog: http://gray-hemp.blogspot.com / Linkedin: http://ru.linkedin.com/in/grayhemp / JID/GTalk: gray.ru@gmail.com / Skype: gray-hemp / ICQ: 29353802
On Apr 12, 2010, at 11:31 AM, Scott Marlowe wrote: > On Mon, Apr 12, 2010 at 12:22 PM, A B <gentosaker@gmail.com> wrote: >> Hello! >> >> I have a table (think of it as a table of log messages) >> >> time | message >> ----------------------- >> 1 | a >> 2 | b >> 3 | b >> 4 | b >> 5 | a >> >> the three 'b' are the same message, so I would like to write a query >> that would give me a result that is similar to what the unix command >> "uniq -c" would give: >> >> first | message | last | count >> -------------------------------------- >> 1 | a | 1 | 1 >> 2 | b | 4 | 3 <--- here it squeezes >> similar consecutive messages into a single row >> 5 | a | 5 | 1 >> >> How do I write such a command? > > Pretty straight ahead: > > select min(t), message, max(t), count(*) from table group by message. That was my first though too, but it combines everything not just adjacent messages. Something like this, maybe select t1.message, t1.time as first, t2.time as last, t2.time-t1.time+1 as count from foo as t1, foo as t2 where t1.time <= t2.time and t1.message = t2.message and not exists (select * from foo as t3 where (t3.time between t1.time and t2.time and t3.message <> t1.message) or (t3.time = t2.time + 1 and t3.message = t1.message) or (t3.time = t1.time - 1 and t3.message = t1.message)); message | first | last | count ---------+-------+------+------- a | 1 | 1 | 1 b | 2 | 4 | 3 a | 5 | 5 | 1 That'll only work if the time values are contiguous, but there's probably a similar trick for non-contiguous. Cheers, Steve
Hello. I try with "With Query". http://www.postgresql.org/docs/8.4/static/queries-with.html #We can use "With Queries" > v8.4 #That'll only work if the time values are contiguous, but there's probably a #similar trick for non-contiguous ,too. --create data drop table foo; create table foo( time int,message text); insert into foo values(1,'a'); insert into foo values(2,'b'); insert into foo values(3,'b'); insert into foo values(4,'c'); insert into foo values(5,'a'); insert into foo values(6,'c'); insert into foo values(7,'c'); insert into foo values(8,'a'); insert into foo values(9,'a'); insert into foo values(10,'a'); --begin Answer with recursive r as ( select foo.time,foo.message,1 as dummy from foo union all select foo.time,foo.message,r.dummy+1 from foo , r where foo.time=r.time-1 and foo.message=r.message ) ,rr as ( select foo.time,foo.message,'OLID' as flag from foo union all select foo.time,foo.message,'DUP' as flag from foo , rr where foo.time-1=rr.time-2 and foo.message=rr.message ) select time min,time+max(dummy)-1 max,message,max(dummy) counts from r where time not in (select distinct (time+1) times from rr where flag='DUP') group by time,message order by time; --result postgres(# where flag='DUP') group by time,message order by time; min | max | message | counts -----+-----+---------+-------- 1 | 1 | a | 1 2 | 3 | b | 2 4 | 4 | c | 1 5 | 5 | a | 1 6 | 7 | c | 2 8 | 10 | a | 3 (6 rows) --end But I think some one can provide more simple SQL. Thank you. > On Apr 12, 2010, at 11:31 AM, Scott Marlowe wrote: > > >> On Mon, Apr 12, 2010 at 12:22 PM, A B<gentosaker@gmail.com> wrote: >> >>> Hello! >>> >>> I have a table (think of it as a table of log messages) >>> >>> time | message >>> ----------------------- >>> 1 | a >>> 2 | b >>> 3 | b >>> 4 | b >>> 5 | a >>> >>> the three 'b' are the same message, so I would like to write a query >>> that would give me a result that is similar to what the unix command >>> "uniq -c" would give: >>> >>> first | message | last | count >>> -------------------------------------- >>> 1 | a | 1 | 1 >>> 2 | b | 4 | 3<--- here it squeezes >>> similar consecutive messages into a single row >>> 5 | a | 5 | 1 >>> >>> How do I write such a command? >>> >> Pretty straight ahead: >> >> select min(t), message, max(t), count(*) from table group by message. >> > That was my first though too, but it combines everything not just adjacent messages. > > Something like this, maybe > > select t1.message, t1.time as first, t2.time as last, t2.time-t1.time+1 as count > from foo as t1, foo as t2 > where t1.time<= t2.time and t1.message = t2.message > and not exists > (select * from foo as t3 > where (t3.time between t1.time and t2.time and t3.message<> t1.message) > or (t3.time = t2.time + 1 and t3.message = t1.message) > or (t3.time = t1.time - 1 and t3.message = t1.message)); > > message | first | last | count > ---------+-------+------+------- > a | 1 | 1 | 1 > b | 2 | 4 | 3 > a | 5 | 5 | 1 > > That'll only work if the time values are contiguous, but there's probably a > similar trick for non-contiguous. > > Cheers, > Steve > > > -- ================================================ Kenichiro Tanaka K.K.Ashisuto http://www.ashisuto.co.jp/english/index.html ================================================
Thank you all who has replied. I will study your suggestions and see what will work best in my case. 2010/4/13 Kenichiro Tanaka <ketanaka@ashisuto.co.jp>: > Hello. > > I try with "With Query". > http://www.postgresql.org/docs/8.4/static/queries-with.html > > #We can use "With Queries" > v8.4 > #That'll only work if the time values are contiguous, but there's probably a > #similar trick for non-contiguous ,too. > > --create data > drop table foo; > create table foo( time int,message text); > insert into foo values(1,'a'); > insert into foo values(2,'b'); > insert into foo values(3,'b'); > insert into foo values(4,'c'); > insert into foo values(5,'a'); > insert into foo values(6,'c'); > insert into foo values(7,'c'); > insert into foo values(8,'a'); > insert into foo values(9,'a'); > insert into foo values(10,'a'); > > --begin Answer > with recursive r as ( > select foo.time,foo.message,1 as dummy from foo > union all > select foo.time,foo.message,r.dummy+1 from foo , r > where foo.time=r.time-1 and foo.message=r.message > ) > ,rr as ( > select foo.time,foo.message,'OLID' as flag from foo > union all > select foo.time,foo.message,'DUP' as flag from foo , rr > where foo.time-1=rr.time-2 and foo.message=rr.message > ) > select time min,time+max(dummy)-1 max,message,max(dummy) counts > from r where time not in (select distinct (time+1) times from rr > where flag='DUP') group by time,message order by time; > > --result > postgres(# where flag='DUP') group by time,message order by time; > min | max | message | counts > -----+-----+---------+-------- > 1 | 1 | a | 1 > 2 | 3 | b | 2 > 4 | 4 | c | 1 > 5 | 5 | a | 1 > 6 | 7 | c | 2 > 8 | 10 | a | 3 > (6 rows) > > --end > > But I think some one can provide more simple SQL. > > Thank you. >> >> On Apr 12, 2010, at 11:31 AM, Scott Marlowe wrote: >> >> >>> >>> On Mon, Apr 12, 2010 at 12:22 PM, A B<gentosaker@gmail.com> wrote: >>> >>>> >>>> Hello! >>>> >>>> I have a table (think of it as a table of log messages) >>>> >>>> time | message >>>> ----------------------- >>>> 1 | a >>>> 2 | b >>>> 3 | b >>>> 4 | b >>>> 5 | a >>>> >>>> the three 'b' are the same message, so I would like to write a query >>>> that would give me a result that is similar to what the unix command >>>> "uniq -c" would give: >>>> >>>> first | message | last | count >>>> -------------------------------------- >>>> 1 | a | 1 | 1 >>>> 2 | b | 4 | 3<--- here it squeezes >>>> similar consecutive messages into a single row >>>> 5 | a | 5 | 1 >>>> >>>> How do I write such a command? >>>> >>> >>> Pretty straight ahead: >>> >>> select min(t), message, max(t), count(*) from table group by message. >>> >> >> That was my first though too, but it combines everything not just adjacent >> messages. >> >> Something like this, maybe >> >> select t1.message, t1.time as first, t2.time as last, t2.time-t1.time+1 as >> count >> from foo as t1, foo as t2 >> where t1.time<= t2.time and t1.message = t2.message >> and not exists >> (select * from foo as t3 >> where (t3.time between t1.time and t2.time and t3.message<> >> t1.message) >> or (t3.time = t2.time + 1 and t3.message = t1.message) >> or (t3.time = t1.time - 1 and t3.message = t1.message)); >> >> message | first | last | count >> ---------+-------+------+------- >> a | 1 | 1 | 1 >> b | 2 | 4 | 3 >> a | 5 | 5 | 1 >> >> That'll only work if the time values are contiguous, but there's probably >> a >> similar trick for non-contiguous. >> >> Cheers, >> Steve >> >> >> > > > -- > ================================================ > Kenichiro Tanaka > K.K.Ashisuto > http://www.ashisuto.co.jp/english/index.html > ================================================ > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >