Re: Question regarding querying some JSON/JSONB - Mailing list pgsql-sql

From Weston Weems
Subject Re: Question regarding querying some JSON/JSONB
Date
Msg-id CAHcN2MxWFjJHrRTEFvT2_iGC5dGRAxsBJqRzvYXddCUe3JF6pA@mail.gmail.com
Whole thread Raw
In response to Re: Question regarding querying some JSON/JSONB  (Steve Midgley <science@misuse.org>)
List pgsql-sql
sorry if I was unclear... all I meant to say about the dates were that they were the first key I use in the lookup in the json columns... they could be anything... 

So basically I'd like to say go pull these keys and perform the aggregation (lets say status is 1-10, I want a count of each, so I have the count queries that do that) this works, and provides OverallStatus1Count, OverallStatus2Count etc

Now I'm just trying to figure out how to expand on this so I don't have to run the query 12 times to effectively build aggregates on the fly for each month.

Grouping may be the key, but being I'm looking for a count of values equal to 1, then 2, then 3... a simple sum/group by may not be able to achieve what I'm looking for... but may be the right clue as to how to get where I need to go.

In any case, I'll definitely respond with anything I manage to get working, to see if theres anything hugely alarming with it.

Weston

On Mon, Apr 27, 2015 at 11:42 AM, Steve Midgley <science@misuse.org> wrote:
I'm not sure I understand this part of your question: "what I'd like to do is pass in a number of dates and have those dates substituted in for the date keys"

Are you trying to aggregate your query to add up all the "1" outputs in a sum from a large number of data structures featuring different dates? If so, I would think that you should use your current query as a dynamic table and then write a group by aggregation on it. 

If that assessment is right, then it seems like your real problem is that your date field is a key not a value? Can you restructure the json so you have the date as a value as well:

["struct":{"date": “2014-10-10”, “overall_status”:10, “avg_response:20}, "struct":{"date": “2014-11-10”, “overall_status”:10, “avg_response:20}]

Wouldn't that let you aggregate dates via a dynamic table? I'm writing free-hand (untested) but something like:

select date, sum(overall_status)  as StatusZeroCount, sum(avg_response) as Response10Count
from
  select
    data->'struct'->'date' as date
    coalesce(data->'overall_status' as overall_status, 1)
    coalesce(data->'avg_response' as avg_response, 1)
  from some_table
group by date

I'm riffing there, but maybe that idea is useful?

Steve


On Mon, Apr 27, 2015 at 7:19 AM, Weston Weems <wweems@gmail.com> wrote:

I'm having troubles with this query (or rather getting it to work exactly as I expected to. I've also summarized here:

http://pastecode.org/index.php/view/47785912


data
[“2014-10-10”:{“overall_status”:10, “avg_response:20}, “2014-11-10”:{“overall_status”:10, “avg_response:20}]
[“2014-10-10”:{“overall_status”:10, “avg_response:20}, “2014-11-10”:{“overall_status”:10, “avg_response:20}]


select
count(case when data->’2014-10-10’->’overall_status’ = 0 then 1) as StatusZeroCount,
count(case when data->’2014-10-10’->’avg_response’ = 10 then 1) as Response10Count,
from some_table where ...


This works, even for cases where records dont actually even have a key for that date (which is what I want)

Ideally what I'd like to do is pass in a number of dates and have those dates substituted in for the date keys (and return counts even if the keys dont exist in the json) in the count queries and return data like:


date              | StatusZeroCount | Response10Count
2014-10-10  | 10                         | 2


The problems I see is first of all, how to say substittute in these dates... seems like a subselect, where the date keys are pulled from the parent query would work, but then I'd get one record with a ton of columns.

Seems like there would probably be a way to group by key and group by overall_status, avg_response and get counts of each or something too, but I dont know.


I guess worst case scenario, I could just query the data 12 times (since I'm basically saying get some counts for the last 12 mo) so conceivably 12 records, and the aggregates I can build against that data, but it seems like that would be hugely wasteful.


Thanks for any advice in advance!






pgsql-sql by date:

Previous
From: Steve Midgley
Date:
Subject: Re: Question regarding querying some JSON/JSONB
Next
From: Suresh Raja
Date:
Subject: documenting tables version control