Thread: Ideas about presenting data coming from sensors
Dear PostgreSQL people,

We have a project that involves capturing sensor data and alarms from various machinery using various protocols/standards (e.g. NMEA, MODBUS). We have about 150 sites (vessels of various types) and each site will generate about 500 rows per minute. We have solved most of the design issues regarding the DB, data transfer to the central DB, etc., and now we have started to think about presenting the data.

Ideally and eventually each of those tags will be connected to some machinery item defined in our inventory/live system, so the final goal is to have something like "mimics": an interactive UI that lets the user get an overall view of the fleet (all vessels), spot the red/orange ones, click on a vessel, spot the red/orange areas, and keep zooming in until he/she finds the exact item that is in trouble. This is the rough idea.

But for now we are more interested in just viewing the incoming data, which will basically be tag/value pairs. The data divide into normal tags with a plain float value, and alarm data which carry two discrete boolean variables (valid and acknowledged). We want to cover those two cases as far as viewing is concerned.

The data will have minutely resolution. So if a vessel has, say, 700 discrete tags, the system will generate and transfer to the central DB 700 rows every minute.

Any ideas? Time-series graphs? We have graphs (Java) that show time series at daily resolution. What would you recommend for showing such data coming from data acquisition systems?

Thank you!
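For illustration, the two row shapes look roughly like this (simplified, invented names, not our actual schema):

    -- a) normal tags: one float sample per tag per minute
    CREATE TABLE tag_value (
        vessel_id  int         NOT NULL,
        tag_name   text        NOT NULL,
        ts         timestamptz NOT NULL,
        value      float4
    );

    -- b) alarm tags: carry the two discrete boolean flags
    CREATE TABLE tag_alarm (
        vessel_id    int         NOT NULL,
        tag_name     text        NOT NULL,
        ts           timestamptz NOT NULL,
        valid        boolean     NOT NULL,
        acknowledged boolean     NOT NULL
    );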
On 1/30/25 18:45, Adrian Klaver wrote:
> On 1/30/25 06:18, Achilleas Mantzios - cloud wrote:
>> [original post snipped]
>
> It seems there are two uses in play here:
>
> 1) Minute by minute data acquisition to track sensor and alarm output in general.
>
> 2) Troubleshooting emerging problems.
>
> For 2) you seem to already have the idea of drilling down on values that have moved into the orange --> red area. That does not require displaying all the data, just the data that enters the zone of interest.
>
> For 1) the question is: do you think folks are going to look at 700 values recorded on a minute by minute basis on 150 ships? Where that information could help is supplying context for the problems arising in 2), especially intermittent issues. I could see picking a problem defined in 2) and allowing the user to select both a time period and related values over that period to help determine how the problem developed.

Thank you Adrian! Yes, the ultimate goal would be to have a graphical overview where the user can zoom in until he/she spots the exact problem at hand, and then view trends or other historical data that might have contributed to or caused this specific event.

We took a look at timescaledb, and it seems to fit what we need completely:

- postgresql
- time series data
- efficient storage, efficient querying
- out of the box partitioning
- etc.

Now my problem is with the design. We have:

a) tags that have primitive values, float4 let's say - this is the majority, e.g. 60% of all tags

b) tags that contain alarm data, also with a defined structure, which carry additional fields such as the time the alarm was initially set, acknowledgement of the alarm, and validity of the alarm. Those represent something like 35% of all tags

c) tags that are basically polymorphic (about 11 of them all in all), each one with a different structure, whose fields/cols range from a few (1) up to many (25)

We have a table for a) and a table for b).

If we followed a strict normalized approach then we would additionally create 11 tables, one per tag of type c). And we are not guaranteed that the same tags will have the same structure over the whole fleet / across manufacturers. So we are thinking of putting all semi-structured data of tags of type c) into one table with a single col of type jsonb. From what I read, timescaledb plays nice with jsonb (or at least not badly).

Do you people see any gotcha with this approach?

For starters we will not convert to timescaledb yet, but will store and handle the data as normal tables, at least until we grasp the ins and outs of it.
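To make the c) idea concrete, here is roughly what we are considering (column and key names are provisional, just for illustration):

    -- c) polymorphic tags: whole payload kept as jsonb
    CREATE TABLE tag_misc (
        vessel_id int         NOT NULL,
        tag_name  text        NOT NULL,
        ts        timestamptz NOT NULL,
        payload   jsonb       NOT NULL
    );

    -- a GIN index lets us filter on keys inside the payload
    CREATE INDEX tag_misc_payload_idx ON tag_misc USING gin (payload);

    -- example probe: rows whose payload contains a given key/value
    SELECT ts, payload
    FROM   tag_misc
    WHERE  tag_name = 'main_engine_status'
    AND    payload @> '{"running": true}';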
On 2/13/25 01:53, Achilleas Mantzios - cloud wrote:
> Now my problem is with the design. We have:
>
> [descriptions of tag types a), b) and c) snipped]
>
> If we followed a strict normalized approach then we would additionally create 11 tables, one per tag of type c). And we are not guaranteed that the same tags will have the same structure over the whole fleet / across manufacturers. So we are thinking of putting all semi-structured data of tags of type c) into one table with a single col of type jsonb. From what I read, timescaledb plays nice with jsonb (or at least not badly).
>
> Do you people see any gotcha with this approach?

The only thing I can see at this time is: 'And we are not guaranteed that the same tags would have the same structure over the whole fleet/manufacturers.'

That would seem to me to point to a need for a table that maps a structure template to a fleet or manufacturer, and a corresponding field in table c) that holds the fleet/manufacturer information.

> For starters we will not convert to timescaledb yet, but will store and handle the data as normal tables, at least until we grasp the ins and outs of it.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
13.02.2025 10:54:05 Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com>:
> If we followed a strict normalized approach then we would additionally create 11 tables, one per tag of type c). And we are not guaranteed that the same tags will have the same structure over the whole fleet / across manufacturers. So we are thinking of putting all semi-structured data of tags of type c) into one table with a single col of type jsonb. From what I read, timescaledb plays nice with jsonb (or at least not badly).
>
> Do you people see any gotcha with this approach?
>
> For starters we will not convert to timescaledb yet, but will store and handle the data as normal tables, at least until we grasp the ins and outs of it.
I have not come to see the real advantage of putting data into something like JSON or XML columns. Sure, you make life easier initially by saving the time to put the values into columns one by one; on the other end you have the hassle of dissecting the JSON (or XML, you name it) whenever you retrieve/select the data, in every query. While the query might stay stable, the computational effort of the dissection will have to be spent on every query execution.
For c) you could go to 6th normal form (or whatever number that is) with a table like TAG(id, tag_name, tag_value). You would have to convert the values to strings to enter them, and it is a pain in the butt, imho, retrieving them again. You can alleviate that by having a specific column/table per data type, e.g. TAG(id, tag_name, tag_value_varchar, tag_value_date, ...).
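Roughly, just to illustrate the shape (untested, names invented):

    -- one generic row per tag value, one typed column per data type
    CREATE TABLE tag (
        id             bigserial PRIMARY KEY,
        tag_name       text NOT NULL,
        tag_value_text text,
        tag_value_num  numeric,
        tag_value_date timestamptz,
        -- exactly one of the typed columns should be set
        CHECK (num_nonnulls(tag_value_text, tag_value_num, tag_value_date) = 1)
    );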
What is the downside of having a table for each of the special tags? More effort in setting up.
The upside is less effort in retrieval, and a much more understandable model.
If your tag structure is volatile, you might have generic column names on a table and map them in a view to speaking names (see the sketch below). Taking this further leads to anchor modelling: https://www.anchormodeling.com/
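The generic-columns idea, as a sketch (again untested, names invented):

    -- generic storage...
    CREATE TABLE tag_generic (
        tag_name text        NOT NULL,
        ts       timestamptz NOT NULL,
        col1     float4,
        col2     float4,
        col3     text
    );

    -- ...mapped to speaking names per tag type in a view
    CREATE VIEW main_engine_load AS
    SELECT ts,
           col1 AS rpm,
           col2 AS load_percent,
           col3 AS mode
    FROM   tag_generic
    WHERE  tag_name = 'main_engine_load';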
On 2/13/25 19:12, Adrian Klaver wrote:
> On 2/13/25 01:53, Achilleas Mantzios - cloud wrote:
>> [design description snipped]
>>
>> Do you people see any gotcha with this approach?
>
> The only thing I can see at this time is: 'And we are not guaranteed that the same tags would have the same structure over the whole fleet/manufacturers.'
>
> That would seem to me to point to a need for a table that maps a structure template to a fleet or manufacturer, and a corresponding field in table c) that holds the fleet/manufacturer information.

Nice idea. Or just stuff everything (self-contained) inside the json and let the software do the relevant logic, a somewhat more liberal approach.
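For the record, Adrian's mapping idea could look roughly like this (a sketch with invented names, nothing we have actually built):

    -- one structure template per manufacturer/tag combination
    CREATE TABLE tag_template (
        template_id  serial PRIMARY KEY,
        manufacturer text  NOT NULL,
        tag_name     text  NOT NULL,
        structure    jsonb NOT NULL,  -- expected keys/types of the payload
        UNIQUE (manufacturer, tag_name)
    );

    -- the c) table then references the template it should conform to
    ALTER TABLE tag_misc
        ADD COLUMN template_id int REFERENCES tag_template (template_id);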
On 2/13/25 20:44, Thiemo Kellner wrote:
> 13.02.2025 10:54:05 Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com>:
>> [snipped]
>
> I have not come to see the real advantage of putting data into something like JSON or XML columns. Sure, you make life easier initially by saving the time to put the values into columns one by one; on the other end you have the hassle of dissecting the JSON (or XML, you name it) whenever you retrieve/select the data, in every query. While the query might stay stable, the computational effort of the dissection will have to be spent on every query execution.

Yes. jsonb is for the cases where we simply do not know, or do not care to know, or do not have the time to break down something that will change from ship to ship; the burden of fighting with the electricians to correctly define the tags of a) and b) is already too much. Since, on top of that, we hear varying "opinions" on those of type c) (one of them being ... nah ... nobody needs those), we are saving our mental health by going with json. Of course this is under review; it could change with no big trouble at this stage.

> For c) you could go to 6th normal form ... [snipped]
>
> What is the downside of having a table for each of the special tags? More effort in setting up.
> The upside is less effort in retrieval, and a much more understandable model.

The downside is that for every vessel I might end up with a new table for every new setup, which sucks. The idea by Adrian is fine, but we don't have the users to support the variable dynamic definitions case by case; we would have to do that ourselves, so I'd rather offload it all to the devs for lack of a better option. Plus, this is the beginning: we are starting lean, not over-engineering without a reason, and will augment as we go.
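And if one particular key inside the json proves hot in queries, an expression index should cover it. A sketch, with a made-up key name:

    -- expression index on a single hot key, if it comes to that
    CREATE INDEX tag_misc_rpm_idx
        ON tag_misc (((payload ->> 'rpm')::float4));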
On 2/14/25 11:34, Allan Kamau wrote:
> On Thu, Feb 13, 2025 at 9:44 PM Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:
>> [Thiemo's message snipped]
>
> Though you haven't asked for comments on database design, I have a suggestion: have you considered using table partitioning based on a well defined date interval, maybe monthly? Here your DB data population application would determine the name of the table based on the current date, check for the existence of the table, construct the inherited table if one does not exist, and proceed to populate the child table accordingly.
>
> This may prove useful in the future.
> -Allan.

Absolutely, this will be a necessity sooner or later. As I see the future coming, it will be a new timescaledb instance functioning at the start as a logical replica of our maindb, with timescaledb handling all the partitioning (I hope). At some point we would switch the app (both the data loading from the vessels and the queries) over to the timescaledb, and free the space in the maindb, or something along those lines.
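In the meantime, if we stayed on plain PostgreSQL, native declarative partitioning (rather than the inheritance scheme Allan describes) would look roughly like this (a sketch, invented names):

    -- the a) readings table, recreated as range-partitioned by month
    CREATE TABLE measurements (
        vessel_id int         NOT NULL,
        tag_name  text        NOT NULL,
        ts        timestamptz NOT NULL,
        value     float4
    ) PARTITION BY RANGE (ts);

    -- one child per month, created ahead of time by the loader or a cron job
    CREATE TABLE measurements_2025_02
        PARTITION OF measurements
        FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
    CREATE TABLE measurements_2025_03
        PARTITION OF measurements
        FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');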
Hi Again
Up to this day we have the data acquisition system running for just one ship, and we are writing the code to display the data. In less than 20 days we have 6M rows.

I gave timescale a shot, installed locally as an extension; it seems much nicer than having to do all the partition management by hand or with other tools. However, it appears to be a complete engine with its own workers, so this looks like something new and big to commit to for the long term, something to invest in, on top of the 25+ year commitment we already have with PostgreSQL itself.

So this is a serious decision; people, please share your stories with timescale.
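For reference, my local trial boiled down to roughly the following (per the timescale docs; 'tag_value' here stands in for our actual table):

    -- (timescaledb must be added to shared_preload_libraries first)
    CREATE EXTENSION IF NOT EXISTS timescaledb;

    -- turn an existing plain table into a hypertable;
    -- chunk (partition) management is automatic from then on
    SELECT create_hypertable('tag_value', 'ts',
                             chunk_time_interval => INTERVAL '7 days');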
On 2/26/25 01:27, Achilleas Mantzios - cloud wrote:
> Hi Again
>
> Up to this day we have the data acquisition system running for just one ship, and we are writing the code to display the data. In less than 20 days we have 6M rows.
>
> I gave timescale a shot, installed locally as an extension; it seems much nicer than having to do all the partition management by hand or with other tools. However, it appears to be a complete engine with its own workers, so this looks like something new and big to commit to for the long term, something to invest in, on top of the 25+ year commitment we already have with PostgreSQL itself.
>
> So this is a serious decision; people, please share your stories with timescale.

I don't use timescale, so this will not be about specifics. It seems to me you are well on the way to answering your own question with the choices you presented:

a) '... it seems much nicer than having to do all the partition management by hand or with other tools.'

b) 'However, it appears to be a complete engine with its own workers, ...'

Either you do the work to build your own solution or you leverage other folks' work. The final answer comes down to what fits your situation: which solution is easier to implement with the resources you have available. Either one is going to end up being a long-term commitment.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
On 2/26/25 18:29, Adrian Klaver wrote:
> [snipped]
>
> Either you do the work to build your own solution or you leverage other folks' work. The final answer comes down to what fits your situation: which solution is easier to implement with the resources you have available. Either one is going to end up being a long-term commitment.

Thank you Adrian for all your company and contributions in this thread!

In haste I made some typos, and maybe I was not well understood by potential readers. I mean that we have been a traditional PostgreSQL house for 25 years. I started this DB from scratch, and now the whole topology of postgresql servers (soon 200 in all 7 seas) holds about 60TB worth of data. Since day one I have been compiling from source: the base postgres, the contrib, my own written functions, extra extensions. We have never relied on a commercial offering, official package, or docker image, you name it. So now it is the first time that I come across a situation where the package/extension in question is big, has a somewhat different doc style than core postgres (I still cannot navigate myself in it), plus the concern: I know PostgreSQL will be here well after I retire; how about timescale? If they go out of business or no longer support newer postgresql versions, what do we do? Freeze the system for weeks and move 100TB of data? Employ some logical replication from timescale to native postgres, somehow utilizing the new table "routing" rules that are available, or will be available, by then? Hire some known PostgreSQL support company to do the job? Write my own data migration solution?

That's why I am asking for user experiences with timescale.
On 2/27/25 09:05, Achilleas Mantzios - cloud wrote:
> [snipped]
>
> That's why I am asking for user experiences with timescale.

Or Tembo's pg_timeseries. Or any other alternatives. All those companies were at pgconf.eu 2024; unfortunately, at the time this project was still inactive. I wish I had contacted them all.