Thread: Re: Pulling data from Postgres DB table for every 5 seconds.
Hi Postgres Team,I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existingproduct information.).This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients.UseCaseWe dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.Questions1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.Any ideas would be helpful.Thanks !!GithubKran
On Jan 9, 2019, at 10:02 AM, github kran <githubkran@gmail.com> wrote:
Hi Postgres Team,I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existingproduct information.).This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients.UseCaseWe dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.Questions1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.Any ideas would be helpful.Thanks !!GithubKran
There is DML event trapping. You don’t poll every 5seconds you react immediately to each event (with trigger or event). From the trigger perspective you probably have everything you need to update IOT with addition searching.
On 1/9/19 11:02 AM, github kran wrote:
"the data". All 1GB every 5 seconds?
Or just a tiny subset every 5 seconds?
Hi Postgres Team,I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB
"the data". All 1GB every 5 seconds?
( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existingproduct information.).This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients.UseCaseWe dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.
Or just a tiny subset every 5 seconds?
Questions1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.Any ideas would be helpful.Thanks !!GithubKran
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.
1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_start, ddl_command_end, table_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.
Ron- Its a tiny subset of 1 GB Data for every 5 seconds but not on the entire data.
On Wed, Jan 9, 2019 at 11:10 AM Rob Sargent <robjsargent@gmail.com> wrote:
On Jan 9, 2019, at 10:02 AM, github kran <githubkran@gmail.com> wrote:
Hi Postgres Team,I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existingproduct information.).This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients.UseCaseWe dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.Questions1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.Any ideas would be helpful.Thanks !!GithubKranThere is DML event trapping. You don’t poll every 5seconds you react immediately to each event (with trigger or event). From the trigger perspective you probably have everything you need to update IOT with addition searching.
On Wed, Jan 9, 2019 at 9:02 AM github kran <githubkran@gmail.com> wrote:
Hi Postgres Team,I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existingproduct information.).
It's unclear whether you want to do table scans or if you're just looking for changes to the database. If you're looking just for changes, consider implementing something using logical replication. We have a logical replication system set up to stream changes from the database into an elastic search cluster, and it works great.
Mark
On 1/9/19 10:21 AM, github kran wrote:
Ah, right you are. Are triggers off the table? You would want to write the trigger function in some (trusted?) language with access to the outsideThanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_start, ddl_command_end, table_rewrite and sql_drop. Support for additional events may be added in future releases.).2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product entity).
Any other alternatives ?.
On Wed, Jan 9, 2019 at 11:24 AM Mark Fletcher <markf@corp.groups.io> wrote:
On Wed, Jan 9, 2019 at 9:02 AM github kran <githubkran@gmail.com> wrote:
Hi Postgres Team,I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existingproduct information.).It's unclear whether you want to do table scans or if you're just looking for changes to the database. If you're looking just for changes, consider implementing something using logical replication. We have a logical replication system set up to stream changes from the database into an elastic search cluster, and it works great.Mark
Rob - It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.
On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 1/9/19 10:21 AM, github kran wrote:Ah, right you are. Are triggers off the table? You would want to write the trigger function in some (trusted?) language with access to the outsideThanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_start, ddl_command_end, table_rewrite and sql_drop. Support for additional events may be added in future releases.).2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.
On Jan 9, 2019, at 11:11 AM, github kran <githubkran@gmail.com> wrote:Rob - It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 1/9/19 10:21 AM, github kran wrote:Ah, right you are. Are triggers off the table? You would want to write the trigger function in some (trusted?) language with access to the outsideThanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_start, ddl_command_end, table_rewrite and sql_drop. Support for additional events may be added in future releases.).2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.
(Custom here is to “bottom post”)
Have you tried triggers and found them to have too much impact on total system? I can’t see them being more expensive than looking for changes every 5 seconds. If your hardware can scan 1T that quickly then I suspect your trigger will not be noticed. I would have the trigger write to queue and have something else using the queue to talk to IOT piece.
Failing that, perhaps your java (server-side?) app making the changes can be taught to emit the necessary details to IOT-thingy?
On Wed, Jan 9, 2019 at 10:10 AM github kran <githubkran@gmail.com> wrote:
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp fieldOR any new inserts happened in the last 5 seconds for a specific product entity).Any other alternatives ?.
The feature was added in 9.4 (I think). We are on 9.6 and it works great. Not sure about RDS Aurora, however.
Mark
On Wed, Jan 9, 2019 at 12:26 PM Rob Sargent <robjsargent@gmail.com> wrote:
On Jan 9, 2019, at 11:11 AM, github kran <githubkran@gmail.com> wrote:Rob - It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 1/9/19 10:21 AM, github kran wrote:Ah, right you are. Are triggers off the table? You would want to write the trigger function in some (trusted?) language with access to the outsideThanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_start, ddl_command_end, table_rewrite and sql_drop. Support for additional events may be added in future releases.).2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.(Custom here is to “bottom post”)Have you tried triggers and found them to have too much impact on total system? I can’t see them being more expensive than looking for changes every 5 seconds. If your hardware can scan 1T that quickly then I suspect your trigger will not be noticed. I would have the trigger write to queue and have something else using the queue to talk to IOT piece.Failing that, perhaps your java (server-side?) app making the changes can be taught to emit the necessary details to IOT-thingy?
Sure will try with a trigger and send data to a Queue and gather data for every 5 seconds reading off from the queue and send to IOT topic. Already we have a queue where every message inserted or updated on the table is sent to a queue but that is lot of data we are gathering. We want to rather minimize collecting data from DB.
On Wed, Jan 9, 2019 at 12:36 PM Mark Fletcher <markf@corp.groups.io> wrote:
On Wed, Jan 9, 2019 at 10:10 AM github kran <githubkran@gmail.com> wrote:Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp fieldOR any new inserts happened in the last 5 seconds for a specific product entity).Any other alternatives ?.The feature was added in 9.4 (I think). We are on 9.6 and it works great. Not sure about RDS Aurora, however.Mark
Mark - just curious to know on the logical replication. Do you think I can use it for my use case where i need to publish data to a subscriber when there is a change in the data updated for a row or any new inserts happening on the table. Intention
is to send this data in Json format by collecting this modified data in real time to a subscriber.
Tahanks
Kran
On Wed, Jan 9, 2019 at 12:58 PM github kran <githubkran@gmail.com> wrote:
Mark - just curious to know on the logical replication. Do you think I can use it for my use case where i need to publish data to a subscriber when there is a change in the data updated for a row or any new inserts happening on the table. Intentionis to send this data in Json format by collecting this modified data in real time to a subscriber.
From what you've said, it's a great use case for that feature. The one thing to note is that you will have to code up a logical replication client. If I can do it, pretty much anyone can, but it might take some time to get things right. I wrote about some of what I found when developing our client a year ago here: https://wingedpig.com/2017/09/20/streaming-postgres-changes/
We ended up just using the included test output plugin that comes with the postgresql distribution. And we didn't end up streaming to Kafka or anything else first. We just take the data and insert it into our elasticsearch cluster directly as we get it.
Mark
El 9/1/19 a las 20:22, Mark Fletcher escribió: > On Wed, Jan 9, 2019 at 12:58 PM github kran <githubkran@gmail.com > <mailto:githubkran@gmail.com>> wrote: > > > Mark - just curious to know on the logical replication. Do you think > I can use it for my use case where i need to publish data to a > subscriber when there is a change in the data updated for a row or > any new inserts happening on the table. Intention > is to send this data in Json format by collecting this modified data > in real time to a subscriber. > > From what you've said, it's a great use case for that feature. The one > thing to note is that you will have to code up a logical replication > client. If I can do it, pretty much anyone can, but it might take some > time to get things right. I wrote about some of what I found when > developing our client a year ago > here: https://wingedpig.com/2017/09/20/streaming-postgres-changes/ > > We ended up just using the included test output plugin that comes with > the postgresql distribution. And we didn't end up streaming to Kafka or > anything else first. We just take the data and insert it into our > elasticsearch cluster directly as we get it. I realy doubt that would work. Aurora doesn't have WALs, so how would you be able to decode the transactions? AFAIU, you can't use logical decoding on Aurora. Maybe you should be asking at the Aurora support channel. Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
I am not familiar with Aurora, but..
What something like https://github.com/subzerocloud/pg-amqp-bridge?
Set up a message queue in Postgres, which calls into AMPQ (RabbitMQ) to send a message for consumption by one or more clients.
This provides a function that can be called from a trigger to send the message.
After that, you have all the goodness of a standards-based open source MQ platform to distribute your data / notifications.
On Wed, 9 Jan 2019 at 19:02, github kran <githubkran@gmail.com> wrote:
Hi Postgres Team,I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existingproduct information.).This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients.UseCaseWe dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.Questions1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.Any ideas would be helpful.Thanks !!GithubKran