Thread: Re: Pulling data from Postgres DB table for every 5 seconds.

Re: Pulling data from Postgres DB table for every 5 seconds.

From
github kran
Date:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Rob Sargent
Date:


On Jan 9, 2019, at 10:02 AM, github kran <githubkran@gmail.com> wrote:


Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran

There is DML event trapping.  You don’t poll every 5seconds you react immediately to each event (with trigger or event).  From the trigger perspective you probably have everything you need to update IOT with addition searching.

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Ron
Date:
On 1/9/19 11:02 AM, github kran wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB

"the data".  All 1GB every 5 seconds?

( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Or just a tiny subset every 5 seconds?


Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran

--
Angular momentum makes the world go 'round.

Re: Pulling data from Postgres DB table for every 5 seconds.

From
github kran
Date:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.

Ron- Its a tiny subset of 1 GB Data for every 5 seconds but not on the entire data.


Thanks !!.

On Wed, Jan 9, 2019 at 11:10 AM Rob Sargent <robjsargent@gmail.com> wrote:


On Jan 9, 2019, at 10:02 AM, github kran <githubkran@gmail.com> wrote:


Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran

There is DML event trapping.  You don’t poll every 5seconds you react immediately to each event (with trigger or event).  From the trigger perspective you probably have everything you need to update IOT with addition searching.

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Mark Fletcher
Date:
On Wed, Jan 9, 2019 at 9:02 AM github kran <githubkran@gmail.com> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 


It's unclear whether you want to do table scans or if you're just looking for changes to the database. If you're looking just for changes, consider implementing something using logical replication. We have a logical replication system set up to stream changes from the database into an elastic search cluster, and it works great.

Mark

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Rob Sargent
Date:


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside

Re: Pulling data from Postgres DB table for every 5 seconds.

From
github kran
Date:
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product entity). 
Any other alternatives ?.

On Wed, Jan 9, 2019 at 11:24 AM Mark Fletcher <markf@corp.groups.io> wrote:
On Wed, Jan 9, 2019 at 9:02 AM github kran <githubkran@gmail.com> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 


It's unclear whether you want to do table scans or if you're just looking for changes to the database. If you're looking just for changes, consider implementing something using logical replication. We have a logical replication system set up to stream changes from the database into an elastic search cluster, and it works great.

Mark

Re: Pulling data from Postgres DB table for every 5 seconds.

From
github kran
Date:
Rob -   It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.  

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Rob Sargent
Date:


On Jan 9, 2019, at 11:11 AM, github kran <githubkran@gmail.com> wrote:

Rob -   It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.  

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside

(Custom here is to “bottom post”)

Have you tried triggers and found them to have too much impact on total system?  I can’t see them being more expensive than looking for changes every 5 seconds.  If your hardware can scan 1T that quickly then I suspect your trigger will not be noticed.  I would have the trigger write to queue and have something else using the queue to talk to IOT piece.

Failing that, perhaps your java (server-side?) app making the changes can be taught to emit the necessary details to IOT-thingy?

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Mark Fletcher
Date:
On Wed, Jan 9, 2019 at 10:10 AM github kran <githubkran@gmail.com> wrote:
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product entity). 
Any other alternatives ?.

The feature was added in 9.4 (I think). We are on 9.6 and it works great. Not sure about RDS Aurora, however.

Mark 

Re: Pulling data from Postgres DB table for every 5 seconds.

From
github kran
Date:



On Wed, Jan 9, 2019 at 12:26 PM Rob Sargent <robjsargent@gmail.com> wrote:


On Jan 9, 2019, at 11:11 AM, github kran <githubkran@gmail.com> wrote:

Rob -   It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.  

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside

(Custom here is to “bottom post”)

Have you tried triggers and found them to have too much impact on total system?  I can’t see them being more expensive than looking for changes every 5 seconds.  If your hardware can scan 1T that quickly then I suspect your trigger will not be noticed.  I would have the trigger write to queue and have something else using the queue to talk to IOT piece.

Failing that, perhaps your java (server-side?) app making the changes can be taught to emit the necessary details to IOT-thingy?

Sure will try with a trigger and send data to a Queue and gather data for every 5 seconds reading off from the queue and send to IOT topic. Already we have a queue where every message inserted or updated on the table is sent to a queue but that is lot of data we are gathering. We want to rather minimize collecting data from DB.
 

Re: Pulling data from Postgres DB table for every 5 seconds.

From
github kran
Date:


On Wed, Jan 9, 2019 at 12:36 PM Mark Fletcher <markf@corp.groups.io> wrote:
On Wed, Jan 9, 2019 at 10:10 AM github kran <githubkran@gmail.com> wrote:
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product entity). 
Any other alternatives ?.

The feature was added in 9.4 (I think). We are on 9.6 and it works great. Not sure about RDS Aurora, however.

Mark 

Mark - just curious to know on the logical replication. Do you think I can use it for my use case where i need to publish data to a subscriber when there is a change in the data updated for a row or any new inserts happening on the table. Intention
is to send this data in Json format by collecting this modified data in real time to a subscriber.

Tahanks
Kran
 

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Mark Fletcher
Date:
On Wed, Jan 9, 2019 at 12:58 PM github kran <githubkran@gmail.com> wrote:

Mark - just curious to know on the logical replication. Do you think I can use it for my use case where i need to publish data to a subscriber when there is a change in the data updated for a row or any new inserts happening on the table. Intention
is to send this data in Json format by collecting this modified data in real time to a subscriber.

From what you've said, it's a great use case for that feature. The one thing to note is that you will have to code up a logical replication client. If I can do it, pretty much anyone can, but it might take some time to get things right. I wrote about some of what I found when developing our client a year ago here: https://wingedpig.com/2017/09/20/streaming-postgres-changes/

We ended up just using the included test output plugin that comes with the postgresql distribution. And we didn't end up streaming to Kafka or anything else first. We just take the data and insert it into our elasticsearch cluster directly as we get it.

Mark

Re: Pulling data from Postgres DB table for every 5 seconds.

From
Martín Marqués
Date:
El 9/1/19 a las 20:22, Mark Fletcher escribió:
> On Wed, Jan 9, 2019 at 12:58 PM github kran <githubkran@gmail.com
> <mailto:githubkran@gmail.com>> wrote:
> 
> 
>     Mark - just curious to know on the logical replication. Do you think
>     I can use it for my use case where i need to publish data to a
>     subscriber when there is a change in the data updated for a row or
>     any new inserts happening on the table. Intention
>     is to send this data in Json format by collecting this modified data
>     in real time to a subscriber.
> 
> From what you've said, it's a great use case for that feature. The one
> thing to note is that you will have to code up a logical replication
> client. If I can do it, pretty much anyone can, but it might take some
> time to get things right. I wrote about some of what I found when
> developing our client a year ago
> here: https://wingedpig.com/2017/09/20/streaming-postgres-changes/
> 
> We ended up just using the included test output plugin that comes with
> the postgresql distribution. And we didn't end up streaming to Kafka or
> anything else first. We just take the data and insert it into our
> elasticsearch cluster directly as we get it.

I realy doubt that would work. Aurora doesn't have WALs, so how would
you be able to decode the transactions?

AFAIU, you can't use logical decoding on Aurora. Maybe you should be
asking at the Aurora support channel.

Regards,

-- 
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: Pulling data from Postgres DB table for every 5 seconds.

From
Tony Shelver
Date:
I am not familiar with Aurora, but..


Set up a message queue in Postgres, which calls into AMPQ (RabbitMQ) to send a message for consumption by one or more clients.

This provides a function that can be called from a trigger to send the message.

After that, you have all the goodness of a standards-based open source MQ platform to distribute your data / notifications.



On Wed, 9 Jan 2019 at 19:02, github kran <githubkran@gmail.com> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran