Thread: Unexpected query result

Unexpected query result

From
Begin Daniel
Date:
In order to process a large amount of data I need to run a procedure using parallel batch processes.  
The query I run in each process is expected to ...

1- select a batch of ids (500) from a table (wait4processing) containing the list of all records to process
2- remove the selected records from the wait4processing table in order to eliminate duplicate processing
3- run the procedure (build_contributions_clusters) over the range of selected ids

--The query I use:
WITH ids AS (
    DELETE FROM wait4processing
    WHERE id IN (SELECT id FROM wait4processing LIMIT 500)
    RETURNING id
)
SELECT build_contributions_clusters(min(id), max(id)) FROM ids;

The query runs properly if I send it sequentially (waiting for the completion of one query before sending the next), but it doesn't work when sending multiple occurrences in parallel. From the results I got, it seems the first query received by the server runs properly, but the following ones try to process the same first 500 records even though they were deleted - the build_contributions_clusters procedure receives NULL values instead of the following 500 records.

Since I am almost certain this is the expected behavior, I would like to understand why, and I would also appreciate seeing alternative queries to do the job.

Thanks :-)

Re: Unexpected query result

From
Adrian Klaver
Date:
On 10/05/2015 05:02 AM, Begin Daniel wrote:
> In order to process a large amount of data I need to run a procedure
> using parallel batch processes.
> The query I run in each process is expected to ...
>
> 1- select a bunch of id (500) in a table (wait4processing) containing
> the list of all records to process
> 2- remove selected records from wait4processing table in order to
> eliminate duplicate processing
> 3- run the procedure (build_contributions_clusters) over the range of
> selected ids
>
> --The query I use:
> With ids as( delete from wait4processing where id in( select id from
> wait4processing limit 500)  returning id)
> select build_contributions_clusters(min(id),max(id)) from ids;
>
> The query runs properly if I send it sequentially (waiting for the
> completion of one query before sending the next), but it doesn't work
> when sending multiple occurrences in parallel. From the results I
> got, it seems the first query received by the server runs properly, but the
> following ones try to process the same first 500 records even though deleted
> - the build_contributions_clusters procedure receives NULL values instead
> of the following 500 records.
>
> Since I am almost certain this is the expected behavior, I would like
> to understand why, and I would also appreciate
> seeing alternative queries to do the job.

See here:

http://www.postgresql.org/docs/9.4/interactive/transaction-iso.html

"13.2.1. Read Committed Isolation Level

Read Committed is the default isolation level in PostgreSQL. When a
transaction uses this isolation level, a SELECT query (without a FOR
UPDATE/SHARE clause) sees only data committed before the query began; it
never sees either uncommitted data or changes committed during query
execution by concurrent transactions. In effect, a SELECT query sees a
snapshot of the database as of the instant the query begins to run.
However, SELECT does see the effects of previous updates executed within
its own transaction, even though they are not yet committed. Also note
that two successive SELECT commands can see different data, even though
they are within a single transaction, if other transactions commit
changes after the first SELECT starts and before the second SELECT starts."
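In your case, each concurrent query's inner SELECT takes its snapshot before the other sessions' DELETEs commit, so every worker picks the same first 500 ids; the later DELETEs then find those rows already gone, delete nothing, and min(id)/max(id) come back NULL. One way to hand each worker a disjoint batch is row-level locking with FOR UPDATE SKIP LOCKED; that is only available starting with 9.5, so treat this as a sketch for when you can upgrade:

```sql
-- FOR UPDATE locks the selected rows; SKIP LOCKED makes concurrent
-- workers skip rows another session has already claimed, so each
-- worker deletes a disjoint batch. ORDER BY keeps the id ranges tidy.
WITH ids AS (
    DELETE FROM wait4processing
    WHERE id IN (
        SELECT id FROM wait4processing
        ORDER BY id
        LIMIT 500
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id
)
SELECT build_contributions_clusters(min(id), max(id)) FROM ids;
```

Note that min(id)/max(id) are still NULL once the table is empty, so the callers should be prepared for that case regardless.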

>
> Thanks :-)


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Unexpected query result

From
Begin Daniel
Date:
Thanks Adrian, that makes sense. I'll adapt the calling procedures.
Daniel

> Subject: Re: [GENERAL] Unexpected query result
> To: jfd553@hotmail.com; pgsql-general@postgresql.org
> From: adrian.klaver@aklaver.com
> Date: Mon, 5 Oct 2015 06:17:33 -0700
>
> On 10/05/2015 05:02 AM, Begin Daniel wrote:
> > In order to process a large amount of data I need to run a procedure
> > using parallel batch processes.
> > The query I run in each process is expected to ...
> >
> > 1- select a bunch of id (500) in a table (wait4processing) containing
> > the list of all records to process
> > 2- remove selected records from wait4processing table in order to
> > eliminate duplicate processing
> > 3- run the procedure (build_contributions_clusters) over the range of
> > selected ids
> >
> > --The query I use:
> > With ids as( delete from wait4processing where id in( select id from
> > wait4processing limit 500) returning id)
> > select build_contributions_clusters(min(id),max(id)) from ids;
> >
> > The query runs properly if I send it sequentially (waiting for the
> > completion of one query before sending the next), but it doesn't work
> > when sending multiple occurrences in parallel. From the results I
> > got, it seems the first query received by the server runs properly, but the
> > following ones try to process the same first 500 records even though deleted
> > - the build_contributions_clusters procedure receives NULL values instead
> > of the following 500 records.
> >
> > Since I am almost certain this is the expected behavior, I would like
> > to understand why, and I would also appreciate
> > seeing alternative queries to do the job.
>
> See here:
>
> http://www.postgresql.org/docs/9.4/interactive/transaction-iso.html
>
> "13.2.1. Read Committed Isolation Level
>
> Read Committed is the default isolation level in PostgreSQL. When a
> transaction uses this isolation level, a SELECT query (without a FOR
> UPDATE/SHARE clause) sees only data committed before the query began; it
> never sees either uncommitted data or changes committed during query
> execution by concurrent transactions. In effect, a SELECT query sees a
> snapshot of the database as of the instant the query begins to run.
> However, SELECT does see the effects of previous updates executed within
> its own transaction, even though they are not yet committed. Also note
> that two successive SELECT commands can see different data, even though
> they are within a single transaction, if other transactions commit
> changes after the first SELECT starts and before the second SELECT starts."
>
> >
> > Thanks :-)
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
>

Re: Unexpected query result

From
dinesh kumar
Date:
Hi,

On Mon, Oct 5, 2015 at 5:02 AM, Begin Daniel <jfd553@hotmail.com> wrote:
In order to process a large amount of data I need to run a procedure using parallel batch processes.  
The query I run in each process is expected to ...


It seems you are trying to achieve the same thing we did. I wrote a blog entry that may help you in this scenario.
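A 9.4-compatible sketch of that idea uses transaction-scoped advisory locks so concurrent workers skip ids another session has already claimed. This is an illustration reusing your table and procedure names, not necessarily the blog's exact approach, and it has a known caveat: the lock function may be evaluated for more rows than the LIMIT ultimately returns.

```sql
-- Each worker claims up to 500 ids it can lock; ids locked by another
-- session fail pg_try_advisory_xact_lock() and are skipped. The locks
-- are released automatically at transaction end, by which time the
-- claimed rows have been deleted and committed.
WITH candidates AS (
    SELECT id
    FROM wait4processing
    WHERE pg_try_advisory_xact_lock(id)
    LIMIT 500
),
ids AS (
    DELETE FROM wait4processing
    WHERE id IN (SELECT id FROM candidates)
    RETURNING id
)
SELECT build_contributions_clusters(min(id), max(id)) FROM ids;
```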

1- select a batch of ids (500) from a table (wait4processing) containing the list of all records to process
2- remove the selected records from the wait4processing table in order to eliminate duplicate processing
3- run the procedure (build_contributions_clusters) over the range of selected ids

--The query I use:
WITH ids AS (
    DELETE FROM wait4processing
    WHERE id IN (SELECT id FROM wait4processing LIMIT 500)
    RETURNING id
)
SELECT build_contributions_clusters(min(id), max(id)) FROM ids;

The query runs properly if I send it sequentially (waiting for the completion of one query before sending the next), but it doesn't work when sending multiple occurrences in parallel. From the results I got, it seems the first query received by the server runs properly, but the following ones try to process the same first 500 records even though they were deleted - the build_contributions_clusters procedure receives NULL values instead of the following 500 records.

Since I am almost certain this is the expected behavior, I would like to understand why, and I would also appreciate seeing alternative queries to do the job.

Thanks :-)


