Thread: Subselect left join / not exists()

Subselect left join / not exists()

From
Desmond Coertzen
Date:
Hi there,


I'm not convinced the problem is fully exposed.

I also followed Tom's advice by adding a is-not-null check to my clause in the not exists() section of my query. It did not fully solve my problem of returning no rows or bogus rows. I had to remove an aggregate max() and replace it with an order by desc limit 1, and I also had to relocate a date cast.

On Postgres 8.4.22.

The query references the same table 3 times:
In the main "from",
In a subselect from the column perspective of the above "from",
And in a subselect again of the above subselect in a not-exists test.

The first form of the query looked like:

select lots, of, stuff,
(select max(ls2.fiscal_ts)::date
 from long_story ls2
 where ls2.contract_id = ls.contract_id and ls2.tr_value > 0 and sp_tr_is_cash(ls2.primary_key_id)
 and not exists(select * from long_story ls2r where ls2r.reverse_of_pk_id = ls2.primary_key_id)
) as last_cash_tr_ts
from long_story ls
where ls.create_ts >= current_date and ls.tr_type_id = 4;

The subselect columm "last_cash_tr_ts" produces null or bogus result.

I tried wrapping the subselect in a stored function but came up with the same broken result. I tried a left join on the reversal self-referencing PK with a null test as a substitute for not exists() but that also produced weird results. The only way I could produce the correct results was by rewriting the subselect like this:

select lots, of, stuff,
(select ls2.fiscal_ts
 from long_story ls2
 where ls2.contract_id = ls.contract_id and ls2.tr_value > 0 and sp_tr_is_cash(ls2.primary_key_id)
 and not exists(select * from long_story ls2r where ls2r.reverse_of_pk_id is not null and ls2r.reverse_of_pk_id = ls2.primary_key_id)
 order by 1 desc
 limit 1
)::date as last_cash_tr_ts
from long_story ls
where ls.create_ts >= current_date and ls.tr_type_id = 4;

Including the not-null test in the subselect of the first query that produced weird results did not solve the problem.
Notice the ::date cast outside of the subselect. If I include within the subselect, I also get the same weird results.

Note that long_story.reverse_of_pk_id is a foreign key to itself, long_story.primary_key_id and the FK column is nullable.

It's not the best DB design but the query without the null test and the max aggregate should have worked. I am convinced there must be a bug exposed when doing nested sub queries on the same table and the bug may show itself the deeper you stack - stack meaning nested subselect on the same table. I am also convinced that I am completely insane and may be missing something very obvious like a noob.

Any help/comment highly appreciated in advance.

Re: Subselect left join / not exists()

From
"David G. Johnston"
Date:
On Fri, Feb 26, 2016 at 4:17 AM, Desmond Coertzen <patrolliekaptein@gmail.com> wrote:

It's not the best DB design but the query without the null test and the max aggregate should have worked. I am convinced there must be a bug exposed when doing nested sub queries on the same table and the bug may show itself the deeper you stack - stack meaning nested subselect on the same table. I am also convinced that I am completely insane and may be missing something very obvious like a noob.

Any help/comment highly appreciated in advance.

If you deign to provide a self-contained test case showing where the non-aggregated query provides bogus results while the aggregated and limited one does not we would be most greatful since we could then test whether what you are seeing exists in a release of PostgreSQL that is currently supported.  And if the behavior is correct we would have concrete values that could be used the in the explanation of said behavior.  

Don't expect us to be able to upgrade the quality of the discussion: If the best you can give us is phrases like "weird" and "bogus" to describe what you are seeing, and no explicit schema definitions, then they best I can say is that while this looks odd it is likely explainable and a direct function of the fact that "it's not the best DB design" and that because of such there are data anomalies that are potentially coming into play here.

Or its a bug - potentially one that has been fixed.

​David J.


Re: Subselect left join / not exists()

From
Tom Lane
Date:
Desmond Coertzen <patrolliekaptein@gmail.com> writes:
> On Postgres 8.4.22.

You realize of course that 8.4.x has been out of support for more than
a year ...

> The first form of the query looked like:

> select lots, of, stuff,
> (select max(ls2.fiscal_ts)::date
>  from long_story ls2
>  where ls2.contract_id = ls.contract_id and ls2.tr_value > 0 and
> sp_tr_is_cash(ls2.primary_key_id)
>  and not exists(select * from long_story ls2r where ls2r.reverse_of_pk_id =
> ls2.primary_key_id)
> ) as last_cash_tr_ts
> from long_story ls
> where ls.create_ts >= current_date and ls.tr_type_id = 4;

> The subselect columm "last_cash_tr_ts" produces null or bogus result.

You haven't provided nearly enough detail for anyone to judge whether
this is actually a bug or just your wrong expectation of what should
happen.  If you'd like people to look into it, please provide a
self-contained test case: not only the query but table definitions
and sample data.  (Ideally, a SQL script that reproduces the problem
starting from an empty database would make it easy for people to test.
We're not likely to take the time to try to reverse-engineer context
from an incomplete bug report.)

If it is a bug, it will not get fixed in 8.4.x anyway, because there
will never be any more 8.4.x releases.  However, if the bug still exists
in newer release branches, we'd definitely endeavor to fix it there.
        regards, tom lane



Re: Subselect left join / not exists()

From
Desmond Coertzen
Date:
I don't use the terms "bogus" and "weird" lightly.

A self contained test case is difficult to produce. I already built a script that creates a DB, three tables and test data, and then exercised the two forms of the the sub select. As expected, the test case does not provoke the behaviour whitnessed. Other than providing the entire DB dump to recreate the exact conditions that provoke this behaviour, I don't see how I can provide a self contained test case. The real table representing "long_story" in my report contains over 10.5 million rows and the behaviour in the sub select was not there before today. Possibly as my data collection grew, I may have stumbled over a problem.

I will try anyway by inserting more rows to try and provoke the behaviour. I will continue my answer on Tom's reply.


On Fri, Feb 26, 2016 at 4:53 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Feb 26, 2016 at 4:17 AM, Desmond Coertzen <patrolliekaptein@gmail.com> wrote:

It's not the best DB design but the query without the null test and the max aggregate should have worked. I am convinced there must be a bug exposed when doing nested sub queries on the same table and the bug may show itself the deeper you stack - stack meaning nested subselect on the same table. I am also convinced that I am completely insane and may be missing something very obvious like a noob.

Any help/comment highly appreciated in advance.

If you deign to provide a self-contained test case showing where the non-aggregated query provides bogus results while the aggregated and limited one does not we would be most greatful since we could then test whether what you are seeing exists in a release of PostgreSQL that is currently supported.  And if the behavior is correct we would have concrete values that could be used the in the explanation of said behavior.  

Don't expect us to be able to upgrade the quality of the discussion: If the best you can give us is phrases like "weird" and "bogus" to describe what you are seeing, and no explicit schema definitions, then they best I can say is that while this looks odd it is likely explainable and a direct function of the fact that "it's not the best DB design" and that because of such there are data anomalies that are potentially coming into play here.

Or its a bug - potentially one that has been fixed.

​David J.



Re: Subselect left join / not exists()

From
Rob Sargent
Date:


On 02/26/2016 12:54 PM, Desmond Coertzen wrote:
I don't use the terms "bogus" and "weird" lightly.

A self contained test case is difficult to produce. I already built a script that creates a DB, three tables and test data, and then exercised the two forms of the the sub select. As expected, the test case does not provoke the behaviour whitnessed. Other than providing the entire DB dump to recreate the exact conditions that provoke this behaviour, I don't see how I can provide a self contained test case. The real table representing "long_story" in my report contains over 10.5 million rows and the behaviour in the sub select was not there before today. Possibly as my data collection grew, I may have stumbled over a problem.

I will try anyway by inserting more rows to try and provoke the behaviour. I will continue my answer on Tom's reply.


On Fri, Feb 26, 2016 at 4:53 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Feb 26, 2016 at 4:17 AM, Desmond Coertzen <patrolliekaptein@gmail.com> wrote:

It's not the best DB design but the query without the null test and the max aggregate should have worked. I am convinced there must be a bug exposed when doing nested sub queries on the same table and the bug may show itself the deeper you stack - stack meaning nested subselect on the same table. I am also convinced that I am completely insane and may be missing something very obvious like a noob.

Any help/comment highly appreciated in advance.

If you deign to provide a self-contained test case showing where the non-aggregated query provides bogus results while the aggregated and limited one does not we would be most greatful since we could then test whether what you are seeing exists in a release of PostgreSQL that is currently supported.  And if the behavior is correct we would have concrete values that could be used the in the explanation of said behavior.  

Don't expect us to be able to upgrade the quality of the discussion: If the best you can give us is phrases like "weird" and "bogus" to describe what you are seeing, and no explicit schema definitions, then they best I can say is that while this looks odd it is likely explainable and a direct function of the fact that "it's not the best DB design" and that because of such there are data anomalies that are potentially coming into play here.

Or its a bug - potentially one that has been fixed.

​ David J.



The real schema and sql used might get people started

Re: Subselect left join / not exists()

From
Desmond Coertzen
Date:
Hi Tom,

Yes I did not provide nearly enough information. I was cheating in hope for a quick answer of something anyone may have encountered before me of the same nature.

I have been unable to reproduce the effect in an isolated test case, only on my live production setup. I came a bit closer to what the problem may be.

I did not see this behaviour on 8.4.22 until I started with partial indexing on a large table. A typical index was:

create index indx_lp_contract_iscash_true on loan_Payments (ContractKey, sp_payment_iscash(DKey)) where sp_payment_iscash(DKey) = true;

I know you would need more info, but please bare with me. When I dropped this type of index from this table, the broken effect went away and I got healthy results from my sub selects.

I started building a test system based on Postgres 9.3.11. I took the SQL dumps from my 8.4.22 setup and started restoring it on the 9.3.11 setup. In the log, I started seeing this during restore:

ERROR:  could not open relation with OID 36212
CONTEXT:  SQL statement "SELECT exists(select * from loan_Payments lp
                left join loan_payment_detail_nupay lpdn on lpdn.loan_payment_id = lp.DKey
                left join loan_payment_detail_bank_deposit lpdbd on lpdbd.loan_payment_id = lp.DKey
                left join loan_payment_detail_mctdebit lpdmct on lpdmct.loan_payment_id = lp.DKey
                left join loan_payment_detail_cashbook lpdcb on lpdcb.loan_payment_id = lp.DKey
                where lp.DKey = apaymentid and (lp.Type = 0 or lpdn.loan_payment_id = apaymentid or lpdbd.loan_payment_id = apaymentid or lpdmct.loan_payment_id = apaymentid or lpdcb.loan_payment_id = apaymentid) and lp.Payment <> 0
               )"
PL/pgSQL function sp_payment_iscash(bigint) line 3 at RETURN
STATEMENT:  CREATE INDEX indx_lp_iscash_true ON loan_payments USING btree (sp_payment_iscash((dkey)::bigint)) WHERE (sp_payment_iscash((dkey)::bigint) = true);

This log message in 9.3.11 put me on the path to drop all the partial index referencing my boolean function sp_payment_iscash on the 8.4.22 live setup. This returned sanity to my reports.

I am working on my test case to try to invoke the behaviour on both 8.4.22 and 9.3.11.

I have an idea the problem may be that the function accepts int8 as a parameter while the table is of int4 primary key type, where the primary key of the table is passed to the function during partial index. More testing and info will follow.

Regards.



On Fri, Feb 26, 2016 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Desmond Coertzen <patrolliekaptein@gmail.com> writes:
> On Postgres 8.4.22.

You realize of course that 8.4.x has been out of support for more than
a year ...

> The first form of the query looked like:

> select lots, of, stuff,
> (select max(ls2.fiscal_ts)::date
>  from long_story ls2
>  where ls2.contract_id = ls.contract_id and ls2.tr_value > 0 and
> sp_tr_is_cash(ls2.primary_key_id)
>  and not exists(select * from long_story ls2r where ls2r.reverse_of_pk_id =
> ls2.primary_key_id)
> ) as last_cash_tr_ts
> from long_story ls
> where ls.create_ts >= current_date and ls.tr_type_id = 4;

> The subselect columm "last_cash_tr_ts" produces null or bogus result.

You haven't provided nearly enough detail for anyone to judge whether
this is actually a bug or just your wrong expectation of what should
happen.  If you'd like people to look into it, please provide a
self-contained test case: not only the query but table definitions
and sample data.  (Ideally, a SQL script that reproduces the problem
starting from an empty database would make it easy for people to test.
We're not likely to take the time to try to reverse-engineer context
from an incomplete bug report.)

If it is a bug, it will not get fixed in 8.4.x anyway, because there
will never be any more 8.4.x releases.  However, if the bug still exists
in newer release branches, we'd definitely endeavor to fix it there.

                        regards, tom lane

Re: Subselect left join / not exists()

From
Desmond Coertzen
Date:
Recreating the stored function with an int4 parameter instead of int8 and trying to create the partial index does not change anything on 9.3.11

I cannot create this index on 9.3.11. I tried to recreate the index on 9.3.11 after my restore of my live setup from 8.4.22.

New detail in the output this time:
ERROR:  could not read block 0 in file "base/28654/39611": read only 0 of 8192 bytes

I checked my file system and disk space. No problems. The file in pg_data dir referenced in the error message does not exist. I ran vacuum analyze verboze on the 5 tables referenced by the stored function. I did not see any errors. I tried the index creation again. It failed with the same message.

Worse, even though the attempt to create the index failed and the command returned with a failed result immediately, the back end process running index creation is still running. I can see it on pg_stat_activity. It is not really running though. ps -ef shows "postgres: desmondc micro 10.0.3.169(37339) idle" and no disk io etc on the test system.

Lets forget what I see on 8.4.22. I'm failing on 9.3.11 on a test system to create a partial index and it seems the behaviour I'm getting is close to sigsegv that propagates corruption as far as pg_stat_activity. A bad exit on a routine or something.

I'm attaching DDL for 5 tables involved and the actual DDL of the stored function I'm trying to index. The index I'm trying to create is:
create index indx_lp_iscash on loan_Payments (sp_payment_iscash(DKey));
Full output of the attempt:
ERROR:  could not read block 0 in file "base/28654/39618": read only 0 of 8192 bytes
CONTEXT:  SQL statement "SELECT exists(select * from loan_Payments lp
                left join loan_payment_detail_nupay lpdn on lpdn.loan_payment_id = lp.DKey
                left join loan_payment_detail_bank_deposit lpdbd on lpdbd.loan_payment_id = lp.DKey
                left join loan_payment_detail_mctdebit lpdmct on lpdmct.loan_payment_id = lp.DKey
                left join loan_payment_detail_cashbook lpdcb on lpdcb.loan_payment_id = lp.DKey
                where lp.DKey = apaymentid and (lp.Type = 0 or lpdn.loan_payment_id = apaymentid or lpdbd.loan_payment_id = apaymentid or lpdmct.loan_payment_id = apaymentid or lpdcb.loan_payment_id = apaymentid) and lp.Payment <> 0
               )"
PL/pgSQL function sp_payment_iscash(integer) line 3 at RETURN

What else can I provide to narrow this down?

Regards.


On Mon, Feb 29, 2016 at 7:17 PM, Desmond Coertzen <patrolliekaptein@gmail.com> wrote:
Hi Tom,

Yes I did not provide nearly enough information. I was cheating in hope for a quick answer of something anyone may have encountered before me of the same nature.

I have been unable to reproduce the effect in an isolated test case, only on my live production setup. I came a bit closer to what the problem may be.

I did not see this behaviour on 8.4.22 until I started with partial indexing on a large table. A typical index was:

create index indx_lp_contract_iscash_true on loan_Payments (ContractKey, sp_payment_iscash(DKey)) where sp_payment_iscash(DKey) = true;

I know you would need more info, but please bare with me. When I dropped this type of index from this table, the broken effect went away and I got healthy results from my sub selects.

I started building a test system based on Postgres 9.3.11. I took the SQL dumps from my 8.4.22 setup and started restoring it on the 9.3.11 setup. In the log, I started seeing this during restore:

ERROR:  could not open relation with OID 36212
CONTEXT:  SQL statement "SELECT exists(select * from loan_Payments lp
                left join loan_payment_detail_nupay lpdn on lpdn.loan_payment_id = lp.DKey
                left join loan_payment_detail_bank_deposit lpdbd on lpdbd.loan_payment_id = lp.DKey
                left join loan_payment_detail_mctdebit lpdmct on lpdmct.loan_payment_id = lp.DKey
                left join loan_payment_detail_cashbook lpdcb on lpdcb.loan_payment_id = lp.DKey
                where lp.DKey = apaymentid and (lp.Type = 0 or lpdn.loan_payment_id = apaymentid or lpdbd.loan_payment_id = apaymentid or lpdmct.loan_payment_id = apaymentid or lpdcb.loan_payment_id = apaymentid) and lp.Payment <> 0
               )"
PL/pgSQL function sp_payment_iscash(bigint) line 3 at RETURN
STATEMENT:  CREATE INDEX indx_lp_iscash_true ON loan_payments USING btree (sp_payment_iscash((dkey)::bigint)) WHERE (sp_payment_iscash((dkey)::bigint) = true);

This log message in 9.3.11 put me on the path to drop all the partial index referencing my boolean function sp_payment_iscash on the 8.4.22 live setup. This returned sanity to my reports.

I am working on my test case to try to invoke the behaviour on both 8.4.22 and 9.3.11.

I have an idea the problem may be that the function accepts int8 as a parameter while the table is of int4 primary key type, where the primary key of the table is passed to the function during partial index. More testing and info will follow.

Regards.



On Fri, Feb 26, 2016 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Desmond Coertzen <patrolliekaptein@gmail.com> writes:
> On Postgres 8.4.22.

You realize of course that 8.4.x has been out of support for more than
a year ...

> The first form of the query looked like:

> select lots, of, stuff,
> (select max(ls2.fiscal_ts)::date
>  from long_story ls2
>  where ls2.contract_id = ls.contract_id and ls2.tr_value > 0 and
> sp_tr_is_cash(ls2.primary_key_id)
>  and not exists(select * from long_story ls2r where ls2r.reverse_of_pk_id =
> ls2.primary_key_id)
> ) as last_cash_tr_ts
> from long_story ls
> where ls.create_ts >= current_date and ls.tr_type_id = 4;

> The subselect columm "last_cash_tr_ts" produces null or bogus result.

You haven't provided nearly enough detail for anyone to judge whether
this is actually a bug or just your wrong expectation of what should
happen.  If you'd like people to look into it, please provide a
self-contained test case: not only the query but table definitions
and sample data.  (Ideally, a SQL script that reproduces the problem
starting from an empty database would make it easy for people to test.
We're not likely to take the time to try to reverse-engineer context
from an incomplete bug report.)

If it is a bug, it will not get fixed in 8.4.x anyway, because there
will never be any more 8.4.x releases.  However, if the bug still exists
in newer release branches, we'd definitely endeavor to fix it there.

                        regards, tom lane


Attachment

Re: Subselect left join / not exists()

From
Tom Lane
Date:
[ sorry for slow response ]

Desmond Coertzen <patrolliekaptein@gmail.com> writes:
> I cannot create this index on 9.3.11. I tried to recreate the index on
> 9.3.11 after my restore of my live setup from 8.4.22.

> New detail in the output this time:
> ERROR:  could not read block 0 in file "base/28654/39611": read only 0 of
> 8192 bytes

I think you are running into the same issue discussed in this thread:

http://www.postgresql.org/message-id/flat/87tx0dc80x.fsf@news-spur.riddles.org.uk

namely that you are trying to create an index on an allegedly immutable
function which, far from being immutable, actually attempts to consult the
table that the index is on.  That's never been considered supported, which
is why not a lot of enthusiasm has been mustered for suppressing this
weird error message.  The error message is indeed annoying and confusing,
but it's not like such an index could be expected to work usefully if we
prevented the error during index build.  In the example you've got here,
not only is the function consulting the underlying table, but four other
tables as well.  Updates on any one of those could invalidate the result,
but there's no mechanism to cause the index entries to be recomputed
when some other table changes.

So in short, you really need to reconsider trying to use an index this
way.
        regards, tom lane



Re: Subselect left join / not exists()

From
Desmond Coertzen
Date:
I get it.

I was under the impression that postgres will not allow me to declare a function immutable if it falls outside of the bounds of the definition. In example function sp_payment_iscash, the result would have been the same for every row with the same arguments, although that is because I willed it like that by design and not because it could not change through user action by manipulating the other tables references by the function.

I have other indexed immutable functions that references the table in the index and one more table. I will be taking a hard look at those as well.

The ability of postgres to index a function of this nature will be very much kick-ass if it could be supported in the future.

Thanks for your valuable input.

Regards.


On Mon, Mar 14, 2016 at 1:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ sorry for slow response ]

Desmond Coertzen <patrolliekaptein@gmail.com> writes:
> I cannot create this index on 9.3.11. I tried to recreate the index on
> 9.3.11 after my restore of my live setup from 8.4.22.

> New detail in the output this time:
> ERROR:  could not read block 0 in file "base/28654/39611": read only 0 of
> 8192 bytes

I think you are running into the same issue discussed in this thread:

http://www.postgresql.org/message-id/flat/87tx0dc80x.fsf@news-spur.riddles.org.uk

namely that you are trying to create an index on an allegedly immutable
function which, far from being immutable, actually attempts to consult the
table that the index is on.  That's never been considered supported, which
is why not a lot of enthusiasm has been mustered for suppressing this
weird error message.  The error message is indeed annoying and confusing,
but it's not like such an index could be expected to work usefully if we
prevented the error during index build.  In the example you've got here,
not only is the function consulting the underlying table, but four other
tables as well.  Updates on any one of those could invalidate the result,
but there's no mechanism to cause the index entries to be recomputed
when some other table changes.

So in short, you really need to reconsider trying to use an index this
way.

                        regards, tom lane

Re: Subselect left join / not exists()

From
Tom Lane
Date:
Desmond Coertzen <patrolliekaptein@gmail.com> writes:
> I was under the impression that postgres will not allow me to declare a
> function immutable if it falls outside of the bounds of the definition.

IIRC, there was an explicit decision not to throw errors for "immutable
function reads a table", because we have no way to be sure that such a
thing might not be safe in practice in a particular usage.  (That is,
you actually don't ever change the table's contents after first use of
the function.)  But yeah, it's a gray area.
        regards, tom lane