Re: Processing long AND/OR lists - Mailing list pgsql-hackers

From Gurjeet Singh
Subject Re: Processing long AND/OR lists
Date
Msg-id CABwTF4X0oemqP7VN_is_ReYsC6h_RsVxf4Pexb=-yRgk6Hj+HQ@mail.gmail.com
Whole thread Raw
In response to Re: Processing long AND/OR lists  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, May 26, 2013 at 11:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josh Berkus <josh@agliodbs.com> writes:
> ***15,000***?  I'd say that someone has an application design issue.
> Fixing the stack overflow is a good thing, but that query is never going
> to return ...

Just for the record, it does finish in 5 sec on my laptop. But yes, point taken.
 

Yeah, the parser's stack consumption seems like only the tip of the
iceberg here.

True. After getting past the parser, the bigger lists consume exponentially longer times around process_equivalence(). And BTW, a backend stuck in that stack does not respond to pg_cancel/terminate_backend()! Should there be a CHECK_FOR_INTERRUPT() in process_equivalence(), perhaps invoked only every N calls of that function.
 

I find it hard to visualize a use-case for so many AND'ed conditions
anyway.  I could believe a lot of OR'd conditions, but very likely it
would be a list that could and should be converted to an IN condition.

As noted earlier, this was seen in real-world, at a customer setup, generated by Slony, a highly regarded community project. Also, the patch addresses the stack limit for long OR clauses as well.

Anytime such conditions are generated programmatically, there's always a possibility that the programmer underestimated the amount of data they are generating the query for; just like it happened in Slony's case I quoted.

Slony has compress_actionseq() function to scan a list of numbers, and generate a smallest possible WHERE clause. If it sees consecutive numbers that form a range, it turns that range into a BETWEEN clause, and the numbers that don't seem to be in a range are added to the where clause as 'AND col <> n1 AND col <> n2 ...'. It's quite likely that it generates such long lists because it wrongly assumes that the input list is ordered, or at least mostly ordered.

Agreed that that function can/should be fixed to do a better job of sorting these numbers before scanning, and also using the NOT IN construct. But the point remains that Postgres should be capable of handling this simple construct efficiently.

Best regards,
--
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.

pgsql-hackers by date:

Previous
From: Christopher Browne
Date:
Subject: Re: Planning incompatibilities for Postgres 10.0
Next
From: Craig Ringer
Date:
Subject: Re: Planning incompatibilities for Postgres 10.0