Hi hackers,
Recently one of our customers ran into a situation where planning a
particular query took several minutes and could not be interrupted by
pg_terminate_backend(). The query and schema are attached (they are
generated by Zabbix). The slowness comes from choose_bitmap_and(),
whose run time is quadratic in the number of WHERE clauses: it assigns
unique ids to the clauses by keeping them in a list and, for each new
clause, doing a linear search over that list with equal() to find its
position.
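
For reference, the lookup looks roughly like this (a simplified sketch
of find_list_position() in indxpath.c, from memory rather than the
exact source). Every call scans the whole list, so n clauses cost
O(n^2) equal() comparisons:

static int
find_list_position(Node *node, List **nodelist)
{
    int         i = 0;
    ListCell   *lc;

    /* Linear scan: compare the new clause against every one seen so far. */
    foreach(lc, *nodelist)
    {
        Node   *oldnode = (Node *) lfirst(lc);

        if (equal(node, oldnode))
            return i;           /* already seen, reuse its id */
        i++;
    }

    /* Not found: append it, making the next search one step longer. */
    *nodelist = lappend(*nodelist, node);
    return i;
}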
Our first attempt at a fix was to put these clauses into an rbtree or a
dynahash. This improves performance, but is not entirely correct: we
have no comparison or hash function for nodes, so we would have to
compare or hash their string representations. But string equality of
nodeToString() output is not equivalent to equal(), because the string
includes fields that equal() ignores, such as the token location. So we
can't simply compare the strings in place of equal().
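
A minimal illustration of the mismatch (an artificial snippet written
for this email, not part of the patch): two Vars that equal() treats as
identical still stringify differently once their token locations
diverge.

#include "postgres.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"

static void
demo_location_mismatch(void)
{
    /* Two semantically identical Vars differing only in token location. */
    Var    *a = makeVar(1, 1, INT4OID, -1, InvalidOid, 0);
    Var    *b = copyObject(a);

    b->location = 42;           /* ignored by equal() ... */

    Assert(equal(a, b));        /* ... so the nodes still compare equal */
    Assert(strcmp(nodeToString(a), nodeToString(b)) != 0);
                                /* ... but their string forms differ */
}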
I settled on a simpler solution: limit the number of clauses we try to
uniquely identify. If there are too many, skip the smarter logic that
compares paths by their clauses, and just return the cheapest input
path from choose_bitmap_and(). The patch is attached.
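
In outline, the escape hatch is something like the following. This is
only a sketch of the idea: the threshold name and value, and the
count_all_clauses() helper, are placeholders here, not what the patch
actually does.

/* Hypothetical cap on the number of clauses we try to de-duplicate. */
#define MAX_CLAUSES_FOR_UNIQUE_IDS  1000

static Path *
choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
{
    int     num_clauses = count_all_clauses(paths); /* hypothetical helper */

    /*
     * Too many clauses to assign unique ids in reasonable time: skip the
     * clause-overlap heuristics and return the cheapest input path.
     */
    if (num_clauses > MAX_CLAUSES_FOR_UNIQUE_IDS)
    {
        Path       *cheapest = NULL;
        ListCell   *lc;

        foreach(lc, paths)
        {
            Path   *path = (Path *) lfirst(lc);

            if (cheapest == NULL ||
                path->total_cost < cheapest->total_cost)
                cheapest = path;
        }
        return cheapest;
    }

    /* ... otherwise fall through to the existing logic ... */
}

Skipping the heuristics can only cost plan quality, not correctness,
since each input path is already a valid bitmap scan on its own.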
I'd like to hear your thoughts on this. This is a valid query that
freezes a backend at 100% CPU with no way to interrupt it, and I think
we should handle such cases more gracefully.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company