Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2 - Mailing list pgsql-performance

From Rodrigo Rosenfeld Rosas
Subject Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Date
Msg-id 5099525B.5070909@gmail.com
Whole thread Raw
In response to Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
List pgsql-performance
Hi Merlin,

Em 06-11-2012 15:22, Merlin Moncure escreveu:
On Tue, Nov 6, 2012 at 11:11 AM, Rodrigo Rosenfeld Rosas
<rr.rosas@gmail.com> wrote:
Hello, this is my first message to this list, so sorry if this is not the
right place to discuss this or if some data is missing from this message.

I'll gladly send any data you request that would help us to understand this
issue. I don't believe I'm allowed to share the actual database dump, but
other than that I can provide much more details you might ask for.

I can't understand why PG 9.2 performs so differently from PG 9.1.

I tested these queries in my Debian unstable amd64 box after restoring the
same database dump this morning in both PG 9.1 (Debian unstable repository)
and PG9.2 (Debian experimental repository) with same settings:

https://gist.github.com/3f1f3aad3847155e1e35

Ignore all lines like the line below because it doesn't make any difference
on my tests if I just remove them or any other column from the SELECT
clause:

"  exists(select id from condition_document_excerpt where
condition_id=c1686.id) as v1686_has_reference,"

The results below are pretty much the same if you assume "SELECT 1 FROM
...".

I have proper indices created for all tables and the query  is fast in both
PG versions when I don't use many conditions in the WHERE clause.

fast.sql returns the same data as slow.sql but it returns much faster in my
tests with PG 9.1.

So here are the completion times for each query on each PG version:

Query   | PG 9.1         | PG 9.2 |
-----------------------------------
fast.sql| 650 ms (0.65s) | 690s   |
slow.sql| 419s           | 111s   |


For the curious, the results would be very similar to slow.sql if I use
inner joins with the conditions inside the WHERE moved to the "ON" clause of
the inner join instead of the left outer join + global WHERE approach. But I
don't have this option anyway because this query is generated dynamically
and not all my queries are "ALL"-like queries.

Here are the relevant indices (id is SERIAL primary key in all tables):

CREATE UNIQUE INDEX transaction_condition_transaction_id_type_id_idx ON transaction_condition USING btree (transaction_id, type_id);
CREATE INDEX index_transaction_condition_on_transaction_id ON transaction_condition USING btree (transaction_id);
CREATE INDEX index_transaction_condition_on_type_id ON transaction_condition USING btree (type_id);

CREATE INDEX acquirer_target_names ON company_transaction USING btree (acquiror_company_name COLLATE pg_catalog."default", target_company_name
COLLATE pg_catalog."default");
CREATE INDEX index_company_transaction_on_target_company_name ON company_transaction USING btree (target_company_name COLLATE pg_catalog."default");
CREATE INDEX index_company_transaction_on_date ON company_transaction USING btree (date);
CREATE INDEX index_company_transaction_on_edit_status ON company_transaction USING btree (edit_status COLLATE pg_catalog."default");

CREATE UNIQUE INDEX index_condition_boolean_value_on_condition_id ON condition_boolean_value USING btree (condition_id);
CREATE INDEX index_condition_boolean_value_on_value_and_condition_id ON condition_boolean_value USING btree (value COLLATE pg_catalog."default", condition_id);

CREATE UNIQUE INDEX index_condition_option_value_on_condition_id ON condition_option_value USING btree (condition_id);
CREATE INDEX index_condition_option_value_on_value_id_and_condition_id ON condition_option_value USING btree (value_id, condition_id);


CREATE INDEX index_condition_option_label_on_type_id_and_position ON condition_option_label USING btree (type_id, "position");
CREATE INDEX index_condition_option_label_on_type_id_and_value ON condition_option_label USING btree (type_id, value COLLATE pg_catalog."default");


CREATE UNIQUE INDEX index_condition_string_value_on_condition_id ON condition_string_value USING btree (condition_id);
CREATE INDEX index_condition_string_value_on_value_and_condition_id ON condition_string_value USING btree (value COLLATE pg_catalog."default", condition_id);


Please let me know of any suggestions on how to try to get similar results
in PG 9.2 as well as to understand why fast.sql performs so much better than
slow.sql on PG 9.1.

need explain analyze for 9.1 vs 9.2.  use this site:
http://explain.depesz.com/ to post info.

http://explain.depesz.com/s/ToX (fast on 9.1)
http://explain.depesz.com/s/65t (fast on 9.2)
http://explain.depesz.com/s/gZm (slow on 9.1)
http://explain.depesz.com/s/END (slow on 9.2 - funny that the generated URL was END while this was my last explain :D )

looking at your query -- it's a fair possibility that the root cause
of your issue is your database schema and organization.  It's hard to
tell for sure, but it looks like you might have dived head first into
the EAV anti-pattern -- deconstructing your data to such a degree that
accurate statistics and query plans are difficult or impossible.  I
mean this in the most constructive way possible naturally.  If that is
indeed the case a good plan is going to be sheer luck as the database
is essentially guessing.

Let me explain how the application works, how the database was designed and hopefully you'll be able to guide me in the correct way to design the database for this use case.

Our application will present a big contract to some attorneys. There is currently a dynamic template with around 800 fields to be extracted from each contract in our system. These fields can be of different types (boolean, string, number, currency, percents, fixed options, dates, time-spans and so on). There is a fields tree that is maintained by the application editors. The application will allow the attorneys to read the contracts and highlight parts of the contract where they extracted each field from and associate each field with its value interpreted by the attorney and store the reference to what paragraphs in the contract demonstrate where the value came from.

Then there is an interface that will allow clients to search for transactions based on its associated contracts and those ~800 fields. For the particular query above, 14 of the 800 fields have been searched by this particular user (most of them were boolean ones plus a few options and a string field). Usually the queries perform much better when less than 10 fields are used in the criteria. But our client wants us to handle up to 20 fields in a single query or they won't close the deal and this is a really important client to us.

So, for the time being my only plan is to rollback to PG 9.1 and replace my query builder that currently generate queries like slow.sql and change it to generate the queries like fast.sql but I'm pretty sure this approach should be avoided. I just don't know any other alternative for the time being.

What database design would you recommend me for this use case?

Or, how would you recommend me to perform the queries? Keep in mind that a user could create a filter like "(f10.value = 'N' OR f11.value = 'N') AND f13.value=50".

Problem could also be no statistics (run ANALYZE to test) or some
other configuration problem (like index locale), or a bona fide
regression.

I know barely anything about performance tuning in PostgreSQL (I tried many tutorials but I have a hard time trying to understand EXPLAIN queries for queries like above). Would you mind in explaining me how to improve statistics or what is this index locale thing you talked about? Also, I have never heard about bona fide regression before. I'm looking for it on Google right now.

Thank you very much for your response!

Cheers,
Rodrigo.

pgsql-performance by date:

Previous
From: "ktm@rice.edu"
Date:
Subject: Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Next
From: Rodrigo Rosenfeld Rosas
Date:
Subject: Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2