Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2 - Mailing list pgsql-performance

From Rodrigo Rosenfeld Rosas
Subject Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Date
Msg-id 50995D7C.9020108@gmail.com
In response to Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
List pgsql-performance
On 06-11-2012 16:42, Merlin Moncure wrote:
> On Tue, Nov 6, 2012 at 12:09 PM, Rodrigo Rosenfeld Rosas
> <rr.rosas@gmail.com> wrote:
>> http://explain.depesz.com/s/ToX (fast on 9.1)
>> http://explain.depesz.com/s/65t (fast on 9.2)
>> http://explain.depesz.com/s/gZm (slow on 9.1)
>> http://explain.depesz.com/s/END (slow on 9.2 - funny that the generated URL
>> was END while this was my last explain :D )
> Hm -- looking at your 'slow' 9.2 query, it is reporting that the query
> took 3 seconds (reported times are in milliseconds). How are you
> timing the data? What happens when you run explain analyze
> <your_query> from psql (as in, how long does it take)?

The times I reported in the tables of my first message were the times reported by pgAdmin3 (compiled from source).

But I get a similar time when I run it like this:

time psql -p 5432 -f slow.sql db_name > slow-9.2-again.explain

real    1m56.353s
user    0m0.068s
sys     0m0.020s

slow-9.2-again.explain: http://explain.depesz.com/s/zF1
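
In case it helps, roughly the same measurement can be taken from inside psql with \timing (slow.sql already wraps the query in explain analyze, which is where the depesz output above comes from):

\timing on
\i slow.sql

psql then prints a "Time:" line for the whole statement, which should be close to the wall-clock time reported by the time command above.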

Let me explain how the application works and how the database was designed, and
hopefully you'll be able to guide me toward the right database design for this
use case.

Our application presents a big contract to some attorneys. There is
currently a dynamic template with around 800 fields to be extracted from
each contract in our system. These fields can be of different types
(boolean, string, number, currency, percentage, fixed options, dates,
time spans and so on), and there is a fields tree that is maintained by the
application editors. The application lets the attorneys read the contracts,
highlight the parts of the contract they extracted each field from,
associate each field with the value as interpreted by the attorney, and
store a reference to the paragraphs in the contract that show where the
value came from.

Then there is an interface that allows clients to search for transactions
based on their associated contracts and those ~800 fields. For the
particular query above, 14 of the 800 fields were searched by this
particular user (most of them boolean, plus a few options and one string
field). The queries usually perform much better when fewer than 10 fields
are used in the criteria, but our client wants us to handle up to 20 fields
in a single query or they won't close the deal, and this is a really
important client for us.

So, for the time being, my only plan is to roll back to PG 9.1 and change my
query builder, which currently generates queries like slow.sql, so that it
generates queries like fast.sql instead, but I'm pretty sure this approach
should be avoided. I just don't know any other alternative for the time being.

What database design would you recommend me for this use case?
> I would strongly consider investigation of hstore type along with
> gist/gin index.
>
> select * from company_transaction where contract_attributes @>
> 'State=>Delaware, Paid=>Y';
>
> etc

I'm not very familiar with hstore yet. It was one of the reasons I wanted to migrate to PG 9.2, but I won't be able to migrate the application quickly to use hstore.

Also, I'm not sure if hstore allows us to be as flexible as we currently are (c1 and (c2 or c3 and not (c4 and c5)), where each c is a condition on some field).
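
If I understand the hstore docs correctly, that kind of nesting would have to be expressed as separate containment tests combined in the WHERE clause, something along these lines (the table and attribute names here are just made up to illustrate the shape):

select t.id
  from company_transaction t
 where t.contract_attributes @> 'state=>Delaware'
   and (t.contract_attributes @> 'paid=>Y'
        or (t.contract_attributes @> 'assignable=>Y'
            and not t.contract_attributes @> 'terminated=>Y'));

I have no idea yet how well a single gist/gin index on contract_attributes would handle 14-20 of these tests in one query, though.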

> Barring that, I would then consider complete elimination of integer
> proxies for your variables. They make your query virtually impossible
> to read/write, and they don't help.

I'm not sure I understood what you're talking about. The template is dynamic and contains lots of information for each field, like its type (number, percent, string, date, etc.), parent_id (self-referencing), aggregator_id (also self-referencing) and several other columns. The values then associate the field id (type_id) and the transaction id in a unique way (see the unique index in my first message of the thread). I need different tables to store the actual values because we're using SQL instead of MongoDB or something else; the table that stores the value depends on the field type.
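
To make the current layout more concrete, this is roughly the shape of it (names abbreviated and most columns omitted, so this is just an illustration, not the real DDL):

create table field (
    id            serial primary key,
    label         text,
    field_type    text,    -- boolean, string, number, currency, percent, option, date, ...
    parent_id     integer references field(id),       -- fields tree
    aggregator_id integer references field(id)
    -- several other columns
);

-- one value table per field type; the boolean one, for example:
create table transaction_boolean_value (
    transaction_id integer not null,
    type_id        integer not null references field(id),
    value          boolean,
    unique (transaction_id, type_id)
);
-- similar tables hold the string, numeric, date and option values

The search query then has to touch one of these value tables for each field used in the criteria.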

Maybe it would help me understand if you could provide some example of the design you're proposing.

Thank you very much,
Rodrigo.
