On Tue, Nov 6, 2012 at 12:09 PM, Rodrigo Rosenfeld Rosas
<rr.rosas@gmail.com> wrote:
> http://explain.depesz.com/s/ToX (fast on 9.1)
> http://explain.depesz.com/s/65t (fast on 9.2)
> http://explain.depesz.com/s/gZm (slow on 9.1)
> http://explain.depesz.com/s/END (slow on 9.2 - funny that the generated URL
> was END while this was my last explain :D )
Hm -- looking at your 'slow' 9.2 query, it is reporting that the query
took 3 seconds (reported times are in milliseconds). How are you
timing the data? What happens when you run explain analyze
<your_query> from psql (as in, how long does it take)?
> Let me explain how the application works, how the database was designed and
> hopefully you'll be able to guide me in the correct way to design the
> database for this use case.
>
> Our application will present a big contract to some attorneys. There is
> currently a dynamic template with around 800 fields to be extracted from
> each contract in our system. These fields can be of different types
> (boolean, string, number, currency, percents, fixed options, dates,
> time-spans and so on). There is a fields tree that is maintained by the
> application editors. The application will allow the attorneys to read the
> contracts and highlight parts of the contract where they extracted each
> field from and associate each field with its value interpreted by the
> attorney and store the reference to what paragraphs in the contract
> demonstrate where the value came from.
>
> Then there is an interface that will allow clients to search for
> transactions based on its associated contracts and those ~800 fields. For
> the particular query above, 14 of the 800 fields have been searched by this
> particular user (most of them were boolean ones plus a few options and a
> string field). Usually the queries perform much better when less than 10
> fields are used in the criteria. But our client wants us to handle up to 20
> fields in a single query or they won't close the deal and this is a really
> important client to us.
>
> So, for the time being my only plan is to rollback to PG 9.1 and replace my
> query builder that currently generate queries like slow.sql and change it to
> generate the queries like fast.sql but I'm pretty sure this approach should
> be avoided. I just don't know any other alternative for the time being.
>
> What database design would you recommend me for this use case?
I would strongly consider investigation of hstore type along with
gist/gin index.
select * from company_transaction where contract_attributes @>
'State=>Delaware, Paid=Y';
etc
Barring that, I would then consider complete elimination of integer
proxies for your variables. They make your query virtually impossible
to read/write, and they don't help.
merlin