Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2 - Mailing list pgsql-performance

From Rodrigo Rosenfeld Rosas
Subject Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Msg-id 509BD01F.6030602@gmail.com
In response to Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-performance
On 07-11-2012 22:58, Tom Lane wrote:
> Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> writes:
>> Ok, I could finally strip part of my database schema that will allow you
>> to run the explain query and reproduce the issue.
>> There is a simple SQL dump in plain format that you can restore both on
>> 9.1 and 9.2 and an example EXPLAIN query so that you can see the
>> difference between both versions.
>> Please keep me up to date with regards to any progress. Let me know if
>> the commit above fixed this issue.
> AFAICT, HEAD and 9.2 branch tip plan this query a bit faster than 9.1
> does.

Great! What is the estimate for the 9.2.2 release?

> It does appear that the problem is the same one fixed in that
> recent commit: the problem is you've got N join clauses all involving
> t.id and so there are lots of redundant ways to use the index on t.id.
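
If I follow, the query shape you're describing is roughly the one below. This is only a hypothetical sketch: the real fast.sql/slow.sql are much larger, and condition_string_value/condition_date_value here are made-up names following the same pattern as condition_boolean_value.

-- Several join clauses all involve t.id, so the planner sees many
-- redundant ways to use the index on transaction_condition.id.
SELECT t.id,
       b.value AS boolean_value,
       s.value AS string_value,
       d.value AS date_value
  FROM transaction_condition t
  LEFT JOIN condition_boolean_value b ON b.condition_id = t.id
  LEFT JOIN condition_string_value  s ON s.condition_id = t.id
  LEFT JOIN condition_date_value    d ON d.condition_id = t.id
 WHERE t.transaction_id = 42;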

And why does fast.sql perform so much better than slow.sql? Is it possible to optimize the planner so that both fast.sql and slow.sql finish in about the same time?

> I've got to say though that this is one of the most bizarre database
> schemas I've ever seen.

Merlin seems to share your opinion on that. I'd love to try a different database design when I have a chance.

What would you guys suggest for handling my application requirements?

The only reason it is bizarre is that I have no idea how to simplify our database design much while sticking to a relational database. And pstore doesn't sound like a reasonable option for our requirements either.

The only other option I can think of is to stop splitting transaction_condition into many tables (one for each data type). Then I'd need to include all possible value columns in transaction_condition itself, and I'm not sure whether that would perform better or what the implications would be for the database size, since most of the columns would be NULL in each record.

This also introduces another issue: I would need a trigger to check on insertion that the record is valid, for instance to avoid creating records with all value columns set to NULL. Currently each separate table that stores the values has NOT NULL constraints, among others, to prevent this kind of problem. Triggers are more complicated to maintain, especially because we're used to working through an ORM (except for this particular case, where I generate the SQL query manually instead of using the ORM).
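
Something along these lines is what I have in mind. It's only a rough sketch: the value columns and their types are invented for illustration, and the real validity rules might still need a trigger if they depend on type_id.

CREATE TABLE transaction_condition (
    id             serial PRIMARY KEY,
    transaction_id integer NOT NULL,
    type_id        integer NOT NULL,
    boolean_value  boolean,
    string_value   text,
    numeric_value  numeric,
    date_value     date,
    -- simplest "not all NULL" rule as a CHECK instead of a trigger
    CHECK (boolean_value IS NOT NULL
        OR string_value  IS NOT NULL
        OR numeric_value IS NOT NULL
        OR date_value    IS NOT NULL)
);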

Also, we migrate the database using standalone_migrations:

https://github.com/thuss/standalone-migrations

If we change a single line in the trigger code, it won't be easy to see which line changed in the commit that introduces the change, because we'd have to create a separate migration that recreates the trigger with all of its code repeated.

> It seems to be sort of an unholy combination of
> EAV and a star schema.  A star schema might not actually be a bad model
> for what you're trying to do, but what you want for that is one big fact
> table and a collection of *small* detail tables you join to it (small
> meaning just one entry per possible value).  The way this is set up, you
> need to join two or three tables before you can even join to the main
> fact table - and those tables don't even have the virtue of being small.
> That's never going to perform well.

If I understand correctly, you're suggesting that I drop transaction_condition(id, transaction_id, type_id) and replace condition_boolean_value(id, condition_id, value) with condition_boolean_value(id, transaction_id, type_id, value), repeating the same idea for the other value tables.

Is that right? Would that perform much better? If you think so, I could try this approach when I find some time. But I'd also need to denormalize other related tables I didn't send in the schema dump. For instance, the document snippets also have a condition_id column: each field value (transaction_condition) can have multiple contract snippets in a table called condition_document_excerpt(id, document_id, condition_id, "position"). I'd need to remove condition_id from it and append transaction_id and type_id, just like the value tables. No big deal if this speeds up our queries.
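
In table form, the reshaped tables would look roughly like this (the data types are guesses on my part):

CREATE TABLE condition_boolean_value (
    id             serial PRIMARY KEY,
    transaction_id integer NOT NULL,
    type_id        integer NOT NULL,
    value          boolean NOT NULL
);

-- and the document excerpts get the same treatment: condition_id is
-- replaced by (transaction_id, type_id)
CREATE TABLE condition_document_excerpt (
    id             serial PRIMARY KEY,
    document_id    integer NOT NULL,
    transaction_id integer NOT NULL,
    type_id        integer NOT NULL,
    "position"     integer
);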

Am I missing something?
