Re: Database slowness -- my design, hardware, or both? - Mailing list pgsql-general

From Reuven M. Lerner
Subject Re: Database slowness -- my design, hardware, or both?
Date
Msg-id 45F02F33.2060904@lerner.co.il
Whole thread Raw
In response to Re: Database slowness -- my design, hardware, or both?  ("Webb Sprague" <webb.sprague@gmail.com>)
Responses Re: Database slowness -- my design, hardware, or both?  (Richard Broersma Jr <rabroersma@yahoo.com>)
List pgsql-general
Hi, Webb Sprague.  You wrote:

Do you have new \timings?
Yup.  It just finished executing a little while ago.  With the explicitly interpolated array in place, I got the following:
LOG:  statement: UPDATE Transactions
              SET previous_value = previous_value(id)
            WHERE new_value IS NOT NULL
              AND new_value <> ''
              AND node_id IN (351, 169, 664, 240)
LOG:  duration: 16842710.469 ms

The previous version, which included lots of calls to RAISE NOTICE and also used a subselect, had the following timing:
LOG:  statement: UPDATE Transactions
              SET previous_value = previous_value(id)
            WHERE new_value IS NOT NULL
              AND new_value <> ''
              AND node_id IN (SELECT node_id FROM NodeSegments)
LOG:  duration: 16687993.067 ms
(I keep timing information in the logfile, rather than using \timing.)

So it looks like this didn't make much of a timing difference at all.  And what little difference there was, was negative.  Bleah.
What you or I think is a minor change isn't necessarily what the
planner thinks is a minor change, especially when you change data from
something that requires a query to something that is determinate.  I
would suggest changing your function to remove as many such queries as
possible too (I am thinking of the order by limit 1).  This would be a
good move also in that you narrow down the amount of moving parts to
diagnose and it just makes the whole thing cleaner.
Good idea.  I'll see if I can get the function to be a bit cleaner, although I'm not sure if it is, given the constraints of the problem.  That's one of the reasons why I've been adding these "hints" to the database -- so that I can have many small queries, rather than one large one.
I would also try amortizing the analysis with triggers, rather than
building the table all at once; this may be better or worse, depending
on the on-line character of the application (e.g., if they are waiting
at an ATM, in might be a deal breaker to add two seconds to do an
insert / update, but not if you are tailing off a log file that gets
updated every minute or so.)
The data that I'm dealing with is not changing over time.  So I've been trying to build the transactions table (aka my data warehouse) slowly, adding one or two columns at a time with hints that will make extracting the data easier.  Unfortunately, building those hints has proven to be very slow going.

Reuven

pgsql-general by date:

Previous
From: "Timasmith"
Date:
Subject: which is more scalable for the database?
Next
From: Mark Stosberg
Date:
Subject: Re: RFC tool to support development / operations work with slony replicated databases