Re: Planner cost adjustments - Mailing list pgsql-general

From: Francisco Olarte
Subject: Re: Planner cost adjustments
Msg-id: CA+bJJbzMgRCKcy7ZPpEYUv2ng6=a83GncirQ8gy=3_BrvgsZgw@mail.gmail.com
In response to: Re: Planner cost adjustments (Daniel Begin <jfd553@hotmail.com>)
List: pgsql-general

Hi Daniel:

On Thu, Jun 11, 2015 at 2:38 PM, Daniel Begin <jfd553@hotmail.com> wrote:
.....
> The remaining problem seems related to the statistics of some large tables.
> On one hand, I might increase the statistics target for these tables to 500,
> or even to 1000, and look at the results (but I have doubts it will help). On
> the other hand, I could simply force enable_seqscan to OFF for queries
> dealing with them.
>
> Still not sure about the best solution but the problem is definitely
> narrower :-)
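
Just so we're looking at the same knobs, the per-column statistics
target and the per-session planner setting you mention look like this
(a sketch; the table and column names are placeholders, not from your
schema):

    -- Raise the sampling target for the column whose selectivity the
    -- planner misestimates, then re-gather statistics.
    ALTER TABLE changesets ALTER COLUMN user_id SET STATISTICS 500;
    ANALYZE changesets;

    -- Or, for a single session, steer the planner away from
    -- sequential scans around the problem queries.
    SET enable_seqscan = off;
    -- run the affected query here
    RESET enable_seqscan;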

One somewhat unrelated point. IIRC your problems were related to
queries joining a selected set of indexed IDs on a smaller table
which then have to be looked up in some very large tables (I can't
follow too well which is which, users vs. changesets, I'm a bit
lost). Given that your runtimes are always high (in the seconds
range, so wire speed / latencies do not seem to be much of an issue)
and that selectivity estimates on huge tables are always problematic
and may be thwarting your plans, you may be able to get faster
results by splitting your query.

If I read your plans correctly, that would mean selecting your 600
users in one query, then preparing the changeset query for a single
user_id, which should be indexed, and looping it for every user.
Given that current machines can easily send and receive 600 queries
per second, this may lead to a simpler solution. It means you're
using the DB as a somewhat intelligent plain old indexed file, but
sometimes that is the simpler approach (heck, some of my code still
uses algorithms from the tape era because they were the fastest way
I could do it).
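
In SQL terms the split looks roughly like this (a sketch with made-up
table and column names; the loop itself lives in your application):

    -- Step 1: one query for the small set of users.
    SELECT id FROM users_of_interest;

    -- Step 2: prepare the per-user lookup once, then EXECUTE it from
    -- the application loop, one id at a time; each run should be a
    -- plain index scan on the big table.
    PREPARE changesets_by_user(bigint) AS
        SELECT id, created_at FROM changesets WHERE user_id = $1;
    EXECUTE changesets_by_user(42);  -- repeated for each user id
    DEALLOCATE changesets_by_user;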

I needed to do this in one of my programs: the optimizer kept
selecting bad plans, so I ran the first query, held the results in
memory, and then prepared and executed the second in a loop from the
app. My query was selecting a couple thousand values from a
sub-million-row table and joining them with a multimillion-row one,
getting a couple hundred matches per original value. Splitting it
made the queries on the big table always run indexed and fast (and,
as a side bonus, avoided duplicating parts of the first record on
the wire a hundred times over, which was nice since the short table
was wide and I only needed three short fields from the second one;
that made the first query run at wire speed and the second at disk
speed).

Francisco Olarte.

