Re: Planner cost adjustments - Mailing list pgsql-general
| From | Daniel Begin |
|---|---|
| Subject | Re: Planner cost adjustments |
| Date | |
| Msg-id | COL129-DS2787944731718F72D351E394A40@phx.gbl |
| In response to | Planner cost adjustments (Daniel Begin <jfd553@hotmail.com>) |
| List | pgsql-general |
Here is a last follow-up on adjusting the planner cost calculations.

1 - Francisco, you provided me with the clue I was looking for at this point! I learned enough PL/pgSQL over the last week to reprogram all my processes (not always elegant, but it works). By processing individual records in a loop, I can move forward with my work and save individual results as they are processed. Furthermore, it is now easier to predict job completion times, since the DB uses similar plans for each record. I ran a test on a stratified sample of records and I expect the jobs will run for about two weeks (as opposed to months in previous estimates!-)

2 - PL/pgSQL helped me to bypass the problem. As PT suggested, I will try to set up a test case that demonstrates the problem and post it to the developers' list so they might figure out what to do in such a situation.

Thanks to all,
Daniel

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Francisco Olarte
Sent: June-11-15 10:05
To: Daniel Begin
Cc: Bill Moran; pgsql-general@postgresql.org; Tomas Vondra; Melvin Davidson
Subject: Re: [GENERAL] Planner cost adjustments

Hi Daniel:

On Thu, Jun 11, 2015 at 2:38 PM, Daniel Begin <jfd553@hotmail.com> wrote:
.....
> The remaining problem seems related to the statistics of some large tables.
> On one hand, I might increase the statistics target for these tables to
> 500, or even to 1000, and look at the results (but I have doubts it
> will help). On the other hand, I could simply force enable_seqscan to
> OFF for queries dealing with them.
>
> Still not sure about the best solution, but the problem is definitely
> narrower :-)

One somewhat unrelated point. IIRC your problems were related to queries joining a selected set of indexed IDs on a smaller table against some very large tables (I'm not able to follow too well which is which, users vs. changesets; I'm a bit lost). Given that your runtimes are always high (in the seconds range, so wire speed / latencies are not too much of an issue) and that selectivity estimates on huge tables are always problematic and may be thwarting your plans, you may be able to get faster results by splitting your query.

If I read your plans correctly, that would mean selecting your 600 users in one query, then preparing the changeset query for a single user_id, which should be indexed, and looping it for every user. Given that current machines can easily send/receive 600 queries in a second, it may lead to a simpler solution. This means you're using the DB as a somewhat intelligent plain old indexed file, but sometimes that is the simpler approach (heck, some of my code uses algorithms from the tape era, as they were the fastest way I could do it).

I needed to do this in one of my programs: the optimizer kept selecting bad plans, so I ran the first query, held the results in memory, and then prepared and executed the second one in a loop from the app. My query was selecting a couple thousand values from a sub-million-row table and joining with a multimillion-row one, getting a couple hundred matches per original value. Splitting it made the queries on the big table always run indexed and fast (and, as a side bonus, avoided duplicating parts of the first record on the wire a hundred times, which was nice since the short table was wide and I only needed 3 short fields from the second one; that made the first query run at wire speed and the second at disk speed).

Francisco Olarte.
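[Editor's note] A minimal sketch of the per-record loop both messages describe, in PL/pgSQL. The schema here (users_of_interest, changesets, results) and the function name are assumptions for illustration, not Daniel's actual tables; the point is only the pattern of fetching the small ID set first, then running one small, indexed query per ID and saving each result as it is produced.

```sql
-- Hypothetical schema: users_of_interest(user_id), changesets(user_id, ...),
-- results(user_id, n_changesets).
CREATE OR REPLACE FUNCTION process_users()
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    u record;
BEGIN
    FOR u IN SELECT user_id FROM users_of_interest LOOP
        -- Each iteration hits the big table through its user_id index,
        -- so the planner sees a trivially selective, single-key query
        -- and the per-record plan stays the same every time.
        INSERT INTO results (user_id, n_changesets)
        SELECT u.user_id, count(*)
        FROM changesets c
        WHERE c.user_id = u.user_id;
    END LOOP;
END;
$$;

-- SELECT process_users();
```

The same pattern can be driven from the client side, as Francisco describes: run the first query for the ~600 users, prepare the per-user changeset query once, and execute it in a loop from the application, keeping each query small and indexed.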
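[Editor's note] For the two planner-tuning options Daniel quotes in his earlier message, the knobs map to standard PostgreSQL commands; the table and column names below are placeholders:

```sql
-- Raise the per-column statistics target on the large table, then re-analyze.
ALTER TABLE changesets ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE changesets;

-- Or discourage sequential scans for a specific session/query only,
-- rather than globally:
SET enable_seqscan = off;
-- ... run the problematic query ...
RESET enable_seqscan;
```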