Re: Predicting query runtime - Mailing list pgsql-general

From Vinicius Segalin
Subject Re: Predicting query runtime
Msg-id CAAeH1nBx9DB4K292aONbDSLjA1tUufv+xh36Ce9YEVz8WBhzKw@mail.gmail.com
In response to Re: Predicting query runtime  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
2016-09-12 15:16 GMT-03:00 Merlin Moncure <mmoncure@gmail.com>:
On Mon, Sep 12, 2016 at 9:03 AM, Vinicius Segalin <vinisegalin@gmail.com> wrote:
> Hi everyone,
>
> I'm trying to find a way to predict query runtime (I don't need to be
> extremely precise). I've been reading some papers about it, and people are
> using machine learning to do so. For the feature vector, they use what the
> DBMS's query planner provides, such as operators and their cost. The thing is
> that I haven't found any work using PostgreSQL, so I'm struggling to adapt
> it.
> My question is whether anyone is aware of any work that uses machine learning and
> PostgreSQL to predict query runtime, or maybe some other method to perform
> this.

Well, postgres estimates the query runtime in the form of an expected
'cost', where the cost is an arbitrary measure based on the time
complexity of the query plan. It shouldn't be too difficult to correlate
estimated cost to actual runtime.

That's what I thought too. At least it makes sense, I guess. But sometimes logic doesn't hold up in practice, so I think only giving it a try will tell.
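
A minimal sketch of what such a try could look like, assuming each query has been run with EXPLAIN (ANALYZE, FORMAT JSON) and its output saved as text. The helper names cost_and_runtime and fit_cost_to_runtime are hypothetical, and the "Total Cost" and "Execution Time" keys are assumed to match the JSON produced by the PostgreSQL version in use. It computes a Pearson correlation and a least-squares line, nothing more sophisticated:

```python
import json
import math


def cost_and_runtime(explain_json):
    """Extract (estimated total cost, measured runtime in ms) from the text
    output of EXPLAIN (ANALYZE, FORMAT JSON). Key names are an assumption
    about the PostgreSQL version in use."""
    top = json.loads(explain_json)[0]
    return float(top["Plan"]["Total Cost"]), float(top["Execution Time"])


def fit_cost_to_runtime(samples):
    """Given (cost, runtime_ms) pairs, return (pearson_r, slope, intercept)
    of the least-squares line runtime = slope * cost + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_c = sum(c for c, _ in samples) / n
    mean_t = sum(t for _, t in samples) / n
    sxx = sum((c - mean_c) ** 2 for c, _ in samples)
    syy = sum((t - mean_t) ** 2 for _, t in samples)
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in samples)
    if sxx == 0 or syy == 0:
        raise ValueError("costs and runtimes must both vary")
    slope = sxy / sxx
    intercept = mean_t - slope * mean_c
    r = sxy / math.sqrt(sxx * syy)
    return r, slope, intercept
```

A value of r close to 1 would suggest that the planner's cost is already a usable predictor on that workload; a low value would suggest that more features than the total cost are needed.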
 
A statistical analysis of that
correlation would be incredibly useful work although generating sample
datasets would be a major challenge.

merlin

Indeed. I'm using TPC-B along with pgbench to have some data to test with (as long as I don't have real data), but I'm having a hard time creating queries whose performance differs (very) widely, which I need in order to train my ML algorithm.
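
One possible way to get more spread, sketched here under assumptions rather than as a tested recipe: generate queries from a few templates over the standard pgbench tables and vary the fraction of pgbench_accounts they touch across several orders of magnitude. The function name generate_queries and the templates themselves are hypothetical; the table and column names are the ones pgbench creates, with 100,000 account rows per scale unit.

```python
import random

# Hypothetical templates over the standard pgbench tables; {lo} and {hi}
# bound the range of account ids each query touches.
TEMPLATES = [
    "SELECT abalance FROM pgbench_accounts WHERE aid = {lo}",
    "SELECT count(*) FROM pgbench_accounts WHERE aid BETWEEN {lo} AND {hi}",
    "SELECT bid, sum(abalance) FROM pgbench_accounts "
    "WHERE aid BETWEEN {lo} AND {hi} GROUP BY bid",
    "SELECT count(*) FROM pgbench_accounts a "
    "JOIN pgbench_branches b ON a.bid = b.bid "
    "WHERE a.aid BETWEEN {lo} AND {hi}",
    "SELECT aid FROM pgbench_accounts WHERE aid BETWEEN {lo} AND {hi} "
    "ORDER BY abalance LIMIT 10",
]


def generate_queries(n, scale=1, seed=0):
    """Return n SQL strings whose touched row ranges span from a single row
    up to the whole pgbench_accounts table."""
    rng = random.Random(seed)          # fixed seed keeps the workload repeatable
    n_accounts = 100_000 * scale       # pgbench rows per scale unit
    queries = []
    for _ in range(n):
        # Sample the touched fraction log-uniformly between 1e-5 and 1.
        fraction = 10 ** rng.uniform(-5, 0)
        width = max(1, int(n_accounts * fraction))
        lo = rng.randint(1, n_accounts - width + 1)
        hi = lo + width - 1
        queries.append(rng.choice(TEMPLATES).format(lo=lo, hi=hi))
    return queries
```

Each generated query could then be run under EXPLAIN ANALYZE to collect its plan and measured runtime as one training example. Whether this produces enough variation would still have to be checked against the actual measurements.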

 
