Re: Optimal time series sampling. - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Optimal time series sampling.
Date
Msg-id dcc563d10711091122o29956187ga5243db9394b5910@mail.gmail.com
Whole thread Raw
In response to Re: Optimal time series sampling.  (Ted Byers <r.ted.byers@rogers.com>)
Responses Re: Optimal time series sampling.  (Ted Byers <r.ted.byers@rogers.com>)
List pgsql-general
On Nov 9, 2007 11:47 AM, Ted Byers <r.ted.byers@rogers.com> wrote:
>
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Ted Byers <r.ted.byers@rogers.com> writes:
> > > OK, it is challenging to present it in plain text,
> > but
> > > here is the HTML exported by MySQL Query Browser.
> >
> > Why are you asking this list for help with a MySQL
> > performance problem?
> >
> because my question isn't really about MySQL, but
> rather about how best to construct the SQL required to
> get the job done, regardless of what database is used.
>  I have seen some claims that it is better to use
> joins instead of correlated subqueries and others that
> say the opposite.  And I do not, at this stage, know
> if there are other options in SQL that may or may not
> be better.

Which is better depends largely on how your database is built.  MySQL
still uses loops for all subselects, so with large numbers of tuples
in the subselect method, it will be slow.  But they might fix this in
a later release.

Fairly recent versions of PostgreSQL could make some bad choices when
doing joins for certain datasets that would be much faster with a
correlated subquery (specifically the old left join where
righttable.field is null trick made some pgsql versions choose an
inefficient join method)

So, the "right" way is a question of which db, and even which version
of that DB you're on.


>  At this time, the database in use is
> irrelevant (I want to stick as close to the ANSI
> standard as practicable so the rewriting required will
> be minimal should we decide to change the database
> later, for whatever reason).

If you want to stick with ANSI, MySQL tends to be more divergent from
the spec than pgsql and other rdbms.

Most people would consider the correlate subquery the better method.
But it's also likely to be the slowest on MySQL.

pgsql-general by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: Insert statements really slow
Next
From: Tomas Vondra
Date:
Subject: Re: INSERT performance deteriorates quickly during a large import