Re: WIP: multivariate statistics / proof of concept - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: WIP: multivariate statistics / proof of concept
Date
Msg-id 54679D09.3010203@fuzzy.cz
In response to Re: WIP: multivariate statistics / proof of concept  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: WIP: multivariate statistics / proof of concept  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 15.11.2014 18:49, Kevin Grittner wrote:
> If you eliminate the quals besides the zipcode column you get 61
> rows and it gets much stranger, with legal municipalities that are
> completely surrounded by Madison that the postal service would
> rather you didn't use in addressing your envelopes, but they have
> to deliver to anyway, and organizations inside Madison receiving
> enough mail to (literally) have their own zip code -- where the
> postal service allows the organization name as a deliverable
> "city".
> 
> If you want to have your own fun with this data, you can download
> it here:
> 
> http://federalgovernmentzipcodes.us/free-zipcode-database.csv
>
...
> 
> I bet there are all sorts of correlation possibilities with, for
> example, latitude and longitude and other variables.  With 81831
> rows and so many correlations among the columns, it might be a
> useful data set to test with.

Thanks for the link. I've been looking for a good dataset with this kind
of correlated data, and this one is by far the best I've found.
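
To illustrate why correlated columns such as zip code and city matter for
estimation, here is a minimal sketch. It is not code from the patch. The rows
are made-up toy values (not taken from the CSV), and the helper names
selectivity, estimate_independent and actual are hypothetical. It compares
the estimate obtained by multiplying per-column selectivities, as if the
columns were independent, with the actual fraction of matching rows.

```python
# Toy (zip_code, city) rows: made-up values for illustration only,
# not taken from the CSV file linked above.
rows = [
    (53703, "Madison"), (53703, "Madison"),
    (53704, "Madison"), (53704, "Madison"),
    (53202, "Milwaukee"), (53202, "Milwaukee"),
    (53203, "Milwaukee"), (54301, "Green Bay"),
]


def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


def estimate_independent(rows, zip_code, city):
    """Multiply per-column selectivities, assuming the columns are independent."""
    sel_zip = selectivity(rows, lambda r: r[0] == zip_code)
    sel_city = selectivity(rows, lambda r: r[1] == city)
    return sel_zip * sel_city


def actual(rows, zip_code, city):
    """Fraction of rows that match both conditions at once."""
    return selectivity(rows, lambda r: r[0] == zip_code and r[1] == city)


est = estimate_independent(rows, 53703, "Madison")
act = actual(rows, 53703, "Madison")
print(f"independent estimate: {est:.4f}, actual: {act:.4f}")
```

On this toy data the zip code fully determines the city, so the independence
assumption underestimates the matching fraction by a factor of two (0.125
versus 0.25).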

The current version of the patch supports only data types passed by
value (i.e. no varlena types such as text), which means it's impossible to
build multivariate stats on some of the interesting columns (state,
city, ...).

I guess it's time to start working on removing this limitation.

Tomas


