Re: Real-life range datasets - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Real-life range datasets |
Date | |
Msg-id | Pine.LNX.4.64.1112221945580.14072@sn.sai.msu.ru Whole thread Raw |
In response to | Re: Real-life range datasets (Benedikt Grundmann <bgrundmann@janestreet.com>) |
Responses |
Re: Real-life range datasets
|
List | pgsql-hackers |
Bene, we have pgfoundry project http://pgfoundry.org/projects/dbsamples/. Since your sample database is very important (for me also), I suggest to use this site. Oleg On Thu, 22 Dec 2011, Benedikt Grundmann wrote: > Hello, > > We have a table in a postgres 8.4 database that would make use of date > ranges and exclusion constraints if they were available. Sadly I cannot > give you the data as it is based on data we are paying for and as part > of the relevant licenses we are obliqued to not give the data to third > parties. > > Basically the tables keep meta data about financial instruments on > a given day. Thanks to corporate actions (companies merging, splitting > up, etc...) that meta data can be different from one day to the next. > > One way to model such a table is: > > identifier, date, payload columns... > > > unique index on (date, identifier) > > (and in fact some of the payload columns are unique per day indices > as well). > > But because there are a large number of rows per day and most don't > change this is a very wasteful representation. > > Instead we use this > > identifier, effective_from, effective_until, payload columns... > > And we have some clever plpgsql functions that merge a days snapshot > into that representation. That happens only a few times per day and > is currently quite slow mostly because a lot of time is spend in > validation triggers to check that there are no overlapping entries > for effective_from,effective_until for jane_symbol and a few other > identifiers. > > The most common operations are: > > Get all or most rows of a given day > > (select ... from instruments where :date between effective_from and effective_until) > > left join of the instruments (again in the normal case constrained to > one day but in same cases periods of a week or a few month) > > select ... from t left join instruments on > t.jane_symbol = instruments.jane_symbol > t.date between instruments.effective_from and t.effective_until > where t.date = X > and additional constraint on the number of rows from t > > With t a huge table clustered on date with roughly 500,000 to 2,000,000 > entries per day. The left join would work most of the time (my guess is > more than 90%). But there are entries in t where the jane_symbol would > not be in instruments (sadly). > > Current size (immediately after a cluster): > > table toast (all indices) total > | 1268 MB | 900 MB | 693 MB | 2861 MB > > => select min(effective_from), max(effective_from) from instruments; > min | max > ------------+------------ > 2011-05-30 | 2011-12-21 > (1 row) > > b=> select count(*) from instruments where current_date - 1 between effective_from and effective_until ; > count > -------- > 358741 > (1 row) > > I should be able to give you a table with the same characteristics as > the instruments table but bogus data by replacing all entries in the > table with random strings of the same length or something like that. > I can probably take a little bit of time during this or the next week > to generate such "fake" real world data ;-) Is there an ftp site to > upload the gzipped pg_dump file to? > > Cheers, > > Bene > > On 20/12/11 16:48, Alexander Korotkov wrote: >> Hackers, >> >> For better GiST indexing of range types it's important to have real-life >> datasets for testing on. Real-life range datasets would help to proof (or >> reject) some concepts and get more realistic benchmarks. Also, it would be >> nice to know what queries you expect to run fast on that datasets. Ideally >> it should be real-life set of queries, but it also could be your >> presentation of what are typical queries for such datasets. >> Thanks! >> >> ----- >> With best regards, >> Alexander Korotkov. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-hackers by date: