Thread: Result set granularity
Hi,

Since I went from Oracle to PostgreSQL I've been missing the "invisible"
column 'rownum'. I often used it to lower the granularity of my data.
For example, say I have a large table containing some sort of statistical
data and want to plot a graph using it. If the graph is 600 pixels wide I
might as well lower the granularity of my incoming data to 600 measure
points before plotting.

In Oracle I used to do this by using the modulus operator on the rownum
column as a restriction:

SELECT <column> FROM <table> WHERE mod(rownum, 5) = 0;

The query above would give me every fifth row of the original result set
and would save me from sending all the data over my database connection
and lowering the granularity in the application.

I have two questions, the second depending on the answer to the first.

1) Is it possible to achieve this in any other _easy_ way?
   Perhaps it would be feasible to write a wrapper using a counter which
   runs the SELECT and then returns every fifth row to the SELECT calling
   the wrapper (a sketch of such a wrapper follows below). But then I
   assume the data still has to "travel" one step, which adds some
   overhead.

2) Would it be possible to add one more argument to the non-standard set
   of LIMIT and OFFSET - a 'SCALE n' or 'GRANUL n' argument which would
   return every nth row of the initial result set? I think that would be
   gladly accepted by folks working with statistical data.

-ra
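A minimal sketch of the "wrapper with a counter" idea from point 1, written
as a PL/pgSQL set-returning function. The table name 'stats' and the
function name 'every_nth' are made up for illustration; this is only one
way such a wrapper could look, not a built-in facility:

-- returns every nth row of the hypothetical table 'stats', counting rows
-- as they come back from the inner SELECT
CREATE FUNCTION every_nth(integer) RETURNS SETOF stats AS '
DECLARE
    n   ALIAS FOR $1;
    rec stats%ROWTYPE;
    i   integer := 0;
BEGIN
    FOR rec IN SELECT * FROM stats LOOP
        i := i + 1;
        IF i % n = 0 THEN
            RETURN NEXT rec;   -- keep only every nth row
        END IF;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql;

SELECT * FROM every_nth(5);

Since the function runs on the server, only the kept rows travel over the
client connection, though the server still has to scan every row.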
On Saturday 27 September 2003 18:41, Rasmus Aveskogh wrote:
> Hi,
>
> Since I went from Oracle to PostgreSQL I've been missing the "invisible"
> column 'rownum'. I often used it to lower the granularity of my data.
> For example, say I have a large table containing some sort of statistical
> data and want to plot a graph using it.
> If the graph is 600 pixels wide I might as well lower the granularity of
> my incoming data to 600 measure points before plotting.
>
> In Oracle I used to do this by using the modulus operator on the rownum
> column as a restriction:
>
> SELECT <column> FROM <table> WHERE mod(rownum, 5) = 0;
>
> The query above would give me every fifth row of the original result set
> and would save me from sending all the data over my database connection
> and lowering the granularity in the application.
>
> I have two questions, the second depending on the answer to the first.
>
> 1) Is it possible to achieve this in any other _easy_ way?
>    Perhaps it would be feasible to write a wrapper using a counter which
>    runs the SELECT and then returns every fifth row to the SELECT calling
>    the wrapper. But then I assume the data still has to "travel" one step,
>    which adds some overhead.

First of all, there is oid, which you can use, but it is not strictly
unique: it's a 32-bit counter, so it wraps around after about 4 billion
values.

I would rather declare a cursor and move the pointer as and when required.
Since you aren't retrieving a large portion of the rows, that loop should
be cheap enough (a sketch follows below).

HTH

Shridhar
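A minimal sketch of the cursor approach Shridhar describes, using DECLARE,
FETCH and MOVE inside a transaction. The table and column names are
hypothetical, and the MOVE count controls how many rows are skipped between
the ones you keep:

BEGIN;
DECLARE stats_cur CURSOR FOR SELECT value FROM stats;
FETCH 1 FROM stats_cur;   -- take one row
MOVE 4 IN stats_cur;      -- skip the next four rows without transferring them
FETCH 1 FROM stats_cur;   -- take the next kept row, and so on
CLOSE stats_cur;
COMMIT;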
Short answer: no, there's no trick to doing this in Postgres like rownum.
You would have to either add a column with sequential or random keys and
then select on it, or else pull down all the rows and only process the ones
you want.

Rownum is handy though; I wonder how easy it would be to add it to Postgres.

"Rasmus Aveskogh" <rasmus@defero.se> writes:

> SELECT <column> FROM <table> WHERE mod(rownum, 5) = 0;

Uhm, does that actually work? I thought rownum only incremented for every
row actually returned, so that this would return one row and then stop
returning rows. I would have thought you would have to use a subquery to
get it to work, like:

SELECT * FROM (SELECT column, rownum AS n FROM table) WHERE mod(n,5)=0

> The query above would give me every fifth row of the original result set
> and would save me from sending all the data over my database connection
> and lowering the granularity in the application.

Also, in Oracle there's actually a SAMPLE keyword that you can put on a
select to tell Oracle that you only need a sample. It's way more efficient
than using rownum because it skips whole blocks. I.e., the equivalent of
the above would be 'SAMPLE 20 PERCENT' or something like that, and it would
read a whole block, then skip four whole blocks.

However, for statistical purposes both of these techniques have downsides.
Consider the case where you're looking for the standard deviation of some
timing information and there's a spike every five minutes. The correlation
between the sampling and the event could create spurious results: you could
completely fail to see the events, think they are much worse than they are,
or even conclude they're worse at some times of day when in fact they're
consistent.

Really what you want is to assign statistically random numbers, probably
floating-point numbers, to each record, then read all the records where
those numbers fall in some range (a sketch follows below). Even that might
not really be kosher for serious statistics.

--
greg
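A minimal sketch of the random-key idea Greg mentions, using PostgreSQL's
built-in random() function. The table and column names are hypothetical,
and 0.2 stands in for "roughly one row in five":

-- ad hoc: keep about 20% of the rows, chosen at random rather than by position
SELECT value FROM stats WHERE random() < 0.2;

-- or assign each record a random key once, so repeated queries sample
-- the same subset of rows
ALTER TABLE stats ADD COLUMN sample_key double precision;
UPDATE stats SET sample_key = random();
SELECT value FROM stats WHERE sample_key < 0.2;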
On Sat, 27 Sep 2003, Rasmus Aveskogh wrote:

> Hi,
>
> Since I went from Oracle to PostgreSQL I've been missing the "invisible"
> column 'rownum'. I often used it to lower the granularity of my data.
> For example, say I have a large table containing some sort of statistical
> data and want to plot a graph using it.
> If the graph is 600 pixels wide I might as well lower the granularity of
> my incoming data to 600 measure points before plotting.
>
> In Oracle I used to do this by using the modulus operator on the rownum
> column as a restriction:
>
> SELECT <column> FROM <table> WHERE mod(rownum, 5) = 0;
>
> The query above would give me every fifth row of the original result set
> and would save me from sending all the data over my database connection
> and lowering the granularity in the application.
>
> I have two questions, the second depending on the answer to the first.
>
> 1) Is it possible to achieve this in any other _easy_ way?
>    Perhaps it would be feasible to write a wrapper using a counter which
>    runs the SELECT and then returns every fifth row to the SELECT calling
>    the wrapper. But then I assume the data still has to "travel" one step,
>    which adds some overhead.

The standard "trick" in PostgreSQL is to create a temporary sequence and
select its nextval() alongside your data to get a row number:

create temp sequence aaa;
select *, nextval('aaa') from accounts;

The sequence creation and use is pretty fast, and I've used it before.

> 2) Would it be possible to add one more argument to the non-standard set
>    of LIMIT and OFFSET - a 'SCALE n' or 'GRANUL n' argument which would
>    return every nth row of the initial result set? I think that would be
>    gladly accepted by folks working with statistical data.

Using the same trick, you can get every fifth row like this:

select * from (select *, nextval('aaa') as row from accounts) as a
where a.row % 5 = 4;
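A small usage note on the temporary-sequence trick above: within the same
session the counter keeps climbing across queries, so a re-run will start
numbering where the last query left off and the modulus will pick a
different offset. Resetting the sequence with setval() first avoids that:

-- restart numbering at 1 ('false' means the next nextval() returns 1)
SELECT setval('aaa', 1, false);

SELECT *
FROM (SELECT *, nextval('aaa') AS row FROM accounts) AS a
WHERE a.row % 5 = 4;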