Thread: nested query on last n rows of huge table

nested query on last n rows of huge table

From
"jcfischer"
Date:
Hi list

I'm fairly new to Postgres so bear with me. Googling and searching the
list, I didn't find anything that resembled my problem.

I have a large table with ca. 10 million inserts per day (fairly simple
data: a timestamp, a couple of IDs and a varchar message).

I run a query every couple of minutes that looks at the new entries
since the last run and retrieves them for further processing, using a
WHERE eventtime > '2006-02-24 14:00:00' condition to limit it to the
most recent entries.

These queries take around 40-50 seconds (largely due to some LIKE
'%...%' conditions on the message column, thrown in for good measure).
Postgres performs a sequential scan over the table for those queries :-(

My idea is to limit the search to only the last n entries, because I
found that a

SELECT * FROM table ORDER BY eventtime DESC LIMIT 1000

is very fast. Because the inserts are in chronological order, I can
store the sequential id of the highest row from the last query and
subtract that from the current highest id to determine that number.
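
Roughly something like this (assuming the table has a serial id column;
<last_id> is the highest id I stored after the previous run):

-- number of rows inserted since the previous run (assumed serial "id" column)
SELECT max(id) - <last_id> AS n FROM logs;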

Is there a way to limit the expensive query to only those last 1000 (or
whatever) results?

I have tried to nest SELECTs, but my SQL-fu is too limited to get
anything through the SQL processor :-)

thanks
Jens-Christian Fischer


Re: nested query on last n rows of huge table

From
"jcfischer"
Date:
Sorry: Postgres 8.0.2 server. The EXPLAIN ANALYZE for the query looks
like this:

explain analyze
select syslog.logs.eventtime, assets.hosts.name, syslog.processes.name as process
from syslog.logs, assets.hosts, assets.ipaddrs, assets.macaddrs, syslog.processes
where msg like '%session opened for user root%'
  and syslog.logs.assets_ipaddr_id = assets.ipaddrs.id
  and assets.ipaddrs.macaddr_id = assets.macaddrs.id
  and assets.macaddrs.host_id = assets.hosts.id
  and syslog.processes.id = syslog.logs.process_id
  and eventtime > timestamp '2006-02-24 15:05:00'

Nested Loop  (cost=0.00..328832.34 rows=2 width=254) (actual time=49389.924..49494.665 rows=45 loops=1)
  ->  Nested Loop  (cost=0.00..328826.32 rows=1 width=90) (actual time=49365.709..49434.500 rows=45 loops=1)
        ->  Nested Loop  (cost=0.00..328820.30 rows=1 width=90) (actual time=49327.211..49360.043 rows=45 loops=1)
              ->  Nested Loop  (cost=0.00..328814.27 rows=1 width=90) (actual time=49327.183..49344.281 rows=45 loops=1)
                    ->  Seq Scan on logs  (cost=0.00..328809.04 rows=1 width=16) (actual time=49314.928..49331.451 rows=45 loops=1)
                          Filter: (((msg)::text ~~ '%session opened for user root%'::text) AND (eventtime > '2006-02-24 15:05:00'::timestamp without time zone))
                    ->  Index Scan using "pk_syslog.processes" on processes  (cost=0.00..5.21 rows=1 width=82) (actual time=0.278..0.280 rows=1 loops=45)
                          Index Cond: (processes.id = "outer".process_id)
              ->  Index Scan using "pk_assets.ipaddrs" on ipaddrs  (cost=0.00..6.01 rows=1 width=8) (actual time=0.344..0.346 rows=1 loops=45)
                    Index Cond: ("outer".assets_ipaddr_id = ipaddrs.id)
        ->  Index Scan using "pk_assets.macaddrs" on macaddrs  (cost=0.00..6.01 rows=1 width=8) (actual time=1.648..1.650 rows=1 loops=45)
              Index Cond: ("outer".macaddr_id = macaddrs.id)
  ->  Index Scan using "pk_assets.hosts" on hosts  (cost=0.00..6.01 rows=1 width=172) (actual time=1.330..1.331 rows=1 loops=45)
        Index Cond: ("outer".host_id = hosts.id)
Total runtime: 49494.830 ms


Re: nested query on last n rows of huge table

From
Michael Glaesemann
Date:
On Feb 24, 2006, at 23:13, jcfischer wrote:

> Is there a way to limit the expensive query to only those last 1000
> (or whatever) results?
>
> I have tried to nest SELECTs, but my SQL-fu is too limited to get
> anything through the SQL processor :-)

The basics of a subquery are:

SELECT <expensive query>
FROM (
    SELECT *
    FROM table
    ORDER BY eventtime DESC
    LIMIT 1000
    ) as most_recent_1000
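
Applied to the query from your second message, that might come out
roughly like this (untested sketch; table and column names taken from
your EXPLAIN post):

SELECT recent.eventtime, assets.hosts.name, syslog.processes.name AS process
FROM (
    SELECT *
    FROM syslog.logs
    ORDER BY eventtime DESC
    LIMIT 1000
    ) AS recent,
    assets.hosts, assets.ipaddrs, assets.macaddrs, syslog.processes
WHERE recent.msg LIKE '%session opened for user root%'
  AND recent.assets_ipaddr_id = assets.ipaddrs.id
  AND assets.ipaddrs.macaddr_id = assets.macaddrs.id
  AND assets.macaddrs.host_id = assets.hosts.id
  AND syslog.processes.id = recent.process_id

(You may want to keep the eventtime > ... condition as well, in case
more than 1000 rows have arrived since the last run.)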

Don't know enough about the other parts, but hopefully this can get
you started. :)

Michael Glaesemann
grzm myrealbox com