SQL Server Date Range Problem When Using Data From Web Server Logs?

Give the code below a trial run. The code groups and numbers the visits from each IP address. Then it looks to see how many "uristem" hits compared with the "threshold" value.

I tested the code on a table named "Foo" and you need to check your table and column names prior to running the test DECLARE @threshold INT; SET @threshold = 4; --this number should not include the initial visit DECLARE @lookbackdays int; SET @lookbackdays = 300; ;WITH postCTE as ( SELECT ipaddress, uristem, requestdatetime, RowNumber = ROW_NUMBER() OVER (ORDER BY ipaddress,requestdatetime ASC) FROM Foo --put your table name here WHERE requestdatetime > GETDATE() - @lookbackdays ) --select * from postCTE SELECT p1. Ipaddress AS ipaddress, p2. RowNumber - p1.

RowNumber +1 AS Requests, p1. Requestdatetime AS DateStart FROM postCTE p1 INNER JOIN postCTE p2 ON p1. Ipaddress = p2.

Ipaddress AND p1. Rownumber = p2. RowNumber - (@threshold ) WHERE DATEDIFF(minute,p1.

Requestdatetime,p2. Requestdatetime).

Give the code below a trial run. The code groups and numbers the visits from each IP address. Then it looks to see how many "uristem" hits compared with the "threshold" value.

I tested the code on a table named "Foo" and you need to check your table and column names prior to running the test. DECLARE @threshold INT; SET @threshold = 4; --this number should not include the initial visit DECLARE @lookbackdays int; SET @lookbackdays = 300; ;WITH postCTE as ( SELECT ipaddress, uristem, requestdatetime, RowNumber = ROW_NUMBER() OVER (ORDER BY ipaddress,requestdatetime ASC) FROM Foo --put your table name here WHERE requestdatetime > GETDATE() - @lookbackdays ) --select * from postCTE SELECT p1. Ipaddress AS ipaddress, p2.

RowNumber - p1. RowNumber +1 AS Requests, p1. Requestdatetime AS DateStart FROM postCTE p1 INNER JOIN postCTE p2 ON p1.

Ipaddress = p2. Ipaddress AND p1. Rownumber = p2.

RowNumber - (@threshold ) WHERE DATEDIFF(minute,p1. Requestdatetime,p2. Requestdatetime).

Using partition by ipaddress order by requestdatetime would allow to easier manipulate the results and analyse further – vaso Nov 19 '10 at 3:47 RC_Cleland: Your approach is interesting and I will try to use it as a starting point for now, but there are a couple of errors with the logic: It always groups in bundles of five requests... In a much expanded data source, I am unable to know ahead of time how many hits from each IP I will have. An IP can have 5 hits or 500 hits in a single visit. – Jonas Gorauskas Nov 20 '10 at 4:19 Vaso: Can you provide a code sample?

– Jonas Gorauskas Nov 20 '10 at 4:20 I used a threshold variable, the number of additional visits, because you had mentioned the number 5 in your question. One could change the threshold number or even replace it with a parameter. – RC_Cleland Nov 21 '10 at 22:23.

I think the best way to do this is to summarize your data first, then generate your report. Here's how I'd do it. Create a SummaryTable with the FACTS you want (e.g. UserIP, SessionStart, SessionEnd, PageViews) Figure out what you consider a new visit (e.g. I think IIS default session timeout is 20 minutes, so any consecutive hit by an IP after 20 minutes I'll consider a new visit.

) Create a cursor to calculate the summarized data based on your rule. -- Summary Data DECLARE @UserIP AS VARCHAR(15) DECLARE @SessionStart AS DateTime DECLARE @SessionEnd AS DateTime DECLARE @PageViews AS INT -- Current Values DECLARE @ThisUserIP AS VARCHAR(15) DECLARE @ThisVisitTime AS DateTime DECLARE @ThisPage AS VARCHAR(100) -- Declare Cusrsor DECLARE StatCursor CURSOR FAST_FORWARD FOR -- Query, make sure you sort by IP/Date so their data is in cronological order SELECT IPAddress, RequestDateTime, UriStem FROM Stats ORDER BY IPAddress, RequestDateTime OPEN StatCursor FETCH NEXT FROM StatCursor INTO @ThisUserIP, @ThisVisitTime, @ThisPage -- Start New Summary SELECT @UserIP = @ThisUserIP, @SessionStart = @ThisVisitTime, @SessionEnd = @ThisVisitTime, @PageViews = 1 FETCH NEXT FROM StatCursor INTO @ThisUserIP, @ThisVisitTime, @ThisPage WHILE @@FETCH_STATUS = 0 BEGIN -- Check rule IF @UserIP = @ThisUserIP AND @ThisVisitTime.

Zachary: I actually tried the approach with the cursor and it worked really well for really small data sets (in the thousands of rows) but the actual data set that I need to work with is in the billions of rows and this cursor approach is actually processing rows slower than my web servers are creating log entries. – Jonas Gorauskas Nov 20 '10 at 4:24 Wow, that's some serious data... I use AWStats for most of my logging and what it does is generate the summaries for the current day on a daily/hourly basis... so your never waiting to do it all at one time. Once the raw data is processed, you never have to process or use it again!

Do the reports have to be real-time? Can you process in batch? Also, do you have any indexes on your raw data?

– Zachary Nov 20 '10 at 5:13.

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

Related Questions