10 messages in com.mysql.lists.win32RE: Confused query question
FromSent OnAttachments
Januski, Ken18 Aug 2005 11:43 
Urcid Pliego, Paulo18 Aug 2005 12:01 
Januski, Ken18 Aug 2005 12:09 
SGr...@unimin.com18 Aug 2005 12:14 
Januski, Ken18 Aug 2005 12:29 
Januski, Ken18 Aug 2005 15:08 
Januski, Ken18 Aug 2005 15:20 
SGr...@unimin.com19 Aug 2005 05:58 
Januski, Ken19 Aug 2005 10:17 
SGr...@unimin.com19 Aug 2005 10:41 
Subject:RE: Confused query question
From:SGr...@unimin.com (SGr@unimin.com)
Date:08/19/2005 10:41:25 AM
List:com.mysql.lists.win32

No problem!

I deduced from your first post that you have parsed the Apache logs and have generated some tables of information. One of them is an errors table which uniquely identifies (with its ID field) an error. It also has on it fields to say what the error is, when it happened, and what IP address had the problem. Based on that description, I believe that you should probably wind up with more than one error from the same IP address. It can be the same error or a different error, that doesn't matter. What matters is WHEN it occurred. Based on who logged in from that IP *last* before the error happed is the user who received the error. I imagine some error table entries that could look like this (concentrating on just one IP address for now):

-errors table- (id, name, date, ip) 1 error 1 2005-08-05 16:15:00 10.1.1.1 2 error 2 2005-08-05 16:15:05 10.1.1.1 15 error 1 2005-08-05 16:15:10 10.1.1.1 45 error 2 2005-08-05 16:15:20 10.1.1.1 101 error 3 2005-08-05 16:16:08 10.1.1.1 145 error 2 2005-08-05 16:16:00 10.1.1.1 178 error 2 2005-08-05 16:31:20 10.1.1.1 182 error 2 2005-08-05 16:56:00 10.1.1.1 183 error 2 2005-08-05 17:05:05 10.1.1.1 204 error 2 2005-08-05 17:15:48 10.1.1.1

Now, in order to know who was getting each error, we need to correlate the datetime the error occurs with the most recent datetime from the logins table (still focusing on a single IP)

-logins- (ip, logindate, user) 10.1.1.1 2005-08-05 16:00:00 user_blue 10.1.1.1 2005-08-05 16:15:30 user_red 10.1.1.1 2005-08-05 16:31:00 user_blue 10.1.1.1 2005-08-05 17:04:50 user_red

If we run my original query on this data, we should end up with records in tmpLastLogin that look like this

-tmpLastLogin- (error_id, logindate) 1 2005-08-05 16:00:00 2 2005-08-05 16:00:00 15 2005-08-05 16:00:00 45 2005-08-05 16:00:00 101 2005-08-05 16:00:30 145 2005-08-05 16:00:30 178 2005-08-05 16:31:00 182 2005-08-05 16:31:00 183 2005-08-05 17:04:50 204 2005-08-05 17:04:50

That tells us that when (for each error) the user last logged in. Since we know the IP address from the error itself and we now know the logintime we need to match against, we can use those two pieces of information to look into the logins table to uniquely identify a user. (gurus --- please hold off, I see what original design flaw I created. I will correct it later after we get past this issue).

Does that help you to see why I stuck with the error.id and not the IP address? You wanted to match errors to users and different users can share the same IP address so sticking with IP address would have just muddled the entire picture. We use the IP address in the FINAL query along with the most recent login time to pick out a particular user from the logins table. It doesn't belong in tmpLastLogin.

Have I made sense or just confused you more?

Shawn Green Database Administrator Unimin Corporation - Spruce Pine

"Januski, Ken" <kjan@phillynews.com> wrote on 08/19/2005 01:18:15 PM:

Hi Shawn,

I've gone over this again and I can't figure out why the temporary table cares about the errors id. It seems to me that it ought to care about the errors ip. Because of this and the fact that I was also getting unexpected results I thought that you must really have meant errors ip in the temp table.

Could you clarify for me why you use errors id?

Thanks again,

-----Original Message----- From: SGr@unimin.com [mailto:SGr@unimin.com] Sent: Friday, August 19, 2005 8:59 AM To: Januski, Ken Cc: win@lists.mysql.com Subject: RE: Confused query question

I am not sure that changing anything else to link by IP is the correct way to go. You can potentially have several errors from the same IP. Each error from an IP address can be from different users who logged in at different times. That is why I tried to isolate the LAST logintime for each error (matching the IP of the error to the IP of the login). This should have been the data stored in the temporary table along with the ID of the error in question. Please check that we are getting the correct data into tmpLastLogin.

The final query was built to match the errors table with the logins table (matching login to error by IP) while limiting the logins to only those logins whose logindate matched that in tmpLastLogin for the error we were looking up.

Please keep me informed to your results.

Shawn Green Database Administrator Unimin Corporation - Spruce Pine

"Januski, Ken" <kjan@phillynews.com> wrote on 08/18/2005 06:20:47 PM:

Aha! I wasn't quite sure why the sample code that you sent used error_id when it was the ips I was concerned with. Once I changed error_id to error_ip it worked just fine.

I think this has solved the problem and been a great learning

experience

to

boot.

Thanks again,

Ken

-----Original Message----- From: Januski, Ken [mailto:kjan@phillynews.com] Sent: Thursday, August 18, 2005 6:09 PM To: SGr@unimin.com Cc: win@lists.mysql.com Subject: RE: Confused query question

Shawn,

This worked, sort of. There were 117 entries in the errors table but

only

50

rows returned at the end of the query that you suggested. But this has been a great help in getting me started. I'll see if I can figure out the discrepancy and if not I'll get back to you.

Thanks again,

Ken

-----Original Message----- From: SGr@unimin.com [mailto:SGr@unimin.com] Sent: Thursday, August 18, 2005 3:15 PM To: Januski, Ken Cc: win@lists.mysql.com Subject: Re: Confused query question

"Januski, Ken" <kjan@phillynews.com> wrote on 08/18/2005 02:44:04 PM:

I think I may have gotten over my head in terms of my database

knowledge

in

trying to gather some data from apache access_logs by using myqsl.

The problem is that the log includes a username to go with an IP

when

the

user logs in to the web application. Later the log includes errors for certain IP addresses. I'd like to run a query that shows the errors

and

the

user.

To do this I've set up one table, errors, with this structure: IP Date Error ID with the ID being unique and created through autoincrement.

The other table, users, is problematic and has had varying

structures.

This

is the current one: IP Date User ID with the ID being unique and created through autoincrement.

Since the user may logon many times there could be multiple entries

for

a

certain user at a certain IP. There also could be a different user

at

that

IP. So my thought was to include the Date in the table but to

truncate

the

minute and second. So the date in the second table is something like "2005-08-18:07".

I then thought I could run a query that searched for errors.IP, errors.Date, errors.Error, users.User where errors.IP = user.IP and errors.Date like users.Date. If I skip trying to match dates then I get all of the

users

who

have used that IP at any time. So I'd thought I'd limit it by saying only those users who have the same IP but who logged it at a time that

was

LIKE

the time of the error. This would not be perfect since it would miss somebody who logged in at "2005-08-18:11:01:00" but who got an error at "2005-08-18:12:00:00". But it should get anyone who had an error

between

11

and 12 a.m. on 2005-08-18. The problem is that I don't know how to

put

the

value of user.Date in a like statement, e.g. "where errors.Date like '%users.Date%'.

I realize that for anybody who is familiar with databases that what I've done probably breaks a lot of smart database rules and best

practices.

This

is the first time I've found a need to make this sort of query. But I'd appreciate any thoughts on it. I'm hoping for some way to use the like syntax. But it may be that I just need to redesign the tables. I'd prefer not to do this since this is all being done to solve a temporary problem, find out who is getting errors at a certain web page based on time,

ip

and

query text. So it's a bit of a throwaway database and query and I

don't

want

to spend too much time on it.

Thanks for any thoughts,

I'm still using Mysql 4.0.1 by the way

OK Ken,

You have a version of the "maximum of a group" problem. You need to identify which user from a specific IP address has the last login (MAX(DATE) value).

Now, to make this work properly, you need to STOP trimming your login dates and allow it to be a full DATETIME value. Try this definition as a Logins table:

CREATE TABLE logins ( IP varchar(15) not null , logindate datetime not null , user varchar(75) not null , primary key (ip, logindate, user) );

/* Please notice that logindate is NOT a varchar. The primary key (PK) on (ip, logindate, user) is intentionally in that order. */

/* to find the most recent logins for all of your errors (this could

take

a

while): */

CREATE TEMPORARY TABLE tmpLastLogin (key (error_id)) SELECT e.id as error_id, max(l.logindate) as logindate FROM errors e INNER JOIN logins l on e.IP = l.IP and e.date > l.logindate;

/* Now we have enough information to build your desired report (which error belongs to whom): */

SELECT e.id, e.ip, e.error, l.user FROM errors e INNER JOIN tmpLastLogin tll on tll.error_id = e.id INNER JOIN logins l on l.ip = e.ip and l.logindate = tll.logindate;

/* and because people login all the time, you need to drop your temp table to get ready for the next time you need to run this query */

DROP TEMPORARY TABLE tmpLastLogin;

This can be very fast if you already know the error you want to

research

as

the way I wrote the query it will process ALL of your errors at the same time.

Does what I wrote make sense? I would be very happy to go into greater detail if you have any questions.

Shawn Green Database Administrator Unimin Corporation - Spruce Pine