10 messages in com.mysql.lists.win32RE: Confused query question| From | Sent On | Attachments |
|---|---|---|
| Januski, Ken | 18 Aug 2005 11:43 | |
| Urcid Pliego, Paulo | 18 Aug 2005 12:01 | |
| Januski, Ken | 18 Aug 2005 12:09 | |
| SGr...@unimin.com | 18 Aug 2005 12:14 | |
| Januski, Ken | 18 Aug 2005 12:29 | |
| Januski, Ken | 18 Aug 2005 15:08 | |
| Januski, Ken | 18 Aug 2005 15:20 | |
| SGr...@unimin.com | 19 Aug 2005 05:58 | |
| Januski, Ken | 19 Aug 2005 10:17 | |
| SGr...@unimin.com | 19 Aug 2005 10:41 |
| Subject: | RE: Confused query question![]() |
|---|---|
| From: | SGr...@unimin.com (SGr...@unimin.com) |
| Date: | 08/19/2005 10:41:25 AM |
| List: | com.mysql.lists.win32 |
No problem!
I deduced from your first post that you have parsed the Apache logs and have generated some tables of information. One of them is an errors table which uniquely identifies (with its ID field) an error. It also has on it fields to say what the error is, when it happened, and what IP address had the problem. Based on that description, I believe that you should probably wind up with more than one error from the same IP address. It can be the same error or a different error, that doesn't matter. What matters is WHEN it occurred. Based on who logged in from that IP *last* before the error happed is the user who received the error. I imagine some error table entries that could look like this (concentrating on just one IP address for now):
-errors table- (id, name, date, ip) 1 error 1 2005-08-05 16:15:00 10.1.1.1 2 error 2 2005-08-05 16:15:05 10.1.1.1 15 error 1 2005-08-05 16:15:10 10.1.1.1 45 error 2 2005-08-05 16:15:20 10.1.1.1 101 error 3 2005-08-05 16:16:08 10.1.1.1 145 error 2 2005-08-05 16:16:00 10.1.1.1 178 error 2 2005-08-05 16:31:20 10.1.1.1 182 error 2 2005-08-05 16:56:00 10.1.1.1 183 error 2 2005-08-05 17:05:05 10.1.1.1 204 error 2 2005-08-05 17:15:48 10.1.1.1
Now, in order to know who was getting each error, we need to correlate the datetime the error occurs with the most recent datetime from the logins table (still focusing on a single IP)
-logins- (ip, logindate, user) 10.1.1.1 2005-08-05 16:00:00 user_blue 10.1.1.1 2005-08-05 16:15:30 user_red 10.1.1.1 2005-08-05 16:31:00 user_blue 10.1.1.1 2005-08-05 17:04:50 user_red
If we run my original query on this data, we should end up with records in tmpLastLogin that look like this
-tmpLastLogin- (error_id, logindate) 1 2005-08-05 16:00:00 2 2005-08-05 16:00:00 15 2005-08-05 16:00:00 45 2005-08-05 16:00:00 101 2005-08-05 16:00:30 145 2005-08-05 16:00:30 178 2005-08-05 16:31:00 182 2005-08-05 16:31:00 183 2005-08-05 17:04:50 204 2005-08-05 17:04:50
That tells us that when (for each error) the user last logged in. Since we know the IP address from the error itself and we now know the logintime we need to match against, we can use those two pieces of information to look into the logins table to uniquely identify a user. (gurus --- please hold off, I see what original design flaw I created. I will correct it later after we get past this issue).
Does that help you to see why I stuck with the error.id and not the IP address? You wanted to match errors to users and different users can share the same IP address so sticking with IP address would have just muddled the entire picture. We use the IP address in the FINAL query along with the most recent login time to pick out a particular user from the logins table. It doesn't belong in tmpLastLogin.
Have I made sense or just confused you more?
Shawn Green Database Administrator Unimin Corporation - Spruce Pine
"Januski, Ken" <kjan...@phillynews.com> wrote on 08/19/2005 01:18:15 PM:
Hi Shawn,
I've gone over this again and I can't figure out why the temporary table cares about the errors id. It seems to me that it ought to care about the errors ip. Because of this and the fact that I was also getting unexpected results I thought that you must really have meant errors ip in the temp table.
Could you clarify for me why you use errors id?
Thanks again,
Ken
-----Original Message----- From: SGr...@unimin.com [mailto:SGr...@unimin.com] Sent: Friday, August 19, 2005 8:59 AM To: Januski, Ken Cc: win...@lists.mysql.com Subject: RE: Confused query question
I am not sure that changing anything else to link by IP is the correct way to go. You can potentially have several errors from the same IP. Each error from an IP address can be from different users who logged in at different times. That is why I tried to isolate the LAST logintime for each error (matching the IP of the error to the IP of the login). This should have been the data stored in the temporary table along with the ID of the error in question. Please check that we are getting the correct data into tmpLastLogin.
The final query was built to match the errors table with the logins table (matching login to error by IP) while limiting the logins to only those logins whose logindate matched that in tmpLastLogin for the error we were looking up.
Please keep me informed to your results.
Shawn Green Database Administrator Unimin Corporation - Spruce Pine
"Januski, Ken" <kjan...@phillynews.com> wrote on 08/18/2005 06:20:47 PM:
Aha! I wasn't quite sure why the sample code that you sent used error_id when it was the ips I was concerned with. Once I changed error_id to error_ip it worked just fine.
I think this has solved the problem and been a great learning
experience
to
boot.
Thanks again,
Ken
only
50
rows returned at the end of the query that you suggested. But this has been a great help in getting me started. I'll see if I can figure out the discrepancy and if not I'll get back to you.
Thanks again,
Ken
-----Original Message----- From: SGr...@unimin.com [mailto:SGr...@unimin.com] Sent: Thursday, August 18, 2005 3:15 PM To: Januski, Ken Cc: win...@lists.mysql.com Subject: Re: Confused query question
"Januski, Ken" <kjan...@phillynews.com> wrote on 08/18/2005 02:44:04 PM:
I think I may have gotten over my head in terms of my database
knowledge
in
trying to gather some data from apache access_logs by using myqsl.
The problem is that the log includes a username to go with an IP
when
the
user logs in to the web application. Later the log includes errors for certain IP addresses. I'd like to run a query that shows the errors
and
the
user.
To do this I've set up one table, errors, with this structure: IP Date Error ID with the ID being unique and created through autoincrement.
The other table, users, is problematic and has had varying
structures.
This
is the current one: IP Date User ID with the ID being unique and created through autoincrement.
Since the user may logon many times there could be multiple entries
for
a
certain user at a certain IP. There also could be a different user
at
that
IP. So my thought was to include the Date in the table but to
truncate
the
minute and second. So the date in the second table is something like "2005-08-18:07".
I then thought I could run a query that searched for errors.IP, errors.Date, errors.Error, users.User where errors.IP = user.IP and errors.Date like users.Date. If I skip trying to match dates then I get all of the
users
who
have used that IP at any time. So I'd thought I'd limit it by saying only those users who have the same IP but who logged it at a time that
was
LIKE
the time of the error. This would not be perfect since it would miss somebody who logged in at "2005-08-18:11:01:00" but who got an error at "2005-08-18:12:00:00". But it should get anyone who had an error
between
11
and 12 a.m. on 2005-08-18. The problem is that I don't know how to
put
the
value of user.Date in a like statement, e.g. "where errors.Date like '%users.Date%'.
I realize that for anybody who is familiar with databases that what I've done probably breaks a lot of smart database rules and best
practices.
This
is the first time I've found a need to make this sort of query. But I'd appreciate any thoughts on it. I'm hoping for some way to use the like syntax. But it may be that I just need to redesign the tables. I'd prefer not to do this since this is all being done to solve a temporary problem, find out who is getting errors at a certain web page based on time,
ip
and
query text. So it's a bit of a throwaway database and query and I
don't
want
to spend too much time on it.
Thanks for any thoughts,
Ken
I'm still using Mysql 4.0.1 by the way
OK Ken,
You have a version of the "maximum of a group" problem. You need to identify which user from a specific IP address has the last login (MAX(DATE) value).
Now, to make this work properly, you need to STOP trimming your login dates and allow it to be a full DATETIME value. Try this definition as a Logins table:
CREATE TABLE logins ( IP varchar(15) not null , logindate datetime not null , user varchar(75) not null , primary key (ip, logindate, user) );
/* Please notice that logindate is NOT a varchar. The primary key (PK) on (ip, logindate, user) is intentionally in that order. */
/* to find the most recent logins for all of your errors (this could
take
a
while): */
CREATE TEMPORARY TABLE tmpLastLogin (key (error_id)) SELECT e.id as error_id, max(l.logindate) as logindate FROM errors e INNER JOIN logins l on e.IP = l.IP and e.date > l.logindate;
/* Now we have enough information to build your desired report (which error belongs to whom): */
SELECT e.id, e.ip, e.error, l.user FROM errors e INNER JOIN tmpLastLogin tll on tll.error_id = e.id INNER JOIN logins l on l.ip = e.ip and l.logindate = tll.logindate;
/* and because people login all the time, you need to drop your temp table to get ready for the next time you need to run this query */
DROP TEMPORARY TABLE tmpLastLogin;
This can be very fast if you already know the error you want to
research
as
the way I wrote the query it will process ALL of your errors at the same time.
Does what I wrote make sense? I would be very happy to go into greater detail if you have any questions.
Shawn Green Database Administrator Unimin Corporation - Spruce Pine




