|Subject:||Ideas for RAT 1 Cut&Paste Detector project|
|From:||Marija Šljivović (mak...@gmail.com)|
|Date:||Apr 3, 2009 11:24:25 am|
Hi, my name is Marija Šljivović and I am a student of informatics and mathematics in Serbia. I am interested in the "RAT 1 Cut&Paste Detector" project. I find this project very interesting because there are already several tools that find duplicated code (PMD, Simian...), but none of them can check code against the internet. This will be the biggest difference between them and Apache RAT. This is something completely new and I would like to be a part of it.
I have just set up my application at http://socghop.appspot.com/student_proposal/show/google/gsoc2009/maka/t123843563294
where I presented my ideas. This project is very interesting to me, and I think that some of my ideas will be useful for realizing it, no matter who works on it.
RAT will work in a similar way to the PMD and CheckStyle Eclipse plug-ins, but it will retrieve the code for comparison from several search engines. The tool will have an XML configuration file for each search engine (an engine may change its query grammar). These files will define the characteristic properties of each search engine (for example, checking only results written in a particular programming language).
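For example, a per-engine configuration file could look something like this (a sketch only; the element names are my own invention, not a settled format):

```xml
<!-- Hypothetical per-engine configuration; all element names are illustrative. -->
<search-engine name="example">
  <!-- {QUERY} is replaced by the escaped code chunk being searched for -->
  <query-url>http://www.example.com/search?q={QUERY}</query-url>
  <!-- minimum pause between two queries to this engine -->
  <delay-between-queries-ms>2000</delay-between-queries-ms>
  <!-- restrict results to a particular programming language -->
  <language-filter>java</language-filter>
</search-engine>
```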
To keep search engines from mistaking this robot for a DDoS attack, the tool must support waiting a certain amount of time between any two queries. This delay will be defined in the configuration file.
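The waiting could be implemented with a small per-engine throttle like the sketch below (class and method names are my own, for illustration):

```java
// Sketch of a per-engine rate limiter. The delay would come from the
// engine's configuration file; the names here are illustrative only.
public class QueryThrottle {
    private final long delayMillis;
    private long lastQueryTime = 0;

    public QueryThrottle(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Blocks until at least delayMillis have passed since the last query. */
    public synchronized void awaitTurn() throws InterruptedException {
        long wait = lastQueryTime + delayMillis - System.currentTimeMillis();
        if (wait > 0) {
            Thread.sleep(wait); // pause so the engine sees a polite query rate
        }
        lastQueryTime = System.currentTimeMillis();
    }
}
```

Each engine would get its own `QueryThrottle`, so a slow engine does not delay queries to the others.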
The tool must support multithreading (checking the source against multiple search engines at the same time).
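One way to run the engines in parallel is a thread pool, roughly as sketched below (the query tasks are placeholders; how results are actually represented is not decided here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: submit one query task per search engine and wait for all results.
public class ParallelChecker {
    public static List<String> checkAll(List<Callable<String>> engineQueries)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(engineQueries.size());
        try {
            // invokeAll returns the futures in the same order as the tasks
            List<Future<String>> futures = pool.invokeAll(engineQueries);
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that engine's query finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```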
I think the tool must also support pause/continue, because checking a big source-code base against search engines can take a very long time. That way, if we stop the check for any reason (loss of internet connection, for example), we won't have to start over from the beginning but can simply continue from where we stopped last time.
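Pause/continue could be backed by a simple checkpoint file that records how far the check has progressed; a minimal sketch (the file format and key name are my own assumptions):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Sketch: after each file is checked, record its index so that an
// interrupted run can resume where it stopped.
public class Checkpoint {
    private final File file;

    public Checkpoint(File file) {
        this.file = file;
    }

    public void save(int lastCheckedIndex) throws IOException {
        Properties p = new Properties();
        p.setProperty("lastCheckedIndex", Integer.toString(lastCheckedIndex));
        try (OutputStream out = new FileOutputStream(file)) {
            p.store(out, "cut&paste detector checkpoint");
        }
    }

    /** Returns the saved index, or -1 if no checkpoint has been written yet. */
    public int load() throws IOException {
        if (!file.exists()) {
            return -1;
        }
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(file)) {
            p.load(in);
        }
        return Integer.parseInt(p.getProperty("lastCheckedIndex", "-1"));
    }
}
```

On restart, the tool would call `load()` and skip everything up to the saved index.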
In addition, a Swing GUI could be built for this tool. It would support configuration (of the XML files I mentioned before) and manage the whole process, from selecting the source code for comparison and running the check to viewing the generated report.
The biggest challenge will be getting the search engines to cooperate. I think the whole source code must be checked, not only suspected chunks of code. Because of that, we must learn how the search engines work in order to avoid sending redundant queries.
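One simple way to cut down on redundant queries would be to normalize the code into chunks and only query each distinct chunk once; the fixed-size sliding window below is an illustrative choice, not part of the proposal:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: build whitespace-normalized chunks of source lines and keep only
// the unique ones, so the same query is never sent to an engine twice.
public class ChunkDeduplicator {
    public static Set<String> uniqueChunks(List<String> lines, int chunkSize) {
        Set<String> chunks = new LinkedHashSet<>();
        for (int i = 0; i + chunkSize <= lines.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < i + chunkSize; j++) {
                sb.append(lines.get(j).trim()).append('\n'); // normalize whitespace
            }
            chunks.add(sb.toString()); // the set silently drops duplicates
        }
        return chunks;
    }
}
```

Code with a lot of repetition (boilerplate, generated code) would then produce far fewer queries than its raw line count suggests.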