I want to write an online application that:
- reads the URL from the browser's address bar
- extracts its lexical features (such as n-grams)
- extracts its host-based features (fetches its DNS records online: the A, PTR, and TTL fields)
- classifies the URL as malicious or benign (using machine learning)
Can anyone help me with 1 and 3?
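For item 3 (the host-based features), here is a minimal sketch in Python, assuming the lookup runs on a server rather than in the browser. The function name `host_features` and the dictionary keys are made up for illustration. It uses only the standard `socket` module, which returns A and PTR data but does not expose TTL values; for TTLs a dedicated DNS library would be needed.

```python
import socket


def host_features(hostname, resolve=socket.gethostbyname_ex,
                  reverse=socket.gethostbyaddr):
    """Collect A and PTR data for a hostname (hypothetical helper).

    `resolve` and `reverse` default to the standard library lookups and
    can be swapped out for stubs when no network is available.
    """
    features = {"a_records": [], "ptr_names": []}
    try:
        # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
        _, _, addresses = resolve(hostname)
    except OSError:
        # The name did not resolve; return the empty feature set.
        return features
    features["a_records"] = list(addresses)
    for ip in addresses:
        try:
            # gethostbyaddr returns (primary_name, aliases, addresses)
            name, _, _ = reverse(ip)
            features["ptr_names"].append(name)
        except OSError:
            # No PTR record for this address; skip it.
            continue
    return features
```

Calling `host_features("example.com")` performs live lookups, so the result depends on the resolver you are using. Counts such as `len(features["a_records"])` can then be fed to the classifier as numeric features.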
I don't believe this application is something you can accomplish, since you can't really determine a site's content from its URL alone.
See something like the Mozilla Phishing Protection design documentation and the Google Safe Browsing spec instead.
I have no idea what language you are looking at.
For item 1, here is a .NET library that may be helpful:
http://msdn.microsoft.com/en-us/library/system.web.httputility.aspx
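If .NET is not a requirement, the same kind of URL handling can be sketched in Python with the standard `urllib.parse` module. This is only an illustration: the names `char_ngrams` and `lexical_features` and the choice of features are assumptions, not something from the question. It also assumes the URL has already been passed to the server, since reading the address bar itself happens in the browser (for example via `window.location.href` in JavaScript).

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def lexical_features(url, n=3):
    """Split a URL into parts and derive a few simple lexical features."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is None when the URL has no host
    return {
        "host": host,
        "path": parts.path,
        "url_length": len(url),
        "host_dot_count": host.count("."),
        "query_param_count": len(parse_qs(parts.query)),
        "ngrams": char_ngrams(url.lower(), n),
    }
```

For example, `lexical_features("http://login.example.com/a/b?x=1&y=2")["host"]` gives `"login.example.com"`. The n-gram counts are returned as a `Counter`, which you would still need to convert to a fixed-length vector before training a classifier.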