I'm about to write a class that looks at the HTML source code of a page and filters out all PDF links. The idea is simply to join the parent link and the relative link. Basically, it works for
<a href="blabla/123.pdf">pdf</a>
but in some cases it doesn't, e.g. when the same PDF link is written as
<a href="./blabla/123.pdf">pdf</a>
or
<a href=" blabla/123.pdf">pdf</a>
(with a leading dot-slash or a leading space). Both are working links: when a browser parses them, they lead to the same PDF in the same directory. For the string concatenation in my class, however, they are useless.
I fixed the problem for these two cases. The question is whether there are other special cases in the syntax that I should pay attention to.
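One way to cover these two cases and several others (`../`, root-relative `/path`, protocol-relative `//host/path`, query strings, fragments) is to let `java.net.URI` resolve the link instead of concatenating strings by hand. This is a minimal sketch under some assumptions: you know the URL of the page the HTML came from (`pageUrl`, a hypothetical parameter, which should include a path, e.g. `http://host/` rather than `http://host`), and the page has no `<base href>` element, which would change the base URL. The class name `PdfLinkResolver` is illustrative. The sketch does not decode HTML entities such as `&amp;` in the attribute value, and an href with unescaped spaces in the middle is rejected instead of being percent-encoded the way a browser would.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: resolves an href the way a browser would (roughly),
// assuming no <base href> element overrides the page URL.
class PdfLinkResolver {

    // Resolves an href against the URL of the page it was found on.
    // Returns null if the href is not a syntactically valid URI.
    static String resolve(String pageUrl, String href) {
        try {
            // Leading/trailing whitespace in href values is ignored by browsers,
            // so " blabla/123.pdf" is trimmed before parsing.
            URI base = new URI(pageUrl);
            URI link = new URI(href.trim());
            // resolve() handles "./", "../", root-relative "/x" and absolute
            // URLs, and normalizes away "." and ".." path segments.
            return base.resolve(link).toString();
        } catch (URISyntaxException e) {
            return null; // e.g. unescaped spaces inside the path
        }
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/index.html";
        String[] hrefs = {
            "blabla/123.pdf",
            "./blabla/123.pdf",
            " blabla/123.pdf",
            "../blabla/123.pdf",
            "/blabla/123.pdf",
            "http://other.example.org/123.pdf"
        };
        for (String h : hrefs) {
            System.out.println("[" + h + "] -> " + resolve(page, h));
        }
    }
}
```

With the page URL above, the first three hrefs all resolve to `http://www.example.com/docs/blabla/123.pdf`, while `../` and `/` both go up to `http://www.example.com/blabla/123.pdf`.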
You do not know what the link points to until you download the file.
I can have a link like http://www.mysite.com/pages/brochure.html
which internally redirects to a PDF file.
So if you do not control the links, or are not working on a specific section of your own site, filtering by the link text alone is going to fail.
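If you do need to know what is behind a link, you have to ask the server. A minimal sketch, assuming Java 11+ (for `InputStream.readNBytes`) and a plain `HttpURLConnection`: it first checks the `Content-Type` header and, because some servers send a generic type such as `application/octet-stream`, falls back to the `%PDF-` header that PDF files start with. The class and method names are hypothetical. Note that `HttpURLConnection` follows redirects only within the same protocol (not from `http` to `https`), and a server that serves a PDF directly at a `.html` URL is caught by the content checks rather than by any redirect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: decides "is this a PDF?" from the response,
// not from the link text.
class PdfProbe {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if a Content-Type header value denotes a PDF, ignoring
    // parameters such as "; charset=..." and letter case.
    static boolean isPdfContentType(String contentType) {
        if (contentType == null) return false;
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("application/pdf");
    }

    // True if the first bytes of a response body are the PDF header "%PDF-".
    static boolean startsWithPdfMagic(byte[] head) {
        return head.length >= PDF_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Sends a GET request (redirects followed within the same protocol),
    // checks the Content-Type, then reads only the first few body bytes.
    static boolean isPdf(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(true);
        try {
            if (isPdfContentType(conn.getContentType())) return true;
            try (InputStream in = conn.getInputStream()) {
                return startsWithPdfMagic(in.readNBytes(PDF_MAGIC.length));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

This costs one request per link, so it is only worth doing when the extension cannot be trusted.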
On the other hand, if you're working on a specific section of the site where you know every PDF link has a .pdf
extension, you can simply check the extension rather than the whole path (in Java, String.lastIndexOf works the same way as
C#'s LastIndexOf).
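A minimal sketch of that extension check, using `lastIndexOf` as suggested. It assumes, as the answer does, that every PDF link on the section you are scanning really ends in `.pdf`. The query string and fragment are cut off first, so `doc.pdf?v=2` or `doc.pdf#page=3` still count, and the dot must sit in the last path segment, so a directory named `dir.pdf/` is not matched. The class name `PdfExtension` is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: extension-only check, valid only when you know
// every PDF link in the scanned section ends in ".pdf".
class PdfExtension {

    // True if the path part of the link ends in ".pdf" (case-insensitive).
    static boolean hasPdfExtension(String href) {
        String path = href.trim();
        // Drop "?query" and "#fragment" before looking at the extension.
        int cut = indexOfAny(path, '?', '#');
        if (cut >= 0) path = path.substring(0, cut);
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        // No dot, or the last dot belongs to a directory name: no extension.
        if (dot < 0 || dot < slash) return false;
        return path.substring(dot + 1).toLowerCase(Locale.ROOT).equals("pdf");
    }

    // Index of the first occurrence of either character, or -1 if neither occurs.
    private static int indexOfAny(String s, char a, char b) {
        int i = s.indexOf(a);
        int j = s.indexOf(b);
        if (i < 0) return j;
        if (j < 0) return i;
        return Math.min(i, j);
    }
}
```

Because only the end of the string is inspected, the `./` and leading-space variants from the question pass this check without any special handling.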