I have a string of text that contains HTML with all different types of links (relative, absolute, root-relative). I need a regex that can be executed by PHP's preg_replace
to replace all relative links with root-relative links, without touching any of the other links. I already have the root path.
Replaced links:
<tag ... href="path/to_file.ext" ... > ---> <tag ... href="/basepath/path/to_file.ext" ... >
<tag ... href="path/to_file.ext" ... /> ---> <tag ... href="/basepath/path/to_file.ext" ... />
Untouched links:
<tag ... href="/any/path" ... >
<tag ... href="/any/path" ... />
<tag ... href="protocol://domain.com/any/path" ... >
<tag ... href="protocol://domain.com/any/path" ... />
If you just want to change the base URI, you can try the BASE
element:
<base href="/basepath/">
But note that changing the base URI affects all relative URIs and not just relative URI paths.
Otherwise, if you really want to use a regular expression, note that a relative path like the ones you want must be of the type path-noscheme (see RFC 3986):
path-noscheme = segment-nz-nc *( "/" segment )
segment       = *pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
              ; non-zero-length segment without any colon ":"
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded   = "%" HEXDIG HEXDIG
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
So the beginning of the URI must match:
^([a-zA-Z0-9-._~!$&'()*+,;=@]|%[0-9a-fA-F]{2})+($|/)
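To see how this pattern behaves with preg_match, here is a minimal sketch that checks the example href values from the question. The helper name `is_relative_path` and the constant `RELATIVE_PATH_RE` are hypothetical; the hyphen is moved to the end of the character class for readability (the allowed characters are unchanged), and `#` is used as the delimiter because `~` appears inside the class.

```php
<?php
// Regex from the answer: the first path segment (no ':' allowed) must be
// followed by '/' or the end of the string. '#' is the delimiter because
// '~' is one of the allowed characters inside the class.
const RELATIVE_PATH_RE = '#^([a-zA-Z0-9._~!$&\'()*+,;=@-]|%[0-9a-fA-F]{2})+($|/)#';

// Hypothetical helper: true if $href looks like a relative path (path-noscheme).
function is_relative_path(string $href): bool
{
    return preg_match(RELATIVE_PATH_RE, $href) === 1;
}

$samples = [
    'path/to_file.ext',           // relative: should be rewritten
    '/any/path',                  // root-relative: untouched
    'http://domain.com/any/path', // absolute: untouched
];

// Prints "relative" for the first sample and "untouched" for the other two.
foreach ($samples as $href) {
    printf("%-30s %s\n", $href, is_relative_path($href) ? 'relative' : 'untouched');
}
```

As written, a value such as `file.php?id=1` is not matched, because the first segment must be followed by `/` or the end of the string; changing the last group to `($|[/?#])` would also accept relative paths with a query or fragment.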
But please use a proper HTML parser to parse the HTML and build a DOM out of it. Then you can query the DOM to get the href
attributes and test each value with the regular expression above.
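A minimal sketch of that DOM-based approach, assuming PHP 7.4+ with the bundled DOM extension (`DOMDocument`, `DOMXPath`) enabled. The function name `rewrite_relative_links` and the `/basepath/` value are illustrative only. It selects every element that has an href attribute, whatever its tag name, and prefixes only the values that pass the path-noscheme test.

```php
<?php
// Hypothetical helper: prefix relative href values with $basePath using the DOM.
function rewrite_relative_links(string $html, string $basePath): string
{
    // Same path-noscheme test as the regex above (hyphen moved to the end).
    $relative = '#^([a-zA-Z0-9._~!$&\'()*+,;=@-]|%[0-9a-fA-F]{2})+($|/)#';

    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Do not add implied <html>/<body> wrappers or a doctype to the output.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every element carrying an href attribute (a, link, area, ...).
    foreach ($xpath->query('//*[@href]') as $element) {
        $href = $element->getAttribute('href');
        if (preg_match($relative, $href) === 1) {
            // Normalise to exactly one '/' between base path and relative path.
            $element->setAttribute('href', rtrim($basePath, '/') . '/' . $href);
        }
    }

    return $doc->saveHTML();
}

$html = '<p><a href="path/to_file.ext">a</a> <a href="/any/path">b</a> '
      . '<a href="http://domain.com/any/path">c</a></p>';

echo rewrite_relative_links($html, '/basepath/');
```

Note that loadHTML assumes ISO-8859-1 for input without a charset declaration, so UTF-8 fragments may need an encoding hint before parsing; the sketch keeps the input ASCII to avoid that issue.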
I came up with this:
$html = preg_replace('#href=["\']([^/][^\':"]*)["\']#', 'href="' . $root_path . '$1"', $html);
It might be a little too simplistic. The obvious flaw I see is that it will also match href="something"
when it appears outside of a tag (for example, in plain text), but hopefully it can get you started.
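For the regex-only route, here is a slightly more defensive sketch, assuming PHP 7.4+ and a root path that ends with a slash (for example `/basepath/`). The helper name `prefix_relative_hrefs` is hypothetical. It keeps `href=` and the original quote in the output, requires the closing quote to match the opening one, skips values starting with `#` or `?`, and uses preg_replace_callback so a `$` or `\` in the root path is not treated as a backreference. It still has the flaw mentioned above: it can match text outside of tags.

```php
<?php
// Hypothetical helper: prefix relative href values with $rootPath using a regex.
// Assumes $rootPath ends with '/', e.g. '/basepath/'.
function prefix_relative_hrefs(string $html, string $rootPath): string
{
    // (["'])       opening quote, captured so the closing quote must match (\1)
    // [^/"'#?]     first character: not '/', so root-relative and
    //              protocol-relative links are skipped, nor '#' or '?'
    // [^"':]*      rest of the value; any ':' is taken to mean a scheme
    //              such as http: or mailto:, so the link is left alone
    $pattern = '~href=(["\'])([^/"\'#?][^"\':]*)\1~';

    return preg_replace_callback(
        $pattern,
        // Rebuild the attribute with the same quote character.
        fn(array $m) => 'href=' . $m[1] . $rootPath . $m[2] . $m[1],
        $html
    );
}

$html = '<a href="path/to_file.ext">x</a> <a href="/any/path">y</a> '
      . '<a href="http://domain.com/any/path">z</a>';

echo prefix_relative_hrefs($html, '/basepath/'), "\n";
```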