Ok I feel really stupid asking this. I see plenty of other questions that resemble my question, but none seem to be able to answer it.
I am creating an XML file for a program that is very picky about syntax. Sadly, I am making the XML file from scratch, meaning I am placing each line in individually (lots of file.WriteLine(String)).
I know this is ugly, but it's the only way I can get the logic to work out.
ANYWAY. I have a few strings that are coming through with '&' in them.
if (value.Contains("&"))
{
    value.Replace("&", "&amp;");
}
Does not seem to work. The value.Contains() seems to see it, but the replace does not work. I am using C# .Net 2.0 sp2. VS 2005.
Please help me out here. It's been a long week.
If you really want to go that route, you have to assign the result of Replace (the method returns a new string because strings are immutable) back to the variable:
value = value.Replace("&", "&amp;");
I would suggest rethinking the way you're writing your XML though. If you switch to using the XmlTextWriter, it will handle all of the encoding for you (not only the ampersand, but all of the other characters that need to be encoded as well):
using (XmlTextWriter writer = new XmlTextWriter(@"C:\MyXmlFile.xml", null))
{
    writer.WriteStartElement("someString");
    // WriteString escapes &, < and > in the text it writes
    writer.WriteString("This is < a > string & everything will get encoded");
    writer.WriteEndElement();
}
Should produce:
<someString>This is &lt; a &gt; string &amp; everything will get encoded</someString>
You should really use something like LINQ to XML (XDocument etc.) to solve it. I'm 100% sure you can do it without all your WriteLines ;) Show us your logic?
Otherwise you could use this, which will be bulletproof (as opposed to .Replace("&")):
var value = "hej&hej<some>";
value = new System.Xml.Linq.XText(value).ToString(); // hej&amp;hej&lt;some&gt;
This will also take care of <, which you also HAVE TO escape :)
Update: I have looked at the code for XText.ToString(), and internally it creates an XmlWriter + StringWriter and uses XNode.WriteTo. This may be overkill for a given application, so if many strings need to be converted, XText.WriteTo would be better. An alternative which should be fast and reliable is System.Web.HttpUtility.HtmlEncode.
Update 2: I found System.Security.SecurityElement.Escape(xml), which may be the fastest and ensures maximum compatibility (supported since .NET 1.0 and does not require the System.Web reference).
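A minimal sketch of the SecurityElement.Escape approach from Update 2, assuming the value is plain text headed for an element body or attribute value. The class and method names (XmlEscapeDemo, EscapeForXml) are made up for illustration.

```csharp
using System.Security;

public static class XmlEscapeDemo
{
    // SecurityElement.Escape replaces the five characters XML treats
    // specially (<, >, &, " and ') with their predefined entities.
    public static string EscapeForXml(string value)
    {
        return SecurityElement.Escape(value);
    }
}
```

Calling XmlEscapeDemo.EscapeForXml("Black & White <b>") gives "Black &amp;amp; White &amp;lt;b&amp;gt;" when viewed as raw XML text, i.e. the ampersand and both angle brackets are replaced by entities. Note that it escapes unconditionally, so it is meant for raw values, not for strings that are already escaped.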
You can also use the HttpUtility.HtmlEncode method in the System.Web namespace instead of doing the replacement yourself. Here you go: http://msdn.microsoft.com/en-us/library/73z22y6h.aspx
You can use a Regex to replace the "&" character only in node values:
input data example (string)
<select>
<option id="11">Gigamaster&Minimaster</option>
<option id="12">Black & White</option>
<option id="13">Other</option>
</select>
Replace with Regex
Regex rgx = new Regex(">(?<prefix>.*)&(?<sufix>.*)<");
data = rgx.Replace(data, ">${prefix}&amp;${sufix}<");
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(data);
result data
<select>
<option id="11">Gigamaster&amp;Minimaster</option>
<option id="12">Black &amp; White</option>
<option id="13">Other</option>
</select>
I'm obviously very late to this, but the right answer is:
System.Text.RegularExpressions.Regex.Replace(input, "&(?!amp;)", "&amp;");
Hope this helps somebody!
You can try:
value = value.Replace("&", "&amp;");
Strings are immutable. You need to write:
value = value.Replace("&", "&amp;");
Note that if you do this and your string contains "&amp;", it's going to get changed to "&amp;amp;".
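One way to sidestep the double-encoding problem described above is to escape only ampersands that do not already start an entity. The following is a sketch under assumptions: the names AmpersandFixer and EscapeBareAmpersands are hypothetical, and the pattern only recognises the five predefined XML entities plus numeric character references.

```csharp
using System.Text.RegularExpressions;

public static class AmpersandFixer
{
    // Matches '&' unless it is followed by a predefined entity name or a
    // decimal/hex character reference and a terminating semicolon.
    private static readonly Regex BareAmpersand =
        new Regex("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static string EscapeBareAmpersands(string value)
    {
        return BareAmpersand.Replace(value, "&amp;");
    }
}
```

This is a heuristic: it cannot tell an already-escaped "&amp;amp;" from a value whose literal text happens to be "&amp;amp;", so it only makes sense when the input is known to be a mix of escaped and unescaped text.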
I've created the following function to encode & and ' without messing up already encoded &amp;, &apos; or &quot;:
public static string encodeSelectXMLCharacters(string xmlString)
{
    // Match a bare '&' (one that does not start an existing entity or
    // numeric character reference) or an apostrophe.
    string returnValue = Regex.Replace(xmlString, "&(?!quot;|apos;|amp;|lt;|gt;|#x?.*?;)|'",
        delegate(Match m)
        {
            string encodedValue;
            switch (m.Value)
            {
                case "&":
                    encodedValue = "&amp;";
                    break;
                case "'":
                    encodedValue = "&apos;";
                    break;
                default:
                    encodedValue = m.Value;
                    break;
            }
            return encodedValue;
        });
    return returnValue;
}
Not sure if this is useful to anyone... I was fighting this for a while... Here is a glorious regex you can use to fix all your links, JavaScript, and content. I had to deal with a ton of legacy content that nobody wanted to correct.
Add this to your Render override in your master page, control or recode to run a string through it. Please don't flame me for putting this in the wrong place:
// remove the & from href="blaw?a=b&b=c" and replace with &amp;
//in urls - this corrects any unencoded & not just those in URL's
// this match will also ignore any matches it finds within <script> blocks AND
// it will also ignore the matches where the link includes a javascript command like
// <a href="javascript:alert{'& & &'}">blaw</a>
html = Regex.Replace(html, "&(?!(?<=(?<outerquote>[\"'])javascript:(?>(?!\\k<outerquote>|[>]).)*)\\k<outerquote>?)(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\\d+);)(?!(?>(?:(?!<script|\\/script>).)*)\\/script>)", "&amp;", RegexOptions.Singleline | RegexOptions.IgnoreCase);
It's a broad stroke for a rendered page, but this can be adapted to many uses without blowing up your page.
What about
Value = Server.HtmlEncode(Value);
I am quite sure it will work if you "embrace" your value with CDATA, so the result is something like
<ampersandData><![CDATA[value with ampersands like …]]></ampersandData>
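As a rough sketch of the CDATA idea, XmlTextWriter.WriteCData can produce the wrapper instead of concatenating strings by hand. The helper name WrapInCData and the class CDataDemo are made up for illustration, and the sketch assumes the value never contains the sequence "]]>", which would terminate a CDATA section.

```csharp
using System.IO;
using System.Xml;

public static class CDataDemo
{
    // Writes <elementName><![CDATA[value]]></elementName> into a string.
    // Assumption: value does not contain "]]>".
    public static string WrapInCData(string elementName, string value)
    {
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);
        writer.WriteStartElement(elementName);
        writer.WriteCData(value);   // content is written verbatim, not escaped
        writer.WriteEndElement();
        writer.Close();
        return buffer.ToString();
    }
}
```

For example, CDataDemo.WrapInCData("ampersandData", "Black & White") returns the element with the ampersand left as-is inside the CDATA section.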
Hope it helps.
Michael
Very late here, but I want to share my solution, which handles the cases where you have both & (incorrect XML) and &amp; (valid XML) in the document, in addition to other XML character entities.
This solution is only meant for cases where you cannot control generation of the XML, usually because it comes from some external source. If you control the XML generation, please use XmlTextWriter as suggested by @Justin Niessner.
It is also quite fast and handles all the different xml character entities/references
Predefined character entities:
& quot;
& amp;
& apos;
& lt;
& gt;
Numeric character entities/references:
& #nnnn;
& #xhhhh;
PS! The space after & should not be included in the entities/references, I just added it here to avoid it being encoded in the page rendering
Code
public static string CleanXml(string text)
{
    int length = text.Length;
    StringBuilder stringBuilder = new StringBuilder(length);
    for (int i = 0; i < length; ++i)
    {
        if (text[i] == '&')
        {
            // Look at no more than the next 12 characters (bounded by the end of the text)
            var remaining = length - i;
            var subStrLength = Math.Min(remaining, 12);
            var subStr = text.Substring(i, subStrLength);
            var firstIndexOfSemiColon = subStr.IndexOf(';');
            if (firstIndexOfSemiColon > -1)
                subStr = subStr.Substring(0, firstIndexOfSemiColon + 1);
            var matches = Regex.Matches(subStr, "&(?!quot;|apos;|amp;|lt;|gt;|#x?.*?;)|'");
            if (matches.Count > 0)
                stringBuilder.Append("&amp;"); // bare ampersand: escape it
            else
                stringBuilder.Append("&");     // already part of an entity/reference: keep it
        }
        else if (XmlConvert.IsXmlChar(text[i]))
        {
            stringBuilder.Append(text[i]);
        }
        else if (i + 1 < length && XmlConvert.IsXmlSurrogatePair(text[i + 1], text[i]))
        {
            stringBuilder.Append(text[i]);
            stringBuilder.Append(text[i + 1]);
            ++i;
        }
    }
    return stringBuilder.ToString();
}