Hello, I am trying to save a value from an input tag in some HTML source code. The tag looks like this:
<input name="user_status" value="3" />
I have the page source in a variable (pageSourceCode), and need to work out some regex to get the value (3 in this example). I have this so far:
Dim sCapture As String = System.Text.RegularExpressions.Regex.Match(pageSourceCode, "\<input\sname\=\""user_status\""\svalue\=\""(.*)?\""\>").Groups(1).Value
This works fine most of the time; however, the code is used to process source code from multiple sites (that use the same platform), and sometimes other attributes are included in the input tag, or the attributes are in a different order, e.g.:
<input class="someclass" type="hidden" value="3" name="user_status" />
I just don't understand regex well enough to cope with these situations.
Any help would be very much appreciated.
PS: Although I am looking for a specific answer to this question if at all possible, a pointer to a good regex tutorial would be great as well.
Thanks
You can search for <input[^>]*\bvalue="([^"]+)" if your input tags never contain angle brackets.
[^>]* matches any number of characters except >, which keeps the regex from accidentally matching across tags.
\b ensures that we only match value and not something like x_value.
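A minimal sketch of how that pattern behaves, assuming a hypothetical page fragment with two input tags (the token input is made up for illustration). Because the pattern is not tied to any particular name, it captures the value of the first input tag that has a value attribute:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FirstValueDemo
    Sub Main()
        ' Hypothetical page fragment: the "token" input is an assumption,
        ' added only to show which tag the pattern picks up.
        Dim html As String = "<input name=""token"" value=""abc"" />" &
                             "<input value=""3"" name=""user_status"" />"

        ' Same pattern as above, with quotes doubled for a VB.NET string literal.
        Dim m As Match = Regex.Match(html, "<input[^>]*\bvalue=""([^""]+)""")

        ' The first input tag with a value attribute wins, whatever its name.
        Console.WriteLine(m.Groups(1).Value) ' prints abc
    End Sub
End Module
```

So this form is only enough when the tag you want is the first input on the page that carries a value attribute.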
EDIT:
If you only want to look at input tags where name="user_status", then you can do this with an additional lookahead assertion:
<input(?=[^>]*name="user_status")[^>]*\bvalue="([^"]+)"
In VB.NET:
ResultString = Regex.Match(SubjectString, "<input(?=[^>]*name=""user_status"")[^>]*\bvalue=""([^""]+)""").Groups(1).Value
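As a rough, self-contained sketch of the lookahead approach, here it is run against both tag layouts from the question. GetUserStatus is a hypothetical helper name. The \b before name and RegexOptions.IgnoreCase are additions that are not in the pattern above; they assume you also want to skip attributes such as data-name and tolerate upper-case markup:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UserStatusDemo
    ' Hypothetical helper: returns the value attribute of the first <input> tag
    ' whose name attribute is "user_status", or Nothing if there is no match.
    Function GetUserStatus(pageSourceCode As String) As String
        ' The lookahead checks for name="user_status" anywhere inside the tag
        ' without consuming characters, so attribute order does not matter.
        Dim pattern As String = "<input(?=[^>]*\bname=""user_status"")[^>]*\bvalue=""([^""]+)"""
        Dim m As Match = Regex.Match(pageSourceCode, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' The two tag layouts from the question.
        Dim a As String = "<input name=""user_status"" value=""3"" />"
        Dim b As String = "<input class=""someclass"" type=""hidden"" value=""3"" name=""user_status"" />"

        Console.WriteLine(GetUserStatus(a)) ' prints 3
        Console.WriteLine(GetUserStatus(b)) ' prints 3
    End Sub
End Module
```

Checking m.Success first lets the caller tell "tag not found" (Nothing) apart from a real value. Note that [^""]+ requires at least one character, so a tag with value="" is treated as no match.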
A good tutorial can be found at http://www.regular-expressions.info
Assuming this is an ASP.NET page and not some external HTML you can't control, the better solution would be simply to access the control.
Add an ID field and runat="server" to your input control, like this:
<input id="user_status" runat="server" class="someclass" type="hidden" value="3" name="user_status" />
You can probably get rid of the Name field. It's typically the same as the ID field and ID is a better choice. You can actually have both an ID and Name field if you want and they can both be the same value.
In your code-behind you can then access the value by the ID, with no need for a regex:
Me.user_status.value