To remove garbage characters from a string using regex

开发者 https://www.devze.com 2023-01-01 09:50 Source: web
I want to remove all characters from a string other than a-z and A-Z. I wrote the following function for this, and it works fine.

public String stripGarbage(String s) {
    String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
    String result = "";
    for (int i = 0; i < s.length(); i++) {
        if (good.indexOf(s.charAt(i)) >= 0) {
            result += s.charAt(i);
        }
    }
    return result;
}

Can anyone suggest a better way to achieve the same thing? A regex might be a better option.

Regards

Harry


Here you go:

String result = s.replaceAll("[^a-zA-Z0-9]", "");

But if you understand your code and it's readable then maybe you have the best solution:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
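
If the regex route is taken and the method is called often, the pattern can be compiled once and reused, since String.replaceAll compiles its regex on every call. A minimal sketch of that idea; the class name RegexStrip and the constant NON_ALNUM are made up for illustration:

```java
import java.util.regex.Pattern;

final class RegexStrip {
    // Compiled once; matches every character that is NOT an ASCII letter or digit.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    static String stripGarbage(String s) {
        // Replace each non-alphanumeric character with the empty string.
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripGarbage("He!!o, W0rld?")); // prints "HeoW0rld"
    }
}
```

Whether this is measurably faster than the one-liner depends on how often the method runs; for occasional calls the one-liner is probably fine.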


The following should be faster than anything using regex, and faster than your initial attempt.

public String stripGarbage(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        if ((ch >= 'A' && ch <= 'Z') || 
            (ch >= 'a' && ch <= 'z') ||
            (ch >= '0' && ch <= '9')) {
            sb.append(ch);
        }
    }
    return sb.toString();
}

Key points:

  • It is significantly faster to use a StringBuilder than string concatenation in a loop. (The latter generates N - 1 garbage strings and copies N * (N + 1) / 2 characters to build a String containing N characters.)

  • If you have a good estimate of the length of the result String, it is a good idea to preallocate the StringBuilder to hold that number of characters. (But if you don't have a good estimate, the cost of the internal reallocations etc amortizes to O(N) where N is the final string length ... so this is not normally a major concern.)

  • Testing a character against (up to) 3 character ranges will be significantly faster on average than searching for a character in a 62 character String.

  • A switch statement might be faster especially if there are more character ranges. However, in this case it will take many more lines of code to list the cases for all of the letters and digits.

  • If the non-garbage characters match existing predicates of the Character class (e.g. Character.isLetter(char) etc) you could use those. This would be a good option if you wanted to match any letter or digit ... rather than just ASCII letters and digits.

  • Other alternatives to consider are using a HashSet<Character> or a boolean[] indexed by character, pre-populated with the non-garbage characters. These approaches work well if the set of non-garbage characters is not known at compile time.
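
The boolean[] alternative from the last bullet can be sketched as follows. This is only an illustration: the class name TableStrip, the ALLOWED table, and the decision to size it at 128 entries (ASCII only) are assumptions, not part of the answer above.

```java
final class TableStrip {
    // One flag per ASCII code point; true means "keep this character".
    private static final boolean[] ALLOWED = new boolean[128];

    static {
        // Populated from a literal here, but the set could just as well be
        // read from configuration at run time.
        String keep = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
        for (int i = 0; i < keep.length(); i++) {
            ALLOWED[keep.charAt(i)] = true;
        }
    }

    static String stripGarbage(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Characters beyond the table's range are treated as garbage.
            if (ch < ALLOWED.length && ALLOWED[ch]) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripGarbage("a-b_c 123 \u00e9!")); // prints "abc123"
    }
}
```

The per-character test is a single array read, so its cost does not grow with the number of allowed characters.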


This regex works (JavaScript syntax):

result=s.replace(/[^A-Z0-9a-z]/ig,'');

Here s is the string passed to your function, and result is the string containing only letters and digits.


I know this post is old, but in C# you can shorten Stephen C's answer a little by using the System.Char structure.

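public String RemoveNonAlphaNumeric(String value)
{
    // Pass the capacity, not the string itself: new StringBuilder(value)
    // would start with a copy of the input and append the result to it.
    StringBuilder sb = new StringBuilder(value.Length);
    for (int i = 0; i < value.Length; i++)
    {
        char ch = value[i];

        if (Char.IsLetterOrDigit(ch))
        {
            sb.Append(ch);
        }
    }
    return sb.ToString();
}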

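This accomplishes much the same thing in a more compact fashion. Note that Char.IsLetterOrDigit is Unicode-aware, so it also keeps non-ASCII letters and digits.

The Char structure has some really useful methods for checking text. Here are some for future reference.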

Char.GetNumericValue()         
Char.IsControl()              
Char.IsDigit()             
Char.IsLetter()              
Char.IsLower()             
Char.IsNumber()         
Char.IsPunctuation()          
Char.IsSeparator()            
Char.IsSymbol()         
Char.IsWhiteSpace()
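
Since the question is about Java, it may help to note that java.lang.Character has close counterparts, such as Character.isLetterOrDigit, Character.isLetter, Character.isDigit and Character.isWhitespace. A rough sketch of the same loop in Java follows; the class name PredicateStrip is invented here. As with the C# version, the predicate is Unicode-aware, so the result is not limited to ASCII:

```java
final class PredicateStrip {
    static String removeNonAlphaNumeric(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            // Unicode-aware: accepts letters and digits from any script.
            if (Character.isLetterOrDigit(ch)) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The accented letter survives; the space and '#' are dropped.
        System.out.println(removeNonAlphaNumeric("caf\u00e9 #42"));
    }
}
```

If only ASCII letters and digits should survive, the explicit range checks shown earlier in the thread are the safer choice.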


This works:

public static String removeGarbage(String s) {
    String r = "";
    for (int i = 0; i < s.length(); i++) {
        // Use [A-Za-z0-9] instead if you want to include digits
        if (s.substring(i, i + 1).matches("[A-Za-z]")) {
            r = r.concat(s.substring(i, i + 1));
        }
    }
    return r;
}

(edit: although it's not so efficient)


/**
 * Replace every character outside the ASCII range 1..127 with a space.
 */
private static StringBuffer goodBuffer = new StringBuffer();

// Static initializer: fill the buffer with the ASCII characters 1..127
static {
    for (int c = 1; c < 128; c++) {
        goodBuffer.append((char) c);
    }
}

public String stripGarbage(String s) {
    //String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
    String good = goodBuffer.toString();
    String result = "";
    for (int i = 0; i < s.length(); i++) {
        if (good.indexOf(s.charAt(i)) >= 0) {
            result += s.charAt(i);
        } else {
            result += " ";
        }
    }
    return result;
}
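
The same effect, replacing every character outside the ASCII range 1 to 127 with a space, can probably be had without the lookup string and the repeated string concatenation. A minimal sketch, with the class name AsciiOnly invented for illustration:

```java
final class AsciiOnly {
    static String stripGarbage(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Keep code points 1..127; everything else becomes a space.
            sb.append(ch >= 1 && ch <= 127 ? ch : ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripGarbage("na\u00efve")); // prints "na ve"
    }
}
```

The output always has the same length as the input, because garbage characters are replaced rather than removed.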