string split by index / params?

开发者 https://www.devze.com 2023-03-30 00:43 Source: Internet
Before I write my own function, I wanted to check whether the .NET library already has a function like string.split(string input, params int[] indexes). This function should split the string at the indexes I pass to it.

Edit: I shouldn't have added the string.join sentence - it was confusing.


You could use the String instance method Substring.

string a = input.Substring(0, 10);
string b = input.Substring(10, 5);
string c = input.Substring(15, 3);
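
One detail worth spelling out: the second argument of Substring is a length, not an end index, and the call throws ArgumentOutOfRangeException when the requested range runs past the end of the string. A minimal sketch of the three cuts above wrapped in a helper (SubstringDemo and CutFixed are hypothetical names, not part of .NET):

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: cuts the input into the three fixed pieces shown above.
    public static string[] CutFixed(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // The last cut ends at 15 + 3 = 18, so shorter inputs would make
        // Substring throw ArgumentOutOfRangeException. Fail early instead.
        if (input.Length < 18)
            throw new ArgumentException("input must have at least 18 characters", nameof(input));

        return new[]
        {
            input.Substring(0, 10),  // characters 0..9
            input.Substring(10, 5),  // characters 10..14
            input.Substring(15, 3),  // characters 15..17
        };
    }
}
```

For the input "0123456789ABCDEFGH" this yields "0123456789", "ABCDE" and "FGH".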


All other answers just seemed too complicated, so I took a stab.

using System.Linq;

public static class StringExtensions
{
    /// <summary>
    ///     Returns a string array that contains the substrings in this instance that are delimited by specified indexes.
    /// </summary>
    /// <param name="source">The original string.</param>
    /// <param name="index">An index that delimits the substrings in this string.</param>
    /// <returns>An array whose elements contain the substrings in this instance that are delimited by one or more indexes.</returns>
    /// <exception cref="ArgumentNullException"><paramref name="index" /> is null.</exception>
    /// <exception cref="ArgumentOutOfRangeException">An <paramref name="index" /> is less than zero or greater than the length of this instance.</exception>
    public static string[] SplitAt(this string source, params int[] index)
    {
        index = index.Distinct().OrderBy(x => x).ToArray();
        string[] output = new string[index.Length + 1];
        int pos = 0;

        for (int i = 0; i < index.Length; pos = index[i++])
            output[i] = source.Substring(pos, index[i] - pos);

        output[index.Length] = source.Substring(pos);
        return output;
    }
}
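
The XML comments above promise specific exceptions, but the method relies on LINQ and Substring to raise them, and a null source ends in a NullReferenceException. If explicit checks are preferred, a variant could look roughly like this (SplitAtChecked and CheckedStringExtensions are hypothetical names; the splitting logic mirrors the method above):

```csharp
using System;
using System.Linq;

public static class CheckedStringExtensions
{
    // Hypothetical variant of SplitAt that validates its arguments up front.
    public static string[] SplitAtChecked(this string source, params int[] index)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (index == null) throw new ArgumentNullException(nameof(index));

        // Remove duplicates and sort, so callers may pass indexes in any order.
        int[] cuts = index.Distinct().OrderBy(x => x).ToArray();
        if (cuts.Any(c => c < 0 || c > source.Length))
            throw new ArgumentOutOfRangeException(nameof(index));

        var output = new string[cuts.Length + 1];
        int pos = 0;
        for (int i = 0; i < cuts.Length; i++)
        {
            output[i] = source.Substring(pos, cuts[i] - pos);
            pos = cuts[i];
        }
        output[cuts.Length] = source.Substring(pos); // remainder after the last cut
        return output;
    }
}
```

With this sketch, "leftright".SplitAtChecked(4) gives "left" and "right", and "abcdef".SplitAtChecked(4, 2) gives "ab", "cd" and "ef".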


The Split method divides a string at delimiters (characters or strings) rather than at positions. It is perfect for breaking down comma-separated lists and the like.

But you are right, there is no built-in string method that does what you want.
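
For contrast, here is a minimal sketch of what the built-in method does: it cuts wherever the delimiter occurs, regardless of position (DelimiterDemo and SplitCsv are hypothetical names used only for illustration):

```csharp
using System;

public static class DelimiterDemo
{
    // Splits on every comma; empty entries between adjacent commas are kept.
    public static string[] SplitCsv(string line) => line.Split(',');
}
```

DelimiterDemo.SplitCsv("a,b,,c") returns four entries: "a", "b", an empty string, and "c".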


This doesn't directly answer your generalized question, but in what is most likely the common case (or at least the case for which I was searching for an answer when I came upon this SO question) where indexes is a single int, this extension method is a little cleaner than returning a string[] array, especially in C# 7.

For what it's worth, I benchmarked using string.Substring() against creating two char[] arrays, calling text.CopyTo() and returning two strings by calling new string(charArray). Using string.Substring() was roughly twice as fast.

C# 7 syntax

using System;
using System.Runtime.CompilerServices;

public static class StringExtensions
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static (string left, string right) SplitAt(this string text, int index) => 
        (text.Substring(0, index), text.Substring(index));
}

public static class Program
{
    public static void Main()
    {
        var (left, right) = "leftright".SplitAt(4);
        Console.WriteLine(left);
        Console.WriteLine(right);
    }
}

C# 6 syntax

Note: Using Tuple<string, string> in versions prior to C# 7 doesn't save much in the way of verbosity and it might actually be cleaner to just return a string[2] array.

using System;
using System.Runtime.CompilerServices;

public static class StringExtensions
{
    // I'd use one or the other of these methods, and whichever one you choose, 
    // rename it to SplitAt()

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Tuple<string, string> TupleSplitAt(this string text, int index) => 
        Tuple.Create<string, string>(text.Substring(0, index), text.Substring(index));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string[] ArraySplitAt(this string text, int index) => 
        new string[] { text.Substring(0, index), text.Substring(index) };
}

public static class Program
{
    public static void Main()
    {
        Tuple<string, string> stringsTuple = "leftright".TupleSplitAt(4);
        Console.WriteLine("Tuple method");
        Console.WriteLine(stringsTuple.Item1);
        Console.WriteLine(stringsTuple.Item2);

        Console.WriteLine();

        Console.WriteLine("Array method");
        string[] stringsArray = "leftright".ArraySplitAt(4);
        Console.WriteLine(stringsArray[0]);
        Console.WriteLine(stringsArray[1]);
    }
}
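
The benchmark mentioned in this answer compared string.Substring() with copying into two char[] arrays. The benchmarked code is not shown, so the following is only a guess at the shape of the two variants (SplitVariants and both method names are hypothetical); it makes no timing claims of its own:

```csharp
using System;

public static class SplitVariants
{
    // Variant 1: two Substring calls.
    public static (string left, string right) SplitWithSubstring(string text, int index) =>
        (text.Substring(0, index), text.Substring(index));

    // Variant 2: copy each half into its own char[] and build strings from them.
    public static (string left, string right) SplitWithCopyTo(string text, int index)
    {
        var leftChars = new char[index];
        var rightChars = new char[text.Length - index];

        // CopyTo(sourceIndex, destination, destinationIndex, count)
        text.CopyTo(0, leftChars, 0, index);
        text.CopyTo(index, rightChars, 0, text.Length - index);

        return (new string(leftChars), new string(rightChars));
    }
}
```

Both variants return ("left", "right") for the input "leftright" and index 4; the second allocates two extra arrays on top of the two result strings.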


One possible solution:

using System.Linq;

public static class StringExtension
{
    public static string[] Split(this string source, params int[] sizes)
    {
        var length = sizes.Sum();
        if (length > source.Length) return null;

        var resultSize = sizes.Length;
        if (length < source.Length) resultSize++;

        var result = new string[resultSize];

        var start = 0;
        for (var i = 0; i < resultSize; i++)
        {
            if (i + 1 == resultSize)
            {
                result[i] = source.Substring(start);
                break;
            }

            result[i] = source.Substring(start, sizes[i]);
            start += sizes[i];
        }

        return result;
    }
}
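
Note that this method takes piece sizes rather than cut positions. The two are related by a running sum, which the following sketch makes explicit (SizeIndexConversion and SizesToIndexes are hypothetical names):

```csharp
using System;

public static class SizeIndexConversion
{
    // Hypothetical helper: turns piece lengths into cumulative cut positions.
    public static int[] SizesToIndexes(params int[] sizes)
    {
        var indexes = new int[sizes.Length];
        int running = 0;
        for (int i = 0; i < sizes.Length; i++)
        {
            running += sizes[i];   // the end of piece i is the start of piece i + 1
            indexes[i] = running;
        }
        return indexes;
    }
}
```

Sizes 7, 4, 5 correspond to cut positions 7, 11, 16.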


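using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    // Assumes the indexes are passed in ascending order and lie within the string.
    public static IEnumerable<string> SplitAt(this string source, params int[] index)
    {
        // Union drops duplicates, so passing 0 or source.Length adds no empty piece.
        var indices = new[] { 0 }.Union(index).Union(new[] { source.Length }).ToArray();

        return indices
            .Zip(indices.Skip(1), (a, b) => (a, b))
            .Select(pair => source.Substring(pair.a, pair.b - pair.a));
    }
}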

var s = "abcd";

s.SplitAt(); // "abcd"
s.SplitAt(0); // "abcd"
s.SplitAt(1); // "a", "bcd"
s.SplitAt(2); // "ab", "cd"
s.SplitAt(1, 2); // "a", "b", "cd"
s.SplitAt(3); // "abc", "d"


There are always regular expressions.

Here's an example that one can expand upon:

 string text = "0123456789ABCDEF";
 Match m = new Regex("(.{7})(.{4})(.{5})").Match(text);
 if (m.Success)
 {
     var result = new string[m.Groups.Count - 1];
     for (var i = 1; i < m.Groups.Count; i++)
         result[i - 1] = m.Groups[i].Value;
 }

Here's a function that encapsulates the above logic:

    public static string[] SplitAt(this string text, params int[] indexes)
    {
        var pattern = new StringBuilder();
        var lastIndex = 0;
        foreach (var index in indexes)
        {
            pattern.AppendFormat("(.{{{0}}})", index - lastIndex);
            lastIndex = index;
        }
        pattern.Append("(.+)");

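        // Anchor at the start and let '.' match newlines, so the groups always
        // begin at position 0, even for multi-line text.
        var match = new Regex("^" + pattern, RegexOptions.Singleline).Match(text);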
        if (! match.Success)
        {
            throw new ArgumentException("text cannot be split by given indexes");
        }

        var result = new string[match.Groups.Count - 1];
        for (var i = 1; i < match.Groups.Count; i++)
            result[i - 1] = match.Groups[i].Value;
        return result;            
    }

This was written rather quickly, but I believe it illustrates my points and makes them clear to Michael, the author of the comment.
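
To make the generated capture groups easier to see, here is a small sketch that builds only the group portion of the pattern for a given set of indexes (PatternDemo and BuildPattern are hypothetical names; the loop mirrors the one in the function above):

```csharp
using System.Text;

public static class PatternDemo
{
    // Builds one fixed-width capture group per cut, plus a final catch-all group.
    public static string BuildPattern(params int[] indexes)
    {
        var pattern = new StringBuilder();
        var lastIndex = 0;
        foreach (var index in indexes)
        {
            // Each group matches exactly (index - lastIndex) characters.
            pattern.Append("(.{").Append(index - lastIndex).Append("})");
            lastIndex = index;
        }
        pattern.Append("(.+)"); // whatever remains after the last cut (at least one character)
        return pattern.ToString();
    }
}
```

For indexes 7 and 11 this produces (.{7})(.{4})(.+). Because the last group requires at least one character, text that ends exactly at the last index does not match.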


A version that returns List<string>.

Caller

string iTextLine = "02121AAAARobert Louis StevensonXXXX";
int[] tempListIndex = new int[] {
    // 0 is implied: "number" starts at index 0
    5,      // "user" starts at index 5
    9,      // "name" starts at index 9
    31      // "role" starts at index 31
};  

// GET - words from indexes
// Result: "02121", "AAAA", "Robert Louis Stevenson", "XXXX"
List<string> tempWords = getListWordsFromLine(iTextLine, tempListIndex);

Method

/// <summary>
/// GET - split line in parts using index cuts
/// </summary>
/// <param name="iTextLine">Input line to split</param>
/// <param name="iListIndex">Input list of indexes</param>
public static List<string> getListWordsFromLine(string iTextLine, int[] iListIndex)
{
    // INIT
    List<string> retObj = new List<string>(); 
    int currStartPos = 0;
    // GET - clear index list from dupl. and sort it
    int[] tempListIndex = iListIndex.Distinct()
                                    .OrderBy(o => o)
                                    .ToArray();
    // CTRL
    if (tempListIndex.Length != iListIndex.Length)
    {
        // ERR
        throw new ArgumentException("Input iListIndex contains duplicate indexes", nameof(iListIndex));
    }

    for (int jj = 0; jj < tempListIndex.Length; ++jj)
    {
        try
        {
            // SET - line chunk
            retObj.Add(iTextLine.Substring(currStartPos,
                                           tempListIndex[jj] - currStartPos));
        }
        catch (ArgumentOutOfRangeException)
        {
            // SET - line is shorter than expected
            retObj.Add(string.Empty);
        }
        // GET - update start position
        currStartPos = tempListIndex[jj];
    }
    // SET - last chunk (empty when the line ends before the last index)
    retObj.Add(currStartPos <= iTextLine.Length
                   ? iTextLine.Substring(currStartPos)
                   : string.Empty);
    // RET
    return retObj;
}


I wanted to use the Range struct (System.Range) to implement a solution.

My use case was to convert standard property names - e.g. CustomerName, WindowSize, etc. - into a JSON property name that would still be easy to read - as in customer_name, window_size.

Creating a JsonNamingPolicy descendant, I overrode the ConvertName method with the following implementation:

        /// <summary>
        /// Converts a property name like "CustomerName" to "customer_name"
        /// </summary>
        /// <param name="name">the property name</param>
        /// <returns>property conversion</returns>
        public override string ConvertName(string name) {
            //  using Regex to look for caps: "([A-Z]+)"
            Match[] matches = regex.Matches(name)
                .ToArray();

            if (!matches.Any()) {
                //  no capitals to match
                return name;
            }

            if (matches.Length == 1) {
                //  one match
                return name.ToLower();
            }

            //  multiple matches - we could use StringBuilder
            string[] parts = new string[matches.Length];

            int index = 0;

            //  this is somewhat verbose for debugging purposes
            while (index < matches.Length) {
                //  get our match
                Match m = matches[index];
                //  calculate where this part ends (exclusive)
                int end = index + 1 < matches.Length ?
                    //  return the start of the next match
                    (matches[index + 1]).Index : 
                    //  return the end of the string
                    name.Length;

                //  create the range
                Range range = (m.Index..end);
                //  insert the part
                parts[index] = (name[range]).ToLower();
                //  increment the indexer
                ++index;
            }

            //  construct property name
            return string.Join("_", parts);
        }
    }
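
The range slicing used above can be seen in isolation in the following sketch (RangeDemo and Slice are hypothetical names). It assumes C# 8 or later, where strings can be indexed with a Range; the start is inclusive and the end is exclusive:

```csharp
using System;

public static class RangeDemo
{
    // Returns the characters from start (inclusive) to end (exclusive).
    public static string Slice(string text, int start, int end)
    {
        Range range = start..end;
        return text[range];
    }
}
```

RangeDemo.Slice("CustomerName", 0, 8) returns "Customer", and RangeDemo.Slice("CustomerName", 8, 12) returns "Name".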

Note: I could use StringBuilder as some people will likely prefer. I don't anticipate performance problems as this is a one and done scenario.

That being said, if I needed to serialize tons of data to go across the wire, I would likely forgo this process altogether and design my properties with the desired naming convention.

For completeness, here is the source class:

    // trimmed to the necessary bits for brevity
    public class LaunchParameters : ILoadable {
        #region     properties
        [JsonIgnore]
        string ILoadable.Directory { get; } = CONFIG_DIR;
        [JsonIgnore]
        string ILoadable.FileName { get; } = CONFIG_FILE;
        public Size WindowSize { get; set; } = new(1024, 768);
        public string Title { get; init; } = "GLX Game";
        [JsonIgnore]
        public string Application => Title.Replace(" ", "_");
        public string Label { get; init; }
        public Version Version { get; init; }
        [JsonIgnore]
        public string WindowTitle => $"{Title} Window";
        public string LogPath { get; init; } = @".\.logs";
        public string CrashLogPath { get; init; } = @".\.crash_logs";
        #endregion  properties
    }

... and the resulting JSON:

{
  "window_size": {
    "is_empty": false,
    "width": 1024,
    "height": 768
  },
  "title": "GLX Game",
  "label": null,
  "version": null,
  "log_path": ".\\.logs",
  "crash_log_path": ".\\.crash_logs"
}