开发者

Any Framework functions helping to find the longest common starting substring of multiple strings?

开发者 https://www.devze.com 2023-01-17 12:18 出处:网络
I have a list of strings (which represent paths and) which should all have a common beginning (root path). I need to get that common beginning.

I have a list of strings (which represent paths and) which should all have a common beginning (root path). I need to get that common beginning.

That's just a couple of lines to throw together, but I have the nagging feeling that this must be thrown together a million times a year and that there might be an algorithm in the framework that c开发者_如何学JAVAan be used for that, but couldn't find something.

Also, I suppose this has been asked on SO before, but I came up dry.

Any hints?


If anyone is interested, here's what I came up with:

    public static string GetCommonStartingSubString(IList<string> strings)
    {
        if (strings.Count == 0)
            return "";
        if (strings.Count == 1)
            return strings[0];
        int charIdx = 0;
        while (IsCommonChar(strings, charIdx))
            ++charIdx;
        return strings[0].Substring(0, charIdx);
    }
    private static bool IsCommonChar(IList<string> strings, int charIdx)
    {
        if(strings[0].Length <= charIdx)
            return false;
        for (int strIdx = 1; strIdx < strings.Count; ++strIdx)
            if (strings[strIdx].Length <= charIdx 
             || strings[strIdx][charIdx] != strings[0][charIdx])
                return false;
        return true;
    }


This method should work:

string GetLongestCommonPrefix(IEnumerable<string> items)
{
    return items.Aggregate(default(string), GetLongestCommonPrefix);
}

string GetLongestCommonPrefix(string s1, string s2)
{
    if (s1 == null || s2 == null)
        return s1 ?? s2;

    int n = Math.Min(s1.Length, s2.Length);
    int i;
    for (i = 0; i < n; i++)
    {
        if (s1[i] != s2[i])
            break;
    }
    return s1.Substring(0, i);
}


Excuse my ordinary variable naming, and it's not very fast, but this should do:

// your list of strings...
List<string> strings;    

string shortestString = strings.First(x => x.Length == 
    strings.Select(y => y.Length).Min());
while (!strings.All(s => s.StartsWith(shortestString)))
{
    shortestString = shortestString.Substring(0, shortestString.Length - 1);
}


One idea to simplify your implementation is to write just a method to get the longest substring of two strings and then use Aggregate method from LINQ. Something like:

strings.Skip(1).Aggregate(strings.First(), GetCommonSubString);

I don't think there is any elegant way to implement GetCommonSubstring using standard methods for working with strings. If you care about performance, then you'll probably have to implement it in the "direct" way. A slower, but shorter alternative using LINQ could look something like this:

var chars = 
  str1.Zip(str2, (c1, c2) => new { Match = c1 == c2, Char = c1 })
      .TakeWhile(c => c.Match).Select(c => c.Char).ToArray();
return new string(chars);

This first "zips" the two strings and then takes parts where the characters are the same using TakeWhile. The rest generates an array of characters that can be used to create a string with the result.


Maybe I simplify your problem too much but what about

var rootPath = paths.Select(s => new {path = s, depth = s.Split('\\').Length}). Aggregate((memo, curr) => curr.depth < memo.depth ? curr : memo).path;

Desperate, most probably slow, and all around pretty silly try

var paths = new List<string> { @"C:\Ruby19\lib\ruby\gems",
                               @"C:\Ruby19\lib\ruby\gems\1.9.2",
                               @"C:\Ruby19\lib\ruby\gems",
                               @"C:\Ruby19\lib\test\fest\hest"};

var rootPath = paths.Select(s => new { p = s.Split('\\') })
                    .Aggregate((memo, curr) => new { p = curr.p.TakeWhile((stp, ind) => stp == memo.p.ElementAtOrDefault(ind)).ToArray() })
                    .p.Join("\\");

=> rootPath = "C:\Ruby19\lib"


I had the same problem (like many others) some time ago. Here is the solution i came up with this. I didn't make any performance measurements but i hadn't any problems with lists of 100 elements.

using System;
using System.Collections.Generic;
using System.Linq;

namespace FEV.TOPexpert.Common.Extensions
{
    public static class IEnumerableOfStringExtension
    {
        /// <summary>
        /// Finds the most common left string in a sequence of strings.
        /// </summary>
        /// <param name="source">The sequence to search in.</param>
        /// <returns>The most common left string in the sequence.</returns>
        public static string MostCommonLeftString(this IEnumerable<string> source)
        {
            return source.MostCommonLeftString(StringComparison.InvariantCulture);
        }

        /// <summary>
        /// Finds the most common left string in a sequence of strings.
        /// </summary>
        /// <param name="source">The sequence to search in.</param>
        /// <param name="comparisonType">Type of the comparison.</param>
        /// <returns>The most common left string in the sequence.</returns>
        public static string MostCommonLeftString(this IEnumerable<string> source, StringComparison comparisonType)
        {
            if (source == null)
                throw new ArgumentNullException("source");

            string mcs = String.Empty;

            using (var e = source.GetEnumerator())
            {
                if (!e.MoveNext())
                    return mcs;

                mcs = e.Current;
                while (e.MoveNext())
                    mcs = mcs.MostCommonLeftString(e.Current, comparisonType);
            }
            return mcs;
        }

        /// <summary>
        /// Returns a sequence with the most common left strings from a sequence of strings.
        /// </summary>
        /// <param name="source">A sequence of string to search through.</param>
        /// <returns>A sequence of the most common left strings ordered in descending order.</returns>
        public static IEnumerable<string> MostCommonLeftStrings(this IEnumerable<string> source)
        {
            return MostCommonLeftStrings(source, StringComparison.InvariantCulture);
        }

        /// <summary>
        /// Returns a sequence with the most common left strings from a sequence of strings.
        /// </summary>
        /// <param name="source">A sequence of string to search through.</param>
        /// <param name="comparisonType">Type of comparison.</param>
        /// <returns>A sequence of the most common left strings ordered in descending order.</returns>
        public static IEnumerable<string> MostCommonLeftStrings(this IEnumerable<string> source, StringComparison comparisonType)
        {
            if (source == null)
                throw new ArgumentNullException("source");

            var listOfMcs = new List<string>();

            using (var e = source.GetEnumerator())
            {
                while (e.MoveNext())
                {
                    if (e.Current == null)
                        continue;

                    string removeFromList = String.Empty;
                    string addToList = String.Empty;

                    foreach (var element in listOfMcs)
                    {
                        addToList = e.Current.MostCommonLeftString(element, comparisonType);

                        if (addToList.Length > 0)
                        {
                            removeFromList = element;
                            break;
                        }
                    }

                    if (removeFromList.Length <= 0)
                    {
                        listOfMcs.Add(e.Current);
                        continue;
                    }

                    if (addToList != removeFromList)
                    {
                        listOfMcs.Remove(removeFromList);
                        listOfMcs.Add(addToList);
                    }
                }
            }

            return listOfMcs.OrderByDescending(item => item.Length);
        }

        /// <summary>
        /// Returns a string that both strings have in common started from the left.
        /// </summary>
        /// <param name="first">The first string.</param>
        /// <param name="second">The second string.</param>
        /// <returns>Returns a string that both strings have in common started from the left.</returns>
        public static string MostCommonLeftString(this string first, string second)
        {
            return first.MostCommonLeftString(second, StringComparison.InvariantCulture);
        }

        /// <summary>
        /// Returns a string that both strings have in common started from the left.
        /// </summary>
        /// <param name="first">The first string.</param>
        /// <param name="second">The second string.</param>
        /// <param name="comparisonType">Type of comparison.</param>
        /// <returns>Returns a string that both strings have in common started from the left.</returns>
        public static string MostCommonLeftString(this string first, string second, StringComparison comparisonType)
        {
            if (first == null
                || second == null)
                return null;

            int length = Math.Min(first.Length, second.Length);
            first = first.Substring(0, length);
            second = second.Substring(0, length);

            while (!first.Equals(second, comparisonType))
            {
                first = first.Substring(0, first.Length - 1);
                second = second.Substring(0, second.Length - 1);
            }

            return first;
        }

        private static bool MatchesWithList(string match, IList<string> elements, StringComparison comparisonType)
        {
            string removeFromList = String.Empty;
            string addToList = String.Empty;

            foreach (var element in elements)
            {
                addToList = match.MostCommonLeftString(element, comparisonType);

                if (addToList.Length > 0)
                {
                    removeFromList = element;
                }
            }

            if (removeFromList.Length > 0)
            {
                if (addToList != removeFromList)
                {
                    elements.Remove(removeFromList);
                    elements.Add(addToList);
                }
                return true;
            }

            return false;
        }
    }
}


The following returns the longest common prefix of any set of IEnumerable<T> not just strings.

public static bool Same<T>(this IEnumerable<T> xs) {
  return !xs.Any() || !xs.Skip(!xs.Skip(1).All(x => x.Equals(xs.First()));
}

public static IEnumerable<T> CommonPrefix<T>(this IEnumerable<IEnumerable<T>> xss) {
  var r = new List<T>();
  var es = xss.Select(x => x.GetEnumerator()).ToList();
  while (es.Select(x => x.MoveNext()).All(x => x))
     if (!es.Select(x => x.Current).Same())
        return r;
     return r;
  }
}
0

精彩评论

暂无评论...
验证码 换一张
取 消