Match first part of filepath strings

Posted on https://www.devze.com, 2023-03-02 20:19. Source: the web.
I have a simple class like the following:

class Record
{
    public Record(string fp1, string fp2, string fp3)
    {
        Filepath1 = fp1;
        Filepath2 = fp2;
        Filepath3 = fp3;
    }

    public string Filepath1 { get; private set; }
    public string Filepath2 { get; private set; }
    public string Filepath3 { get; private set; }
}

Each of these filepaths will be very similar (and very long), and they will differ only in the last few characters.

Now, I want to have several thousand of these Records in memory, and I would like these records to use up a smaller amount of RAM. So I am trying to think of ways to optimize memory usage, and here is one solution that I came up with:

class Record
{
    private string _baseFilepath;
    private string _fp1;
    private string _fp2;
    private string _fp3;
    public Record(string fp1, string fp2, string fp3)
    {
        _baseFilepath = /*get common first part of filepaths*/;
        _fp1 = /*last part of fp1*/;
        _fp2 = /*last part of fp2*/;
        _fp3 = /*last part of fp3*/;
    }

    public string Filepath1
    {
        get { return _baseFilepath + _fp1; }
    }

    public string Filepath2
    {
        get { return _baseFilepath + _fp2; }
    }

    public string Filepath3
    {
        get { return _baseFilepath + _fp3; }
    }
}

You can see that I could save a lot of RAM, especially with really long filepaths where only the last few characters are different. The question is, is there an easy way to get the common first part of the filepath?

EDIT: There could be up to 700,000 Records in memory, and the actual production class has several more filepaths. I'm trying to make the app as lightweight as possible while keeping the optimizations extremely simple.


This would do it:

public static string GetCommonStart(string fp1, string fp2, string fp3)
{
    int idx = 0;
    int minLength = Math.Min(Math.Min(fp1.Length, fp2.Length), fp3.Length);
    while (idx < minLength && fp1[idx] == fp2[idx] && fp2[idx] == fp3[idx])
       idx++;
    return fp1.Substring(0, idx);
}
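
As a rough sketch of how this helper could slot into the second Record design from the question: the constructor computes the shared prefix once and keeps only the three suffixes. The class name CompactRecord is hypothetical, the helper is repeated only so the example compiles on its own, and all three arguments are assumed to be non-null.

```csharp
using System;

// Hypothetical name; a sketch of the question's base + suffix layout.
public class CompactRecord
{
    private readonly string _baseFilepath;
    private readonly string _fp1;
    private readonly string _fp2;
    private readonly string _fp3;

    public CompactRecord(string fp1, string fp2, string fp3)
    {
        // Store the shared prefix once, then only what follows it.
        _baseFilepath = GetCommonStart(fp1, fp2, fp3);
        _fp1 = fp1.Substring(_baseFilepath.Length);
        _fp2 = fp2.Substring(_baseFilepath.Length);
        _fp3 = fp3.Substring(_baseFilepath.Length);
    }

    // Each getter rebuilds the full path, allocating a new string per call.
    public string Filepath1 { get { return _baseFilepath + _fp1; } }
    public string Filepath2 { get { return _baseFilepath + _fp2; } }
    public string Filepath3 { get { return _baseFilepath + _fp3; } }

    // Same character-by-character scan as the helper above.
    public static string GetCommonStart(string fp1, string fp2, string fp3)
    {
        int idx = 0;
        int minLength = Math.Min(Math.Min(fp1.Length, fp2.Length), fp3.Length);
        while (idx < minLength && fp1[idx] == fp2[idx] && fp2[idx] == fp3[idx])
            idx++;
        return fp1.Substring(0, idx);
    }
}
```

The trade-off is that every property read concatenates and allocates a temporary string, so this layout suits records that are stored far more often than they are read.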


You can use something like this, if this action is not performance-critical to you:

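using System;
using System.Linq;

public static class StringExtensions
{
    // Returns the longest prefix shared by a and b.
    public static string GetCommonPrefix(string a, string b)
    {
        int commonPrefixLength = 0;
        int minimumLength = Math.Min(a.Length, b.Length);

        for (int i = 0; i < minimumLength; i++)
        {
            if (a[i] != b[i])
            {
                // Stop at the first mismatch; later matches are not part of the prefix.
                break;
            }

            commonPrefixLength++;
        }

        return a.Substring(0, commonPrefixLength);
    }

    // Folds the two-string version over all inputs; throws on an empty array.
    public static string GetCommonPrefix(params string[] strings)
    {
        return strings.Aggregate(GetCommonPrefix);
    }
}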


Please consider this a supplementary answer: an alternative suggestion rather than a direct answer to your problem (direct answers have already been supplied).

If possible, I'd break these filepaths into base and suffix at the first possible opportunity and then pass them around in that form through your entire system.

This is applicable if

  • you are generating these filepaths yourself, somewhere in your system
  • you are reading files from a known & finite set of locations

You would simply have a set of base filepaths, and each Filepath references one of those base values and also contains its own suffix value.

Depending on how many base filepaths there are and how they are determined, this could be significantly more memory-efficient. Your current solution reduces memory use to about one third at best, since each group of three filepaths collapses into roughly one. A shared set of bases can do better, because a single base string serves many records. It also makes sense to represent a filepath in a consistent manner across your entire app.
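
One possible shape for this, as a minimal sketch rather than a definitive design: a small pool hands out a single string instance per distinct base, and each filepath object keeps a reference to that shared base plus its own suffix. BasePathPool and PooledFilepath are hypothetical names, and splitting at the last directory separator is an assumption; your system may know a better split point.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool: one canonical string instance per distinct base path.
public sealed class BasePathPool
{
    private readonly Dictionary<string, string> _bases =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the stored instance for this base, adding it on first use.
    public string Intern(string basePath)
    {
        string existing;
        if (_bases.TryGetValue(basePath, out existing))
            return existing;
        _bases.Add(basePath, basePath);
        return basePath;
    }

    public int Count { get { return _bases.Count; } }
}

// Hypothetical filepath type: shared base reference plus its own suffix.
public sealed class PooledFilepath
{
    private readonly string _base;
    private readonly string _suffix;

    public PooledFilepath(BasePathPool pool, string fullPath)
    {
        // Assumption: the base is everything up to and including the last separator.
        int cut = fullPath.LastIndexOfAny(new[] { '\\', '/' }) + 1;
        _base = pool.Intern(fullPath.Substring(0, cut));
        _suffix = fullPath.Substring(cut);
    }

    public string Base { get { return _base; } }
    public string Suffix { get { return _suffix; } }

    // Rebuilds the full path on demand.
    public override string ToString() { return _base + _suffix; }
}
```

With this layout the long base is stored once per distinct directory rather than once per record; the temporary substring created for an already-known base is discarded right after the lookup.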
