Make a usable Join relationship with LINQ on top of a database CSV design error

Developer https://www.devze.com 2022-12-30 09:43 Source: Internet
I'm looking for a way to fix and/or abstract away a comma-separated values (CSV) list in a database field in order to reconstruct a usable relationship, so that I can properly join the two tables below and query them using C# LINQ and its .Join method.

The sample below shows the Person table, whose CsvArticleIds field holds a CSV value that represents a one-to-many association with Article records.

TABLE [dbo].[Person]

Id Name       CsvArticleIds
-- ---------- --------
1  Joe        "15,22"
5  Ed         "22"
10 Arnie      "8,15,22"

^^^(Of course a link table should have been created; nonetheless the relationship with articles is trapped inside that list of CSV values.)

TABLE [dbo].[Article]

Id Title
-- ----------
8  Beginning C#
15 A Historic look at Programming in the 90s
22 Gardening in January

Additional Info

  • the fix can be at any level: C#.NET or SQL Server
  • something easy because I will be repeating the solution for many other CSV values in other tables.
  • Elegant is nice too.
  • not looking for efficiency because this is part of a one-time data migration task and can take as long as it wants to run.


I would fix this at the table level using SQL. I'd create a new table with the person Id and an article Id in it. After populating this new table, I'd drop the Person.CsvArticleIds column. You will then have a normalized table structure to store articles for people.

You'll need to split that CsvArticleIds string. There are many ways to split a string in SQL Server. This article covers the pros and cons of just about every method:

"Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog

You need to create a split function. This is how a split function can be used:

SELECT
    *
    FROM YourTable                               y
    INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID=s.Value

I prefer the Numbers table approach to splitting a string in T-SQL, but there are numerous other ways to split strings in SQL Server; the article cited above explains the pros and cons of each.

For the Numbers table method to work, you need to do this one-time table setup, which creates a table named Numbers containing rows from 1 to 10,000:

SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO Numbers
    FROM sys.objects s1
    CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)

Once the Numbers table is set up, create this function:

CREATE FUNCTION [dbo].[FN_ListToTable]
(
     @SplitOn  char(1)      --REQUIRED, the character to split the @List string on
    ,@List     varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN 
(

    ----------------
    --SINGLE QUERY-- --this will not return empty rows
    ----------------
    SELECT
        ListValue
        FROM (SELECT
                  LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
                  FROM (
                           SELECT @SplitOn + @List + @SplitOn AS List2
                       ) AS dt
                      INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
                  WHERE SUBSTRING(List2, number, 1) = @SplitOn
             ) dt2
        WHERE ListValue IS NOT NULL AND ListValue!=''

);
GO 

You can now easily split a CSV string into a table and join on it:

select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')

OUTPUT:

ListValue
-----------------------
1
2
3
4
5
6777

(6 row(s) affected)

To make what you need work, use CROSS APPLY:

DECLARE @YourTable table (Id int, Name varchar(10), CsvArticleIds varchar(500))
INSERT @YourTable VALUES (1  ,'Joe'        ,'15,22')
INSERT @YourTable VALUES (5  ,'Ed'         ,'22')
INSERT @YourTable VALUES (10 ,'Arnie'      ,'8,15,22')

DECLARE @YourTableNormalized table (Id int, ArticleId int)

    INSERT INTO @YourTableNormalized
        (Id, ArticleId)
    SELECT 
        y.Id, st.ListValue
        FROM @YourTable y 
            CROSS APPLY  dbo.FN_ListToTable(',',y.CsvArticleIds) AS st
        ORDER BY st.ListValue

SELECT * FROM @YourTableNormalized ORDER BY Id,ArticleId

OUTPUT:

Id          ArticleId
----------- -----------
1           15
1           22
5           22
10          8
10          15
10          22

(6 row(s) affected)
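
Once the rows are normalized like this, the join the question asks for is just two ordinary .Join calls. Here is a minimal C# sketch using in-memory sample data copied from the tables above. The variable names (persons, personArticles, articles) are hypothetical; in a real project they would be the mapped tables on your data context, not arrays.

```csharp
using System;
using System.Linq;

// Sample rows copied from the question; in practice these come from the database.
var persons = new[]
{
    new { Id = 1, Name = "Joe" },
    new { Id = 5, Name = "Ed" },
    new { Id = 10, Name = "Arnie" },
};
var articles = new[]
{
    new { Id = 8, Title = "Beginning C#" },
    new { Id = 15, Title = "A Historic look at Programming in the 90s" },
    new { Id = 22, Title = "Gardening in January" },
};
// Hypothetical link rows, matching the normalized output shown above.
var personArticles = new[]
{
    new { PersonId = 1, ArticleId = 15 },
    new { PersonId = 1, ArticleId = 22 },
    new { PersonId = 5, ArticleId = 22 },
    new { PersonId = 10, ArticleId = 8 },
    new { PersonId = 10, ArticleId = 15 },
    new { PersonId = 10, ArticleId = 22 },
};

// Person -> link row -> Article, using two .Join calls.
var result = persons
    .Join(personArticles, p => p.Id, pa => pa.PersonId,
          (p, pa) => new { p.Name, pa.ArticleId })
    .Join(articles, x => x.ArticleId, a => a.Id,
          (x, a) => new { x.Name, a.Title })
    .ToList();

foreach (var row in result)
    Console.WriteLine($"{row.Name}: {row.Title}");
```

Because the link rows carry both keys, the same pattern should apply to any other CSV column once it has been split into its own link table.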


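Transform the Person table into something more useful first, for example:

var newpersons =
    data.Persons
        .AsEnumerable() // continue in LINQ to Objects; Split may not translate to SQL
        .Select(p => new
        {
          Id = p.Id,
          Name = p.Name,
          // Trim('"') strips surrounding quotes only if they are actually stored in the field
          ArticleIds = p.CsvArticleIds.Trim('"').Split(',').ToList()
        });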

Now you can join against each person's ArticleIds collection.
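
One way to do that join is to flatten each person's ID list into one row per article with SelectMany, then call .Join against the articles. The following is a rough sketch only, run against in-memory copies of the sample data. It assumes the CSV values are stored without surrounding quotes, and the names links and joined are hypothetical.

```csharp
using System;
using System.Linq;

// Sample rows copied from the question (CSV values assumed to be stored without quotes).
var persons = new[]
{
    new { Id = 1, Name = "Joe", CsvArticleIds = "15,22" },
    new { Id = 5, Name = "Ed", CsvArticleIds = "22" },
    new { Id = 10, Name = "Arnie", CsvArticleIds = "8,15,22" },
};
var articles = new[]
{
    new { Id = 8, Title = "Beginning C#" },
    new { Id = 15, Title = "A Historic look at Programming in the 90s" },
    new { Id = 22, Title = "Gardening in January" },
};

// Expand each person into one (Name, ArticleId) row per CSV entry.
var links = persons.SelectMany(
    p => p.CsvArticleIds
          .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
          .Select(s => new { p.Name, ArticleId = int.Parse(s.Trim()) }));

// Join the flattened rows to the articles on the article Id.
var joined = links
    .Join(articles, l => l.ArticleId, a => a.Id,
          (l, a) => new { l.Name, a.Title })
    .ToList();

foreach (var row in joined)
    Console.WriteLine($"{row.Name}: {row.Title}");
```

Parsing the IDs to int up front keeps the join keys the same type as Article.Id. If a field might contain non-numeric debris, int.TryParse would be the safer choice.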

If the entire transformed Person table cannot be held in memory, use the same .Select to transform groups of records instead, pulling Person objects out of the DB in batches of, say, 100 at a time using Skip() and Take().
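
The batching idea might look roughly like the sketch below. It uses a hypothetical in-memory stand-in for the Person table (250 generated rows exposed as an IQueryable); against a real data context the same Skip/Take calls would be applied to the mapped table. The OrderBy is an assumption added here so that pages are stable and do not overlap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the Person table: 250 hypothetical rows.
var persons = Enumerable.Range(1, 250)
    .Select(i => new { Id = i, Name = "Person " + i, CsvArticleIds = "8,15,22" })
    .AsQueryable();

const int batchSize = 100;
var batchSizes = new List<int>();

for (int skip = 0; ; skip += batchSize)
{
    // Load one page only; a stable ordering keeps Skip/Take pages from overlapping.
    var batch = persons
        .OrderBy(p => p.Id)
        .Skip(skip)
        .Take(batchSize)
        .ToList();
    if (batch.Count == 0) break;

    // Same transformation as above, applied to this page in memory.
    var transformed = batch.Select(p => new
    {
        p.Id,
        p.Name,
        ArticleIds = p.CsvArticleIds.Split(',').Select(s => int.Parse(s)).ToList()
    }).ToList();

    batchSizes.Add(transformed.Count);
    // Join or write out `transformed` here before loading the next page.
}

Console.WriteLine(string.Join(",", batchSizes)); // 100,100,50
```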
