I have a MSSQL 2005 table:
[Companies](
[CompanyID] [int] IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](128),
[开发者_高级运维Description] [nvarchar](256),
[Keywords] [nvarchar](256)
)
I want to generate a tag cloud for this companies. But I've saved all keywords in one column separated by commas. Any suggestions for how to generate tag cloud by most used keywords. There could be millions of companies approx ten keywords per company.
Thank you.
Step 1: separate the keywords into a proper relation (table).
CREATE TABLE Keywords (KeywordID int IDENTITY(1,1) NOT NULL
, Keyword NVARCHAR(256)
, constraint KeywordsPK primary key (KeywordID)
, constraint KeywordsUnique unique (Keyword));
Step 2: Map the many-to-many relation between companies and tags into a separate table, like all many-to-many relations:
CREATE TABLE CompanyKeywords (
CompanyID int not null
, KeywordID int not null
, constraint CompanyKeywords primary key (KeywordID, CompanyID)
, constraint CompanyKeyword_FK_Companies
foreign key (CompanyID)
references Companies(CompanyID)
, constraint CompanyKeyword_FK_Keywords
foreign key (KeywordID)
references Keywords (KeywordID));
Step 3: Use a simple GROUP BY query to generate the 'cloud' (by example taking the 'cloud' to mean the most common 100 tags):
with cte as (
SELECT TOP 100 KeywordID, count(*) as Count
FROM CompanyKeywords
group by KeywordID
order by count(*) desc)
select k.Keyword, c.Count
from cte c
join Keyword k on c.KeywordID = k.KeywordID;
Step 4: cache the result as it changes seldom and it computes expensively.
I'd much rather see your design normalized as suggested by Remus, but if you're at a point where you can't change your design...
You can use a parsing function (the example I'll use is taken from here), to parse your keywords and count them.
CREATE FUNCTION [dbo].[fnParseStringTSQL] (@string NVARCHAR(MAX),@separator NCHAR(1))
RETURNS @parsedString TABLE (string NVARCHAR(MAX))
AS
BEGIN
DECLARE @position int
SET @position = 1
SET @string = @string + @separator
WHILE charindex(@separator,@string,@position) <> 0
BEGIN
INSERT into @parsedString
SELECT substring(@string, @position, charindex(@separator,@string,@position) - @position)
SET @position = charindex(@separator,@string,@position) + 1
END
RETURN
END
go
create table MyTest (
id int identity,
keywords nvarchar(256)
)
insert into MyTest
(keywords)
select 'sql server,oracle,db2'
union
select 'sql server,oracle'
union
select 'sql server'
select k.string, COUNT(*) as count
from MyTest mt
cross apply dbo.fnParseStringTSQL(mt.keywords,',') k
group by k.string
order by count desc
drop function dbo.fnParseStringTSQL
drop table MyTest
Both Remus and Joe are correct but yes as what Joe said if you dont have a choice then you have to live with it. I think I can offer you an easy solution by using an XML Data Type. You can already easily view the parsed column by doing this query
WITH myCommonTblExp AS (
SELECT CompanyID,
CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
FROM Companies
)
SELECT CompanyID, RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)
now knowing that you can do that, all you have to do is to group them and count, but you cannot group XML methods so my suggestion is create a view of the query above
CREATE VIEW [dbo].[DissectedKeywords]
AS
WITH myCommonTblExp AS (
SELECT
CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
FROM Companies
)
SELECT RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)
GO
and perform your count on that view
SELECT Keywords, COUNT(*) AS KeyWordCount FROM DissectedKeywords
GROUP BY Keywords
ORDER BY Keywords
Anyways here is the full article -->http://anyrest.wordpress.com/2010/08/13/converting-parsing-delimited-string-column-in-sql-to-rows/
精彩评论