I have a table that allows records to be inserted which are very similar but differ by a single column value or two. For example the following records are in the table:
ID 开发者_开发技巧 TITLE URL COUNTRY
1494 Hollywood Reporter http://www.hollywoodreporter.com USA
1497 Hollywood Reporter via Drudge Report http://www.hollywoodreporter.com NULL
2158 Hollywood Reporter via http://www.hollywoodreporter.com NULL
I would like to update the country column in the last two records where the url is the same. Also I would also like to know how to sort my table by url column so that all duplicate urls are grouped together or even if they are similar as you have in some cases like:
http://www.hollywoodreporter.com http://www.blog.hollywoodreporter.com
Thanks in advance.
Tried the following and it worked where the urls were the same
UPDATE t1
SET t1.country = t2.country FROM Sources AS t1 JOIN sources AS t2 ON t1.url = t2.url WHERE t1.url= t2.url;
just got to figure out the rest. thanks everyone
UPDATE
I was able to edit the records that had matching urls but for the ones that are similar like http://www.pantagraph.com http://pantagraph.com http://pantagraph.com/titles
were not updated. When using the sort and select I can view all these records but when trying to update them it does not work. I even just tried this simple version of:
select * from Sources s
where s.url like url
which shows the records but when updating it did not work.
update Sources
set country = s.country
from Sources s
where s.url like url
You could use a subquery:
update yt
set country = (
select distinct country
from YourTable yt2
where yt.url = yt2.url
and yt2.country is not null
)
from YourTable yt
This should give an error if there is an URL with different countries: in that case you should adapt the query to choose one of the countries.
Per your comment, to find rows with conflicting countries:
select url
, count(distinct country) as NumberOfCountries
from YourTable
where country is not null
group by
url
, country
having count(distinct country) > 1
For sorting the URL, try adding another column for just "domain name". You would extract the domain name (hollywoodreporter.com), store it, and sort on that column. Alternatively, you could use regular expressions in your ORDER BY clause, but that could lead to performance problems.
update table
set
table.country = t2.country
from
table t2
where
table.url = t2.url
and t2.country is not null
You're joining the table to itself based on url and updating from rows that have something in the country field.
But, if you've got that much duplicated data, your design is probably bad. Try normalizing the db if you can.
精彩评论