I have 2 databases that have the same structure, but different data. Both are SQL 2005.
I am trying to find which of the Persons in Database A, exist in Database B. My best opportunity for match is to match on FirstName and LastName.
I only want to bring back a list of:
DatabaseA.Person DatabaseB.Person
Where: 1. I want all records from DatabaseA, even if there is not a match in Database B. 2. I on开发者_如何学编程ly want records from DatabaseB where the FirstName/LastName match only one record in DatabaseB.
I have written a query, where I group by, but since I need to see more data than FirstName and LastName, I cannot bring it back without grouping it - which gives me many duplicates. What kind of query should I be using? Do I need to use a cursor?
Here is my query now, which sort of works - except I'm getting results for duplicates in DatabaseB and all I want to know about Database B is when FirstName/LastName matches to one distinct record and no others. My objective is to get a list of people that I know are the same person in 2 databases so that I can build a dictionary list of department code mappings between employees.
select
count(DatabaseAEmployee.id) as matchcount
, DatabaseAPerson.id as DatabaseAPersonid
, DatabaseAEmployee.DeptCode DatabaseADeptCode
, DatabaseAPerson.firstname as DatabaseAfirst
, DatabaseAPerson.lastname as DatabaseAlast
, DatabaseBPerson.id as DatabaseBPersonid
, DatabaseBEmployee.DeptCode as DatabaseBDeptCode
, DatabaseBPerson.firstname as DatabaseBfirst
, DatabaseBPerson.lastname as DatabaseBlast
, DatabaseAPerson.ssn as DatabaseAssn
, DatabaseBPerson.ssn as DatabaseBssn
, DatabaseAPerson.dateofbirth as DatabaseAdob
, DatabaseBPerson.dateofbirth as DatabaseBdob
FROM [DatabaseA].[dbo].Employee DatabaseAEmployee
LEFT OUTER JOIN [DatabaseA].[dbo].Person DatabaseAPerson
ON DatabaseAPerson.id = DatabaseAEmployee.id
LEFT OUTER JOIN [DatabaseB].[dbo].Person DatabaseBPerson
ON
DatabaseAPerson.firstname = DatabaseBPerson.firstname
AND
DatabaseAPerson.lastname = DatabaseBPerson.lastname
LEFT OUTER JOIN [DatabaseB].[dbo].Employee DatabaseBEmployee
on DatabaseBEmployee.id = DatabaseBPerson.id
group by
DatabaseAPerson.firstname
, DatabaseAPerson.lastname
, DatabaseAPerson.id
, DatabaseAEmployee.DeptCode
, DatabaseBPerson.id
, DatabaseBEmployee.DeptCode
, DatabaseBPerson.firstname
, DatabaseBPerson.lastname
, DatabaseBPerson.ssn
, DatabaseAPerson.ssn
, DatabaseBPerson.dateofbirth
, DatabaseAPerson.dateofbirth
Here's what I'm trying now, but I'm getting duplicates on the left side:
with UniqueMatchedPersons (Id, FirstName, LastName)
as (
select
p2.ID, p2.FirstName, p2.LastName
from
[DatabaseA].[dbo].[Employee] p1
INNER JOIN [DatabaseA].[dbo].[Person] p2 on p1.id = p2.id
inner join [DatabaseB].[dbo].[Person] p3
on p2.FirstName = p3.FirstName and p2.LastName = p3.LastName
INNER JOIN [DatabaseB].[dbo].[Employee] p4
on p3.id = p4.id
group by p2.ID, p2.FirstName, p2.LastName
having count(p2.ID) = 1
)
select p1.*, p2.*
from DatabaseA.dbo.Person p1
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join DatabaseB.dbo.Person p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
Try this:
SELECT id,FirstName,Lastname
FROM dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName
FROM dbb.Persons as b
INNER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName
If you want to get all from A and only those from B that DON'T have a match (which would make more sense to me) i'd use this:
SELECT id,FirstName,Lastname
FROM dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName
FROM dbb.Persons as b
LEFT OUTER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName
WHERE a.id is null
Try something like:
Select dta.LastName, dta.FirstName, dta.[otherColumns] dtb.LastName, dtb.FirstName
dtb.[otherColumns]
From [databaseA].[table] as dta
LEFT OUTER JOIN [databaseB].[table] as dtb
on dta.Lastname = dtb.LastName and dta.FirstName = dtb.FirstName
That should get you: 1) everyone in table A, and 2) everyone in table B who is has a Lastname/Firstname match in table A.
Works when SQL Server (at least it should)
SELECT
A.*
, B.*
FROM
DatabaseA.dbo.Person A
LEFT JOIN DatabaseB.dbo.Person B
ON A.FirstName = B.FirstName AND A.LastName = B.LastName
Edit: You mention you receive duplicates from DatabaseB where you only need the match on first and lastname. But you also request other data (then first/lastname) this is the problem. If you distinct data they you only request that data.
Using transact-sql, the following untested query should allow you to view unique matches only:
select
p1.ID, p1.FirstName, p1.LastName
from
[DatabaseA].[dbo].[Persons] p1
left outer join [DatabaseB].[dbo].[Persons] p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
group by p1.ID, p1.FirstName, p2.LastName
having count(p1.ID) = 1
If using Sql Server, this can then be encapsulated within a common table expression, to which you can perform a join.
with UniqueMatchedPersons (Id, FirstName, LastName)
as (
--query in previous code snippet
)
select persons.*
from Persons
inner join UniqueMatchedPersons on Persons.ID = UniqueMatchedPersons.ID
Update:
If you wish to select fields from both tables, you can simply respecify the original join condition that evaluated name matching before; this is because duplicated matches on the left hand side of the join have been filtered out by the having
aggregate condition.
Modifying the select
portion of the above snippet to read the following will allow you to select fields from either side of the join:
select p1.*, p2.*
from [DatabaseA].[dbo].[Persons] p1
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join [DatabaseB].[dbo].[Persons] p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
Update 2:
To filter out duplicates on the left hand side (which will also cause duplicates on the right) you'll have to remove the grouping on [DatabaseA].[dbo].[Persons].[ID]
.
When I refer to duplicates, I mean names in adjacent rows that are identical in terms of characters and padding. If you have diacritic variations of first and last names, then the results of the name comparison will be subject to the database collation (unless you explicity declare a collation on a join expression). Likewise if you have variations in spacing, padding or punctuation between names, you may have to consider a different approach than a direct equality operator for name matching.
Try the following:
with UniqueMatchedPersons (FirstName, LastName)
as (
select
p1.FirstName, p1.LastName
from
[DatabaseA].[dbo].[Person] p1
left outer join [DatabaseB].[dbo].[Person] p2
on p2.FirstName = p3.FirstName and p2.LastName = p3.LastName
group by p1.FirstName, p1.LastName
having count(p1.FirstName) = 1
)
select p1.*, p2.*, e1.*, e2.*
from [DatabaseA].[dbo].[Person] p1
inner join UniqueMatchedPersons ump
on p1.FirstName = ump.FirstName and p1.LastName = ump.LastName
left outer join [DatabaseB].[dbo].[Person] p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
inner join [DatabaseA].[dbo].[Employee] e1 on p1.ID = e1.ID
inner join [DatabaseB].[dbo].[Employee] e2 on e2.ID = p2.ID
order by p1.id asc
精彩评论