开发者

concatenate multi values in one record without duplication

开发者 https://www.devze.com 2022-12-23 00:56 出处:网络
I have a dbf table like below which is the result of one to many join from two tables. I want to have unique zone values from one Taxlot id field.

I have a dbf table like below which is the result of one to many join from two tables. I want to have unique zone values from one Taxlot id field.

table name: input table

tid ----- zone

1 ------ A
1 ------ A
1 ------ B
1 ------ C
2 ------ D
2 ------ E
3 ------ C

Desirable output table table name: input table

tid ----- zone

1 ------ A, B, C
2 ------ D, E
3 ------ C

I got some help but couldn't make it to work.

inputTbl = r"C:\temp\input.dbf"
taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
   if searchRow.TID in taxIdZoningDict:
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   else:
      taxIdZoningDict[searchRow.TID] = set() #a set prevents dulpicates!
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   searchRow = searchRows.next()

outputTbl = r"C:\temp\output.dbf"
gp.CreateTable_management(r"C:\temp", "output.dbf")
gp.AddField_management(outputTbl, "TID", "LONG")
gp.AddField_management(outputTbl, "ZONES", "TE开发者_Go百科XT", "", "", "20")
tidList = taxIdZoningDict.keys()
tidList.sort() #sorts in ascending order
insertRows = gp.insertcursor(outputTbl)
for tid in tidList:
   concatString = ""
   for zone in taxIdZoningDict[tid]
      concatString = concatString + zone + ","
   insertRow = insertRows.newrow()
   insertRow.TID = tid
   insertRow.ZONES = concatString[:-1]
   insertRows.insertrow(insertRow)
del insertRow
del insertRows


I would use my dbf module and defaultdict to greatly simplify that code:

import dbf
from collections import defaltdict

inputTbl = dbf.Table(r'c:\temp\input.dbf')
taxIdZoning = defaultdict(set)

for record in inputTbl:
    taxIdZoning[record.tid].add(record.zone)
inputTbl.close()

outputTbl = dbf.Table(r'c:\temp\output.dbf', 'tid N(17.0), zones C(20)')
for tid in sorted(taxIdZoning):
    record = outputTbl.append()
    record.tid = tid
    record.zones = ','.join(sorted(taxIdZoning[tid]))
outputTbl.close()

NOTE: field names are lowercase, and I wasn't sure exactly how to represent LONG, but hopefully 17 digits is enough. :) My apologies for any bugs -- hard to test without input files.


This worked for me using both Microsoft Access VBA and Microsoft Excel VBA. It isn't very efficient code, but it works. I was able to open the resultant file in both Access and Excel.

Set the sDBF* and sOutDBF* variables to tailor this to your own custom paths.

Sub VBASolution()
    Dim oRS
    Dim sConn
    Dim sDBFPath, sOutDBFPath
    Dim sDBFName, sOutDBFName
    Dim oDict
    Dim curTID, curZone, sZones
    Dim oConn
    Dim oFS
    Dim sTableName

    sDBFPath = "C:\Path\To\DBFs\"
    sDBFName = "Input.dbf"

    sOutDBFPath = "C:\Path\To\DBFs\"
    sOutDBFName = "RESULTS.dbf"

    sConn = "Driver={Microsoft dBASE Driver (*.dbf)}; DriverID=277; Dbq="
    Set oRS = CreateObject("ADODB.Recordset")


    oRS.Open "SELECT DISTINCT tid, zone FROM " & sDBFName, sConn & sDBFPath

    Set oDict = CreateObject("Scripting.Dictionary")

    Do While Not oRS.EOF
        curTID = oRS.Fields("tid").Value
        curZone = oRS.Fields("zone").Value
        If Not oDict.Exists(curTID) Then
            oDict.Add curTID, CreateObject("Scripting.Dictionary")
        End If
        If Not oDict(curTID).Exists(curZone) Then
            oDict(curTID).Add curZone, curZone
        End If
        oRS.MoveNext
    Loop
    oRS.Close

    Set oRS = Nothing

    Set oConn = CreateObject("ADODB.Connection")
    oConn.Open sConn & sOutDBFPath

    'Delete the resultant DBF file if it already exists.
    Set oFS = CreateObject("Scripting.FileSystemObject")
    With oFS
        If .FileExists(sOutDBFPath & "\" & sOutDBFName) Then
            .DeleteFile sOutDBFPath & "\" & sOutDBFName
        End If
    End With

    sTableName = oFS.GetBaseName(sOutDBFName)

    oConn.Execute "CREATE TABLE " & sTableName & " (tid int, zone varchar(80))"

    Dim i, j
    For Each i In oDict.Keys
        curTID = i
        sZones = ""
        For Each j In oDict(i)
            sZones = sZones & "," & j
        Next
        sZones = Mid(sZones, 2)
        oConn.Execute "INSERT INTO " & sTableName & " (tid, zone) VALUES ('" & curTID & "','" & sZones & "')"
    Next
    oConn.Close

    Set oConn = Nothing
    Set oDict = Nothing
    Set oFS = Nothing
End Sub

EDIT: For what it's worth, this also worked for me by inserting it into a VBScript .VBS file (text) in Windows XP and adding this line to the bottom of the file:

Call VBASolution()

I do not know whether or not it requires Office to be installed or if the appropriate dbf drivers come with Windows.


I think Morlock's answer does not satisfy the requirement of dropping duplicates. I would use a defaultdict(set), which will automatically omit dups, instead of defaultdict(list), and thus .add() instead of .append().


instead of this:

taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
   if searchRow.TID in taxIdZoningDict:
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   else:
      taxIdZoningDict[searchRow.TID] = set() #a set prevents dulpicates!
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   searchRow = searchRows.next()

do this:

zones = {}
for row in gp.searchcursor(inputTbl):
    zones.setdefault(row.TID, set())
    zones[row.TID].add(row.ZONE)

much more pythonic, with the same result ;-)

And then to output:

for k, v in zones:
    print k, ", ".join(v)


Here is a quickly made code in Python that may suit your needs, with minimal fidgeting.

import collections

d = collections.defaultdict(list)

with open("input_file.txt") as f:   
    for line in f:
        parsed = line.strip().split()
        print parsed
        k = parsed[0]
        v = parsed[2]
        d[k].append(v)

for k, v in sorted(d.iteritems()):
    s = " ----- "
    v = list(set(v)) # Must be a library function to do this
    v.sort()
    print k, s,
    for j in v:
        print j,
    print

Hope this helps


The OP wanted commas in the zone column. Might change the output piece of Morlock's code slightly to get those commas and perhaps be a little clearer by using this single line for the output instead of an explicit loop over v:

    print k, s, ",".join(v)

That does pack more into one line (possibly a negative). The use of join in this way is pretty common in python, and IMHO expresses the intent more clearly (and is easier to digest when reading) than the explicit loop.

0

精彩评论

暂无评论...
验证码 换一张
取 消