开发者

Need help thinking about ways to normalize some messy data

开发者 https://www.devze.com 2022-12-13 15:08 出处:网络
I have some data in an Access table. I am well-versed in Access and am using it in this instance for its report-generating qualities and ease of use for the non-tech proficient.开发者_Python百科 This

I have some data in an Access table. I am well-versed in Access and am using it in this instance for its report-generating qualities and ease of use for the non-tech proficient.开发者_Python百科 This is the only tool at my disposal. The data came to me as an .xls with a lot of junk and stuff basically all over the place. I made a bunch of queries to get rid of the junk and re-organize (which was a lot easier to do in Access than Excel).

However this is where I'm stuck. This is what I have now:

Name | Test Date | Test data  
Jane | ----------| --------  
-----| 3/10/09   | --------  
-----| --------- | 1  
-----| --------- | 2  
-----| --------- | 3  
John | ----------| ------  
-----| 3/12/09   | ------   
-----| --------- | 3  
-----| --------- | 1  
-----| --------- | 5  
-----| 3/13/09   | ------  
-----| --------- | 2  
-----| --------- | 5  
-----| --------- | 7  

This is what I'd like to have:

Name | Test Date | Test data  
Jane | 3/10/09   | 1  
Jane | 3/10/09   | 2   
Jane | 3/10/09   | 3  
John | 3/12/09   | 3  
John | 3/12/09   | 1  
John | 3/12/09   | 5  
John | 3/13/09   | 2  
John | 3/13/09   | 5  
John | 3/13/09   | 7  

Some of the names have more than one test under them. How many or which names are impossible to predict, although all tests have exactly 12 rows of data. But you need to SEE the data to know which names and dates go with which data.

I know ideally you would have Names and Test dates in their own table, but I'm trying to make this a one-table deal for ease of use by people who are not me who will be ok with importing and running the queries (which is basically one click using macros) but not much else.

I can make and run a bunch of queries and forms to do this, but I can't seem to figure out which ones. Does anyone have any ideas?


I'm trying to make this a one-table deal for ease of use by people who are not me who will be ok with importing and running the queries.

Perhaps it would be better to have multiple tables to normalize it, and then create views for other people?


If you have a key field that holds the order in which these rows should be, I suggest you step through using VBA and build a new table.


I would recommend transforming it permanently to your preferred form, but even still, it should be possible to do this in a straight Access query using triangular joins to find the highest previous non-blank row. You need to have a row_id to preserve row order, of course - without that the data will be uninterpretable.

SELECT *
FROM tbl AS test_data
INNER JOIN (
    SELECT test_data.row_id, MAX(name_data.row_id) AS name_row_id
    FROM tbl AS test_data
    INNER JOIN tbl AS name_data
        ON name_data.row_id < test_data.row_id
        AND name_data.name IS NOT NULL -- or whatever your empty columns contain
    GROUP BY test_data.row_id
) AS name_row_id
ON name_id.row_id = tbl.row_id
INNER JOIN (
    SELECT test_data.row_id, MAX(date_data.row_id) AS date_row_id
    FROM tbl AS test_data
    INNER JOIN tbl AS date_data
        ON date_data.row_id < test_data.row_id
        AND date_data.[test date] IS NOT NULL -- or whatever your empty columns contain
    GROUP BY test_data.row_id
) AS date_row_id
INNER JOIN tbl AS name_data
    ON name_data.row_id = name_row_id.name_row_id
INNER JOIN tbl AS date_data
    ON date_data.row_id = date_row_id.date_row_id
WHERE test_data.[test data] IS NOT NULL -- or whatever your empty columns contain


You should definitely isolate the data model from your user's concern. By not doing so you will create a data model that will be very hard to manage, adapt and upgrade. The 'user friendly' considerations can usually be solved easily by giving them access to views instead of having them query the original tables.

If you want to (really) normalize your data, I guess you'll have to go through the following steps:

  1. create tables
  2. transfer data
  3. create view

Create tables

table_person (id_person PK, namePerson)
table_test (id_test PK, id_Person FK, dateTest)
table_result (id_result PK, id_test FK, valueResult)

This data model takes into consideration the 1 to many relations between (a)Table_Person and Table_test and (b)Table_test and Table_result.

transfer data

Once this is done, you'll have to write some vba code, similar to this one (I assume here that your original table is called table_data, and records are correctly ordered to match your example!):

Dim rsData as DAO.recordset, _
    rsperson as DAO.recordset, _
    rsTest as DAO.recordset, _
    rsResult as DAO.recordset

set rsData = currentDb.openRecordset("Table_Data")
set rsPerson = currentDb.openRecordset("Table_Persone")
set rsTest = currentDb.openRecordset("Table_Test")
set rsResult = currentDb.openRecordset("Table_result")

rsData.moveFirst
Do while not rsData.eof

    'person is already known
    if isnull(rsData.fields("name") or _
        rsData.fields("name") = rsPerson.fields("name") Then

        'test reference is already known
        if isnull(rsData.fields("test date") or _
            rsData.fields("test date") = rsPerson.fields("dateTest") Then
            'add new result record
            rsResult.addNew
            rsResult.fields("id_Result") = ... (your choice of a PK(*))
            rsResult.fields("id_Test") = rsTest.fields("id_test")
            rsresult.fields("valueResult") = rsData.fields("Test data")
            rsResult.update

        Else

            'add new test record
            rsTest.addNew
            rsTest.fields("id_Test") = ... (your choice of a PK(*))
            rsTest.fields("id_Person") = rsPerson.fields("id_Person")
            rsTest.fields("dateTest") = rsData.fields("Test date")
            rsTest.update

        Endif
    Else

        'add new person record
        rsPerson.addNew
        rsPerson.fields("id_Person") = ... (your choice of a PK(*))
        rsPerson.fields("namePerson") = rsData.fields("name"
        rsPerson.update

    Endif
    rsData.moveNext
loop
''close your recordsets and you're ok

PK(*): depending on your Primary Key choice, you might not need to generate it via code. If tables are set to have (for example) an autoincrement number as PK, Access will automatically generate the corresponding value at recordset update time.

create view

Your view will allow your users to access the data the 'friendly' way you want:

SELECT table_person.namePerson, table_test.dateTest, table_result.valueResult FROM tbl_person LEFT OUTER JOIN ...


At a minimum you need to have an autonumber column on the table. This is the only way you will be able to end up realting the records. Then you will want to add another column for parent id. YOu will then go through and populate this by finding the next record up with a name inthe name field and putting its autogenrated id number into the field. Then you will need to update the parent record with data in the child records joining on the id field in the parent to the parentid field in the child. All of this assumes the data went into the table inthe correct order which I'm not sure it did without seeing waht the data looked like or how you put it in.


Just for starters, if you have this in a DB, you should have incrementing row_id to keep the order of records. It may be actually easier to put this into Excel and "copy-down".


Running this VBA code in an Excel spreadsheet formmatted like your first table will produce your second table. Perhaps, export from Access to csv, import into Excel, run the macro then export to csv and re-import into Access.

Try it out, put your sample table above in Sheet1 of an Excel workbook, run the macro and the result will appear in sheet2. If you run the real table replace the constant '16' in the code below with the actual number of rows in the table.

Sub Normalize()

  Dim row, row2 As Integer
  Dim name, dt, test As String
  Dim wname, wdt, wtest As String

  row = 2
  row2 = 1

  While row < 16
    name = Sheet1.Cells(row, 1).value
    dt = Sheet1.Cells(row, 2).value
    test = Sheet1.Cells(row, 3).value
    If name <> "" Then
        wname = name
        wdt = ""
        wtest = ""
    End If
    If dt <> "" Then
        wdt = dt
        wtest = ""
    End If
    If test <> "" Then wtest = test
    If wname <> "" And wdt <> "" And wtest <> "" Then
        Sheet2.Cells(row2, 1).value = wname
        Sheet2.Cells(row2, 2).value = wdt
        Sheet2.Cells(row2, 3).value = wtest
        row2 = row2 + 1
    End If
    row = row + 1
  Wend

End Sub
0

精彩评论

暂无评论...
验证码 换一张
取 消