I need to merge data from 2 different sources. The tables below illustrate what I have :
And the desired Output :
The idea is:
Look into the first col. of Table1 (TrialNO)
Look for it in the first col. of Table2.
Check that the values in the second col. are equal (in reality those col. won't be located at the same locations in the 2 lists)
If Check is pass append the values located in col. 3 & 4 (Cond1 & Cond2) to the line in Table2.
I don't think I will keep the headers in the real structure, so it should not represent an additional problem, but suggestions to deal with headers are welcome (whether to remove them and store them somewhere else or treat them in a special way)
**
EDIT : Giving precision on the shape of my data & my goals
**
I will give a little background on how I get those data to clarify its shape. I am sure it could be described technically in more accurate way. Please don`t hesitate to correct me.
I am recording eye-movements (saccades & fixations) and subjects answer to a task while displaying stimuli on a screen on.
- Each Trial consist of two consecutive displays of 3 seconds each. It is a 2AFC (Two-alternative forced choice).
- Each display consist of presenting a frame (about 1/4 size of the screen) with 8 shapes in it, displayed on 1 out of 4 quadrants of the screen.
- There is 5 conditions of what the frame itself is made of and thus 10 conditions possible for each trial (1 condition of frame against another without repetition).
- There are 2 measures : The choice of the subject & the eye-movements recorded while looking at the stimuli.
I get those data from 2 different sources :
- The "display" machine which provide
-Trial number/Display number
-Informations about the screen
-Conditions
-Subject Answer
-X & Y coordinate as well as size of the 11 object composing the stimuli displayed.
In this Matrix, each row is a Display so the DisplayNO Column Would go from 1 to 400 (1,2,3,4,...,400) while the TrialNO Columns actually goes from 1 to 200 (1,1,2,2,..,200,200) since there is 2 displays per trial.
- The "Eye-Tracking" machine which provides :
-Some similar info (Display number (1to 400), that will be used to merge the 2, condition number, that can be uses to check the mapping bet. the 2)
Then a massive amount of variables describing the eye-movements :
-Fixations and saccades durations, locations, timing etc. (about 100 columns)
In this Matrix, each row is a fixation. The saccade characteristics are then given in columns (previous and next saccade) And there can be from 1 to 30-50 fixations for each display. As a result, I could have 19 rows of data for the first display and 5 for the second.
The first step is to merge the 2 the 2 data structure to obtain a big one with each rows corresponding to a fixation.
Will have to do this for every subject then aggregate the subjects data on top of each other.
This is my plan to deal with this monster afterward (And this will explain my needs in the other questions) :
Extract the Header & columns number.
Present them by group in nice table form -General Info (Trial ID, Condition, Subject ID...), -Display Info (coordinates of objects on screen), -Fixations info etc...
Have for each of this variabe a summary of the data type (String, Number, text), the range, how many different values the columns takes and some basic descriptive statistics.
A system to extract parts of this set conditionall开发者_开发知识库y (For Example : Extract the condition number, fixations durations, for chosen display by a particular subject) That way I extract some well defined table I then run my analysis on without touching the original data.
If I used my precise situation to present my problem I believe, this could yield to a nice efficient and graphically easy to use tool to deal with a lot of data set in general.
Here is a possibility:
MergeTables[data1_, data2_, samepos1_, samepos2_] :=
Cases[data1,
x_ :> Block[{y =
Cases[data2, z_ /; z[[samepos2]] === x[[samepos1]]]},
Apply[Sequence, Join[x, Delete[#, Thread[{samepos2}]]] & /@ y]]]
Usage:
MergeTables[data2, data1, {1, 2}, {1, 2}]
I await a more detailed description of your data for better optimization.
checkMerge[src_, trg_, si_, ti_, sp_] :=
Module[{rls, ext},
rls = #[[si]] -> #[[sp]] & /@ src;
AppendTo[rls, _ -> {,}];
ext = Replace[trg[[All, ti]], Dispatch@rls, 1];
ArrayFlatten[{{trg, ext}}]
]
The syntax is:
src
= "source" list (data1)trg
= "target" list (data2)si
= list of indexes from source to compareti
= list of indexes from target to comparesp
= list of indexes from source to append to target
For your example, this would be:
checkMerge[data1, data2, {1,2}, {1,2}, {3,4}]
I had to guess the level of changes to accommodate from:
(in reality those col. won't be located at the same locations in the 2 lists)
Therefore, this may have too much or too little specificity.
At present
sp
must be a list of two indexes (column numbers) just because that made things a little simpler, and I am not sure what you want. Do you want to specify which elements are taken fromdata1
and appended todata2
, or should it be all elements after the compared ones, or something else?If there are standard values for
si
,ti
,sp
, defaults can be added so that you may omit these, unless different values are required.I assumed that it was okay to extend the rows that do not match with
Null
to create a rectangular array; these could be deleted afterward to produce a ragged array, if that is your desire.
The answer by Sasha is way too cool, I do not even know how it works yet.
Here is my attempt, and being a structural more than functional programmer, I used a Table[] (OMG!) to do it. Well, a Table[] is still sort of on the edge of functional programming :)
(Here A is data2, and B is data1)
n=Length[A]; m=Length[B];
isMatch[a_,b_] := a[[1]]=== b[[1]]&&a[[2]]===b[[2]]
A[[1]] = A[[1]]~Join~B[[1,3;;-1]]; (*do the header on its own*)
Table[If[ isMatch[B[[i]],A[[j]]],
A[[j]] = Join[A[[j]],B[[i,3;;-1]]]
],
{i,2,m},{j,2,n}
];
A//TableForm
--Nasser
精彩评论