
merging several data.frames of different column length and manipulating columns

Developer https://www.devze.com 2023-03-08 09:08 Source: Internet
I am using 9 files with different data (protein-per-tissue data). Each file represents a different tissue and holds protein expression values (as numbers). I am trying to merge the data into one data.frame. I used

read.delim("fileName.txt")  

for all the files. After that, I put all the data frames in a list:

l <- list(data.frame1,..etc)

Then I used the plyr library and do.call(rbind.fill, l).

My questions:

1) I wish to loop through the list of 9 data.frames, find the unique entries in them and plot them in a histogram. If I find more than one entry with the same name but a different tissue, it should be added to the histogram above the correct tissue label. That is: I go to the first data.frame in the list, take its first entry, search whether this entry is found in one of the other data.frames, and if so add it to the histogram.

The histogram has the 9 tissues on the x axis, and the y axis is the value from my files. I can't figure out how to get the histogram (and the code) to change the name appropriately and how to display each bar in the correct place.

In addition, I do not know how to build the axis so that the tissue names appear under each bar.

I have some basic code that is not doing what I want:

i=1

for( val in list2[1:9] )
{
    if( val appears in one of the other data.frames)
           plot a bar over the correct tissue.

    hist(val[i,8],breaks=11,col="blue",density=13,angle=45,
           labels=c("Lung","ErythroleukemicCellLine","TCells","Blood","liver",
           "BLimpho","pancreas","prostate","Bladder"), main=fileName[i,1])
    dev.new() #each hist in a new window
    i = i + 1

}

Thank you, yigeal

These are a few lines from the end of the output, after reading the file in with read.delim("nameOfFile.txt"):

 dput(BloodErythroleukemicCellLineFile)
 "Tax_Id=9606 Gene_Symbol=ZNF589 Uncharacterized protein", 
    "Tax_Id=9606 Gene_Symbol=ZNF598 Isoform 1 of Zinc finger protein 598", 
    "Tax_Id=9606 Gene_Symbol=ZNF609 Zinc finger protein 609", 
    "Tax_Id=9606 Gene_Symbol=ZNF610 Isoform 1 of Zinc finger protein 610", 
    "Tax_Id=9606 Gene_Symbol=ZNF613 Isoform 1 of Zinc finger protein 613", 
    "Tax_Id=9606 Gene_Symbol=ZNF614 Zinc finger protein 614", 
    "Tax_Id=9606 Gene_Symbol=ZNF622 Zinc finger protein 622", 
    "Tax_Id=9606 Gene_Symbol=ZNF625 Zinc finger protein 625", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 1 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 4 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF646 Isoform 1 of Zinc finger protein 646", 
    "Tax_Id=9606 Gene_Symbol=ZNF658B Zinc finger protein 658B", 
    "Tax_Id=9606 Gene_Symbol=ZNF667 Zinc finger protein 667, isoform CRA_a", 
    "Tax_Id=9606 Gene_Symbol=ZNF671 Zinc finger protein 671", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Isoform 1 of Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF691 cDNA FLJ56317, highly similar to Zinc finger protein 691", 
    "Tax_Id=9606 Gene_Symbol=ZNF700 Zinc finger protein 700", 
    "Tax_Id=9606 Gene_Symbol=ZNF714 Isoform 1 of Zinc finger protein 714", 
    "Tax_Id=9606 Gene_Symbol=ZNF72 Zinc finger protein 72 (Fragment)", 
    "Tax_Id=9606 Gene_Symbol=ZNF721 zinc finger protein 721", 
    "Tax_Id=9606 Gene_Symbol=ZNF76 Isoform 2 of Zinc finger protein 76", 
    "Tax_Id=9606 Gene_Symbol=ZNF782 Zinc finger protein 782", 
    "Tax_Id=9606 Gene_Symbol=ZNF787 Zinc finger protein 787", 
    "Tax_Id=9606 Gene_Symbol=ZNF800 Zinc finger protein 800", 
    "Tax_Id=9606 Gene_Symbol=ZNF827 21 kDa protein", "Tax_Id=9606 Gene_Symbol=ZNF828 Zinc finger protein 828", 
    "Tax_Id=9606 Gene_Symbol=ZNF837 Zinc finger protein 837", 
    "Tax_Id=9606 Gene_Symbol=ZNF878 Zinc finger protein 878", 
    "Tax_Id=9606 Gene_Symbol=ZNF891 Zinc finger protein 891", 
    "Tax_Id=9606 Gene_Symbol=ZNHIT2 Zinc finger HIT domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZP2 Zona pellucida sperm-binding protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZRANB2 Isoform 1 of Zinc finger Ran-binding domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZSWIM6 Zinc finger SWIM domain-containing protein 6", 
    "Tax_Id=9606 Gene_Symbol=ZUFSP 32 kDa protein", "Tax_Id=9606 Gene_Symbol=ZW10 Centromere/kinetochore protein zw10 homolog", 
    "Tax_Id=9606 Gene_Symbol=ZWINT ZW10 interactor", "Tax_Id=9606 Gene_Symbol=ZYG11B Isoform 1 of Protein zyg-11 homolog B", 
    "Tax_Id=9606 Gene_Symbol=ZYX cDNA FLJ53160, highly similar to Zyxin", 
    "Tax_Id=9606 Gene_Symbol=ZYX Uncharacterized protein", "Tax_Id=9606 Gene_Symbol=ZYX Zyxin"
    ), class = "factor")), .Names = c("proteinIdentifier", "protein", 
"spectra", "unique_peptides", "FDR", "local_FDR", "sequence_coverage", 
"expression_value", "expression_percentile", "organism", "tissue", 
"localization", "condition", "experiment", "annotation"), class = "data.frame", row.names = c(NA, 
-4802L))

It is much longer in the console.


It is not easy to find the core of the problem in your question. For merging data frames using some common field (or fields) you can use the merge() function, like:

merge(dataframe1, dataframe2, by=c('column_name1','column_name2'), suffixes=c('.from_df1','.from_df2'))
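As a rough illustration of what merge() does here, this is a minimal sketch on two toy data frames. The column names protein and expression_value are taken from the dput() output in the question; the data frame names lung and blood and all the values are invented for the example.

```r
# Toy data: column names follow the question's dput() output,
# the values are made up for illustration only.
lung  <- data.frame(protein = c("ZYX Zyxin", "ZW10"),
                    expression_value = c(1.2, 0.4))
blood <- data.frame(protein = c("ZYX Zyxin", "ZP2"),
                    expression_value = c(2.5, 0.9))

# Default merge is an inner join: only proteins present in both
# tissues are kept, one expression column per tissue.
both <- merge(lung, blood, by = "protein",
              suffixes = c(".lung", ".blood"))

# all = TRUE keeps proteins found in either tissue and fills the
# missing side with NA.
either <- merge(lung, blood, by = "protein", all = TRUE,
                suffixes = c(".lung", ".blood"))

print(both)
print(either)
```

With nine tissues you would probably repeat the merge pairwise, or keep the stacked rbind.fill result and work with its tissue column instead.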

If you want to select rows or columns, you can do it like this:

dataframe1[dataframe1$column1 == 'some_value', c('col1', 'col2')]

etc... Does this help you?
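If the goal is one bar per tissue for a given protein, barplot() is probably closer to what you describe than hist(), since hist() bins values rather than drawing one labelled bar per category. The following is only a sketch under assumptions: all_tissues stands in for the stacked result of your rbind.fill call, plot_protein is a hypothetical helper, and the values are invented. The column names protein, tissue and expression_value come from your dput() output.

```r
# Hypothetical stand-in for the stacked data frame (one row per
# protein per tissue); the values are made up.
all_tissues <- data.frame(
  protein = c("ZYX Zyxin", "ZYX Zyxin", "ZP2", "ZYX Zyxin"),
  tissue = c("Lung", "Blood", "Blood", "liver"),
  expression_value = c(1.2, 2.5, 0.9, 0.7),
  stringsAsFactors = FALSE
)
tissues <- c("Lung", "Blood", "liver")  # fixed x-axis order

# Draw one bar per tissue for a single protein; tissues where the
# protein was not found get a bar of height 0.
plot_protein <- function(df, name, tissues) {
  rows <- df[df$protein == name, ]
  # match() returns the first matching row per tissue, NA if none
  heights <- rows$expression_value[match(tissues, rows$tissue)]
  heights[is.na(heights)] <- 0
  names(heights) <- tissues  # barplot() uses names as bar labels
  barplot(heights, las = 2, col = "blue", main = name,
          ylab = "expression_value")
  invisible(heights)
}

h <- plot_protein(all_tissues, "ZP2", tissues)
```

Looping over unique(all_tissues$protein) and calling the helper once per name would give one plot per protein. Note that match() only uses the first row when a protein appears more than once in the same tissue.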
