extracting strings from a file

Developer https://www.devze.com 2023-02-19 10:50 Source: Internet
Hi, I have written a Java program that reads a file and, when an ID matches, extracts the molecular function and biological process for that ID, but I am getting a StringIndexOutOfBoundsException. Can anyone please correct it? Thanks in advance. Here is my input:

chr11   RAP3_rep    mRNA    17114958    17117968    .   +   .   ID=Os11t0448200-01;Name=Os11t0448200-01;Gene_symbols=AM14;GO=Molecular Function: protein kinase activity (GO:0004672),Molecular Function: ATP binding (GO:0005524),Biological Process: protein amino acid phosphorylation (GO:0006468),Molecular Function: protein tyrosine kinase activity (GO:0004713),Molecular Function: protein serine/threonine kinase activity (GO:0004674);ID_converter=Os11g0448200;InterPro=Protein kinase, core (IPR000719),Tyrosine protein kinase (IPR001245),Serine/threonine protein kinase (IPR002290),Serine/threonine protein kinase, active site (IPR008271),Protein kinase-like (IPR011009),Serine/threonine protein kinase-related (IPR017442);Link_to=8185 (Oryzabase),Protein kinase%2C core (Plant Gene Family Database);Locus_id=Os11g0448200;Note=Arbuscular mycorrhizal specific marker 14.;ORF_evidence=Q53JE9 (UniProt);Transcript_evidence=Inferred from reference;Sequence_download=Os11t0448200-01;References=19033527%2C 15905328;Status=manual curation (Oct 29%2C 2010)
chr11   RAP3_rep    CDS 17114958    17115039    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17115846    17115869    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17115970    17116095    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17116205    17116546    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17116669    17116784    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17116880    17117140    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17117589    17117786    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17117891    17117968    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    mRNA    17565866    17568694    .   -   .   ID=Os11t0455500-01;Name=Os11t0455500-01;Alias=AK059712,AK060299,AK119539,AK122115;ID_converter=Os11g0455500;Link_to=S-adenosyl-L-homocysteine hydrolase (Plant Gene Family Database);Locus_id=Os11g0455500;NIAS_FLcDNA=001-032-F05;Note=Similar to Adenosylhomocysteinase-like protein.;ORF_evidence=Q84VE1 (UniProt);Transcript_evidence=AK059712 (DDBJ%2C Best hit);Sequence_download=Os11t0455500-01;InterPro=NAD(P)-binding (IPR016040),S-adenosyl-L-homocysteine hydrolase (IPR000043),S-adenosyl-L-homocysteine hydrolase%2C NAD binding (IPR015878);GO=Molecular Function: catalytic activity (GO:0003824),Molecular Function: binding (GO:0005488),Biological Process: metabolic process (GO:0008152),Molecular Function: adenosylhomocysteinase activity (GO:0004013),Biological Process: one-carbon compound metabolic process (GO:0006730);Expression=AK059712
chr11   RAP3_rep    CDS 17567891    17568694    .   -   .   Parent=Os11t0455500-01;
chr11   RAP3_rep    CDS 17566493    17567029    .   -   .   Parent=Os11t0455500-01;
chr11   RAP3_rep    CDS 17566191    17566400    .   -   .   Parent=Os11t0455500-01;

And here is the program:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream.GetField;
import java.util.ArrayList;
import java.util.Scanner;

public class Sample
{
 public static void main(String args[]) throws FileNotFoundException
 {
   Sample s=new Sample();
   String inputID="Os11t0120200-01";

   //System.out.println("Enter the value");
   //Scanner sc=new Scanner(System.in);
   //n=sc.nextLong();

   ArrayList<String> IDlist=new ArrayList<String>();
   ArrayList<String> InputIDlist=new ArrayList<String>();
   int n;
   try
   {
     File nf=new File("textfile1.txt");
     FileOutputStream fop1=new FileOutputStream(nf,true);
     String os ="";

     FileInputStream fis1=new FileInputStream("chr11.gb");
     FileInputStream fis2=new FileInputStream("1.txt");
     InputStreamReader in1 = new InputStreamReader(fis1, "UTF-8");
     InputStreamReader in2 = new InputStreamReader(fis2, "UTF-8");
     BufferedReader input1 = new BufferedReader(in1);
     BufferedReader input2 =  new BufferedReader(in2);

     String line1;
     String line2;

     FileInputStream fis=new FileInputStream("chr11.GB");
     InputStreamReader in = new InputStreamReader(fis, "UTF-8");
     BufferedReader input = new BufferedReader(in);
     String line;

     File f=new File("1.GB");
     FileOutputStream fop=new FileOutputStream(f);

     if(f.exists())
     {
        os="This data is written through the program\t\n";
        fop1.write(os.getBytes());

        String str1="";
        String str2="";
        os="The data has been written\t\n";
        fop1.write(os.getBytes());

        while((line=input.readLine())!=null)
        {
          String splits[]=line.split("\t");
          if(splits[2].equalsIgnoreCase("mrna"))
          {
            IDlist.add((splits[8]));
          }
        }

        while((line=input2.readLine())!=null)
        {
          String splits[]=line.split("\t");
          if(splits[0]!="")
          {
            InputIDlist.add((splits[0]));
          }
        }
        for(int j=0; j<InputIDlist.size(); j++)
        {
          for(int i=0; i<IDlist.size(); i++)
          {
            if((IDlist.get(i).substring(3, 18).toString()).equals(InputIDlist.get(j)))
            {
              if(IDlist.get(i).contains("Alias"))
              {
                 os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Alias"),IDlist.get(i).lastIndexOf("ID_converter"))+"\t\n";
                 fop1.write(os.getBytes());
              }
              if(IDlist.get(i).contains("Biological Process"))
              {
                 //n=IDlist.get(i).lastIndexOf("Biological Process");
                 os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Biological Process"),IDlist.get(i).lastIndexOf(";"))+"\t\n";
                 fop1.write(os.getBytes());
              }
              if(IDlist.get(i).contains("Molecular Function"))
              {
                 //n=IDlist.get(i).lastIndexOf("Molecular Function");
                 os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Molecular Function"), IDlist.get(i).lastIndexOf(","))+"\t\n";
                 fop1.write(os.getBytes());
              }
              break;
            }
            String p="\n";
            fop1.write(p.getBytes());
          }
        }
     }
     else
     {
        System.out.println("This file is not exist");
     }
   }
   catch (Exception e)
   {
      e.printStackTrace();
   }
 }
}


I agree with the comments on the question, but I'll still try a guess:

Given the StringIndexOutOfBoundsException, the most likely culprit is this call: IDlist.get(i).substring(3, 18). If the string is shorter than 18 characters, you get that exception.
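One way to avoid the fixed offsets entirely is to split the attribute column on ";" and look the ID up by key. The following is only a sketch, assuming attributes are separated by ";" and written as key=value, as in the sample input; the class and method names (GffAttributes, parse) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GffAttributes {
    // Splits "key=value;key=value" into a map, keeping the original order.
    // Fields without '=' (or with an empty key) are skipped.
    static Map<String, String> parse(String attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : attributes.split(";")) {
            int eq = field.indexOf('=');
            if (eq > 0) {
                result.put(field.substring(0, eq), field.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String column9 = "ID=Os11t0448200-01;Name=Os11t0448200-01;Gene_symbols=AM14";
        System.out.println(parse(column9).get("ID"));                  // Os11t0448200-01
        System.out.println(parse("Parent=Os11t0448200-01").get("ID")); // null
    }
}
```

A missing ID then shows up as null, which can be tested for, instead of as an exception from substring.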

A reason for this might be this part:

if(splits[0]!="")  
{
   InputIDlist.add((splits[0]));
}

If splits[0] is empty, splits[0] == "" might still be false (and thus != might be true). Use !splits[0].equals("") here, or better !"".equals(splits[0]), which also works if splits[0] is ever null. Note that == checks for reference equality, i.e. whether both references point to the same object (in C++ terms, whether it is the same pointer), whereas equals checks for logical equality (which each class may implement differently).
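The difference can be seen in a small sketch. It uses new String("") to get an empty string that is guaranteed to be a separate object; whether split() hands back a separate object for an empty field is an implementation detail, so it is not relied on here. The helper name hasText is made up for illustration.

```java
class StringEqualityDemo {
    // Content-based check that also tolerates null.
    static boolean hasText(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        String literal = "";
        String built = new String(""); // always a distinct object from the literal

        System.out.println(built != literal);           // true: different references
        System.out.println(built.equals(literal));      // true: same content
        System.out.println(hasText(built));             // false
        System.out.println(hasText("Os11t0448200-01")); // true
    }
}
```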

Edit:

Another possible source of that exception is a line like this one:

os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Alias"),IDlist.get(i).lastIndexOf("ID_converter"))

You check for "Alias", so lastIndexOf("Alias") should not return -1, but IDlist.get(i).lastIndexOf("ID_converter") might. If it does, the end index passed to substring is -1 and you are out of bounds.

Edit 2:

Yet another thing: even if both strings ("Alias" and "ID_converter") are in the source string, you get the same exception when they appear in the wrong order ("ID_converter ... Alias"), because the begin index is then greater than the end index, which is not allowed (please read the JavaDoc on String.substring()).
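Both cases (a missing marker and markers in the wrong order) can be guarded against by checking the indices before calling substring. This is a minimal sketch; the class and method names (SafeSlice, between) are made up for illustration, and returning null for "not found" is just one possible convention.

```java
class SafeSlice {
    // Returns the text from the last startMarker up to (not including) the
    // last endMarker, or null if either marker is missing or they are in
    // the wrong order.
    static String between(String source, String startMarker, String endMarker) {
        int begin = source.lastIndexOf(startMarker);
        int end = source.lastIndexOf(endMarker);
        if (begin < 0 || end < 0 || begin > end) {
            return null;
        }
        return source.substring(begin, end);
    }

    public static void main(String[] args) {
        String ok = "ID=x;Alias=AK059712,AK060299;ID_converter=Os11g0455500";
        System.out.println(between(ok, "Alias", "ID_converter"));                   // Alias=AK059712,AK060299;
        System.out.println(between("ID=x;Alias=AK1", "Alias", "ID_converter"));     // null
        System.out.println(between("ID_converter=y;Alias=AK1", "Alias", "ID_converter")); // null
    }
}
```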


Change:

if (IDlist.get(i).contains("Alias"))

To:

if ((IDlist.get(i).contains("Alias")) && (IDlist.get(i).contains("ID_converter")))

If it then no longer enters the if statement, set a breakpoint to check why the second condition is false.
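The "Biological Process" and "Molecular Function" lookups in the question use the same lastIndexOf/substring pattern, so they carry the same risk. A sketch of an alternative that does not depend on character positions is shown here, assuming (as in the sample input) that GO terms sit in a GO= attribute, are separated by ",", and start with their category followed by ":". The names GoTerms and byCategory are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GoTerms {
    // Collects the GO entries of one category, e.g. "Molecular Function",
    // from a ';'-separated attribute string. Returns an empty list if the
    // string has no GO= attribute or no entry of that category.
    static List<String> byCategory(String attributes, String category) {
        List<String> terms = new ArrayList<>();
        String prefix = category + ":";
        for (String field : attributes.split(";")) {
            if (!field.startsWith("GO=")) {
                continue;
            }
            for (String term : field.substring(3).split(",")) {
                if (term.startsWith(prefix)) {
                    terms.add(term.substring(prefix.length()).trim());
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String attrs = "ID=Os11t0455500-01;GO=Molecular Function: catalytic activity (GO:0003824),"
                + "Biological Process: metabolic process (GO:0008152);Expression=AK059712";
        System.out.println(byCategory(attrs, "Molecular Function")); // [catalytic activity (GO:0003824)]
        System.out.println(byCategory(attrs, "Biological Process")); // [metabolic process (GO:0008152)]
    }
}
```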
