
Java Time Specific Average

Developer https://www.devze.com 2023-01-07 15:45 Source: web

I have a text file:

DATE 20090105
1 2.25 1.5
3 3.6 0.099
4 3.6 0.150
6 3.6 0.099
8 3.65 0.0499
DATE 20090105
DATE 20090106
1 2.4 1.40
2 3.0 0.5
5 3.3 0.19
7 2.75 0.5
10 2.75 0.25
DATE 20090106
DATE 20090107
2 3.0 0.5
2 3.3 0.19
9 2.75 0.5
DATE 20100107

On each day I have:

Time Rating Variance

I want to work out the average variance at a specific time, over all the days in the file.

The file is massive and this is just a small edited sample, so I don't know the earliest time (it's around 2600) or the latest time (it may be around 50000).

For example, at time t=1 each day has only one value, so that value is the variance at that time.

At time t=2, on the first day the variance takes the value 1.5, because the reading at t=1 lasts until t=3; on the second day it takes the value 0.5; and on the third day it takes the value (0.5+0.19)/2. So the average variance over all the days at time t=2 is the sum of the variances at that time, divided by the number of those variances (one per day).
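Worked through with the sample data above, the t=2 average comes out as follows (day 1 uses its reading from t=1, day 2 its reading at t=2, and day 3 averages its two readings at t=2):

$$
\bar{v}(2) = \frac{1}{3}\left(1.5 + 0.5 + \frac{0.5 + 0.19}{2}\right) = \frac{1.5 + 0.5 + 0.345}{3} = \frac{2.345}{3} \approx 0.782
$$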

The last reading of a day is taken to last for one time unit (t=1).

I'm just wondering how I would even go about this.

As a complete beginner I'm finding this quite complicated. I am a uni student, but university has finished and I am trying to learn Java to help out with my dad's business over the summer, so any help with solutions is greatly appreciated.


You need to follow the steps below:

  • Create a class with a date and trv (time, rating, variance) properties.
  • Create a list of objects of the above class.
  • Read the file using the IO classes.
  • Read in chunks and convert to a string.
  • Split the whole string by "DATE" and trim.
  • Split by space (" ").
  • The first item will be your date.
  • Convert all the other items to float and find the average.
  • Add it to the list. Now you have a list of daily averages.
  • You can persist it to disk and query it for your required data.
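A minimal sketch of the parsing steps above, assuming the file layout from the question (an opening and a closing DATE line around each day's "time rating variance" lines). Rather than reading chunks and splitting one big string on "DATE", it reads line by line, which avoids holding the whole (massive) file in memory as one string. The class names `DayParser`, `Day` and `Reading` are hypothetical, and "find the average" is interpreted here as averaging the variance column only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public final class DayParser {

    /** One "time rating variance" line (the trv values). */
    public static final class Reading {
        public final int time;
        public final double rating;
        public final double variance;

        Reading(int time, double rating, double variance) {
            this.time = time;
            this.rating = rating;
            this.variance = variance;
        }
    }

    /** One day: the date from its DATE line plus its readings in file order. */
    public static final class Day {
        public final String date;
        public final List<Reading> readings = new ArrayList<>();

        Day(String date) {
            this.date = date;
        }

        /** Mean of this day's variance column (0 for a day with no readings). */
        public double averageVariance() {
            double sum = 0;
            for (Reading r : readings) {
                sum += r.variance;
            }
            return readings.isEmpty() ? 0 : sum / readings.size();
        }
    }

    /** Parses days delimited by an opening and a closing DATE line (assumed layout). */
    public static List<Day> parse(BufferedReader in) throws IOException {
        List<Day> days = new ArrayList<>();
        Day current = null; // null while between days
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.trim().split("\\s+");
            if (fields[0].isEmpty()) {
                continue; // skip blank lines
            }
            if (fields[0].equals("DATE")) {
                if (current == null) {
                    current = new Day(fields[1]); // opening DATE line
                } else {
                    days.add(current);            // closing DATE line
                    current = null;
                }
            } else if (current != null) {
                current.readings.add(new Reading(
                        Integer.parseInt(fields[0]),
                        Double.parseDouble(fields[1]),
                        Double.parseDouble(fields[2])));
            }
        }
        return days;
    }
}
```

Each `Day` keeps all of its readings, so later steps (such as looking up the variance in force at a given time) can work from this list instead of re-reading the file.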

EDIT: You have edited your question and now it looks totally different. I think you need help parsing the file; correct me if I am wrong.


If I understand you correctly, you are after a moving average that is calculated on a stream of data. The following class I wrote provides some such statistics.

  • moving average
  • decaying average (reflects the average of the last few samples, based on the decay factor).
  • moving variance
  • decaying variance
  • min and max.

Hope it helps.

/**
 * omry 
 * Jul 2, 2006
 * 
 * Calculates:
 * 1. running average 
 * 2. running standard deviation.
 * 3. minimum
 * 4. maximum
 */
public class Statistics
{
    private double m_lastValue;
    private double m_average = 0;
    private double m_stdDevSqr = 0;

    private int m_n = 0;
    private double m_max = Double.NEGATIVE_INFINITY;
    private double m_min = Double.POSITIVE_INFINITY;

    private double m_total;

    // decay factor.
    private double m_d;
    private double m_decayingAverage;
    private double m_decayingStdDevSqr;

    public Statistics()
    {
        this(2);
    }

    public Statistics(float d)
    {
        m_d = d;
    }

    public void addValue(double value)
    {
        m_lastValue = value;
        m_total += value;

        // see http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
        m_n++;
        double delta = value - m_average;
        m_average = m_average + delta / (float)m_n;
        double md = (1/m_d);
        if (m_n == 1)
        {
            m_decayingAverage = value;
        }
        m_decayingAverage = (md * m_decayingAverage + (1-md)*value);

        // This expression uses the new value of mean
        m_stdDevSqr = m_stdDevSqr + delta*(value - m_average);

        m_decayingStdDevSqr = m_decayingStdDevSqr + delta*(value - m_decayingAverage);

        m_max = Math.max(m_max, value);
        m_min = Math.min(m_min, value);     
    }

    public double getAverage()
    {
        return round(m_average);
    }

    public double getDAverage()
    {
        return round(m_decayingAverage);
    }   

    public double getMin()
    {
        return m_min;
    }

    public double getMax()
    {
        return m_max;
    }

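    /** Sample variance of the values added so far (0 until two values have been added). */
    public double getVariance()
    {
        if (m_n > 1)
        {
            return round(m_stdDevSqr / (m_n - 1));
        }
        else
        {
            return 0;
        }
    }

    /** Sample standard deviation: the square root of the variance. */
    public double getStdDev()
    {
        if (m_n > 1)
        {
            return round(Math.sqrt(m_stdDevSqr / (m_n - 1)));
        }
        else
        {
            return 0;
        }
    }

    /** Decaying counterpart of getVariance(). */
    public double getDVariance()
    {
        if (m_n > 1)
        {
            return round(m_decayingStdDevSqr / (m_n - 1));
        }
        else
        {
            return 0;
        }
    }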

    public int getN()
    {
        return m_n;
    }

    public double getLastValue()
    {
        return m_lastValue;
    }

    public void reset()
    {
        m_lastValue = 0;
        m_average = 0;
        m_stdDevSqr = 0;
        m_n = 0;
        m_max = Double.NEGATIVE_INFINITY;
        m_min = Double.POSITIVE_INFINITY;
        m_decayingAverage = 0;
        m_decayingStdDevSqr = 0;
        m_total = 0;
    }

    public double getTotal()
    {
        return round(m_total);
    }

    private double round(double d)
    {
        return Math.round((d * 100))/100.0;
    }
}
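The running part of addValue follows the online update described on the linked Wikipedia page (Welford's algorithm). Stripped down to just the running mean and sample variance, with the variance and its square root kept as separate methods, a sketch might look like this; the class and method names are hypothetical. Fed the per-day variances at one time t, `mean()` would give their average.

```java
/** Minimal running mean and sample variance (Welford's online algorithm). */
public final class RunningStats {
    private long n = 0;
    private double mean = 0;
    private double m2 = 0; // sum of squared deviations from the current mean

    public void add(double x) {
        n++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / n;
        m2 += delta * (x - mean);  // uses the updated mean
    }

    public long count() {
        return n;
    }

    public double mean() {
        return mean;
    }

    /** Sample variance (divides by n - 1); 0 until two values have been added. */
    public double variance() {
        return n > 1 ? m2 / (n - 1) : 0;
    }

    /** Sample standard deviation: the square root of variance(). */
    public double stdDev() {
        return Math.sqrt(variance());
    }
}
```

Unlike the class above, this sketch does no rounding, so small variances such as 0.0499 keep their precision.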


I think I understand. You want to:

  1. find the variance at a given time t on each day, which is given by the reading with the highest timestamp on that day that is not later than t;
  2. deal with multiple readings at the same time by averaging them;
  3. find the average variance over all days at time t.

So I'd suggest something like the following, once you have parsed the data as @Manjoor suggested. It assumes each day's times and variances are already in arrays (times[d] and variances[d] for day d), sorted by time:

static double getAverageAt(int t, double[][] times, double[][] variances) {
  double lastVariance = 0; // what value to start on,
                           // if no variance is specified at t=1 on day 1;
                           // also acts as accumulator if several values share one
                           // timestamp
  double allDaysTotal = 0; // cumulative sum of the variance at time t for all days
  int nDays = times.length;
  for (int d = 0; d < nDays; d++) {
    double[] time = times[d];
    double[] variance = variances[d];
    int found = 0; // how many values found at time t today
    for (int i = 0; i < time.length; i++) {
      if (time[i] < t) {
        lastVariance = variance[i]; // find the most recent value before t.
                                    // This relies on your data being in order!
      } else if (time[i] == t) {    // reading exactly at time t
        found++;
        if (found == 1) lastVariance = variance[i]; // no previous occurrences today
        else lastVariance += variance[i];
      } else {                      // time[i] > t: later readings don't matter
        break;
      }
    }
    if (found > 1) lastVariance /= found; // average of several simultaneous
    // readings, if more than one value was found today at time t.
    // Note that if found == 0, you're using a previous timestamp's value
    // (carried over from the previous day if today has none before t,
    // since lastVariance is not reset between days).
    // Also note that, if at t=1 you have 2 values of variance, that
    // averaged value will not continue over to time t.
    // You could easily reimplement that if that's the behaviour you want;
    // the code is similar, but puts the time < t condition together with the
    // time == t condition.
    allDaysTotal += lastVariance;
  }
  return allDaysTotal / nDays;
}

Your problem isn't a simple one, as the edge cases I pointed out show.


OK, I've got code which works, but it takes a very long time (around 7 months' worth of days, with 30,000 variances a day) because it has to loop round so many times. Are there any better suggestions?

I mean this code, for something seemingly simple, would take around 24-28 hours...

package VarPackage;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;

public class ReadText {

    public static void main(String[] args) throws Exception {
        String inputFileName = "C:\\MFile";

        ArrayList<String> fileLines = new ArrayList<String>();

        // Read the whole file into memory, one list entry per line
        BufferedReader br = new BufferedReader(new FileReader(inputFileName));
        String line;
        while ((line = br.readLine()) != null) {
            fileLines.add(line);
        }
        br.close();

        AvgVar myVar = new AvgVar(fileLines);

        // Time
        for (int t = 1; t < 10; t++) {
            System.out.println("Average Var at Time t=" + t + " = " + myVar.avgVar(t));
        }
    }
}

===================================

NewClass

package VarPackage;

import java.util.ArrayList;

public class AvgVar {

    // Class Variables
    private ArrayList<String> inputData = new ArrayList<String>();

    // Constructor
    AvgVar(ArrayList<String> fileData) {
        inputData = fileData;
    }

    // Average variance over all days at the given time (time is 1-based)
    public double avgVar(int time) {

        double avgVar = 0;

        // Note: this re-parses the whole file on every call
        ArrayList<double[]> avgData = avgDuplicateVars(inputData);

        for (double[] arrVar : avgData) {
            avgVar += arrVar[time - 1];
            //System.out.print(arrVar[time-1] + "," + arrVar[time] + "," + arrVar[time+1] + "\n");
            //System.out.print(avgVar + "\n");
        }

        avgVar /= numDays(inputData);

        return avgVar;
    }

    // Counts days: each day has an opening and a closing DATE line
    private int numDays(ArrayList<String> varData) {

        int n = 0;
        int flag = 0; // 1 while inside a day

        for (String line : varData) {

            String[] myData = line.split(" ");

            if (myData[0].equals("DATE") && flag == 0) {
                flag = 1;
            }
            else if (myData[0].equals("DATE") && flag == 1) {
                n = n + 1;
                flag = 0;
            }
        }

        return n;
    }

    // Builds one array per day: index i holds the variance at time i+1,
    // with simultaneous readings averaged and gaps filled by fillBlankSpreads
    private ArrayList<double[]> avgDuplicateVars(ArrayList<String> varData) {

        ArrayList<double[]> avgData = new ArrayList<double[]>();

        double[] varValue = new double[86400];
        double[] varCount = new double[86400];

        int n = 0;
        int flag = 0;

        for (String iLine : varData) {

            String[] nLine = iLine.split(" ");

            if (nLine[0].equals("DATE") && flag == 0) {
                // Opening DATE line: clear the per-time sums and counts
                for (int i = 0; i < 86400; i++) {
                    varCount[i] = 0;
                    varValue[i] = 0;
                }
                flag = 1;
            }
            else if (nLine[0].equals("DATE") && flag == 1) {
                // Closing DATE line: average simultaneous readings
                for (int i = 0; i < 86400; i++) {
                    if (varCount[i] != 0) {
                        varValue[i] /= varCount[i];
                    }
                }

                varValue = fillBlankSpreads(varValue, 86400);

                avgData.add(varValue.clone());

                flag = 0;
            }
            else {
                // Reading line: "time rating variance"
                n = Integer.parseInt(nLine[0]) - 1;

                varValue[n] += Double.parseDouble(nLine[2]);
                varCount[n] += 1;
            }
        }

        return avgData;
    }

    // Fills each gap (a 0 entry) with the previous time's value, so a reading
    // lasts until the next one. Note that a genuine variance of 0 is also
    // treated as a gap.
    private double[] fillBlankSpreads(double[] varValue, int numSpread) {
        for (int i = 1; i < numSpread; i++) {
            if (varValue[i] == 0) {
                varValue[i] = varValue[i - 1];
            }
        }

        return varValue;
    }
}
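Much of the time in the code above goes into avgVar re-parsing the whole file, and rebuilding an 86400-element array per day, for every single value of t. A sketch of a different approach, assuming the same file layout with readings sorted by time within each day: parse the file once, turn each day's readings into intervals on a difference array (add a value where a reading starts, subtract it where the next reading takes over), and do a single prefix-sum pass at the end, so each avgVar(t) becomes an array lookup. The class name `FastAvgVar` and the `maxTime` parameter are hypothetical, and every time in the file must be at most `maxTime`. To mirror what AvgVar computes, a day's last reading is carried through to `maxTime` and times before a day's first reading count as 0; unlike AvgVar, a variance of exactly 0 is kept rather than treated as a gap.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public final class FastAvgVar {

    // average[t] = mean over all days of the variance in force at time t
    private final double[] average;

    public FastAvgVar(BufferedReader in, int maxTime) throws IOException {
        // Difference array: diff[a] += v and diff[b] -= v add v to every time in [a, b)
        double[] diff = new double[maxTime + 2];
        List<Integer> times = new ArrayList<>();  // distinct times of the current day
        List<Double> values = new ArrayList<>();  // averaged variance at each of those times
        int days = 0;
        boolean inDay = false;
        int curTime = -1;   // time of the readings currently being merged
        double curSum = 0;  // sum of variances at curTime
        int curCount = 0;   // number of readings at curTime

        String line;
        while ((line = in.readLine()) != null) {
            String[] f = line.trim().split("\\s+");
            if (f[0].isEmpty()) {
                continue; // skip blank lines
            }
            if (f[0].equals("DATE")) {
                if (inDay) {
                    // Closing DATE line: flush the last time group, then add the
                    // day's step function to diff
                    if (curCount > 0) {
                        times.add(curTime);
                        values.add(curSum / curCount);
                    }
                    for (int i = 0; i < times.size(); i++) {
                        int end = (i + 1 < times.size()) ? times.get(i + 1) : maxTime + 1;
                        diff[times.get(i)] += values.get(i);
                        diff[end] -= values.get(i);
                    }
                    days++;
                    times.clear();
                    values.clear();
                    curSum = 0;
                    curCount = 0;
                }
                inDay = !inDay;
                continue;
            }
            if (!inDay) {
                continue; // ignore stray lines between days
            }
            // Reading line "time rating variance"; readings sharing a time are
            // consecutive (sorted input assumed), so merge them into one average
            int t = Integer.parseInt(f[0]);
            double v = Double.parseDouble(f[2]);
            if (curCount > 0 && t != curTime) {
                times.add(curTime);
                values.add(curSum / curCount);
                curSum = 0;
                curCount = 0;
            }
            curTime = t;
            curSum += v;
            curCount++;
        }

        // One prefix-sum pass turns diff into per-time totals across all days
        average = new double[maxTime + 1];
        double running = 0;
        for (int t = 0; t <= maxTime; t++) {
            running += diff[t];
            average[t] = (days == 0) ? 0 : running / days;
        }
    }

    /** Average variance over all days at time t (0 <= t <= maxTime). */
    public double avgVar(int t) {
        return average[t];
    }
}
```

Parsing is then one pass over the file plus one pass over `maxTime` times, and memory is a few arrays of length `maxTime` instead of one 86400-element array per day. To follow the question's rule that the last reading of a day lasts only one time unit, the `end` of the last interval would be `times.get(i) + 1` instead of `maxTime + 1`.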
