Succinct way of reading data from file into an immutable 2 dimensional array in Scala

Developers https://www.devze.com 2023-02-10 21:36 Source: Internet
I am looking for a succinct way to end up with an immutable two-dimensional array X and a one-dimensional array Y without first scanning the file to find the dimensions of the data.

The data, which consists of a header line followed by columns of double values, is in the following format:

X0, X1, X2, ...., Y
0.1, 1.2, -0.2, ..., 1.1
0.2, 0.5, 0.4, ..., -0.3
-0.5, 0.3, 0.3, ..., 0.1

I have the following code (so far) for getting lines from a file and tokenizing each comma-delimited line to get the samples. It does not yet fill in the X and Y arrays or assign num and dimx:

val X = new Array[Array[Double]](num,dimx)
val Y = new Array[Double](num)

def readDataFromFile(filename: String) {
    var firstTime = true
    val lines = fromFile(filename).getLines
    lines.foreach(line => {
        val tokens = line split(",")
        if(firstTime) {
            tokens.foreach(token => // get header titles and set dimx)
            firstTime = false
        } else {
            println("data")
            tokens.foreach(token => //blah, blah, blah...)
        }
    })
}

Obviously this is a problem because, while I can detect and use dimx on the fly, I don't know num a priori. Also, the repeated tokens.foreach is not very elegant. I could first scan the file to determine the dimensions, but that seems like a nasty way to go. Is there a better way? Thanks in advance.


There isn't anything built in that's going to tell you the size of your data. Why not have the method return your arrays instead of you declaring them outside? That way you can also handle error conditions better.

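import scala.io.Source

// Headers, feature rows (x), and target values (y) parsed from one file.
case class Hxy(headers: Array[String], x: Array[Array[Double]], y: Array[Double])

def readDataFromFile(name: String): Option[Hxy] = {
  val source = Source.fromFile(name)
  try {
    val lines = source.getLines()
    if (!lines.hasNext) None  // empty file: not even a header line
    else {
      val header = lines.next().split(",").map(_.trim)
      // toArray forces the lazy iterator while the file is still open
      val xy = lines.map(_.split(",").map(_.trim.toDouble)).toArray
      if (xy.exists(_.length != header.length)) None  // a row has the wrong width
      else Some( Hxy(header, xy.map(_.init), xy.map(_.last)) )
    }
  }
  catch { case nfe: NumberFormatException => None }
  finally source.close()  // release the file handle on every path
}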

Here, only if we have well-formed data do we get back the relevant arrays (helpfully packaged into a case class); otherwise, we get back None so we know that something went wrong.

(If you want to know why it didn't work, replace Option[Hxy] with something like Either[String,Hxy] and return Right(...) instead of Some(...) on success, Left(message) instead of None on failure.)
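
A minimal sketch of that Either variant, under a few assumptions: it takes the lines as an Iterator[String] (for example from io.Source.fromFile(name).getLines) so it can be tried without a file, and the name parseLines and the error messages are made up for illustration. Hxy is repeated from the code above so the snippet stands alone.

```scala
// Sketch only: parseLines and the message wording are hypothetical.
case class Hxy(headers: Array[String], x: Array[Array[Double]], y: Array[Double])

def parseLines(lines: Iterator[String]): Either[String, Hxy] =
  if (!lines.hasNext) Left("empty input: no header line")
  else {
    val header = lines.next().split(",").map(_.trim)
    try {
      // toArray forces the iterator, so parse errors surface inside the try
      val xy = lines.map(_.split(",").map(_.trim.toDouble)).toArray
      xy.indexWhere(_.length != header.length) match {
        case -1 => Right(Hxy(header, xy.map(_.init), xy.map(_.last)))
        case i  => Left(s"data row ${i + 1} has ${xy(i).length} values, expected ${header.length}")
      }
    } catch {
      case e: NumberFormatException => Left(s"not a number: ${e.getMessage}")
    }
  }
```

For instance, parseLines(Iterator("X0, Y", "1.0, abc")) comes back as a Left describing the bad token rather than a bare None.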


Edit: If you want the values (not just the array sizes) to be immutable, then you'd need to map everything to Vector somewhere along the way. I'd probably do it at the last step when you're placing the data into Hxy.
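
One way that last-step conversion could look, as a sketch: ImmutableHxy and toImmutable are hypothetical names, and Hxy is repeated from the code above so the snippet stands alone.

```scala
// Sketch only: ImmutableHxy and toImmutable are illustrative names.
case class Hxy(headers: Array[String], x: Array[Array[Double]], y: Array[Double])
case class ImmutableHxy(headers: Vector[String], x: Vector[Vector[Double]], y: Vector[Double])

// Copy every array into a Vector, so later writes to the arrays
// cannot change the returned value.
def toImmutable(h: Hxy): ImmutableHxy =
  ImmutableHxy(h.headers.toVector, h.x.map(_.toVector).toVector, h.y.toVector)
```

A side effect of using Vector is that two ImmutableHxy values with the same contents compare equal with ==, which is not the case for Array fields, since arrays compare by reference.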


Array, as in Java, is mutable, so you can't have an immutable Array; you need to choose between Array and immutability. One way to achieve your goal without foreach calls and vars is something like the following:

// simulate the lines for this example
val lines = List("X,Y,Z,","1,2,3","2,5.0,3.4") 
val res = lines.map(_.split(",")).toArray


Use Array.newBuilder. I assume that the header has already been extracted.

val b = Array.newBuilder[Array[Double]]
lines.foreach { b += _.split(",").map(_.toDouble) }
val data = b.result

If you want to be immutable, take some immutable implementation of IndexedSeq (e.g. Vector) instead of Array; builders work on all collections.
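
As a rough sketch of that swap, the same loop with Vector.newBuilder might look like this. The function name readRows is hypothetical, and the input is assumed to be the data lines with the header already removed.

```scala
// Sketch only: readRows is an illustrative name.
def readRows(lines: Iterator[String]): Vector[Vector[Double]] = {
  val b = Vector.newBuilder[Vector[Double]]
  // Each line becomes one immutable row; the builder grows as needed,
  // so the number of rows does not have to be known up front.
  lines.foreach { line => b += line.split(",").map(_.trim.toDouble).toVector }
  b.result()
}
```

Calling readRows(Iterator("0.1, 1.2, 1.1", "0.2, 0.5, -0.3")) yields a Vector of two rows with three values each.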
