What I am looking for is a succinct way of ending up with an immutable two-dimensional array X and a one-dimensional array Y without first scanning the file to find out the dimensions of the data.
The data, which consists of a header line followed by columnar double values, is in the following format:
X0, X1, X2, ...., Y
0.1, 1.2, -0.2, ..., 1.1
0.2, 0.5, 0.4, ..., -0.3
-0.5, 0.3, 0.3, ..., 0.1
I have the following code (so far) for getting lines from a file and tokenizing each comma-delimited line in order to get the samples. It currently doesn't fill in the X and Y arrays or assign num and dimx:
val X = new Array[Array[Double]](num,dimx)
val Y = new Array[Double](num)
def readDataFromFile(filename: String) {
  var firstTime = true
  val lines = fromFile(filename).getLines
  lines.foreach(line => {
    val tokens = line split(",")
    if(firstTime) {
      tokens.foreach(token => // get header titles and set dimx)
      firstTime = false
    } else {
      println("data")
      tokens.foreach(token => //blah, blah, blah...)
    }
  })
}
Obviously this is an issue because, while I can detect and use dimx on the fly, I don't know num a priori. Also, the repeated tokens.foreach is not very elegant. I could first scan the file to determine the dimensions, but that seems like a nasty way to go. Is there a better way? Thanks in advance.
There isn't anything built in that's going to tell you the size of your data. Why not have the method return your arrays instead of you declaring them outside? That way you can also handle error conditions better.
import scala.io.Source

case class Hxy(headers: Array[String], x: Array[Array[Double]], y: Array[Double])

def readDataFromFile(name: String): Option[Hxy] = {
  val source = Source.fromFile(name)
  try {
    val lines = source.getLines()
    if (!lines.hasNext) None
    else {
      val header = lines.next().split(",").map(_.trim)
      // toArray reads the whole file here, before the source is closed
      val xy = lines.map(_.split(",").map(_.trim.toDouble)).toArray
      // Reject ragged data: every row needs one value per header column
      if (xy.exists(_.length != header.length)) None
      else Some( Hxy(header, xy.map(_.init), xy.map(_.last)) )
    }
  }
  catch { case nfe: NumberFormatException => None }
  finally source.close()
}
Here, only if we have well-formed data do we get back the relevant arrays (helpfully packaged into a case class); otherwise, we get back None, so we know that something went wrong.
(If you want to know why it didn't work, replace Option[Hxy] with something like Either[String,Hxy] and return Right(...) instead of Some(...) on success, Left(message) instead of None on failure.)
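The Either variant mentioned above could look roughly like the sketch below. It is only an illustration: the function name readData and the error messages are made up here, it takes the lines as an Iterator[String] (for example from Source.fromFile(name).getLines()) rather than a file name, and Hxy is repeated so the block stands on its own.

```scala
// Hxy is repeated from the answer so this block is self-contained.
final case class Hxy(headers: Array[String], x: Array[Array[Double]], y: Array[Double])

// Hypothetical name and messages; a sketch of the Either-based variant.
def readData(lines: Iterator[String]): Either[String, Hxy] = {
  if (!lines.hasNext) Left("empty input: no header line")
  else {
    val header = lines.next().split(",").map(_.trim)
    try {
      val xy = lines.map(_.split(",").map(_.trim.toDouble)).toArray
      // Find the first row whose width differs from the header, if any
      xy.indexWhere(_.length != header.length) match {
        case -1 => Right(Hxy(header, xy.map(_.init), xy.map(_.last)))
        case i  => Left(s"data row ${i + 1} has ${xy(i).length} values, expected ${header.length}")
      }
    } catch {
      case e: NumberFormatException => Left(s"not a number: ${e.getMessage}")
    }
  }
}
```

The caller can then pattern match on Left(message) or Right(data) and report the message instead of silently getting None.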
Edit: If you want the values (not just the array sizes) to be immutable, then you'd need to map everything to Vector somewhere along the way. I'd probably do it at the last step when you're placing the data into Hxy.
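As a rough sketch of that last step, assuming the header and the parsed rows are already available as arrays: ImmutableHxy and toImmutable are hypothetical names introduced only for this example.

```scala
// Hypothetical immutable counterpart of Hxy; the name is illustrative.
final case class ImmutableHxy(
  headers: Vector[String],
  x: Vector[Vector[Double]],
  y: Vector[Double]
)

// Convert once, at the point where the result is packaged up.
def toImmutable(header: Array[String], xy: Array[Array[Double]]): ImmutableHxy = {
  val rows = xy.toVector
  ImmutableHxy(
    header.toVector,
    rows.map(row => row.init.toVector), // all columns except the last
    rows.map(row => row.last)           // the last column is Y
  )
}
```

Doing the conversion here keeps the parsing code unchanged and means nothing outside the reader ever sees a mutable array.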
Array, as in Java, is mutable, so you can't have an immutable array; you need to choose between Array and immutability. One way to achieve your goal without foreaches and vars is similar to the following:
// simulate the lines for this example
val lines = List("X,Y,Z,","1,2,3","2,5.0,3.4")
val res = lines.map(_.split(",")).toArray
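To carry that idea through to the header, X and Y, one possible continuation is sketched below. It assumes the last column is Y, as in the question, and uses Vector rather than Array so the result is immutable; the names headers, rows, x and y are chosen only for this example.

```scala
// simulate the lines for this example
val lines = List("X,Y,Z", "1,2,3", "2,5.0,3.4")

// Split off the header with a pattern match instead of a firstTime flag
val (headers, rows) = lines match {
  case head :: tail =>
    (head.split(",").map(_.trim).toVector,
     tail.map(_.split(",").map(_.trim.toDouble).toVector).toVector)
  case Nil =>
    (Vector.empty[String], Vector.empty[Vector[Double]])
}

val x = rows.map(_.init) // every column except the last
val y = rows.map(_.last) // the last column
```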
Use Array.newBuilder. I assume that the header has already been extracted.
val b = Array.newBuilder[Array[Double]]
lines.foreach { b += _.split(",").map(_.toDouble) }
val data = b.result
If you want to be immutable, take some immutable implementation of IndexedSeq (e.g. Vector) instead of Array; builders work on all collections.
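For instance, the same builder pattern with Vector might look like this minimal sketch. The sample lines are made up for illustration and, as above, the header is assumed to have been removed already.

```scala
// Sample data lines (header already extracted); values are illustrative
val lines = List("1,2,3", "2,5.0,3.4")

val b = Vector.newBuilder[Vector[Double]]
lines.foreach { line => b += line.split(",").map(_.trim.toDouble).toVector }
val data: Vector[Vector[Double]] = b.result()
```

The builder itself is mutable, but it stays local; only the immutable Vector it produces is handed on.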