Complex multi-dimensional list operations in Scala

Source: https://www.devze.com, 2023-02-06 03:52 (origin: web)

Given a list such as the following:

val dane = List(
    ("2011-01-04", -137.76),
    ("2011-01-04", 2376.45),
    ("2011-01-04", -1.70),
    ("2011-01-04", -1.70),
    ("2011-01-04", -1.00),
    // ... skip a few ...
    ("2011-12-22", -178.02),
    ("2011-12-29", 1800.82),
    ("2011-12-23", -83.97),
    ("2011-12-24", -200.00),
    ("2011-12-24", -30.55),
    ("2011-12-30", 728.00)
)

I'd like to sum the values (i.e. the second element of each tuple) for a specific month (e.g. January, or 01), using the following operations in the specified order:

  1. groupBy
  2. slice
  3. collect
  4. sum


I'm feeling contrary, so here's an answer that uses NONE of the prescribed methods: groupBy, slice, collect or sum.

Avoiding collect was the hardest part; condOpt/flatten is just so much uglier...

val YMD = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

import PartialFunction._

(dane map {
  condOpt(_:(String,Double)){ case (YMD(_,"01",_), v) => v }  
}).flatten reduceLeft {_+_}


(for((YearMonthDay(_, 1, _), value)<-dane) yield value).sum

object YearMonthDay {
  def unapply(dateString: String): Option[(Int, Int, Int)] = {
    // yes, there should really be some error checking in this extractor
    // to return None for a bad date string
    val components = dateString.split("-")
    Some((components(0).toInt, components(1).toInt, components(2).toInt))
  }
}
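
The comment in the extractor above asks for error checking. Here is a minimal sketch of one way to add it, assuming a regular expression is acceptable. The name `SafeYearMonthDay` and the three-row `sample` list are made up for illustration and are not part of the original answer. It only checks the shape of the string, not whether the month or day is in range.

```scala
// Hypothetical variant of YearMonthDay: yields None for malformed strings.
object SafeYearMonthDay {
  private val Pattern = """(\d{4})-(\d{2})-(\d{2})""".r

  def unapply(dateString: String): Option[(Int, Int, Int)] =
    dateString match {
      case Pattern(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
      case _                => None // anything not shaped like yyyy-MM-dd
    }
}

// Assumed sample: two rows from the question plus one deliberately bad date.
val sample = List(("2011-01-04", -137.76), ("not-a-date", 5.0), ("2011-12-30", 728.00))

// Rows whose date does not match (or is not in month 1) are simply skipped.
val januaryTotal = sample.collect { case (SafeYearMonthDay(_, 1, _), value) => value }.sum
```

Here `collect` stands in for the for-comprehension only because it behaves the same way across Scala versions; the extractor itself can be used in either form.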


Now that Kevin has started the trend of contrary answers, here's one you should never use, but gosh, it works! (And avoids every requested method, and will work on any month if you change the string, but it does require that the list be sorted by date.)

dane.scanLeft(("2011-01",0.0))((l,r) =>
  ( l._1,
    if ((l._1 zip r._1).forall(x => x._1==x._2)) l._2+r._2 else 0.0
  )
).dropWhile(_._2==0).takeWhile(_._2 != 0.0).reverse.head._2


Break the problem up into smaller steps. Start by trying to split the list into one list for every month. You could use groupBy for this. Your first problem will probably be how to parse the date string. A general solution would be to use a custom date class and a regular expression; however, a simpler ad-hoc solution using an indexed substring (or slice) could be appropriate in this context.

A general tip would be to load the data into the Scala REPL and play around with it. Good luck.
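
As a rough sketch of that first step, assuming the dates are always formatted as yyyy-MM-dd: the helper name `monthOf` and the shortened `sample` list are illustrative only. It stops at the grouping and leaves the summing as the next step.

```scala
// Assumed sample: a few rows copied from the question's list.
val sample = List(
  ("2011-01-04", -137.76),
  ("2011-01-04", 2376.45),
  ("2011-12-22", -178.02),
  ("2011-12-30", 728.00)
)

// Ad-hoc parsing: indices 5 and 6 of "yyyy-MM-dd" hold the two-digit month.
def monthOf(date: String): String = date.substring(5, 7)

// One sub-list per month, keyed by the month string.
val byMonth: Map[String, List[(String, Double)]] =
  sample.groupBy { case (date, _) => monthOf(date) }
```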


import scala.collection.mutable.HashMap
val totals = new HashMap[Int, Double]
for (e <- dane) {
    val (date, value) = e
    val month = date.drop(5).take(2).toInt
    totals(month) = totals.getOrElse(month,0.0) + value
}

Another implementation that uses none of the suggested functions. It relies on a mutable collection and a bastard mix of procedural and functional style, and it avoids some useful functions :)

totals ends up as a map from month number to total.
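
For comparison, the same month-to-total map can probably be built without mutation by folding over the list. This is only a sketch under the same yyyy-MM-dd assumption; `immutableTotals` and the short `sample` list are names introduced here.

```scala
// Assumed sample: three rows copied from the question's list.
val sample = List(("2011-01-04", -137.76), ("2011-01-04", 2376.45), ("2011-12-30", 728.00))

// Thread an immutable Map through the list, adding each value to its month's total.
val immutableTotals: Map[Int, Double] =
  sample.foldLeft(Map.empty[Int, Double]) { case (acc, (date, value)) =>
    val month = date.slice(5, 7).toInt
    acc.updated(month, acc.getOrElse(month, 0.0) + value)
  }
```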


So, here's an idea:

  • groupBy, because you need to group data from each month together
  • slice, because you need to see which is the month of the date
  • collect, because you need to filter by month and map to value
  • sum, mmmm... I'm not sure where this one comes in. Any ideas?
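
One way the four operations might fit together in the requested order is sketched below. It is a guess at what the exercise intends, not a confirmed solution; `monthTotal` and the shortened `sample` list are hypothetical names. Here sum turns up twice: once inside collect to total a month's values, and once at the end to reduce the zero-or-one matching groups to a single number.

```scala
// Assumed sample: a few rows copied from the question's list.
val sample = List(
  ("2011-01-04", -137.76),
  ("2011-01-04", 2376.45),
  ("2011-01-04", -1.70),
  ("2011-12-22", -178.02),
  ("2011-12-30", 728.00)
)

def monthTotal(data: List[(String, Double)], month: String): Double =
  data
    .groupBy { case (date, _) => date.slice(5, 7) }         // groupBy, keyed via slice
    .collect { case (`month`, rows) => rows.map(_._2).sum } // keep one month, sum its values
    .sum                                                    // 0.0 when the month is absent

val januaryTotal = monthTotal(sample, "01")
```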


I refuse to obfuscate sum.

import org.joda.time.DateMidnight
for (month <- 1 to 12) yield {
  // leading dots keep the chain in one expression across line breaks
  dane.map { case (d, v) => new DateMidnight(d).getMonthOfYear -> v }
    .filter { case (m, _) => m == month }
    .map(_._2)
    .sum
}


dane.groupBy(_._1.matches(".*-01-.*")).slice(0, 1).map(x => x._2).flatten.map(y => y._2).sum

I really should look up 'collect', which somehow should replace my map/flatten/map.

My result is: Double = 2234.29
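
On the collect question: collect takes a partial function, so it can filter and map in a single step. A minimal sketch, using only the January rows visible in the question plus one December row as an assumed `sample`:

```scala
// Assumed sample: the visible January rows from the question plus one December row.
val sample = List(
  ("2011-01-04", -137.76),
  ("2011-01-04", 2376.45),
  ("2011-01-04", -1.70),
  ("2011-01-04", -1.70),
  ("2011-01-04", -1.00),
  ("2011-12-30", 728.00)
)

// Keep only rows whose date contains "-01-" and extract the value in the same pass.
val januaryTotal = sample.collect {
  case (date, value) if date.matches(".*-01-.*") => value
}.sum
```

This version also does not depend on which group a Map happens to return first, which the slice(0, 1) call does.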
