I'd like to be able to transform an individual column in an incanter data set, and save the resulting data set to a new (csv) file. What is the simplest way to do that?
Essentially, I'd like to be able to map a function over a column in the data set, and replace the original column with this result.
You can define something like:
(defn map-data [dataset column f]
  (conj-cols (sel dataset :except-cols column)
             ($map f column dataset)))
and use it like this:
(def data (get-dataset :cars))
(map-data data :speed #(* % 2))
The only problem is that the column names change. I'll try to fix that when I have some free time...
Here are two similar functions; both preserve column names and column order.
(defn transform-column [col-name f data]
  (let [new-col-names (sort-by #(= % col-name) (col-names data))
        new-dataset (conj-cols
                      (sel data :except-cols col-name)
                      (f ($ col-name data)))]
    ($ (col-names data) (col-names new-dataset new-col-names))))
(defn transform-rows [col-name f data]
  (let [new-col-names (sort-by #(= % col-name) (col-names data))
        new-dataset (conj-cols
                      (sel data :except-cols col-name)
                      ($map f col-name data))]
    ($ (col-names data) (col-names new-dataset new-col-names))))
And here is an example illustrating the difference:
=> (def test-data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
=> (transform-column :a (fn [x] (map #(* % 2) x)) test-data)
[:a :b]
[2 2]
[6 4]
=> (transform-rows :a #(* % 2) test-data)
[:a :b]
[2 2]
[6 4]
transform-rows is best for simple transformations, whereas transform-column is for cases where the transformation of one row depends on other rows (such as when normalizing a column).
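To make the row-wise versus column-wise distinction concrete, here is a minimal plain-Clojure sketch that needs no Incanter at all. It models a column as a plain vector; the names `col`, `double-each` and `normalize` are hypothetical and only serve the illustration.

```clojure
;; Plain-Clojure sketch: a "column" is modeled as a vector of numbers.
;; `col`, `double-each` and `normalize` are illustrative names, not Incanter API.
(def col [1 3 4])

;; Row-wise: every value is transformed independently of the others.
(defn double-each [xs]
  (map #(* % 2) xs))

;; Column-wise: every value depends on the whole column (here, its sum).
(defn normalize [xs]
  (let [total (reduce + xs)]
    (map #(/ % total) xs)))

(double-each col) ; => (2 6 8)
(normalize col)   ; => (1/8 3/8 1/2)
```

A function shaped like `normalize` takes the whole column, so it is the kind of function you would pass to transform-column; a per-value function such as `#(* % 2)` is the kind you would pass to transform-rows.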
Saving and loading CSV can be done with the standard Incanter functions, so a full example looks like:
(use '(incanter core io))
(def data (col-names (read-dataset "data.csv") [:a :b]))
(save (transform-rows :a #(* % 2) data) "transformed-data.csv")
Again: maybe you can use the internal structure of the dataset.
user=> (defn update-column
         [dataset column f & args]
         (->> (map #(apply update-in % [column] f args) (:rows dataset))
              vec
              (assoc dataset :rows)))
#'user/update-column
user=> d
[:col-0 :col-1]
[1 2]
[3 4]
[5 6]
user=> (update-column d :col-1 str "d")
[:col-0 :col-1]
[1 "2d"]
[3 "4d"]
[5 "6d"]
Again, you should check to what extent this is public API.
NOTE: this solution requires Incanter 1.5.3 or greater
For those who can use recent versions of Incanter...
add-column & add-derived-column were added to Incanter in 1.5.3 (pull request)
From the docs:
add-column
"Adds a column, with given values, to a dataset."
(add-column column-name values)
or
(add-column column-name values data)
Or you can use:
add-derived-column
"This function adds a column to a dataset that is a function of existing columns. If no dataset is provided, $data (bound by the with-data macro) will be used. f should be a function of the from-columns, with arguments in that order."
(add-derived-column column-name from-columns f)
or
(add-derived-column column-name from-columns f data)
A more complete example:
(use '(incanter core datasets))
(def cars (get-dataset :cars))
(add-derived-column :dist-over-speed [:dist :speed] (fn [d s] (/ d s)) cars)
(with-data (get-dataset :cars)
(view (add-derived-column :speed**-1 [:speed] #(/ 1.0 %))))
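To get back to the original question of replacing a column, one possible approach is to add the derived column and then drop the source column. This is only a sketch, assuming Incanter 1.5.3 or later; the dataset and the column name `:a-doubled` are made up for illustration, and the new column ends up last rather than in the original position.

```clojure
(use '(incanter core))

;; Small illustrative dataset (hypothetical data).
(def test-data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))

;; Add a column derived from :a, then drop the original :a column.
;; Note: the derived column is appended at the end, so column order changes.
(def result
  (-> (add-derived-column :a-doubled [:a] #(* 2 %) test-data)
      (sel :except-cols :a)))

(col-names result)
($ :a-doubled result)
```

The result can then be written out with save, as in the CSV example earlier in the thread.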