
Functional approach to parse hierarchical CSV

Developer https://www.devze.com 2022-12-21 19:30 Source: web

I'm trying to write a piece of code but cannot get it working. The simplest example I can think of is parsing a CSV file. Suppose we have a CSV file whose data is organized in some kind of hierarchy, like this:

Section1;
        ;Section1.1
        ;Section1.2
        ;Section1.3
Section2;
        ;Section2.1
        ;Section2.2
        ;Section2.3
        ;Section2.4

etc.

I did this:

let input = 
"a;
;a1
;a2
;a3
b;
;b1
;b2
;b3
;b4
;b5
c;
;c1"

let lines = input.Split('\n') 
let data = lines |> Array.map (fun l -> l.Split(';'))

let sections = 
  data 
  |> Array.mapi (fun i l -> (i, l.[0])) 
  |> Array.filter (fun (i, s) -> s <> "")

and I got

val sections : (int * string) [] = [|(0, "a"); (4, "b"); (10, "c")|]

Now I'd like to create a list of line index ranges for each section, something like this:

[|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]

where the first number is the starting line index of the subsection range and the second is the ending line index. How do I do that? I was thinking about using the fold function, but couldn't come up with anything.
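As an aside, one way to get from the `sections` array above to the requested ranges without a fold is to pair each header with the next one. This is a minimal sketch: `sectionRanges` is a hypothetical helper name, and it assumes `sections` is sorted by line index and that the total line count is known (in the question's code that would be `lines.Length`). A sentinel entry just past the last line gives the final section a successor.

```fsharp
// Sketch: compute (firstChild, lastChild, title) from the header positions.
// Assumes `sections` holds (lineIndex, title) pairs in ascending order and
// `lineCount` is the total number of lines in the input.
let sectionRanges (lineCount: int) (sections: (int * string)[]) =
    // Add a sentinel "header" just past the last line so that the final
    // section also has a next header to pair with.
    Array.append sections [| (lineCount, "") |]
    |> Array.pairwise
    // Children of a section run from the line after its header
    // up to the line before the next header.
    |> Array.map (fun ((start, title), (next, _)) -> (start + 1, next - 1, title))

let sections = [| (0, "a"); (4, "b"); (10, "c") |]
printfn "%A" (sectionRanges 12 sections) // [|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]
```

A header with no child lines would produce an empty range such as `(5, 4, "b")`, so callers may want to filter those out.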


As far as I know, there is no easy way to do this, but it is definitely a good way to practice functional programming skills. If you used some hierarchical representation of data (e.g. XML or JSON), the situation would be a lot easier, because you wouldn't have to transform the data structure from linear (e.g. list/array) to hierarchical (in this case, a list of lists).

Anyway, a good way to approach the problem is to realize that you need a more general operation on the data: you need to group adjacent elements of the array, starting a new group whenever you find a line with a value in the first column.

I'll start by adding a line number to the array and then convert it to list (which is usually easier to work with in F#):

let data = lines |> Array.mapi (fun i l -> 
  i, l.Split(';')) |> List.ofSeq

Now, we can write a reusable function that groups adjacent elements of a list and starts a new group each time the specified predicate f returns true:

let adjacentGroups f list =
  // Utility function that accumulates the elements of the current 
  // group in 'current' and stores all groups in 'all'. The parameter
  // 'list' is the remainder of the list to be processed
  let rec adjacentGroupsUtil current all list =
    match list with
    // Finished processing - return all groups
    | [] -> List.rev (current::all) 
    // Start a new group, add current to the list
    | x::xs when f(x) -> 
      adjacentGroupsUtil [x] (current::all) xs
    // Add element to the current group
    | x::xs ->
      adjacentGroupsUtil (x::current) all xs

  // Call utility function, drop all empty groups and
  // reverse elements of each group (because they are
  // collected in a reversed order)
  adjacentGroupsUtil [] [] list
    |> List.filter (fun l -> l <> [])
    |> List.map List.rev
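For comparison, the same grouping can also be written with `List.foldBack` instead of explicit recursion, which is close to the fold the question asked about. This is a sketch, not part of the original answer; `adjacentGroupsFold` is a hypothetical name. Because the fold walks the list from the right, each element is consed onto the group that follows it, so no final `List.rev` is needed.

```fsharp
// Hypothetical alternative to adjacentGroups: same grouping,
// expressed with List.foldBack instead of explicit recursion.
let adjacentGroupsFold (isHeader: 'a -> bool) (list: 'a list) : 'a list list =
    // Walking from the right, `current` holds the elements seen since the last
    // header (already in original order) and `groups` holds finished groups.
    let current, groups =
        List.foldBack
            (fun x (current, groups) ->
                if isHeader x then
                    // x opens a group: it owns everything collected after it
                    [], (x :: current) :: groups
                else
                    x :: current, groups)
            list
            ([], [])
    // Elements before the first header (if any) form their own leading group,
    // mirroring what adjacentGroups does.
    if List.isEmpty current then groups else current :: groups

// Usage: start a new group at every 0
let grouped = adjacentGroupsFold (fun x -> x = 0) [ 0; 1; 2; 0; 3 ]
printfn "%A" grouped // [[0; 1; 2]; [0; 3]]
```

With the answer's `data`, the call would be `data |> adjacentGroupsFold (fun (_, cells: string[]) -> cells.[0] <> "")`.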

Now, implementing your specific algorithm is relatively easy. We first need to group the elements, starting a new group each time the first column has some value:

let groups = data |> adjacentGroups (fun (ln, cells) -> cells.[0] <> "")

In the second step, we need to do some processing for each group. We take its first element (and pick the title of the group) and then find the minimal and maximal line number among the remaining elements:

let ranges =
  groups |> List.map (fun ((_, firstCols)::lines) ->
    let lineNums = lines |> List.map fst
    List.min lineNums, List.max lineNums, firstCols.[0])

Note that the pattern match in the lambda function gives an incomplete-match warning (FS0025), but we can safely ignore it because every group is non-empty.
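If you would rather avoid the warning, and also avoid `List.min` throwing on a section that has a header but no child lines, the second step can use an explicit `match` inside `List.choose`. This is a sketch under assumptions: `toRanges` and `sampleGroups` are hypothetical names, and the sample groups are made-up data in the `(lineNumber, cells)` shape that `adjacentGroups` produces here. Sections without children (and any leading lines before the first header) are simply skipped.

```fsharp
// Sketch: turn groups of (lineNumber, cells) into (first, last, title) triples
// without an incomplete pattern match.
let toRanges (groups: (int * string[]) list list) : (int * int * string) list =
    groups
    |> List.choose (fun group ->
        match group with
        // a header line followed by at least one child line
        | (_, header) :: ((_ :: _) as children) when header.[0] <> "" ->
            let lineNums = children |> List.map fst
            Some(List.min lineNums, List.max lineNums, header.[0])
        // a header without children, or lines before the first header
        | _ -> None)

// Hypothetical sample input; "b" has no child lines
let sampleGroups =
    [ [ (0, [| "a"; "" |]); (1, [| ""; "a1" |]); (2, [| ""; "a2" |]) ];
      [ (3, [| "b"; "" |]) ];
      [ (4, [| "c"; "" |]); (5, [| ""; "c1" |]) ] ]

printfn "%A" (toRanges sampleGroups) // [(1, 2, "a"); (5, 5, "c")]
```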

Summary: This answer shows that if you want to write elegant code, you can implement your own reusable higher-order functions (such as adjacentGroups), because not everything is available in the F# core libraries. If you use functional lists, you can implement them using recursion (for arrays, you'd use imperative programming as in the answer by gradbot). Once you have a good set of reusable functions, most problems become easy :-).


In general, when you work only with arrays you force yourself to write mutable, imperative-style code. I made a generic splitArrayBy function to group the different sections together. If you're going to write your own parser, then I suggest using List and other high-level constructs.

module Question
open System

let splitArrayBy f (array:_[]) =
    [|
        let i = ref 0
        let start = ref 0
        let last = ref [||]

        while !i < array.Length do
            if f array.[!i] then
                yield !last, array.[!start .. !i - 1]
                last := array.[!i]
                start := !i + 1

            i := !i + 1

        if !start <> !i then
            yield !last, array.[!start .. !i - 1]
    |]

let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let lines = input.Split('\n') 
let data = lines |> Array.map (fun l -> l.Split(';'))
let result = data |> splitArrayBy (fun s -> s.[0] <> "")

Array.iter (printfn "%A") result

This will output the following:

([||], [||])
([|"a"; ""|], [|[|""; "a1"|]; [|""; "a2"|]; [|""; "a3"|]|])
([|"b"; ""|], [|[|""; "b1"|]; [|""; "b2"|]; [|""; "b3"|]; [|""; "b4"|]; [|""; "b5"|]|])
([|"c"; ""|], [|[|""; "c1"|]|])

Here is a slight modification of the above that produces the example output.

let splitArrayBy f (array:_[][]) =
    [|
        let i = ref 0
        let start = ref 0
        let last = ref ""
        while !i < array.Length do
            if f array.[!i] then
                if !i <> 0 then
                    yield !start, !i - 1, !last
                last := array.[!i].[0]
                start := !i + 1
            i := !i + 1
        if !start <> !i then
            yield !start, !i - 1, !last
    |]

let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let lines = input.Split('\n') 
let data = lines |> Array.map (fun l -> l.Split(';'))
let result = data |> splitArrayBy (fun s -> s.[0] <> "")

printfn "%A" result

Output

[|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]
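The ref-cell operators `!` and `:=` used above still compile, but from F# 6 on they produce informational messages recommending `.Value` instead. A sketch of the same scan written with `let mutable` locals and a `ResizeArray` (this is a restyled variant, not gradbot's code; `splitRanges` is a hypothetical name) could look like this:

```fsharp
// Sketch: same scan as the second splitArrayBy, without ref cells.
let splitRanges (isHeader: string[] -> bool) (rows: string[][]) =
    let result = ResizeArray<int * int * string>()
    let mutable start = 0
    let mutable title = ""
    for i in 0 .. rows.Length - 1 do
        if isHeader rows.[i] then
            // Close the previous section (if any) before opening a new one
            if i <> 0 then result.Add((start, i - 1, title))
            title <- rows.[i].[0]
            start <- i + 1
    // Close the last section if it has at least one child line
    if start <> rows.Length then result.Add((start, rows.Length - 1, title))
    result.ToArray()

let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
let data = input.Split('\n') |> Array.map (fun l -> l.Split(';'))
printfn "%A" (splitRanges (fun s -> s.[0] <> "") data) // [|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]
```

The mutable locals are only updated inside a plain `for` loop (no closures capture them), which is why `let mutable` is allowed here.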


The JSON format would appear to be ideal for you; parsers and converters are already available.

Read about it here: http://msdn.microsoft.com/en-us/library/bb299886.aspx

Edit: for some reason I read J#; perhaps it still applies in F#.
