Best strategies for reading J code

I've been using J for a few months now, and I find that reading unfamiliar code (i.e., code I didn't write myself) is one of the most challenging aspects of the language, particularly when it's written in tacit style. After a while, I came up with this strategy:

1) Copy the code segment into a word document

2) Take each operator from (1) and place it on a separate line, so that it reads vertically

3) Replace each operator with its verbal description in the Vocabulary page

4) Do a rough translation from J syntax into English grammar

5) Use the translation to identify conceptually related components and separate them with line breaks

6) Write a description of what each component from (5) is supposed to do, in plain English prose

7) Write a description of what the whole program is supposed to do, based on (6)

8) Write an explanation of why the code from (1) can be said to represent the design concept from (7).

Although I learn a lot from this process, I find it to be rather arduous and time-consuming -- especially if someone designed their program using a concept I never encountered before. So I wonder: do other people in the J community have favorite ways to figure out obscure code? If so, what are the advantages and disadvantages of these methods?

EDIT:

An example of the sort of code I would need to break down is the following:

binconv =: +/@ ((|.@(2^i.@#@])) * ]) @ ((3&#.)^:_1)

I wrote this one myself, so I happen to know that it takes a numerical input, reinterprets it as a ternary array, and interprets the result as the representation of a number in base 2 with at most one duplication (that is, each digit may be 0, 1, or 2). (E.g., 5 = (3^1)+2*(3^0) -> 1 2 -> (2^1)+2*(2^0) = 4, so binconv 5 is 4.) But if I had stumbled upon it without any prior history or documentation, figuring out that this is what it does would be a nontrivial exercise.
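
For comparison, the stages of binconv can be run one at a time at the prompt. This is only a minimal sketch: it assumes binconv is defined as above, and the intermediate results are copied forward by hand.

   (3&#.)^:_1 ] 5        NB. right-hand stage: the base-3 digits of 5
1 2
   |. 2 ^ i. # 1 2       NB. left tine of the fork: place values 2^1 and 2^0
2 1
   2 1 * 1 2             NB. middle verb: place values times digits
2 2
   +/ 2 2                NB. final stage: sum
4
   binconv 5             NB. the whole verb agrees
4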

Just wanted to add to Jordan's answer: if you don't have box display turned on, you can format things this way explicitly with 5!:2

   f =. <.@-:@#{/:~
   5!:2 < 'f'
┌───────────────┬─┬──────┐
│┌─────────┬─┬─┐│{│┌──┬─┐│
││┌──┬─┬──┐│@│#││ ││/:│~││
│││<.│@│-:││ │ ││ │└──┴─┘│
││└──┴─┴──┘│ │ ││ │      │
│└─────────┴─┴─┘│ │      │
└───────────────┴─┴──────┘

There's also a tree display:

   5!:4 <'f'
              ┌─ <.
        ┌─ @ ─┴─ -:
  ┌─ @ ─┴─ #       
──┼─ {             
  └─ ~ ─── /:

See the vocabulary page for 5!: Representation and also 9!: Global Parameters for changing the default.
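
Two more members of the same family may help when a train is hard to group by eye: 5!:5 gives the linear representation and 5!:6 the fully parenthesized one. The sketch below assumes f is defined as above. Output is left out because I'd rather not guess at the exact spacing your J version prints. For 9!:3, the mode numbers as I understand them are 2 for boxed, 4 for tree, 5 for linear and 6 for parenthesized.

   5!:5 <'f'       NB. linear representation, returned as a string
   5!:6 <'f'       NB. every grouping made explicit with parentheses
   9!:3 ] 2 6      NB. from now on, display verbs in both boxed and parenthesized form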

Also, for what it's worth, my own approach to reading J has been to retype the expression by hand, building it up from right to left and looking up the pieces as I go. When I need to, I use identity functions to form temporary trains.

So for example:

   /:~ i.5
0 1 2 3 4
   NB. That didn't tell me anything
   /:~ 'hello'
ehllo
   NB. Okay, so it sorts. Let's try it as a train:
   [ { /:~ 'hello'
┌─────┐
│ehllo│
└─────┘
   NB. Whoops. I meant a train:
   ([ { /:~) 'hello'
|domain error
|       ([{/:~)'hello'
   NB. Not helpful, but the dictionary says
   NB. "{" ("From") wants a number on the left.
   (0: { /:~) 'hello'
e
   (1: { /:~) 'hello'
h
   NB. Okay, it's selecting an item from the sorted list.
   NB. So f is taking the ( <. @ -: @ # )th item, whatever that means...
   <. -: # 'hello'
2
   NB. ??!?....No idea. Let's look up the words in the dictionary.
   NB. Okay, so it's the floor (<.) of half (-:) the length (#)
   NB. So the whole phrase selects an item halfway through the list.
   NB. Let's test to make sure.
   f 'radar' NB. should return 'd'
d
   NB. Yay!

addendum:

   NB. just to be clear:
   f 'drara' NB. should also return 'd' because it sorts first
d

Try breaking the verb up into its components first, and then see what they do. Rather than always referring to the vocab, you can simply try a component on some data and see if you can figure out what it does. To see the structure of the verb, it helps to know what parts of speech you're looking at and how to identify basic constructions like forks (and, in larger tacit constructions, the groups set off by parentheses). Simply typing the verb into the ijx window and pressing enter will also display its structure, which will probably help.

Consider the following simple example: <.@-:@#{/:~

I know that <. -: # { and /: are all verbs, ~ is an adverb, and @ is a conjunction (see the parts of speech link in the vocab). Therefore I can see that this is a fork structure with left verb <.@-:@# , right verb /:~ , and dyad { . This takes some practice to see, but there is an easier way: let J show you the structure by typing it into the ijx window and pressing enter:

   <.@-:@#{/:~
+---------------+-+------+
|+---------+-+-+|{|+--+-+|
||+--+-+--+|@|#|| ||/:|~||
|||<.|@|-:|| | || |+--+-+|
||+--+-+--+| | || |      |
|+---------+-+-+| |      |
+---------------+-+------+

Here you can see the structure of the verb (or, you will be able to after you get used to looking at these). Then, if you can't identify the pieces, play with them to see what they do.
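
The fork rule itself can be checked the same way: a monadic fork (f g h) y is evaluated as (f y) g (h y). A small sketch, using 'tacit' as a made-up sample argument and copying the intermediate results forward by hand:

   /:~ 'tacit'                  NB. right tine: sort
acitt
   <.@-:@# 'tacit'              NB. left tine: floor of half the length
2
   2 { 'acitt'                  NB. middle verb applied between the two results
i
   (<.@-:@#{/:~) 'tacit'        NB. the whole fork gives the same answer
i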

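   10?20                NB. deal 10 distinct random integers from i.20
15 10 18 7 17 12 19 16 4 2
   /:~ 10?20            NB. a fresh deal each time, here sorted ascending
1 4 6 7 8 10 11 15 17 19
   <.@-:@# 10?20        NB. floor of half the length: 5 for any 10 items
5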

You can break them down further and experiment as needed to figure them out (this little example is a median verb).
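
To see the median behaviour on numbers, the verb can be given a name and tried on lists of odd and even length. The name median is chosen here purely for illustration. Note that for an even number of items this definition picks the upper of the two middle items rather than averaging them.

   median =: <.@-:@# { /:~      NB. "median" is an illustrative name, not from the original
   median 3 1 2                 NB. sorted: 1 2 3, index 1
2
   median 4 1 3 2               NB. sorted: 1 2 3 4, index 2
3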

J packs a lot of code into a few characters, and big tacit verbs can look very intimidating, even to experienced users. Experimenting will be quicker than your documenting method, and you can learn a lot about J by breaking down large, complex verbs. I'd recommend focusing on the grammatical structure first and then figuring out the pieces, building the verb up step by step (since that's how you'll eventually be writing tacit verbs).

(I'm putting this in the answer section instead of editing the question because the question looks long enough as it is.)

I just found an excellent paper on the jsoftware website that works well in combination with Jordan's answer and the method I described in the question. The author makes some pertinent observations:

1) A verb modified by an adverb is a verb.

2) A train of more than three consecutive verbs is a series of forks, which may have a single verb or a hook at the far left-hand side depending on how many verbs there are.
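
Both observations can be checked on small trains. A sketch with arithmetic verbs (these examples are my own illustrations, not taken from the paper):

   +/ 1 2 3 4                NB. + modified by the adverb / is still a verb: sum
10
   (+/ % #) 1 2 3 4          NB. three verbs: a fork, (+/ y) % (# y)
2.5
   (- +/ % #) 1 2 3 4        NB. four verbs: a hook, y - (the fork above applied to y)
_1.5 _0.5 0.5 1.5
   (# , +/ % #) 1 2 3 4      NB. five verbs: a fork whose right tine is the fork above
4 2.5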

This speeds up the process of translating a tacit expression into English, since it lets you group verbs and adverbs into conceptual units and then use the nested fork structure to quickly determine whether an instance of an operator is monadic or dyadic. Here's an example of a translation I did using the refined method:

d28=: [:+/\{.@],>:@[#(}.-}:)@]%>:@[

[: +/\

{.@] ,

>:@[ #

(}.-}:)@] %

>:@[

cap (plus insert prefix)

(head atop right argument) append

(increment atop left argument) copy

(behead minus curtail) atop right argument

divided by

increment atop left argument

the partial sums of the sequence defined by

the first item of the right argument, raveled together with

(one plus the left argument) copies of

(all but the first element) minus (all but the last element)

of the right argument, divided by

(one plus the left argument).

the partial sums of the sequence defined by

starting with the same initial point,

and appending consecutive copies of points derived from the right argument by

subtracting each predecessor from its successor

and dividing the result by the number of copies to be made

Interpolating x-many values between the items of y
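
A quick check of that reading, assuming d28 is defined as above and using small made-up arguments:

   (}.-}:) 0 2 6      NB. each successor minus its predecessor
2 4
   1 d28 0 2 6        NB. one value inserted between each pair of items
0 1 2 4 6
   2 d28 0 3          NB. two values inserted between 0 and 3
0 1 2 3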

I just want to talk about how I read: <.@-:@#{/:~

First off, I knew that to test it as a function from the command line, it had to be entered in parentheses:

(<.@-:@#{/:~)

Now I looked at the stuff in the parentheses. I saw /:~, which returns its argument sorted; {, which selects an item from a list; #, which returns the number of items in a list; -:, half; and <., floor. I started to think that it might be median (half of the number of items in the list, rounded down, used as an index), but how did # get its argument? I looked at the @ signs and realized that, once they have glued <. -: and # together, there are three verbs there, so this is a fork. The list comes in at the right and is sorted; at the left, the fork hands the same list to # to get the number of items, and then takes the floor of half of that. So now we have the execution sequence:

sort, and pass the output to the middle verb as the right argument.

Take the floor of half of the number of elements in the list, and that becomes the left argument of the middle verb.

Do the middle verb.

That is my approach. I agree that sometimes the phrases have too many odd things, and you need to look them up, but I am always figuring this stuff out at the J instant command line.

Personally, I think of J code in terms of what it does -- if I do not have any example arguments, I rapidly get lost. If I do have examples, it's usually easy for me to see what a sub-expression is doing.

And, when it gets hard, that means I need to look up a word in the dictionary, or possibly study its grammar.

Reading through the prescriptions here, I get the idea that this is not too different from how other people work with the language.

Maybe we should call this 'Test Driven Comprehension'?