I've come into ownership of several thousand lines of Matlab code, some as >900-line functions and a few directories full of function_name.m files. It's hard to figure out what everything does (or relates to) or to work out the dependencies. What would you suggest for visualizing the function structure, such as which functions are called from which, and in what sequence?
Port to NumPy.
(Joke.)
Usually in Matlab you have some files written as functions, and some as scripts. Scripts do things like load the data you want to process, and feed it to the functions, and graph it.
To organize things I would start at the top level script and find out which functions do the loading, graphing, processing, etc. Keep the scripts in a top level directory and try to separate the functions out into subdirectories, according to the purpose of the function. Put dependencies of a function into the same subdirectory. Try to make it so that no code in a directory depends on anything in a parent directory (or cousin directory).
Whenever you figure out what a function does and what its arguments are, write a doc comment.
This assumes the person who wrote the code was reasonable. If not, Matlab makes it easy to plunk everything down into one directory and have everything depend on everything else in a rickety tower of code, so you may end up doing a lot of refactoring.
I have had to deal with this problem many times in my various roles at The MathWorks. This is what I do for the big pieces of MATLAB code:
1. Back it up, maybe twice!
2. Select all, Ctrl-I to smart indent.
3. Select all, Ctrl-J to wrap comments.
4. If I am feeling paper-based: print all the files out, get a set of highlighters, and follow the code manually, highlighting long-term variables and important function calls.
~~~ AND / OR ~~~
5. If I am feeling lucky: start running the code in the debugger, stepping through one line at a time (stepping into subfunctions that were user written).
At this point, I can go through and follow a typical flow through the control structure. I may not have a great idea what everything does, but I have a decent idea of what is going on.
Normally, my goal is to find a bug, solve it and move on. Your goals might be completely different. This is the method that I have used to quickly comprehend hundreds of different pieces of MATLAB code that I have been sent over the years.
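The debugger route can also be driven from the command window rather than by clicking in the editor. A minimal sketch, assuming a hypothetical top-level script called `run_analysis.m` (substitute your own entry point):

```matlab
% run_analysis is a hypothetical name for your own top-level script.
dbstop in run_analysis      % breakpoint at the first executable line
dbstop if error             % also pause wherever an error is thrown
run_analysis                % execution pauses and the prompt becomes K>>

% Commands to type at the K>> prompt while paused:
%   dbstep        execute the current line
%   dbstep in     step into a function called on the current line
%   dbstep out    run the rest of the current function, pause in the caller
%   dbstack       print the chain of calls that led to this line
%   dbcont        continue until the next breakpoint
%   dbquit        leave debug mode

dbclear all                 % remove all breakpoints when you are done
```

`dbstack` is the quickest way to answer "who called this?" while you are stepping through unfamiliar code.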
Does your code come with decent help text? In that case, m2html is going to be a great help, since it allows you to create linked html help for easy browsing.
Furthermore, it allows you to make dependency graphs, which help you understand a bit more how you may want to organize the code.
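For reference, a typical m2html call looks roughly like the following. This is a sketch from memory of the m2html documentation, so check `help m2html` for the options in your version; `myproject` and `doc` are placeholder folder names, and the dependency graphs need GraphViz's `dot` to be installed.

```matlab
% Run this from the folder that CONTAINS the code folder, since m2html
% expects a relative path. 'myproject' and 'doc' are placeholders.
m2html('mfiles',    'myproject', ...  % folder holding the .m files
       'htmldir',   'doc', ...        % where the HTML pages are written
       'recursive', 'on', ...         % descend into subfolders
       'global',    'on', ...         % cross-link functions across folders
       'graph',     'on');            % dependency graphs (requires GraphViz)
```

Afterwards, open `doc/index.html` in a browser to navigate the linked help pages.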
MATLAB Programming Style Guidelines by Richard Johnson is a good resource.
Some suggestions on Matlab coding conventions:
- Use addpath to avoid file clutter and to help organize functions into a taxonomy.
- Break scripts up into sections by function, or into sets for conditional runs; this also helps with plugging modules in and out, and with code reuse and referencing.
- Use a config file to turn options on and off.
- Keep an overview of the architecture of the code and of how it is meant to be run.
- Keep a status/readme file. (Treat yourself as a new user: how would you make the code easy to assimilate into a new user's own module or solution? If you come back to the code 3 months later and feel lost or unable to trace it, something is wrong.) My suggestion: keep a journal to refine your thoughts on maintaining artful projects. Keep perfecting your art!
- For equations, use LaTeX for documentation (and keep it in a nearby folder titled e.g. documents, so that it is easily accessible and traceable; if you have to use 'search' over your drive, something is wrong with the project management).
- Break code up into short modules; shorter, localized code with less scrolling is easier to trace.
- Use meaningful variable and function names (Java style seems nice, e.g. 'backedupDataForVerification'); do not skimp by shortening the words, or you will suffer later.
- When designing, rethink whether you should use functions, scripts, or OO (object-oriented) code.
- Do not rush into premature optimization; for speed, Matlab is not the best choice. If you really must, keep a non-optimized version for side-by-side readability comparison, so that troubleshooting and debugging are less of a curse.
- Always, always, always comment your code. Never use the excuse of having no time; you'll waste more time later.
- To tell versions apart, consider starting a new node for code modifications, e.g. set up a tree to differentiate the versions.
- Use separate folders for inputs/outputs, images, intermediate results, etc.
- Use timestamps to trace your versions.
- Share your code with someone else; if they find it difficult to maintain, use, or modify, rethink how to refine your builds.
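The config-file suggestion in the list above can be as simple as a function that returns a struct of switches. A minimal sketch; the function name, field names, and folder names are all made up for illustration:

```matlab
function cfg = project_config()
%PROJECT_CONFIG Central on/off switches and folders for an analysis run.
%   CFG = PROJECT_CONFIG() returns a struct that the top-level script reads.
%   All names here are hypothetical; adapt them to your project.
cfg.doPlots     = true;                       % draw figures?
cfg.saveResults = false;                      % write results to disk?
cfg.inputDir    = fullfile(pwd, 'inputs');    % where raw data lives
cfg.outputDir   = fullfile(pwd, 'outputs');   % where results are written
end
```

A top-level script would then start with `cfg = project_config();` and guard optional steps with `if cfg.doPlots ... end`, so every switch lives in one place.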
I agree with most of the comments about Matlab not being terribly supportive of modern software source code structuring but I don't believe it's too difficult to impose some of your own structure with a little discipline.
Organise your source files into a hierarchy of directories, as you would the source files for any program written in another programming language. You don't need to stick to a hierarchy; choose your own structure if you wish. Use the addpath command (with genpath to pick up subdirectories) to tell Matlab where to look for your m files when you are working.
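A small setup script at the project root is one way to do this. A sketch, assuming the code lives in a subfolder named `src` (a placeholder) and that the snippet is saved as a file, for example `setup_paths.m`, so that `mfilename` can locate it:

```matlab
% setup_paths.m (hypothetical file name), saved in the project root.
% Adds the 'src' folder and all of its subfolders to the MATLAB path
% for the current session only.
projectRoot = fileparts(mfilename('fullpath'));  % folder containing this file
addpath(genpath(fullfile(projectRoot, 'src')));  % 'src' is a placeholder name
```

Because the path is computed from the script's own location, the project keeps working when the whole tree is moved or checked out somewhere else.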
Acquaint yourself with the Matlab profiler tool which can give you call graphs (not terribly graphically, more like gprof's call graphs) which is some help in deciphering spaghetti code.
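The profiler's data can also be read programmatically, which gives a crude "who called whom" listing for one actual run. A sketch, assuming a hypothetical entry point `run_analysis`; it relies on the `FunctionTable` structure returned by `profile('info')`, and only shows calls that really happened during that run:

```matlab
profile on
run_analysis               % hypothetical: replace with your own entry point
profile off

p = profile('info');       % struct with one FunctionTable entry per function

% Print caller -> callee pairs observed during the run.
for i = 1:numel(p.FunctionTable)
    caller = p.FunctionTable(i);
    for k = 1:numel(caller.Children)
        child  = caller.Children(k);            % fields: Index, NumCalls, TotalTime
        callee = p.FunctionTable(child.Index);
        fprintf('%s -> %s (%d calls)\n', ...
                caller.FunctionName, callee.FunctionName, child.NumCalls);
    end
end

profile viewer             % open the interactive HTML report as well
```

Unlike static analysis, this misses branches that were not exercised, so run it on a representative input.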
Of course, all our m files are in a source control repository and we serve them out of that. We keep a private toolbox on one of our networked drives and all users can call the 'released' code in that toolbox directly.
Back everything up is right. Create a pristine tarball of the original source tree, and then throw it all in source control so you can track and roll back your changes.
Have a look at Matlab's depfun() and depdir(), which detect static dependencies. It could help you see dependencies between Matlab functions. With "depfun -toponly" on all the files and a little string munging, you could build a list of immediate dependencies and throw that in a GraphViz file to produce a big directed graph of your codebase's call connections. Clusters in the graph could be a good place to divide the code around. (EDIT: See Jonas's solution; looks like m2html does this for you.)
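The GraphViz idea can be sketched as follows. Note that `depfun` has been removed from later MATLAB releases; this sketch swaps in its documented replacement, `matlab.codetools.requiredFilesAndProducts`, with the `'toponly'` option. The folder name `src` and output name `deps.dot` are placeholders, the recursive `**` listing needs R2016b or later, and the code folder should be on the MATLAB path so that calls can be resolved.

```matlab
% Write the immediate dependencies of every .m file under srcDir
% into a GraphViz DOT file. 'src' and 'deps.dot' are placeholder names.
srcDir = fullfile(pwd, 'src');
files  = dir(fullfile(srcDir, '**', '*.m'));     % recursive file listing

fid = fopen('deps.dot', 'w');
fprintf(fid, 'digraph deps {\n');
for i = 1:numel(files)
    caller = fullfile(files(i).folder, files(i).name);
    % 'toponly' limits the result to files that caller uses directly.
    deps = matlab.codetools.requiredFilesAndProducts(caller, 'toponly');
    [~, callerName] = fileparts(caller);
    for k = 1:numel(deps)
        [~, depName] = fileparts(deps{k});
        if ~strcmp(depName, callerName)          % skip the file itself
            fprintf(fid, '  "%s" -> "%s";\n', callerName, depName);
        end
    end
end
fprintf(fid, '}\n');
fclose(fid);
```

Render the result outside MATLAB with something like `dot -Tpdf deps.dot -o deps.pdf`. This is static analysis, so calls made through `eval`, `feval` with computed names, or callbacks given as strings will not show up.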
If you have a lot of latitude to rewrite the code, consider rewriting some of the code as objects, using stateless utility classes with class methods and private functions as ways of packaging related functions together and providing some encapsulation. I've worked with largish Matlab codebases organized this way, and it works all right. In classic Matlab, classes are your only way of doing some sort of packages. I believe Matlab's newer OO system has namespace support, too.
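As an illustration of the stateless utility class idea, here is a sketch in the newer `classdef` syntax (not the classic `@ClassName` directory style). The class and method names are invented for the example:

```matlab
% File: StringUtils.m (hypothetical name; must match the class name)
classdef StringUtils
    % Groups related helper functions under one name. No properties,
    % so there is no state: every method is static.
    methods (Static)
        function out = padLeft(str, width)
            % Pad STR with leading spaces to at least WIDTH characters.
            nPad = max(0, width - numel(str));
            out  = [repmat(' ', 1, nPad), str];
        end

        function out = stripExtension(fileName)
            % Return FILENAME without its folder or extension.
            [~, out] = fileparts(fileName);
        end
    end
end
```

Callers then write `StringUtils.padLeft('ab', 5)`, which makes it obvious at the call site where the function lives. The namespace support mentioned above works through package folders: a function saved as `+mytools/loadData.m` (placeholder names) is called as `mytools.loadData(...)`.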
If you don't want to convert the code to OO, you can organize related functions in subdirectories. That helps to organize it for source code browsing at least.
All the functions should have some doco in Matlab's standard helptext format, including an H1 line. If they don't, stick the comments on what you learn there. Then use the "contentsrpt" tool to automatically generate table of contents files for the classes or directories.
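For anyone who has not seen the standard helptext layout, here is a small sketch. The function itself is invented for illustration; the point is the comment block, whose first line is the H1 line that `lookfor` searches and that table-of-contents files summarize:

```matlab
function y = scale_signal(x, gain)
%SCALE_SIGNAL Multiply a signal by a constant gain.
%   Y = SCALE_SIGNAL(X, GAIN) returns the array X with every element
%   multiplied by the scalar GAIN.
%
%   Example:
%       y = scale_signal([1 2 3], 2);
%
%   See also TIMES.

% Implementation notes go after a blank line so they stay out of the help.
y = x .* gain;
end
```

Typing `help scale_signal` prints the comment block up to the first blank line, so whatever you learn about a legacy function is immediately available to the next reader.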
Good luck.