I'm learning a little C over the holiday weekend, and I started to look at other programs written in C. I ended up looking at GNU Netcat, thinking it would be a good example.
I was a bit shocked to see a 600 line main()
function. Is this normal? If it is normal is this considered good C coding practices?
There's a quote of an American President (Lincoln?) who was asked how long a man's legs should be. "Long enough to reach from his body to the ground," he said.
Getting back on topic:
Authors of books like "Clean Code" advertise that every function do only one thing (that's grossly simplified by me here), so in theory your main()
should maybe call an initialization function and then another function orchestrating the work of the application, and that's all.
In practice, many programmers find a lot of tiny functions irritating. A perhaps more useful metric is that a function should usually fit on one screen, if only to make it easier to see and think about.
If a program is complex and most of its functionality is in main()
, someone hasn't done a decent job of breaking the problem down. Essentially you should strive for manageability, understandability and readability. There's usually no good reason for a main() to be huge.
I often find on certain kinds of applications that main() has hundreds of lines of initialization followed by about 20 lines of top-level loop.
It's my habit not to break functions out until I need to call them twice. This sometimes leads to me writing a 300 line function, but as soon as I see the same block occur twice I break that block out.
As for main, initialization routines are often once so 600 lines does not sound unreasonable.
A 600 line main is a bit of a warning sign. But if you look at it and can't see any way of breaking it up into smaller pieces other than to do this.
void the_first_part_of_main(args...);
void the_second_part_of_main(args...);
...
main()
{
the_first_part_of_main();
the_second_part_of_main();
...
}
Then you should leave it alone.
Regardless of the language, I would try and restrict a subroutine method to roughly what's visible in one page of code, and extract functionality to subroutines wherever possible.
600 lines sounds quite long for any implementation. Perhaps there's some overriding reason wrt. passing arguments around and clarity (I've not looked at the example you've posted) but it sounds to be at the far end of what's normally practised, and it should be possible to subdivide this task.
I suspect it's been developed by continual incremental addition of functionality over the years, and nobody has stopped and refactored this to be more readable/maintainable. If there are no unit tests for this (and in my experience main()
methods don't often get written tests - for whatever reasons) then there's going to be an understandable reluctance to refactor it.
The limit is the editor window...
Heh, that's horrible but I've seen worse. I've seen large, multi-thousand line fortran programs with no subroutines at all.
I believe the answer is: it should fit in an editor window and it should have low cyclomatic complexity.
If a main program is just a series of function calls or computations, then I suppose it could be as long as necessary and it could have an exemption from the editor window constraint. Even then, I would be a bit surprised that there was not a natural way to extract meaningful discrete methods.
But if it is testing and branching and return
ing and break
ing and continue
-ing then it needs to be broken up into individual and individually tested functional components.
Hopefully they are planning on refactoring. This looks very rough.
443 while (optind < argc) {
444 const char *get_argv = argv[optind++];
445 char *q, *parse = strdup(get_argv);
446 int port_lo = 0, port_hi = 65535;
447 nc_port_t port_tmp;
448
449 if (!(q = strchr(parse, '-'))) /* simple number? */
450 q = strchr(parse, ':'); /* try with the other separator */
451
452 if (!q) {
453 if (netcat_getport(&port_tmp, parse, 0))
454 netcat_ports_insert(old_flag, port_tmp.num, port_tmp.num);
455 else
456 goto got_err;
457 }
458 else { /* could be in the forms: N1-N2, -N2, N1- */
459 *q++ = 0;
460 if (*parse) {
461 if (netcat_getport(&port_tmp, parse, 0))
462 port_lo = port_tmp.num;
463 else
464 goto got_err;
465 }
466 if (*q) {
467 if (netcat_getport(&port_tmp, q, 0))
468 port_hi = port_tmp.num;
469 else
470 goto got_err;
471 }
472 if (!*parse && !*q) /* don't accept the form '-' */
473 goto got_err;
474
475 netcat_ports_insert(old_flag, port_lo, port_hi);
476 }
477
478 free(parse);
479 continue;
480
481 got_err:
482 free(parse);
483 ncprint(NCPRINT_ERROR, _("Invalid port specification: %s"), get_argv);
484 exit(EXIT_FAILURE);
485 }
By some standards a 600 line function of any sort is a bad idea, but there is no reason main should be treated any differently to any other function.
The only reason I can think of such a situation arising is where a program is developed rapidly, and as it grows no one ever bothers to split it up into more logical units.
As short as possible. Usually, whenever there's an operation I can assign a name to, I create a new method for it.
I would say your routines should be as long/short as necessary to be effective and reliably and automatically tested. A 600-statement routine likely has multiple paths through it and the combinations of routines might get very large very quickly. I try to break down functions into someting that make it easily readable. Functions are either "functional" or "narrative." All the while including unit tests.
My personal coding style is to try and only use the the main function for command-line argument parsing, and any big-ticket initialization that the program needs.
Nearly all 600-line functions I have seen were also stupidly written. This doesn't have to be so.
However, in these cases it was just a failure of a programmer to present some zoomed-out view, and give meaningful names to sections - both high-level (like, Initialize()) and low-level (something that takes a common 3-line pattern and hides it under one name, with parameters).
In cases of extreme stupidity, they were optimizing function call performance when it was not required.
main()
, like any function, should be exactly as big as it needs to be. "As it needs to be" will vary a lot depending on what it needs to do. Having said that, it shouldn't have to be more than a couple of hundred lines in most cases. 600 lines is a bit on the hefty side, and some of that could / should probably be refactored into separate functions.
For an extreme example, one team I was on was tasked with speeding up some code to drive a 3d display. The code was originally written by a wirehead who was obviously taught himself programming using old-school FORTRAN; main()
was over five thousand lines of code, with random bits #include
ed here and there. Instead of breaking code out into functions, he'd simply branch to a subroutine within main()
via goto
(somewhere between 13 and 15 gotos, branching both directions seemingly at random). As a first step we simply turned on level 1 optimization; the compiler promptly swallowed up all available memory and swap space and panicked the kernel. The code so brittle that we couldn't make any changes without breaking something. We finally told the customer they had two choices: allow us to rewrite the entire system from scratch or buy faster hardware.
They bought faster hardware.
精彩评论