Small is beautiful, but is it also fast?

I had an argument with a co-worker about the implementation of a simple string parser. One is "small": 10 lines of code, using C++ and streams. The other is 70 lines of code, using switch cases and iterating over the string char by char. We tested both over 1 million iterations and measured speed using the time command. It appears that the long and ugly approach is 1 second faster on average.

The problem. Input:

"v=spf1 mx include:_spf-a.microsoft.com include:_spf-b.microsoft.com include:_spf-c.microsoft.com include:_spf-ssg-a.microsoft.com ip4:131.107.115.212 ip4:131.107.115.215 ip4:131.107.115.214 ip4:205.248.106.64 ip4:205.248.106.30 ip4:205.248.106.32 ~all a:1.2.3.4"

Output: a map<string, list<string>> with all the values for each key, such as ip4, include, a.

Example output of one iteration, on the input string given above:

key:a

1.2.3.4,

key:include

_spf-a.microsoft.com, _spf-b.microsoft.com, _spf-c.microsoft.com, _spf-ssg-a.microsoft.com,

key:ip4

131.107.115.212, 131.107.115.215, 131.107.115.214, 205.248.106.64, 205.248.106.30, 205.248.106.32,

The "small is beautiful" parser:

        istringstream iss(input);
        map<string, list<string> > data;
        string item;
        string key;
        string value;

        size_t pos;
        while (iss.good()) {
                iss >> item;                      // next whitespace-delimited token
                pos = item.find(":");             // npos when the token has no colon
                key = item.substr(0, pos);
                data[key].push_back(item.substr(pos + 1));
        }

The second, faster approach:

  typedef enum { I, Include, IP, A, Other } State;
  State state = Other;
  string line = input;
  string value;
  map<string, list<string> > data;
  size_t pos = 0;
  while (pos < line.length()) {
    switch (state) {
      case Other:                        // at the start of a token
        value.clear();
        switch (line[pos]) {
          case 'i':
            state = I;
            break;
          case 'a':
            state = A;
            break;
          default:                       // unknown token: skip to the next space
            while (pos < line.length() && line[pos] != ' ')
              pos++;
        }
        pos++;
        break;
      case I:                            // saw 'i': is it "ip4" or "include"?
        switch (line[pos]) {
          case 'p':
            state = IP;
            break;
          case 'n':
            state = Include;
            break;
        }
        pos++;
        break;
      case IP:                           // skip "4:" and copy the value
        pos += 2;
        for (; pos < line.length() && line[pos] != ' '; pos++)
          value += line[pos];
        data["ip4"].push_back(value);
        state = Other;
        pos++;
        break;
      case Include:                      // skip "clude:" and copy the value
        pos += 6;
        for (; pos < line.length() && line[pos] != ' '; pos++)
          value += line[pos];
        data["include"].push_back(value);
        state = Other;
        pos++;
        break;
      case A:                            // bare "a" or "a:<value>"
        if (line[pos] == ' ')
          data["a"].push_back("a");
        else {
          pos++;                         // skip the ':'
          for (; pos < line.length() && line[pos] != ' '; pos++)
            value += line[pos];
          data["a"].push_back(value);
        }
        state = Other;
        pos++;
        break;
    }
  }

I truly believe that "small is beautiful" is the way to go, and I dislike the longer code presented here, but it's hard to argue the point when the longer code runs faster.

Can you suggest ways to optimize or completely rewrite the small approach, so that it stays small and beautiful but also runs faster?

Update: added the state definition and initialization. Context: the longer approach completes 1 million iterations on the same string in 15.2 seconds; the smaller code does the same in 16.5 seconds on average.

Both versions were compiled with g++ -O3 (g++-4.4) and run on an Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz under Linux Mint 10.

The good side has won this battle :) I found a small bug in the small program: it added even invalid values to the map, the ones that did not have the ":" colon in them. After adding an "if" statement to check for the presence of the colon, the smaller code runs faster, much faster. The timings are now "small and beautiful": 12.3 and "long and ugly": 15.2.
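
For reference, a minimal sketch of the corrected loop; the npos check is the only substantive change, and testing the extraction itself instead of good() also avoids re-processing the last token when the input ends in whitespace:

        istringstream iss(input);
        map<string, list<string> > data;
        string item;
        while (iss >> item) {                      // extraction as the loop condition
                size_t pos = item.find(':');
                if (pos != string::npos)           // skip tokens like v=spf1, mx, ~all
                        data[item.substr(0, pos)].push_back(item.substr(pos + 1));
        }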

Small is beautiful :)


Smaller may not be faster. One example: bubble sort is very short, but it is O(n * n). QuickSort and MergeSort are longer and seem more complicated, but they are O(n log n).

But having said that... always make sure the code is readable, or, if the logic is complicated, add good comments so that other people can follow it.


The fewer lines of code you have, the better. Don't add 60 more lines if you really don't need to. If it's slow, profile, then optimize; don't optimize before you need to. If it runs fine, leave it as it is. Adding more code will add more bugs, and you don't want that. Keep it short. Really.

Read this wiki post.

"Premature optimization is the root of all evil" - Donald Knuth, a pretty smart guy.

It is possible to write faster code by writing less of it, just more intelligently. One way to aid speed: do less.

Quoting Raymond Chen:

"One of the questions I get is, "My app is slow to start up. What are the super secret evil tricks you guys at Microsoft are using to get your apps to start up faster?" The answer is, "The super evil trick is to do less stuff." -- "Five Things Every Win32 Programmer Needs to Know" (16 Sept. 2005)

Also, check out why GNU grep is fast.
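
Following that advice here, a minimal sketch that simply does less, assuming the same input string and data map as in the question: drop the stream machinery and walk the raw string with find(), so each token costs only the two substr copies that actually end up in the map:

    size_t start = 0;
    while (start < input.size()) {
        size_t end = input.find(' ', start);       // token occupies [start, end)
        if (end == string::npos) end = input.size();
        size_t colon = input.find(':', start);
        if (colon != string::npos && colon < end)  // token has the key:value form
            data[input.substr(start, colon - start)]
                .push_back(input.substr(colon + 1, end - colon - 1));
        start = end + 1;
    }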


The second version may well be faster, but note that it is highly tied to the content it is reading; if that changes, you have to change that atrocious mess of code. The first version is a bit more forgiving on this and may require less maintenance when something changes. Both will choke if the data is garbage/incorrect.

The real question to be asking is, is that extra second important? Does it matter?

And yes, look for ways of optimising the small/readable version. You may be losing a second, but you gain immediate clarity.


This is a single function; if the performance gain provides a significant benefit, I would keep the ugly version. But as others mentioned, document it well, and maybe include the small and beautiful version in a comment, so that people looking at the code can understand the choice.

In this specific case the choice has no impact on the larger architecture; it's just the implementation of a "leaf function". So if there is a valid reason, I won't keep the slow version just for the sake of code aesthetics.

I'm happy knowing that when I call the Sort function of some library, it uses QuickSort instead of a two-line, super-elegant but slow BubbleSort :-)


I truly believe that "small is beautiful" is the way to go, and I dislike the longer code presented here, but it's hard to argue the point when the longer code runs faster.

No, it isn't. Well, it isn't as long as by "one second faster" you mean something like one takes 10 seconds and the other takes 11 seconds, rather than one taking 0.1 seconds and the other 1.1 seconds. Even then, if you only have to run the parse once when the program starts, it may not be worth worrying about.

Always prefer the concise, easy-to-understand code over the long-winded, opaque but faster version, unless the performance gains can be demonstrated to be significant by profiling. Don't forget that the programmer's time is worth something too: spending 10 minutes figuring out what the bottom bit of code does costs 600 seconds, the equivalent of the savings from 600 runs of code that is 1 second faster per run.


One small optimization might be to avoid the map's [] operator: first use a find to see if the key is already in the map, and otherwise use an insert to add the new key, so the tree is not searched twice. The two substr calls could also be combined into one to avoid unnecessary copying.
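
A hedged sketch of that first suggestion, reusing the question's item, pos, and data variables; lower_bound gives a position that both answers the find and serves as the insertion hint:

    string key = item.substr(0, pos);
    map<string, list<string> >::iterator it = data.lower_bound(key);
    if (it == data.end() || it->first != key)      // key not in the map yet
        it = data.insert(it, make_pair(key, list<string>()));  // hinted insert
    it->second.push_back(item.substr(pos + 1));    // no second lookup via []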

Also, I don't remember who said it first, but it's easier to make correct code fast than to make fast code correct.

Also, Knuth's quote about premature optimization is often taken out of context; he also said:

"The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their 'optimized' programs"


Consider the size of the:

  1. Language standards
  2. Tool chains used
  3. Prerequisite knowledge to grok the code
  4. Generated output
  5. Compile times

Note these points apply to the relevant subsets if the C-style code were in fact compiled with a C++ compiler.

It's important to realise that the time to pick up and start developing without the implicit magic of the various C++ standard library facilities used in the "small" version may outweigh the time taken to code the "long" version.

Also, to comment on the speed differences: extensive use of libraries will smooth out differences in programming ability. To really squeeze out performance and "beauty", you will need to master the lowest-level language applicable to the problem.


In my experience, code that checks boundary conditions and validates input does add to the codebase in terms of number of lines, but it is very much needed to craft production-ready code.


I am finding it hard to work out what the second version does. state does not seem to be initialised at the start, for one thing.

I also find it hard to work out what the first one does when a token does not have a colon in it. It seems to just add it anyway, with no value?

The first one seems generic, the second one specific to knowing what keys you are looking for.

If performance is everything here and you are looking for specific keys, then you can optimise in various ways; for example, as sketched below, just have named lists for the keys you are looking for.
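
A hypothetical sketch of that named-lists idea (the variable names are invented for illustration): with the keys known up front, each value goes straight into its own list and the map disappears:

    list<string> ip4, include, a;                  // one pre-declared list per known key
    istringstream iss(input);
    string item;
    while (iss >> item) {
        size_t pos = item.find(':');
        if (pos == string::npos) continue;         // skip v=spf1, mx, ~all
        string key = item.substr(0, pos);
        string val = item.substr(pos + 1);
        if (key == "ip4")          ip4.push_back(val);
        else if (key == "include") include.push_back(val);
        else if (key == "a")       a.push_back(val);
    }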

However, before you go for performance: if this is something you read only once as configuration, then 1 second slower over a million iterations is a microsecond per parse in reality, which is not worth bothering about, and the generality (rather than the number of lines) makes the first version better.


I will disagree with the answers saying you should categorically choose the shorter, slower version; depending on the trade-off between the performance gain and improved user experience on one side, and the readability and maintenance cost of the code on the other, the faster version could be preferable.

Trust your own judgment!

Regarding this particular case, I have the following notes:

  • The two versions do not actually do the same thing. The shorter one handles all possible key names, while the longer one only supports a few hard-coded keys.

  • I would guess (this should be profiled!) that most of the time in both algorithms is spent constructing and allocating the std::map and std::list nodes. Changing to a data structure that does not require per-key and per-value allocations would probably speed up both implementations dramatically; see the sketch after this list.
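
One hedged sketch of that last point, assuming the usual <vector>, <algorithm>, and <sstream> includes and using namespace std: collect (key, value) pairs into a single flat vector and sort once at the end. The string copies still allocate, but the per-node map and list allocations are gone:

    vector<pair<string, string> > pairs;           // one flat container, no tree nodes
    pairs.reserve(16);                             // a typical record fits without regrowth
    istringstream iss(input);
    string item;
    while (iss >> item) {
        size_t pos = item.find(':');
        if (pos != string::npos)
            pairs.push_back(make_pair(item.substr(0, pos),
                                      item.substr(pos + 1)));
    }
    sort(pairs.begin(), pairs.end());              // groups the values by key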


I'd take a crack at it with a lexer generator. From glancing at the input, I'd guess that most of the complexity amounts to figuring out what the tokens are. A simple tokenizer and a hand-written state machine parser (I'd guess it would have about 2 or 3 states) should sum to about 20 lines of code:

// Assumed scaffolding, not shown here: yylex() returns the next token
// (tok_string or tok_colon) and leaves its text in payload; eof() reports
// end of input; Output is the map<string, list<string> > from the question.
extern string payload;

Output Process() {
   Output output;

   string key = "";
   bool value = false;          // true when the next string is a value, not a key
   while(!eof()) {
     switch(yylex()) {
      case tok_string:
       if (value) {
         output[key].push_back(payload);   // finish the pending key:value pair
         value = false;
         key = "";
       } else {
         key = payload;                    // remember the token; a ':' may follow
       }
       break;

      case tok_colon:
       value = (key != "");                // only a key:value pair if a key is pending
       break;
     }
   }
   return output;
}
