
Stupid mistakes in C. Break, Switch, If. 1990 Crash of Telephone Network [closed]

Developer https://www.devze.com 2023-03-03 14:14 Source: Internet
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I was hesitating to ask this, since it seems very easy.

What is wrong in this pseudocode?

In the switching software (written in C), there was:

  • a long "do... while" construct, which contained
  • a "switch" statement, which contained
  • an "if" clause, which contained
  • a "break," which was intended for the "if" clause
    • but instead broke from the "switch" statement.

This caused a crash of the telephone system in 1990 (See: http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/att_collapse.html).

I need a very simple explanation of why this code is wrong. I think the simplest answer is that a break is not possible within an if clause? So what statement needs to be written instead of a break within an if clause to get the intended effect, which is leaving the if clause?


I suspect that the description / pseudo-code is incorrect when it says:

  • a "break," which was intended for the "if" clause

It would make sense if that was meant to be:

  • a break, which was intended to terminate the do while loop

The problem description then makes sense.

do
{
    ...
    switch (...)
    {
    case ...:
        ...
        break;
    ...
    case ...:
        ...
        if (critical_condition())
            break;  // Intended to exit loop - actually exits switch only
        ...
        break;      // Terminates the case in the switch
    }
} while (!time_to_stop());
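To make the difference concrete, here is a minimal sketch in C of the pattern above, next to one way of actually leaving the loop. The `struct msg` type, the `KIND_*` constants, and both function names are hypothetical and exist only for this illustration. A flag is just one option; a `goto` to a label after the loop, or moving the loop into its own function and using `return`, would work too.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message type, for illustration only. */
enum kind { KIND_PING, KIND_DATA };
struct msg { enum kind kind; bool critical; };

/* Buggy: the break under the if leaves only the switch,
   so the loop carries on with the next message. */
size_t handled_buggy(const struct msg *msgs, size_t n)
{
    size_t i = 0, handled = 0;

    if (n == 0)
        return 0;
    do {
        switch (msgs[i].kind) {
        case KIND_PING:
            handled++;
            break;
        case KIND_DATA:
            if (msgs[i].critical)
                break;          /* intended to exit the loop; exits the switch */
            handled++;
            break;
        }
        i++;
    } while (i < n);
    return handled;
}

/* One possible fix: record the decision in a flag that the
   loop condition checks. */
size_t handled_fixed(const struct msg *msgs, size_t n)
{
    size_t i = 0, handled = 0;
    bool stop = false;

    if (n == 0)
        return 0;
    do {
        switch (msgs[i].kind) {
        case KIND_PING:
            handled++;
            break;
        case KIND_DATA:
            if (msgs[i].critical) {
                stop = true;    /* ask the loop to end */
                break;          /* still only leaves the switch */
            }
            handled++;
            break;
        }
        i++;
    } while (!stop && i < n);
    return handled;
}
```

Given the sequence ping, critical data, ping, the buggy version keeps going and handles both pings, while the fixed version stops at the critical message after handling only the first ping.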

Reading the URL referenced in the question, the pseudo-code there is:

In pseudocode, the program read as follows:

1  while (ring receive buffer not empty
          and side buffer not empty) DO
2    Initialize pointer to first message in side buffer
     or ring receive buffer
3    get copy of buffer
4    switch (message)
5       case (incoming_message):
6             if (sending switch is out of service) DO
7                 if (ring write buffer is empty) DO
8                     send "in service" to status map
9                 else
10                    break
                  END IF
11           process incoming message, set up pointers to
             optional parameters
12           break
       END SWITCH
13   do optional parameter work

When the destination switch received the second of the two closely timed messages while it was still busy with the first (buffer not empty, line 7), the program should have dropped out of the if clause (line 7), processed the incoming message, and set up the pointers to the database (line 11). Instead, because of the break statement in the else clause (line 10), the program dropped out of the case statement entirely and began doing optional parameter work which overwrote the data (line 13). Error correction software detected the overwrite and shut the switch down while it couls [sic] reset. Because every switch contained the same software, the resets cascaded down the network, incapacitating the system.

This agrees with my hypothesis - the pseudo-code in the question is an incorrect characterization of the pseudo-code in the paper.
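As a rough illustration, lines 5 to 12 of the paper's pseudocode could be rendered in C along the following lines. This is only a sketch: the `struct outcome` type, the `message_kind` enum, and both function names are invented here, and the real 4ESS code is not public. The second function shows one possible repair that matches the behaviour the paper says was intended (always reach line 11), not necessarily the fix AT&T applied.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum message_kind { INCOMING_MESSAGE, OTHER_MESSAGE };

struct outcome {
    bool sent_in_service;     /* pseudocode line 8 ran  */
    bool processed_message;   /* pseudocode line 11 ran */
};

/* Mirrors the pseudocode, including the break on line 10. */
struct outcome handle_buggy(enum message_kind message,
                            bool sender_out_of_service,
                            bool write_buffer_empty)
{
    struct outcome out = { false, false };

    switch (message) {
    case INCOMING_MESSAGE:
        if (sender_out_of_service) {
            if (write_buffer_empty)
                out.sent_in_service = true;   /* line 8 */
            else
                break;                        /* line 10: leaves the switch, skips line 11 */
        }
        out.processed_message = true;         /* line 11 */
        break;                                /* line 12 */
    default:
        break;
    }
    return out;
}

/* Possible repair: drop the else branch so that control simply
   falls out of the if and line 11 always runs. */
struct outcome handle_fixed(enum message_kind message,
                            bool sender_out_of_service,
                            bool write_buffer_empty)
{
    struct outcome out = { false, false };

    switch (message) {
    case INCOMING_MESSAGE:
        if (sender_out_of_service && write_buffer_empty)
            out.sent_in_service = true;       /* line 8 */
        out.processed_message = true;         /* line 11 */
        break;                                /* line 12 */
    default:
        break;
    }
    return out;
}
```

With the sender out of service and the write buffer not empty, the buggy version returns without processing the message, which is the situation the paper describes as leading to the overwrite in line 13.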


Another reference on the same subject (found via a Google search 'att crash 1990 4ess') says:

Error Description

What was reported in ACM's Software Engineering Notes [Reference 2] is that the software defect was traced to an elementary programming error, which is described as follows:

In the offending "C" program text there was a construct of the form: [Erratic indentation as in original]

/* ``C'' Fragment to Illustrate AT&T Defect */   
do {

      switch expression {

          ...

                case (value):

                        if (logical) {
                                sequence of statements
                                        break
                        }
                        else
                        {
                                another sequence of statements
                        }
                        statements after if...else statement
                }

                statements after case statement

        } while (expression)

        statements after do...while statement

Programming Mistake Described

The mistake is that the programmer thought that the break statement applied to the if statement in the above passage, was clearly never exercised. If it had been, then the testers would have noticed the abnormal behavior and would have been able to corr [sic]

The only caveat to this statement is the following: it is possible that tests applied to the code contain information which would reveal the error; however, if the testers do not examine the output and notice the error, then the deficiency is not with th [sic]

In the case of a misplaced break statement, it is very likely that the error would have been detected.

References

  1. "Can We Trust Our Software?", Newsweek, 29 January 1990.

  2. ACM SIGSOFT, Software Engineering Notes, Vol. 15, No. 2, Page 11ff, April 1990.


Apparently, the programmer really did just think that break would end the if statement; it was a small mental blackout that led to a large real-world blackout.
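A small sketch can show where control actually goes in a fragment shaped like the one quoted above. The `RAN_*` flags and the function name are made up for this illustration; the function just records which parts of the construct were reached.

```c
#include <stdbool.h>

/* Hypothetical markers for the parts of the fragment that ran. */
enum {
    RAN_IF_BODY      = 1,
    RAN_ELSE_BODY    = 2,
    RAN_AFTER_IF     = 4,
    RAN_AFTER_SWITCH = 8
};

unsigned trace_fragment(bool logical)
{
    unsigned ran = 0;
    int value = 1;

    do {
        switch (value) {
        case 1:
            if (logical) {
                ran |= RAN_IF_BODY;
                break;              /* binds to the switch, not to the if */
            } else {
                ran |= RAN_ELSE_BODY;
            }
            ran |= RAN_AFTER_IF;    /* skipped whenever the break above runs */
        }
        ran |= RAN_AFTER_SWITCH;    /* reached in both cases */
    } while (false);                /* run the body once */

    return ran;
}
```

When `logical` is true, the statements after the if...else are never reached, although the statements after the switch still run. A break always binds to the innermost enclosing loop or switch, and an if statement is neither.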


If I understand it correctly, the else block containing the offending break statement is simply part of that "one line bug," as it was called earlier (see note 1). I don't see any good reason for that else to exist there, unless those "certain types of messages" that received optimization were thought to be the only occurrence of a non-empty buffer while processing a message. The description you linked omits a good deal of domain knowledge, without which I, at least, cannot fully understand that piece of code. I'll try to give an explanation anyway.

As break statements can only refer to a switch or a loop, I can assume that:

hypothesis #1

The original coder intended to "speed processing of certain types of messages" by cutting the while loop short with such a break. However, the nesting misled him, and he overlooked that the break would affect the switch statement, not the while loop.

hypothesis #2

The original coder really did intend to end the switch statement quickly, but placed the break too early and forgot to update the pointers to the optional parameters first, e.g. to mark somehow that no optional parameters were provided with the current message.

  1. I would thus call it a "two-line bug".