I am confused about COBOL data types. Interviews often ask for the difference between COMP-3 and COMP. What exactly is the difference? What do usage modes mean in COBOL, and how are they related to data types?
USAGE in COBOL describes how a data item is to be used. A few examples of USAGE are:
- DISPLAY. This identifies an item that may be printed on a terminal or report. It may or may not be a number (e.g. it could be a text value). The description of the DISPLAY item is given by the PICture clause. For example, PIC 9(5) USAGE DISPLAY describes a 5-digit number that may be displayed (printed). USAGE DISPLAY is often left off because it is implied if missing.
- INDEX. This identifies an item used as an index into a table (OCCURS).
- COMP-something (COMP, COMP-3 and so on) indicates that the data item is to be used in arithmetic operations (i.e. it is a number of some type).
There are various types of numeric item. Two of the most commonly used numeric data types are:
- COMPUTATIONAL or COMP. This is equivalent to BINARY
- COMPUTATIONAL-3 or COMP-3. This is equivalent to PACKED-DECIMAL
COMP (BINARY) data items are generally the most efficient way to perform calculations on data items that represent integer values.
COMP-3 (PACKED-DECIMAL) data items are used in COBOL because they maintain a fixed number of decimal places. All computations lead to a result having the prescribed number of decimal places. This is particularly useful in accounting-type operations. Floating point numbers make the number of digits after the decimal point variable (i.e. the decimal point can "float"), which is not the way financial amounts are usually represented.
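To make the fixed number of decimal places concrete, here is a minimal sketch. The program name and data names (FIXEDDEC, WS-AMOUNT, WS-SHARE, WS-SHARE-OUT) are made up for illustration. It divides 20 by 3 and stores the result in a field with two decimal places, once truncated and once with ROUNDED.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FIXEDDEC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Two decimal places, fixed by the PICTURE (V99).
       01  WS-AMOUNT        PIC S9(5)V99 COMP-3 VALUE 20.
       01  WS-SHARE         PIC S9(5)V99 COMP-3.
      * Edited DISPLAY item used only to print the result.
       01  WS-SHARE-OUT     PIC -(5)9.99.
       PROCEDURE DIVISION.
      * 20 / 3 = 6.666...; without ROUNDED the extra digits
      * are truncated, so WS-SHARE holds 6.66.
           COMPUTE WS-SHARE = WS-AMOUNT / 3
           MOVE WS-SHARE TO WS-SHARE-OUT
           DISPLAY "TRUNCATED: " WS-SHARE-OUT
      * With ROUNDED the result is rounded to 6.67.
           COMPUTE WS-SHARE ROUNDED = WS-AMOUNT / 3
           MOVE WS-SHARE TO WS-SHARE-OUT
           DISPLAY "ROUNDED:   " WS-SHARE-OUT
           STOP RUN.
```

Either way the stored value has exactly two decimal places, because the PICTURE of the receiving field decides that, not the arithmetic.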
You can find a complete list of COMPutational items for IBM Enterprise COBOL in the USAGE clause section of the IBM Enterprise COBOL Language Reference.
One of the problems many programmers have when beginning with COBOL is understanding that a COMP item is great for doing math but cannot be displayed (printed) until it is converted into a DISPLAYable item through a MOVE statement. If you MOVE a COMP item into a report or onto a screen it will not present very well. It needs to be moved into a DISPLAY item first.
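A small sketch of that conversion step, with hypothetical names (SHOWCOMP, WS-COUNT, WS-REPORT-LINE, WS-COUNT-OUT). The report line is a group of DISPLAY items, and the binary counter is moved into it before printing.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SHOWCOMP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Binary counter: good for arithmetic, not for printing.
       01  WS-COUNT         PIC S9(4) COMP VALUE 42.
      * A report line is a group of DISPLAY items.
       01  WS-REPORT-LINE.
           05  FILLER       PIC X(7)  VALUE "COUNT: ".
           05  WS-COUNT-OUT PIC 9(4).
       PROCEDURE DIVISION.
           ADD 1 TO WS-COUNT
      * The MOVE converts binary to character digits.
           MOVE WS-COUNT TO WS-COUNT-OUT
      * Prints: COUNT: 0043
           DISPLAY WS-REPORT-LINE
           STOP RUN.
```

If WS-COUNT itself sat inside WS-REPORT-LINE, the line would contain the raw binary bytes, since a group item is treated as plain characters.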
The other thing that you may want to research a bit more is the relationship between the PICture and the USAGE when defining variables in COBOL. Here is a link to a very good introductory COBOL Tutorial from the University of Limerick.
COBOL really only has two data types: Numbers and strings.
The layout of each field in a COBOL record is precisely specified by a PICTURE (usually abbreviated PIC) clause. The most common ones are:
- PIC X for strings. PIC X(100) means a 100-byte string.
- PIC 9 for numbers, optionally with S (sign) or V (implicit decimal point). For example, PIC S9(7)V99 means a signed number with 7 digits to the left of the implicit decimal point and 2 digits to the right.
Numeric fields can have a USAGE clause to optimize their storage. The most common USAGEs are DISPLAY, COMP, and COMP-3.
DISPLAY stores each digit as a character. For example, PIC 9(4) VALUE 123 stores the number as if it were the string "0123". And PIC 9(4)V99 VALUE 123.45 stores it as "012345". Note that the decimal point is not actually stored.
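One way to see that the decimal point is not stored is to look at the same bytes as characters. This is only a sketch with made-up names (IMPLDEC, WS-AMOUNT, WS-AMOUNT-RAW); it uses REDEFINES to view the unsigned DISPLAY field as PIC X(6).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IMPLDEC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-AMOUNT        PIC 9(4)V99 VALUE 123.45.
      * Same six bytes viewed as plain characters.
       01  WS-AMOUNT-RAW    REDEFINES WS-AMOUNT PIC X(6).
       PROCEDURE DIVISION.
      * Prints 012345: six digits, no decimal point.
           DISPLAY WS-AMOUNT-RAW
           STOP RUN.
```

The V only tells the compiler where to align the value in arithmetic and MOVEs; it takes up no storage.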
This is an inefficient format in that it requires 8 bits to represent each digit. But it does have an "optimization" for signed numbers by using half of the last byte to store the sign. Normally, EBCDIC digits all have a high nybble of F, so 0123 is F0 F1 F2 F3. But -0123 is F0 F1 F2 D3; the D indicates negative. C means positive, and F means unsigned (i.e., positive). (Similar formats are used in ASCII versions of COBOL, but not as standardized.)
COMP-3 is binary-coded decimal with a trailing sign nybble. PIC 9(3) COMP-3 VALUE 123 becomes the two bytes 12 3F.
COMP or BINARY is native binary format, just like short, int, or long in C.
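The storage cost of the three usages can be compared with a short sketch like the one below. The names (SIZES, WS-ZONED, WS-PACKED, WS-BINARY) are hypothetical, and the sizes in the comments are what I would expect on IBM Enterprise COBOL; binary sizes in particular can depend on the compiler and its options.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIZES.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * The same nine-digit signed number in three usages.
       01  WS-ZONED         PIC S9(7)V99 DISPLAY.
       01  WS-PACKED        PIC S9(7)V99 COMP-3.
       01  WS-BINARY        PIC S9(7)V99 COMP.
       PROCEDURE DIVISION.
      * Expected sizes:
      *   DISPLAY 9 bytes (one per digit, sign overpunched)
      *   COMP-3  5 bytes (nine digits plus a sign nybble)
      *   COMP    4 bytes (a fullword)
           DISPLAY "DISPLAY: " FUNCTION LENGTH(WS-ZONED)
           DISPLAY "COMP-3:  " FUNCTION LENGTH(WS-PACKED)
           DISPLAY "COMP:    " FUNCTION LENGTH(WS-BINARY)
           STOP RUN.
```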
Deciding which data type to use can be made very complicated, but a simple set of guidelines is:
- DISPLAY and edited zoned decimal should only be used for displaying numerics in a report or to sysout. Move COMP and COMP-3 fields to a DISPLAY or edited field before putting them in a report or sending them to sysout.
- COMP has the fastest calculation speed for integers.
- COMP-3 (packed decimal) should be used when decimal positions must be maintained.
COMP and COMP-3 fields can be used together in calculations. Following fixed rules, the compiler determines which field will be converted (under the covers) so that both operands share a single common numeric data type.
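As a rough illustration of mixing the two usages, the sketch below multiplies a binary quantity by a packed-decimal price. All names (MIXED, WS-QTY, WS-PRICE, WS-TOTAL, WS-TOTAL-OUT) are invented for the example; the conversion between formats is left entirely to the compiler.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MIXED.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-QTY           PIC S9(4)    COMP   VALUE 3.
       01  WS-PRICE         PIC S9(5)V99 COMP-3 VALUE 19.99.
       01  WS-TOTAL         PIC S9(7)V99 COMP-3.
      * Edited DISPLAY item for printing the result.
       01  WS-TOTAL-OUT     PIC -(7)9.99.
       PROCEDURE DIVISION.
      * Binary times packed decimal: 3 * 19.99 = 59.97.
           MULTIPLY WS-QTY BY WS-PRICE GIVING WS-TOTAL
           MOVE WS-TOTAL TO WS-TOTAL-OUT
           DISPLAY "TOTAL: " WS-TOTAL-OUT
           STOP RUN.
```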
As another reply suggests, COMP means big-endian binary. COMP-3 is packed decimal, which means one decimal digit is mapped to each nibble.
I am not sure the previous reply got the issue around precision correct though.
PIC S9(9)V9(9) COMP and PIC S9(9)V9(9) COMP-3 have exactly the same precision. That is part of the ANSI85 standard. It is the job of the compiler and runtime to apply the appropriate transformations to the binary representation of the COMP item so that exactly the same results are achieved as if the usage were DISPLAY or COMP-3.
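A quick way to check this on a given compiler is a sketch like the following, with hypothetical names (SAMEPREC, WS-BIN, WS-PACK). Both fields have the same PICTURE and differ only in usage, so the same computation is expected to leave the same decimal value (0.333333333) in each.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SAMEPREC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Same PICTURE, different usage.
       01  WS-BIN           PIC S9(9)V9(9) COMP.
       01  WS-PACK          PIC S9(9)V9(9) COMP-3.
       PROCEDURE DIVISION.
           COMPUTE WS-BIN  = 1 / 3
           COMPUTE WS-PACK = 1 / 3
      * The comparison is numeric, not byte by byte.
           IF WS-BIN = WS-PACK
               DISPLAY "SAME VALUE"
           ELSE
               DISPLAY "DIFFERENT"
           END-IF
           STOP RUN.
```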
IBM mainframe computers have packed decimal calculations in hardware. This is very helpful, because the conversion of decimal to binary scales as n squared, where n is the length of the number. This means that COMP-3 is very often the fastest format on the mainframe, but is less likely to be the fastest on distributed systems. However, this again is not always the case. For example, the Micro Focus native COBOL solution will tend to be faster in COMP-3 than COMP-5 for very large decimal precision (>18 digits), but the reverse otherwise. The Managed COBOL system from Micro Focus is almost always fastest in COMP (actually, COMP-5 is the best; it is similar to COMP but uses the hardware's endianness rather than enforcing a big-endian memory layout).
Finally, may I suggest that for intermediate values and general mathematics, the newer data definitions binary-long and binary-double are a better choice, because then the compiler can make the decisions about how to store and optimize for you.
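For reference, a minimal sketch of those newer usages, with made-up names (BINLONG, WS-LOOP-COUNT, WS-BIG-TOTAL). It assumes a compiler that accepts the COBOL 2002 usages BINARY-LONG and BINARY-DOUBLE (GnuCOBOL and Micro Focus do; not every compiler does). Note that these items take no PICTURE clause.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BINLONG.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * No PICTURE: the usage alone fixes size and range.
       01  WS-LOOP-COUNT    USAGE BINARY-LONG   VALUE 0.
       01  WS-BIG-TOTAL     USAGE BINARY-DOUBLE VALUE 0.
       PROCEDURE DIVISION.
      * Sum the integers 1 to 1000, which is 500500.
           PERFORM 1000 TIMES
               ADD 1 TO WS-LOOP-COUNT
               ADD WS-LOOP-COUNT TO WS-BIG-TOTAL
           END-PERFORM
           DISPLAY "TOTAL: " WS-BIG-TOTAL
           STOP RUN.
```

How DISPLAY formats such an item (leading zeros, sign) differs between compilers, so move it to an edited item if the layout matters.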
For more on COBOL on distributed and Managed COBOL check out this knol: http://knol.google.com/k/alex-turner/micro-focus-managed-cobol/2246polgkyjfl/4 and also feel free to look up cobol on facebook :)
To clarify when you would select a particular type and usage for a data item:
- Any character data: PIC X(n) of the appropriate size for the string. Shorter strings will be padded with trailing spaces.
- Numbers which are seldom used in calculations but are displayed often (e.g. AGE, ZIPCODE, CUSTOMER_NUMBER): PIC 9(n) USAGE DISPLAY.
- Whole numbers used to count things which are used in calculations (e.g. QTY_AVAILABLE): PIC S9(4) COMP. S9(4) is a smallint on most platforms; S9(8) is a 32-bit integer on most platforms.
- Currency values used in calculations (e.g. PRICE, DELIVERY_COST, TAX): PIC S9(4)V99 COMP or COMP-3. This will enable accounting calculations with the correct rounding.
If the platform is an IBM mainframe or similar with hardware support for packed decimal, choose COMP-3; otherwise COMP is more efficient.
Note that to show COMP values on a screen or report you must first move them to a DISPLAY type item, so a "PIC S9(4)V99 COMP" item should be moved to a "PIC ---9.99 DISPLAY" item to make it human readable. This would display numbers as "  12.45" and "-123.45".
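Putting these guidelines together, a record might be declared roughly as below. This is a sketch only; the program name and field names (PICKTYPE, WS-CUSTOMER and its fields, WS-PRICE-OUT) are hypothetical. The brackets in the DISPLAY statements just make the leading spaces visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PICKTYPE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CUSTOMER.
      * Character data, padded with trailing spaces.
           05  WS-NAME          PIC X(20)    VALUE "SMITH".
      * Shown often, rarely calculated: DISPLAY.
           05  WS-AGE           PIC 9(3)     VALUE 42.
      * Whole-number counter: binary.
           05  WS-QTY-AVAILABLE PIC S9(4)    COMP   VALUE 12.
      * Currency: packed decimal with two decimal places.
           05  WS-PRICE         PIC S9(4)V99 COMP-3.
      * Edited item; ---9.99 has room for three integer digits.
       01  WS-PRICE-OUT         PIC ---9.99.
       PROCEDURE DIVISION.
           MOVE 12.45 TO WS-PRICE
           MOVE WS-PRICE TO WS-PRICE-OUT
      * Prints [  12.45]
           DISPLAY "[" WS-PRICE-OUT "]"
           MOVE -123.45 TO WS-PRICE
           MOVE WS-PRICE TO WS-PRICE-OUT
      * Prints [-123.45]
           DISPLAY "[" WS-PRICE-OUT "]"
           STOP RUN.
```

To print all four integer digits of an S9(4)V99 field with a sign, the edited picture needs one more floating minus, i.e. ----9.99.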