primitives of a programming language_问答_开发者

Which do the concepts control flow, data type, statement, expression and operation belong to? Syntax or semantics?

What is the relation between control flow, data type, statement, expression, operation, function, ...? How a program is built from these primitives level by level?

I would like to understand these primitive concepts and thei开发者_如何学运维r relations in order to figure out what aspects of a new language should one learn.

Thanks and regards!

All of those language elements have both syntax (how it is written) and semantics (how the way it is written corresponds to what it actually means). Control flow determines which statements are executed and when, expressions yield a value and can be made up of functions and other language elements (although the details depend on the programming language). An operation is usually a sequence of statements. The meaning of "function" varies from language to language; in some languages, any operation that can be invoked by name is a function. In other languages, a function is an operation that yields a result (as opposed to a procedure that does not report a result). Some languages also require that functions be non-mutating while procedures can be mutating, although this varies from language to language. Data types encapsulate both data and the operations/procedures/functions that can be operated on that data.

They belong to both worlds:

Syntax will describe which are the operators, which are primitive types (int, float), which are the keywords (return, for, while). So syntax decides which "words" you can use in the programming language. With word I mean every single possible token: = is a token, void is a token, varName12345 is a token that is considered as an identifier, 12.4 is a token considered as a float and so on..
Semantics will describe how these tokens can be combined together inside you language.

For example you will have that while semantics is something like:

WHILE ::= 'while' '(' CONDITION ')' '{' STATEMENTS '}'
CONDITION ::= CONDITION '&&' CONDITION | CONDITION '||' CONDITION | ...
STATEMENTS ::= STATEMENT ';' STATEMENTS | empty_rule

and so on. This is the grammar of the language that describes exactly how the language is structured. So it will be able to decide if a program is correct according to the language semantics.

Then there is a third aspect of the semantics, that is "what does that construct mean?". You can see it as a correspondence between, for example, a for loop and how it is translated into the lower level language needed to be executed.

This third aspect will decide if your program is correct with respect to the allowed operations. Usually you can make a compiler reject many of programs that have no meaning (because they violates the semantic) but to be able to find many different mistakes you will have to introduce a new tool: the type checker that will also check that whenever you do operations they are correct according to the types.

For example you grammar can allow doing varName = 12.4 but the typechecker will use the declaration of varName to understand if you can assign a float to it. (of course we're talking about static type checking)

Those concepts belong to both.

Statements, expressions, control flow operations, data types, etc. have their structure defined using the syntax. However, their meaning comes from the semantics.

When you have defined syntax and semantics for a programming language and its constructs, this basically provides you with a set of building blocks. The syntax is used to understand the structure in the code - usually represented using an abstract syntax tree, or AST. You can then traverse the tree and apply the semantics to each element to execute the program, or generate some instructions for some instruction set so you can execute the code later.