| Contents | Prev | Next | The T Language Specification, Version 2 Spring 2006 |
CHAPTER 1
This chapter specifies the lexical structure of the language.
Programs are written in ASCII characters. Line terminators are defined (§1.1) to support the different conventions of existing host systems while maintaining consistent line numbers.
The ASCII characters are reduced to a sequence of input elements (§1.2), which are white space (§1.3), comments (§1.4), and tokens. The tokens are the identifiers (§1.5), keywords (§1.6), literals (§1.7), separators (§1.8), and operators (§1.9) of the syntactic grammar.
An implementation divides the sequence of ASCII characters into lines by recognizing line terminators. This definition of lines determines the line numbers produced by the implementation. It also specifies where the // form of a comment ends.
LineTerminator:
the ASCII LF character, also known as "newline"
the ASCII CR character, also known as "return"
the ASCII CR character followed by the ASCII LF character
InputCharacter:
ASCIICharacter but not CR or LF
Lines are terminated by the ASCII characters CR, or LF, or CR LF. The two characters CR immediately followed by LF are counted as one line terminator, not two.
The result is a sequence of line terminators and input characters, which are the terminal symbols for the next step in the tokenization process.
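The line-terminator rule above can be illustrated with a short, non-normative Python sketch. The function name split_lines is an assumption made for this example and is not part of the specification; the sketch only shows that CR, LF, and CR LF each count as exactly one terminator.

```python
def split_lines(source: str) -> list[str]:
    """Split source text into lines; CR, LF, and CR LF each end one line.

    Hypothetical helper for illustration only.
    """
    lines = []
    current = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\r":
            # CR immediately followed by LF is a single terminator, not two.
            if i + 1 < len(source) and source[i + 1] == "\n":
                i += 1
            lines.append("".join(current))
            current = []
        elif ch == "\n":
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    # Text after the final terminator (possibly empty) forms the last line.
    lines.append("".join(current))
    return lines
```

Under this sketch, the line number of a character is one plus the index of the list entry that contains it, regardless of which terminator convention the host system uses.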
The input characters and line terminators are reduced to a sequence of input elements. Those input elements that are not white space (§1.3) or comments (§1.4) are tokens.
This process is specified by the following productions:
Input:
InputElements_opt
InputElements:
InputElement InputElements
InputElement
InputElement:
WhiteSpace
Comment
Token
Token:
Identifier
Keyword
Literal
Separator
Operator
White space is defined as the ASCII space, horizontal tab, and form feed characters, as well as line terminators.
WhiteSpace:
the ASCII SP character, also known as "space"
the ASCII HT character, also known as "horizontal tab"
the ASCII FF character, also known as "form feed"
LineTerminator
A comment has the following form:
// text
All text from the ASCII characters // to the end of the line is ignored.
EndOfLineComment:
/ / CharactersInLine_opt LineTerminator
CharactersInLine:
InputCharacter
CharactersInLine InputCharacter
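As a non-normative illustration, the comment rule can be sketched in Python for a single line that has already been separated from its line terminator. The function name strip_comment is hypothetical. The sketch relies on the fact that the only literals in this chapter are integer and null literals, so the characters // cannot occur inside a literal.

```python
def strip_comment(line: str) -> str:
    """Return the part of one line that precedes a // comment, if any.

    Hypothetical helper; `line` is assumed to contain no line terminator.
    """
    index = line.find("//")
    return line if index < 0 else line[:index]
```

Note that a single / is left alone, and that two / characters separated by white space do not start a comment under this sketch.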
An identifier is an unlimited-length sequence of letters and digits, the first of which must be a letter. An identifier cannot have the same spelling (ASCII character sequence) as a keyword (§1.6), or the null literal (§1.7.2).
Identifier:
IdentifierChars but not a Keyword or NullLiteral
IdentifierChars:
Letter
IdentifierChars LetterOrDigit
Letter:
any ASCII character that is a letter (see below)
LetterOrDigit:
any ASCII character that is a letter or digit (see below)
The letters include uppercase and lowercase ASCII Latin letters A-Z (0x41-0x5a), and a-z (0x61-0x7a), and the ASCII underscore (_, or 0x5f). The digits include the ASCII digits 0-9 (0x30-0x39).
Two identifiers are the same only if they are identical, that is, have the same ASCII character for each letter or digit.
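The identifier rules can be checked with a small, non-normative Python sketch. The names KEYWORDS and is_identifier are assumptions made for this example; the keyword set is copied from the keyword list in §1.6.

```python
import re

# Keywords from §1.6; "null" is handled separately as the null literal.
KEYWORDS = frozenset({
    "break", "class", "continue", "delete", "else", "extends", "if", "int",
    "main", "new", "out", "return", "super", "this", "while",
})

# A letter (A-Z, a-z, or underscore) followed by any number of letters or digits.
_IDENTIFIER_CHARS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Return True if `text` is IdentifierChars but not a Keyword or NullLiteral."""
    if _IDENTIFIER_CHARS.fullmatch(text) is None:
        return False
    return text not in KEYWORDS and text != "null"
```

Because two identifiers are the same only if they are identical character for character, the comparison against the keyword set is case-sensitive: While is accepted by this sketch even though while is not.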
The following character sequences, formed from ASCII letters, are reserved for use as keywords and cannot be used as identifiers.
Keyword: one of
break
class
continue
delete
else
extends
if
int
main
new
out
return
super
this
while
A literal is the source code representation of a value of an integer type or a null type.
Literal:
IntegerLiteral
NullLiteral
An integer literal is expressed in decimal (base 10).
IntegerLiteral:
DecimalNumeral
A decimal numeral consists of an ASCII digit from 0 to 9, optionally followed by one or more ASCII digits from 0 to 9, representing a non-negative integer.
DecimalNumeral:
Digits
Digits:
Digit
Digits Digit
Digit: one of
0 1 2 3 4 5 6 7 8 9
An integer literal is of type int (§2.2).
The largest decimal literal is 2147483648 (2^31). All decimal literals from 0 to 2147483647 may appear anywhere an integer literal may appear, but the literal 2147483648 may appear only as the operand of the unary negation operator "-".
A compile-time error occurs if a decimal literal is larger than 2147483648 (2^31), or if the literal 2147483648 appears anywhere other than as the operand of the unary "-" operator.
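The range rule for decimal literals can be sketched as follows. This is a non-normative Python illustration; the function name check_decimal_literal and the negated parameter (true when the literal is the operand of unary "-") are assumptions made for the example, and ValueError stands in for the compile-time error.

```python
INT_MAX = 2**31 - 1  # 2147483647, the largest literal allowed in any position


def check_decimal_literal(digits: str, negated: bool) -> int:
    """Return the value of a decimal literal or raise ValueError.

    `negated` says whether the literal is the operand of the unary "-" operator.
    """
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("not a decimal numeral")
    value = int(digits)
    if value <= INT_MAX:
        return value
    # 2147483648 (2^31) is permitted only directly under unary "-".
    if value == INT_MAX + 1 and negated:
        return value
    raise ValueError("decimal literal out of range: " + digits)
```

For example, check_decimal_literal("2147483648", negated=True) is accepted, while the same digits with negated=False are rejected.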
The null type has one value, the null reference, represented by the literal null, which is formed from ASCII characters. A null literal is always of the null type.
NullLiteral:
null
The following ten ASCII characters are the separators (punctuators):
Separator: one of
( ) { } [ ] ; , . ~
The following nine tokens are the operators, formed from ASCII characters:
Operator: one of
=
==
+
>
-
!
/
<
*
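The productions of this chapter can be combined into a small tokenizer. The Python sketch below is non-normative and rests on assumptions that this chapter does not state: the names tokenize and KEYWORDS are hypothetical, the longest possible match is taken at each position (so == is one operator rather than two = operators), and a // comment at the very end of the input is accepted even without a trailing line terminator.

```python
import re

KEYWORDS = frozenset({
    "break", "class", "continue", "delete", "else", "extends", "if", "int",
    "main", "new", "out", "return", "super", "this", "while",
})

# Alternatives are tried in order; the comment pattern precedes the operator
# pattern so that "//" is not read as two "/" operators.
_TOKEN_PATTERN = re.compile("|".join([
    r"(?P<whitespace>[ \t\f\r\n]+)",
    r"(?P<comment>//[^\r\n]*)",
    r"(?P<word>[A-Za-z_][A-Za-z0-9_]*)",
    r"(?P<integer>[0-9]+)",
    r"(?P<separator>[(){}\[\];,.~])",
    r"(?P<operator>==|[=+>\-!/<*])",
]))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for the tokens in `source`.

    White space and comments are input elements but not tokens, so they
    are discarded. Raises ValueError on a character that starts no element.
    """
    tokens = []
    position = 0
    while position < len(source):
        match = _TOKEN_PATTERN.match(source, position)
        if match is None:
            raise ValueError(
                "unexpected character %r at offset %d" % (source[position], position)
            )
        position = match.end()
        kind, text = match.lastgroup, match.group()
        if kind in ("whitespace", "comment"):
            continue
        if kind == "word":
            # A word is a keyword, the null literal, or an identifier.
            if text in KEYWORDS:
                kind = "keyword"
            elif text == "null":
                kind = "null"
            else:
                kind = "identifier"
        tokens.append((kind, text))
    return tokens
```

For instance, tokenize("x == 10; // check") yields an identifier, the operator ==, an integer literal, and the separator ; with the comment dropped. The range check on integer literals (§1.7) is deliberately left out of this sketch, since it depends on whether the literal is the operand of unary "-".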
Author(s): Prabesh Devkota (§1.1-1.9)