The T Language Specification, Version 2, Spring 2006
CHAPTER 1
This chapter specifies the lexical structure of the language.
Programs are written in ASCII characters. Line terminators are defined (§1.1) to support the different conventions of existing host systems while maintaining consistent line numbers.
The ASCII characters are reduced to a sequence of input elements (§1.2), which are white space (§1.3), comments (§1.4), and tokens. The tokens are the identifiers (§1.5), keywords (§1.6), literals (§1.7), separators (§1.8), and operators (§1.9) of the syntactic grammar.
An implementation divides the sequence of ASCII characters into lines by recognizing line terminators. This definition of lines determines the line numbers produced. It also specifies the termination of the // form of a comment.
LineTerminator:
the ASCII LF character, also known as "newline"
the ASCII CR character, also known as "return"
the ASCII CR character followed by the ASCII LF character
InputCharacter:
ASCIICharacter but not CR or LF
Lines are terminated by the ASCII characters CR, or LF, or CR LF. The two characters CR immediately followed by LF are counted as one line terminator, not two.
The result is a sequence of line terminators and input characters, which are the terminal symbols for the next step in the tokenization process.
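The line-terminator rule can be illustrated with a short Python sketch. It is not part of the specification; the function name `split_lines` is hypothetical, and the sketch assumes that text after the last terminator (possibly empty) counts as a final line.

```python
def split_lines(source: str) -> list[str]:
    """Split source text into lines; CR, LF, and CR LF each end one line."""
    lines = []
    current = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\r":
            # CR immediately followed by LF is a single terminator, not two.
            if i + 1 < len(source) and source[i + 1] == "\n":
                i += 1
            lines.append("".join(current))
            current = []
        elif ch == "\n":
            lines.append("".join(current))
            current = []
        else:
            # Any other ASCII character is an InputCharacter.
            current.append(ch)
        i += 1
    # Assumption: whatever follows the last terminator forms the final line.
    lines.append("".join(current))
    return lines
```

Under these assumptions, a character on line n of the result is reported as being on source line n regardless of which host convention the file uses.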
The input characters and line terminators are reduced to a sequence of input elements. Those input elements that are not white space (§1.3) or comments (§1.4) are tokens.
This process is specified by the following productions:
Input:
InputElements_opt
InputElements:
InputElement
InputElements InputElement
InputElement:
WhiteSpace
Comment
Token
Token:
Identifier
Keyword
Literal
Separator
Operator
White space is defined as the ASCII space, horizontal tab, and form feed characters, as well as line terminators.
WhiteSpace:
the ASCII SP character, also known as "space"
the ASCII HT character, also known as "horizontal tab"
the ASCII FF character, also known as "form feed"
LineTerminator
A comment has the following form:
// text
All the text from the ASCII characters // to the end of the line is ignored.
EndOfLineComment:
// CharactersInLine_opt LineTerminator
CharactersInLine:
InputCharacter
CharactersInLine InputCharacter
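As an illustration only, the following Python sketch removes an end-of-line comment from a single line whose terminator has already been stripped. The name `strip_comment` is hypothetical. The sketch relies on the assumption that T has no string or character literals (this chapter defines only integer and null literals), so the characters // cannot occur inside a literal.

```python
def strip_comment(line: str) -> str:
    """Return the line with any // comment removed.

    Assumes `line` contains no line terminator and that "//" can never
    appear inside a literal (T defines only integer and null literals).
    """
    start = line.find("//")
    if start == -1:
        return line          # no comment on this line
    return line[:start]      # keep only the text before the comment
```

A single / is left untouched, so a division such as `a / b` is not mistaken for a comment.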
An identifier is an unlimited-length sequence of letters and digits, the first of which must be a letter. An identifier cannot have the same spelling (ASCII character sequence) as a keyword (§1.6) or the null literal (§1.7.2).
Identifier:
IdentifierChars but not a Keyword or NullLiteral
IdentifierChars:
Letter
IdentifierChars LetterOrDigit
Letter:
any ASCII character that is a letter (see below)
LetterOrDigit:
any ASCII character that is a letter or digit (see below)
The letters include uppercase and lowercase ASCII Latin letters A-Z (0x41-0x5a), and a-z (0x61-0x7a), and the ASCII underscore (_, or 0x5f). The digits include the ASCII digits 0-9 (0x30-0x39).
Two identifiers are the same only if they are identical, that is, have the same ASCII character for each letter or digit.
The following character sequences, formed from ASCII letters, are reserved for use as keywords and cannot be used as identifiers.
Keyword: one of
class
delete
else
extends
if
int
main
new
out
return
super
this
while
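The relationship between identifiers, keywords, and the null literal can be sketched in Python as follows. This is an informal illustration, not a normative definition; the names `KEYWORDS` and `classify_word` and the returned labels are hypothetical.

```python
import re

# The thirteen reserved words listed in this section.
KEYWORDS = frozenset({
    "class", "delete", "else", "extends", "if", "int", "main",
    "new", "out", "return", "super", "this", "while",
})

# IdentifierChars: a letter (A-Z, a-z, or underscore) followed by letters or digits.
_IDENTIFIER_CHARS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify_word(text: str) -> str:
    """Classify a character sequence as keyword, null literal, identifier, or invalid."""
    if not _IDENTIFIER_CHARS.fullmatch(text):
        return "invalid"
    if text in KEYWORDS:
        return "keyword"
    if text == "null":
        return "null-literal"
    return "identifier"
```

Because comparison is by exact ASCII spelling, `While` is classified as an identifier while `while` is a keyword.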
A literal is the source code representation of a value of the integer type or the null type.
Literal:
IntegerLiteral
NullLiteral
An integer literal is expressed in decimal (base 10).
IntegerLiteral:
DecimalNumeral
A decimal numeral consists of an ASCII digit from 0 to 9, optionally followed by one or more ASCII digits from 0 to 9, representing a non-negative integer.
DecimalNumeral:
Digits
Digits:
Digit
Digits Digit
Digit: one of
0 1 2 3 4 5 6 7 8 9
An integer literal is of type int (§2.2).
The largest decimal literal is 2147483648 (2^31). All decimal literals from 0 to 2147483647 may appear anywhere an integer literal may appear, but the literal 2147483648 may appear only as the operand of the unary negation operator "-".
A compile-time error occurs if a decimal literal is larger than 2147483648 (2^31), or if the literal 2147483648 appears anywhere other than as the operand of the unary "-" operator.
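The range rule for decimal literals can be sketched in Python as below. This is an illustration under stated assumptions: the function name `check_decimal_literal`, the `negated` flag (true when the literal is the operand of unary "-"), and the use of `ValueError` to stand in for a compile-time error are all hypothetical.

```python
MAX_INT = 2**31 - 1          # 2147483647, largest literal allowed anywhere
NEGATED_ONLY = 2**31         # 2147483648, allowed only as the operand of unary "-"

def check_decimal_literal(digits: str, negated: bool) -> int:
    """Return the value denoted by the literal, or raise ValueError
    where the specification calls for a compile-time error."""
    # A DecimalNumeral is one or more ASCII digits 0-9.
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("not a decimal numeral")
    value = int(digits)
    if value > NEGATED_ONLY:
        raise ValueError("decimal literal is larger than 2147483648")
    if value == NEGATED_ONLY and not negated:
        raise ValueError('2147483648 may appear only as the operand of unary "-"')
    return -value if negated else value
```

For example, `check_decimal_literal("2147483648", True)` yields -2147483648, while the same digits with `negated=False` are rejected.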
The null type has one value, the null reference, represented by the literal null, which is formed from ASCII characters. A null literal is always of the null type.
NullLiteral:
null
The following ten ASCII characters are the separators (punctuators):
Separator: one of
( ) { } [ ] ; , . ~
The following nine tokens are the operators, formed from ASCII characters:
=
==
+
>
-
!
/
<
*
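Since = is a prefix of ==, a scanner has to decide which operator it has found. The Python sketch below assumes the usual longest-match convention, which this chapter does not state explicitly. The names `OPERATORS`, `SEPARATORS`, and `scan_punctuation` are hypothetical, and the sketch assumes comments have already been removed, so a / seen here is never the start of //.

```python
OPERATORS = ("=", "==", "+", ">", "-", "!", "/", "<", "*")
SEPARATORS = "(){}[];,.~"

def scan_punctuation(text: str, pos: int) -> tuple[str, str]:
    """Return (kind, lexeme) for the separator or operator starting at text[pos]."""
    # Assumption: try longer operators first so "==" is not read as "=" "=".
    for op in sorted(OPERATORS, key=len, reverse=True):
        if text.startswith(op, pos):
            return ("operator", op)
    if pos < len(text) and text[pos] in SEPARATORS:
        return ("separator", text[pos])
    raise ValueError(f"no separator or operator at offset {pos}")
```

With this ordering, `scan_punctuation("a==b", 1)` returns the single operator == rather than two assignment operators.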
Author(s): Prabesh Devkota (§1.1-1.9)