The T Language Specification
Spring 2005

CHAPTER 1

Lexical Structure


This chapter specifies the lexical structure of the T language.

Programs are written in ASCII characters. Line terminators are defined (§1.1) to support the different conventions of existing host systems while maintaining consistent line numbers.

The ASCII characters are reduced to a sequence of input elements (§1.2), which are white space (§1.3), comments (§1.4), and tokens. The tokens are the identifiers (§1.5), keywords (§1.6), literals (§1.7), separators (§1.8), and operators (§1.9) of the syntactic grammar.

1.1 Line Terminators

An implementation divides the sequence of ASCII characters into lines by recognizing line terminators. (This definition of lines determines the line numbers that the implementation reports.) It also determines where the // form of a comment ends.




LineTerminator:

    the ASCII LF character, also known as "newline"

    the ASCII CR character, also known as "return"

    the ASCII CR character followed by the ASCII LF character

	

InputCharacter:

    ASCIICharacter but not CR or LF



Lines are terminated by the ASCII characters CR, or LF, or CR LF. The two characters CR immediately followed by LF are counted as one line terminator, not two.

The result is a sequence of line terminators and input characters, which are the terminal symbols for the next step in the tokenization process.
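The line-terminator rule above can be sketched as follows. This is only an illustrative sketch in Python, not part of the specification; the function name split_lines is hypothetical. It shows CR, LF, and CR LF each ending exactly one line.

```python
def split_lines(text):
    """Split ASCII source text into lines using the LineTerminator rule."""
    lines = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            # CR immediately followed by LF counts as one terminator, not two.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            lines.append("".join(current))
            current = []
        elif ch == "\n":
            lines.append("".join(current))
            current = []
        else:
            # Any other ASCII character is an InputCharacter.
            current.append(ch)
        i += 1
    # Trailing characters form a final line with no terminator.
    if current:
        lines.append("".join(current))
    return lines
```

A diagnostic's line number is then the 1-based index into the returned list, regardless of which host convention the file uses.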

1.2 Input Elements and Tokens

The input characters and line terminators are reduced to a sequence of input elements. Those input elements that are not white space (§1.3) or comments (§1.4) are tokens.

This process is specified by the following productions:




Input:

    InputElements_opt



InputElements:

    InputElement InputElements

    InputElement



InputElement:

    WhiteSpace

    Comment

    Token



Token:

    Identifier

    Keyword

    Literal

    Separator

    Operator



1.3 White Space

White space is defined as the ASCII space, horizontal tab, and form feed characters, as well as line terminators.




WhiteSpace:

    the ASCII SP character, also known as "space"

    the ASCII HT character, also known as "horizontal tab"

    the ASCII FF character, also known as "form feed"

    LineTerminator



1.4 Comments

A comment has the following form:




// text



All text from the ASCII characters // up to the LineTerminator is ignored.




EndOfLineComment:

    / / CharactersInLine_opt LineTerminator



CharactersInLine:

    InputCharacter

    CharactersInLine InputCharacter
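As a rough illustration of the comment rule, the Python sketch below removes // comments from source text. The name strip_comments is hypothetical. Two assumptions go beyond the grammar: the sketch keeps the line terminator that ends each comment, so line numbers are unchanged, and it tolerates a comment at the end of input that has no terminator. Because the Literal production in this chapter has no string or character literals, the sketch treats every // as the start of a comment.

```python
def strip_comments(text):
    """Remove // comments, keeping the line terminators that end them."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        if text.startswith("//", i):
            # Skip InputCharacters up to, but not including, CR or LF.
            while i < n and text[i] not in "\r\n":
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Keeping the terminator does not change the token sequence, since a LineTerminator is itself white space (§1.3).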



1.5 Identifiers

An identifier is an unlimited-length sequence of letters and digits, the first of which must be a letter. An identifier cannot have the same spelling (ASCII character sequence) as a keyword (§1.6) or the null literal (§1.7.2).




Identifier:

    IdentifierChars but not a Keyword or NullLiteral



IdentifierChars:

    Letter

    IdentifierChars LetterOrDigit



Letter:

    any ASCII character that is a letter (see below)



LetterOrDigit:

    any ASCII character that is a letter or digit (see below)



The letters include uppercase and lowercase ASCII Latin letters A-Z (0x41-0x5a), and a-z (0x61-0x7a), and the ASCII underscore (_, or 0x5f). The digits include the ASCII digits 0-9 (0x30-0x39).

Two identifiers are the same only if they are identical, that is, have the same ASCII character for each letter or digit.
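The identifier rules can be checked with a short predicate. The following Python sketch is illustrative only; the names KEYWORDS, LETTERS, LETTERS_OR_DIGITS, and is_identifier are hypothetical. The keyword set is copied from §1.6, and the underscore is treated as a letter, as stated above.

```python
import string

# Keywords from section 1.6.
KEYWORDS = frozenset({
    "class", "delete", "else", "extends", "if", "int", "main",
    "new", "out", "return", "super", "this", "while",
})
# Letters are A-Z, a-z, and the underscore; digits are 0-9.
LETTERS = frozenset(string.ascii_letters + "_")
LETTERS_OR_DIGITS = LETTERS | frozenset(string.digits)


def is_identifier(text):
    """Return True if text is a valid Identifier."""
    # The first character must be a letter.
    if not text or text[0] not in LETTERS:
        return False
    # Every remaining character must be a letter or a digit.
    if any(ch not in LETTERS_OR_DIGITS for ch in text[1:]):
        return False
    # IdentifierChars, but not a Keyword or the NullLiteral.
    return text not in KEYWORDS and text != "null"
```

Comparison is exact, so While and while are different spellings: the first is an identifier and the second is a keyword.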

1.6 Keywords

The following character sequences, formed from ASCII letters, are reserved for use as keywords and cannot be used as identifiers.




Keyword: one of

    class 

    delete

    else

    extends

    if

    int
	
    main

    new

    out

    return

    super

    this

    while



1.7 Literals

A literal is the source code representation of a value of type int or of the null type.




Literal: 

    IntegerLiteral

    NullLiteral



1.7.1 Integer Literals

An integer literal is expressed in decimal (base 10).




IntegerLiteral:

    DecimalNumeral



A decimal numeral consists of an ASCII digit from 0 to 9, optionally followed by one or more ASCII digits from 0 to 9, representing a non-negative integer.




DecimalNumeral:

    Digits



Digits:

    Digit

    Digits Digit



Digit: one of

    0 1 2 3 4 5 6 7 8 9



An integer literal is of type int (§2.2).

The largest decimal literal is 2147483648 (2^31). All decimal literals from 0 to 2147483647 may appear anywhere an integer literal may appear, but the literal 2147483648 may appear only as the operand of the unary negation operator "-".

A compile-time error occurs if a decimal literal is larger than 2147483648 (2^31), or if the literal 2147483648 appears anywhere other than as the operand of the unary "-" operator.
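The range rule can be sketched as a check that runs once the parser knows whether the literal is the operand of unary "-". This is a hedged Python illustration, not the specified implementation; the names INT_MAX and check_decimal_literal and the negated parameter are hypothetical. ValueError stands in for the compile-time error.

```python
INT_MAX = 2**31 - 1  # 2147483647, the largest literal allowed everywhere


def check_decimal_literal(text, negated=False):
    """Validate a DecimalNumeral and return its magnitude.

    negated is True when the literal is the operand of unary "-".
    """
    # Compare against ASCII digits only, as in the Digit production.
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError("not a DecimalNumeral: %r" % text)
    value = int(text)
    if value > INT_MAX + 1:
        raise ValueError("decimal literal larger than 2147483648: " + text)
    if value == INT_MAX + 1 and not negated:
        raise ValueError("2147483648 may appear only as the operand of unary -")
    return value
```

The function returns the magnitude only; the caller is assumed to apply the negation, so -2147483648 is the single case in which the magnitude 2147483648 is accepted.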

1.7.2 Null Literal

The null type has one value, the null reference, represented by the literal null, which is formed from ASCII characters. A null literal is always of the null type.




NullLiteral:

    null



1.8 Separators

The following ten ASCII characters are the separators (punctuators):




Separator: one of

    (    )    {    }    [    ]    ;    ,    .    ~



1.9 Operators

The following nine tokens are the operators, formed from ASCII characters:




Operator: one of

    =    ==    +    >    -    !    /    <    *
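Putting §1.2 through §1.9 together, a scanner for T might look like the Python sketch below. It is an illustration under stated assumptions, not the specified algorithm. All names (tokenize, the constant sets, the token kind strings) are hypothetical. The sketch assumes the longest possible token is taken at each position, so that == is one operator rather than two = operators; this chapter does not state that rule explicitly. It also accepts a comment at the end of input with no line terminator, and it does not perform the integer range check of §1.7.1.

```python
import string

KEYWORDS = {
    "class", "delete", "else", "extends", "if", "int", "main",
    "new", "out", "return", "super", "this", "while",
}
SEPARATORS = set("(){}[];,.~")
# Two-character operators come first so the longest match wins (assumption).
OPERATORS = ["==", "=", "+", ">", "-", "!", "/", "<", "*"]
LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)
WHITE_SPACE = set(" \t\f\r\n")  # SP, HT, FF, and the line terminators


def tokenize(text):
    """Return a list of (kind, spelling) pairs for the tokens in text."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WHITE_SPACE:
            i += 1  # white space is an input element but not a token
        elif text.startswith("//", i):
            # Comment: skip to the line terminator, which is white space.
            while i < n and text[i] not in "\r\n":
                i += 1
        elif ch in LETTERS:
            j = i + 1
            while j < n and (text[j] in LETTERS or text[j] in DIGITS):
                j += 1
            word = text[i:j]
            if word in KEYWORDS:
                kind = "Keyword"
            elif word == "null":
                kind = "NullLiteral"
            else:
                kind = "Identifier"
            tokens.append((kind, word))
            i = j
        elif ch in DIGITS:
            j = i + 1
            while j < n and text[j] in DIGITS:
                j += 1
            tokens.append(("IntegerLiteral", text[i:j]))
            i = j
        elif ch in SEPARATORS:
            tokens.append(("Separator", ch))
            i += 1
        else:
            for op in OPERATORS:
                if text.startswith(op, i):
                    tokens.append(("Operator", op))
                    i += len(op)
                    break
            else:
                # No production matches this character.
                raise ValueError("unexpected character %r at offset %d" % (ch, i))
    return tokens
```

The comment test is placed before the operator test so that // is never scanned as two / operators.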





Author(s): Prabesh Devkota (§1.1-1.9)