SQA Higher Haggis Context Free Syntax

Richard Connor

January 7, 2014

1 Introduction

Please note that at the time of writing (January 7, 2014) this document is still a draft and should not be taken as authoritative.

This formal specification accompanies a syntax checker available at http://haggis4sqa.appspot.com/haggisParser.html?version=higher. The syntax checker is implemented directly from, and version-stepped with, this definition, and should therefore be completely consistent with it.

This document gives lexical rules and a context-free syntax defined in BNF. A context-sensitive syntax and operational semantics are defined separately (work in progress).

As the current specification of Haggis (both in SQA documentation and as defined by Michaelson and Cutts, http://haggis4sqa.appspot.com/haggisDocs/Haggis_2_2.pdf) is strictly defined as a pseudocode language, some minor restrictions have had to be made to the language in order to render it suitable for formal specification. These should not impact negatively on any pseudocode use, whilst also allowing mechanised syntax checking.

2 Lexical Rules and Microsyntax

Reserved words (keywords) are those defined as terminal symbols in any version of the Haggis language. No reserved word may be used as an identifier, including those which are not defined in this version. All reserved words are in uppercase; it is therefore safest, and good practice, not to use uppercase for user-defined identifiers. The boolean literal values true and false, and the operator mod, may not be used as user-defined identifiers.

⟨Identifier⟩s must start with an uppercase or lowercase Roman letter, and may be followed by any contiguous sequence of letters, digits, underscores, hyphens and full stops.

⟨IntegerLiteral⟩s comprise any sequence of digits; leading zeros are allowed.

⟨FloatLiteral⟩s comprise two ⟨IntegerLiteral⟩s separated by a full stop.
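The identifier and numeric-literal rules above map directly onto regular expressions. The Python sketch below is one illustrative reading of them, not the checker's implementation. The name classify is hypothetical. The reserved-word set is also an assumption: it lists only the upper-case terminals that appear in this grammar, not every keyword of every Haggis version.

```python
import re

# Assumption: a partial reserved-word list, taken from the upper-case terminals
# in this grammar. The specification reserves the terminals of every Haggis
# version, which this set does not attempt to enumerate.
RESERVED = {
    "SET", "TO", "IF", "THEN", "ELSE", "END", "WHILE", "DO", "REPEAT",
    "UNTIL", "TIMES", "FOR", "FROM", "EACH", "FOREACH", "RECEIVE", "SEND",
    "OPEN", "CREATE", "CLOSE", "PROCEDURE", "FUNCTION", "RETURN", "ARRAY",
    "STRING", "INTEGER", "REAL", "BOOLEAN", "CHARACTER", "AND", "OR", "NOT",
    "KEYBOARD", "DISPLAY",
}
BOOLEAN_LITERALS = {"true", "false"}
LOWER_CASE_OPERATORS = {"mod"}

# <Identifier>: a Roman letter, then letters, digits, underscores, hyphens, full stops.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_.\-]*")
# <IntegerLiteral>: any sequence of digits (leading zeros allowed).
INTEGER_LITERAL = re.compile(r"[0-9]+")
# <FloatLiteral>: two integer literals separated by a full stop.
FLOAT_LITERAL = re.compile(r"[0-9]+\.[0-9]+")


def classify(word: str) -> str:
    """Classify a whole word against the microsyntax rules (hypothetical helper)."""
    if FLOAT_LITERAL.fullmatch(word):
        return "FloatLiteral"
    if INTEGER_LITERAL.fullmatch(word):
        return "IntegerLiteral"
    if word in BOOLEAN_LITERALS:
        return "BooleanLiteral"
    if word in RESERVED or word in LOWER_CASE_OPERATORS:
        return "Reserved"
    if IDENTIFIER.fullmatch(word):
        return "Identifier"
    return "Invalid"
```

Read literally, the identifier rule means that a word such as x-1 matches ⟨Identifier⟩ as a whole. In this sketch, therefore, a subtraction must be separated from its operands by spaces.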
⟨BooleanLiteral⟩s comprise the character sequences true and false.

⟨StringLiteral⟩s comprise any sequence of Unicode characters, on a single line, between two inverted commas. Inverted commas and backslashes can be included using \ as an "escape" character, i.e. \" and \\ respectively. In this context, "inverted comma" means the ASCII character with code 22 in hexadecimal (34 in decimal; Unicode U+0022). This is the character typically entered into a browser text area when the inverted comma key is pressed. Other Unicode characters for inverted commas may not be used to delimit string literals, but they may be included within them. These notably include U+201C and U+201D, which word processors often use for left and right inverted commas respectively ("smart quotes").

Comments are allowed within a single line only, starting with #. This character, and any following it, are disregarded up to the end of the line which contains it. Comments starting with #* are generated by the checking system. They are valid comments, but they will be removed when the program is checked via the implemented interface.

An elision, consisting of any characters (other than >) contained within < ... > on a single line, may be used in place of any command or top-level expression. Elision may not be used as a sub-expression: for example SET x TO <elision> is allowed, but SET x TO 3 + <elision> is not, to avoid potential confusion with the < and > operators. Also for this reason, and to promote good layout practice, there is a further lexical rule: infix operators must be placed on the same line as their first operand, although the second operand may appear after a line break.

To ensure maximum compatibility, Unicode characters such as ≠ are not yet allowed in this system[5]. For this reason the checking system is defined in terms of ASCII characters, so for example the symbol pair != is used in place of ≠, >= in place of ≥, etc.
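The string-literal and comment rules can be sketched in a few lines of Python. The text does not settle two points, so the sketch makes two assumptions: a # inside a string literal does not start a comment, and any backslash sequence other than \" and \\ is rejected. The function names are hypothetical.

```python
def read_string_literal(line: str, start: int) -> tuple[str, int]:
    """Read a <StringLiteral> that begins at line[start].

    Returns the decoded contents and the index just after the closing
    inverted comma (U+0022). Raises ValueError if the literal is malformed
    or is not closed on this line.
    """
    if start >= len(line) or line[start] != '"':
        raise ValueError("a string literal must start with U+0022")
    chars = []
    i = start + 1
    while i < len(line):
        c = line[i]
        if c == "\\":
            # \" and \\ are the escapes described; other escapes are rejected (assumption).
            if i + 1 < len(line) and line[i + 1] in ('"', "\\"):
                chars.append(line[i + 1])
                i += 2
                continue
            raise ValueError(f"unsupported escape at index {i}")
        if c == '"':
            return "".join(chars), i + 1
        chars.append(c)  # smart quotes such as U+201C are ordinary content here
        i += 1
    raise ValueError("string literal not closed on this line")


def strip_comment(line: str) -> str:
    """Drop a # comment (including #* checker comments) from a single line.

    Assumption: a # inside a string literal does not start a comment.
    """
    i = 0
    while i < len(line):
        if line[i] == '"':
            _, i = read_string_literal(line, i)  # skip over the whole literal
        elif line[i] == "#":
            return line[:i]
        else:
            i += 1
    return line
```

For example, strip_comment('SEND "#1" TO DISPLAY #* note') keeps the # inside the literal and removes only the trailing comment.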
3 Context Free Syntax

A standard pure BNF notation is used here. Note that no meta-symbols other than ::= are used within productions, so any character appearing outside angle brackets is a terminal symbol of the language.

3.1 Command Sequences

A program is a sequence of commands.

⟨Program⟩ ::= ⟨Sequence⟩
⟨Sequence⟩ ::= ⟨Command⟩
⟨Sequence⟩ ::= ⟨Command⟩ ; ⟨Sequence⟩

A lexical rule allows the semicolon to be elided in any context where it coincides with a new line. Semicolons are therefore only required when more than one command is joined on the same line. It is expected that most programs will not contain semicolons, and this character is used mostly for convenience in the formal specification.

[5] Although they are a part of the language definition.

3.2 Commands

⟨Command⟩ ::= ⟨Assignment⟩
⟨Command⟩ ::= ⟨Conditional⟩
⟨Command⟩ ::= ⟨Repetition⟩
⟨Command⟩ ::= ⟨Iteration⟩
⟨Command⟩ ::= ⟨InputOutputCommand⟩
⟨Command⟩ ::= ⟨SubprogramDeclaration⟩
⟨Command⟩ ::= ⟨ProcedureCall⟩
⟨Command⟩ ::=

An empty command is allowed, inductively allowing an empty sequence.

3.2.1 Assignment

⟨Assignment⟩ ::= SET ⟨Location⟩ TO ⟨Expression⟩

Assignment may be made to any location, where a ⟨Location⟩ is either a locally declared variable or an element of an array. This syntax is overloaded for both the introduction of new variables and the updating of existing variables. When it is used to introduce a new variable, ⟨Location⟩ should be an identifier[6].
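The sequence productions can be sketched as a small recursive parser over a pre-tokenised program. The sketch also applies two lexical rules: a line break stands in for a semicolon, and an infix operator at the end of a line continues the command. It is an illustrative subset under stated assumptions:

- line breaks arrive as the token "\n";
- only ⟨Assignment⟩ and the empty command are recognised;
- ⟨Location⟩ is restricted to a single identifier token;
- expressions are kept as raw tokens.

All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# ASCII spellings of the infix operators (section 2); a line break directly
# after one of these continues the expression rather than ending the command.
INFIX_OPERATORS = {"+", "-", "*", "/", "mod", "&", "^",
                   "=", "!=", "<", "<=", ">", ">=", "AND", "OR"}


@dataclass
class Assignment:
    location: str            # restricted to a single identifier in this sketch
    expression: List[str]    # raw expression tokens; not parsed further here


def normalise_separators(tokens: List[str]) -> List[str]:
    """Turn each line break ("\\n" token) into ';' unless it follows an infix operator."""
    out: List[str] = []
    for tok in tokens:
        if tok == "\n":
            if out and out[-1] in INFIX_OPERATORS:
                continue  # the second operand may appear after a line break
            out.append(";")
        else:
            out.append(tok)
    return out


def parse_command(tokens: List[str], pos: int) -> Tuple[Optional[Assignment], int]:
    """<Command> ::= <Assignment> | (empty); other commands are not handled."""
    if pos == len(tokens) or tokens[pos] == ";":
        return None, pos  # the empty command
    if tokens[pos] != "SET":
        raise SyntaxError(f"unsupported command starting with {tokens[pos]!r}")
    # <Assignment> ::= SET <Location> TO <Expression>
    if pos + 3 >= len(tokens) or tokens[pos + 2] != "TO":
        raise SyntaxError("expected SET <Location> TO <Expression>")
    end = pos + 3
    while end < len(tokens) and tokens[end] != ";":
        end += 1
    if end == pos + 3:
        raise SyntaxError("missing expression after TO")
    return Assignment(tokens[pos + 1], tokens[pos + 3:end]), end


def parse_sequence(tokens: List[str]) -> List[Optional[Assignment]]:
    """<Sequence> ::= <Command> | <Command> ; <Sequence>"""
    tokens = normalise_separators(tokens)
    commands: List[Optional[Assignment]] = []
    pos = 0
    while True:
        command, pos = parse_command(tokens, pos)
        commands.append(command)
        if pos == len(tokens):
            return commands
        # parse_command only stops at ';' or the end, so this token is ';'
        pos += 1
```

Because the empty command is allowed, an empty token list parses as a sequence of one empty command ([None]). Two adjacent semicolons yield an empty command between them.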
3.2.2 Conditional Execution

⟨Conditional⟩ ::= IF ⟨Expression⟩ ⟨CondBody⟩ END IF
⟨CondBody⟩ ::= THEN ⟨Sequence⟩
⟨CondBody⟩ ::= THEN ⟨Sequence⟩ ELSE ⟨Sequence⟩

3.2.3 Repetition

⟨Repetition⟩ ::= WHILE ⟨Expression⟩ DO ⟨Sequence⟩ END WHILE
⟨Repetition⟩ ::= REPEAT ⟨Sequence⟩ UNTIL ⟨Expression⟩ END REPEAT

[6] This is not formally defined in the context-free syntax.

3.2.4 Iteration

⟨Iteration⟩ ::= REPEAT ⟨Expression⟩ TIMES ⟨Sequence⟩ END REPEAT
⟨Iteration⟩ ::= FOR ⟨Identifier⟩ ⟨Range⟩ DO ⟨Sequence⟩ END FOR
⟨Range⟩ ::= FROM ⟨Expression⟩ TO ⟨Expression⟩
⟨Iteration⟩ ::= FOR EACH ⟨Identifier⟩ FROM ⟨Expression⟩ DO ⟨Sequence⟩ END FOR EACH
⟨Iteration⟩ ::= FOREACH ⟨Identifier⟩ FROM ⟨Expression⟩ DO ⟨Sequence⟩ END FOREACH

3.2.5 Input and Output

⟨InputOutputCommand⟩ ::= ⟨Input⟩
⟨InputOutputCommand⟩ ::= ⟨Output⟩
⟨InputOutputCommand⟩ ::= ⟨FileCommand⟩
⟨Input⟩ ::= RECEIVE ⟨Location⟩ FROM ( ⟨Type⟩ ) ⟨Expression⟩
⟨Output⟩ ::= SEND ⟨Expression⟩ TO ⟨Expression⟩
⟨FileCommand⟩ ::= OPEN ⟨Expression⟩
⟨FileCommand⟩ ::= CREATE ⟨Expression⟩
⟨FileCommand⟩ ::= CLOSE ⟨Expression⟩

3.2.6 Subprogram Declaration

⟨SubprogramDeclaration⟩ ::= ⟨Procedure⟩
⟨SubprogramDeclaration⟩ ::= ⟨Function⟩
⟨Procedure⟩ ::= PROCEDURE ⟨FormalParameters⟩ ⟨ProcedureBody⟩ END PROCEDURE
⟨Function⟩ ::= ⟨Type⟩ FUNCTION ⟨FormalParameters⟩ ⟨FunctionBody⟩ END FUNCTION

⟨FormalParameters⟩ ::= ( )
⟨FormalParameters⟩ ::= ( ⟨FormalParameterList⟩ )
⟨FormalParameterList⟩ ::= ⟨FormalParameter⟩
⟨FormalParameterList⟩ ::= ⟨FormalParameter⟩ , ⟨FormalParameterList⟩
⟨FormalParameter⟩ ::= ⟨Identifier⟩
⟨FormalParameter⟩ ::= ⟨Type⟩ ⟨Identifier⟩

⟨ProcedureBody⟩ ::= ⟨Sequence⟩
⟨FunctionBody⟩ ::= ⟨Sequence⟩ ; RETURN ⟨Expression⟩

3.2.7 Procedure Call

⟨ProcedureCall⟩ ::= ⟨Identifier⟩ ⟨ActualParameters⟩
⟨ActualParameters⟩ ::= ( )
⟨ActualParameters⟩ ::= ( ⟨ActualParameterList⟩ )
⟨ActualParameterList⟩ ::= ⟨Expression⟩
⟨ActualParameterList⟩ ::= ⟨Expression⟩ , ⟨ActualParameterList⟩

3.3 Types

⟨Type⟩ ::= ⟨Sequence⟩ of ⟨Type⟩
⟨Type⟩ ::= ⟨BaseType⟩

3.3.1 Structured Types

⟨Sequence⟩ ::= ARRAY
⟨Type⟩ ::= STRING

3.3.2 Base Types

⟨BaseType⟩ ::= INTEGER
⟨BaseType⟩ ::= REAL
⟨BaseType⟩ ::= BOOLEAN
⟨BaseType⟩ ::= CHARACTER

3.4 Expressions

Two versions of expression syntax are given here. First, an ambiguous, easy-to-read version is given. Second, for the cognoscenti (and to guarantee a faithful parser implementation!) a disambiguated version is also given, effectively defining operator precedence for the first version. In most cases, the ambiguous version along with well-established precedence rules is sufficient to understand the language, but the disambiguated version should be read as the formal definition of the language.

3.4.1 Ambiguous Expressions

⟨Expression⟩ ::= ⟨Expression⟩ ⟨ExpOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨Expression⟩ ⟨RelOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨Expression⟩ ⟨MultOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨Expression⟩ ⟨AddOp⟩ ⟨Expression⟩
⟨Expression⟩ ::= ⟨BaseExpression⟩

3.4.2 Disambiguated Expressions

⟨Expression⟩ ::= ⟨Expression1⟩
⟨Expression⟩ ::= ⟨Expression1⟩ ⟨RelOp⟩ ⟨Expression1⟩
⟨Expression1⟩ ::= ⟨Expression2⟩
⟨Expression1⟩ ::= ⟨Expression2⟩ ⟨AddOp⟩ ⟨Expression2⟩
⟨Expression2⟩ ::= ⟨Expression3⟩
⟨Expression2⟩ ::= ⟨Expression3⟩ ⟨MultOp⟩ ⟨Expression3⟩
⟨Expression3⟩ ::= ⟨BaseExpression⟩
⟨Expression3⟩ ::= ⟨BaseExpression⟩ ⟨ExpOp⟩ ⟨BaseExpression⟩

3.4.3 Operators

⟨LogicalOp⟩ ::= AND
⟨LogicalOp⟩ ::= OR

⟨RelOp⟩ ::= =
⟨RelOp⟩ ::= ≠
⟨RelOp⟩ ::= <
⟨RelOp⟩ ::= ≤
⟨RelOp⟩ ::= >
⟨RelOp⟩ ::= ≥

⟨AddOp⟩ ::= +
⟨AddOp⟩ ::= -

⟨MultOp⟩ ::= *
⟨MultOp⟩ ::= /
⟨MultOp⟩ ::= mod
⟨MultOp⟩ ::= &

⟨ExpOp⟩ ::= ^

3.4.4 Base Expressions

⟨BaseExpression⟩ ::= ⟨UnaryOpExp⟩
⟨BaseExpression⟩ ::= ⟨Location⟩
⟨BaseExpression⟩ ::= ⟨FunctionCall⟩
⟨BaseExpression⟩ ::= ( ⟨Expression⟩ )
⟨BaseExpression⟩ ::= ⟨SequenceLiteral⟩
⟨BaseExpression⟩ ::= ⟨IntegerLiteral⟩
⟨BaseExpression⟩ ::= ⟨FloatLiteral⟩
⟨BaseExpression⟩ ::= ⟨StringLiteral⟩
⟨BaseExpression⟩ ::= ⟨BooleanLiteral⟩
⟨BaseExpression⟩ ::= ⟨KeywordLiteral⟩

⟨UnaryOpExp⟩ ::= - ⟨BaseExpression⟩
⟨UnaryOpExp⟩ ::= NOT ⟨BaseExpression⟩

⟨Location⟩ ::= ⟨Identifier⟩
⟨Location⟩ ::= ⟨ArrayDereference⟩
⟨ArrayDereference⟩ ::= ⟨Expression⟩ [ ⟨Expression⟩ ]

⟨FunctionCall⟩ ::= ⟨Identifier⟩ ⟨ActualParameters⟩

⟨SequenceLiteral⟩ ::= [ ]
⟨SequenceLiteral⟩ ::= [ ⟨ValueList⟩ ]
⟨ValueList⟩ ::= ⟨Expression⟩
⟨ValueList⟩ ::= ⟨Expression⟩ , ⟨ValueList⟩

⟨KeywordLiteral⟩ ::= KEYBOARD
⟨KeywordLiteral⟩ ::= DISPLAY
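The disambiguated productions map one-to-one onto a recursive-descent parser with one method per precedence level. The Python sketch below uses hypothetical names and the ASCII operator spellings from section 2. It handles only a subset of ⟨BaseExpression⟩:

- unary - and NOT;
- parentheses;
- integer and float literals;
- true and false;
- KEYBOARD and DISPLAY;
- plain identifiers.

Array dereference, function calls, sequence literals and string literals are omitted. Read literally, each level admits at most one operator, so the sketch rejects 1 + 2 + 3 unless it is parenthesised. ⟨LogicalOp⟩ is not referenced by the expression productions above, so AND and OR are not accepted either.

```python
import re

REL_OPS = {"=", "!=", "<", "<=", ">", ">="}  # ASCII spellings of =, ≠, <, ≤, >, ≥
ADD_OPS = {"+", "-"}
MULT_OPS = {"*", "/", "mod", "&"}
EXP_OPS = {"^"}
KEYWORD_LITERALS = {"KEYBOARD", "DISPLAY"}
NOT_IDENTIFIERS = {"AND", "OR", "NOT", "mod", "true", "false"}


class ExpressionParser:
    """Recursive descent over the disambiguated productions (hypothetical names)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        self.pos += 1
        return tok

    def level(self, operand, ops):
        # <Level> ::= <Operand> | <Operand> <Op> <Operand>: at most one operator
        left = operand()
        if self.peek() in ops:
            op = self.take()
            return (op, left, operand())
        return left

    def expression(self):      # <Expression>  uses <RelOp>
        return self.level(self.expression1, REL_OPS)

    def expression1(self):     # <Expression1> uses <AddOp>
        return self.level(self.expression2, ADD_OPS)

    def expression2(self):     # <Expression2> uses <MultOp>
        return self.level(self.expression3, MULT_OPS)

    def expression3(self):     # <Expression3> uses <ExpOp>
        return self.level(self.base, EXP_OPS)

    def base(self):            # a subset of <BaseExpression>
        tok = self.take()
        if tok in ("-", "NOT"):                      # <UnaryOpExp>
            return (tok, self.base())
        if tok == "(":                               # ( <Expression> )
            inner = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if re.fullmatch(r"[0-9]+(\.[0-9]+)?", tok):  # integer or float literal
            return tok
        if tok in ("true", "false") or tok in KEYWORD_LITERALS:
            return tok
        if tok not in NOT_IDENTIFIERS and re.fullmatch(r"[A-Za-z][A-Za-z0-9_.\-]*", tok):
            return tok                               # <Location> as a plain identifier
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(tokens):
    parser = ExpressionParser(tokens)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

For example, parse(["1", "+", "2", "*", "3"]) yields ("+", "1", ("*", "2", "3")). This shows ⟨MultOp⟩ binding more tightly than ⟨AddOp⟩.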