Grammar Definition¶
This chapter will give some grammar definitions related to Berry. We use Extended Backus Normal Form (EBNF) to define or express grammar. We did not use strict EBNF grammar to define, but made a lot of simplifications, but these simplifications will not affect readers’ understanding of the grammar.
The EBNF definition of Berry language grammar is as follows:
(* program define *)
program = block;
(* block define *)
block = {statement};
(* statement define *)
statement = class_stmt | func_stmt | var_stmt | if_stmt | while_stmt |
for_stmt | break_stmt | return_stmt | expr_stmt | import_stmt |
try_stmt | throw_stmt | ';';
if_stmt = 'if' expr block {'elif' expr block} ['else' block] 'end';
while_stmt = 'while' expr block 'end';
for_stmt = 'for' ID ':' expr block 'end';
break_stmt = 'break' | 'continue';
return_stmt = 'return' [expr];
(* function define statement *)
func_stmt = 'def' ID func_body;
func_body = '(' [arg_field {',' arg_field}] ')' block 'end';
arg_field = ['*'] ID;
(* class define statement *)
class_stmt = 'class' ID [':' ID] class_block 'end';
class_block = {'var' ID {',' ID} | 'static' ['var'] ID ['=' expr] {',' ID ['=' expr] } | 'static' func_stmt | func_stmt};
import_stmt = 'import' (ID (['as' ID] | {',' ID}) | STRING 'as' ID);
(* exceptional handling statement *)
try_stmt = 'try' block except_block {except_block} 'end';
except_block = except_stmt block;
except_stmt = 'except' (expr {',' expr} | '..') ['as' ID [',' ID]];
throw_stmt = 'raise' expr [',' expr];
(* variable define statement *)
var_stmt = 'var' ID ['=' expr] {',' ID ['=' expr]};
(* expression define *)
expr_stmt = expr [assign_op expr];
expr = suffix_expr | unop expr | expr binop expr | range_expr | cond_expr;
cond_expr = expr '?' expr ':' expr; (* conditional expression *)
assign_op = '=' | '+=' | '-=' | '*=' | '/=' |
'%=' | '&=' | '|=' | '^=' | '<<=' | '>>=';
binop = '<' | '<=' | '==' | '!=' | '>' | '>=' | '||' | '&&' |
'<<' | '>>' | '&' | '|' | '^' | '+' | '-' | '*' | '/' | '%';
range_expr = expr '..' [expr]
unop = '-' | '!' | '~';
suffix_expr = primary_expr {call_expr | ('.' ID) | '[' expr ']'};
primary_expr = '(' expr ')' | simple_expr | list_expr | map_expr | anon_func | lambda_expr;
simple_expr = INTEGER | REAL | STRING | ID | 'true' | 'false' | 'nil';
call_expr = '(' [expr {',' expr}] ')';
list_expr = '[' {expr ','} [expr] ']';
map_expr = '{' {expr ':' expr ','} [expr ':' expr] '}';
anon_func = 'def' func_body; (* anonymous function *)
lambda_expr = '/' [arg_field {',' arg_field}] | {arg_field}] '->' expr;
The standard EBNF format can be found in related materials. Here is an
explanation of the details that need attention when reading the above
grammar. The symbols that have appeared to the left of the equal sign
are non-terminal symbols, and the others are terminal symbols. The
terminator enclosed in quotation marks ’
is a fixed string, which is
usually a language keyword or operator. There are several terminators
that are inconvenient to describe directly in EBNF: INTEGER
represents the integer face value; REAL
represents the real number
face value; STRING
represents the string literal value; ID
represents the identifier. These terminators can be defined using
regular expressions:
INTEGER
:0x[a-fA-F0-9]+|\d+
.REAL
:(\d+\.?|\.\d)\d*([eE][+-]?\d+)?
.STRING
:"(\\.|[^"])*"|’(\\.|[^’])*’
.ID
:[_a-zA-Z]\w*
The symbols that appear sequentially in the standard EBNF are separated by commas. For intuitiveness, I use spaces to implement the comma function. The vertical bar symbol “|” is pronounced as “or”, it means that the left and right patterns can only match one of them, or has the lowest priority. For example, the grammar a0a1|a2 means either the matching formula a0a1 or the matching a2. The square brackets indicate that the sub-expression inside the parentheses are matched 0 or 1 times, the curly braces indicate that the internal sub-expression is matched 0 or more times, and the parentheses only have the function of taking the internal sub-expression as a whole.
The following is the JSON grammar definition supported by the JSON module in the Berry standard library. The usage of EBNF still complies with the above conventions:
json = value;
value = object | array |
string | number | 'true' | 'false' | 'null';
object = '{' [ string ':' value ] { ',' string ':' value } '}';
array = '[' [json] { ',' json } ']';
Non-terminal symbols string
and number
can also be defined using
regular expressions. http://www.json.org gives the standard grammar of
JSON, which also includes the definitions of string
and number
.
The Berry JSON library’s support for numbers is different from the
standard. The standard JSON numbers must start with “-” or the number
“0-9”, while the Berry JSON library also accepts numbers starting with a
decimal point.