I am working on a basic C compiler using Yacc and Lex. I have a problem handling an if statement whose body is a single line and that has no else block. The problem comes from how I detect the ';' at the end of a statement. I tried to add a fix, but it does not take effect because I cannot get "%pred" to work with my code.

    %{
    #include<stdio.h>
    #include<string.h>
    #include<stdlib.h>
    #include<ctype.h>
    #include"lex.yy.c"
    void yyerror(const char *s);
    int yylex();
    int yywrap();
    %}
    %token VOID CHARACTER PRINTF SCANF INT FLOAT CHAR FOR WHILE IF ELSE TRUE FALSE
    %token NUMBER FLOAT_NUM ID LE GE EQ NE GT LT AND OR STR ADD MULTIPLY
    %token DIVIDE SUBTRACT UNARY INCLUDE RETURN
    %%
    program : headers main '(' ')' '{' body return_stmt '}' {
            printf("Parsing: program\n");
            printf("program compiled successfully_no_of_lines %d\n", countn);
        };
    headers : headers INCLUDE
        | INCLUDE
        ;
    main : datatype ID { printf("Parsing: main function\n"); }
        ;
    datatype : INT
        | FLOAT
        | CHAR
        | VOID
        ;
    body : statement ';' body
        | FOR '(' statement ';' condition ';' statement ')' '{' body '}' body { printf("Parsing: for loop\n"); }
        | WHILE '(' condition ')' '{' body '}' body { printf("Parsing: while loop\n"); }
        | IF '(' condition ')' '{' body '}' else_part body { printf("Parsing: if statement\n"); }
        | /* empty */
        ;
    scanf_var_list: '&' ID ',' scanf_var_list
        | '&' ID { printf("Parsing: variable list scanf\n");}
        ;
    var_list : ID ',' var_list
        | ID { printf("Parsing: variable list printf\n");}
        ;
    else_part : ELSE '{' body '}' { printf("Parsing: else statement\n"); }
        | ELSE statement { printf("Parsing: else statement\n"); }
        |
        ;
    condition : value relop value
        | TRUE
        | FALSE
        ;
    statement : PRINTF '(' STR ')' { printf("Parsing: string print\n"); }
        | PRINTF '(' STR ',' var_list ')' { printf("Parsing: variable print\n"); }
        | SCANF '(' STR ',' scanf_var_list ')' { printf("Parsing: scan function\n"); }
        | FOR '(' statement ';' condition ';' statement ')' statement { printf("Parsing: for loop\n"); }
        | WHILE '(' condition ')' statement { printf("Parsing: while loop\n"); }
        | IF '(' condition ')' statement else_part { printf("Parsing: if else statement\n"); }
        | datatype ID init var_list { printf("Parsing: declaration\n"); }
        | ID '=' expression { printf("Parsing: assignment\n"); }
        | ID relop expression
        | ID UNARY
        | UNARY ID
        |
        ;
    pb : ';'
        |
        ;
    var_list : ',' ID init var_list
        |
        ;
    init : '=' value
        |
        ;
    expression : value
        | expression arithmetic value
        ;
    arithmetic : ADD
        | SUBTRACT
        | MULTIPLY
        | DIVIDE
        ;
    relop : LT
        | GT
        | LE
        | GE
        | EQ
        | NE
        ;
    value : NUMBER
        | FLOAT_NUM
        | CHARACTER
        | ID
        ;
    return_stmt : RETURN value ';' { printf("Parsing: return statement\n"); }
        | RETURN ';' /* Allow empty return */
        | /* empty */
        ;
    %%
    int main() {
        yyparse();
        return 0; /* Ensure valid C main function */
    }
    void yyerror(const char* msg) {
        printf("Syntax Error: %s\n", msg);
    }

asked Mar 23 at 9:23 by ladsad
- The problem is that a correct C language parser is much more complex than that. If you look at a draft (you can find them here), you will see that they do not define a statement as being terminated with a semicolon, but as a list of possible statement kinds, including compound statements. Furthermore, a condition can be any expression (a function call, a single variable, ...). It is no accident that the first version of Basic had a much simpler syntax than C... – Serge Ballesta Commented Mar 23 at 10:15
- It appears that by "a basic C compiler", you mean a compiler for a tightly constrained subset of C, as opposed to a bare-bones compiler for the full C language (any version). At least, that's what your grammar seems to show. That's fine, but your description initially surprised and confused me. – John Bollinger Commented Mar 23 at 13:55
- If you check the standard, a lot of the grammar (if not all) is provided! – ikegami Commented Mar 23 at 23:32
1 Answer
There are numerous issues with the grammar presented. A lot of them, including the one you specifically ask about, seem to stem from losing sight of the grammatical structure of the language among the lexical details. In some places the parser description fails to implement the generalizations it should, and in other places it implements generalizations it shouldn't.
The essential insight required for your particular question is that a brace-enclosed block as the body of an `if` statement (or a `for` or `while` statement) is not a different case for `if` than any other. It's all the same when you treat a brace-enclosed block as a kind of statement itself. The C language spec calls it a "compound statement". (Full C has other uses of brace-enclosed sections, too, but it does not appear that your subset language does.)
Following that, the next thing you will need to appreciate is that in C, the semicolon does not function as a statement separator. That is, you don't need one between every pair of statements. For example:

    if (argc > 1) {
        printf("%s\n", argv[1]);
    }
    // no semicolon here
    return 0;

Rather, the semicolon is part of the syntax of some kinds of statements themselves, and not of others.
I have no intention of doing your homework for you, but here are a few pieces of grammar that demonstrate the above points:

    statements: /* empty */
              | statements statement   /* Always use left recursion with yacc, not right recursion */
              ;

    statement : expression ';'                     /* The ';' is part of this kind of statement */
              | WHILE '(' condition ')' statement  /* No ';' in this kind of statement itself */
              /* ... */
              | '{' statements '}'                 /* A brace enclosed list of statements is a statement itself */
              ;

I leave it to you to determine how (and whether) to apply something like that to your own parser, including how to use it for `if` statements, which are a bit more complex than `while` statements.
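To see the same ideas outside yacc, here is a minimal hand-written recursive-descent recognizer in C. It is only a sketch under stated assumptions, not the asker's parser and not a yacc grammar: every name in it (`Parser`, `accepts`, `statement`, and so on) is hypothetical, and expressions are reduced to operands joined by operators. It shows three things: a brace-enclosed block is just one more kind of statement, the `;` belongs to the expression statement rather than sitting between statements, and an optional `else` is attached to the nearest `if`. In yacc, that last choice corresponds to resolving the if/else shift/reduce conflict in favor of shifting `ELSE`, which is what yacc does by default.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical recognizer state: a cursor into the source text. */
typedef struct { const char *p; } Parser;

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Consume one punctuation character if it is next. */
static bool eat_char(Parser *ps, char c) {
    skip_ws(ps);
    if (*ps->p != c) return false;
    ps->p++;
    return true;
}

/* Consume keyword kw if it is next and not the prefix of a longer name. */
static bool eat_keyword(Parser *ps, const char *kw) {
    size_t n = strlen(kw);
    skip_ws(ps);
    if (strncmp(ps->p, kw, n) != 0) return false;
    if (isalnum((unsigned char)ps->p[n]) || ps->p[n] == '_') return false;
    ps->p += n;
    return true;
}

/* operand: a run of letters, digits and underscores.
   Simplification: keywords are not rejected here. */
static bool operand(Parser *ps) {
    skip_ws(ps);
    if (!isalnum((unsigned char)*ps->p) && *ps->p != '_') return false;
    while (isalnum((unsigned char)*ps->p) || *ps->p == '_') ps->p++;
    return true;
}

/* binary_op: one or more of the characters = < > ! + - * / */
static bool binary_op(Parser *ps) {
    skip_ws(ps);
    if (*ps->p == '\0' || !strchr("=<>!+-*/", *ps->p)) return false;
    while (*ps->p != '\0' && strchr("=<>!+-*/", *ps->p)) ps->p++;
    return true;
}

/* expression: operand { binary_op operand } */
static bool expression(Parser *ps) {
    if (!operand(ps)) return false;
    while (binary_op(ps))
        if (!operand(ps)) return false;
    return true;
}

static bool statement(Parser *ps);

/* statements: zero or more statements, stopping at '}' or end of input. */
static bool statements(Parser *ps) {
    for (;;) {
        skip_ws(ps);
        if (*ps->p == '\0' || *ps->p == '}') return true;
        if (!statement(ps)) return false;
    }
}

static bool statement(Parser *ps) {
    if (eat_char(ps, '{'))                 /* compound statement */
        return statements(ps) && eat_char(ps, '}');
    if (eat_keyword(ps, "while"))          /* no ';' of its own */
        return eat_char(ps, '(') && expression(ps) && eat_char(ps, ')')
            && statement(ps);
    if (eat_keyword(ps, "if")) {
        if (!(eat_char(ps, '(') && expression(ps) && eat_char(ps, ')')
              && statement(ps)))
            return false;
        if (eat_keyword(ps, "else"))       /* binds to the nearest if */
            return statement(ps);
        return true;
    }
    if (eat_char(ps, ';')) return true;    /* empty statement */
    return expression(ps) && eat_char(ps, ';'); /* ';' ends the expression statement */
}

/* Returns true when src is a well-formed statement list. */
bool accepts(const char *src) {
    Parser ps = { src };
    if (!statements(&ps)) return false;
    skip_ws(&ps);
    return *ps.p == '\0';
}
```

With this sketch, `accepts("if (a > 1) x = 2; y = 3;")` and `accepts("while (i < 10) { i = i + 1; } x = 0;")` both succeed without any separator after the `if` or the closing brace, while `accepts("x = 1")` fails because the expression statement is missing its own `;`.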