Lex Strings
Quoted strings frequently appear in programming languages. Here is one way to match a string in lex:
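%{
#include <stdio.h>
#include <string.h>
char *yylval;
/* warning() is assumed to be defined elsewhere, e.g. as a wrapper around fprintf(stderr, ...) */
void warning(const char *msg);
%}
%%
\"[^"\n]*["\n] {
    /* copy the match without the opening quote */
    yylval = strdup(yytext+1);
    /* the last copied character is either the closing quote or a newline */
    if (yylval[yyleng-2] != '"')
        warning("improperly terminated string");
    else
        yylval[yyleng-2] = 0;
    printf("found '%s'\n", yylval);
}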
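The index arithmetic in the action above can be checked in plain C. The following is a minimal sketch, not part of lex: unquote is a hypothetical helper whose text and len parameters stand in for yytext and yyleng. Unlike the action above, it always drops the final character, so an unterminated string loses its trailing newline, and it reports termination through a flag instead of calling warning().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: `text` and `len` play the roles of yytext and yyleng
 * for a match of \"[^"\n]*["\n]. Returns a malloc'd copy without the first
 * and last characters and sets *terminated to 1 if the match ended in a
 * quote, 0 if it ended in a newline. Returns NULL if malloc fails.
 */
char *unquote(const char *text, size_t len, int *terminated)
{
    char *copy;

    /* the pattern always matches at least two characters */
    assert(len >= 2 && text[0] == '"');

    *terminated = (text[len - 1] == '"');

    copy = malloc(len - 1);            /* len - 2 characters plus the NUL */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text + 1, len - 2);   /* skip the opening quote */
    copy[len - 2] = '\0';              /* overwrite nothing, just terminate */
    return copy;
}
```

The caller owns the returned buffer and should free() it, which also applies to the strdup() result in the lex action.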
The lex rule above ensures that strings do not cross line boundaries and removes the enclosing
quotes. An improperly terminated string triggers the warning and keeps its trailing newline.
If we wish to add escape sequences, such as "\n", start states simplify matters:
%{
#include <stdio.h>
#include <stdlib.h>
char buf[100];   /* decoded string; the rules below do not check for overflow */
char *s;         /* next free position in buf */
%}
%x STRING
%%
\"              { BEGIN STRING; s = buf; }
<STRING>\\n     { *s++ = '\n'; }
<STRING>\\t     { *s++ = '\t'; }
<STRING>\\\"    { *s++ = '\"'; }
<STRING>\"      {
                    *s = 0;
                    BEGIN 0;
                    printf("found '%s'\n", buf);
                }
<STRING>\n      { printf("invalid string\n"); exit(1); }
<STRING>.       { *s++ = *yytext; }
The exclusive start state STRING is declared in the definitions section with %x. When the
scanner detects an opening quote, the BEGIN macro switches lex into the STRING state. Lex stays
in that state, recognizing only patterns prefixed with <STRING>, until another BEGIN is
executed. This gives us a mini-environment for scanning strings. When the closing quote is
recognized, BEGIN 0 switches back to the initial state.
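To make the state switching concrete, here is a hand-written C sketch of roughly what the start-state rules do. It is an illustration under stated assumptions, not what lex generates: scan_string, ST_INITIAL, and ST_STRING are hypothetical names, the function stops after the first string, and it silently skips input outside a string where lex's default rule would echo it. It also adds a bounds check that the 100-byte buf in the lex version lacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states mirroring lex's initial state and the STRING state. */
enum scan_state { ST_INITIAL, ST_STRING };

/*
 * Scan the first quoted string in `in` and copy its decoded contents to `out`.
 * Returns the decoded length, -1 if no complete string is found before the
 * end of input or a raw newline, and -2 if the result does not fit in `out`.
 */
int scan_string(const char *in, char *out, size_t outsize)
{
    enum scan_state state = ST_INITIAL;
    size_t n = 0;

    assert(in != NULL && out != NULL && outsize > 0);

    for (; *in != '\0'; in++) {
        char c = *in;

        if (state == ST_INITIAL) {
            if (c == '"')
                state = ST_STRING;     /* like: \" { BEGIN STRING; } */
            continue;                  /* other input is skipped in this sketch */
        }

        /* state == ST_STRING from here on */
        if (c == '"') {                /* like: <STRING>\" */
            out[n] = '\0';
            return (int)n;
        }
        if (c == '\n')                 /* like: <STRING>\n */
            return -1;
        if (c == '\\' && (in[1] == 'n' || in[1] == 't' || in[1] == '"')) {
            in++;                      /* consume the escaped character */
            c = (*in == 'n') ? '\n' : (*in == 't') ? '\t' : '"';
        }
        if (n + 1 >= outsize)          /* keep room for the terminating NUL */
            return -2;
        out[n++] = c;                  /* like: <STRING>. and the escape rules */
    }
    return -1;                         /* input ended before a closing quote */
}
```

As in the lex rules, a backslash followed by any other character is copied unchanged, because only the three escape sequences have dedicated rules.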