Lexical analysis is an important part of the compilation process. If you just want to learn something about compilers and try to write a simple compiler yourself, you do not need to write the lexical analyzer by hand. There is a program named Lex that generates lexical analyzers. It was originally written by Mike Lesk and Eric Schmidt and is available on many Unix/Linux systems; on Ubuntu you can install flex, a free implementation of Lex, using apt-get. Lex is usually used together with a parser generator such as yacc or bison: Lex provides the tokens that the parser consumes.
Structure of a lex file
Definition section
%%
Rules section
%%
C code section
The definition section is the place to define macros and to include header files. You can also write any C code here, enclosed in %{ and %}; it is copied verbatim into the generated source file.
The rules section is the most important part of a lex file. Each rule pairs a pattern with a C action. Patterns are simply regular expressions. When Lex sees text in the input that matches a given pattern, it executes the C code associated with that pattern.
The C code section contains C code such as helper functions and main(), which is copied verbatim to the end of the generated source file.
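To make the three sections concrete, here is a minimal sketch of a complete lex file that counts words and lines on standard input. It assumes flex; the macro name WORD and the counters word_count and line_count are hypothetical names chosen for illustration.

```lex
%{
/* Definition section: C code between %{ and %} is copied verbatim. */
#include <stdio.h>

/* Hypothetical counters, for illustration only. */
static int word_count = 0;
static int line_count = 0;
%}

WORD    [a-zA-Z]+

%%
{WORD}  { word_count++; /* action runs each time a run of letters matches */ }
\n      { line_count++; /* count every newline */ }
.       { /* ignore any other single character */ }
%%

/* C code section: copied verbatim to the end of the generated file. */

/* Returning 1 tells the scanner there is no further input after end of file. */
int yywrap(void) { return 1; }

int main(void)
{
    yylex();   /* run the generated scanner on standard input */
    printf("words: %d, lines: %d\n", word_count, line_count);
    return 0;
}
```

The name WORD is defined once in the definition section and referenced as {WORD} in the rules section, which keeps longer patterns readable.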
Lex provides some internal variables and functions that can be used in the lex source file.
Lex variables: yyin, yyout, yytext, yyleng, yylineno
Lex functions: yylex(), yywrap(), yyless(int n), yymore()
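The sketch below shows how several of these names are typically used together. It assumes flex, where yylineno is only kept up to date when %option yylineno is given; the rule itself (reporting every run of letters) is just an illustration, and yyless() and yymore() are not shown.

```lex
%option yylineno
%{
#include <stdio.h>
%}

%%
[a-zA-Z]+   {
                /* yytext holds the matched text, yyleng its length,
                   yylineno the current input line; yyout defaults to stdout */
                fprintf(yyout, "line %d: %s (%d chars)\n",
                        yylineno, yytext, (int)yyleng);
            }
.|\n        { /* skip everything else */ }
%%

/* Called at end of input; returning 1 means there is nothing more to scan. */
int yywrap(void) { return 1; }

int main(int argc, char **argv)
{
    /* yyin is the FILE* the scanner reads from; it defaults to stdin. */
    if (argc > 1) {
        yyin = fopen(argv[1], "r");
        if (yyin == NULL) {
            perror(argv[1]);
            return 1;
        }
    }
    yylex();   /* scan until end of input */
    return 0;
}
```

Pointing yyin at a file before calling yylex() is the usual way to scan a named file instead of standard input.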
Here is a simple example that picks all the numbers out of a file.
Then create a file containing a string that mixes letters and digits. The string is skdj3244sdkfj234kjsdfkl23kjl12
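A minimal sketch of such a scanner follows. It assumes flex and a hypothetical file name numbers.l, and is one possible way to write it rather than the definitive listing.

```lex
%{
#include <stdio.h>
%}

%%
[0-9]+  { printf("%s\n", yytext); /* print each run of digits on its own line */ }
.|\n    { /* discard every other character */ }
%%

/* Returning 1 stops scanning at end of input. */
int yywrap(void) { return 1; }

int main(void)
{
    yylex();   /* read from standard input until end of file */
    return 0;
}
```

Assuming the file is saved as numbers.l, running flex numbers.l generates lex.yy.c, which can be compiled with cc lex.yy.c -o numbers and run as ./numbers < input.txt. For an input such as abc12de345 it prints 12 and 345 on separate lines, because [0-9]+ always matches the longest possible run of digits.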