Practical Uses for the PHP Tokenizer

Devolio

    Saturday, August 23. 2008

    When PHP has to process a request, the engine goes through several passes until the source code is expressed as a set of instructions (opcodes) that the interpreter can execute. The first such step is “lexical scanning”, which splits the code into smaller strings called “tokens”. A token is the smallest meaningful unit of your source code: it can represent a reserved word (for, while, class, if, etc.), an operator (+, -, *, /, &&, etc.), a literal value (integer, float, string, etc.) or another special symbol.

    The same lexical scanner that PHP uses internally is also available to userland PHP code through the function token_get_all(), which is part of the tokenizer extension (enabled by default).
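
    As a quick illustration, here is a minimal sketch (assuming PHP 7 or later) that runs token_get_all() over a three-line snippet. Each element of the returned array is either a plain one-character string, for simple symbols such as ";", "(" or "+", or an array holding the token id, the matched text and the line number; token_name() turns the id into a readable name such as T_FUNCTION or T_VARIABLE. The helpers dumpTokens() and extractDocComments() are hypothetical names used only for this example; the second one shows a typical practical use, pulling doc comments out of code for a documentation generator.

```php
<?php
// A minimal sketch, assuming PHP 7.0+ with the tokenizer extension
// (enabled by default). dumpTokens() and extractDocComments() are
// hypothetical helper names, not part of PHP itself.

// Print one line per token: its name, its source text and its line number.
function dumpTokens(string $source)
{
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens have the form [token id, source text, line number].
            list($id, $text, $line) = $token;
            printf("%-20s %-30s line %d\n", token_name($id),
                json_encode($text, JSON_UNESCAPED_SLASHES), $line);
        } else {
            // Simple one-character tokens such as ';', '(' or '+' are returned
            // as plain strings, without a token id or line number.
            printf("%-20s %s\n", '(char)', json_encode($token));
        }
    }
}

// Collect every doc comment together with the line it starts on,
// the kind of data a documentation generator needs.
function extractDocComments(string $source): array
{
    $comments = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_DOC_COMMENT) {
            $comments[] = ['line' => $token[2], 'text' => $token[1]];
        }
    }
    return $comments;
}

// The scanned code must begin with an opening tag; without one the whole
// string would come back as a single T_INLINE_HTML token.
$source = '<?php
/** Adds two numbers. */
function add($a, $b) { return $a + $b; }
';

dumpTokens($source);
print_r(extractDocComments($source));
```

    In the dump, "function" should be reported as T_FUNCTION, "add" as T_STRING and "$a" as T_VARIABLE, with the spaces and newlines between them showing up as T_WHITESPACE tokens. Working on tokens rather than raw text also means that a comment-like sequence inside a string literal is reported as T_CONSTANT_ENCAPSED_STRING and is never mistaken for a real comment, which is exactly the kind of case where regular-expression approaches get messy.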

    Comments

    #1 - Binny V A said:
    2008-09-13 14:18

    It's very useful. I used it in one of my projects: a program that extracts the comments from code and creates documentation from them.

    First, I tried to do it using regular expressions, but the code soon got really hairy. Then I switched to PHP's tokenizer functions, and they were very helpful.

    Take a look at the code...
    http://github.com/binnyva/apiextractor/tree/master

    Just a reminder: it's a loooong way from being complete.
