Wednesday, 3 May 2023

A revised lexical scanner that can tokenize simple mathematical expressions

Introduction

Here is a revised version of the finite state machine (FSM) for simple mathematical expressions proposed previously. It corrects the bug found before and recognizes negative signs as well. However, names for mathematical functions such as sine and logarithms still cannot be identified by the algorithm.

Example for valid expressions

17-5*28.9, 0.2736+2.1^0.333, -2+7^(2.5/33.86), -(2-5.9)/70.24*(5-(0.18+4.9)), -(-(2+5.9)*4)/9.3*(5-(0.2+6)).

Example for detecting invalid expressions

3+, -4*(5-+2.7/6), (3+4.2).52, (3+4)52.

Remarks

  1. A class for the finite state machine is given so that a FSM can easily be created by submitting it with a human readable script that includes states and transition arcs. The above FSM can be created by the script below:
    eng = FSM()
    # states
    [A,B,C,D,E,F] = ['A','B','C','D','E','F']
    eng.initialState = A    # initial state
    eng.finalStates = [C,D,E,F]  # final states
    # transition arcs
    # (startState, tokenList, endState, action)
    eng.addArc(A,'-('        ,A,FSM.storeToken)
    eng.addArc(A,'123456789' ,C,FSM.buildToken)
    eng.addArc(A,'0'         ,D,FSM.buildToken)
    eng.addArc(B,'('         ,B,FSM.storeToken)
    eng.addArc(B,'123456789' ,C,FSM.buildToken)
    eng.addArc(C,'+-*/^'     ,B,FSM.storeToken)
    eng.addArc(C,'0123456789',C,FSM.buildToken)
    eng.addArc(C,')'         ,F,FSM.storeToken)
    eng.addArc(C,'.'         ,E,FSM.buildToken)
    eng.addArc(B,'0'         ,D,FSM.buildToken)
    eng.addArc(D,'+-*/^'     ,B,FSM.storeToken)
    eng.addArc(D,'.'         ,E,FSM.buildToken)
    eng.addArc(E,'0123456789',E,FSM.buildToken)
    eng.addArc(E,')'         ,F,FSM.storeToken)
    eng.addArc(E,'+-*/^'     ,B,FSM.storeToken)
    eng.addArc(F,')'         ,F,FSM.storeToken)
    eng.addArc(F,'+-*/^'     ,B,FSM.storeToken)
    
  2. Minus sign is accepted by this algorithm. Expressions such as -3, -(4+5) and -(-(-0.278+5)*3) are valid.
  3. Expression such as .3 is considered as invalid, not 0.3.
  4. Pair check for brackets is not yet implemented as the function of the algorithm is to tokenize expressions. Whether an expression is mathematically correct or not is the responsibility of later stages of parsing.

Source Code

Output

Prolog Version

A prolog version of the above FSM is given below. Only the states and transactions are implemented, the actions of storing tokens and building tokens are not implemented. Furthermore, whether the exit state of an expression is one of the valid final states is not checked by the script.

Source Code

Output (Prolog version)

No comments:

Post a Comment