Tokenizer

Tokenizer is a helper class that supports syntax highlighting (colorizing) of source code loaded into DOM elements for rendering: <pre>, <code>, <textarea> and <plaintext>.

It is intended to be used together with the Selection.applyMark() function to mark text runs.

Constants

N/A

Properties

tokenStart
- bookmark, position where the token starts.
tokenEnd
- bookmark, position where the token ends.
tag
- string, markup tokenizer only; tag name, valid at #TAG-START and #TAG-END tokens.
attr
- string, markup tokenizer only; attribute name, valid at #TAG-ATTR tokens.
value
- string, text content of the token, or the attribute value at #TAG-ATTR tokens.
type
- symbol, either #source or #markup; the type of the current tokenizer model.
element
- DOM element in which the parsed token was found.

Methods

this
( element: Element, tokenizerType: symbol [, subType: symbol] ) : Tokenizer

Constructs a Tokenizer instance. Parameters:

push
( tokenizerType: symbol, until: string [, subType: symbol] ) : this

Pushes a sub-tokenizer for an "island" with a different syntax. It is typically used with a base #markup tokenizer to parse the content of <style> and <script> elements.

pop
( ) : this

Pops the most recently pushed tokenizer from the internal stack. It is typically called in response to an #END-OF-ISLAND token.
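The sketch below shows how token(), push() and pop() might fit together in a colorizing loop. It is written in JavaScript against a small MockTokenizer, a hypothetical stand-in that replays a prepared token list so the loop can run outside Sciter. Token types are plain strings here instead of #symbols, and the names "TAG-HEAD-END", "KEYWORD" and "EOF", as well as passing "</script>" as the until argument, are assumptions that this page does not confirm.

```javascript
// Hypothetical stand-in for Sciter's Tokenizer so the loop can run anywhere.
// ASSUMPTION: token types are strings (TIScript uses #symbols); the mock just
// replays a token list, and push()/pop() only track which model is active.
class MockTokenizer {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
    this.stack = [{ type: "markup", until: null }]; // base #markup tokenizer
    this.tag = null;
    this.value = null;
  }
  // Mirrors the `type` property: model of the currently active tokenizer.
  get type() { return this.stack[this.stack.length - 1].type; }
  token() {
    if (this.pos >= this.tokens.length) return "EOF"; // hypothetical end marker
    const t = this.tokens[this.pos++];
    this.tag = t.tag ?? null;
    this.value = t.value ?? null;
    return t.kind;
  }
  push(type, until) { this.stack.push({ type, until }); return this; }
  pop() { this.stack.pop(); return this; }
}

// Colorizing loop: enter a #source island inside <script>/<style> and return
// to markup when the island ends. `mark` stands in for the code that would
// call Selection.applyMark() on the tokenStart..tokenEnd range.
function colorize(tz, mark) {
  for (let tt = tz.token(); tt !== "EOF"; tt = tz.token()) {
    if (tt === "TAG-HEAD-END" && (tz.tag === "script" || tz.tag === "style")) {
      tz.push("source", `</${tz.tag}>`); // assumed token name and `until` value
    } else if (tt === "END-OF-ISLAND") {
      tz.pop(); // back to the base #markup tokenizer
    } else {
      mark(tt, tz.type, tz.value);
    }
  }
}

const marks = [];
colorize(new MockTokenizer([
  { kind: "TAG-START", tag: "script" },
  { kind: "TAG-HEAD-END", tag: "script" },
  { kind: "KEYWORD", value: "let" },
  { kind: "END-OF-ISLAND" },
  { kind: "TAG-END", tag: "script" },
]), (tt, model) => marks.push(`${model}:${tt}`));
console.log(marks.join(" ")); // markup:TAG-START source:KEYWORD markup:TAG-END
```

Keeping push() and pop() symmetric (one pop per island end) is what lets a single loop handle markup and embedded source without tracking the nesting itself.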

token
( ) : symbol

Parses the next token from the input and returns its type:

elementType
( tag: string ) : (elementType, contentModelType, parsingType)

This static method returns the type information of a markup element known to Sciter by default (the data comes from Sciter's internal tables).

elementType is one of the following values:

const UNKNOWN_TAG = 0;      // unknown 
const INLINE_BLOCK_TAG = 1; // <img>, <input> ...
const BLOCK_TAG = 2;        // <div>,<ul>,<p> ... 
const INLINE_TAG = 3;       // <span>,<b>,<strong> ...
const TABLE_TAG = 4;        // <table>
const TABLE_BODY_TAG = 5;   // <thead>,<tbody>,<tfoot>
const TABLE_ROW_TAG = 6;    // <tr>
const TABLE_CELL_TAG = 7;   // <td>,<th>
const INFO_TAG = 8;         // <link>,<style>,<head> ...  
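As an illustration of consuming these codes, the following JavaScript sketch maps an elementType value to a coarse category, for example to choose a highlight style for a tag name in a colorizer. The helper tagCategory() and its category names are hypothetical; only the numeric codes come from the list above.

```javascript
// elementType codes, copied from the list above.
const UNKNOWN_TAG = 0;
const INLINE_BLOCK_TAG = 1;
const BLOCK_TAG = 2;
const INLINE_TAG = 3;
const TABLE_TAG = 4;
const TABLE_BODY_TAG = 5;
const TABLE_ROW_TAG = 6;
const TABLE_CELL_TAG = 7;
const INFO_TAG = 8;

// Hypothetical helper: group elementType codes into coarse categories.
function tagCategory(elementType) {
  switch (elementType) {
    case INLINE_BLOCK_TAG:
    case INLINE_TAG:
      return "inline";
    case BLOCK_TAG:
      return "block";
    case TABLE_TAG:
    case TABLE_BODY_TAG:
    case TABLE_ROW_TAG:
    case TABLE_CELL_TAG:
      return "table";
    case INFO_TAG:
      return "info";
    case UNKNOWN_TAG:
    default:
      return "unknown"; // UNKNOWN_TAG or any unexpected value
  }
}

console.log(tagCategory(BLOCK_TAG));      // block
console.log(tagCategory(TABLE_CELL_TAG)); // table
```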

contentModelType describes the type of content normally allowed inside the element:

const CMODEL_BLOCKS = 0;      // Flow elements - blocks and inlines : <div>
const CMODEL_INLINES = 1;     // Phrasing elements - inlines: <p>
const CMODEL_TRANSPARENT = 2; // <del>, <a>, etc. 
const CMODEL_TEXT = 3;        // Only text: <title> 
const CMODEL_TABLE = 4;       // Only table components inside
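CMODEL_TRANSPARENT means the element has no content model of its own. The JavaScript sketch below resolves it the way HTML defines "transparent": the element takes the model of its nearest non-transparent ancestor. Whether Sciter resolves it exactly this way is an assumption, and the helpers effectiveContentModel() and allowsBlocks(), as well as the fallback to CMODEL_BLOCKS, are hypothetical.

```javascript
// contentModelType codes, copied from the list above.
const CMODEL_BLOCKS = 0;
const CMODEL_INLINES = 1;
const CMODEL_TRANSPARENT = 2;
const CMODEL_TEXT = 3;
const CMODEL_TABLE = 4;

// `chain` holds content models from the outermost ancestor to the element
// itself (last). Transparent entries defer to the nearest ancestor that
// has a model of its own.
function effectiveContentModel(chain) {
  for (let i = chain.length - 1; i >= 0; i--) {
    if (chain[i] !== CMODEL_TRANSPARENT) return chain[i];
  }
  return CMODEL_BLOCKS; // assumption: no non-transparent ancestor means flow content
}

// Hypothetical check: may a block element such as <div> appear here?
function allowsBlocks(chain) {
  return effectiveContentModel(chain) === CMODEL_BLOCKS;
}

// <div><p><a>  : the <a> takes <p>'s inline-only model
console.log(allowsBlocks([CMODEL_BLOCKS, CMODEL_INLINES, CMODEL_TRANSPARENT])); // false
// <div><a>     : the <a> takes <div>'s flow model
console.log(allowsBlocks([CMODEL_BLOCKS, CMODEL_TRANSPARENT])); // true
```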

parsingType describes the HTML parsing flavour of the element:

const PMODEL_NORMAL = 0;  // normal head/tail: <div></div> ...
const PMODEL_NO_TAIL = 1; // no tail: <img>, <br>, <hr> ...
const PMODEL_CDATA = 2;   // head/tail, known to contain CDATA inside: <script>,<style>,...
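The parsing type is what a markup colorizer needs to decide whether to wait for a closing tag and whether an element's content should be handed to a sub-tokenizer. The JavaScript sketch below derives both from a tag name and its parsingType. The helper contentHandling() is hypothetical, and using "</tag>" as the until string for push() is an assumption based on the push() description, not something this page specifies.

```javascript
// parsingType codes, copied from the list above.
const PMODEL_NORMAL = 0;
const PMODEL_NO_TAIL = 1;
const PMODEL_CDATA = 2;

// Hypothetical helper: does the element expect an end tag, and if its
// content is CDATA, which string would close the island?
function contentHandling(tag, parsingType) {
  switch (parsingType) {
    case PMODEL_NO_TAIL:
      return { expectsEndTag: false, islandUntil: null }; // <img>, <br>, ...
    case PMODEL_CDATA:
      return { expectsEndTag: true, islandUntil: `</${tag}>` }; // <script>, <style>, ...
    case PMODEL_NORMAL:
    default:
      return { expectsEndTag: true, islandUntil: null };
  }
}

console.log(contentHandling("br", PMODEL_NO_TAIL).expectsEndTag);  // false
console.log(contentHandling("script", PMODEL_CDATA).islandUntil); // </script>
```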