📄 states.py
字号:
# Author: David Goodger# Contact: goodger@users.sourceforge.net# Revision: $Revision: 4258 $# Date: $Date: 2006-01-09 04:29:23 +0100 (Mon, 09 Jan 2006) $# Copyright: This module has been placed in the public domain."""This is the ``docutils.parsers.restructuredtext.states`` module, the core ofthe reStructuredText parser. It defines the following::Classes: - `RSTStateMachine`: reStructuredText parser's entry point. - `NestedStateMachine`: recursive StateMachine. - `RSTState`: reStructuredText State superclass. - `Inliner`: For parsing inline markup. - `Body`: Generic classifier of the first line of a block. - `SpecializedBody`: Superclass for compound element members. - `BulletList`: Second and subsequent bullet_list list_items - `DefinitionList`: Second+ definition_list_items. - `EnumeratedList`: Second+ enumerated_list list_items. - `FieldList`: Second+ fields. - `OptionList`: Second+ option_list_items. - `RFC2822List`: Second+ RFC2822-style fields. - `ExtensionOptions`: Parses directive option fields. - `Explicit`: Second+ explicit markup constructs. - `SubstitutionDef`: For embedded directives in substitution definitions. - `Text`: Classifier of second line of a text block. - `SpecializedText`: Superclass for continuation lines of Text-variants. - `Definition`: Second line of potential definition_list_item. - `Line`: Second line of overlined section title or transition marker. - `Struct`: An auxiliary collection class.:Exception classes: - `MarkupError` - `ParserError` - `MarkupMismatch`:Functions: - `escape2null()`: Return a string, escape-backslashes converted to nulls. - `unescape()`: Return a string, nulls removed or restored to backslashes.:Attributes: - `state_classes`: set of State classes used with `RSTStateMachine`.Parser Overview===============The reStructuredText parser is implemented as a recursive state machine,examining its input one line at a time. To understand how the parser works,please first become familiar with the `docutils.statemachine` module. In thedescription below, references are made to classes defined in this module;please see the individual classes for details.Parsing proceeds as follows:1. The state machine examines each line of input, checking each of the transition patterns of the state `Body`, in order, looking for a match. The implicit transitions (blank lines and indentation) are checked before any others. The 'text' transition is a catch-all (matches anything).2. The method associated with the matched transition pattern is called. A. Some transition methods are self-contained, appending elements to the document tree (`Body.doctest` parses a doctest block). The parser's current line index is advanced to the end of the element, and parsing continues with step 1. B. Other transition methods trigger the creation of a nested state machine, whose job is to parse a compound construct ('indent' does a block quote, 'bullet' does a bullet list, 'overline' does a section [first checking for a valid section header], etc.). - In the case of lists and explicit markup, a one-off state machine is created and run to parse contents of the first item. - A new state machine is created and its initial state is set to the appropriate specialized state (`BulletList` in the case of the 'bullet' transition; see `SpecializedBody` for more detail). This state machine is run to parse the compound element (or series of explicit markup elements), and returns as soon as a non-member element is encountered. For example, the `BulletList` state machine ends as soon as it encounters an element which is not a list item of that bullet list. The optional omission of inter-element blank lines is enabled by this nested state machine. - The current line index is advanced to the end of the elements parsed, and parsing continues with step 1. C. The result of the 'text' transition depends on the next line of text. The current state is changed to `Text`, under which the second line is examined. If the second line is: - Indented: The element is a definition list item, and parsing proceeds similarly to step 2.B, using the `DefinitionList` state. - A line of uniform punctuation characters: The element is a section header; again, parsing proceeds as in step 2.B, and `Body` is still used. - Anything else: The element is a paragraph, which is examined for inline markup and appended to the parent element. Processing continues with step 1."""__docformat__ = 'reStructuredText'import sysimport reimport romanfrom types import TupleTypefrom docutils import nodes, statemachine, utils, urischemesfrom docutils import ApplicationError, DataErrorfrom docutils.statemachine import StateMachineWS, StateWSfrom docutils.nodes import fully_normalize_name as normalize_namefrom docutils.nodes import whitespace_normalize_namefrom docutils.utils import escape2null, unescape, column_widthfrom docutils.parsers.rst import directives, languages, tableparser, rolesfrom docutils.parsers.rst.languages import en as _fallback_language_moduleclass MarkupError(DataError): passclass UnknownInterpretedRoleError(DataError): passclass InterpretedRoleNotImplementedError(DataError): passclass ParserError(ApplicationError): passclass MarkupMismatch(Exception): passclass Struct: """Stores data attributes for dotted-attribute access.""" def __init__(self, **keywordargs): self.__dict__.update(keywordargs)class RSTStateMachine(StateMachineWS): """ reStructuredText's master StateMachine. The entry point to reStructuredText parsing is the `run()` method. """ def run(self, input_lines, document, input_offset=0, match_titles=1, inliner=None): """ Parse `input_lines` and modify the `document` node in place. Extend `StateMachineWS.run()`: set up parse-global data and run the StateMachine. """ self.language = languages.get_language( document.settings.language_code) self.match_titles = match_titles if inliner is None: inliner = Inliner() inliner.init_customizations(document.settings) self.memo = Struct(document=document, reporter=document.reporter, language=self.language, title_styles=[], section_level=0, section_bubble_up_kludge=0, inliner=inliner) self.document = document self.attach_observer(document.note_source) self.reporter = self.memo.reporter self.node = document results = StateMachineWS.run(self, input_lines, input_offset, input_source=document['source']) assert results == [], 'RSTStateMachine.run() results should be empty!' self.node = self.memo = None # remove unneeded referencesclass NestedStateMachine(StateMachineWS): """ StateMachine run from within other StateMachine runs, to parse nested document structures. """ def run(self, input_lines, input_offset, memo, node, match_titles=1): """ Parse `input_lines` and populate a `docutils.nodes.document` instance. Extend `StateMachineWS.run()`: set up document-wide data. """ self.match_titles = match_titles self.memo = memo self.document = memo.document self.attach_observer(self.document.note_source) self.reporter = memo.reporter self.language = memo.language self.node = node results = StateMachineWS.run(self, input_lines, input_offset) assert results == [], ('NestedStateMachine.run() results should be ' 'empty!') return resultsclass RSTState(StateWS): """ reStructuredText State superclass. Contains methods used by all State subclasses. """ nested_sm = NestedStateMachine def __init__(self, state_machine, debug=0): self.nested_sm_kwargs = {'state_classes': state_classes, 'initial_state': 'Body'} StateWS.__init__(self, state_machine, debug) def runtime_init(self): StateWS.runtime_init(self) memo = self.state_machine.memo self.memo = memo self.reporter = memo.reporter self.inliner = memo.inliner self.document = memo.document self.parent = self.state_machine.node def goto_line(self, abs_line_offset): """ Jump to input line `abs_line_offset`, ignoring jumps past the end. """ try: self.state_machine.goto_line(abs_line_offset) except EOFError: pass def no_match(self, context, transitions): """ Override `StateWS.no_match` to generate a system message. This code should never be run. """ self.reporter.severe( 'Internal error: no transition pattern match. State: "%s"; ' 'transitions: %s; context: %s; current line: %r.' % (self.__class__.__name__, transitions, context, self.state_machine.line), line=self.state_machine.abs_line_number()) return context, None, [] def bof(self, context): """Called at beginning of file.""" return [], [] def nested_parse(self, block, input_offset, node, match_titles=0, state_machine_class=None, state_machine_kwargs=None): """ Create a new StateMachine rooted at `node` and run it over the input `block`. """ if state_machine_class is None: state_machine_class = self.nested_sm if state_machine_kwargs is None: state_machine_kwargs = self.nested_sm_kwargs block_length = len(block) state_machine = state_machine_class(debug=self.debug, **state_machine_kwargs) state_machine.run(block, input_offset, memo=self.memo, node=node, match_titles=match_titles) state_machine.unlink() new_offset = state_machine.abs_line_offset() # No `block.parent` implies disconnected -- lines aren't in sync: if block.parent and (len(block) - block_length) != 0: # Adjustment for block if modified in nested parse: self.state_machine.next_line(len(block) - block_length) return new_offset def nested_list_parse(self, block, input_offset, node, initial_state, blank_finish, blank_finish_state=None, extra_settings={}, match_titles=0, state_machine_class=None, state_machine_kwargs=None): """ Create a new StateMachine rooted at `node` and run it over the input `block`. Also keep track of optional intermediate blank lines and the required final one. """ if state_machine_class is None: state_machine_class = self.nested_sm if state_machine_kwargs is None: state_machine_kwargs = self.nested_sm_kwargs.copy() state_machine_kwargs['initial_state'] = initial_state state_machine = state_machine_class(debug=self.debug, **state_machine_kwargs) if blank_finish_state is None: blank_finish_state = initial_state state_machine.states[blank_finish_state].blank_finish = blank_finish for key, value in extra_settings.items(): setattr(state_machine.states[initial_state], key, value) state_machine.run(block, input_offset, memo=self.memo, node=node, match_titles=match_titles) blank_finish = state_machine.states[blank_finish_state].blank_finish state_machine.unlink() return state_machine.abs_line_offset(), blank_finish def section(self, title, source, style, lineno, messages): """Check for a valid subsection and create one if it checks out."""
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -