algor.h

From "Python-based Chinese word segmentation program" · C header file · 81 lines


#ifndef _ALGORITHM_H_
#define _ALGORITHM_H_

#include <utility>  // std::pair, used by match_cache_t
#include <vector>

#include "chunk.h"
#include "token.h"
#include "dict.h"

/**
 * The MMSeg algorithm uses four rules:
 *  - Maximum matching rule
 *  - Largest average word length rule
 *  - Smallest variance of word length rule
 *  - Largest sum of degree of morphemic freedom of one-character
 *    words rule
 */

namespace rmmseg
{
    class Algorithm
    {
    public:
        Algorithm(const char *text, int length)
            :m_text(text), m_pos(0),
            m_text_length(length),
            m_tmp_words_i(0),
            m_match_cache_i(0)
            {
                // Mark every cache slot as empty.
                for (int i = 0; i < match_cache_size; ++i)
                    m_match_cache[i].first = -1;
            }

        Token next_token();

        const char *get_text() const
        {
            return m_text;
        }

    private:
        Token get_basic_latin_word();
        Token get_cjk_word(int);

        std::vector<Chunk> create_chunks();
        int next_word();
        int next_char();
        std::vector<Word *> find_match_words();
        int max_word_length() { return 4; }

        const char *m_text;
        int m_pos;
        int m_text_length;

        /* tmp words are only for 1-char words which
         * do not exist in the dictionary. Their length
         * value is set to -1 to indicate that they are
         * tmp words. */
        Word *get_tmp_word()
        {
            if (m_tmp_words_i >= max_tmp_words)
                m_tmp_words_i = 0;  // wrap around
            return &m_tmp_words[m_tmp_words_i++];
        }

        /* related to max_word_length and match_words_cache_size */
        static const int max_tmp_words = 64;
        Word m_tmp_words[max_tmp_words];
        int m_tmp_words_i;

        /* match word caches */
        static const int match_cache_size = 3;
        typedef std::pair<int, std::vector<Word *> > match_cache_t;
        match_cache_t m_match_cache[match_cache_size];
        int m_match_cache_i;
    };
}

#endif /* _ALGORITHM_H_ */
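The header comment lists the four MMSeg rules, but the code that applies them is not part of this file. The following is a minimal sketch of how such a rule cascade could work, assuming each rule only breaks the ties left by the previous one. `SketchWord`, `SketchChunk`, `keep_best` and `select_chunk` are hypothetical names invented for this illustration; they are not the `Word` and `Chunk` types from `chunk.h` or `dict.h`, and the `freedom` field stands in for whatever per-character score the dictionary provides.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a dictionary word (not rmmseg's Word).
struct SketchWord {
    int length;      // word length in characters
    double freedom;  // assumed score; only counted for 1-character words
};

// Hypothetical stand-in for a chunk: a short run of consecutive words.
// Every chunk is assumed to hold at least one word.
struct SketchChunk {
    std::vector<SketchWord> words;

    int total_length() const {
        int sum = 0;
        for (const SketchWord &w : words) sum += w.length;
        return sum;
    }
    double average_length() const {
        return static_cast<double>(total_length()) / words.size();
    }
    double variance() const {
        const double avg = average_length();
        double sum = 0.0;
        for (const SketchWord &w : words)
            sum += (w.length - avg) * (w.length - avg);
        return sum / words.size();
    }
    double freedom_sum() const {
        double sum = 0.0;
        for (const SketchWord &w : words)
            if (w.length == 1) sum += w.freedom;
        return sum;
    }
};

// Keep only the chunks that reach the highest score under `key`.
template <typename Key>
void keep_best(std::vector<SketchChunk> &chunks, Key key) {
    if (chunks.size() < 2) return;  // nothing left to disambiguate
    auto best = key(chunks.front());
    for (const SketchChunk &c : chunks) best = std::max(best, key(c));
    chunks.erase(std::remove_if(chunks.begin(), chunks.end(),
                                [&](const SketchChunk &c) { return key(c) < best; }),
                 chunks.end());
}

// Apply the four rules in order; `chunks` must not be empty.
SketchChunk select_chunk(std::vector<SketchChunk> chunks) {
    assert(!chunks.empty());
    // Rule 1: maximum matching (largest total length).
    keep_best(chunks, [](const SketchChunk &c) { return c.total_length(); });
    // Rule 2: largest average word length.
    keep_best(chunks, [](const SketchChunk &c) { return c.average_length(); });
    // Rule 3: smallest variance, expressed as largest negated variance.
    keep_best(chunks, [](const SketchChunk &c) { return -c.variance(); });
    // Rule 4: largest sum of freedom over one-character words.
    keep_best(chunks, [](const SketchChunk &c) { return c.freedom_sum(); });
    return chunks.front();  // first survivor if ties remain
}
```

Given two candidate chunks with word lengths 1, 2, 3 and 2, 2, 2, rules 1 and 2 tie (total length 6, average 2), so rule 3 decides in favour of the evenly split chunk. In a tokenizer shaped like `Algorithm`, the first word of the selected chunk would presumably be emitted as the next token, but that step is not shown in this header.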

