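// segment.cpp
//
// Shortest-path word segmentation. A Chinese sentence is first split into
// atoms, and the atoms are then used to build a directed acyclic graph.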
#include "StdAfx.h"
#include ".\segment.h"

CSegment::CSegment(void)
{
	if (!m_Dict.Load("coreDict.dct"))
		AfxMessageBox("Failed to load the dictionary!");
}

CSegment::~CSegment(void)
{
}
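// Atom segmentation: each double-byte (Chinese) character becomes one atom,
// and each maximal run of single-byte characters becomes one atom.
void CSegment::AtomSegment()
{
	int i=0,j=0,k;
	int nLen=(int)strlen(m_sSentence);	// computed once instead of on every iteration
	char sChar[3];

	while (i<nLen)
	{
		sChar[0]=m_sSentence[i];
		sChar[1]=0;
		if (sChar[0]<0)	// high bit set: lead byte of a double-byte character (relies on char being signed)
		{
			sChar[1]=m_sSentence[i+1];
			sChar[2]=0;
			strcpy(m_sAtom[j],sChar);
			m_nAtomLen[j]=2;
			m_bAtomChinese[j]=TRUE;
			i+=2;
		}
		else
		{
			for (k=i;k<nLen && m_sSentence[k]>=0;k++)
				m_sAtom[j][k-i]=m_sSentence[k];
			m_sAtom[j][k-i]=0;	// terminate at the atom's own length, not at the sentence offset
			m_nAtomLen[j]=k-i;	// length of the run in bytes
			m_bAtomChinese[j]=FALSE;
			i=k;
		}
		j++;
	}

	m_nAtomNum=j;
}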
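// Build the word graph. Node i sits in front of atom i; an edge (i,j) of
// weight 1 means that atoms i..j-1 together form a dictionary word.
void CSegment::GenerateGraph()
{
	int i,j,k;
	char sWord[WORD_MAXLENGTH];
	
	m_Graph.Init(m_nAtomNum+1);
	for (i=0;i<m_nAtomNum;i++)
	{
		if (!m_bAtomChinese[i])
		{
			// A non-Chinese atom is always treated as a word on its own.
			m_Graph.Let(i,i+1,1);
			continue;
		}
		for (j=i+1;j<m_nAtomNum+1;j++)
		{
			// Concatenate atoms i..j-1 into a candidate word, stopping
			// before sWord would overflow.
			bool bFits=true;
			for (sWord[0]=0,k=i;k<j;k++)
			{
				if (strlen(sWord)+strlen(m_sAtom[k])>=(size_t)WORD_MAXLENGTH)
				{
					bFits=false;
					break;
				}
				strcat(sWord,m_sAtom[k]);
			}
			if (bFits && m_Dict.Find(sWord))
				m_Graph.Let(i,j,1);
			else
				// Disabled fallback: give a single atom (j==i+1) weight 1
				// even when it is not in the dictionary.
//				if (j==i+1)
//					m_Graph.Let(i,j,1);
//				else
					m_Graph.Let(i,j,MYINFINITE);
			// Debug output, disabled:
//			CString t;
//			t.Format("(%d,%d) %s %d",i,j,sWord,m_Graph.Get(i,j));
//			AfxMessageBox(t);
		}
	}

}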
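// Shortest path from node 0 to the other nodes, using a Dijkstra-style
// marking loop over the graph built by GenerateGraph().
void CSegment::ShortPath()
{
	int i,j,k;
	int nDistance=MYINFINITE;		// best distance found in the current round
	int nNode1=0,nNode2=0;			// best edge found in the current round: nNode1 -> nNode2
	// Put the start node into the marked set.
	m_Route[0].bMarked=1;
	m_Route[0].nDistance=0;
	m_Route[0].nRoute=0;

	for (k=0;k<m_Graph.m_nNodes-1;k++)		// each round marks one more node
	{
		nDistance=MYINFINITE;

		for (i=0;i<m_Graph.m_nNodes;i++)
			if (m_Route[i].bMarked)	// only expand nodes already in the marked set
			{
				for (j=0;j<m_Graph.m_nNodes;j++)	// look for the closest node outside the marked set
				{
					if (!m_Route[j].bMarked && m_Graph.Get(i,j)!=MYINFINITE)
					{
						if (m_Graph.Get(i,j)+m_Route[i].nDistance<nDistance)
						{
							nDistance=m_Graph.Get(i,j)+m_Route[i].nDistance;
							nNode1=i,nNode2=j;
						}
					}
				}
			}
		if (nDistance==MYINFINITE)
			break;	// no unmarked node is reachable, so nNode1/nNode2 hold no valid edge
		// Record the minimum distance and the predecessor on the route.
		m_Route[nNode2].bMarked=1;
		m_Route[nNode2].nDistance=nDistance;
		m_Route[nNode2].nRoute=nNode1;
	}
}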
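// Walk the route back from the last node to node 0, push each word onto a
// stack, then write the words to m_sResult in sentence order, each followed
// by a space.
void CSegment::GenerateResult()
{
	int i=m_Graph.m_nNodes-1,j=0,k,p=0;
	char sWord[WORD_MAXLENGTH];
	char sStack[MAX_SENTENCE_LEN][WORD_MAXLENGTH]={};

	// Debug output, disabled:
//	for (i=0;i<m_Graph.m_nNodes;i++)
//	{
//		CString t;
//		t.Format("%d:%d",i,m_Route[i].nRoute);
//		AfxMessageBox(t);
//	}

	while (i!=0)
	{
		j=m_Route[i].nRoute;	// predecessor of node i on the shortest path
		sWord[0]=0;
		for (k=j;k<i;k++)
		{
			strcat(sWord,m_sAtom[k]);
		}
		
		strcpy(sStack[p++],sWord);
		//AfxMessageBox(sWord);
		i=j;
	}
	// p is now one past the last word pushed, so start popping from p-1.
	for (m_sResult[0]=0,p--;p>=0;p--)
	{
		strcat(m_sResult,sStack[p]);
		strcat(m_sResult," ");
	}
	
}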
// Reset all per-sentence state so the object can segment another sentence.
void CSegment::CleanUp()
{
	int i;
	for (i=0;i<MAX_SENTENCE_LEN;i++)
	{
		m_Route[i].bMarked=0;
		m_Route[i].nDistance=0;
		m_Route[i].nRoute=0;
		m_nAtomLen[i]=0;
		m_sAtom[i][0]=0;
		m_bAtomChinese[i]=FALSE;
	}
	m_nAtomNum=0;
	m_sSentence[0]=0;
	m_sResult[0]=0;
	m_Graph.CleanUp();
}
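
The pipeline above (atoms, word graph, shortest path, result) can be tried outside MFC with a small standalone sketch. It is not the author's code: `SegmentAtoms` is a hypothetical helper, the dictionary is a plain `std::set` rather than `m_Dict`, and the input is assumed to be already split into atoms. Two deliberate differences are labeled in the comments: single atoms always get an edge (the fallback that is commented out in `GenerateGraph`), and the shortest path is found with one forward dynamic-programming pass instead of the marking loop in `ShortPath`, which is valid here because every edge runs from a lower node to a higher one.

```cpp
#include <climits>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of CSegment: returns the segmentation of
// `atoms` that uses the fewest words, given a toy in-memory dictionary.
std::vector<std::string> SegmentAtoms(const std::vector<std::string>& atoms,
                                      const std::set<std::string>& dict)
{
    const int n = static_cast<int>(atoms.size());
    const int INF = INT_MAX;
    std::vector<int> dist(n + 1, INF);  // fewest words needed to reach node i
    std::vector<int> prev(n + 1, -1);   // predecessor of node i on the best path
    dist[0] = 0;

    // Forward pass over the DAG: nodes are visited in increasing order, so
    // dist[i] is final by the time node i is expanded.
    for (int i = 0; i < n; ++i)
    {
        if (dist[i] == INF)
            continue;
        std::string word;
        for (int j = i + 1; j <= n; ++j)
        {
            word += atoms[j - 1];       // word == atoms[i] + ... + atoms[j-1]
            // Assumption: a single atom always forms an edge, so the last
            // node stays reachable even for out-of-dictionary characters.
            const bool edge = (j == i + 1) || dict.count(word) > 0;
            if (edge && dist[i] + 1 < dist[j])
            {
                dist[j] = dist[i] + 1;
                prev[j] = i;
            }
        }
    }

    // Walk back from the last node and rebuild the words in sentence order.
    std::vector<std::string> words;
    for (int j = n; j > 0; j = prev[j])
    {
        std::string word;
        for (int k = prev[j]; k < j; ++k)
            word += atoms[k];
        words.insert(words.begin(), word);
    }
    return words;
}
```

For example, `SegmentAtoms({"a","b","c","d"}, {"ab","cd"})` returns the two words `"ab"` and `"cd"`, because that path uses two edges while any path through single atoms uses more.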
