.\" Process this file with
.\" groff -man -Tascii count_artvec.1
.\"
.TH COUNT_ARTVEC 1 "February 2004" "Infomap Project" "Infomap NLP Manual"
.SH NAME
count_artvec \- compute article vectors from word vectors and a tokenized corpus
.SH SYNOPSIS
.B count_artvec
.BR \-m " <model_data_dir>"
.SH DESCRIPTION
.B count_artvec
computes a vector for each document (article) in the corpus from the
word vectors and the tokenized corpus found in the model data directory.
.SH OPTIONS
.TP
.BI \-m \ <model_data_dir>
The directory from which input files are read and to which output
files are written.
.\" .SH EXAMPLES
.SH INPUT FILES
These files are read from the model data directory, specified as
the argument to the
.B \-m
option.
.PP
.I numDocs
.RS
A file containing the number of documents in the input corpus.
See
.BR prepare_corpus (1).
.RE
.PP
.I dic
.RS
The "dictionary" file listing the word types (terms) from the input
corpus, along with their term and document frequencies.
See
.BR prepare_corpus (1).
.RE
.PP
.I wordlist
.RS
A tokenized form of the corpus, with document boundaries marked.
See
.BR prepare_corpus (1).
.RE
.PP
.I wordvec.bin
.RS
A binary representation of the word vectors.
See
.BR encode_wordvec (1).
.RE
.PP
.I model_params.bin
.RS
A file containing the parameters for the model being built.
See
.BR prepare_corpus (1).
.RE
.SH OUTPUT FILES
These files are written to the model data directory, specified as
the argument to the
.B \-m
option.
.PP
.I artvec.bin
.RS
A binary representation of the article vectors.
Analogous to
.IR wordvec.bin .
.RE
.PP
.I art2offset.{dir,pag}
.RS
These two files make up a DBM database.
Each key in this database is a document (article) ID; the
corresponding value is the offset into
.I artvec.bin
at which the vector for the document with that ID can be found.
In a single-file corpus, a document ID is the offset into the corpus
file at which the document begins.
In a multiple-file corpus, a document ID is a key into the
.I number2name
DBM (see
.BR prepare_corpus (1))
that can be used to retrieve the document's filename.
The
.I art2offset
DBM and
.I artvec.bin
can be used to retrieve a document's vector given its ID.
.RE
.PP
.I offset2art.{dir,pag}
.RS
These two files make up a DBM database.
Each key in this database is an offset into
.I artvec.bin
at which a document vector begins.
The corresponding value is the document ID of the document having
that vector.
This DBM and
.I artvec.bin
can be used to retrieve a document given its vector.
.RE
.SH SEE ALSO
.BR prepare_corpus (1),
.BR count_wordvec (1),
.BR svdinterface (1),
.BR encode_wordvec (1),
.BR write_text_params (1)
.SH DIAGNOSTICS
Returns 0 to indicate success and 1 to indicate an error.
.SH BUGS
Please report bugs to
.BR infomap-nlp-users@lists.sourceforge.net .
.SH CREDITS
The Infomap NLP software was written by Stefan Kaufmann, Hinrich
Schuetze, Dominic Widdows, Beate Dorow, and Scott Cederberg.
The Infomap algorithm was originally developed by Hinrich Schuetze.
.SH AUTHOR
This manual page was written by Scott Cederberg.
Please direct inquiries and bug reports to
.BR infomap-nlp-users@lists.sourceforge.net .