Load an N-gram file into memory and map it to the word dictionary.

#include <sent/stddefs.h>
#include <sent/ngram2.h>
#include <sent/vocabulary.h>

Functions
boolean	init_ngram_bin (NGRAM_INFO *ndata, char *bin_ngram_file)
	Read and set up N-gram data from a binary format file.
boolean	init_ngram_arpa (NGRAM_INFO *ndata, char *ngram_file, int dir)
	Read and set up N-gram data from an ARPA format file.
boolean	init_ngram_arpa_additional (NGRAM_INFO *ndata, char *bigram_file)
	Read an additional LR 2-gram for the 1st pass.
boolean	make_voca_ref (NGRAM_INFO *ndata, WORD_INFO *winfo)
	Build the correspondence between the word dictionary and the N-gram vocabulary.
void	set_unknown_id (NGRAM_INFO *ndata, char *str)
	Set the unknown word ID in the N-gram data.
void	fix_uniprob_srilm (NGRAM_INFO *ndata, WORD_INFO *winfo)
	Fix the unigram probability of the BOS / EOS word.

Description

Load an N-gram file into memory and map it to the word dictionary.

Author: Akinobu LEE

Date: Wed Feb 16 07:40:53 2005

Revision: 1.7

Defined in init_ngram.c.

Function Documentation

boolean init_ngram_bin	(	NGRAM_INFO *	ndata,
		char *	bin_ngram_file
	)

Read and set up N-gram data from a binary format file.

Parameters:

ndata	[out] pointer to the N-gram data structure in which to store the data
bin_ngram_file	[in] file name of the binary N-gram

Defined at line 36 of init_ngram.c.

Referenced by initialize_ngram().

boolean init_ngram_arpa	(	NGRAM_INFO *	ndata,
		char *	ngram_file,
		int	dir
	)

Read and set up N-gram data from an ARPA format file.

Parameters:

ndata	[out] pointer to the N-gram data structure in which to store the data
ngram_file	[in] file name of the ARPA (reverse) 3-gram file
dir	[in] direction (DIR_LR or DIR_RL)

Defined at line 65 of init_ngram.c.

Referenced by initialize_ngram().
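
For orientation only: an ARPA file starts with a `\data\` header that lists the number of entries for each N-gram order as `ngram N=count` lines. The following is a minimal sketch of reading that header from a text buffer. It is not the reader used by init_ngram_arpa(); the function name `parse_arpa_header` and the `MAX_ORDER` limit are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ORDER 3  /* hypothetical limit chosen for this sketch */

/* Parse the "\data\" header of ARPA-format text held in memory.
 * Stores the entry count of each order in counts[0..MAX_ORDER-1]
 * and returns the highest order seen, or 0 if no header was found. */
int parse_arpa_header(const char *text, int counts[MAX_ORDER])
{
    int in_header = 0, max_order = 0;
    const char *p = text;

    assert(text != NULL && counts != NULL);
    memset(counts, 0, sizeof(int) * MAX_ORDER);

    while (*p != '\0') {
        char line[256];
        size_t len = strcspn(p, "\n");
        size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
        int order, count;

        /* Copy one line (truncated if too long) and advance past it. */
        memcpy(line, p, n);
        line[n] = '\0';
        p += len;
        if (*p == '\n') p++;

        if (strcmp(line, "\\data\\") == 0) {
            in_header = 1;
        } else if (in_header &&
                   sscanf(line, "ngram %d=%d", &order, &count) == 2) {
            if (order >= 1 && order <= MAX_ORDER) {
                counts[order - 1] = count;
                if (order > max_order) max_order = order;
            }
        } else if (in_header && line[0] == '\\') {
            break;  /* start of the next section, e.g. "\1-grams:" */
        }
    }
    return max_order;
}
```

Calling `parse_arpa_header()` on the beginning of a 3-gram file would be expected to return 3 and fill in the three counts.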

boolean init_ngram_arpa_additional	(	NGRAM_INFO *	ndata,
		char *	bigram_file
	)

Read an additional LR 2-gram for the 1st pass.

Parameters:

ndata	[out] pointer to the N-gram data structure in which to store the data
bigram_file	[in] file name of the ARPA 2-gram file

Defined at line 98 of init_ngram.c.

Referenced by initialize_ngram().

boolean make_voca_ref	(	NGRAM_INFO *	ndata,
		WORD_INFO *	winfo
	)

Build the correspondence between the word dictionary and the N-gram vocabulary.

Parameters:

ndata	[i/o] word/class N-gram; its unknown word information is set here.
winfo	[i/o] word dictionary; the word-to-N-gram-entry mapping is stored here.

Defined at line 127 of init_ngram.c.

Referenced by initialize_ngram().
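
The idea of mapping dictionary words to N-gram entries can be sketched as follows. This is a simplified illustration, not the code in init_ngram.c: the types `ToyNgram` and `ToyDict`, their fields, and the functions `toy_lookup` and `toy_make_voca_ref` are hypothetical stand-ins for NGRAM_INFO and WORD_INFO, and the linear search is chosen only for brevity. Words that have no entry in the N-gram vocabulary are assumed to fall back to the unknown word entry.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for NGRAM_INFO and WORD_INFO. */
typedef struct {
    int          num;     /* number of N-gram vocabulary entries */
    const char **names;   /* name of each entry */
    int          unk_id;  /* entry used for out-of-vocabulary words */
} ToyNgram;

typedef struct {
    int          num;     /* number of dictionary words */
    const char **names;   /* N-gram entry name of each word */
    int         *wton;    /* output: word ID -> N-gram entry ID */
} ToyDict;

/* Return the ID of the entry called `name`, or -1 if there is none. */
static int toy_lookup(const ToyNgram *ng, const char *name)
{
    int i;
    for (i = 0; i < ng->num; i++) {
        if (strcmp(ng->names[i], name) == 0) return i;
    }
    return -1;
}

/* Fill dict->wton for every word; words missing from the N-gram
 * vocabulary are mapped to ng->unk_id.
 * Returns the number of words that fell back to the unknown entry. */
int toy_make_voca_ref(const ToyNgram *ng, ToyDict *dict)
{
    int w, n_unknown = 0;

    assert(ng != NULL && dict != NULL && dict->wton != NULL);
    for (w = 0; w < dict->num; w++) {
        int id = toy_lookup(ng, dict->names[w]);
        if (id < 0) {
            id = ng->unk_id;
            n_unknown++;
        }
        dict->wton[w] = id;
    }
    return n_unknown;
}
```

A production implementation would normally replace the linear search with a sorted or hashed lookup, since both vocabularies can be large.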

void set_unknown_id	(	NGRAM_INFO *	ndata,
		char *	str
	)

Set the unknown word ID in the N-gram data.

Parameters:

ndata	[out] N-gram data in which to set the unknown word ID
str	[in] name string of the unknown word

Defined at line 169 of init_ngram.c.

Referenced by initialize_ngram().

void fix_uniprob_srilm	(	NGRAM_INFO *	ndata,
		WORD_INFO *	winfo
	)

Fix the unigram probability of the BOS / EOS word.

This function checks the unigram probabilities of the BOS and EOS words. If one of them is set to "-99", it is given the same value as the other. This happens when the LM was trained with SRILM, which assigns a unigram probability of "-99" to the beginning-of-sentence word and thereby causes search in the reverse direction to fail.

Parameters:

ndata	[i/o] N-gram data
winfo	[i/o] Vocabulary information

Defined at line 206 of init_ngram.c.

Referenced by initialize_ngram().