NGRAM_INFO Struct Reference

Main N-gram structure.

#include <ngram2.h>

Data Fields

int version
 Version number.
boolean from_bin
 TRUE if source is bingram, otherwise ARPA.
WORD_ID max_word_num
 N-gram vocabulary size.
NNID ngram_num [MAX_N]
 Total number of tuples for each N.
NNID bigram_bo_num
 Total number of bigram tuples that have a back-off weight (i.e. that serve as context of an upper 3-gram) (v4).
WORD_ID unk_id
 Unknown word ID.
int unk_num
 Number of dictionary words that are not in this N-gram vocabulary.
LOGPROB unk_num_log
 Log10 value of unk_num, used for calculating probability of unknown words.
boolean isopen
 TRUE if the dictionary has unknown words, i.e. words that do not appear in this N-gram.
char ** wname
 List of word strings [nid].
PATNODE * root
 Root of the index tree used to look up an N-gram word ID from its name.
LOGPROB * p
 1-gram log probabilities [nid].
LOGPROB * bo_wt_lr
 Back-off weights for LR 2-gram [nid].
LOGPROB * bo_wt_rl
 Back-off weights for RL 2-gram [nid].
NNID * n2_bgn
 2-gram IDs (n2) marking the beginning of the 2-gram entries that share each left context.
WORD_ID * n2_num
 Number of 2-grams that have the above left context.
WORD_ID * n2tonid
 Mapping from each 2-gram index ID (n2) to its last word ID (nid).
LOGPROB * p_lr
 LR 2-gram log probabilities [n2].
LOGPROB * p_rl
 RL 2-gram log probabilities [n2].
NNID_UPPER * n2bo_upper
 Mapping from each 2-gram index ID (n2) to its bigram back-off index (n2-bo), upper part (v4).
NNID_LOWER * n2bo_lower
 Mapping from each 2-gram index ID (n2) to its bigram back-off index (n2-bo), lower part (v4).
LOGPROB * bo_wt_rrl
 Back-off weights for RL 3-gram [n2-bo].
NNID * n3_bgn
 3-gram IDs (n3) marking the beginning of the 3-gram entries that share each left context (v3).
NNID_UPPER * n3_bgn_upper
 Upper 8 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share each left context (v4).
NNID_LOWER * n3_bgn_lower
 Lower 16 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share each left context (v4).
WORD_ID * n3_num
 Number of 3-grams that have the above left context.
WORD_ID * n3tonid
 Mapping from each 3-gram index ID (n3) to its last word ID (nid).
LOGPROB * p_rrl
 RL 3-gram log probabilities [n3].

Detailed Description

Main N-gram structure.

Bigrams and trigrams are stored as sequential lists. Entries are grouped by shared context, and each group is referenced from the context ((N-1)-gram) data by its beginning ID and its number of entries.
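
The grouping described above can be sketched in C as follows. This is a minimal illustration, not the Julius implementation: the typedef widths, the `MiniNgram` struct, the `NNID_NOT_FOUND` marker, and `find_bigram()` are hypothetical stand-ins that mirror only the `n2_bgn`, `n2_num`, `n2tonid`, and `p_lr` fields. It uses a plain linear scan so that no ordering of entries within a group has to be assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Julius typedefs; the real widths may differ. */
typedef uint16_t WORD_ID;
typedef uint32_t NNID;
typedef float LOGPROB;

#define NNID_NOT_FOUND ((NNID)0xFFFFFFFFu) /* illustrative marker, not a Julius constant */

/* Reduced, hypothetical view of the 1-gram to 2-gram linkage. */
typedef struct {
  const NNID *n2_bgn;     /* [nid] first 2-gram index of the group for context nid */
  const WORD_ID *n2_num;  /* [nid] number of 2-grams in that group */
  const WORD_ID *n2tonid; /* [n2]  last word ID of each 2-gram entry */
  const LOGPROB *p_lr;    /* [n2]  LR 2-gram log probabilities */
} MiniNgram;

/* Return the 2-gram index (n2) of the pair (context, word),
   or NNID_NOT_FOUND if the group of `context` has no such entry. */
NNID find_bigram(const MiniNgram *ng, WORD_ID context, WORD_ID word)
{
  assert(ng != 0);
  NNID bgn = ng->n2_bgn[context];
  NNID end = bgn + ng->n2_num[context];
  for (NNID n2 = bgn; n2 < end; n2++) {
    if (ng->n2tonid[n2] == word) return n2;
  }
  return NNID_NOT_FOUND;
}
```

A caller would use the returned index to read `p_lr[n2]`, and fall back to back-off weights when the pair is not found. The same beginning-ID-plus-count pattern applies one level up, from 2-gram entries to 3-gram groups.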

Definition at line 113 of file ngram2.h.
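
The v4 fields `n3_bgn_upper` and `n3_bgn_lower` hold the upper 8 bits and lower 16 bits of an index separately. The following sketch shows one way such a pair could be joined into, and split from, a single 24-bit value. It assumes that `NNID_UPPER` is an 8-bit unsigned type, that `NNID_LOWER` is a 16-bit unsigned type, and that the parts combine as `(upper << 16) | lower`. The helpers `nnid_join()` and `nnid_split()` are hypothetical names, not Julius functions, and any reserved "invalid" values the real format may use are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed widths, chosen to match the "upper 8-bit" / "lower 16-bit" wording. */
typedef uint8_t  NNID_UPPER;
typedef uint16_t NNID_LOWER;
typedef uint32_t NNID;

/* Join the two stored parts into one 24-bit index (hypothetical helper). */
NNID nnid_join(NNID_UPPER upper, NNID_LOWER lower)
{
  return ((NNID)upper << 16) | (NNID)lower;
}

/* Split a 24-bit index into its stored parts (hypothetical helper). */
void nnid_split(NNID id, NNID_UPPER *upper, NNID_LOWER *lower)
{
  assert(id < (1u << 24)); /* only 24 bits can be represented */
  *upper = (NNID_UPPER)(id >> 16);
  *lower = (NNID_LOWER)(id & 0xFFFFu);
}
```

Storing 24 bits as 8 + 16 rather than in a 32-bit word saves one byte per entry, which adds up across large N-gram tables.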


Field Documentation

WORD_ID NGRAM_INFO::unk_id

Unknown word ID.

This value is always fixed to 0, since the CMU-Cambridge SLM Toolkit always defines the unknown word "<UNK>" as the first word in the vocabulary.

See also:
set_unknown_id()

Definition at line 126 of file ngram2.h.

Referenced by bi_prob_lr(), bi_prob_rl(), make_ngram_ref(), make_voca_ref(), print_ngram_info(), set_unknown_id(), tri_prob_rl(), and uni_prob().
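
As a hedged illustration of how `unk_id` and `unk_num_log` might be used together, the sketch below computes a 1-gram log probability. It assumes that the probability mass of "<UNK>" is shared equally among the `unk_num` dictionary words missing from the N-gram vocabulary, so that log10(unk_num) is subtracted from the "<UNK>" log probability. The `MiniUnigram` struct and `unigram_logprob()` are hypothetical and are not the actual `uni_prob()` code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Julius typedefs; the real widths may differ. */
typedef uint16_t WORD_ID;
typedef float LOGPROB;

/* Reduced, hypothetical view of the unknown-word fields. */
typedef struct {
  WORD_ID unk_id;       /* ID of "<UNK>", fixed to 0 per the description above */
  int unk_num;          /* dictionary words not in the N-gram vocabulary */
  LOGPROB unk_num_log;  /* log10(unk_num), precomputed by the caller */
  const LOGPROB *p;     /* [nid] 1-gram log10 probabilities */
} MiniUnigram;

/* 1-gram log probability; for "<UNK>" the mass is divided evenly among
   the unknown words (an assumption of this sketch). */
LOGPROB unigram_logprob(const MiniUnigram *ng, WORD_ID nid)
{
  assert(ng != 0);
  if (nid == ng->unk_id) return ng->p[nid] - ng->unk_num_log;
  return ng->p[nid];
}
```

For example, with `unk_num` = 10 the precomputed `unk_num_log` is 1.0, so an "<UNK>" log probability of -2.0 becomes -3.0 for each individual unknown word, while in-vocabulary words are returned unchanged.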


The documentation for this struct was generated from the following file:
ngram2.h

Generated on Tue Sep 22 00:15:48 2009 for Julius by doxygen 1.6.0