Gauche User Reference: 11.14 rfc.822

11.14 `rfc.822` - RFC822 message format

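Module: rfc.822: Defines a set of procedures that parse and construct the “Internet Message Format”, a text format used to exchange e-mail. The most recent specification is RFC2822. The format was originally defined in RFC 822, so it is still commonly called “RFC822 format”, which is where this module gets its name. The rest of this section also refers to the format as “RFC822 format”.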

Parsing message headers

Function: rfc822-read-headers iport &keyword strict? reader

Reads an RFC822 format message from the input port iport until it reaches the end of the message header. The header fields are unfolded and broken into a list of the following format:

((name body) …)

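Each name is a field name and each body is the corresponding field body, both as strings. Field names are converted to lowercase. Field bodies are left unmodified, except that folded lines are unfolded. The order of the fields is preserved.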
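As a minimal sketch of this result format, the following reads a small header from a string port. The message text and the names sample-message and headers are made up for illustration; only rfc822-read-headers comes from this module.

```scheme
(use rfc.822)

;; Hypothetical message: two header fields (the second one folded),
;; an empty line that ends the header, then the body.
(define sample-message
  (string-append "From: alice@example.com\n"
                 "Subject: Hello\n"
                 " world\n"
                 "\n"
                 "Body text\n"))

(define headers
  (call-with-input-string sample-message rfc822-read-headers))

;; Each element is (name body). Names are lowercased, and the folded
;; Subject line is joined into a single body string.
(for-each (lambda (field)
            (format #t "~a => ~s\n" (car field) (cadr field)))
          headers)
```

Reading stops at the empty line, so the body text is left unread in the port.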

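By default, the parser is permissive. If it encounters EOF while parsing the header, it takes that as the end of the message. A line that is neither a continuation (folded) line nor the start of a new header field is ignored. You can change this behavior by passing a true value to the keyword argument strict?; the parser then signals an error for such malformed headers.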
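The difference between the two modes can be sketched as follows. The input text and the names malformed, lenient, and strict are hypothetical; the error raised in strict mode is caught with guard and replaced by a symbol so that both results can be inspected.

```scheme
(use rfc.822)

;; Hypothetical header whose second line is neither a folded
;; continuation nor a "name: body" field.
(define malformed
  "From: alice@example.com\nthis line has no colon\nSubject: Hi\n\n")

;; Permissive (default): the stray line is skipped.
(define lenient
  (call-with-input-string malformed rfc822-read-headers))

;; Strict: the same input signals an error, caught here with guard.
(define strict
  (guard (e (else 'malformed-header))
    (call-with-input-string malformed
      (lambda (in) (rfc822-read-headers in :strict? #t)))))
```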

The keyword argument reader takes a procedure that reads one line from iport. The default is read-line, which should be sufficient in most cases.

Function: rfc822-header->list iport &keyword strict? reader: This is the old name of rfc822-read-headers. It is kept for backward compatibility; new code should use rfc822-read-headers.

Function: rfc822-header-ref header-list field-name &optional default

A utility procedure that gets a specific field from the parsed header list returned by rfc822-read-headers.

Field-name specifies the field name as a lowercase string. If header-list contains a field with the given name, the procedure returns its value as a string. Otherwise it returns default if given, or #f if not.
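A short sketch of the three cases, using a hand-written header list in the format that rfc822-read-headers returns (the list contents and variable names are illustrative only):

```scheme
(use rfc.822)

;; Hypothetical parsed header list.
(define headers '(("from" "alice@example.com") ("subject" "Hello")))

(define subject    (rfc822-header-ref headers "subject"))   ; field present
(define cc         (rfc822-header-ref headers "cc"))        ; absent, no default
(define cc-or-none (rfc822-header-ref headers "cc" "none")) ; absent, default given
```

Here subject is "Hello", cc is #f, and cc-or-none is "none".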

Basic field parsers

Several procedures are provided to parse “structured” header fields of RFC2822 messages. These procedures work on the body of a header field. For example, if the header field is "To: Wandering Schemer <schemer@example.com>", they parse "Wandering Schemer <schemer@example.com>".

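Most of these procedures take an input port. Usually you first parse all the header fields with rfc822-read-headers, obtain the body of a header with rfc822-header-ref, then open an input string port on that body and parse it with these procedures.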
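The usual sequence can be sketched like this. The message text and the variable names are hypothetical, and the tokenizer used in the last step is rfc822-next-token with its default settings.

```scheme
(use rfc.822)

;; Step 1: parse all header fields.
(define headers
  (call-with-input-string
      "To: Wandering Schemer <schemer@example.com>\n\n"
    rfc822-read-headers))

;; Step 2: pick out the body of the To field.
(define to-body (rfc822-header-ref headers "to"))

;; Step 3: open an input string port on the body and pull tokens from it.
(define body-port (open-input-string to-body))
(define first-token  (rfc822-next-token body-port))
(define second-token (rfc822-next-token body-port))
```

With this input, the first two tokens are the strings "Wandering" and "Schemer".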

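This complexity is needed because different types of fields require different tokenization schemes. RFC2822 also allows comments to appear between tokens in most cases, so a naive regular expression won't do the job: RFC2822 comments can be nested, which a regular expression cannot express. The procedures in this layer are therefore designed to be flexible enough to handle various syntaxes. High-level parsers are also provided for the standard header types; see “Specific field parsers” below.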

Function: rfc822-next-token iport &optional tokenizer-specs

A basic tokenizer. First it skips any whitespace and/or comments (CFWS) from iport. Then it reads one token according to tokenizer-specs. If iport reaches EOF before a token is read, EOF is returned.

tokenizer-specs is a list of tokenizer specs. A tokenizer spec is either a char-set, or a pair of a char-set and a procedure.

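After skipping CFWS, the procedure peeks at the character at the head of iport and checks it against each of the tokenizer-specs in turn. If it finds a char-set that contains the character, it retrieves a token as follows. If the tokenizer spec is just a char-set, the sequence of characters that belong to that char-set forms the token. If the tokenizer spec is a pair of a char-set and a procedure, the procedure is called with iport to read the token.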

If the head character does not match any of the char-sets, that character is taken from iport and returned.

The default tokenizer-specs is as follows:

(list (cons #["] rfc822-quoted-string)
      (cons *rfc822-atext-chars* rfc822-dot-atom))

Here rfc822-quoted-string and rfc822-dot-atom are tokenizer procedures described below, and *rfc822-atext-chars* is bound to the char-set of atext as specified in RFC2822. That is, by default rfc822-next-token retrieves a token that is either a quoted-string or a dot-atom as specified in RFC2822.

You can use tokenizer-specs to customize how a header field is parsed. For example, to retrieve a token that is either (1) a word consisting of alphabetic characters, or (2) a quoted string, you can call rfc822-next-token like this:

(rfc822-next-token iport
   `(#[[:alpha:]] (#["] . ,rfc822-quoted-string)))
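To see the effect of such a custom tokenizer-specs on concrete input, here is a small sketch. The input string and the names specs, t1, t2, and t3 are illustrative assumptions, not part of the module.

```scheme
(use rfc.822)

;; Same specs as above: alphabetic words, or quoted strings.
(define specs `(#[[:alpha:]] (#["] . ,rfc822-quoted-string)))

(define iport (open-input-string "hello \"quoted string\" 42"))

(define t1 (rfc822-next-token iport specs)) ; alphabetic word
(define t2 (rfc822-next-token iport specs)) ; quoted string, quotes removed
(define t3 (rfc822-next-token iport specs)) ; no char-set matches: one character
```

Because digits match neither char-set, the third call falls through to the single-character case and returns the character 4 rather than the string "42".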

Function: rfc822-field->tokens field &optional tokenizer-specs: A convenience procedure. It creates an input string port for the field body field, calls rfc822-next-token on it repeatedly until all input is consumed, and returns the list of tokens. Tokenizer-specs is passed to rfc822-next-token.
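For instance, with the default tokenizer-specs, a field body that contains a comment could be tokenized as in this sketch (the field body is a made-up example):

```scheme
(use rfc.822)

;; The parenthesized comment is CFWS, so it does not appear in the result.
(define tokens
  (rfc822-field->tokens
   "Wandering Schemer (on the road) <schemer@example.com>"))

(write tokens)
(newline)
```

Strings in the result are dot-atom tokens; characters such as < and @ are returned as character objects because they match none of the default char-sets.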

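Function: rfc822-skip-cfws iport: Consumes all comments and/or whitespace characters from iport and returns the head character that is neither whitespace nor part of a comment. The returned character remains in iport.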
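A brief sketch of this behavior, using a hypothetical input with leading whitespace and a nested comment:

```scheme
(use rfc.822)

(define iport (open-input-string "  (a comment (nested)) hello"))

;; First character that is neither whitespace nor inside a comment.
(define head (rfc822-skip-cfws iport))

;; The character was only peeked at, so reading returns it again.
(define next (read-char iport))
```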

Constant: *rfc822-atext-chars*: Bound to the char-set of characters that can validly constitute an atom.

Constant: *rfc822-standard-tokenizers*: Bound to the default tokenizer-specs.

Function: rfc822-atom iport
Function: rfc822-dot-atom iport
Function: rfc822-quoted-string iport: Tokenizers for atom, dot-atom, and quoted-string, respectively. rfc822-quoted-string removes the double quotes and the escaping backslashes of the quoted-string.
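The following sketch contrasts the three tokenizers by calling each directly on a string port. The input strings and variable names are chosen only for illustration.

```scheme
(use rfc.822)

;; An atom stops at the dot; a dot-atom continues across it.
(define atom-token
  (rfc822-atom (open-input-string "example.com")))
(define dot-atom-token
  (rfc822-dot-atom (open-input-string "example.com")))

;; The port must be positioned at the opening double quote.
(define quoted-token
  (rfc822-quoted-string (open-input-string "\"Hello, World\" rest")))
```

Here atom-token is "example", dot-atom-token is "example.com", and quoted-token is "Hello, World" without the surrounding quotes.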

Specific field parsers

Function: rfc822-parse-date string

Takes an RFC822-format date string and returns eight values:

year, month, day-of-month, hour, minutes, seconds, timezone, day-of-week.

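timezone is the offset from UT (Greenwich Mean Time) in minutes. day-of-week is the day of the week counted from Sunday, or #f if that information is not available. month is an integer from 1 to 12. If the string cannot be parsed, all the values are #f.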
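As a sketch, the eight values can be collected into a list with call-with-values. The date string and the names parsed and unparsable are illustrative; the timezone and day-of-week values are printed rather than assumed here.

```scheme
(use rfc.822)

;; Collect the eight return values into a list:
;; (year month day-of-month hour minutes seconds timezone day-of-week)
(define parsed
  (call-with-values
      (lambda () (rfc822-parse-date "Sun, 22 Nov 2009 10:30:00 +0900"))
    list))

;; An unparsable string yields #f for every value.
(define unparsable
  (call-with-values
      (lambda () (rfc822-parse-date "not a date"))
    list))

(write parsed)
(newline)
```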

Function: rfc822-date->date string: Parses an RFC822-format date string and returns a SRFI-19 <date> object (see the Date section). If string cannot be parsed, returns #f instead.
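A minimal sketch, assuming the srfi-19 module is loaded so that the <date> accessors are available (the date string and variable names are hypothetical):

```scheme
(use rfc.822)
(use srfi-19)

(define d (rfc822-date->date "Sun, 22 Nov 2009 10:30:00 +0900"))

;; SRFI-19 accessors work on the result.
(define ymd (list (date-year d) (date-month d) (date-day d)))

;; Unparsable input gives #f rather than an error.
(define bad (rfc822-date->date "not a date"))
```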

Message constructors

Function: rfc822-write-headers headers &keyword output continue check

This is a sort of inverse function of rfc822-read-headers. It receives a list of header data, in which each entry has the form (<name> <body>), and writes them out in RFC822 header field format to the output port specified by the output keyword argument. The default output is the current output port.

By default, the procedure assumes that headers contains all the header fields, and adds an empty line at the end of the output to mark the end of the header. You can pass a true value to the continue keyword argument to suppress the empty line, so that more headers can be added later.
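The following sketch writes headers to a string port, reads them back, and also shows the continue argument. The header values and variable names are made up for illustration.

```scheme
(use rfc.822)

;; Write a complete header, including the terminating empty line.
(define header-text
  (call-with-output-string
    (lambda (out)
      (rfc822-write-headers '(("From" "alice@example.com")
                              ("Subject" "Hello"))
                            :output out))))

;; Reading the text back recovers the fields (names come back lowercased).
(define round-trip
  (call-with-input-string header-text rfc822-read-headers))

;; The same header written in two steps: :continue #t suppresses the
;; empty line after the first batch.
(define two-steps
  (call-with-output-string
    (lambda (out)
      (rfc822-write-headers '(("From" "alice@example.com"))
                            :output out :continue #t)
      (rfc822-write-headers '(("Subject" "Hello"))
                            :output out))))
```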

It is only “a sort of” inverse because this function doesn't (and can't) do the exact inverse. Specifically, the caller is responsible for line folding and for making sure that no header line exceeds the “hard limit” defined by RFC2822 (998 octets). This procedure cannot do the line folding on behalf of the caller, because the places where line folding is possible depend on the semantics of each header field.

It is also the caller's responsibility to make sure that header field bodies contain only non-NUL US-ASCII characters. If you want to include characters outside of that range, you should convert them in a way allowed by the protocol, e.g. MIME. The rfc.mime module (see section rfc.mime - MIME message handling) provides a convenience procedure mime-encode-text for this purpose. Again, this procedure cannot do the encoding automatically, since the way a field should be encoded depends on the header field.

What this procedure can do is check for and report such violations. By default, it runs several checks and signals an error if it finds any violation of RFC2822. You can control this checking behavior with the check keyword argument. It can take one of the following values:

:error: Default. Signals an error if a violation is found.
#f, :ignore: Doesn't perform any check. Trust the caller.
procedure: When rfc822-write-headers finds a violation, the procedure is called with three arguments: the header field name, the header field body, and the type of violation explained below. The procedure may correct the problem and return two values, the corrected header field name and body. The returned values are checked again. If the procedure returns the header field name and body unchanged, an error is signalled in the same way as when :error is specified.

The third argument passed to the procedure given to the check argument is one of the following symbols. New symbols may be added in future versions for more checks.

incomplete-string: An incomplete string was passed.
bad-character: The header field contains a character outside of US-ASCII, or NUL.
line-too-long: A line exceeds the 998-octet limit.
stray-crlf: The string contains a CR and/or LF character that is not part of proper line folding.
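As a hedged sketch of a procedure passed to check, the handler below repairs only bad-character violations and returns every other violation unchanged, which leads to an error as described above. The handler name replace-bad-characters and the "?" substitution are assumptions made for this example; real code would more likely encode such text with rfc.mime.

```scheme
(use rfc.822)

;; Hypothetical handler: replace every character outside non-NUL
;; US-ASCII with "?". Other violations are returned untouched, so
;; rfc822-write-headers signals an error for them.
(define (replace-bad-characters name body reason)
  (case reason
    ((bad-character)
     (values name
             (string-map (lambda (c)
                           (if (< 0 (char->integer c) 128) c #\?))
                         body)))
    (else (values name body))))

;; "Caf" followed by e-acute, built with integer->char so that this
;; source text stays ASCII-only.
(define subject (string #\C #\a #\f (integer->char 233)))

(define checked-text
  (call-with-output-string
    (lambda (out)
      (rfc822-write-headers `(("Subject" ,subject))
                            :output out
                            :check replace-bad-characters))))

;; Read the result back to inspect what was actually written.
(define written
  (call-with-input-string checked-text rfc822-read-headers))
```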


This document was generated by Shiro Kawai on November, 22 2009 using texi2html 1.78.
