
Specification of NMZ.* files


NMZ.i

Index file for word searching. (inverted file)

Structure

For each word, pairs of [documentID of a document containing the word][score] are stored sequentially and together form the record for that word. Records are of variable length, so the byte count of each record's data is placed in front of it.

    [data length for word1][documentID][score][documentID][score]...
    [data length for word2][documentID][score][documentID][score]...
    [data length for word3][documentID][score][documentID][score]...
       :
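
The record layout above can be decoded along the following lines. This is only a sketch: the width and byte order of the integers are not given in this document, so the code assumes 4-byte little-endian unsigned values, and the function name parse_record is made up for illustration.

```python
import struct

def parse_record(buf: bytes, offset: int = 0):
    """Decode one word record that starts at `offset` in `buf`.

    ASSUMPTION: every integer is a 4-byte little-endian unsigned value.
    The actual on-disk integer encoding is not specified here.
    """
    # The leading integer is the byte count of the pairs that follow.
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    # Each pair is [documentID][score], 8 bytes under the assumption above.
    pairs = [struct.unpack_from("<II", buf, pos) for pos in range(start, end, 8)]
    return pairs, end  # `end` is the offset of the next record
```

Returning the offset of the next record lets a caller walk the file record by record without consulting NMZ.ii.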

Note

NMZ.ii

Index for 'seek'ing NMZ.i.

Structure


    [position of word 1 in NMZ.i][position of word 2 in NMZ.i]
    [position of word 3 in NMZ.i]...

Note

NMZ.w

List of words.

Structure

A simple line-oriented text file, sorted in ascending order. The line number serves as the wordID, so it can be used to seek into NMZ.ii.

Note

NMZ.wi

Index for 'seek'ing NMZ.w

Structure


    [position of word 1 in NMZ.w][position of word 2 in NMZ.w]
    [position of word 3 in NMZ.w]...
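
A lookup through this kind of position table might look like the sketch below. It assumes 4-byte little-endian positions and 0-based word IDs, neither of which is stated in this document; build_offsets and word_by_id are hypothetical helper names.

```python
import io
import struct

def build_offsets(words_data: bytes) -> bytes:
    """Pack the byte position of every line start (demo helper only)."""
    offsets, pos = [], 0
    for line in words_data.splitlines(keepends=True):
        offsets.append(pos)
        pos += len(line)
    return struct.pack(f"<{len(offsets)}I", *offsets)

def word_by_id(words_file, index_file, word_id: int) -> bytes:
    """Return the raw bytes of the word stored on line `word_id`.

    ASSUMPTIONS: 4-byte little-endian positions, word IDs counted from 0.
    """
    index_file.seek(word_id * 4)                 # fixed-size entries allow direct seeking
    (pos,) = struct.unpack("<I", index_file.read(4))
    words_file.seek(pos)                         # jump straight to the line
    return words_file.readline().rstrip(b"\n")
```

The same pattern would apply to NMZ.ii and NMZ.i: a table of fixed-size positions makes a variable-length file randomly accessible.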

Note

NMZ.r

List of files registered in the index.

Structure

Each line records one document file registered in the index. A line beginning with '#' marks a file that has been deleted from the index, and a line beginning with '##' is a comment. Example:


    /home/foo/bar1.html
    /home/foo/bar2.html
    /home/foo/bar3.html
    ## indexed: Sun, 08 Jan 2006 02:28:00 +0900
    (an empty line)
    # /home/foo/bar1.html
    ## deleted: Sun, 08 Jan 1998 12:34:56 +0900
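
The line rules above can be applied as in this minimal sketch. The function name parse_registry is hypothetical, and skipping empty lines is an assumption drawn from the example.

```python
def parse_registry(lines):
    """Split NMZ.r lines into registered and deleted file paths."""
    registered, deleted = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue                          # empty separator line (assumed to carry no data)
        if line.startswith("##"):
            continue                          # comment; must be tested before the single '#'
        if line.startswith("#"):
            deleted.append(line[1:].strip())  # file deleted from the index
        else:
            registered.append(line)           # file registered in the index
    return registered, deleted
```

The '##' test has to come before the '#' test, since every comment line also begins with '#'.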

NMZ.p

Index for phrase searching.

Description

Each pair of adjacent words is converted to a 16-bit hash value. For a phrase search, all words in the phrase are first searched with 'AND', and then the word order is checked by referring to NMZ.p. Note that word order is recorded only for two-word pairs. So, a search for "foo bar baz" retrieves documents that include both "foo bar" and "bar baz". Because hash values can collide, inappropriate documents may also be retrieved. Phrase searching is therefore inexact, but it usually works well.
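
The pairwise check can be sketched as follows. The hash function is not specified in this document, so pair_hash below is a stand-in built on CRC-32, not Namazu's actual function. phrase_index is a hypothetical in-memory model of NMZ.p that maps a hash value to a set of document IDs.

```python
import zlib

def pair_hash(w1: str, w2: str) -> int:
    """HYPOTHETICAL 16-bit hash of a word pair (not Namazu's real hash)."""
    return zlib.crc32(f"{w1} {w2}".encode("utf-8")) & 0xFFFF

def phrase_filter(words, candidates, phrase_index, hash_fn=pair_hash):
    """Keep the candidates listed under the hash of every adjacent word pair.

    `candidates` is the set of document IDs left after AND-ing all words.
    """
    result = set(candidates)
    for w1, w2 in zip(words, words[1:]):
        # Documents recorded under this pair's hash value; empty if none.
        result &= phrase_index.get(hash_fn(w1, w2), set())
    return result
```

Since only pair hashes are stored, a document can pass every pair check without containing the exact phrase, which is the inaccuracy described above.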

Structure


                    |<------   data byte count (1)    ------->|
[data byte count(1)][documentID including hash value \x0000]...
                    |<------   data byte count (2)    ------->|
[data byte count(2)][documentID including hash value \x0001]...
...
[data byte count(n)][documentID including hash value \xffff]...

Note

NMZ.pi

Index of index for phrase searching.

Structure


    [position of \x0000 in NMZ.p][position of \x0001 in NMZ.p] ...
    [position of \xffff in NMZ.p]

Note

NMZ.t

Record information about time stamps and deleted documents.

Description

File time stamps are recorded as 32-bit values. They are used for sorting search results by date. A value of -1 means that the document is regarded as deleted.

Structure


    [time stamp of documentID1][time stamp of documentID2]...
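
Reading this file and using it for date sorting might look like the sketch below. It assumes signed 32-bit little-endian values and document IDs counted from 0; the byte order and the ID base are not stated in this document, and both function names are hypothetical.

```python
import struct

DELETED = -1  # a time stamp of -1 marks a deleted document

def read_timestamps(data: bytes):
    """Unpack one time stamp per document.

    ASSUMPTION: signed 32-bit little-endian values.
    """
    count = len(data) // 4
    return list(struct.unpack(f"<{count}i", data[:count * 4]))

def live_docs_by_date(timestamps):
    """Return IDs of non-deleted documents, newest first (IDs assumed 0-based)."""
    live = [(ts, doc_id) for doc_id, ts in enumerate(timestamps) if ts != DELETED]
    return [doc_id for ts, doc_id in sorted(live, reverse=True)]
```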

Note

NMZ.field.{subject,from,date,message-id,...}

File to record field information.

Description

Used for field-specified searching and for displaying search results. The file is simple line-oriented text that is searched ("grep'ed") by the regular expression engine. The line number serves as the documentID.

Structure

A simple line-oriented text. (line number = documentID)
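
Because the line number is the documentID, a field search reduces to matching each line and collecting line numbers. The sketch below uses Python's re module in place of Namazu's own regular expression engine. Counting document IDs from 0 and ignoring case are both assumptions, and search_field is a hypothetical name.

```python
import re

def search_field(lines, pattern):
    """Return the document IDs whose field line matches `pattern`.

    ASSUMPTIONS: document IDs start at 0; matching ignores case.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [doc_id for doc_id, line in enumerate(lines) if rx.search(line)]
```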

Note

Since it is line-oriented text, it can be edited with an editor or other tools. If you edit it, you should rebuild the NMZ.field.{subject,from,date,message-id,...}.i files with rfnmz.

NMZ.field.{subject,from,date,message-id,...}.i

Index for 'seek'ing NMZ.field.{subject,from,date,message-id,...}

Structure


    [field position in documentID1][field position in documentID2]...

Note

NMZ.access

Configuration file for user access control.

Structure

Access is controlled by IP address, host name, and/or domain name. deny defines hosts whose access is refused, and allow defines hosts whose access is permitted. When a host is specified by IP address, prefix matching is used; when a host is specified by host name or domain name, suffix matching is used. all indicates all hosts. The configuration is evaluated from the top. Example:


    deny all
    allow localhost
    allow 123.123.123.
    allow .foobar.jp

This configuration allows access from localhost, from hosts with IP address 123.123.123.*, and from hosts with domain name *.foobar.jp. Access from all other hosts is denied.
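
One way to read "evaluated from the top" that reproduces the result described above is to let each matching line override the decision made so far. The sketch below takes that reading as an assumption. Allowing access when no line matches is also an assumption, and all function names are hypothetical.

```python
def parse_rules(text):
    """Parse 'deny X' / 'allow X' lines into (action, pattern) tuples."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("allow", "deny"):
            rules.append((parts[0], parts[1]))
    return rules

def matches(pattern, host, addr):
    """'all' matches everything; IP addresses match by prefix, names by suffix."""
    if pattern == "all":
        return True
    return addr.startswith(pattern) or host.endswith(pattern)

def is_allowed(rules, host, addr, default=True):
    """ASSUMPTION: a later matching rule overrides earlier ones.

    `default` (also an assumption) applies when no rule matches.
    """
    allowed = default
    for action, pattern in rules:
        if matches(pattern, host, addr):
            allowed = (action == "allow")
    return allowed
```

With the example configuration, is_allowed(rules, "www.foobar.jp", "10.0.0.1") would be true, while a host matching only the deny all line would be refused.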

For the Apache web server, access control by host name and/or domain name requires the following directive in "httpd.conf".


    HostnameLookups On

NMZ.status

Stores the data necessary to update the index.

NMZ.result

File to specify the style of search results.

Description

${field name} is replaced by the contents of the field. For example, ${title} is replaced by the contents of NMZ.field.title. ${namazu::counter} and ${namazu::score} have special meanings. They are replaced by the counter of search results and its score respectively.

By default, NMZ.result.normal and NMZ.result.short are provided. Users can freely create NMZ.result.*.
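
The ${field name} substitution described above can be sketched as follows. The function name render_result is hypothetical, and leaving unknown placeholders untouched is an assumption; this document does not say how they are handled.

```python
import re

def render_result(template, fields, counter, score):
    """Replace ${name} placeholders in a result template.

    `fields` maps field names (e.g. "title") to their contents.
    ASSUMPTION: placeholders with unknown names are left as-is.
    """
    values = dict(fields)
    values["namazu::counter"] = str(counter)   # position in the result list
    values["namazu::score"] = str(score)       # score of this result

    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$\{([^}]+)\}", substitute, template)
```

For example, render_result("${namazu::counter}. ${title}", {"title": "Hello"}, 1, 42) fills in the counter and the title field.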

NMZ.head

Header of search results.

NMZ.foot

Footer of search results.

NMZ.body

Query description. Displayed when no keyword is given.

NMZ.tips

Tips for searching. Displayed when no document is retrieved.

NMZ.log

Log file of index updating.

NMZ.lock

Lock file to prevent searching.

NMZ.lock2

Lock file to prevent updating/making the same index simultaneously.

NMZ.slog

Log file for searched keywords.

Note



$Id: nmz.html.en,v 1.18 2006/10/21 06:26:08 opengl2772 Exp $
developers@namazu.org