Converting references for use in LaTeX

Question

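The following awk program should do the job. On each line it looks for ( ... ) elements and checks whether they match the pattern "author, year" or "author1 year1, author2 year2, ...". If so, it generates a citation command and replaces the ( ... ) group with it. Otherwise, the group is left as it is.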

#!/usr/bin/awk -f


# This small function creates an 'authorYYYY'-style string from
# separate author and year fields. We split the "author" field
# additionally at each space in order to strip leading/trailing
# whitespace and further authors.
function contract(author, year)
{
    split(author,auth_fields," ");
    auth=tolower(auth_fields[1]);
    return sprintf("%s%4d",auth,year);
}



# This function checks if two strings correspond to "author name(s)" and
# "year", respectively.
function check_entry(string1, string2)
{
    if (string1 ~ /^ *([[:alpha:].-]+ *)+$/ && string2 ~ /^ *[[:digit:]]{4} *$/) return 1;
    return 0;
}




# This function creates a 'citation' command from a raw element. If the
# raw element does not conform to the reference syntax of 'author, year' or
# 'author1 year1,author2 year2, ...', we should leave it "as is", and return
# a "0" as indicator.
function create_cite(raw_elem)
{
    cite_argument=""

    # Split at ','. The single elements are either name(list) and year,
    # or space-separated name(list)-year statements.
    n_fields=split(raw_elem,sgl_elem,",");

    if (n_fields == 2 && check_entry(sgl_elem[1],sgl_elem[2]))
    {
        cite_argument=contract(sgl_elem[1],sgl_elem[2]);
    }
    else
    {
        for (k=1; k<=n_fields; k++)
        {
            n_subfield=split(sgl_elem[k],subfield," ");

            if (check_entry(subfield[1],subfield[n_subfield]))
            {
                new_elem=contract(subfield[1],subfield[n_subfield]);
                if (cite_argument == "")
                {
                    cite_argument=new_elem;
                }
                else
                {
                    cite_argument=sprintf("%s,%s",cite_argument,new_elem);
                }
            }
            else
            {
                return 0;
            }
        }
    }


    cite=sprintf("\\cite{%s}",cite_argument);
    return cite;
}




# Actual program
# For each line, create a "working copy" so we can replace '(...)' pairs
# already processed with different text (here: 'X ... Y'); otherwise 'sub'
# would always stumble across the same opening parentheses.
# For each '( ... )' found, check if it fits the pattern. If so, we replace
# it with a 'cite' command; otherwise we leave it as it is.

{
    working_copy=$0;

    # Allow for an unmatched ')' at the beginning of the line:
    # if a ')' was found before the first '(', mark it as processed
    i=index(working_copy,"(");
    j=index(working_copy,")");
    if (i>0 && j>0 && j<i) sub(/\)/,"Y",working_copy);

    while (i=index(working_copy,"("))
    {
        sub(/\(/,"X",working_copy); # mark this '(' as "already processed"

        j=index(working_copy,")");
        if (!j)
        {
            continue;
        }
        sub(/\)/,"Y",working_copy); # mark this ')', too


        elem=substr(working_copy,i+1,j-i-1);

        replacement=create_cite(elem);
        if (replacement != "0")
        {
            elem="\\(" elem "\\)"
            sub(elem,replacement);
        }

    }
    print $0;
}

Invoke the program with

~$ awk -f transform_citation.awk input.tex
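
To get a feel for the citation keys the script builds, the logic of contract() can be tried on its own. The following is only a minimal sketch and not part of the original program: the shell function name cite_key is hypothetical, and it assumes a POSIX awk is available on the PATH.

```shell
# Hypothetical helper: build an 'authorYYYY' key the same way contract() does.
cite_key() {
    awk -v author="$1" -v year="$2" 'BEGIN {
        # Splitting at " " uses awk default field splitting, so leading and
        # trailing blanks are dropped and only the first name is kept.
        split(author, auth_fields, " ")
        printf "%s%4d\n", tolower(auth_fields[1]), year
    }'
}

cite_key " Smith and Jones" " 2020"   # prints smith2020
```

Only the first author name ends up in the key, so the entries in the bibliography file need to follow the same naming scheme.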

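Note: The program expects the input to be "well-formed", i.e. all parentheses on a line must come in matching pairs (one closing parenthesis at the beginning of a line is allowed, and unmatched opening parentheses are ignored).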
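
Whether an input file meets this assumption can be checked beforehand. The following is a rough sketch of such a pre-check, not something the original program provides: the function name unbalanced_lines is hypothetical, and it simply reports every line whose counts of opening and closing parentheses differ (so a line with a single leading closing parenthesis is reported as well, although the program tolerates it).

```shell
# Hypothetical pre-check: print the numbers of all input lines on which the
# number of '(' differs from the number of ')'.
unbalanced_lines() {
    awk '{
        line = $0
        n_open  = gsub(/\(/, "", line)   # gsub returns the number of replacements
        n_close = gsub(/\)/, "", line)
        if (n_open != n_close) print NR
    }' "$@"
}

printf '%s\n' 'see (Smith, 2020)' 'an opening ( without partner' | unbalanced_lines   # prints 2
```

Called with file names as arguments, e.g. unbalanced_lines input.tex, it reads those files instead of standard input.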

Also note that some of the syntax above requires GNU awk. To port the program to another implementation, replace

if (string1 ~ /^ *([[:alpha:].-]+ *)+$/ && string2 ~ /^ *[[:digit:]]{4} *$/) return 1;

with

if (string1 ~ /^ *([a-zA-Z.-]+ *)+$/ && string2 ~ /^ *[0123456789][0123456789][0123456789][0123456789] *$/) return 1;

and make sure that the collation locale is set to C.
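
As an illustration, the portable pattern can be tested in isolation with the locale forced to C for that single awk call. This is a sketch under assumptions: the wrapper name is_author_year is hypothetical, and setting LC_ALL=C is just one way to fix the collation locale (it overrides LC_COLLATE).

```shell
# Hypothetical wrapper around the portable pattern: prints 1 if the two
# arguments look like "author name(s)" and a four-digit year, otherwise 0.
is_author_year() {
    LC_ALL=C awk -v s1="$1" -v s2="$2" 'BEGIN {
        ok = (s1 ~ /^ *([a-zA-Z.-]+ *)+$/ && s2 ~ /^ *[0123456789][0123456789][0123456789][0123456789] *$/)
        print ok
    }'
}

is_author_year "Smith et al." " 2020"   # prints 1
is_author_year "see Fig. 2" "b"         # prints 0
```

The whole script can be run the same way, e.g. LC_ALL=C awk -f transform_citation.awk input.tex.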

Answer 1