Run a shell script until only one line is left in the file

Question 1

If the file is a well-formed XML file and you want to extract the <text> nodes into separate files, you can do the following with XMLStarlet:

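#!/bin/sh

# Pathname of the XML file to split, given as the first argument
infile="$1"

# List every id attribute, one per line, then extract each <text> node by id
xmlstarlet sel -t -v '//text/@id' -nl "$infile" |
while IFS= read -r id; do
    xmlstarlet sel -t --var id="'$id'" -v '//text[@id = $id]' "$infile" >"$id.txt"
done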

Given the pathname of the following file on the command line:

<?xml version="1.0"?>
<root>
  <text id="cade2296-1">
The first text, called "cade2296-1".
</text>
  <text id="cafr3062-1">
The second text, called "cafr3062-1".
</text>
</root>

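...this creates two files in the current directory, cade2296-1.txt and cafr3062-1.txt, containing the contents of the two <text> tags from the original file.

The filenames are taken from the id attribute of the <text> tags. The id values are first extracted from the XML and are then used in the loop to extract the value of the matching tag.

If you change -v to -c in the XMLStarlet call inside the loop, you get a copy of the whole <text> XML tag rather than just the data inside the tag.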
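To illustrate the -c variant described above, here is a minimal sketch of the same loop wrapped in a function. The function name extract_text_nodes and the .xml output suffix are assumptions made for this example and are not part of the original answer; it also uses -n, the documented short form for printing a newline.

```sh
#!/bin/sh

# Hypothetical helper: write each <text> node, tags included, to "<id>.xml".
# Assumes every id is unique and safe to use as a filename.
extract_text_nodes() {
    infile="$1"

    # First pass: print every id attribute on its own line
    xmlstarlet sel -t -v '//text/@id' -n "$infile" |
    while IFS= read -r id; do
        # Second pass: -c copies the matching node itself, not just its text
        xmlstarlet sel -t --var id="'$id'" -c '//text[@id = $id]' -n "$infile" >"$id.xml"
    done
}

# Usage (with the sample file above saved as file.xml):
#   extract_text_nodes file.xml
```

Each output file then still begins with its <text id="..."> opening tag, which is useful if the pieces need to be processed as XML later.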

Question 2

Yes, thanks to @George Vasiliou I was able to get it working. The script now looks like this:

#!/bin/sh
echo "file to split?"
IFS= read -r file

# Counter used to name the resulting files
f=0

while :
do
    # Count how many lines containing "<text" are left in the file to split
    count=$(grep -c "<text" "$file")
    if [ "$count" -gt 1 ]
    then
        # Save the lines containing "<text", with their line numbers, to titles.txt
        grep -n "<text" "$file" > titles.txt

        # Get the line number of the second match
        lines=$(sed -n '2s/^\([0-9]*\).*/\1/p' titles.txt)

        # Each iteration names its output file with the next number
        f=$((f+1))

        # The first block ends on the line before the second match
        substrac=$((lines-1))

        # Copy that many lines from the top of the file being split into a new file
        head -n "$substrac" "$file" > "$f"

        # Delete those lines from the file being split, then start over
        sed -i "1,${substrac}d" "$file"
        echo "file \"$f\" generated"
    else
        # At most one block is left; it stays in the original file
        echo "process finished!"
        exit 0
    fi
done

To explain: I have a huge text file in the following format:

  <text id="cade2296-1">
  many
  undetermined
  lines
  ...
 </text>

 The same schema repeated an undetermined number of times

  <text id="cafr3062-1">
  many
  undetermined
  lines
  ...
 </text>

What I need is each of these blocks in a separate file.
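For comparison, the same split can be sketched as a single pass with awk instead of repeatedly trimming the input with head and sed. This is a different technique from the script above, not a rewrite of it: the function name split_text_blocks is made up for this example, output files are named after the id attribute rather than numbered, and it assumes every block starts on a line containing an opening <text ...> tag with a unique id that is safe to use as a filename.

```sh
#!/bin/sh

# Hypothetical helper: write each <text ...> block of "$1" to "<id>.txt".
# The input file is only read, never modified.
split_text_blocks() {
    awk '
        /<text[ >]/ {
            # A new block starts: close the previous output file first
            if (out != "") close(out)
            n++
            # Name the file after the id attribute, or after a counter if there is none
            if (match($0, /id="[^"]*"/))
                out = substr($0, RSTART + 4, RLENGTH - 5) ".txt"
            else
                out = n ".txt"
        }
        # Copy the current line to the current block; lines before the first block are skipped
        out != "" { print > out }
    ' "$1"
}

# Usage:
#   split_text_blocks hugefile.txt
```

Because the input is read once from top to bottom, the running time grows with the file size rather than with the file size times the number of blocks, which matters for a huge file. Unlike the script above, the last block also gets its own file.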
