次のfile1.txtがあります。私がしたいのは、value1をvalue7に変換して1行に出力することです。値は「Start」と「End」という単語の間で検索されます。ラベル/値が欠落している場合、出力に「NA」と表示されます。
下記のoutput.txtをご覧ください。
簡単に言えば、StartとEndの間の値をコピーして1行に出力したいと思います。値ラベルがない場合、値にNAが表示されます。そして、file1.txtが見つかるまで、別のレコードの値を(最初から最後まで)検索し続けます。
ファイル1.txt
Start
label1 label2 label3 label4
value1 value2 value3 value4
label5
value5
label6 label7
value6 value7
End
Start
label1 label2 label4
valueA valueB valueD
label5
valueE
label6
valueF
End
Start
.
.
.
End
出力.txt
label1 label2 label3 label4 label5 label6 label7
value1 value2 value3 value4 value5 value6 value7
valueA valueB NA valueD valueE valueF NA
答え1
このPythonスクリプトは必要なことをする必要があります。
#!/usr/bin/env python
# -*- encoding: ascii -*-
"""parse.py
Parses a custom-format data-file.
Processes the file first and then prints the results.
"""
import sys
# Read the data from the file
file = open(sys.argv[1], 'r')
# Initialize a dictionary to collect the values for each label
labels = {}
# Initialize a stack to keep track of block state
stack = []
# Initialize a counter to count the number of blocks
block = 0
# Process the file
line = file.readline()
while line:
# Remove white-space
line = line.strip()
# The stack should be empty when we start a new block
if line.lower() == "start":
if stack:
raise Exception("Invalid File Format: Bad Start")
else:
stack.append(line)
# Otherwise the bottom of the stack should be a "Start"
# When we reach the end of a block we empty the stack
# end increment the block counter
elif line.lower() == "end":
if stack[0].lower() != "start":
raise Exception("Invalid File Format: Bad End")
else:
block += 1
stack = []
# Other lines should come in consecutive label/value pairs
# i.e. a value row should follow a label row
elif line:
# If there are an odd number of data rows in the stack then
# the current row should be a value row - check that it matches
# the corresponding label row
if len(stack[1:])%2==1:
_labels = stack[-1].split()
_values = line.split()
# Verify that the label row and value row have the same number
# of columns
if len(_labels) == len(_values):
stack.append(line)
for label, value in zip(_labels, _values):
# Add new labels to the labels dictionary
if label not in labels:
labels[label] = {
"cols": len(label)
}
# Add the value for the current block
labels[label][block] = value
# Keep track of the longest value for each label
# so we can format the output later
if len(value) > labels[label]["cols"]:
labels[label]["cols"] = len(value)
else:
raise Exception("Invalid File Format: Label/Value Mismatch")
# If there are an even number of data rows in the stack then
# the current row should be a label row - append it to the stack
else:
stack.append(line)
# Read the next line
line = file.readline()
# Construct the header row
header = ""
for label in labels:
cols = labels[label]["cols"]
header += "{0: <{width}}".format(label, width=cols+1)
# Construct the data rows
rows = []
for i in range(0, block):
row = ""
for label in labels:
cols = labels[label]["cols"]
row += "{0: <{width}}".format(labels[label].get(i, "NA"), width=cols+1)
rows.append(row)
# Print the results
print(header)
for row in rows:
print(row)
次のように実行できます。
python parse.py file1.txt
サンプルデータに対して次の出力を生成します。
label1 label2 label3 label4 label5 label6 label7
value1 value2 value3 value4 value5 value6 value7
valueA valueB NA valueD valueE valueF NA