Sort a file while keeping indented lines grouped with their parent lines (multiple levels)

Answer 1

Extended Python solution:

Sample infile contents (4 levels):

first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
        plane
second
    lemon

sort_hierarchy.py script:

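#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import re

INDENT = 4  # spaces per nesting level, as in the sample input

with open(sys.argv[1], 'rt') as f:
    pat = re.compile(r'^\s+')
    paths = []

    for line in f:
        item = line.strip()
        if not item:
            continue  # ignore blank lines

        # Depth = width of the leading whitespace / indent width.
        m = pat.match(line)
        depth = m.end() // INDENT if m else 0

        # The first `depth` components of the previous path are this
        # line's ancestors. Items must not contain '.' themselves.
        parents = paths[-1].split('.')[:depth] if paths else []
        paths.append('.'.join(parents + [item]))

    # Comparing component lists keeps every child right after its parent.
    paths.sort(key=lambda p: p.split('.'))
    # Turn each ancestor component back into one level of indentation.
    sub_pat = re.compile(r'[^.]+\.')
    for i in paths:
        print(sub_pat.sub(' ' * INDENT, i))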

Usage:

python sort_hierarchy.py path/to/infile

Output:

first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
        plane
    orange
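
The same idea (sort each line by the full path of its ancestors) can also be sketched without dotted strings. The version below is only an illustration, not part of the script above: the function name sort_indented and the indentation stack are my own assumptions. It keys every line on a tuple of ancestor names, so it does not depend on a fixed indent width and tolerates names containing dots.

```python
def sort_indented(lines):
    """Sort siblings alphabetically while keeping children under their parent."""
    stack = []   # (indent_width, name) for the current line and its ancestors
    keyed = []   # (ancestor_path_tuple, original_line)
    for line in lines:
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        # Drop stack entries at the same or deeper indentation:
        # they are siblings or descendants of siblings, not ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        keyed.append((tuple(n for _, n in stack), line.rstrip("\n")))
    # Tuples compare component by component, so a parent sorts before its children.
    keyed.sort(key=lambda pair: pair[0])
    return [text for _, text in keyed]
```

To use it on a file, pass the open file object: `print("\n".join(sort_indented(open("infile"))))`.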

Answer 2

Awk solution:

Sample infile contents (4 levels):

first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
        plane
second
    lemon

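awk '{
         # Count (and strip) the spaces: the count is the indent width.
         offset = gsub(/ /, "");
         if (offset == 0) { items[NR] = $1 }
         else if (offset > prev_ofst) { items[NR] = items[NR-1] "." $1 }
         else {
             # Same level or shallower: drop the previous leaf plus one
             # path component per level climbed (assumes 4 spaces per level).
             prev_item = items[NR-1];
             n = int((prev_ofst - offset) / 4) + 1;
             gsub("(\\.[^.]+){" n "}$", "", prev_item);
             items[NR] = prev_item "." $1
         }
         prev_ofst = offset;
     }
     END{
         asort(items);   # asort is a gawk extension
         for (i = 1; i <= NR; i++) {
             gsub(/[^.]+\./, "    ", items[i]);
             print items[i]
         }
     }' infile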

Output:

first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
        plane
    orange

Answer 3

Works at any depth:

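#!/usr/bin/python3
# Expects tab-indented input: one leading tab per nesting level.
with open('test_file') as f:
    lines = f.read().splitlines()

def yield_sorted_lines(lines):
    sorter = []  # path from the top-level ancestor down to the current line
    for l in lines:
        fields = l.split('\t')
        n = len(fields)  # depth + 1: each leading tab yields an empty field
        # Keep the ancestors above this depth, then append this line's text.
        sorter = sorter[:n-1] + fields[n-1:]
        yield sorter, l


# Sort by the full ancestor path so children stay grouped under their parent.
prefixed_lines = yield_sorted_lines(lines)
sorted_lines = sorted(prefixed_lines, key=lambda x: x[0])
for x, y in sorted_lines:
    print(y)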

Or as a pipeline:

awk -F'\\t' '{a[NF]=$NF; for (i=1; i<=NF; ++i) printf "%s%s", a[i], i==NF? "\n": "\t"}' file|
sort | awk -F'\\t' -vOFS='\t' '{for (i=1; i<NF; ++i) $i=""; print}'
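
To make the three pipeline stages easier to follow, here is a rough Python sketch of what each one appears to do: decorate every line with its ancestors' names, sort, then blank the ancestors out again. The function names decorate and undecorate are mine, and Python's sorted() compares by code point, whereas sort(1) follows the locale, so treat this as an illustration rather than an exact equivalent.

```python
def decorate(lines):
    """Replace each line's leading tabs with the names of its ancestors."""
    ancestors = []  # ancestors[i] is the most recent name seen at depth i
    out = []
    for line in lines:
        fields = line.split("\t")
        depth = len(fields) - 1          # number of leading tabs
        ancestors[depth:] = [fields[-1]] # forget deeper names, record this one
        out.append("\t".join(ancestors))
    return out


def undecorate(lines):
    """Blank every field except the last, restoring the leading tabs."""
    out = []
    for line in lines:
        fields = line.split("\t")
        out.append("\t" * (len(fields) - 1) + fields[-1])
    return out


sample = ["b", "\ty", "\tx", "a", "\tz"]
for line in undecorate(sorted(decorate(sample))):
    print(line)
```

For this sample the decorated form is b, b<TAB>y, b<TAB>x, a, a<TAB>z, which is why a plain sort keeps each child directly after its parent.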

Answer 4

sed '/^ /{H;$!d};x;1d;s/\n/\x7/g' | sort | tr \\a \\n

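/continuation/{H;$!d};x;1d (or /firstline/! and the like) is a slurp idiom: control falls through to the rest of the script only when the buffer holds a complete group of lines.

If the input can end with a single-line group, add ${p;x;/\n/d} to do the double pump that case requires.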
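
For readers less used to sed's hold space, the grouping idea can be sketched in Python. This is my own illustration, not a translation of the sed script: the name sort_groups is hypothetical, and like the one-liner it only groups each unindented line with the indented lines that follow it, without reordering lines inside a group.

```python
def sort_groups(lines):
    """Sort top-level lines, carrying their indented continuation lines along."""
    groups = []
    for line in lines:
        if line.startswith(" ") and groups:
            groups[-1].append(line)  # continuation: attach to the current group
        else:
            groups.append([line])    # an unindented line starts a new group
    # Lists compare element by element, so groups sort by their first line.
    groups.sort()
    return [line for group in groups for line in group]


sample = ["pear", "  stem", "apple", "  seed", "  core", "fig"]
for line in sort_groups(sample):
    print(line)
```

Because each group is a list that is only flattened after sorting, a single-line group at the end of the input needs no special handling here.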
