Extract column names from table DDL

Question 1

Using any awk in any shell on every Unix system:

$ cat tst.awk
$1 == "describe" {
    out = $2
    next
}
/^[+]/ {
    mod = (++cnt % 3)
    if ( mod == 0 ) {
        print out
    }
    next
}
mod == 2 {
    out = out "," $2
}

$ awk -f tst.awk file
test_table,Name,Age

Answer

Question 2

Here is an example of one way to do this in Perl:

$ cat extract-column-names.pl
#!/usr/bin/perl -l

while(<>) {
  # Is the current line a "describe" line or are we at the End Of File?
  if (m/describe\s+(.*)/i || eof) {
    # Do we already have a table name and column names?
    if ($table && @columns) {
      print join(",", $table, @columns);
      # clear current @columns array
      @columns=();
    };
    # extract table name
    $table = $1;
    next;
  };

  # skip header lines, ruler lines, and empty lines
  next if (m/col_name|-\+-|^\s*$/);

  # extract column name with regex capture group
  if (m/^\|\s+(\S+)\s+\|/) { push @columns, $1 };
}

Example input containing multiple table descriptions:

$ cat table.txt
describe test_table
+-----------+------------+
| col_name  | data_type  |
+-----------+------------+
| Name      | string     |
| Age       | string     |
+-----------+------------+

describe test_table2
+------------+------------+
| col_name   | data_type  |
+------------+------------+
| FirstName  | string     |
| MiddleName | string     |
| LastName   | string     |
+------------+------------+

Example run:

$ ./extract-column-names.pl table.txt
test_table,Name,Age
test_table2,FirstName,MiddleName,LastName

Note that the script can also handle standard input (e.g. cat table.txt | ./extract-column-names.pl) and multiple filename arguments (e.g. ./extract-column-names.pl table1.txt table2.txt ... tableN.txt).

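It also wouldn't be difficult to add the ability to extract the data_type for each column. This could be stored in another array (e.g. @types), or the script could be modified to use a hash (with col_name as the key and data_type as the value). If you use a hash, though, it is important to remember that hashes are inherently unordered, so you would still need the @columns array to remember the order in which the columns were seen.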
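As a rough sketch of the hash-based variant described above: the sub name extract_columns_with_types, the name:type output format, and the use of \S+ for the table name are assumptions made for this illustration and are not part of the original script. The @columns array keeps the column order and the %types hash maps each col_name to its data_type.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes the lines of
# one or more "describe" outputs and returns one CSV line per table, with
# each column written as name:type (an assumed output format).
sub extract_columns_with_types {
    my @lines = @_;
    my (@out, $table, @columns, %types);

    # Emit the table collected so far (if any) and reset the per-table state.
    my $flush = sub {
        if (defined $table && @columns) {
            push @out, join(",", $table, map { "$_:$types{$_}" } @columns);
        }
        @columns = ();
        %types   = ();
    };

    for my $line (@lines) {
        # a "describe" line starts a new table
        if ($line =~ m/describe\s+(\S+)/i) {
            $flush->();
            $table = $1;
            next;
        }

        # skip header lines, ruler lines, and empty lines
        next if $line =~ m/col_name|-\+-|^\s*$/;

        # capture both the column name and its data type
        if ($line =~ m/^\|\s+(\S+)\s+\|\s+(\S+)\s+\|/) {
            push @columns, $1;   # the array remembers the column order
            $types{$1} = $2;     # the hash maps col_name to data_type
        }
    }
    $flush->();   # emit the last table
    return @out;
}

# Sample input taken from the first table in table.txt
my @sample = split /\n/, <<'EOF';
describe test_table
+-----------+------------+
| col_name  | data_type  |
+-----------+------------+
| Name      | string     |
| Age       | string     |
+-----------+------------+
EOF

print "$_\n" for extract_columns_with_types(@sample);
```

With the sample above this should print test_table,Name:string,Age:string. Iterating over @columns rather than keys %types is what preserves the order in which the columns appeared.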

One-liner version:

$ perl -lne 'if (m/describe\s+(.*)/i || eof) {if ($table && @columns) {print join(",", $table, @columns);@columns=()}$table = $1;next};next if (m/col_name|-\+-|^\s*$/);if (m/^\|\s+(\S+)\s+\|/) {push @columns, $1};' table.txt 
test_table,Name,Age
test_table2,FirstName,MiddleName,LastName

Answer