Saving multiple URL targets to text files

Answer 1

Use the -i option:

wget -i ./url.txt

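From man wget:

-i file

--input-file=file

Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.) If this function is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those on the command line will be the first ones to be retrieved. If --force-html is not specified, then file should consist of a series of URLs, one per line.

However, if you specify --force-html, the document will be regarded as html. In that case you may have problems with relative links, which you can solve either by adding <base href="url"> to the documents or by specifying --base=url on the command line.

If the file is an external one, the document will be automatically treated as html if the Content-Type matches text/html. Furthermore, the file's location will be implicitly used as base href if none was specified.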
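
Because - makes wget read the list from standard input, the list can be filtered before wget sees it. The following is only a sketch under assumptions: clean_urls is a hypothetical helper (not part of wget), the example.com URLs are placeholders, and the real download command is left as a comment so the block runs without network access.

```shell
#!/bin/sh
# Hypothetical helper: print only the URL lines of a list file,
# skipping blank lines and lines that start with '#'.
clean_urls() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Build a small sample list (placeholder URLs).
list=$(mktemp)
printf '%s\n' '# mirrors' 'https://example.com/a.html' '' \
    'https://example.com/b.html' > "$list"

# Dry run: show what would be handed to wget.
clean_urls "$list"

# To download for real, pipe the cleaned list to wget's standard input:
#   clean_urls "$list" | wget -i -

rm -f "$list"
```

The dry run prints only the two URL lines; the comment line and the empty line are dropped.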

Answer 2

wget has an option that does exactly this:

wget --input-file url.txt

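This reads one URL per line from url.txt and downloads them into the current directory sequentially.

More generally, you can use xargs for this sort of thing, combined with wget or curl:

xargs wget < url.txt
xargs -n 1 curl -O < url.txt

xargs reads items from its input (split on whitespace, so one URL per line works) and passes them as arguments to the command you give it. Here that command is wget or curl -O, both of which download a URL and save it into the current directory. The -n 1 on the curl line runs curl once per URL, because a single -O applies to only one URL. < url.txt supplies the contents of url.txt as input to the xargs command.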
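
To see how xargs groups its arguments without downloading anything, one option is to put echo in front of the command. This is only an illustrative dry run: the example.com URLs are placeholders, and echo stands in for the real invocation.

```shell
#!/bin/sh
# Sample list; the example.com URLs are placeholders.
list=$(mktemp)
printf '%s\n' 'https://example.com/a.html' 'https://example.com/b.html' > "$list"

# Default: xargs packs as many arguments as fit into one command line.
xargs echo wget < "$list"
# prints: wget https://example.com/a.html https://example.com/b.html

# With -n 1: one command per URL, so every URL gets its own -O.
xargs -n 1 echo curl -O < "$list"
# prints:
#   curl -O https://example.com/a.html
#   curl -O https://example.com/b.html

rm -f "$list"
```

Removing echo from either line turns the dry run into the real download.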

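The problem with your Python code is that what you get from urllib is bytes data, which is then printed directly to the file; that stringifies the bytes as b'abc\00\0a...' (which is how bytes literals are written).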

Answer 3

And w3m:

echo 'http://unix.stackexchange.com/questions/148670/save-html-to-text-file' |
tee - - - | 
xargs -n1 w3m -dump | 
sed '/Save html/!d;N;N;N;N;N;N;N'

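It seems to me that xargs shouldn't even be necessary here. Surely there is a setting for passing multiple URLs at once, but I can't figure it out at the moment. In any case, xargs works: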

Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt
Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt
Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt
Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt

Answer 4

There are two other ways:

wget $(<file)

and

while read -r link; do wget "$link"; done < file
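
The loop form can be made a little more defensive. The sketch below is an assumption-laden variant rather than the answer's own code: fetch_all is a hypothetical helper, the example.com URLs are placeholders, and echo is used as a stand-in so nothing is downloaded.

```shell
#!/bin/sh
# fetch_all: hypothetical helper that runs a command once per non-empty line.
# Usage: fetch_all FILE COMMAND [ARGS...]
fetch_all() {
    file=$1
    shift
    # IFS= and -r keep each line intact; the || test still processes a
    # last line that lacks a trailing newline.
    while IFS= read -r link || [ -n "$link" ]; do
        [ -n "$link" ] || continue      # skip empty lines
        "$@" "$link"
    done < "$file"
}

# Sample list with an empty line and no trailing newline (placeholder URLs).
list=$(mktemp)
printf 'https://example.com/a.html\n\nhttps://example.com/b.html' > "$list"

# Dry run with echo; use "fetch_all url.txt wget" to download for real.
fetch_all "$list" echo wget

rm -f "$list"
```

Unlike wget $(<file), which relies on the shell splitting the file contents into words, the loop handles one line at a time and runs in any POSIX shell.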
