How do I download this page correctly?

Question

Answer 1

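wget will not work on its own here because of the way the page uses JavaScript to handle its URLs. You need to parse the page with xmllint and then rewrite the URLs into a form that wget can handle.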

First, extract the JavaScript-handled URLs, clean them up, and write them to urls.txt.

wget -O - 'https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=10685&itemId=1119299160&resourceId=42647' | \
xmllint --html --xpath "//li[@class='resourceColumn']//a/@href" - 2>/dev/null | \
sed -e 's# href.*Books#https://bcs.wiley.com/he-bcs/Books#' -e 's/amp;//g' -e 's/&newwindow.*$//' > urls.txt
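
To see what the sed stage does in isolation, here is a minimal sketch that wraps the same three expressions in a function and feeds it one attribute line. The function name clean_href and the sample href (including the javascript:newWindow wrapper) are assumptions made for illustration; the real page's markup may differ.

```shell
#!/bin/sh
# Sketch: the three sed expressions from the pipeline above, wrapped in a
# hypothetical helper so they can be tried without fetching the page.
clean_href() {
    # 1. replace everything from " href" up to "Books" with the absolute base URL
    # 2. turn the HTML entity "&amp;" back into a plain "&"
    # 3. drop the trailing "&newwindow..." parameter and anything after it
    sed -e 's# href.*Books#https://bcs.wiley.com/he-bcs/Books#' \
        -e 's/amp;//g' \
        -e 's/&newwindow.*$//'
}

# Assumed sample of one attribute line as xmllint might print it.
clean_href <<'EOF'
 href="javascript:newWindow('/he-bcs/Books?action=mininav&amp;bcsId=10685&amp;itemId=1119299160&amp;resourceId=42647&amp;newwindow=true')"
EOF
# prints: https://bcs.wiley.com/he-bcs/Books?action=mininav&bcsId=10685&itemId=1119299160&resourceId=42647
```

Note that the first expression is greedy, so it relies on "Books" appearing only once per line.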

Next, open each URL listed in urls.txt and download the PDF files found there.

wget -O - -i urls.txt | grep -o 'https.*pdf' | wget -i -
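
The pattern 'https.*pdf' is greedy, so if a fetched page happened to put two PDF links on one line, grep would return a single match spanning both. The following is a possible tightening, not part of the original answer: the helper name extract_pdf_links and the example.com links are hypothetical.

```shell
#!/bin/sh
# Sketch: pull PDF links out of HTML read from stdin, one link per line.
# The bracket expression stops each match at a quote, angle bracket, or
# space, so two links on the same line are reported separately.
extract_pdf_links() {
    grep -o "https[^\"'<> ]*\.pdf"
}

# Hypothetical input with two links on a single line.
extract_pdf_links <<'EOF'
<a href="https://example.com/ch01.pdf">1</a> <a href="https://example.com/ch02.pdf">2</a>
EOF
# prints:
# https://example.com/ch01.pdf
# https://example.com/ch02.pdf
```

If the pages only ever carry one link per line, the original pattern behaves the same way.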

Alternatively, using curl:

curl 'https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=10685&itemId=1119299160&resourceId=42647' | \
xmllint --html --xpath "//li[@class='resourceColumn']//a/@href" - 2>/dev/null | \
sed -e 's# href.*Books#https://bcs.wiley.com/he-bcs/Books#' -e 's/amp;//g' -e 's/&newwindow.*$//' > urls.txt

curl -s $(cat urls.txt) | grep -o 'https.*pdf' | xargs -L 1 curl -O
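
The command substitution $(cat urls.txt) relies on shell word splitting to separate the URLs. As a rough alternative sketch, the file can be read line by line instead. The function name list_pdfs and the FETCH_CMD override are assumptions added here so the loop can be tried offline; they are not part of the original answer.

```shell
#!/bin/sh
# Sketch: read page URLs from a file (one per line) and print the PDF
# links found on each page.
# FETCH_CMD is a hypothetical hook; when unset, the pages are fetched
# with "curl -s" as in the answer above.
list_pdfs() {
    while IFS= read -r page_url; do
        [ -n "$page_url" ] || continue    # skip blank lines
        # a page without any PDF link is not treated as an error
        ${FETCH_CMD:-curl -s} "$page_url" | grep -o 'https.*pdf' || :
    done < "$1"
}

# Possible usage (needs network access, so it is left commented out):
#   list_pdfs urls.txt | xargs -L 1 curl -O
```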
