How to show all URLs in a redirect chain?

Answer 1

How about simply using wget?

$ wget http://picasaweb.google.com 2>&1 | grep Location:
Location: /home [following]
Location: https://www.google.com/accounts/ServiceLogin?hl=en_US&continue=https%3A%2F%2Fpicasaweb.google.com%2Flh%2Flogin%3Fcontinue%3Dhttps%253A%252F%252Fpicasaweb.google.com%252Fhome&service=lh2&ltmpl=gp&passive=true [following]
Location: https://accounts.google.com/ServiceLogin?hl=en_US&continue=https%3A%2F%2Fpicasaweb.google.com%2Flh%2Flogin%3Fcontinue%3Dhttps%3A%2F%2Fpicasaweb.google.com%2Fhome&service=lh2&ltmpl=gp&passive=true [following]

curl -v also shows some of this information, but its output is not as convenient to read as wget's.

$ curl -v -L http://picasaweb.google.com 2>&1 | grep -E "^> (Host:|GET)"
> GET / HTTP/1.1
> Host: picasaweb.google.com
> GET /home HTTP/1.1
> Host: picasaweb.google.com
> GET /accounts/ServiceLogin?hl=en_US&continue=https%3A%2F%2Fpicasaweb.google.com%2Flh%2Flogin%3Fcontinue%3Dhttps%253A%252F%252Fpicasaweb.google.com%252Fhome&service=lh2&ltmpl=gp&passive=true HTTP/1.1
> Host: www.google.com
> GET /ServiceLogin?hl=en_US&continue=https%3A%2F%2Fpicasaweb.google.com%2Flh%2Flogin%3Fcontinue%3Dhttps%253A%252F%252Fpicasaweb.google.com%252Fhome&service=lh2&ltmpl=gp&passive=true HTTP/1.1
> Host: accounts.google.com
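
The wget output above still carries the "Location: " prefix and the trailing "[following]" marker. As a minimal sketch, assuming the log lines have exactly the shape shown above, they can be reduced to bare values with sed. The helper name strip_following is made up for illustration, and relative values such as /home are printed as-is, not resolved against the host.

```bash
#!/bin/bash
# Hypothetical helper: read wget log lines on stdin and print only the
# Location values, without the prefix and the " [following]" suffix.
strip_following() {
  sed -En 's/^Location: (.*) \[following\]$/\1/p'
}

# Canned line copied from the output above; no network access is needed.
printf 'Location: /home [following]\n' | strip_following
```

With a live request it would sit at the end of the pipeline: wget http://picasaweb.google.com 2>&1 | strip_following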

Answer 2

A correct curl-based solution

url=https://rb.gy/x7cg8r
while redirect_url=$(
  curl -I -s -S -f -w "%{redirect_url}\n" -o /dev/null "$url"
); do
  echo "$url"
  url=$redirect_url
  [[ -z "$url" ]] && break
done

Result:

https://rb.gy/x7cg8r
https://t.co/BAvVoPyqNr
https://unix.stackexchange.com/
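
The loop above keeps going for as long as the server keeps redirecting. The following is a sketch of the same idea wrapped in a function with an upper bound on the number of requests, so that a redirect loop cannot run forever. The names redirect_chain and max_hops, and the default of 20, are assumptions for illustration and not part of the original solution.

```bash
#!/bin/bash
# Sketch only: redirect_chain and max_hops are illustrative names.
redirect_chain() {
  local url="$1" max_hops="${2:-20}" hops=0 next
  while [ "$hops" -lt "$max_hops" ]; do
    # Same curl call as above: HEAD request, print the next URL (if any).
    next=$(curl -I -s -S -f -w '%{redirect_url}\n' -o /dev/null "$url") || return
    echo "$url"
    # An empty redirect_url means the chain has ended.
    [ -z "$next" ] && return 0
    url=$next
    hops=$((hops + 1))
  done
  echo "redirect_chain: gave up after $max_hops requests" >&2
  return 1
}
```

It would be called as redirect_chain https://rb.gy/x7cg8r, optionally followed by a request limit. As in the loop above, a URL for which curl fails is not printed.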

This is 12% faster than my wget-based solution.

Benchmark details

cd "$(mktemp -d)"

cat <<'EOF' >curl-based-solution
#!/bin/bash
url=https://rb.gy/x7cg8r
while redirect_url=$(
  curl -I -s -S -f -w "%{redirect_url}\n" -o /dev/null "$url"
); do
  echo "$url"
  url=$redirect_url
  [[ -z "$url" ]] && break
done
EOF
chmod +x curl-based-solution

cat <<'EOF' >wget-based-solution
#!/bin/bash
url=https://rb.gy/x7cg8r
wget -S --spider "$url" 2>&1 \
 | grep -oP '^--[[:digit:]: -]{19}--  \K.*'
EOF
chmod +x wget-based-solution

hyperfine --warmup 5 ./wget-based-solution ./curl-based-solution

$ hyperfine --warmup 5 ./wget-based-solution ./curl-based-solution
Benchmark #1: ./wget-based-solution
  Time (mean ± σ):      1.397 s ±  0.025 s    [User: 90.3 ms, System: 19.7 ms]
  Range (min … max):    1.365 s …  1.456 s    10 runs
 
Benchmark #2: ./curl-based-solution
  Time (mean ± σ):      1.250 s ±  0.015 s    [User: 72.4 ms, System: 23.4 ms]
  Range (min … max):    1.229 s …  1.277 s    10 runs
 
Summary
  './curl-based-solution' ran
    1.12 ± 0.02 times faster than './wget-based-solution'

Answer 3

To show all of the URLs in a redirect chain, including the first:

wget -S --spider https://rb.gy/x7cg8r 2>&1 \
 | grep -oP '^--[[:digit:]: -]{19}--  \K.*'

Result (tested on Fedora Linux):

https://rb.gy/x7cg8r
https://t.co/BAvVoPyqNr
https://unix.stackexchange.com/

wget options used:

-S
--server-response

    Print the headers sent by HTTP servers and responses sent by FTP servers.

--spider

    When invoked with this option, Wget will behave as a Web spider, which
    means that it will not download the pages, just check that they are there
    ...

Source: https://www.mankier.com/1/wget

The combination of -S and --spider causes wget to issue HEAD requests instead of GET requests.

GNU grep options used:

-o
--only-matching

    Print only the matched (non-empty) parts of a matching line, with each such
    part on a separate output line.

-P
--perl-regexp

    Interpret PATTERNS as Perl-compatible regular expressions (PCREs).

Source: https://www.mankier.com/1/grep

The lines we are interested in look like this:

--2021-12-07 12:29:25--  https://rb.gy/x7cg8r

The timestamp consists of 19 characters: digits, hyphens, colons, and a space. It is therefore matched by [[:digit:]: -]{19}, where {19} is a fixed quantifier.

\K resets the start of the reported match, so with -o only the URL that follows the timestamp is printed.
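
To see what \K contributes, the following minimal sketch runs the pattern with and without it against a canned copy of the line shown above, so no request is made. It assumes GNU grep built with PCRE support; the variable name line is only illustrative.

```bash
#!/bin/bash
# Sample line copied from the wget output above.
line='--2021-12-07 12:29:25--  https://rb.gy/x7cg8r'

# Without \K, -o prints the whole match, timestamp included.
printf '%s\n' "$line" | grep -oP '^--[[:digit:]: -]{19}--  .*'

# With \K, everything matched before it is dropped from the reported match.
printf '%s\n' "$line" | grep -oP '^--[[:digit:]: -]{19}--  \K.*'
```

The first command echoes the line unchanged; the second prints only https://rb.gy/x7cg8r.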

Replacing grep with sed

If you prefer, the grep step of the pipeline can be replaced with sed:

wget -S --spider https://rb.gy/x7cg8r 2>&1 \
 | sed -En 's/^--[[:digit:]: -]{19}--  (.*)/\1/p'

Comparison with a curl-based solution:

The curl-based solution omits the first URL in the redirect chain:

$ curl -v -L https://rb.gy/x7cg8r 2>&1 | grep -i "^< location:"
< Location: https://t.co/BAvVoPyqNr
< location: https://unix.stackexchange.com/

In addition, it increases the number of bytes sent to the second pipeline stage by 4354.99%:

$ wget -S --spider https://rb.gy/x7cg8r 2>&1 | wc -c
2728

$ curl -v -L https://rb.gy/x7cg8r 2>&1 | wc -c
121532

$ awk 'BEGIN {printf "%.2f\n", (121532-2728)/2728*100}'
4354.99

In my benchmarks, the wget solution was slightly (4%) faster than the curl-based solution shown above.

Update: see my curl-based answer for the fastest solution.

Answer 4

curl -v can show all of the URLs in an HTTP redirect chain:

$ curl -v -L https://go.usa.gov/W3H 2>&1 | grep -i "^< location:"
< location: http://hurricanes.gov/nhc_storms.shtml
< Location: https://www.hurricanes.gov/nhc_storms.shtml
< location: https://www.nhc.noaa.gov:443/nhc_storms.shtml
< location: http://www.nhc.noaa.gov/cyclones
< Location: https://www.nhc.noaa.gov/cyclones
< location: http://www.nhc.noaa.gov/cyclones/
< Location: https://www.nhc.noaa.gov/cyclones/
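
The lines above still carry the "< " prefix and a header name whose capitalization varies, and curl -v output normally ends header lines with a carriage return. As a rough sketch, assuming the output has the shape shown above, a small filter can reduce it to bare URLs. The helper name location_values is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read `curl -v` output on stdin and print only the
# values of the Location response headers, whatever their capitalization.
location_values() {
  tr -d '\r' | grep -i '^< location:' | sed -E 's/^< [^:]*: *//'
}

# Two canned header lines copied from the output above, plus a request
# line that must be ignored; no network access is needed.
printf '< location: http://hurricanes.gov/nhc_storms.shtml\r\n< Location: https://www.hurricanes.gov/nhc_storms.shtml\r\n> GET /nhc_storms.shtml HTTP/1.1\r\n' | location_values
```

With a live request it would replace the grep step: curl -v -L https://go.usa.gov/W3H 2>&1 | location_values. The starting URL is not part of this output, so it has to be printed separately if the full chain is wanted.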
