Removing the hostname from a URL using sed/awk

Question 1

With perl:

perl -pe 's|^([^/:]+:)?//[^/]*||' < your-file

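This removes an optional scheme: followed by // (so that both http://host/path and //host/path are handled), followed by any number of characters other than / (so it removes host, or for instance user:password@host:8080 in ftp://user:password@host:8080/pub).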

The sed equivalent would be:

LC_ALL=C sed 's|^\([^/:]\{1,\}:\)\{0,1\}//[^/]*||' < your-file

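Either way, the s/pattern/replacement/ operator of both sed and perl takes a regular expression as the pattern: basic regular expressions for sed, and perl regular expressions for perl (an improved and extended form of extended regular expressions, which many sed implementations now also support via the -E option).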
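As an illustration of the -E remark, the same substitution can be written in extended regular expression syntax, where it reads much like the perl version. This is only a sketch, assuming a sed that supports -E (GNU and BSD sed do); the strip_host function name and the sample URLs are made up for the example.

```shell
# Hypothetical helper: strip scheme and authority from each line of stdin.
# Assumes a sed that accepts -E (extended regular expressions).
strip_host() {
  LC_ALL=C sed -E 's|^([^/:]+:)?//[^/]*||'
}

printf '%s\n' \
  'http://host/path' \
  '//host/path' \
  'ftp://user:password@host:8080/pub' | strip_host
# prints:
# /path
# /path
# /pub
```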

perl also has a URI module that parses URIs into structured objects:

perl -MURI -lpe '$_ = URI->new($_)->path' < your-file

This also removes the query string (as in http://host/path?query) and the fragment (as in http://host/file.html#anchor), if any. Replace ->path with ->path_query to include the query when there is one.
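Where the URI module is not installed, a rough regex-only approximation of ->path and ->path_query might look like the following. It is a sketch rather than a parser: nothing is validated, the function names url_path and url_path_query are invented here, and a sed with -E is assumed.

```shell
# Hypothetical helpers approximating URI->path and URI->path_query with sed.
# Plain text substitution only; nothing is validated.
url_path() {
  # drop scheme and authority, then everything from the first ? or #
  LC_ALL=C sed -E 's|^([^/:]+:)?//[^/]*||; s|[?#].*$||'
}

url_path_query() {
  # drop scheme and authority, then only the #fragment
  LC_ALL=C sed -E 's|^([^/:]+:)?//[^/]*||; s|#.*$||'
}

echo 'http://host/path?query'        | url_path        # /path
echo 'http://host/file.html#anchor'  | url_path        # /file.html
echo 'http://host/path?query#anchor' | url_path_query  # /path?query
```

Because the authority is matched with [^/]*, a URL whose query follows the host directly (no / in between) is not handled by this sketch.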

Question 2

This is easily done with Linux coreutils:

cut -d '/' -f 4- somefilewithyoururls.txt | sed 's/^/\//'

This keeps everything after the third / and prepends a / to each line. No complex regular expressions needed.
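Since awk was also asked about, the same field-splitting idea can be sketched in awk. With / as the separator, field 1 is the scheme (http:), field 2 is empty, field 3 is the host, and the path starts at field 4. The url_path_awk name is hypothetical, and the sketch assumes every input line has the form scheme://host/path.

```shell
# Hypothetical helper: print the path of each scheme://host/path line.
url_path_awk() {
  awk -F/ '{
    path = ""
    for (i = 4; i <= NF; i++)          # fields 4..NF are the path segments
      path = path "/" $i               # put back the slashes removed by -F/
    print (path == "" ? "/" : path)    # a bare scheme://host becomes /
  }'
}

printf '%s\n' \
  'https://www.example.com' \
  'http://example.com/blog/' \
  'https://www.example.com/cases/page/4/' | url_path_awk
# prints:
# /
# /blog/
# /cases/page/4/
```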

Question 3

Using just sed:

$ sed 's:[^/]*//[^/]*::' file
/
/
/blog/
/blog/
/blog/
/blog/
/blog/
/blog/
/cases/page/4/
/cdn-cgi/challenge-platform/h/g/cv/result/7c9123dc38da6841
/cdn-cgi/challenge-platform/h/g/scripts/jsd/7fe83wdcs/invisible.js
/cdn-cgi/challenge-platform/h/g/scripts/jsd/7fe83wdcs/invisible.js
/cdn-cgi/challenge-platform/h/g/scripts/jsd/7fe83wdcs/invisible.js
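One thing worth noting about this pattern is that it is not anchored, so on a line that does not start with a URL it removes the first //host it finds further along the line. A small sketch comparing it with an anchored variant (the ^ is an addition for illustration, not part of the command above, and the sample line is made up):

```shell
line='/redirect?to=http://other/x'

# Unanchored, as in the answer: matches "redirect?to=http://other" mid-line
printf '%s\n' "$line" | sed 's:[^/]*//[^/]*::'
# prints: //x

# Anchored variant (illustrative addition): no match at line start, so the
# line is left unchanged
printf '%s\n' "$line" | sed 's:^[^/]*//[^/]*::'
# prints: /redirect?to=http://other/x
```

For input that consists only of absolute URLs, as in the sample above, both forms give the same result.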

Question 4

Using Raku (formerly known as Perl_6)

~$ raku -MURL -ne 'my $url = URL.new($_); put "/" ~ .path.join("/") for $url;'  file

Sample output:

/
/
/blog
/blog
/blog
/blog
/blog
/blog
/cases/page/4
/cdn-cgi/challenge-platform/h/g/cv/result/7c9123dc38da6841
/cdn-cgi/challenge-platform/h/g/scripts/jsd/7fe83wdcs/invisible.js
/cdn-cgi/challenge-platform/h/g/scripts/jsd/7fe83wdcs/invisible.js
/cdn-cgi/challenge-platform/h/g/scripts/jsd/7fe83wdcs/invisible.js

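With Raku, loading the URL module is probably the cleanest answer, because it can handle a username/password in the URL. Above, the recognized path elements are joined with / slashes, prefixed with a leading / slash, and then output with put.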

Simplifying the code above shows which elements are recognized:

~$ raku -MURL -ne 'my $url = URL.new($_); .raku.put for $url;'  file
URL.new(scheme => "http", username => Str, password => Str, hostname => "www.example.com", port => Int, path => [], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "www.example.com", port => Int, path => [], query => {}, fragment => Str)
URL.new(scheme => "http", username => Str, password => Str, hostname => "example.com", port => Int, path => ["blog"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "example.com", port => Int, path => ["blog"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "www.example.co.uk", port => Int, path => ["blog"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "example.co.uk", port => Int, path => ["blog"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "sub.example.co.uk", port => Int, path => ["blog"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "www.example.com", port => Int, path => ["blog"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "www.example.com", port => Int, path => ["cases", "page", "4"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "www.example.com", port => Int, path => ["cdn-cgi", "challenge-platform", "h", "g", "cv", "result", "7c9123dc38da6841"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "www.example.com", port => Int, path => ["cdn-cgi", "challenge-platform", "h", "g", "scripts", "jsd", "7fe83wdcs", "invisible.js"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "www.example.co.uk", port => Int, path => ["cdn-cgi", "challenge-platform", "h", "g", "scripts", "jsd", "7fe83wdcs", "invisible.js"], query => {}, fragment => Str)
URL.new(scheme => "https", username => Str, password => Str, hostname => "sub.example.co.uk", port => Int, path => ["cdn-cgi", "challenge-platform", "h", "g", "scripts", "jsd", "7fe83wdcs", "invisible.js"], query => {}, fragment => Str)

If you are brave enough to parse URLs with a regex (are you sure there is no maliciously crafted data?), here is a fairly direct translation of the Perl answer posted by @Stéphane_Chazelas:

~$ raku -pe 's|^ ( <-[/:]>+ \: )? \/ \/ <-[/]>* ||;'  < file

https://raku.land/cpan:TYIL/URL
https://raku.org
