How can I get the URLs from a JSON page where the user searches for a specific word and all websites containing that word are listed?

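I'm trying to write a bash shell script that returns the URLs currently on a specific web page. What I have so far is a script that returns all of the URLs, but I need to narrow it down to the links I want. The user should be able to type a word and get back every URL that contains it. For example, ./reddit.sh Linux should print the URLs containing that word. This is my code: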

wget -qO- http://reddit.com/ | grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" | sort | uniq

Answer 1

Complete solution:

Tools used: bash, wget, xmllint, sed, sort

reddit.sh script:

#!/bin/bash

search_word="$1"

wget -qO - --follow-tags=a "http://reddit.com/search?q=${search_word}" \
|  xmllint --html --xpath '//a[contains(@href, "'"${search_word}"'")]' - 2>/dev/null \
| sed 's/<\/a>/&\n/g' | sort -u

Usage:

$ bash reddit.sh linux

Output (truncated):

<a href="https://fossbytes.com/firefox-quantum-57-is-here-to-kill-google-chrome-download-for-windows-mac-linux/" class="search-link may-blank">https://fossbytes.com/firefox-quantum-57-is-here-to-kill-google-chrome-download-for-windows-mac-linux/</a>
<a href="https://www.change.org/p/lenovo-demand-that-lenovo-provide-bios-update-to-enable-linux-installation">https://www.change.org/p/lenovo-demand-that-lenovo-provide-bios-update-to-enable-linux-installation</a>
<a href="https://www.gamingonlinux.com/articles/atari-are-launching-a-new-gaming-system-the-ataribox-and-it-runs-linux.10418" class="search-link may-blank">https://www.gamingonlinux.com/articles/atari-are-launching-a-new-gaming-system-the-ataribox-and-it-runs-linux.10418</a>
<a href="https://www.reddit.com/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" data-inbound-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=1" data-href-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" class="search-comments may-blank">2,315 comments</a>
<a href="https://www.reddit.com/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" data-inbound-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=1" data-href-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" class="search-title may-blank">Every time I try out linux</a>
<a href="https://www.reddit.com/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" data-inbound-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=14" data-href-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" class="search-comments may-blank">269 comments</a>
<a href="https://www.reddit.com/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" data-inbound-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=14" data-href-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" class="search-title may-blank">20170825: Happy Birthday Linux</a>
...

As an additional test case, search for python:

$ bash reddit.sh python

Output (truncated):

<a href="https://developers.slashdot.org/story/17/12/15/1133217/microsoft-considers-adding-python-as-an-official-scripting-language-in-excel" class="search-link may-blank">https://developers.slashdot.org/story/17/12/15/1133217/microsoft-considers-adding-python-as-an-official-scripting-language-in-excel</a>
<a href="https://www.reddit.com/r/ATBGE/comments/7bjnxs/check_out_this_python/" data-inbound-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=7" data-href-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/" class="search-comments may-blank">302 comments</a>
<a href="https://www.reddit.com/r/ATBGE/comments/7bjnxs/check_out_this_python/" data-inbound-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=7" data-href-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/" class="search-title may-blank">Check out this python!</a>
<a href="https://www.reddit.com/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" data-inbound-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=8" data-href-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" class="search-comments may-blank">1,364 comments</a>
<a href="https://www.reddit.com/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" data-inbound-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=8" data-href-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" class="search-title may-blank">Monty Python Life Of Brian is still relevant today</a>
...
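
The output above still consists of whole <a> elements. If only the URLs are wanted, one option is to pass it through a small filter. The following is a minimal sketch and not part of the original script: extract_hrefs is a hypothetical helper name, and it assumes every link is written as href="..." with double quotes and a leading space.

```bash
#!/bin/bash

# extract_hrefs (hypothetical helper): read HTML on stdin and print
# each distinct href value on its own line.
extract_hrefs() {
  # The leading space keeps attributes such as data-href-url="..."
  # from being picked up by mistake.
  grep -Eo ' href="[^"]*"' \
    | sed -e 's/^ href="//' -e 's/"$//' \
    | sort -u
}
```

It could replace the final sort -u in reddit.sh, for example: ... | sed 's/<\/a>/&\n/g' | extract_hrefs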

Answer 2

Have you tried something like this?

target="reddit"; wget -qO- http://reddit.com/ | grep -Po "http.*?(?=\")" | grep -i $target | sort | uniq

Edit: extended along the same lines as @RomanPerekhrest's answer:

target="linux"; wget -qO- "http://reddit.com/search?q=${target}" | grep -Po "http.*?(?=\")" | grep $target | sort -u

Edit 2: handling multiple words, as raised by @nxnev:

target="arch linux"; url="http://reddit.com/search?q=$target"; search=$(echo "$target" | sed 's/ /|/g'); wget -qO- "$url" | grep -Po "http.*?(?=\")" | grep -Eh "$search" | sort -u
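
The one-liner above builds its grep pattern inline. As a hedged sketch of the same idea split into reusable pieces, the filtering step can be tried without touching the network. The function names words_to_pattern and filter_urls are hypothetical, and the case-insensitive -i is an added assumption rather than something the one-liner does.

```bash
#!/bin/bash

# words_to_pattern (hypothetical helper): turn "arch linux wiki" into
# the extended-regex alternation "arch|linux|wiki".
words_to_pattern() {
  # tr -s squeezes runs of spaces into a single "|"; sed trims a stray
  # "|" at either end, which would otherwise match every line.
  printf '%s\n' "$*" | tr -s ' ' '|' | sed -e 's/^|//' -e 's/|$//'
}

# filter_urls (hypothetical helper): keep the lines on stdin that contain
# any of the given words, de-duplicated. Words are treated as regexes.
filter_urls() {
  grep -Ei -e "$(words_to_pattern "$@")" | sort -u
}
```

With these defined, the last two stages of the pipeline could be written as: wget -qO- "$url" | grep -Po "http.*?(?=\")" | filter_urls "$target"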

Answer 3

If you want to list Reddit search results (URLs only) and don't want to use the API, you can do it like this:

reddit() {
  local 'search_term' 'user_agent'
  user_agent='your_user_agent'
  for search_term; do
    curl \
      --data-urlencode "q=${search_term}" \
      --get \
      --header "User-Agent: ${user_agent}" \
      --silent \
      "https://www.reddit.com/search" \
    | grep -P -o -e '<a [^>]*? class="search-title may-blank" >.*?<\/a>' \
    | grep -P -o -e '(?<=href=")(.*?)(?=")' \
    | tail -n '+4'
  done
}

Example:

$ reddit 'arch linux'
https://www.reddit.com/r/linux/comments/6pepav/someone_got_offended_by_a_hostname_of_an/
https://www.reddit.com/r/linux/comments/6g6xsu/the_arch_linux_wiki_is_awesome_and_i_would_like/
https://www.reddit.com/r/linuxmasterrace/comments/7ikqxs/my_new_macbook_pro_has_been_made_glorious_by_the/
https://www.reddit.com/r/linux/comments/5sx15b/arch_linux_pulls_the_plug_on_32bit/
https://www.reddit.com/r/archlinux/comments/7a4sgv/almost_no_one_on_campus_got_it_but_i_dressed_up/
https://www.reddit.com/r/archlinux/comments/7blg7w/arch_linux_news_the_end_of_i686_support/
https://www.reddit.com/r/archlinux/comments/7g53jg/here_is_a_screenshot_of_a_music_player_ive_been/
https://www.reddit.com/r/thinkpad/comments/7k704w/my_beloved_x1_carbon_5th_gen_running_arch_linux/
https://www.reddit.com/r/pcmasterrace/comments/39hl6h/im_thoroughly_enjoying_arch_linux_60fps/
https://www.reddit.com/r/linux/comments/3qsmk4/twitch_installs_arch_linux_similar_to_twitch/
https://www.reddit.com/r/linuxmasterrace/comments/7aai76/i_am_using_archlinux/
https://www.reddit.com/r/archlinux/comments/7j2zhl/fully_encrypted_archlinux_with_secure_boot_on/
https://www.reddit.com/r/linuxmasterrace/comments/5dbgku/my_experience_with_arch_linux_so_far/
https://www.reddit.com/r/linux/comments/4m0r93/why_did_archlinux_embrace_systemd/
https://www.reddit.com/r/unixporn/comments/7iss7b/xfce_arch_linux_satisfaction/
https://www.reddit.com/r/archlinux/comments/5ndu7r/my_manual_to_install_arch_linux_the_minimal_way_i/
https://www.reddit.com/r/archlinux/comments/73g3vz/librem_5_will_support_arch_linux/
https://www.reddit.com/r/haskell/comments/7jyie0/the_arch_linux_community_does_not_look_very_about/
https://www.reddit.com/r/linux_gaming/comments/4xep1o/no_mans_sky_running_on_wine_in_64_bit_arch_linux/
https://www.reddit.com/r/archlinux/comments/7bjp8j/hexadecimal_arch_linux_calendar_for_2018/
https://www.reddit.com/r/linux/comments/3r1mdv/twitch_installs_arch_linux_lasts_only_a_few_hours/
https://www.reddit.com/r/archlinux/comments/7hfb9m/farch_functional_arch_linux_system_management/
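
Since the question title mentions a JSON page: Reddit can also return search results as JSON, and the URLs could be read from that instead of scraping HTML. The sketch below is an alternative under stated assumptions, not the method used in this answer. It assumes the search.json endpoint, a listing shaped as data.children[].data.url, and that jq is installed; json_urls and reddit_json are hypothetical names.

```bash
#!/bin/bash

# json_urls (hypothetical helper): read a Reddit-style JSON listing on
# stdin and print the "url" field of every result, de-duplicated.
json_urls() {
  jq -r '.data.children[].data.url' | sort -u
}

# reddit_json (hypothetical wrapper): fetch the JSON search page for each
# search term. The endpoint and User-Agent value are assumptions.
reddit_json() {
  for term in "$@"; do
    curl --silent --get \
      --header 'User-Agent: your_user_agent' \
      --data-urlencode "q=${term}" \
      'https://www.reddit.com/search.json' \
    | json_urls
  done
}
```

It would be called the same way as the function above, for example: reddit_json 'arch linux'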
