bash – extract urls from xml sitemap

Command to extract the URLs from an XML sitemap (replace sitemap_url with the address of the sitemap):

curl -s sitemap_url | grep "<loc>" | awk -F"<loc>" '{print $2}' | awk -F"</loc>" '{print $1}'
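
The one-liner above picks up only the first <loc> on each line, so it relies on the sitemap having one <loc> element per line. Some generators write the whole sitemap on a single line. A minimal sketch of a variant that copes with that, assuming a grep that supports the -o option (GNU and BSD grep both do) and <loc> values without CDATA wrappers or embedded newlines; the function name extract_locs is made up for illustration:

```shell
# Hypothetical helper: reads sitemap XML on stdin, prints one URL per line.
extract_locs() {
  # -o prints every match on its own line, even when several share one input line
  grep -o '<loc>[^<]*</loc>' | sed -e 's|<loc>||' -e 's|</loc>||'
}

# Usage (sitemap_url is a placeholder, as in the command above):
# curl -s sitemap_url | extract_locs
```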


To hold the URLs in a bash variable for iteration, use command substitution as shown below:

urls=$(curl -s sitemap_url | grep "<loc>" | awk -F"<loc>" '{print $2}' | awk -F"</loc>" '{print $1}')
for i in $urls
do
  echo "$i"
done

Use case

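This approach can be used to write a quick cache-warming script for a site: request every URL listed in the sitemap once so that the cache is populated before real visitors arrive.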
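
A rough sketch of such a script, under a few assumptions: the URLs arrive one per line on stdin, a plain GET request is enough to populate the cache, and a 10-second timeout per request is acceptable. The function name warm_cache is made up for illustration.

```shell
# Hypothetical helper: requests each URL read from stdin and prints "<status> <url>".
warm_cache() {
  while IFS= read -r url; do
    # skip blank lines
    [ -n "$url" ] || continue
    # discard the response body and keep only the HTTP status code
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}

# Usage (sitemap_url is a placeholder):
# curl -s sitemap_url | grep "<loc>" | awk -F"<loc>" '{print $2}' | awk -F"</loc>" '{print $1}' | warm_cache
```

Printing the status code next to each URL makes it easy to spot pages that returned an error while warming.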
