Command to get the URLs listed in a sitemap (replace sitemap_url with the sitemap's address):
curl -s sitemap_url | grep "<loc>" | awk -F"<loc>" '{print $2}' | awk -F"</loc>" '{print $1}'
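The pipeline above keeps only the first <loc> on each line, so a minified sitemap with everything on one line yields a single URL. Here is a minimal alternative sketch. It assumes `grep -o` is available (GNU and BSD grep both support it) and that URLs are not wrapped in CDATA sections. `extract_locs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every <loc> URL from sitemap XML read on stdin.
# grep -o emits each <loc>...</loc> match on its own line, so this also
# works when the whole sitemap is on a single line.
# Assumes URLs contain no '<' and are not inside CDATA sections.
extract_locs() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's#<loc>##' -e 's#</loc>##'
}

# Usage (sitemap_url is a placeholder):
# curl -s "$sitemap_url" | extract_locs
```

In both versions, XML-escaped characters such as `&amp;` inside <loc> are printed as-is, not decoded.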
To store these URLs in a bash variable and loop over them, use command substitution, $(...), as shown below:
urls=$(curl -s sitemap_url | grep "<loc>" | awk -F"<loc>" '{print $2}' | awk -F"</loc>" '{print $1}')
for i in $urls; do
  echo "$i"
done
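Looping over an unquoted $urls relies on word splitting and pathname expansion. The following is a sketch of a line-by-line alternative that keeps each URL intact. The helper names `sitemap_locs` and `for_each_url` are hypothetical, and the `echo` stands in for whatever per-URL work you need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same grep/awk extraction, reading XML on stdin.
sitemap_locs() {
  grep "<loc>" | awk -F"<loc>" '{print $2}' | awk -F"</loc>" '{print $1}'
}

# Read one URL per line. IFS= and -r keep each line exactly as printed,
# and "$url" is quoted so it is never split or glob-expanded.
for_each_url() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip empty lines
    echo "$url"                 # replace with the real per-URL work
  done
}

# Usage (sitemap_url is a placeholder):
# curl -s "$sitemap_url" | sitemap_locs | for_each_url
```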
Use case
This approach can be used to write a quick cache-warming script for a site. Requesting every URL in the sitemap once fills the cache before real visitors arrive.
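A minimal sketch of such a script follows. It is an illustration, not a production tool. It assumes one <loc> per line in the sitemap. The function name `warm_cache`, the 10-second timeout and the 0.2-second pause between requests are hypothetical choices to tune for your site. Fractional `sleep` works with GNU and BSD sleep, but POSIX does not guarantee it.

```shell
#!/usr/bin/env bash
# Hypothetical cache-warming helper: fetch every URL in a sitemap once and
# print "<status> <url>" for each request.
warm_cache() {
  local sitemap_url=$1
  curl -s "$sitemap_url" \
    | grep "<loc>" | awk -F"<loc>" '{print $2}' | awk -F"</loc>" '{print $1}' \
    | while IFS= read -r url; do
        [ -n "$url" ] || continue
        # -o /dev/null discards the body; -w prints the status code and URL.
        curl -s -o /dev/null --max-time 10 \
          -w '%{http_code} %{url_effective}\n' "$url"
        sleep 0.2   # small pause so the warm-up does not hammer the server
      done
}

# Usage (placeholder address):
# warm_cache "https://example.com/sitemap.xml"
```

Requests run one at a time on purpose. For large sites, a parallel fetcher would be faster but puts more load on the server.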