regex - Use wget to crawl specific URLs


I am trying to crawl the links of a website and use a download manager to download the files.

I've tried:

wget --wait=20 --limit-rate=20k -r -p -U mozilla "www.mywebsite.com"

I can't figure out how to use wget or regular expressions to save only the desired links!

wget offers a wide variety of options for fine-tuning which files are downloaded in a recursive crawl.

Here are a few options that may interest you:

  • --accept-regex urlregex

Download only URLs matching urlregex. urlregex is a regular expression matched against the complete URL.

  • --reject-regex urlregex

Ignore URLs matching urlregex. urlregex is a regular expression matched against the complete URL.

  • -L (--relative)

Tells wget to follow relative links only.

Examples of relative links:

<a href="foo.gif"> <a href="foo/bar.gif"> <a href="../foo/bar.gif"> 

Non-relative links:

<a href="/foo.gif"> <a href="/foo/bar.gif"> <a href="http://www.server.com/foo/bar.gif"> 
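Putting these options together, here is a minimal sketch. It assumes you want to keep only PDF files; the host name and the .pdf pattern are placeholders, so adjust both for your site:

```shell
# Hypothetical example: recursively crawl a site, politely throttled,
# and keep only URLs ending in .pdf. Uncomment to run (requires network):
# wget --wait=20 --limit-rate=20k -r -p \
#      -U mozilla \
#      --accept-regex '\.pdf$' \
#      "https://www.mywebsite.com/"

# --accept-regex is matched against the complete URL, so the pattern
# can be sanity-checked locally with grep -E before starting a crawl:
echo "https://www.mywebsite.com/docs/manual.pdf" | grep -E '\.pdf$'
echo "https://www.mywebsite.com/index.html" | grep -E '\.pdf$' || echo "no match"
```

Testing the pattern with grep first is cheap insurance: a regex that is too broad on a throttled recursive crawl wastes hours before the mistake shows up.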
