{"id":965,"date":"2014-12-01T06:29:40","date_gmt":"2014-12-01T06:29:40","guid":{"rendered":"http:\/\/invisiblezero.net\/?p=965"},"modified":"2024-03-11T19:31:08","modified_gmt":"2024-03-11T19:31:08","slug":"php-crawl-websites-from-command-line-interface","status":"publish","type":"post","link":"http:\/\/ndthanh.com\/php-crawl-websites-from-command-line-interface\/","title":{"rendered":"PHP – Crawl websites from command line interface"},"content":{"rendered":"

Recently, I wrote a new crawler script to warm caches on some Magento websites. Today I’d like to share it with you, because I wrote it so that it also works with websites other than Magento and on many platforms.<\/p>\n

You can see the help content by running the crawler from the command line as shown below. The help is displayed when there is no sitemap.xml file in the same directory as the crawler, or when you pass the -help option.<\/p>\n

<\/p>\n

\nphp -f iz_crawler.php\nUsage:  php -f crawler.php -- [options]\n\n  -sitemap &lt;list of files&gt;     List of sitemap xml files, delimit by semicolon ; . Default is 'sitemap.xml'\n  -website &lt;website&gt;           Website url for input. Will be ignored if -sitemap option selected or there is sitemap.xml file in the same directory with this crawler\n  -depth &lt;number&gt;              Set depth level. Default is 0\n  -interval &lt;number&gt;           Set scrap interval, measure in second(s). Defalt is 0\n  -exclude &lt;extensions&gt;        Exclude link extensions like png, css, js, etc... delimit by semicolon ; . Default is &quot;jpg;png;jpeg;pdf;7z;zip;rar;mp3;aac;mp4;apk;bat;tar;swf;iso&quot;\n  -verbose                     Display crawler output. Default is false\n  -help                        This help\n\n  Note: sitemap.xml default location is at root, and it will add initial urls for crawler, use -depth to make most use of sitemap.xml\n\n  Example : php -f crawler.php  -- -website http:\/\/www.google.com -depth 1 -interval 0.5 -verbose -exclude &quot;png;pdf;html&quot;\n<\/pre>\n

The help content explains most of what you need, so I will only show what a run looks like here. I placed iz_crawler.php in the root directory of my website and executed the command “php -f iz_crawler.php -- -verbose”:<\/p>\n

\nphp -f iz_crawler.php -- -verbose\n
Home<\/a><\/blockquote>