Kaptain Krayola - pie 2.0

PHP WordPress Blog Finder
Author: The Kaptain

This little script automates the discovery of WordPress blogs. It can actually be used to harvest links to anything you want from Google SERPs, but in this example it is used to find WordPress blogs: add different search strings to the array and it will return whatever those queries turn up. It currently only works with Google, though support for other search engines may be added in the near future. You can specify as many search strings as you want, and the script randomly chooses one each time it runs.
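Choosing one query at random can be done with PHP's built-in array_rand(), which returns a random key that is guaranteed to exist, so the pick can never run past the end of the array. A minimal sketch; pickQuery() and the sample URLs are hypothetical, not part of the original script:

```php
<?php
// Hypothetical helper: return one entry from a list of search URLs at random.
function pickQuery(array $queries): string
{
    if (count($queries) === 0) {
        throw new InvalidArgumentException("No queries to choose from");
    }
    // array_rand() returns a random key that exists in the array
    return $queries[array_rand($queries)];
}

// Hypothetical sample queries
$queries = [
    "http://www.google.com/search?q=%22powered+by+wordpress%22",
    "http://www.google.com/search?q=%22Leave+a+Reply%22",
];
echo pickQuery($queries), "\n";
```

Adding another search string is then just a matter of appending it to the array; the random pick adapts to the new length automatically.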
The function requires a keyword, which is added to the "powered by wordpress" search so you can find blogs related to the topic of your site(s).
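Because the keyword ends up inside a URL, it has to be URL-encoded; a multi-word keyword such as "running shoes" would otherwise break the query string. A sketch using PHP's http_build_query(), which encodes every value (spaces become +, double quotes become %22). buildSearchUrl() and its parameters are hypothetical:

```php
<?php
// Hypothetical helper: build a search URL from a fixed phrase plus a user keyword.
function buildSearchUrl(string $base, string $phrase, string $keyword, int $start): string
{
    $params = [
        'hl'    => 'en',
        // Quote the fixed phrase so it is searched as an exact match
        'q'     => '"' . $phrase . '" ' . $keyword,
        'start' => $start,
    ];
    // http_build_query() URL-encodes every key and value
    return $base . '?' . http_build_query($params);
}

echo buildSearchUrl("http://www.google.com/search", "powered by wordpress", "running shoes", 10), "\n";
// http://www.google.com/search?hl=en&q=%22powered+by+wordpress%22+running+shoes&start=10
```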
The function will return an array of URLs that you can use however you see fit.
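Instead of a regular expression, the links can also be pulled out of the page with PHP's DOM extension, which copes with quoted or unquoted attributes and extra classes. A sketch under assumptions: extractLinks() is a hypothetical helper, and keeping only absolute http(s) links (optionally filtered by a class name, mirroring the class=l results the script looks for) is an illustrative choice, not the original's method:

```php
<?php
// Hypothetical helper: return the absolute http(s) links in an HTML page,
// optionally only those whose class attribute contains $class.
function extractLinks(string $html, ?string $class = null): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (!preg_match('#^https?://#i', $href)) {
            continue; // skip relative links, anchors, javascript: URLs, etc.
        }
        if ($class !== null) {
            $classes = preg_split('/\s+/', trim($a->getAttribute('class')));
            if (!in_array($class, $classes, true)) {
                continue;
            }
        }
        $links[$href] = true; // using the URL as a key drops duplicates
    }
    return array_keys($links);
}

$html = '<p><a href="http://example.com/blog/" class=l>One</a>'
      . '<a href="/relative">Two</a>'
      . '<a href="https://example.org/" class="l x">Three</a>'
      . '<a href="http://example.net/">Four</a></p>';
print_r(extractLinks($html, 'l'));
```

Parsing the document rather than matching raw markup means a change in attribute order or quoting does not silently produce zero results.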
This script requires the Snoopy class (Snoopy.class.php).
```php
require_once "Snoopy.class.php";

function getWordpress($keyword) {
    // SNOOPY! Snoopy does the HTTP fetching for us
    $snoopy = new Snoopy;

    // Add an entry for every Google query you want to run to harvest the URLs.
    // The keyword is URL-encoded so multi-word keywords don't break the query string.
    $aGoogleSearch = array();
    $aGoogleSearch[] = "http://www.google.com/search?hl=en&q=%22You+can+leave+a+response%2C+or+trackback+from+your+own+site%22&btnG=Google+Search&start=" . rand(0, 900);
    $aGoogleSearch[] = "http://www.google.com/search?hl=en&lr=&safe=off&q=%22Leave+a+Reply%22+%22powered+by+wordpress%22&btnG=Search&start=" . rand(0, 900);
    $aGoogleSearch[] = "http://www.google.com/search?hl=en&q=%22powered+by+wordpress%22+" . urlencode($keyword) . "&btnG=Google+Search&start=" . rand(0, 900);

    // Figure out which Google query to run; valid indexes are 0 .. count - 1
    $getPos = rand(0, count($aGoogleSearch) - 1);
    $targetURL = $aGoogleSearch[$getPos];

    // Hit Google: fetch() returns false on failure, and the page body ends up in $snoopy->results
    if (!$snoopy->fetch($targetURL)) {
        return array();
    }
    $page = $snoopy->results;

    // Parse out the result links, which look like:
    // <a href="http://wordpress.org/" class=l
    preg_match_all('/<a\s+href="?([^"\s>]+)"?\s+class="?l\b/i', $page, $match);

    // $match[1] holds every captured URL, starting at index 0
    $foundLinks = array();
    foreach ($match[1] as $url) {
        $foundLinks[] = $url;
    }
    return $foundLinks;
}
```
And the usage:

```php
$stuff = getWordpress("shoes");
// Loop over whatever came back instead of assuming a fixed number of results
foreach ($stuff as $url) {
    echo htmlspecialchars($url) . "<br />";
}
```
Coming up next: WordPress Comment Poster…