PHP Classes

Fixed bad URLs

Subject:Fixed bad URLs
Summary:Prevent indexing of javascript: URLs
Messages:2
Author:Timo Henke
Date:2008-07-02 17:43:45
Update:2008-12-14 10:13:15
 

  1. Fixed bad URLs
Timo Henke - 2008-07-02 17:43:45
Hi,

If the crawled web page contains anchors with javascript: hrefs, the crawler tries to fetch them as regular URLs and fails with warnings like this:

-----

PHP Warning: file_get_contents(MYDOMAIN/javascript: void(0)) [<a href='function.file-get-contents'>function.file-get-contents</a>]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request

-----

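I fixed the navigate() function inside the class file like this:


public function navigate() {
    // Skip javascript: pseudo-URLs; false is also what file_get_contents() returns on failure
    if (preg_match('/javascript:/i', $this->actual)) {
        $this->html = false;
    } else {
        $this->html = file_get_contents($this->actual);
    }
    // ... rest of the method unchanged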
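The same check can be pulled into a small standalone function, which makes it easy to cover other pseudo-schemes as well. This is only a sketch: the function name hasUnfetchableScheme() and the list of schemes are assumptions and not part of the Site Map Generator class. It looks for the scheme at the start of the string or directly after a slash, because the warning above shows the crawler prefixing the domain to the href.

```php
<?php
// Hypothetical helper, not part of the original class.
// Returns true when $url starts with, or has directly after a slash,
// a pseudo-scheme that cannot be fetched over HTTP.
function hasUnfetchableScheme(string $url): bool
{
    // Assumed list of schemes; adjust to what the crawled site actually uses.
    $schemes = ['javascript', 'mailto', 'tel'];

    // Heuristic: a host literally named like one of the schemes and followed
    // by a port (e.g. http://tel:80/) would also match.
    $pattern = '/(^|\/)\s*(' . implode('|', $schemes) . ')\s*:/i';

    return preg_match($pattern, $url) === 1;
}

// Possible use inside navigate():
// $this->html = hasUnfetchableScheme($this->actual)
//     ? false
//     : file_get_contents($this->actual);
```

Anchoring the match to the start or to a slash avoids rejecting ordinary URLs that merely contain one of the words somewhere in the path or query string.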

Maybe this helps.

Regards

Timo

  2. Re: Fixed bad URLs
Petar Benke - 2008-12-14 10:13:16 - In reply to message 1 from Timo Henke
I had a problem where the script tried to include URLs outside the main site URL in the sitemap (like http://www.facebook.com/share.php?u=$url), then parsed those pages and generated a lot of nonexistent pages that it also tried to parse... :) So I modified the foreach statement in the same function:

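foreach ($links as $link) {
    // Strict comparison: strpos() returns false when the site URL is absent, and false == 0 is true
    if (strpos($link, $this->site) === 0) {
        $this->link($link);
    }
}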
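The prefix test can be tried on its own outside the class. In the sketch below, the function name isInternalLink() and the example URLs are assumptions made for illustration. It uses a strict comparison, since strpos() returns false when the site URL does not occur in the link at all, and a loose comparison of false with 0 would let such links through.

```php
<?php
// Hypothetical helper, not part of the original class.
// Returns true only when $link begins with the site's base URL.
function isInternalLink(string $link, string $site): bool
{
    // strpos() gives 0 for a match at the start, a positive offset for a
    // match further in, and false for no match; only 0 counts as internal.
    return strpos($link, $site) === 0;
}

// Example site URL, assumed for this sketch.
$site = 'http://example.com';

var_dump(isInternalLink('http://example.com/about.html', $site));
var_dump(isInternalLink('http://www.facebook.com/share.php?u=http://example.com/', $site));
var_dump(isInternalLink('http://twitter.com/share', $site));
```

A plain prefix test still accepts a link such as http://example.com.other.org/ when the site URL has no trailing slash, so comparing the host parts returned by parse_url() may be a safer choice if that matters.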

Now it works faster.
