PHP Classes

Fixed bad URLs


Package: Site Map Generator
Subject: Fixed bad URLs
Summary: Prevent indexing of javascript: URLs
Author: Timo Henke
Date: 2008-07-02 17:43:45
Update: 2008-12-14 10:13:15

  1. Fixed bad URLs
Timo Henke - 2008-07-02 17:43:45

If the crawled web page contains anchors with javascript: hrefs, the crawler breaks with some ugly errors:


PHP Warning: file_get_contents(MYDOMAIN/javascript: void(0)) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request


I fixed the navigate() function in the class file like this:

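public function navigate() {
    // Skip javascript: pseudo-URLs; requesting them only yields "400 Bad Request".
    if (preg_match('/javascript:/i', $this->actual)) {
        $this->html = false;
    } else {
        $this->html = file_get_contents($this->actual);
    }
    // ... rest of navigate() unchanged ...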

Maybe this helps.
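The check above only looks for "javascript:". A slightly more general sketch follows, assuming the crawler should only ever fetch http and https pages. The function shouldCrawlHref() is hypothetical (it is not part of the Site Map Generator class). It inspects the raw href value before it is joined to the site domain and rejects empty links, in-page anchors, and any other scheme such as javascript:, mailto: or tel:.

```php
<?php
/**
 * Hypothetical helper (not part of Site Map Generator): decides whether a raw
 * href attribute value is worth fetching. Call it BEFORE the href is joined
 * to the site domain, so "javascript:void(0)" is still recognizable.
 */
function shouldCrawlHref(string $href): bool
{
    $href = trim($href);

    // Empty hrefs and in-page anchors ("#top") point back at the same page.
    if ($href === '' || $href[0] === '#') {
        return false;
    }

    // An explicit scheme ("javascript:", "mailto:", "tel:", ...):
    // only http and https are considered fetchable.
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $href, $m)) {
        return in_array(strtolower($m[1]), ['http', 'https'], true);
    }

    // No scheme: a relative URL, which the crawler resolves against the site.
    return true;
}
```

Inside the link loop this could be used as: if (!shouldCrawlHref($link)) continue; An allowlist of schemes was chosen over a blocklist so that schemes nobody thought of (data:, ftp:) are skipped as well.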



  2. Re: Fixed bad URLs
Petar Benke - 2008-12-14 10:13:16 - In reply to message 1 from Timo Henke
I had a problem where the script included URLs outside the main site URL in the sitemap (like $url), then parsed those pages and generated lots of nonexistent pages, which it also tried to parse... :) So I modified the foreach statement in the same function:

foreach($links as $link)

Now it works faster.
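The thread does not show the exact condition Petar added to the foreach loop. One way to do it is sketched below, assuming $links holds href values that are either absolute URLs or site-relative paths. The helpers isSameSite() and filterSameSiteLinks() are hypothetical. They compare each link's host, obtained with PHP's parse_url(), against the host of the site being mapped and drop everything else. The arrow function requires PHP 7.4 or later.

```php
<?php
/**
 * Hypothetical helper (not part of Site Map Generator): true when $url lives
 * on the same host as $siteUrl, so external pages are neither added to the
 * sitemap nor crawled.
 */
function isSameSite(string $url, string $siteUrl): bool
{
    $siteHost = parse_url($siteUrl, PHP_URL_HOST);
    $host     = parse_url($url, PHP_URL_HOST);

    // parse_url() returns false for seriously malformed URLs: skip them.
    // A site URL without a host cannot be compared against at all.
    if ($host === false || $siteHost === false || $siteHost === null) {
        return false;
    }

    // No host component: a relative link such as "/about.html" stays on-site.
    if ($host === null) {
        return true;
    }

    // Host names are case-insensitive.
    return strcasecmp($host, $siteHost) === 0;
}

/** Applies isSameSite() to a list of links and reindexes the result. */
function filterSameSiteLinks(array $links, string $siteUrl): array
{
    return array_values(array_filter(
        $links,
        fn (string $link): bool => isSameSite($link, $siteUrl)
    ));
}
```

Note that www.example.com and example.com count as different hosts here; normalize them first if the site answers on both. Pseudo-URLs such as javascript: void(0) have no host and therefore pass this check, so they still need a test like the one in the first post.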
