FAST Search for SharePoint Crawler issue with DokuWiki pages

Developer https://www.devze.com 2023-03-15 04:59 Source: web
My level of frustration is maxing out over crawling DokuWiki sites.

I have a content source using FAST Search for SharePoint that I have set up to crawl a dokuwiki/doku.php site. My crawl rules are set to http://servername/*, match case, and include all items in this path, with "crawl complex URLs" enabled. Testing the content source against the crawl rules shows that it will be crawled by the crawler. However, the crawl always lasts under 2 minutes and completes having crawled only the page I pointed it to and none of the links on that page. I have checked with the DokuWiki admin and he has the robots.txt set to allow. When I look at the source of the pages I see meta name="robots" content="index,follow".

So, to test that the other linked pages were not the problem, I added those links to the content source manually and recrawled. For example, the source page has three links:

  • site A
  • site B
  • site C.

I added the URLs for sites A, B, and C to the content source. The result of this crawl was 4 successes: the primary source page and the links A, B, and C that I added manually.

So my question is: why won't the crawler crawl the links on the page? Is this something I need to fix in the crawler on my end, or is it something to do with how namespaces are defined and links are constructed in DokuWiki?

Any help would be appreciated.

Eric


Did you disable the delayed indexing and rel=nofollow options in the DokuWiki configuration?


The issue turned out to be authentication, even though nothing in the FAST crawl logs suggested an authentication problem. The fix was adding a $freepass setting for the IP address of the search indexing server so that Apache would not go through the authentication process for each page hit.

Thanks for the reply

Eric
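
For anyone hitting the same symptom, it can help to look at the HTML the crawler actually receives (for example, a copy fetched without credentials from the indexing server) and see which links it could follow. Below is a minimal sketch using only the Python standard library. The names LinkAudit and inspect_page are hypothetical, and the sample HTML and its doku.php?id=... URLs are made up for illustration; this is not how the FAST crawler itself works. It reports the robots meta value and separates links that carry rel=nofollow from those that do not.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collects the robots meta directive and every <a href> with its nofollow flag."""

    def __init__(self):
        super().__init__()
        self.robots = None   # content of <meta name="robots">, if present
        self.links = []      # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.robots = attr.get("content")
        elif tag == "a" and attr.get("href"):
            rel_tokens = (attr.get("rel") or "").lower().split()
            self.links.append((attr["href"], "nofollow" in rel_tokens))


def inspect_page(html):
    """Return the robots directive, all links, and the links without rel=nofollow."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    followable = [href for href, nofollow in parser.links if not nofollow]
    return {"robots": parser.robots, "links": parser.links, "followable": followable}


# Hypothetical sample page, standing in for HTML saved from the wiki.
sample = """
<html><head><meta name="robots" content="index,follow" /></head>
<body>
  <a href="/dokuwiki/doku.php?id=site_a">site A</a>
  <a href="/dokuwiki/doku.php?id=site_b" rel="nofollow">site B</a>
  <a href="/dokuwiki/doku.php?id=site_c">site C</a>
</body></html>
"""

if __name__ == "__main__":
    report = inspect_page(sample)
    print("robots:", report["robots"])
    print("followable links:", report["followable"])
```

If the HTML fetched without credentials turns out to be a login page, or contains none of the expected links, that points at authentication rather than at the crawl rules, which matches what the fix above addressed.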
