My company has Google Search running on our sites, indexing all pages as far as I know. I've developed a document system that is also being indexed by Google. The pages in the system are dynamically generated, so I have www.mysite.com/doc.aspx?id=234, www.mysite.com/doc.aspx?id=236, etc., which are indexed. The thing is that some random pages (say, www.mysite.com/doc.aspx?id=235) are not indexed, for some unknown reason. Where do I look to get this resolved? Any ideas?
here is a short and very simplified outline of how google processes your site(s):
discovery -> crawling -> indexing -> ranking (->feedback)
discovery: the process of google discovering the pages of your site(s). this can happen via links in html or via a sitemap.xml (plus urls in on-page javascript, rss or atom feeds, ... basically any url google can find somewhere)
crawling: the process of google fetching the content of a discovered url (and pushing newly found URLs into the discovery queue)
indexing: storing the discovered and crawled content into their database and making it searchable
ranking: matching the indexed content against a user query and - if it is deemed important enough - returning it as a visible SERP listing to the user.
feedback: based on click/no-click behavior and data collected from other sources (presumably ISP data, the google toolbar, chrome browser reports, ...), google gathers feedback about user behavior on its SERPs (and after the click).
- between each and every step are a lot of quality metrics (the last step is just a quality metric collection step).
- each and every step reports back to the previous steps.
so basically, even if you communicate all your urls to google (e.g. via a sitemap.xml), google will not necessarily crawl all of them, let alone index them or rank them visibly.
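if discovery is the bottleneck, a sitemap.xml listing every doc.aspx?id=... URL takes the guesswork out of that first step. a minimal sketch in python, assuming the ids come from the document database (the id list and output filename below are placeholders):

```python
# minimal sketch: generate a sitemap.xml listing every doc.aspx?id=... URL so
# discovery does not depend on links alone. the id list and output filename
# are placeholders; in reality the ids would come from the document database.
from xml.sax.saxutils import escape

BASE_URL = "http://www.mysite.com/doc.aspx?id={0}"
doc_ids = [234, 235, 236]  # placeholder: fetch the real ids from the database

entries = [
    "  <url><loc>%s</loc></url>" % escape(BASE_URL.format(doc_id))
    for doc_id in doc_ids
]

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + "\n".join(entries)
    + "\n</urlset>\n"
)

with open("sitemap.xml", "w") as f:
    f.write(sitemap)
```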
ok, so what are the low hanging fruits to get more pages into the index (where they at least have a chance to rank for something)?
- communicate exactly one URL per page (use http 301 redirects and the canonical tag, and clean up all links on the web) - see the sketch after this list
- make your site faster (huge impact)
- make it lighter KB-wise (nice impact, mostly because it makes the site faster, too)
- put more unique content on your pages.
- prevent duplicate content
- get external links (from other websites) to your pages (it's not the total number that matters, but a steady growth over time)
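for the first point, the logic looks roughly like the sketch below. the site in the question is ASP.NET, so treat this python snippet purely as an illustration of the canonicalisation rules; which host to force and which parameters to drop are assumptions here.

```python
# illustration only: collapse URL variants onto one canonical form with a 301
# and expose that form for a <link rel="canonical"> tag. the real site is
# ASP.NET, so this shows the logic, not the implementation; the rules for
# which parameters to drop and which host to force are assumptions.
from urllib.parse import urlsplit, parse_qs

CANONICAL_HOST = "www.mysite.com"

def canonical_url(raw_url):
    """Return the single canonical form of a doc URL, or None if it has no id."""
    parts = urlsplit(raw_url)
    doc_id = parse_qs(parts.query).get("id", [None])[0]
    if doc_id is None:
        return None
    # keep only the id parameter and force a single host
    return "http://%s/doc.aspx?id=%s" % (CANONICAL_HOST, doc_id)

def handle_request(requested_url):
    """Return (status, headers) the way a canonicalising handler might."""
    canonical = canonical_url(requested_url)
    if canonical and canonical != requested_url:
        return 301, {"Location": canonical}  # permanent redirect to the one true URL
    return 200, {"Link": '<%s>; rel="canonical"' % (canonical or requested_url)}

# a different host plus a tracking parameter collapses onto the canonical URL:
print(handle_request("http://mysite.com/doc.aspx?id=235&utm_source=mail"))
```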
p.s.: just as a side note - the crawling step is optional. even uncrawled urls (e.g. if they were blocked via robots.txt) can get indexed (and rank) - but that's not very common
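still, it is worth ruling out a robots.txt block as the reason a particular id is missing. a quick check with python's standard-library parser (run it against your real domain; www.mysite.com is just the placeholder from the question):

```python
# quick check, using only the standard library: is doc.aspx disallowed for
# Googlebot? the domain and ids are the placeholders from the question.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://www.mysite.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt

for doc_id in (234, 235, 236):
    url = "http://www.mysite.com/doc.aspx?id=%d" % doc_id
    print(url, "crawlable:", rp.can_fetch("Googlebot", url))
```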
Afaik, pages are not indexed if they are not linked to from other pages. Maybe not a single page links to the non-indexed ones?
I agree with Daniel. You need a page with a list of links, or a paginated set of pages listing the links.
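A rough sketch of what such a listing could look like, written as a static page generator in Python (the id range, page size and filenames are invented; on the real site this would of course be a dynamic page):

```python
# Rough sketch: write paginated HTML index pages that link to every document,
# so the crawler can reach each id through ordinary links. The id range,
# page size and filenames are invented for the example.
doc_ids = list(range(230, 260))  # placeholder: use the real ids from the database
PAGE_SIZE = 10

for page, start in enumerate(range(0, len(doc_ids), PAGE_SIZE), start=1):
    links = "\n".join(
        '    <li><a href="/doc.aspx?id=%d">Document %d</a></li>' % (i, i)
        for i in doc_ids[start:start + PAGE_SIZE]
    )
    html = "<html>\n  <body>\n  <ul>\n%s\n  </ul>\n  </body>\n</html>\n" % links
    with open("docs-page-%d.html" % page, "w") as f:
        f.write(html)
```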
But dynamic URLs are bad for SEO; friendly URLs are the better way. Take a look at ISAPIRewrite or Routing.
I hope this helps you.
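The actual rewriting would be done with ISAPIRewrite rules or ASP.NET Routing on the IIS side; the Python sketch below only illustrates the concept of mapping a friendly URL onto the internal doc.aspx?id=... parameters (the slugs and ids are invented for the example):

```python
# Concept sketch only: the real mapping would live in ISAPIRewrite rules or
# ASP.NET Routing. This just shows a friendly path resolving to the internal
# dynamic URL; the slug-to-id table is invented for the example.
import re

SLUG_TO_ID = {"annual-report": 234, "pricing": 235, "handbook": 236}  # hypothetical slugs

def route(path):
    """Translate a friendly path like /docs/pricing into the internal query URL."""
    match = re.match(r"^/docs/([a-z0-9-]+)$", path)
    if not match:
        return None
    doc_id = SLUG_TO_ID.get(match.group(1))
    if doc_id is None:
        return None
    return "/doc.aspx?id=%d" % doc_id  # internal rewrite target

print(route("/docs/pricing"))   # -> /doc.aspx?id=235
print(route("/doc.aspx?id=1"))  # -> None (not a friendly URL)
```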
Not all pages get indexed; the indexing engine simply deems some pages uninteresting. On our site about 80% of the pages are indexed, which is considered very good for that type of site; very few sites have a higher rate.
As Daniel mentioned, having links to the page is crucial; otherwise it won't be found at all. The page also has to contain some information that is unique to it, and preferably a unique title, or it may be classified as a duplicate.
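If you want to check the title point quickly, a small script can fetch a range of documents and flag repeated <title> values; the domain and id range below are just the placeholders from the question.

```python
# Hedged sketch: fetch a range of documents and report any <title> shared by
# more than one URL, since duplicate titles can get pages treated as duplicates.
# The domain and id range are placeholders from the question.
from collections import defaultdict
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleParser(HTMLParser):
    """Collect the text inside <title> elements."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

titles = defaultdict(list)
for doc_id in range(230, 240):
    url = "http://www.mysite.com/doc.aspx?id=%d" % doc_id
    parser = TitleParser()
    parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))
    titles[parser.title.strip()].append(url)

for title, urls in titles.items():
    if len(urls) > 1:
        print("Duplicate title %r shared by:" % title, urls)
```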