I'm migrating a complex old website to a new one built with CodeIgniter, and I'm running into a lot of URL rewriting problems that lead to duplicate content because of the way CodeIgniter's routes config works.
I have old URLs like these:
- /detail.php?id=ABCDE&lang=en&page=2
- /detail/ABCDE/en/2
The new site instead has SEO-friendly URLs like:
- /en/products/hard-disks-2.html
In my routes config I have:
- $route['(:any)/(:any)/(:any)'] = 'controller/$1/$2/$3';
- $url_suffix is '.html'
This leads to duplicate content because:
- /en/products/hard-disks-2
- /en/products/hard-disks-2.html
- /en/products/hard-disks-2.html?p=2
- /en/products/hard-disks-2?p=2
- /en/products/hard-disks-2.html/
- /en/products/hard-disks-2.html/.html
All of the above are valid routes for CodeIgniter, and this leads to duplicate content within the website.
Is there a way to avoid this? Maybe using regular expressions?
I cannot solve this problem with .htaccess because the website has too many possible URL combinations, and I also have some controllers where I still need to use GET parameters.
I finally figured out how to avoid duplicate URL parsing.
First of all, in config.php remove the suffix (better never to use it): $config['url_suffix'] = '';
Then in routes.php never use wildcards and always use regular expressions.
For example, if I use: $route['(:any)/(:num)'] = 'homepage/parser/$1/$2'; this will match all of the following URLs:
/a/10
/a/10/11
/a/10/11/12
and so on!
Instead:
$route['([\w_-]+)/(\d+)'] = 'homepage/parser/$1/$2';
this only works for
/a/10
and:
$route['([\w_-]+)\.html'] = 'homepage/parser/$1';
will only work if your URLs really end in .html
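The difference between the two route styles can be checked outside the framework. The sketch below assumes the router expands `(:any)` to `.+` and `(:num)` to `[0-9]+` and wraps each route key as `#^key$#`, which is how older CodeIgniter releases are understood to behave; `route_matches` is a hypothetical helper written for this illustration, not a framework function.

```php
<?php
// Hypothetical helper: mimics how older CodeIgniter versions are assumed
// to test a URI against a route key (wildcards expanded, pattern anchored).
function route_matches(string $key, string $uri): bool
{
    $key = str_replace(array(':any', ':num'), array('.+', '[0-9]+'), $key);
    return preg_match('#^' . $key . '$#', $uri) === 1;
}

// Wildcard route: (:any) expands to .+, which also matches slashes.
var_dump(route_matches('(:any)/(:num)', 'a/10'));       // bool(true)
var_dump(route_matches('(:any)/(:num)', 'a/10/11'));    // bool(true)

// Regex route: [\w_-]+ cannot cross a slash, so extra segments fail.
var_dump(route_matches('([\w_-]+)/(\d+)', 'a/10'));     // bool(true)
var_dump(route_matches('([\w_-]+)/(\d+)', 'a/10/11'));  // bool(false)
```

Under these assumptions the wildcard route accepts extra segments because `.+` swallows the slashes, while the explicit character class does not.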
Unfortunately, /a/10.html/ is still a duplicate, so I need at least one .htaccess rule to remove trailing slashes from URLs.
I really need unique URLs, so I think I'm dropping any future CodeIgniter development for this project, where I have mixed URLs: 1) .html 2) directories 3) old dynamic URLs
Instead, I figured out that for SEO purposes it is probably best to: - use only pages without extensions - avoid any directories
So if this is the case (as in another project of mine), I just use plain URLs in my code and regular expressions in routes.php.
The only remaining issue is the trailing-slash duplicate, but this can be avoided globally with the .htaccess from this other solution: Remove trailing slash using .htaccess except for home / landing page
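The linked .htaccess rule is not reproduced here. As an illustration of the same normalization in PHP, here is a minimal sketch; `canonical_path` is a hypothetical helper, and calling it early in the request (for example from the front controller) is an assumption, not something the framework provides.

```php
<?php
// Hypothetical helper: returns the path a trailing-slash URL should be
// redirected to, or null when the path is already canonical.
function canonical_path(string $requestUri): ?string
{
    // Split off the query string so it survives the redirect unchanged.
    $parts = explode('?', $requestUri, 2);
    $path  = $parts[0];
    $query = isset($parts[1]) ? '?' . $parts[1] : '';

    // The home page "/" keeps its slash; everything else loses trailing ones.
    $trimmed = rtrim($path, '/');
    if ($trimmed === '' || $trimmed === $path) {
        return null;
    }
    return $trimmed . $query;
}
```

A caller could then send a permanent redirect when the result is not null, e.g. `header('Location: ' . $target, true, 301);`, so that /a/10.html/ collapses to /a/10.html while the home page is left alone.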