
Where can I get raw news articles from the last year?

Developer https://www.devze.com 2022-12-22 01:46 Source: web

I'm writing some code that calculates certain statistics about word usages.

Does anyone know where I can find a database of raw news articles from various topics over a period of (say) the last year? Preferably they would be either in plain text format or XML. Trying to scrape content from random web sites isn't a good option.

I know going forward I could probably archive them myself. However, I need to kick start the process with a bunch of existing articles... the more the merrier.

Any other ideas for corpus datasets that are easily available in a simple-to-parse form would also be appreciated.


You might try the Internet Archive. They have a text section, but I don't know if it has news. You might also be able to use their Wayback Machine to pull up news articles from major sites using their RSS feeds.
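A rough sketch of the Wayback Machine idea, in Python with only the standard library. It assumes the feed is plain RSS 2.0 and that a snapshot of it exists near the requested date. The helper names (wayback_url, fetch, rss_items, word_counts), the example.com feed address, and the sample feed are made up for illustration. The "id_" suffix after the timestamp is assumed to return the archived file without the Wayback toolbar or link rewriting; drop it if it does not behave that way for your feed.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter


def wayback_url(feed_url, timestamp):
    """Build a Wayback Machine URL for the snapshot closest to `timestamp`.

    `timestamp` is a string such as "20220101" (YYYYMMDD, optionally
    followed by hhmmss). The "id_" suffix is an assumption, see above.
    """
    return f"https://web.archive.org/web/{timestamp}id_/{feed_url}"


def fetch(url):
    """Download a URL and return its body as text (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def rss_items(rss_xml):
    """Extract title, link and description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def word_counts(texts):
    """Count lowercase word occurrences across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Made-up feed content, standing in for fetch(wayback_url(...)).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Markets rally as rates hold</title>
      <link>http://example.com/a</link>
      <description>Stocks rose after rates were left unchanged.</description>
    </item>
    <item>
      <title>Rates expected to rise next year</title>
      <link>http://example.com/b</link>
      <description>Analysts expect higher rates.</description>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    print(wayback_url("http://example.com/rss.xml", "20220101"))
    items = rss_items(SAMPLE_RSS)
    counts = word_counts(i["title"] + " " + i["description"] for i in items)
    print(counts.most_common(3))
```

To cover a whole year you would loop over dates, call fetch(wayback_url(feed, date)) for each one, and de-duplicate items by link. Feeds usually carry only headlines and summaries, so getting full article text would still mean requesting the archived copy of each link.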
