
Looking for dataset to test FULLTEXT style searches on [closed]

Developer https://www.devze.com 2023-01-04 14:35 Source: Internet
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.


Closed 7 years ago.


I am looking for a corpus of text to run some trial full-text searches against: either something I can download or a system that generates it. Something fairly random would be better, e.g. 1,000,000 Wikipedia articles in a format that is easy to insert into a two-column database table (id, text).

Any ideas or suggestions?


Project Gutenberg has 32,000 books available.

Edit: As of now (17.06.16) there are 52,284 free ebooks available to download as UTF-8 plain-text files, on a wide variety of topics (from science to religion). They are also available in EPUB, Kindle, and HTML formats. See Project Gutenberg.
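
If the books are saved locally as UTF-8 `.txt` files, getting them into the two-column (id, text) layout from the question could look roughly like the sketch below. It uses Python's built-in `sqlite3` module; the table name `docs`, the function name `load_text_files`, and the one-file-per-row layout are assumptions for illustration, not anything Project Gutenberg prescribes.

```python
import sqlite3
from pathlib import Path


def load_text_files(conn, directory):
    """Insert every *.txt file in `directory` as one (id, text) row.

    Returns the total number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    # Sort the paths so ids are assigned in a predictable order.
    paths = sorted(Path(directory).glob("*.txt"))
    rows = (
        # errors="replace" keeps the load going if a file has stray bytes.
        (path.read_text(encoding="utf-8", errors="replace"),)
        for path in paths
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO docs (text) VALUES (?)", rows)
    return conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
```

Calling `load_text_files(sqlite3.connect("corpus.db"), "books/")` (both names hypothetical) would fill the table; the same rows could then be exported to whichever database the full-text tests actually target.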


Why not use a Wikipedia dump?
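
A rough sketch of how (id, text) pairs might be pulled out of such a dump, assuming the usual MediaWiki XML export layout in which each `<page>` element has a direct `<id>` child and a `<revision>` containing `<text>`. The helper names `local_name` and `iter_pages` are made up for this example, and the namespace is stripped rather than hard-coded because the export schema version differs between dumps.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (page_id, text) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object. The file is streamed
    with iterparse, so the whole dump is never parsed in one go.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_id = None
        text = ""
        for child in elem:
            name = local_name(child.tag)
            if name == "id":
                # Direct child of <page>: the page id, not the revision id.
                page_id = int(child.text)
            elif name == "revision":
                for sub in child:
                    if local_name(sub.tag) == "text":
                        text = sub.text or ""
        yield page_id, text
        elem.clear()  # drop this page's children once it has been consumed
```

The dumps are normally distributed compressed, so the file object passed in would typically come from something like `bz2.open(path)`. The text is raw wiki markup, which may or may not matter for full-text testing.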


I'll throw this out there since I'm familiar with it - Prosper.com makes their member loan listings available for analysis through an XML export. The export would have about 50,000 loan requests with descriptions and over 1,000,000 member profiles (although many of those are empty).

