How to use Mahout in a Windows environment?

Developer https://www.devze.com 2022-12-28 19:08 Source: the web
I am trying to use Mahout in an application running on Windows. I want to build clusters from a Lucene index using k-means.

As soon as I have to create sequence files (creating vectors from a Lucene index), I get a Hadoop exception, because Hadoop makes command-line calls to programs that do not exist in a Windows environment (e.g. chmod). Running in Cygwin is not an option, since I want to be able to run the app from Eclipse.

So my questions are:

  • Is there a way to avoid having to create sequence files to retrieve my vectors from a Lucene index?
  • Or is there a way to create sequence files in a Windows environment?

  • The only way to run Hadoop in a Windows environment is to install Cygwin. For more info, see this blog post:

    http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/

    Cygwin will provide all the command-line utilities (like chmod) that Hadoop relies on. You can still run your Hadoop jobs from within Eclipse if you want.
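
After installing Cygwin, Hadoop only finds chmod if Cygwin's bin directory (for example C:\cygwin\bin, depending on where it was installed) is on the PATH of the JVM that Eclipse launches. The probe below is a minimal sketch for checking that; the class and method names are hypothetical, not part of Hadoop or Mahout. It starts a process with java.lang.ProcessBuilder, which is roughly how Hadoop's shell utilities invoke external commands, so if the probe cannot start chmod, Hadoop most likely cannot either. It assumes Java 9 or later (for ProcessBuilder.Redirect.DISCARD).

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ChmodProbe {

    /**
     * Hypothetical helper: returns true if the given command can be started
     * from this JVM, i.e. its executable is found on the PATH the JVM sees.
     * The command's exit code is ignored; only launchability is tested.
     */
    static boolean canLaunch(String... command) {
        ProcessBuilder pb = new ProcessBuilder(command)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD);
        try {
            Process p = pb.start();
            // Don't hang forever if the command unexpectedly waits for input.
            if (!p.waitFor(5, TimeUnit.SECONDS)) {
                p.destroyForcibly();
            }
            return true;
        } catch (IOException e) {
            // On Windows without Cygwin on the PATH, this is where an error
            // such as: Cannot run program "chmod" ... shows up.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean found = canLaunch("chmod", "--version");
        System.out.println("chmod can be launched: " + found);
        System.out.println("PATH seen by this JVM: " + System.getenv("PATH"));
    }
}
```

Running this as a Java application from inside Eclipse shows the PATH that Eclipse actually passes to the JVM; if it lacks the Cygwin directory, it can be added in the Environment tab of the run configuration.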


    Do you know the SequenceFile API? Have a look here:

    http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html

    You could try writing and reading the data yourself.

    I think you can run Mahout from Eclipse on Windows in stand-alone mode, but you will run into several shortcomings and obstacles. Try it and see how far you get.

    In my opinion, you shouldn't insist on running Mahout from Eclipse. ;-)


    You can use a virtual machine to run your Hadoop environment. For me, the best solution is the Hortonworks project (http://hortonworks.com/). Everything works pretty well.
