Preparations
The complete source code of the book's examples is available in a separate repository on GitHub, together with short instructions on how to use it. Please note that the book was written and tested against Mahout 0.5, the stable release at the time of publishing, and the master branch of the repository contains the code for this version. There are also separate branches with code modified to work with Mahout 0.6 and 0.7; they are named accordingly.
To obtain the code, you can use either Git or GitHub's "download source" functionality. Here are links for all existing versions: 0.5 (master), 0.6, 0.7. Download and unpack the archive to some location.
To work with the examples, you need to have Apache Maven installed (on Mac OS X or Linux it's better to install it from a package repository). Maven is used to compile the source code and to create packages. The Maven project can also be imported into your favorite Java IDE: Eclipse, NetBeans, or IDEA (I will explain how to use Eclipse, but the process is similar for other IDEs). To use Maven with Eclipse, you need to have the m2eclipse plugin installed; it provides the import and build functionality.
To run the examples from chapter 16, you'll also need to have Apache ZooKeeper installed; see the instructions in the README file in the repository, they're pretty detailed.
You also need to download the Mahout distribution to run some of the examples (usually those that involve executing the mahout script). Download the file mahout-distribution-<version>.tar.gz and unpack it. You can also download the file mahout-distribution-<version>-src.tar.gz, although this isn't necessary (it contains Mahout's source code).
I just want to mention that Mahout works best on Unix-based systems: all examples were tested on Mac OS X and Linux. This also applies to Hadoop, so if you're using Windows, it may be better to install Linux in a virtual machine and use it for all work.
Build examples
To be able to run the examples, you need to build the packages (jar files). From the directory where the examples' source code is located (it should contain the file pom.xml), execute the following command:
mvn package
It will compile the source code and create the packages. Compiled packages are stored in the target directory. Several files are created:
- mia-<version>.jar contains only the examples; to run them you need to specify all dependencies;
- mia-<version>-jar-with-dependencies.jar contains the examples plus all dependencies; this jar can be run without specifying additional classpath elements;
- mia-<version>-job.jar contains the examples plus all dependencies, excluding Hadoop; it should be used for Hadoop jobs.
Importing the examples' source code into Eclipse
Importing the code into Eclipse is very easy: go to the File menu, select the Import... item, then unfold Maven, select Existing Maven Projects from the list, and press Next. Eclipse will ask you where the source code is located; point it to the directory where you unpacked the examples. Eclipse will analyze pom.xml and display a string like /pom.xml com.manning:mia:0.5:jar; you can press Finish after that.
After the import, the project will be opened in Eclipse, and you can look into the source code, modify the examples if needed, and execute them (see below).
If you need to, you can also import the source code of Mahout itself into Eclipse. The procedure is similar, but it may not work for all releases: in some cases Eclipse will report an error that some plugins aren't covered by m2eclipse. You can select the Ignore item in the Quick Fix menu (opened with the right mouse button).
How to run examples
You can run the examples either from the command line or directly from Eclipse.
Run from Eclipse
To run an example from Eclipse, select the needed class in the browser on the left, right-click it, select Run as..., and from the sub-menu select Java Application.
Take into account that some classes need additional parameters; you can customize this by selecting the Run configurations item from the Run as... sub-menu.
For example, the code from chapter 2 expects the file intro.csv to be located in the current directory (the top of the project), while it is actually stored together with the source code, so running it without explicit configuration will lead to an error. To fix this, you need to specify a non-default working directory for these examples: go to Run configurations and select the Arguments tab in the dialog window. Then change the Working directory parameter from Default to Other, press the Workspace... button, and select the src/main/java/mia/recommender/ch02 directory in the tree view. After that you can press the Run button, and the example will be executed without errors.
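To see why the working directory matters, here is a minimal sketch (my illustration, not code from the book or the repository; the class name WorkingDirCheck is hypothetical). Java resolves a relative file name such as intro.csv against the JVM's working directory, which is exposed as the user.dir system property, and the sketch prints where the file would be looked up:

```java
import java.io.File;

// Hypothetical helper for illustration; not part of the MiA repository.
final class WorkingDirCheck {

    // Builds the location that a relative name resolves to:
    // the JVM's working directory ("user.dir") plus the relative name.
    static File resolve(String relativeName) {
        return new File(System.getProperty("user.dir"), relativeName);
    }

    public static void main(String[] args) {
        File csv = resolve("intro.csv");
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Looking for:       " + csv.getPath());
        System.out.println("Exists:            " + csv.isFile());
    }
}
```

If the last line reports false, the working directory is not the one that contains intro.csv, and you need to either change the working directory or copy the file.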
Run from command-line
You can run the examples from the command line either by using java directly or by using Maven's exec plugin.
To run the examples with java, you need to put the package with all dependencies on the classpath and specify the name of the class to execute, like this:
java -cp target/mia-0.5-jar-with-dependencies.jar mia.recommender.ch02.IREvaluatorIntro
But to run the examples this way, you need to rebuild the package every time you change the code. From this perspective, Maven's exec plugin is handier: it automatically recompiles the changed code and executes it without packaging everything again. To execute your class, issue the following command (for this example, you need to copy the intro.csv file to the top-level directory, or it will fail):
mvn exec:java -Dexec.mainClass="mia.recommender.ch02.IREvaluatorIntro"
If your class accepts command-line parameters, you can specify them using the plugin's exec.args parameter:
mvn exec:java -Dexec.mainClass="mia.recommender.ch02.IREvaluatorIntro" -Dexec.args="src"
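If you want to check what your class actually receives, a small sketch like the following can help. The ArgsEcho class is hypothetical and is not part of the examples (you would have to add it to the project yourself); it simply prints the contents of the args array passed to main:

```java
// Hypothetical class for illustration; it is not part of the MiA examples.
final class ArgsEcho {

    // Formats each argument together with its index, one per line.
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append("args[").append(i).append("] = ").append(args[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("No arguments given");
            return;
        }
        System.out.print(describe(args));
    }
}
```

Assuming the class sits in the default package of the project, running it through the exec plugin with -Dexec.mainClass="ArgsEcho" -Dexec.args="src" should show a line args[0] = src among Maven's own output.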
Conclusion
So, I hope that this article helped you to get started with the Mahout in Action examples. Most of the examples should work as described here; some require more work, but you can find instructions for them in the README file in the source code repository.
If you still have questions, I will try to answer them ;-)
13 comments:
Thanks for your post. Great writeup. This might be a silly question, but where is the output printed when you execute using maven. I am running the first example in Ch2, and am new to maven.
It will print everything to the console, although it can also print a lot of unnecessary information, like "preparing exec:java", etc. But you can make it quiet with the -q option.
Although this isn't always handy: for example, if there is an error during execution, you'll see only a "Build failed" message. In this case, you'll need to re-run the code without the -q option to see the backtrace.
P.S. for the ch02 examples, you need to copy the intro.csv file into the top-level directory, or the examples won't find it.
I see it embedded in the verbose output now, thanks! I didn’t know where to look for it! :-)
Thanks. It's helpful.
where do we get the intro.csv file
it's in repository, in src/main/java/mia/recommender/ch02/intro.csv
Thank you Alex, I successfully executed some of the mia examples thanks to your blog!
However, could you please explain how mia knows where to find the mahout libraries? I can't find them anywhere in the mia eclipse project.
Also how would I go about if I wanted to create my own application using Mahout?
many thanks
Fred
Eclipse knows where to find Mahout libraries, because it uses information from Maven, and maven automatically downloads all necessary dependencies.
If you want to create a new project, then create a new Maven project and add the Mahout core library as a dependency. You can use org.apache.mahout as groupId, mahout-core as artifactId, and 0.7 as version.
You can find more information about maven at http://www.sonatype.com/books/mvnref-book/reference/
Hi Alex,
thanks for your help! I have now set up my own maven project and I used the code from listing 2.1 just to see if I can set up a recommender myself. Maven seems to pick up all the dependencies needed to compile the code, however there seem to be some runtime dependencies that are not picked up by maven. I get the following exception at runtime:
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
Do you have any suggestions?
I really appreciate your help. Mahout seems like a powerful tool.
Fred
it looks like org.slf4j wasn't added to dependencies - can you check that you have the same dependencies as in pom.xml for MiA examples...
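One quick way to check this (a sketch, not something from the book; ClasspathCheck is a hypothetical name) is to ask the JVM whether it can load the class from the same classpath your application runs with:

```java
// Hypothetical diagnostic for illustration; not part of the MiA examples.
final class ClasspathCheck {

    // Returns true if the named class can be found by this class's class loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the exception; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "org.slf4j.LoggerFactory";
        System.out.println(name + (isOnClasspath(name) ? " is" : " is NOT") + " on the classpath");
    }
}
```

If it reports that org.slf4j.LoggerFactory is not on the classpath, the slf4j-api jar (which contains this class) is missing from the runtime dependencies.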
Hi Alex, I want to run the Chapter 6 mappers and reducers from MiA, but I am not able to. It would be great if you could help with this.
what kind of problems do you have? All commands that are in book, were checked against hadoop...