Preparations
The complete source code of the examples from the book is available in a separate repository on GitHub, together with short instructions on how to use it. Please note that the book was written and tested for Mahout 0.5, the stable release at the time of publishing, and the master branch of the repository contains code for this version. There are also separate branches with code modified to work with Mahout 0.6 and 0.7; they are named accordingly. To obtain the code, you can either use Git or use GitHub's "download source" functionality. Here are links for all existing versions: 0.5 (master), 0.6, 0.7. Download and unpack the archive to some location.
To work with the examples, you need to have Apache Maven installed (on Mac OS X or Linux it's better to install it from your system's package repository). Maven is used to compile the source code and to create packages. The Maven project can also be imported into your favorite Java IDE: Eclipse, NetBeans, or IDEA (I will explain how to use Eclipse, but the process is similar for other IDEs). To use Maven with Eclipse, you need to have the m2eclipse plugin installed; it provides the import and build functionality.
To run the examples from chapter 16, you'll also need to have Apache ZooKeeper installed. See the instructions in the README file in the repository; they're quite detailed.
You also need to download the Mahout distribution to run some examples (usually those that involve executing the mahout script). Download the file mahout-distribution-<version>.tar.gz and unpack it. You can also download the file mahout-distribution-<version>-src.tar.gz, which contains Mahout's source code, although this isn't necessary.
I just want to mention that Mahout works best on Unix-based systems; all examples were tested on Mac OS X and Linux. This also applies to Hadoop, so if you're using Windows, it may be better to install Linux in a virtual machine and use it for all your work.
Build examples
To be able to run the examples, you need to build the packages (jar files). From the directory where the source code of the examples is located (this directory should contain the file pom.xml), execute the following command:
This will compile the source code and create the packages. The compiled packages are stored in the target directory. Several files are created:
mia-<version>.jar contains only the examples; to run them, you need to specify all dependencies on the classpath;
mia-<version>-jar-with-dependencies.jar contains the examples plus all dependencies; this jar can be run without specifying additional classpath elements;
mia-<version>-job.jar contains the examples plus all dependencies except Hadoop; it should be used for Hadoop jobs.
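To check what a given classpath actually provides, a small diagnostic can ask the JVM whether a class can be loaded. The sketch below is a hypothetical helper, not part of the book's examples. It assumes org.apache.hadoop.conf.Configuration as a representative Hadoop class. Following the descriptions above, that class should be found when you run with the jar-with-dependencies package, and should be missing when the job jar is the only thing on the classpath.

```java
// Hypothetical diagnostic, not part of the book's code: reports whether
// selected classes can be loaded from the current classpath.
public class ClasspathCheck {

    // Returns true if the named class can be loaded; static initializers are not run.
    // A LinkageError (e.g. a missing superclass) is also treated as "not usable".
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed representative classes; pass other class names as arguments to check those instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.conf.Configuration",
            "mia.recommender.ch02.IREvaluatorIntro"
        };
        for (String name : names) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```

Running it with different -cp values, in the same way as the java command used later for the examples, shows which jar provides which classes.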
Import of examples' source code into Eclipse
Importing the code into Eclipse is very easy. Go to the File menu, select the Import... item, then unfold Maven, select Existing Maven Projects from the list, and press Next. Eclipse will ask you where the source code is located; point it to the directory where you unpacked the examples. Eclipse will analyze pom.xml and display a string like /pom.xml com.manning:mia:0.5:jar. You can press Finish after that.
After the import, the project will be opened in Eclipse. You can then browse the source code, modify the examples if you need to, and execute them (see below).
If needed, you can also import the source code of Mahout itself into Eclipse. The procedure is similar, but it may not work for all releases. In some cases, you will get an error saying that some plugins aren't covered by m2eclipse; you can then select the Ignore item in the Quick Fix menu (opened with a right mouse click).
How to run examples
You can run the examples either from the command line or directly from Eclipse.
Run from Eclipse
To run an example from Eclipse, select the class you need in the browser on the left and right-click it. Then select Run as..., and from the submenu, select Java Application.
Note that some classes need additional parameters. You can specify them by selecting the Run configurations item from the Run as... submenu.
For example, the code from chapter 2 expects the file intro.csv to be located in the current directory (the top of the project). However, the file is actually located together with the source code, so running the example without explicit configuration leads to an error. To fix this, you need to specify a non-default working directory for these examples. Go to Run configurations and select the Arguments tab in the dialog window. Then change the Working directory parameter from Default to Other, press the Workspace... button, and select the src/main/java/mia/recommender/ch02 directory in the tree view. After that, you can press the Run button, and your example will execute without errors.
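The error occurs because Java resolves a relative file name such as intro.csv against the process working directory (the user.dir system property), not against the location of the source file. The hypothetical helper below is not part of the book's code. It prints where a relative name would be looked up, so running it with the default working directory and again with the changed one shows the difference.

```java
import java.io.File;

// Hypothetical helper, not part of the book's code: shows where a relative
// file name such as "intro.csv" is looked up by the JVM.
public class WorkingDirCheck {

    // Resolves a relative name against the current working directory (user.dir).
    static File resolve(String name) {
        return new File(name).getAbsoluteFile();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "intro.csv";
        File file = resolve(name);
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println(file.getPath() + (file.exists() ? " exists" : " was NOT found"));
    }
}
```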
Run from command-line
You can run the examples from the command line either by using java directly or by using Maven's exec plugin.
To run the examples with java, you need to put the package with all dependencies on the classpath and specify the name of the class to execute, like this:
java -cp target/mia-0.5-jar-with-dependencies.jar mia.recommender.ch02.IREvaluatorIntro
However, to run the examples this way, you need to rebuild the package after every change. From this perspective, Maven's exec plugin is handier: it automatically recompiles changed code and executes it without repackaging everything. To execute your class, issue the following command (for this example, you need to copy the intro.csv file to the top-level directory, or it will fail):
mvn exec:java -Dexec.mainClass="mia.recommender.ch02.IREvaluatorIntro"
If your class accepts command-line parameters, you can specify them with the plugin's exec.args parameter:
mvn exec:java -Dexec.mainClass="mia.recommender.ch02.IREvaluatorIntro" -Dexec.args="src"
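On the Java side, the value of exec.args reaches the class as the ordinary String[] args of its main method (the plugin splits the value on whitespace). The sketch below is hypothetical and is not the book's IREvaluatorIntro. Treating the first argument as a directory that contains intro.csv is only an assumption made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class, not taken from the book: shows how values passed
// with -Dexec.args="..." arrive in main(String[] args).
public class ArgsDemo {

    // Assumption for illustration: args[0], if present, names a directory holding intro.csv.
    // Without arguments, the file is looked up in the current working directory.
    static Path dataFile(String[] args) {
        return args.length > 0 ? Paths.get(args[0], "intro.csv") : Paths.get("intro.csv");
    }

    public static void main(String[] args) {
        System.out.println("Received " + args.length + " argument(s)");
        Path file = dataFile(args);
        System.out.println(file.toAbsolutePath() + (Files.exists(file) ? " exists" : " was NOT found"));
    }
}
```

With -Dexec.args="src", args[0] is "src", so the file is looked up as src/intro.csv. The path is relative to the working directory, which is typically the directory where mvn was started.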
Conclusion
I hope this article has helped you get started with the Mahout in Action examples. Most of the examples should work as described here. Some require more work, but you can find instructions for them in the README file in the source code repository.
If you still have questions, I'll try to answer them ;-)