Apache Mahout is an Apache project to produce free implementations of distributed or otherwise scalable machine learning algorithms on the Hadoop platform.
I installed mahout-distribution-0.4-src on the following versions of Linux, Java and Hadoop.
UBUNTU 12.04 LTS
JAVA 1.7.0_09
HADOOP 1.1.0
Install Maven using the command below.
apt-get install maven2
After installation, check the Maven version using the command mvn -version
Set the Maven environment variables as shown below.
export MAVEN_HOME="/usr/share/maven2"
export PATH=$PATH:$MAVEN_HOME/bin
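If you put these exports in a file that is sourced more than once (for example ~/.bashrc), PATH can pick up the same entry repeatedly. Here is a minimal sketch of a guarded version. The helper name path_contains is my own invention for illustration, and the default /usr/share/maven2 is simply the location used above.

```shell
# Hypothetical helper (not part of Maven): succeeds if directory $1 is already in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: Maven lives in /usr/share/maven2 unless MAVEN_HOME is already set.
MAVEN_HOME="${MAVEN_HOME:-/usr/share/maven2}"
export MAVEN_HOME

# Append the Maven bin directory only when it is not there yet.
if ! path_contains "$MAVEN_HOME/bin"; then
    export PATH="$PATH:$MAVEN_HOME/bin"
fi
```

Sourcing this snippet a second time leaves PATH unchanged.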
I have hduser as a dedicated Hadoop system user, and Hadoop is installed in the /home/hduser/hadoop folder. I am going to install Mahout in the /home/hduser folder. Change to the hduser home directory and execute the commands below.
Download Mahout from the URL below using wget.
wget http://apache.techartifact.com/mirror/mahout/0.4/mahout-distribution-0.4-src.tar.gz
[Out of the many archives listed there, download the -src one]
Extract the tar file.
sudo tar xzf mahout-distribution-0.4-src.tar.gz
Rename the extracted folder to mahout.
sudo mv mahout-distribution-0.4-src mahout
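The extract and rename steps can also be wrapped in one small function. This is only a sketch: extract_as is a hypothetical helper name, and it assumes the archive unpacks into a single folder named after the archive file (as mahout-distribution-0.4-src.tar.gz does here). It works in the current directory and refuses to run if the target name already exists, because mv would otherwise move the extracted folder inside the existing one.

```shell
# Hypothetical helper: extract a .tar.gz in the current directory and rename the result.
# Assumption: the archive contains one top-level folder named like the archive itself.
extract_as() {
    archive=$1                                 # e.g. mahout-distribution-0.4-src.tar.gz
    target=$2                                  # e.g. mahout
    extracted=$(basename "$archive" .tar.gz)   # folder name expected after extraction

    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting" >&2
        return 1
    fi

    tar xzf "$archive" && mv "$extracted" "$target"
}

# Example call (commented out so that nothing runs by accident):
# extract_as mahout-distribution-0.4-src.tar.gz mahout
```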
Now go to the mahout folder and execute the command below.
mvn install
That's it, now we can run the Mahout examples.
After every example you have to clear the output directory and the tmp directory. Otherwise it gives an error like: Output directory temp/itemIDIndex already exists.
For me those directories are
/home/hduser/output [Specified in --output option]
/home/hduser/temp [Specified in hadoop.tmp.dir of core-site.xml file]
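To avoid clearing these by hand before each run, something like the sketch below can help. clean_dirs is a hypothetical helper name of my own, and it assumes the directories are on the local filesystem and hold nothing else you need. Check what is stored under your hadoop.tmp.dir before deleting it.

```shell
# Hypothetical helper: delete each given directory if it exists (local filesystem only).
# Assumption: for directories stored in HDFS, hadoop fs -rmr <path> would be used instead.
clean_dirs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            rm -rf -- "$dir"
            echo "removed $dir"
        fi
    done
}

# Example call with the directories from this post (commented out on purpose):
# clean_dirs /home/hduser/output /home/hduser/temp
```

Directories that do not exist are skipped silently, so the call is safe to repeat before every example.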