Recently I started learning Mahout using Mahout in Action. The first topic I chose was unsupervised learning, and the very first example I found was about clustering. I was able to code the K-Means clustering example and compile it successfully. Then I followed the standard procedure of creating a jar and running it on the Hadoop cluster to submit a MapReduce clustering job. But it did not work, and I kept getting a class-definition-not-found error (NoClassDefFoundError).
After some googling I realized that we can install Maven, build the project with it, and then execute it.
But I was looking for a way to run Mahout code on Hadoop without using Maven. The following is the class that I coded:
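An error of this kind usually means that a class which was available at compile time is not visible on the classpath at run time. As a rough diagnostic, a small probe like the sketch below can report which classes the running JVM can actually load. `ClasspathProbe` and `isLoadable` are hypothetical names made up for this illustration; they are not part of Hadoop or Mahout.

```java
// Hypothetical diagnostic helper, not part of Hadoop or Mahout.
class ClasspathProbe {

    // Returns true if the named class can be located and loaded by this JVM.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only load the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised by a missing dependency
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to two classes used by the clustering example if no names are given
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.mahout.clustering.kmeans.KMeansDriver"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

Launching the probe the same way as the failing job shows which of the listed classes are missing from the run-time classpath.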
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.Cluster;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

//public class SimpleKMeansClustering extends Configured implements Tool
public class SimpleKMeansClustering
{
    // Nine 2-D sample points forming two obvious groups
    public static final double[][] points = { {1, 1}, {2, 1}, {1, 2},
                                              {2, 2}, {3, 3}, {8, 8},
                                              {9, 8}, {8, 9}, {9, 9}};

    // Writes the vectors to a SequenceFile keyed by record number
    public static void writePointsToFile(List<Vector> points,
                                         String fileName,
                                         FileSystem fs,
                                         Configuration conf) throws IOException
    {
        Path path = new Path(fileName);
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class, VectorWritable.class);
        long recNum = 0;
        VectorWritable vec = new VectorWritable();
        for (Vector point : points)
        {
            vec.set(point);
            writer.append(new LongWritable(recNum++), vec);
        }
        writer.close();
    }

    // Converts the raw double arrays into Mahout vectors
    public static List<Vector> getPoints(double[][] raw)
    {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++)
        {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String args[]) throws Exception
    {
        int k = 2; // number of clusters
        List<Vector> vectors = getPoints(points);

        File testData = new File("testdata");
        if (!testData.exists())
        {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists())
        {
            testData.mkdir();
        }

        // Write the input points
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        writePointsToFile(vectors, "testdata/points/file1", fs, conf);

        // Write the initial clusters, using the first k points as centers
        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, Cluster.class);
        for (int i = 0; i < k; i++)
        {
            Vector vec = vectors.get(i);
            Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        // Run K-Means with convergence delta 0.001 and at most 10 iterations
        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"), new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10, true, false);

        // Read back the clustered points and print the cluster of each one
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
            new Path("output/" + Cluster.CLUSTERED_POINTS_DIR
                     + "/part-m-00000"), conf);
        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value))
        {
            System.out.println(value.toString() + " belongs to cluster "
                               + key.toString());
        }
        reader.close();
    }
}
Then I compiled the Java file to create a class file:
javac -cp "/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/mahout/*:/usr/lib/mahout/lib/*" SimpleKMeansClustering.java
The above command created my class file: SimpleKMeansClustering.class
Now, as I did not want to use Maven to build my project, I looked at Sean Owen's comment:
Use the "job" JAR file provided by Mahout. It packages up all the dependencies. You need to add your classes to it too.
So I went to my Mahout installation directory:
> cd /usr/lib/mahout
> ls /usr/lib/mahout/
bin examples mahout-core-0.5-cdh3u3.jar mahout-examples-0.5-cdh3u3.jar mahout-math-0.5-cdh3u3.jar mahout-utils-0.5-cdh3u3.jar
conf lib mahout-core-0.5-cdh3u3-job.jar mahout-examples-0.5-cdh3u3-job.jar mahout-taste-webapp-0.5-cdh3u3.war
I copied the job jar from the listing above, mahout-core-0.5-cdh3u3-job.jar, to the directory where my Mahout code resides.
Then I added my class file to the main jar file, mahout-core-0.5-cdh3u3-job.jar, using the first command below, and ran the jar by invoking my class SimpleKMeansClustering. My code executed properly in MapReduce fashion and generated the following output:
> jar uf mahout-core-0.5-cdh3u3-job.jar SimpleKMeansClustering.class
> hadoop jar mahout-core-0.5-cdh3u3-job.jar SimpleKMeansClustering
1.0: [1.000, 1.000] belongs to cluster 0
1.0: [2.000, 1.000] belongs to cluster 0
1.0: [1.000, 2.000] belongs to cluster 0
1.0: [2.000, 2.000] belongs to cluster 0
1.0: [3.000, 3.000] belongs to cluster 0
1.0: [8.000, 8.000] belongs to cluster 1
1.0: [9.000, 8.000] belongs to cluster 1
1.0: [8.000, 9.000] belongs to cluster 1
1.0: [9.000, 9.000] belongs to cluster 1
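To see why the points split this way, the same assignment can be reproduced with a small in-memory K-Means sketch in plain Java. This is not Mahout's implementation and involves no MapReduce. It only assumes the same setup as the job: the nine sample points, k = 2, the first k points as initial centers, Euclidean distance, a convergence delta of 0.001 and at most 10 iterations. `PlainKMeansSketch` and its methods are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Mahout: plain in-memory K-Means.
class PlainKMeansSketch {

    static final double[][] POINTS = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3}, {8, 8}, {9, 8}, {8, 9}, {9, 9}
    };

    // Euclidean distance between two points of equal dimension
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Index of the center closest to p (ties go to the lower index)
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (distance(p, centers[c]) < distance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Returns the cluster index of every point after the iterations finish
    static int[] cluster(double[][] points, int k, double delta, int maxIterations) {
        int dims = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone(); // first k points are the initial centers
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += p[d];
                }
            }
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double[] updated = new double[dims];
                for (int d = 0; d < dims; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                if (distance(updated, centers[c]) > delta) {
                    converged = false;
                }
                centers[c] = updated;
            }
            if (converged) {
                break; // no center moved by more than delta
            }
        }
        int[] assignment = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            assignment[i] = nearest(points[i], centers);
        }
        return assignment;
    }

    public static void main(String[] args) {
        int[] assignment = cluster(POINTS, 2, 0.001, 10);
        for (int i = 0; i < POINTS.length; i++) {
            System.out.println(Arrays.toString(POINTS[i]) + " belongs to cluster " + assignment[i]);
        }
    }
}
```

With these inputs the centers settle at (1.8, 1.8) and (8.5, 8.5) after two updates, so the first five points land in cluster 0 and the last four in cluster 1, matching the job output.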