Cloud Computing for Data Analysis
-------------------------------------------------
EXTRA CREDIT EXERCISE - Exercise 11: Support Vector Machine - Instructions
_1. Create a Maven project in Eclipse for Spark
_1.1 Open Eclipse IDE for Scala
_1.2 File -> New -> Project
_1.3 Select Maven -> Maven Project -> Click Next
_1.4 Enable the following checkbox - Create a simple project
_1.5 Enter a Group Id and an Artifact Id, for example: svm6190
_1.6 Click Finish
_2. Right click on the project folder -> Configure -> Add Scala Nature
_3. Right click on src/main/java -> Refactor -> Rename, and rename the folder to scala
_4. Add the following dependencies in pom.xml
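    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.2.0</version>
      </dependency>
    </dependencies>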
_5. Right click on the project folder -> Build Path -> Configure Build Path -> Scala Compiler -> Enable the checkbox (Use Project Settings).
Select Scala Installation -> Latest 2.11 bundle (dynamic). Then click Apply -> OK -> OK
_6. Copy the files svmdriver.scala and SVMMultiClass into the src/main/scala folder of the project.
The code is available at https://webpages.uncc.edu/aatzache/ITCS6190/Project/SVM.zip
_7. Right click on the project folder -> Run As -> Maven clean
_8. Right click on the project folder -> Run As -> Maven install
_9. The jar file is generated in the target folder of the project.
_10. Copy the .jar file from the target folder.
_11. Create a cluster with Hadoop and Spark in AWS and start the cluster.
Once the cluster is running, log in to the master node using PuTTY (Windows) or SSH (Mac or Linux).
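On Mac or Linux, the login in step 11 might look like the sketch below. It assumes an Amazon EMR cluster, where the default login user is hadoop. KEY_PAIR.pem and MASTER_PUBLIC_DNS are placeholders that do not appear in these instructions: substitute your own key pair file and the master node's public DNS name shown in the AWS console.

```shell
# Restrict the key file's permissions; ssh rejects keys that other users can read
chmod 400 KEY_PAIR.pem

# Log in to the master node (assumption: EMR's default "hadoop" user)
ssh -i KEY_PAIR.pem hadoop@MASTER_PUBLIC_DNS
```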
_12. Create a data bucket in AWS S3. Upload the Car Data file and the .jar file to S3.
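The bucket can also be created and filled from the command line instead of the S3 web console. This is only a sketch: it assumes the AWS CLI is installed and configured with your credentials on the machine that holds the files, that the commands are run from the project folder, and that the Car Data file is named data.txt as in step 14. BUCKET_NAME and JAR_NAME are the same placeholders used in steps 13 and 14.

```shell
# Create the bucket (bucket names must be globally unique)
aws s3 mb s3://BUCKET_NAME

# Upload the Car Data file and the jar built in step 9
aws s3 cp data.txt s3://BUCKET_NAME/
aws s3 cp target/JAR_NAME.jar s3://BUCKET_NAME/

# Confirm that both objects are in the bucket
aws s3 ls s3://BUCKET_NAME/
```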
_13. From the master node, download the .jar file using the command:
aws s3 cp s3://BUCKET_NAME/JAR_NAME.jar .
_14. Run the .jar file from your terminal or PuTTY using the following command:
spark-submit --class svmdriver --master yarn --deploy-mode client JAR_NAME.jar s3://BUCKET_NAME/data.txt
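Because the terminal text has to be saved for submission, it may help to capture it while the job runs. The sketch below pipes the same spark-submit call through tee. The file name terminal_log.txt is a hypothetical choice, and JAR_NAME.jar is the jar downloaded in step 13.

```shell
# 2>&1 merges Spark's log messages (stderr) with the program output (stdout);
# tee prints everything on screen and also writes it to terminal_log.txt
spark-submit --class svmdriver --master yarn --deploy-mode client \
  JAR_NAME.jar s3://BUCKET_NAME/data.txt 2>&1 | tee terminal_log.txt
```

In client deploy mode the driver runs on the master node, so its printed results appear in this terminal and end up in the log file.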
_15. Copy the output (Confusion Matrix, Accuracy, Precision, Recall, and F-measure metrics) from the terminal to a text file. Name the text file Output.txt
Also save the full text of the terminal command window to a separate text file.
SUBMIT Output.txt and the terminal command window text file on Canvas.
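As a sanity check on the reported numbers, the metrics can be recomputed by hand from the confusion matrix. The formulas below are the standard per-class definitions, not taken from the course code. They assume that entry C_{ij} counts examples whose actual class is i and whose predicted class is j (rows are actual, columns are predicted) and that there are K classes. The program may print the matrix transposed or report averaged values, so treat this as a guide only.

```latex
\[
\text{Accuracy} = \frac{\sum_{k=1}^{K} C_{kk}}{\sum_{i=1}^{K}\sum_{j=1}^{K} C_{ij}}
\]
\[
\text{Precision}_k = \frac{C_{kk}}{\sum_{i=1}^{K} C_{ik}}, \qquad
\text{Recall}_k = \frac{C_{kk}}{\sum_{j=1}^{K} C_{kj}}
\]
\[
F_k = \frac{2\,\text{Precision}_k\,\text{Recall}_k}{\text{Precision}_k + \text{Recall}_k}
\]
```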
* Delete/Terminate the AWS cluster and delete all files from S3 when finished; otherwise Amazon will charge your credit card.
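The cleanup can also be done from the command line. This sketch assumes the AWS CLI is configured with your credentials and that the cluster was created with Amazon EMR. CLUSTER_ID is a placeholder for the ID printed by the list command, and BUCKET_NAME is the bucket from step 12.

```shell
# Delete every object in the bucket, then remove the now-empty bucket
aws s3 rm s3://BUCKET_NAME --recursive
aws s3 rb s3://BUCKET_NAME

# List active EMR clusters to find the cluster ID
aws emr list-clusters --active

# Terminate the cluster; replace CLUSTER_ID with the ID found above
aws emr terminate-clusters --cluster-ids CLUSTER_ID
```

Afterwards, confirm in the AWS console that the cluster shows as terminated and the bucket is gone.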