
Standalone Spark Installation
Go to http://archive.apache.org/dist/spark/ and pick a Spark release to download; this walkthrough uses a pre-built binary.

Choose spark-1.6.1-bin-hadoop2.6.tgz.

Extract it into /opt/:
tar -xvzf spark-1.6.1-bin-hadoop2.6.tgz -C /opt/
[root@hdp spark-1.6.1-bin-hadoop2.6]# cd /opt/
[root@hdp opt]# ls
rh spark-1.6.1-bin-hadoop2.6
[root@hdp opt]#
Configure the environment variables:
vim /etc/profile
Add the SPARK_HOME entries:
export JAVA_HOME=/usr/local/java/jdk1.8.0_211 # JDK install directory
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export SPARK_HOME=/opt/spark-1.6.1-bin-hadoop2.6
export PATH=$PATH:${JAVA_PATH}:${SPARK_HOME}/bin
Reload the environment variables:
source /etc/profile
Run spark-shell to verify (type exit to quit), then pyspark (type exit() to quit).
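As an optional sanity check (a sketch that only inspects the environment, it does not launch Spark), you can confirm from Python that SPARK_HOME is visible to newly started processes:

```python
# Quick sanity check that SPARK_HOME from /etc/profile is visible
# to a fresh process. Falls back to the install path used above if
# the variable happens to be unset in this shell.
import os

spark_home = os.environ.get("SPARK_HOME", "/opt/spark-1.6.1-bin-hadoop2.6")
shell_path = os.path.join(spark_home, "bin", "spark-shell")
print(shell_path)
```

If the printed path exists on disk, the profile changes took effect for this shell.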

At this point the installation is complete.

Configuring PySpark
Copy the pyspark directory from the python directory of the Spark installation into Python's site-packages:
[root@hdp python]# pwd
/opt/spark-1.6.1-bin-hadoop2.6/python
[root@hdp python]# ls
docs lib pyspark run-tests run-tests.py test_support
[root@hdp python]# python --version
Python 2.6.6
[root@hdp python]# whereis python
python: /usr/bin/python /usr/bin/python2.6 /usr/lib/python2.6 /usr/lib64/python2.6 /usr/include/python2.6 /usr/share/man/man1/python.1.gz
[root@hdp python]# cp -r pyspark /usr/lib/python2.6/site-packages/
[root@hdp python]#
Finally, run pip install py4j and the environment is ready for developing PySpark programs in Python; try it out in Jupyter or a similar tool.
Spark Example Programs
Beginners can study Spark's bundled example code, which ships in Java, Python, R, and Scala versions:
[root@hdp python]# pwd
/opt/spark-1.6.1-bin-hadoop2.6/examples/src/main/python
[root@hdp python]# ls
als.py hbase_outputformat.py pagerank.py status_api_demo.py
avro_inputformat.py kmeans.py parquet_inputformat.py streaming
cassandra_inputformat.py logistic_regression.py pi.py transitive_closure.py
cassandra_outputformat.py ml sort.py wordcount.py
hbase_inputformat.py mllib sql.py
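For instance, pi.py in this directory estimates π by Monte Carlo sampling distributed over Spark workers. Its core logic, sketched here serially in plain Python (no SparkContext, so this is the arithmetic only, not a Spark run):

```python
# The Monte Carlo estimate that pi.py distributes over Spark, done
# serially: sample random points in the unit square and count how
# many land inside the quarter unit circle; that fraction times 4
# approximates pi.
import random

def estimate_pi(num_samples, seed=0):
    random.seed(seed)  # seeded so the sketch is reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100000))  # close to 3.14
```

pi.py does the same thing, but maps the sampling over an RDD and sums the hit counts with a reduce, so the work spreads across executors.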