Added support for running Hadoop in single node mode. Changed README accordingly

Giannis Mouchakis 2016-03-10 02:40:42 +02:00
parent 6634ef00fb
commit 2b3b4b1205
2 changed files with 30 additions and 13 deletions

README

@@ -1,27 +1,42 @@
-This is a Hadoop cluster running in docker containers. The namenode and datanodes run in different containers.
-The cluster by default uses data replication "2". To change it edit the hdfs-site.xml file.
+# Hadoop Docker
+This repository provides Hadoop in Docker containers. You can either run Hadoop in a single node or create a cluster.
+The deployed Hadoop uses data replication "2". To change it, edit the hdfs-site.xml file.
+All data are stored in /hdfs-data, so to store data in a host directory run the container with "-v /path/to/host:/hdfs-data".
+By default the container formats the namenode directory only if it does not already exist (hdfs namenode -format -nonInteractive).
+If you want to mount an external directory that already contains a namenode directory and format it, you have to delete it manually first.
+
+## Single node mode
+To deploy a single Hadoop node run
+docker run -h namenode bde2020/hadoop-base
+To store data in a host directory run the container as
+docker run -h namenode -v /path/to/host:/hdfs-data bde2020/hadoop-base
+
+## Cluster mode
+The namenode runs in a separate container from the datanodes.
 To start the namenode run
 docker run --name namenode -h namenode bde2020/hadoop-namenode
-To start two datanodes on the same host run
-docker run --name datanode1 --link namenode:namenode bde2020/hadoop-datanode
-docker run --name datanode2 --link namenode:namenode bde2020/hadoop-datanode
-More info is comming soon on how to run hadoop docker using docker network and docker swarm
-All data are stored in /hdfs-data, so to store data in a host directory datanodes as
-docker run --name datanode1 --link namenode:namenode -v /path/to/host:/hdfs-data bde2020/hadoop-datanode
-docker run --name datanode2 --link namenode:namenode -v /path/to/host:/hdfs-data bde2020/hadoop-datanode
-By default the namenode formats the namenode directory only if not exists (hdfs namenode -format -nonInteractive).
-If you want to mount an external directory that already contains a namenode directory and format it you have to first delete it manually.
-Hadoop namenode listens on
+To add a datanode to the cluster run
+docker run --link namenode:namenode bde2020/hadoop-datanode
+Use the same command to add more datanodes to the cluster.
+More info is coming soon on how to deploy a Hadoop cluster using docker network and docker swarm.
+
+## Access the namenode
+The namenode listens on
 hdfs://namenode:8020
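The replication factor the README points at is the standard HDFS property dfs.replication in hdfs-site.xml, which the README never names. A minimal sketch, assuming you edit the file in this repository and rebuild the image; the value "1" and the local tag are illustrative:

# illustrative only: the repo's hdfs-site.xml may set other properties
# (e.g. the /hdfs-data paths), so in practice edit the existing file
# rather than overwriting it wholesale
cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
docker build -t my-hadoop-base .  # hypothetical local tag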

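The "coming soon" docker network workflow could look like the sketch below; the network name "hadoop" is an assumption, and on a user-defined network containers resolve each other by container name, so the legacy --link flags are unnecessary:

# sketch: run the cluster on a user-defined bridge network instead of --link
docker network create hadoop
docker run -d --net hadoop --name namenode -h namenode bde2020/hadoop-namenode
docker run -d --net hadoop bde2020/hadoop-datanode
# any container on the network resolves "namenode" by name, e.g. to list the
# HDFS root (assuming the image puts hdfs on the PATH)
docker run --rm --net hadoop bde2020/hadoop-base hdfs dfs -ls hdfs://namenode:8020/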
Dockerfile

@@ -26,3 +26,5 @@ RUN mv hadoop-$HADOOP_VERSION $HADOOP_PREFIX
 # add configuration files
 ADD core-site.xml $HADOOP_CONF_DIR/core-site.xml
 ADD hdfs-site.xml $HADOOP_CONF_DIR/hdfs-site.xml
+
+CMD hdfs namenode -format -nonInteractive & hdfs namenode && hdfs datanode
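In shell form, "a & b && c" backgrounds "a", so the format step above races against the starting namenode, and "hdfs datanode" only runs once the namenode process exits. The -nonInteractive flag is what makes the format step safe to re-run: it aborts instead of prompting when the name directory is already formatted. A more sequential alternative, offered as a sketch rather than as the committed behavior:

# sketch: let the (safe-to-re-run) format step finish first, then background
# the namenode and keep the datanode in the foreground as the container process
CMD hdfs namenode -format -nonInteractive; hdfs namenode & hdfs datanode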