diff --git a/README.md b/README.md
index 64c2491..053e37b 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,36 @@
 # Hadoop Docker
 
-## Supported Hadoop Versions
-See repository branches for supported hadoop versions
+This repository provides a setup for a Hadoop cluster using Docker containers. Spin up NameNode, DataNode, ResourceManager, NodeManager, and HistoryServer, each in its own container.
+
+---
+
+## Versions
+
+- Hadoop: 3.4.1
+- Java: 17
+
+---
+
+> [!IMPORTANT]
+> You may have to restart `hadoop-resourcemanager` a few times before it actually starts running.
 
 ## Quick Start
 
 To deploy an example HDFS cluster, run:
-```
-  docker-compose up
+
+```shell
+docker-compose up -d
 ```
 
 Run example wordcount job:
-```
-  make wordcount
+
+```shell
+make wordcount
 ```
 
 Or deploy in swarm:
-```
+
+```shell
 docker stack deploy -c docker-compose-v3.yml hadoop
 ```
 
@@ -24,35 +38,39 @@ docker stack deploy -c docker-compose-v3.yml hadoop
 
 Run `docker network inspect` on the network (e.g. `dockerhadoop_default`) to find the IP the hadoop interfaces are published on. Access these interfaces with the following URLs:
 
-* Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
-* History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
-* Datanode: http://<dockerhadoop_IP_address>:9864/
-* Nodemanager: http://<dockerhadoop_IP_address>:8042/node
-* Resource manager: http://<dockerhadoop_IP_address>:8088/
+- Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
+- History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
+- Datanode: http://<dockerhadoop_IP_address>:9864/
+- Nodemanager: http://<dockerhadoop_IP_address>:8042/node
+- Resource manager: http://<dockerhadoop_IP_address>:8088/
 
 ## Configure Environment Variables
 
 The configuration parameters can be specified in the hadoop.env file or as environmental variables for specific services (e.g. namenode, datanode etc.):
+
 ```
-  CORE_CONF_fs_defaultFS=hdfs://namenode:8020
+CORE_CONF_fs_defaultFS=hdfs://namenode:8020
 ```
 
 CORE_CONF corresponds to core-site.xml. fs_defaultFS=hdfs://namenode:8020 will be transformed into:
+
 ```
-  <property><name>fs.defaultFS</name><value>hdfs://namenode:8020</value></property>
+<property><name>fs.defaultFS</name><value>hdfs://namenode:8020</value></property>
 ```
+
 To define dash inside a configuration parameter, use triple underscore, such as YARN_CONF_yarn_log___aggregation___enable=true (yarn-site.xml):
+
 ```
-  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
+<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
 ```
 
 The available configurations are:
-* /etc/hadoop/core-site.xml CORE_CONF
-* /etc/hadoop/hdfs-site.xml HDFS_CONF
-* /etc/hadoop/yarn-site.xml YARN_CONF
-* /etc/hadoop/httpfs-site.xml HTTPFS_CONF
-* /etc/hadoop/kms-site.xml KMS_CONF
-* /etc/hadoop/mapred-site.xml MAPRED_CONF
+- /etc/hadoop/core-site.xml CORE_CONF
+- /etc/hadoop/hdfs-site.xml HDFS_CONF
+- /etc/hadoop/yarn-site.xml YARN_CONF
+- /etc/hadoop/httpfs-site.xml HTTPFS_CONF
+- /etc/hadoop/kms-site.xml KMS_CONF
+- /etc/hadoop/mapred-site.xml MAPRED_CONF
 
 If you need to extend some other configuration file, refer to base/entrypoint.sh bash script.
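
The naming rule in the "Configure Environment Variables" section can be illustrated with a small shell sketch. This is not the code from base/entrypoint.sh. The helper names `to_property_name` and `to_property_xml` are hypothetical, and the sketch covers only the two rules the README states (triple underscore becomes a dash, single underscore becomes a dot), assuming the standard Hadoop `<property>` element layout used in the `*-site.xml` files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from base/entrypoint.sh): convert the part of a
# variable name that follows the *_CONF_ prefix into a Hadoop property name.
to_property_name() {
  # Replace "___" with "-" first, then every remaining "_" with ".".
  printf '%s\n' "$1" | sed -e 's/___/-/g' -e 's/_/./g'
}

# Hypothetical helper: render a NAME=value assignment as a <property> element.
to_property_xml() {
  local assignment=$1
  local key=${assignment%%=*}    # text before the first "="
  local value=${assignment#*=}   # text after the first "="
  local name
  # Drop the CORE_CONF_ / YARN_CONF_ style prefix before converting the name.
  name=$(to_property_name "${key#*_CONF_}")
  printf '<property><name>%s</name><value>%s</value></property>\n' "$name" "$value"
}

to_property_xml 'CORE_CONF_fs_defaultFS=hdfs://namenode:8020'
to_property_xml 'YARN_CONF_yarn_log___aggregation___enable=true'
```

Running the two calls prints the same elements the README shows for `fs.defaultFS` and `yarn.log-aggregation-enable`. The order of the two `sed` expressions matters: replacing single underscores first would turn `___` into `...` and leave no dash.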