This tutorial shows how to set up a small cluster with VirtualBox so that we can practice with distributed frameworks (e.g. Hadoop, Spark) on our own machine.
The main idea is to virtualize several instances with VirtualBox and connect them through an internal network. If you have already set up a virtual node or are familiar with VirtualBox, but failed to connect the nodes together or to share a folder, you can jump to this anchor.
Install VirtualBox
You can find plenty of tutorials on how to install VirtualBox on your machine. The official one for Linux can be found here. Note that the .deb package didn’t work on my Ubuntu 16.04, so I recommend installing from the command line.
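For reference, a command-line installation can be as short as the sketch below. It assumes the virtualbox package from Ubuntu's own repositories rather than Oracle's repository, and the function name install_virtualbox is only for illustration.

```shell
# Install VirtualBox from the Ubuntu repositories.
# Assumption: the distribution's "virtualbox" package is recent enough for you.
install_virtualbox() {
    sudo apt-get update &&
    sudo apt-get install -y virtualbox
}
```

After running install_virtualbox, VBoxManage --version should print the installed version.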
ISO Image
To install a virtual instance or a node (as in a cluster) on VirtualBox, we need an ISO image that works as an installation CD. Many tutorials recommend Ubuntu (Desktop version), but I think it’s unnecessary here since we are using those instances to compute, not to display. Therefore, to reduce the overhead (e.g. display/memory/space), we go for the Server version of Ubuntu Xenial (16.04) as the operating system for each node. That way, we keep the abundant support the Ubuntu community provides and also boost performance.
You can download the ISO from the official list. I used ubuntu-16.04.5-server-i386.iso as the image.
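Before using the image, it may be worth checking that the download is intact. The sketch below compares a file's SHA-256 digest with an expected value, which you would take from the SHA256SUMS file Ubuntu publishes next to its release images. The function name verify_iso is hypothetical, and sha256sum is assumed to be available (GNU coreutils, as on Ubuntu).

```shell
# Compare a file's SHA-256 digest with an expected value.
# Usage: verify_iso <file> <expected-sha256>
verify_iso() {
    actual="$(sha256sum "$1" | cut -d ' ' -f 1)"
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1" >&2
        return 1
    fi
}
```

The function returns a non-zero status on a mismatch, so it can guard later steps in a script.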
Node Installation
Click the “New” button in VirtualBox and the “Expert Mode” in the dialog.
Since I have already built two instances, I will name this one worker3. You can start with any number/name you like. I recommend reserving at least 1 GB of memory for each instance, but don’t worry too much about it because it can be adjusted later. After that, click “Create”.
In the next dialog, you can just follow the default settings:
Then click “Create” to create a totally empty instance.
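If you prefer to script this step, the dialogs above can be approximated with VBoxManage. This is a sketch under assumptions: the function name create_node, the controller name SATA and the folder layout are my own choices, and the createmedium subcommand assumes VirtualBox 5.0 or newer (older releases call it createhd). The OS type Ubuntu is the 32-bit variant, matching the i386 image.

```shell
# Create an empty VM with a disk, roughly mirroring the "New" dialog.
# Usage: create_node <name> <memory-MB> <disk-MB> <vm-folder>
create_node() {
    node_disk="$4/$1/$1.vdi"
    VBoxManage createvm --name "$1" --ostype Ubuntu --basefolder "$4" --register &&
    VBoxManage modifyvm "$1" --memory "$2" &&
    VBoxManage createmedium disk --filename "$node_disk" --size "$3" &&
    VBoxManage storagectl "$1" --name SATA --add sata &&
    VBoxManage storageattach "$1" --storagectl SATA --port 0 --device 0 \
        --type hdd --medium "$node_disk"
}
```

For example, create_node worker3 1024 10240 "$HOME/VirtualBox VMs" would create worker3 with 1 GB of memory and a 10 GB disk; the disk size is only an example value.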
To install an operating system, select the instance and click the “Settings” button at the top. As you can see, you can change many parameters of this instance while it is shut down. Now we can go to the “Storage” section to load the ISO image we just downloaded.
Click the button in the center list, select “Choose disk” and locate the ISO Image. After that, we can just press “OK” and finish the settings.
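The same attachment can be sketched from the command line. The function name attach_iso and the controller name IDE are assumptions. A VM created through the GUI usually already has an IDE controller, in which case the storagectl line is not needed and only the storageattach line applies (with the controller name shown in the “Storage” section).

```shell
# Attach an installer ISO to a VM as a virtual DVD.
# Usage: attach_iso <name> <path-to-iso>
attach_iso() {
    VBoxManage storagectl "$1" --name IDE --add ide &&
    VBoxManage storageattach "$1" --storagectl IDE --port 0 --device 0 \
        --type dvddrive --medium "$2"
}
```

For example: attach_iso worker3 ubuntu-16.04.5-server-i386.iso.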
Now we can start the instance and install the Ubuntu System.
You can just follow the default configuration and set up your username and password. You might need to confirm some changes like the timezone or writing to the disk. But it all happens in a virtualized environment and won’t affect the OS you are using.
(Note: If your cursor is captured by the screen, press the Right-Ctrl to detach.)
The installation takes less than 10 minutes on my machine, and the instance reboots automatically afterwards. When it boots to a login prompt, it means you have successfully installed the instance!
The previous steps are quite simple but necessary. The following part is more challenging and might take you some time searching on Google or Stack Overflow.
Accessing the virtualized instance via SSH
Switching between the virtual machine and your host OS is somewhat inefficient. Since we don’t need a GUI for the virtual instance (and the server version of Ubuntu doesn’t have one), we can expose an SSH port and access the instance from our host system with PuTTY (on Windows) or a terminal (on Linux and Mac).
To do that, we have to log in to our virtual instance and install openssh-server:
sudo apt-get update
sudo apt-get install openssh-server -y
and then we need to edit the file /etc/ssh/sshd_config. We have to either enable PasswordAuthentication or copy the public key of the host to the authorized_keys file in the virtualized instance. If you don’t know what I am talking about, you can just type sudo vim /etc/ssh/sshd_config, :52, Enter, d, l, Esc, :wq, Enter. These keystrokes jump to line 52 and delete its first character, which should be the # that comments out the PasswordAuthentication line.
It should look like this:
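If you would rather not rely on a line number, the same change can be sketched with sed. This assumes GNU sed (as shipped with Ubuntu), and the function name enable_password_auth is hypothetical. It rewrites any commented or uncommented PasswordAuthentication setting to yes and then restarts the SSH service so the change takes effect.

```shell
# Uncomment or overwrite the PasswordAuthentication setting, then restart sshd.
# Usage: enable_password_auth <config-file>
enable_password_auth() {
    sudo sed -i \
        's/^#*[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication yes/' \
        "$1" &&
    sudo service ssh restart
}
```

On the instance, this would be called as enable_password_auth /etc/ssh/sshd_config.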
So far we have set up the virtual instance, but we still cannot access it from the host, since VirtualBox places the instance behind a private NAT network by default. To expose an SSH port, select an instance, go to “Settings” > “Network” > “Adapter 1” > “Advanced” > “Port Forwarding”, and add a rule by clicking the button.
The rule looks like:
You can change the Host Port to any free port number you like. Here I reserve 12501 for my first instance, 12502 for my second instance and 12503 for my third instance.
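The same kind of rule can be added from the host's command line. In this sketch the rule name ssh and the function name forward_ssh are my own choices. The rule string has the form name,protocol,host IP,host port,guest IP,guest port, and the two IP fields are left empty here, which may differ from the rule you enter in the dialog.

```shell
# Forward a host TCP port to the guest's SSH port (22) on NAT adapter 1.
# Usage: forward_ssh <name> <host-port>   (run while the VM is powered off)
forward_ssh() {
    VBoxManage modifyvm "$1" --natpf1 "ssh,tcp,,$2,,22"
}
```

For example, forward_ssh worker3 12503 maps host port 12503 to port 22 of worker3.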
After saving the configuration, you can log in with PuTTY using IP: 127.0.1.1, Port: 12503 (or the one you picked), Username: worker3 (the username of the instance). Linux or Mac users can type ssh worker3@127.0.1.1 -p 12503 to connect to the instance.
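To avoid retyping the address and port, each instance can get an entry in the host's ~/.ssh/config. The helper below only prints such an entry. Its name, ssh_config_entry, is hypothetical, and it assumes you want the host alias to be the same as the username.

```shell
# Print an ~/.ssh/config entry for one forwarded instance.
# Usage: ssh_config_entry <user> <host-ip> <host-port>
ssh_config_entry() {
    printf 'Host %s\n    HostName %s\n    Port %s\n    User %s\n' \
        "$1" "$2" "$3" "$1"
}
```

After ssh_config_entry worker3 127.0.1.1 12503 >> ~/.ssh/config, a plain ssh worker3 should be enough to connect.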
Now you can shut down the instance and restart it in headless mode. To do that, right-click the instance and open the “start” item, which contains a “headless start” option. If you accidentally open the GUI, you can detach it via “top bar menu” > “Machine” > “detach GUI”.
Extra note: I recommend using tmux or screen to create different working areas in one window and connecting to one instance in each area, so that you can control your instances more easily.
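Headless start is also available from the host's command line, which is handy once there are several nodes. The function name start_headless is an assumption, and so are any instance names other than worker3.

```shell
# Start each named VM without opening a GUI window.
# Usage: start_headless <name>...
start_headless() {
    for vm in "$@"; do
        VBoxManage startvm "$vm" --type headless || return 1
    done
}
```

For example, start_headless worker1 worker2 worker3, assuming the first two instances are named that way.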
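As one possible way to do this with tmux, the sketch below opens a detached session with one pane per instance, each running ssh, and then attaches to it. The function name cluster_session is hypothetical. Each target is passed to ssh as written, so it can be a host alias or a string such as "-p 12503 worker3@127.0.1.1".

```shell
# Open one tmux pane per instance, each running an SSH session.
# Usage: cluster_session <session-name> <ssh-target>...
cluster_session() {
    session="$1"; shift
    tmux new-session -d -s "$session" "ssh $1" || return 1
    shift
    for target in "$@"; do
        tmux split-window -t "$session" "ssh $target" || return 1
        # Re-tile after every split so later splits still have room.
        tmux select-layout -t "$session" tiled
    done
    tmux attach-session -t "$session"
}
```

Note that a pane closes as soon as its ssh process exits.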
Share a folder
Go to “Settings” > “Shared Folders” to add a shared folder from the host’s file system.
The Folder Path can be any folder that you want to share between the host and the instance. I recommend setting the Folder Name to “shared”, since we are going to use it in the following instructions.
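The shared folder can also be registered from the host's command line. This is a sketch with a hypothetical function name, add_shared_folder, which keeps the folder name “shared” used in this tutorial.

```shell
# Register a host directory as a permanent shared folder named "shared".
# Usage: add_shared_folder <name> <host-path>   (run while the VM is powered off)
add_shared_folder() {
    VBoxManage sharedfolder add "$1" --name shared --hostpath "$2"
}
```

For example: add_shared_folder worker3 "$HOME/cluster-shared", where the host path is only a placeholder.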
VirtualBox cannot install the Guest Additions automatically on some systems, and unfortunately, Ubuntu Server is one of them. To share a folder between the host and the instance, we first need to attach the GUI window and select “top bar menu” > “Devices” > “Insert Guest Additions CD image...”.
We have to (in the virtualized instance):
- Mount the CD.
- Manually install the Guest Additions from the CD.
- Reboot the instance.
- Mount the shared folder.
Here is a solution (Mounting VirtualBox shared folders on Ubuntu Server 16.04 LTS) that is tested to work.
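For orientation, the four steps listed above roughly correspond to the commands below, run inside the instance. This sketch is not taken from the linked solution. It assumes the CD shows up as /dev/cdrom, that the build tools and kernel headers are needed to compile the kernel modules, and that the folder was named “shared”. The function names are hypothetical.

```shell
# Steps 1 and 2: mount the Guest Additions CD and run its installer.
install_guest_additions() {
    sudo apt-get install -y build-essential dkms "linux-headers-$(uname -r)" &&
    sudo mkdir -p /media/cdrom &&
    sudo mount /dev/cdrom /media/cdrom &&
    sudo sh /media/cdrom/VBoxLinuxAdditions.run
}

# Step 4 (after rebooting): mount the folder named "shared" for the current user.
# Usage: mount_shared <mount-point>
mount_shared() {
    mkdir -p "$1" &&
    sudo mount -t vboxsf -o "uid=$(id -u),gid=$(id -g)" shared "$1"
}
```

Step 3 is a plain sudo reboot between the two functions. After that, mount_shared "$HOME/shared" should make the host folder visible inside the instance.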