How to run Apache Spark on MicroK8s and Ubuntu Core, in the cloud: Part 2

robgibbon

on 18 August 2021

Tags: Big Data



If you followed Part 1 of this blog post, you'll have a working setup that lets you run MicroK8s on Ubuntu Core in a VM on your local workstation using Multipass. But you're itching to get this up and running in the cloud. I know, so am I! So let's step through that now. Currently, this is known to work on GCE, so we'll run it on the GCP cloud; in the future, you'll be able to run Ubuntu Core on all the major clouds. If you don't already have a GCP account, go and make yourself one and come back here as quickly as you can!

Fresh bake: building an Ubuntu Core VM image

For the first step, we’ll need a freshly built OS image of Ubuntu Core. We can use Qemu for this. By default, your Ubuntu Core systems are linked to your Ubuntu ONE account, so that you and only you can log in. 

Just be sure that you have uploaded your SSH public key to your Ubuntu ONE account profile before you start, or you could have some trouble logging into your new VM. This time we’ll use Ubuntu Core 20. Use the following commands:

wget http://cdimage.ubuntu.com/ubuntu-core/20/stable/current/ubuntu-core-20-amd64.img.xz

sudo apt install xz-utils qemu ovmf
unxz ubuntu-core-20-amd64.img.xz

qemu-img resize -f raw ubuntu-core-20-amd64.img 60G

qemu-system-x86_64 -smp 2 -m 2048 \
  -net nic,model=virtio -net user,hostfwd=tcp::8022-:22,hostfwd=tcp::8090-:80 \
  -vga qxl \
  -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=ubuntu-core-20-amd64.img,cache=none,format=raw,id=disk1,if=none \
  -device virtio-blk-pci,drive=disk1,bootindex=1 \
  -machine accel=kvm

You’ll need to follow the onscreen instructions in the Qemu console window to associate the image with your Ubuntu ONE account.

Once done, verify that the logins work by going back to your terminal and connecting to the VM via SSH:

ssh <your Ubuntu ONE username>@localhost -p 8022
sudo snap install lxd

If all has gone well, you should have been able to log in and install LXD. If so, then great – you can now shut down and power off the VM. Next, we'll convert the VM image we just built to VHDX format so that we can import it into GCP:

sudo apt install qemu-utils

qemu-img convert ubuntu-core-20-amd64.img -O vhdx -o subformat=dynamic ubuntu-core-20-amd64.vhdx
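If you want to sanity-check the conversion before uploading anything (an optional extra, not part of the original steps), qemu-img can report the format and virtual size of the new image:

qemu-img info ubuntu-core-20-amd64.vhdx # expect file format: vhdx and a 60G virtual size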

G-force! Moving to the cloud

We should be able to import the VM image to GCP now! Use the following commands:

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-352.0.0-linux-x86_64.tar.gz

tar xzf google-cloud-sdk-352.0.0-linux-x86_64.tar.gz
./google-cloud-sdk/install.sh

gcloud auth login
gcloud config set project <YOUR_PROJECT>

gsutil cp ubuntu-core-20-amd64.vhdx gs://<YOUR_BUCKET>/

gcloud compute images import ubuntu-core-20 --data-disk --source-file=gs://<YOUR_BUCKET>/ubuntu-core-20-amd64.vhdx 
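The import can take a few minutes to complete. If you'd like to confirm that it finished before moving on (an optional check I'm adding, not in the original walkthrough), ask GCE to describe the new image; it should report a status of READY:

gcloud compute images describe ubuntu-core-20 --format='value(status)'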

Now we need to enable secure boot for the image, or the Ubuntu Core 20 system won't play ball. Enabling secure boot also means that the system will run with full disk encryption. So the next step is to create another image, based on the one we just imported, but with the UEFI_COMPATIBLE flag turned on:

gcloud compute images create ubuntu-core-20-secureboot --source-disk ubuntu-core-20 --guest-os-features="UEFI_COMPATIBLE"
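As a quick, optional sanity check (my addition, not something the original post calls for), you can confirm that the UEFI_COMPATIBLE feature landed on the new image:

gcloud compute images describe ubuntu-core-20-secureboot | grep -A 1 guestOsFeatures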

Alrighty, it is time. Are you ready? Run the following command to launch that Ubuntu Core VM on the cloud. The command enables nested virtualisation, and it’s going to launch an 8-core, 32GB-RAM N2-series instance with secure boot and a second 60GB block device for LXD. (We need an N2-series because these instances support nested virtualisation; but if you want to take a lower or higher instance spec, be my guest!)

gcloud beta compute instances create ubuntu-core-20 \
  --zone=europe-west1-b \
  --machine-type=n2-standard-8 \
  --network-interface network=default \
  --network-tier=PREMIUM \
  --maintenance-policy=MIGRATE \
  --service-account=<YOUR_SERVICE_ACCOUNT>@developer.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
  --min-cpu-platform="Intel Cascade Lake" \
  --image=ubuntu-core-20-secureboot \
  --boot-disk-size=60GB \
  --boot-disk-type=pd-balanced \
  --boot-disk-device-name=ubuntu-core-20 \
  --shielded-secure-boot \
  --no-shielded-vtpm \
  --no-shielded-integrity-monitoring \
  --reservation-affinity=any \
  --enable-nested-virtualization \
  --create-disk=size=60,mode=rw,auto-delete=yes,name=storage-disk,device-name=storage-disk
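First boot can take a little while, because Ubuntu Core sets up full disk encryption and resizes its partitions. If you want to watch it happen (an optional step I'm adding here, not part of the original instructions), you can pull the serial console output:

gcloud compute instances get-serial-port-output ubuntu-core-20 --zone=europe-west1-b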

Once the instance is up and running, we can ssh to it in the cloud using the keypair we associated with our Ubuntu ONE account, as before:

GCE_IIP=$(gcloud compute instances list | grep ubuntu-core-20 | awk '{ print $5 }')
ssh <your Ubuntu ONE username>@$GCE_IIP
sudo lxd init --auto --storage-create-device=/dev/sdb --storage-backend=zfs
sudo lxc init ubuntu:focal microk8s --vm -c limits.memory=28GB -c limits.cpu=7
sudo lxc config device override microk8s root size=40GB
sudo lxc start microk8s
sleep 90 # give the instance time to boot
sudo lxc exec microk8s -- sudo snap install microk8s --classic
sudo lxc exec microk8s -- sudo microk8s enable storage registry dns ingress
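Before moving on, it's worth confirming that MicroK8s really is up inside the nested VM. A minimal check (my suggestion, not part of the original steps) is to wait for the snap to report ready and then list the Kubernetes node:

sudo lxc exec microk8s -- sudo microk8s status --wait-ready
sudo lxc exec microk8s -- sudo microk8s kubectl get nodes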

Well, look at that! You got LXD and MicroK8s onboard your shiny new Ubuntu Core cloud server, all nested and virtualised. Now that you’ve done this, you can head over to Part 3. See you there!
