Containerizing the largest Dutch e-commerce site:
The bol.com story
the shop for everyone
1
Content
•About me
•About bol.com
•Containers... in production
•Mayfly: the original container use case
•Choices, choices...
•Lessons learned
•Next steps
2
About me
• Maarten Dirkse (@mdirkse)
• Developer with a history degree, 9+ years of experience (mostly Java)
• Work on the bol.com tools team. We provide the platform for the
organisation to build software: Jenkins, SCM, Mayfly (more on that later)
• Have been running containers in production* for almost 2 years.
(bol.com has been running containers in production, no *, for a little over
a year but really only for the past 5 months)
* production internally, for devs, not for customers
3
About bol.com
•Over 6.5 million active customers
•Brand awareness: > 95% and > 75%
•Virtual footprint of almost 1 million visitors
per day
•Over 14.5 million products
•Moved to our own DC two years ago
•VM-based architecture: 1 node per app
instance
•Everything is puppetized but was derived
from a static config source (Racktables)
•We’re hiring! http://banen.bol.com
4
Containers... in production
^^ obligatory container ship pic
5
Containers... in production
• Several mission-critical apps running in containers... in VMs
• Mesos + Marathon cluster that runs backend GUI for the webshop
• Home-grown spidering solution that runs on Google Container Engine
(also Mesos on GCE)
• Mesos + Marathon cluster that runs Mayfly...
6
Mayfly: the original use case
^^ http://mayflycd.github.io/mayfly-talks/
7
What is Mayfly?
• Team had an idea for allowing teams to develop every service feature in
isolation to remove bottleneck of shared test environment
• Needed isolated runtime environment for every feature branch (that’s a
lot of environments)
• VM infrastructure was too static, too resource heavy, too slow
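A minimal sketch of the "one environment per feature branch" idea, assuming each environment is addressed by an id derived from the service and branch name. The function name, the /mayfly prefix and the naming scheme are hypothetical and only illustrate the mapping; they are not taken from Mayfly itself.

```python
import re


def environment_id(service: str, branch: str) -> str:
    """Map a feature branch to a hypothetical per-branch environment id."""
    # Lowercase the branch and collapse every run of characters that is not
    # a letter or digit into a single hyphen, so the id is URL- and DNS-friendly.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"/mayfly/{service}/{slug}"


print(environment_id("checkout", "feature/ABC-123_new-basket"))
```

Because the id is a pure function of service and branch, every push to the same branch lands in the same isolated environment, and deleting the branch tells you exactly which environment to tear down.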
8
Containers to the rescue!
• Instead of having every feature branch deploy as a VM, deploy it as a
container
• Use of containers meant we could spin up environments in seconds and
pack more of them onto the hardware
• And so it was that containers were introduced at bol.com. But...
9
DockerCon 2014: docker + ?
Towards “peak container confusion”
Mesos
Marathon (or Aurora?)
Kubernetes
Synapse & Nerve
Paasta
AWS EC2 CS
CoreOS + Fleet
RancherOS
Spotify Helios
wut?
10
Choices, choices...
^^ obligatory cat pic
11
The stack
• After trying Fleet (buggy) and Kubernetes (5 min old) we settled on
Mesos+Marathon running on RHEL7 (CoreOS at first) on bare metal
• Consul for service discovery, Kevlar for KV store.
• Choices made for Mayfly became the prototype for the bol.com
container infrastructure
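To make the service discovery part concrete, here is a sketch of the JSON body that Consul's agent API accepts for registering a service (PUT /v1/agent/service/register). The service name, address, port and the /health path are made-up examples, and the helper function is hypothetical; nothing is sent over the network here.

```python
import json


def consul_registration(name: str, instance_id: str, address: str, port: int) -> dict:
    """Build a Consul service registration body for one container instance."""
    return {
        "ID": f"{name}-{instance_id}",  # must be unique per agent
        "Name": name,                   # the name other services look up
        "Address": address,
        "Port": port,
        # HTTP health check; the /health path is an assumption about the app.
        "Check": {
            "HTTP": f"http://{address}:{port}/health",
            "Interval": "10s",
        },
    }


payload = consul_registration("orders", "a1b2c3", "10.0.0.12", 31005)
print(json.dumps(payload, indent=2))
```

With containers the address and port change on every deploy, which is why registration has to happen at container start rather than in static config.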
12
Dynamic infrastructure is the future!
As the limitations of our VM-based infrastructure
became clear, the platform team became
convinced that the move to dynamic
infrastructure was a necessary step to take in
order to keep scaling the IT-architecture.
13
But wait, we’re not finished!
• After you’re done installing your new, mind-blowing tech you realize a lot
of loose ends still need to be tied up.
• Deploying docker to your machines? (and which version)?
--> Docker puppet module (https://github.com/garethr/garethr-docker)
• What about logs?
--> Logspout (https://github.com/gliderlabs/logspout)
• Zombie processes, SD registration?
--> ContainerPilot (https://github.com/joyent/containerpilot)
14
But wait, we’re not finished!
• How do you actually tell Marathon what to deploy?
--> Marathon terraform provider (https://github.com/Banno/terraform-provider-marathon)
• Install a (properly secured) Docker registry. We went with the stock
Docker registry behind a secured Nginx reverse-proxy
• Base images? We chose to use the RHEL7 base image as the root of
everything (known quantity in terms of ops support and security vetting)
• And mind how you create images...
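On the first bullet of this slide: whatever tool drives it, what Marathon ultimately receives is an app definition. The sketch below builds one in the JSON shape used by Marathon's /v2/apps endpoint for Docker containers in bridge networking (field names can differ between Marathon versions). The app id, image name and resource numbers are invented for illustration, and the helper function is hypothetical.

```python
import json


def marathon_app(app_id: str, image: str, instances: int = 2,
                 cpus: float = 0.5, mem: int = 512,
                 container_port: int = 8080) -> dict:
    """Build a Marathon app definition for a Docker container."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,  # MiB
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free port on the agent.
                "portMappings": [
                    {"containerPort": container_port, "hostPort": 0}
                ],
            },
        },
    }


app = marathon_app("/biz/orders", "registry.example.com/biz/orders:1.0.0")
print(json.dumps(app, indent=2))
```

Keeping this definition in version control (for instance as Terraform resources) gives deployments the same review and rollback story as code.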
15
BOB
• Needed a way to audit and vet images that would be run in our
landscape
• Created BOB, a wrapper tool for docker build and docker push
• BOB checks your Dockerfiles and images, ensuring that they meet
company standards, before they’re pushed to the registry
• Nothing gets pushed to the registry if it hasn’t been built by BOB
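The kind of check BOB performs could look roughly like the sketch below. This is not BOB's actual code: the two rules (approved base image, pinned tag) and the registry name are assumptions chosen to illustrate a Dockerfile check that runs before docker build and docker push.

```python
# Hypothetical allow-list; a real one would come from company policy.
APPROVED_BASES = ("registry.example.com/rhel7",)


def dockerfile_violations(text: str) -> list:
    """Return a list of policy violations found in a Dockerfile's text."""
    violations = []
    # Collect the image argument of every FROM instruction.
    images = [
        line.split()[1]
        for line in text.splitlines()
        if line.strip().upper().startswith("FROM ")
    ]
    if not images:
        violations.append("no FROM instruction found")
    for image in images:
        if not image.startswith(APPROVED_BASES):
            violations.append(f"base image not approved: {image}")
        # Look for the tag only in the last path component, so a registry
        # port such as host:5000/name is not mistaken for a tag.
        name = image.rsplit("/", 1)[-1]
        if ":" not in name or name.endswith(":latest"):
            violations.append(f"base image tag not pinned: {image}")
    return violations


print(dockerfile_violations("FROM ubuntu:latest\nRUN echo hi\n"))
```

A wrapper would refuse to build or push whenever the returned list is non-empty, which is what makes "nothing gets pushed unless BOB built it" enforceable.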
16
BOB (the builder) running on Jenkins
17
Use cases
• Mayfly (see above)
• BIZ: lots of small, independently deployable modules with back office
functionality. Stateless, ideal for containerization.
• Spidering: horizontally scalable stateless processes that run in the cloud.
18
Lessons learned
^^ nothing funny about this, most of ‘em were learned the hard way
19
Lessons learned 1/2
• Most of this stuff is relatively new or brand new, expect growing pains
• Don’t run your container orchestration software (Mesos, Marathon) in
containers, so that if Docker dies, your platform doesn’t degrade with it.
• Running your apps in a container can sometimes lead to interesting
issues that don’t exist outside of containers (JVM memory issues, for
instance)
--> See https://www.youtube.com/watch?v=6ePUiQuaUos for example
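The slide does not say which JVM memory issue was hit. One well-known example is that older JVMs size their default heap from the host's memory instead of the container's limit, so the container gets killed once the heap outgrows it. A common workaround is to set -Xmx explicitly from the container limit, sketched below. The 75% fraction and the function names are assumptions; the path is the cgroup v1 memory limit file.

```python
from pathlib import Path

# cgroup v1 location of the container's memory limit, in bytes.
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def heap_flag(limit_bytes: int, fraction: float = 0.75) -> str:
    """Derive a -Xmx flag that leaves headroom for non-heap JVM memory."""
    heap_mb = int(limit_bytes * fraction) // (1024 * 1024)
    return f"-Xmx{heap_mb}m"


def container_limit_bytes(path: Path = CGROUP_V1_LIMIT):
    """Read the container memory limit, or return None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


# Example: a container limited to 1 GiB.
print(heap_flag(1024 ** 3))
```

The headroom matters because metaspace, thread stacks and direct buffers count against the container limit too, not just the heap.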
20
Lessons learned 2/2
• Graphite-style metrics become problematic in a container world.
Prometheus exists, but we can’t just switch from one day to the next
• The HAProxy & consul-template combo is pretty brittle; we now use Fabio
--> https://github.com/eBay/fabio
• Keep it simple, make small changes
Going from static to dynamic is a sea change whose consequences are
incredibly hard to foresee. Take small steps that deliver value immediately
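On the metrics bullet above, a sketch of why the naming model matters. Graphite encodes every dimension in a dotted path, so a short-lived container id becomes part of the metric name; Prometheus keeps the name stable and moves dimensions into labels. The metric names, app name and container id are made up, and both helper functions are hypothetical.

```python
def graphite_path(app: str, container_id: str, metric: str) -> str:
    """Graphite style: identity is baked into a hierarchical name."""
    return f"apps.{app}.{container_id}.{metric}"


def prometheus_sample(metric: str, value, **labels) -> str:
    """Prometheus text format: stable name, dimensions as labels."""
    rendered = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{rendered}}} {value}"


# Every redeploy yields a new container id, hence a brand new Graphite path.
print(graphite_path("orders", "a1b2c3", "requests.count"))
print(prometheus_sample("requests_total", 42, app="orders", container="a1b2c3"))
```

With labels you can aggregate across all containers of an app without knowing their ids; with dotted paths you need wildcards and end up with a growing pile of dead series.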
21
The cultural shift
• Beware the mindset transition that dev teams will have to experience
• Devs: “what do you mean I can’t ssh into the container?”
• It takes time for ops people to adjust to the idea of dynamic
infrastructure. People tend to think from within their own constraints
--> Ops control over the app runtime will no longer be absolute
22
Next steps
^^ obligatory lolcat
23
Next steps
• IP-per-container
(needed for per-container firewalls, i.e. to get security off our backs)
• Per-app service descriptor that drives app infra and config (to replace
hiera data and feed Terraform)
• Migrating ever more apps to the dynamic infrastructure
24
Thank you!
Till next time