About Me
• Joined Achievers in June 2009
• Prior to Achievers, I was the CTO of ZipLocal
• I have spent the last 7 years worrying about how to build scalable applications
• Academic Background:
  – Ph.D. from the University of Toronto
  – Naval Research Labs Post Doctoral Fellow of Secure Systems at Cambridge University

Goals
• Tell you about our journey to a scalable architecture
• Give you insight into common scaling problems
• Give you a way to think about the issues of scaling that you can apply today

ACHIEVERS

What Does Achievers Do
• Achievers started in the rewards and recognition space in 2007
• We provide reward and recognition software
  – Points based system to reward performance
  – Catalog to redeem the points
• Our mission is to “Change the way the world works”

The Achievers Home Page

Our Traffic Growth
• From 2009 to today
  – Visits up 903%
  – Unique Visitors up 832%
• Last month we did 2.5 million page views
• During business hours we have about 250 people on the site at any given moment

Funding
• 3.3 million Series A from JLA Ventures
• 6.9 million Series B from Grandbanks
• 24 million Series C from Sequoia Capital

PRELIMINARIES

Definitions
• Performance
  – Performance measures the speed at which a single request can be executed
• Scalability
  – Scalability is the ability to handle a growing number of requests in a capable manner

Scalability != Performance

Which Language Scales the Best?
• Languages Don’t Scale, Architectures Do
• If you hear “language X doesn’t scale” then turn around and walk away.
  – That person doesn’t understand scalability

There is a bit more to Scalability
• Scalability is also about how you scale the development team
• If you are successful and need to add people, how easy is it for them to contribute?
• How fast can you write code?
  – Your competitors are right behind you
  – He who can develop good code fast wins!
OUR SAAS PLATFORM

The Achievers Platform
• Multi tenant architecture
  – One code base
  – One database
• Module based platform
  – Hundreds of configuration options for each module
  – Lots of legacy configurations

Backend Processing
• We handle many millions of dollars of orders every month
• We send out hundreds of thousands of emails a month

THE ARCHITECTURE CIRCA 2009

The Stack
• Pretty standard J2EE stack
• Hibernate
• Spring
• JMS
• MySql
• All running on Amazon EC2

Aside – Amazon EC2
• EC2 is great
• Spin up machines for testing then shut them down
• A must for any startup
  – Don’t manage your own servers when you are small. It isn’t worth it

Architecture
[Diagram labels: Presentation, Business Logic, Servlet, HTML, Objects, Hibernate, JSP Pages, MySql]

LOOKS GREAT SO WHAT'S THE PROBLEM?

Architecture – Data Center View
[Diagram labels: Server 1]

But J2EE Scales
• Sure it does, BUT
• The devil is in the details

MEET THE DEVIL DETAILS

Scaling Was an Afterthought
• We had to scale vertically since the underlying design did not consider what would happen if we had 2 web servers
• We had the largest EC2 instance money could buy
• You cannot retrofit scalability
  – Your architecture and design either have it or they don’t

Design Decisions
• Your basic approach and philosophy to a few things will determine how hard it will be to scale your infrastructure

COMPLEXITY

Who doesn’t like magic
• Extensive use of Aspect Oriented Programming (AOP)
  – Allows you to define ‘pointcuts’ to insert code before or after a function call
• As an academic, AOP is brilliant
• As a CTO, not so much

There is a Pattern for That
• Use of design patterns for the sake of using a design pattern
• Don’t get me wrong, every developer must know and understand design patterns
• But it isn’t a competition to see who can use the most design patterns in any given day
  – The right tool for the right job
  – Don’t force it!
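
To make the AOP slide concrete, here is a minimal sketch of "code inserted before and after a function call". It uses a Python decorator as a stand-in for an AOP framework; the original platform was Java, and the names `audited`, `place_order`, and `call_log` are hypothetical, chosen only for illustration. The point is that the extra behaviour is invisible where `place_order` is called, which is the kind of "magic" the talk warns about.

```python
import functools

call_log = []  # hypothetical sink that records what the advice did


def audited(func):
    """Wrap func so that extra code runs before and after every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"before {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned normally or raised.
            call_log.append(f"after {func.__name__}")
    return wrapper


@audited
def place_order(points, cost):
    """Business logic: return the points left after a redemption."""
    if cost > points:
        raise ValueError("not enough points")
    return points - cost


# The call site gives no hint that logging happens around the call.
remaining = place_order(500, 120)
```

With one wrapper this is easy to follow. With dozens of them configured far from the code they affect, the reader at the call site has no way to see what actually runs.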
Overly complex object model
• The Access Control model had so many objects and relationships that nobody other than the original author ever understood it

Why is Complexity Bad?
• If the system dies at two o'clock in the morning and I'm staring at your code, can I easily figure out what's going on?
• People Forget about Magic
  – Code needs to be in front of you, not buried in an XML file or magically invoked

What Does This Have To Do With Scalability?
• Complex systems are really, really hard to scale
  – In a clustered environment you need to first figure out if the problem is because of clustering or because of your code
  – This isn’t trivial even for simple systems
• Too many things to worry about
• When you hit a wall (and you will) it becomes very hard to figure out what to do

Don’t Forget About the People
• As you grow your team you need to ramp everybody up
• A complex system takes longer to learn than a simple one
• Complexity ALWAYS increases over time. If you start with something that is complex it will quickly get beyond the scope of a mere mortal

Complexity
[Chart labels: Desire for Complex Solutions, Experience]

THE DATABASE

The Database
• ORMs make you stupid … kidding … sort of
• You need to understand your data
  – Do not let an ORM define your database; you will be sorry
• Generating reports out of an ORM is painful
• Developers must understand how a DB works
  – You will forget about what a DB is good for if you don’t consider it explicitly
  – New developers usually do not understand the importance of the DB in scaling

ORMs
• Can they scale?
  – Sure
• Is it hard?
  – Yup
• A quote from stackoverflow on scaling ORMs
  – “… a good ORM will provide plenty of hooks that allow you to optimize quite a bit. You just need to spend some time learning it.”

Is that all?
• Initially ORMs might allow you to write code quickly
  – I would challenge this but that is another topic
• Your system runs into a brick wall. Customers are complaining. Your CEO is chewing out the CTO.
The VP Engineering is curled up in a ball in the corner. They turn to you as the architect and you answer: “We just need to learn how to use all the hooks”

Just Learn the ORM
• I have yet to meet somebody that could convince me that they knew how to scale an ORM
  – It HAS been done, so yes it is possible, but it takes patience and a CEO that likes to wait
  – I’ve had people tell me “we just have to rewrite the ORM with a new ORM that could scale”

Know your database
• I believe that your DB should own all your data
  – Let it do what it is good at
• If that is true then simple replication strategies and a little bit of coding can get you reading data from a replica
• You can then start denormalizing the DB to get better performance

Scaling Your Data
• Scaling a DB is a well understood problem with well understood solutions
• Don’t confuse this with easy!

SESSIONS

Server Side Sessions
• Very developer friendly
• You have 2 choices to scale:
  – Session replication
  – Sticky Sessions

Session Replication
• Yuck!
• Lots of network chatter
• Slow propagation of the session means the user has a bad experience
• You could be moving lots of data around
  – Our sessions were huge

Sticky Sessions
• Works, but you now need to worry about a machine being overloaded while the others are idle
• A machine failure logs out everybody from that machine
• You have to be very careful when configuring
  – If all requests from one IP address go to one server then you essentially have one company per server

CACHING

When to Cache
• Our platform made extensive use of caches
• That has to be good, right?
• Not in our case
  – Items were cached by Java
  – Shared state posed a problem when adding another server
  – Yes, there are Java based solutions, but all you are doing is adding complexity

ADMITTING YOU HAVE A PROBLEM

It Won’t Love You Back
• Never fall in love with your technology. It will break your heart.
• You must always challenge your assumptions and be prepared to throw away something
  – Hard to throw away your ‘baby’
  – Remember it is just a bunch of 1’s and 0’s

THE JOURNEY

Basic Premise
• Every web application follows the same basic flow:
  1. User makes a request
  2. Validate the request
  3. Grab some data
  4. Process it a bit
  5. Build a Page for the user

Guiding Architectural Principles
• Initial deployment would be on 3 machines
  – Forcing us to understand how we are going to scale upfront
• Servers must be stateless
• The database owns all the data
• Caching is an explicit choice to solve a real problem
• Always use the right tool for the job
• Minimize complexity

Other Goals
• Zero downtime deployments
• We wanted to be able to upgrade customers one at a time
• Maximize developer productivity

The Target
[Diagram labels: Load Balancer, Web Server (x3), MemcacheD Cluster, NAS Device, Background Processing, MySql Master, MySql Slave]

The Language Choice
• Why PHP
  – Faster code/debug cycles
    • This has increased our productivity
  – Zero downtime deployments
    • We have patched running servers multiple times in a day and nobody has noticed anything
  – Shared nothing philosophy
    • Forces a good frame of mind for server development

Doesn’t PHP Suck?
• Languages don’t suck, only the developers using them do
• PHP isn’t perfect
  – Google ‘why php sucks’ for an extensive list
• But PHP doesn’t scale
  – Remember, languages don’t scale …
  – If you don’t believe me, ask Wikipedia, Facebook, Digg etc.

Sure but PHP is Slow
• If your web application is not database bound then you are probably doing it wrong
• Yes, Java might perform better at some things, but that will not be a limiting factor

Surely There are Down Sides?
• Because PHP does not have strong typing you need really good error detection and reporting
  – We will do another talk on our struggles and solutions
• Coding standards are a must since PHP lets you pretty much do whatever you want
  – Naming conventions are super important
  – Don’t start a religious war over bracket placement. There really is only one right way

The Framework
• We use CodeIgniter (CI)
• Simple MVC framework
  – The code is very easy to follow
• Works out of the box, but is very extensible
  – Strictly follows the Open/Closed principle
  – We have extended CI a lot to meet our needs
• Doesn’t require learning anything but PHP

Using the Right Tool
• Have Apache (or a faster web server) serve all static content
• A Network Attached Storage (NAS) device was used for a shared file system.
  – This makes life a TON easier
• Have your web servers serve requests
• Move background work to another server

The Problem
• We had about 120 customers and we couldn’t just go away to do what we needed to do
  – Not a bad problem to have

THE MIGRATION

Step 1
• We wrote a controller that would forward requests to the new code base
• GET requests could be easily forwarded
• POST requests were a bit more complicated
• This step allowed us to start developing the new platform AND keep releasing features

Step 2
• Start migrating customers to the new platform
• We put a proxy server in front of our old and new platforms.
• We then proxied each customer’s requests to the platform version that customer was running on

The Setup
[Diagram labels: HAProxy, Express Platform, Achievers Platform, MySql]

HAProxy
• If you don’t have it installed, go back to the office, download it and install it!
• It isn’t just a load balancer
  – We can move specific traffic to specific machines for whatever reason
  – We have a machine with profiling capabilities that we have used to profile production problems
  – Fine-grained control over your requests

We did it!
• It took us almost 6 months to migrate every customer but we did get there
• Our productivity has improved
• And we have an architecture that we know can handle whatever we can throw at it
  – At least in the short term

CONCLUSIONS

Scaling is Hard
• Don’t make it harder on yourself
  – Reduce complexity
  – Understand your database
  – Have an upfront strategy to deal with state
    • We picked stateless but you don’t have to

Never let anybody tell you a language or framework does or doesn’t scale. It is all in the details.
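
One way to picture the "stateless" strategy for dealing with state is the sketch below. It is a minimal illustration under stated assumptions, not the Achievers implementation: the real platform is PHP with a MemcacheD cluster, while this uses Python and an in-memory dictionary as a stand-in. The names `SharedStore`, `WebServer`, and `session-abc` are hypothetical.

```python
import itertools


class SharedStore:
    """Stand-in for an external store such as a memcached cluster."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class WebServer:
    """Holds no per-user state between requests; all state lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        # Load the state, process it a bit, write it back.
        views = self.store.get(session_id, 0) + 1
        self.store.set(session_id, views)
        return f"{self.name}: page view {views}"


store = SharedStore()
servers = [WebServer(f"web{i}", store) for i in (1, 2, 3)]
round_robin = itertools.cycle(servers)  # simplest possible load balancer

# The same session is served by a different machine on every request.
responses = [next(round_robin).handle("session-abc") for _ in range(4)]
```

Because no server remembers anything between requests, any of the three machines can serve any request, and losing one machine logs nobody out. A real deployment would also need an atomic update in place of the separate get and set used here.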