Sowrey.org

28 Apr, 2008

Web 2.0 Expo: Capacity Planning for Web Operations

Posted by: Geoff In: Critical Mass| Technology| Travel

Schill suggested I go to this one as well. Again, mostly because Schill knows the guy, but given how this is a perennial problem for us as well, it couldn’t hurt to see how Flickr handles the problem.
Presenter: John Allspaw, Flickr

  • Traditional capacity planning: queuing theory
  • Some sites (e.g. Flickr) put out updates 20 times a day; timelines are much, much shorter
    • Release early, release often (fail early, fail often)
  • Why capacity planning is important
    • Hardware costs
    • Network and data storage cost money
    • Cloudware costs $$$ too
    • Too little is bad, too much is waste
  • Normal growth is planned, expected, projected, hoped for
  • Instantaneous growth is unexpected: spikes, external events, Digg effect
    • Slam and/or destroy your performance
  • Instantaneous coping
    • Disable heavier features on the site (Flickr builds features with config files for quick disabling)
    • Aggressive caching or serve stale data
    • Bake dynamic pages into static ones
  • Capacity != performance
    • Making something last doesn’t make it fast
    • Tuning is good, just don’t count on it
    • Accept what performance you have, not what you want
  • Good capacity measurement tools
    • Measure and record any number you give it over time (metric collection tools; aka trending)
    • Easily compare metrics to any other metrics
    • Import/export
    • Examples: cacti.net, munin.projects.linpro.no, ganglia.info, hyperic.com
    • Flickr uses ganglia
  • Related questions
    • How much can a server handle before it dies?
    • How many can we lose before we’re screwed?
    • How quickly can we get another server?
  • Need to relate the network/CPU performance to your application performance
    • Only real way to establish how much a given server can handle, and how many servers you might need
  • Benchmarking is a bit of a red herring, but can be used if you’ve no other choice
  • “Diagonal” scaling: vertical scaling (big, powerful) + horizontal scaling (lots of the same thing)
  • Flickr went from old, slower machines to new, faster machines: fewer of them that did more
  • Use Common Sense (TM)
    • Pay attention to the right metrics (many of them are irrelevant or misleading, and the one you're watching might not be the one that shows where/how a server died)
    • Review graphs constantly (weekly, hourly, seasonally)
  • Complex simulation/modelling rarely worth the time and effort
    • Better to put it into production and see what happens
    • “I’ve got a stack of napkins and a pen, and I’m not afraid to use ‘em!”
  • Tuning and tweaking will never gain you excess capacity
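
My own napkin-math take on the "related questions" above, not anything Allspaw showed. It assumes you've already measured a per-server ceiling in production (the numbers below are made up), and that normal growth is roughly linear. All function names are hypothetical.

```python
import math

def servers_needed(peak_load, per_server_ceiling, spare=1):
    """Servers required to carry peak_load, plus `spare` machines of headroom."""
    if per_server_ceiling <= 0:
        raise ValueError("per_server_ceiling must be positive")
    return math.ceil(peak_load / per_server_ceiling) + spare

def tolerable_losses(server_count, peak_load, per_server_ceiling):
    """How many servers can die before the rest can't carry peak_load."""
    needed = math.ceil(peak_load / per_server_ceiling)
    return max(server_count - needed, 0)

def weeks_until_ceiling(current_load, weekly_growth, cluster_ceiling):
    """Full weeks of (assumed linear) growth left before the cluster is maxed."""
    if weekly_growth <= 0:
        raise ValueError("weekly_growth must be positive")
    return max((cluster_ceiling - current_load) // weekly_growth, 0)

# Made-up numbers: 4000 req/s at peak, each box tops out around 450 req/s.
print(servers_needed(4000, 450, spare=2))     # 11
print(tolerable_losses(12, 4000, 450))        # 3
print(weeks_until_ceiling(4000, 250, 5400))   # 5
```

The last number is the one to compare against "how quickly can we get another server?": if ordering and racking takes longer than that, you're already late.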

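The "disable heavier features via config files" trick is worth sketching too. This is a minimal guess at the idea, not Flickr's actual setup: the feature names, the JSON format, and both functions are my own assumptions.

```python
import json

# Hypothetical feature names; everything defaults to "on".
DEFAULT_FLAGS = {"comments": True, "related_photos": True, "search": True}

def load_flags(config_text):
    """Merge flag overrides from a JSON config over the defaults."""
    flags = dict(DEFAULT_FLAGS)
    overrides = json.loads(config_text)
    unknown = set(overrides) - set(DEFAULT_FLAGS)
    if unknown:
        raise KeyError(f"unknown feature flags: {sorted(unknown)}")
    flags.update(overrides)
    return flags

def page_sections(flags):
    """List what a page would render given the current flags."""
    sections = ["photo"]  # core content is always served
    for feature in ("comments", "related_photos", "search"):
        if flags[feature]:
            sections.append(feature)
    return sections

# During a spike, ops edits the config to shed the expensive features.
spike_config = '{"related_photos": false, "search": false}'
print(page_sections(load_flags(spike_config)))  # ['photo', 'comments']
```

The point is that turning a feature off is a config change rather than a code deploy, so it can happen in the middle of a Digg-style spike.
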