At Facebook we run huge Java services, both in the size of a single process and in the scale of our server fleet.
Facebook Lite is one of the dominant Java services within Facebook, serving hundreds of millions of users every month. Its architecture is unique: it offloads work typically done by the client (data retrieval, business logic, layout calculation, etc.) to the server, which has made it a memory-bound service.
This architecture provides clear advantages to Facebook Lite users and developers; however, it also makes it difficult for service owners to keep the service healthy and safe from memory regressions. For instance, even a 1% memory regression has significant stability and cost implications for our production system, so regressions must be detected and blocked as soon as possible.
In this session we will go through the evolution of the Facebook Lite service: from a time when it occasionally suffered massive memory regressions that put it at risk, through building a scalable, advanced memory analysis infrastructure, to giving developers high-granularity memory visibility and enabling them to push the service to its efficiency limits with massive memory wins.