Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Conference
Beijing, China

The common anomaly detection platform at Microsoft

This will be presented in Chinese

Tony Xing (Microsoft)
16:20–17:00 Saturday, 2017-07-15
Data engineering and architecture
Location: Function Room 5B+C
Level: Non-technical

Prerequisite knowledge

A basic understanding of data processing

What you'll learn

An overview of Microsoft’s common anomaly detection platform, an internally built API service that gives product teams the flexibility to plug in whichever anomaly detection algorithms fit their own signal types

Description

Microsoft’s Application and Service group ran two time series anomaly detection systems covering different data scenarios and serving teams across Bing Search, Ads, Office 365, and Skype. While operating those two systems, the team identified several customer pain points:

  • Time series signals come in many types, and it is hard for a single algorithm to cover all of them with an acceptable false positive rate.
  • Customers might not want to onboard to the specific data ingestion systems used by the prior anomaly detection (AD) systems.
  • Customers want multidimensional anomaly detection (e.g., across dimensions like country, language, and device), which is computationally expensive.
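
To see why the multidimensional case gets expensive, consider a rough count of how many separate time series would need monitoring. The sketch below assumes each dimension is either pinned to one of its values or rolled up into an "all" bucket, so the number of slices is the product of (cardinality + 1) across dimensions. The function name and the cardinalities are made up for illustration and are not figures from the talk.

```python
from math import prod

def count_slices(cardinalities):
    """Number of distinct slices when each dimension is either
    fixed to one value or rolled up into an "all" bucket."""
    return prod(c + 1 for c in cardinalities.values())

# Hypothetical cardinalities, chosen only for illustration.
dims = {"country": 200, "language": 50, "device": 10}
print(count_slices(dims))  # 201 * 51 * 11 = 112761
```

Even three modest dimensions yield over a hundred thousand series, each of which needs its own detection pass.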

Tony Xing offers an overview of Microsoft’s common anomaly detection platform, an internally built API service that gives product teams the flexibility to plug in whichever anomaly detection algorithms fit their own signal types. This platform:

  • Handles multidimensional time series, so anomalies within individual dimensions and across combinations of dimensions can be detected.
  • Is an independent API-based service, so customers can easily add anomaly detection to their own product experience.
  • Has a framework for easily plugging in different learning algorithms to handle various signal types, so customers can pick the detection engine that works best for them.
  • Operates in near real time.
  • Is a linearly scalable service.
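
The pluggable-algorithm and multidimensional ideas in the list above can be sketched roughly as follows. This is a minimal illustration and not Microsoft's actual API: the `DETECTORS` registry, the `register` decorator, the `zscore` detector, and the `detect_by_dimension` helper are all hypothetical names invented for this example, and a simple z-score rule stands in for whatever algorithms the real platform uses.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

# A detector takes a series and returns the indices of anomalous points.
Detector = Callable[[Sequence[float]], List[int]]

# Hypothetical registry mapping an algorithm name to its implementation.
DETECTORS: Dict[str, Detector] = {}

def register(name: str):
    """Decorator that plugs a detector into the registry under `name`."""
    def wrap(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn
    return wrap

@register("zscore")
def zscore(series, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]

def detect_by_dimension(rows, dims, algorithm):
    """Run the chosen detector on every slice formed by each non-empty
    combination of dimensions.

    rows: dicts holding the dimension keys plus "t" (time step) and "value".
    Returns {(dimension_combo, slice_key): [anomalous indices]}.
    """
    detect = DETECTORS[algorithm]
    results = {}
    for r in range(1, len(dims) + 1):
        for combo in combinations(dims, r):
            # Sum values per slice and per time step.
            slices = defaultdict(lambda: defaultdict(float))
            for row in rows:
                key = tuple(row[d] for d in combo)
                slices[key][row["t"]] += row["value"]
            for key, by_time in slices.items():
                series = [by_time[t] for t in sorted(by_time)]
                hits = detect(series)
                if hits:
                    results[(combo, key)] = hits
    return results

# Usage: flat traffic everywhere, with one injected spike for CN/phone at t=7.
rows = [
    {"t": t, "country": c, "device": d,
     "value": 100.0 if (t, c, d) == (7, "CN", "phone") else 10.0}
    for t in range(10) for c in ("CN", "US") for d in ("phone", "pc")
]
for key, hits in detect_by_dimension(rows, ["country", "device"], "zscore").items():
    print(key, hits)
```

Keeping detectors behind a name-keyed registry means a team could add an algorithm suited to its own signal type without touching the slicing logic, which is the kind of flexibility the talk describes.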

Tony Xing

Microsoft

Tony Xing is a senior product manager on the Shared Data team within Microsoft’s Application and Service group. Previously, he was a senior product manager on the Skype data team in the same group. Tony is a frequent speaker at Strata.
