

  Four pitfalls in the machine learning process:

data leakage; overfitting; data sampling and splitting; data quality.

  In a recent presentation, Ben Hamner described the common pitfalls in machine learning projects that he and his colleagues have observed during competitions on Kaggle.

  The talk was titled “Machine Learning Gremlins” and was presented in February 2014 at Strata.

  In this post we take a look at the pitfalls from Ben’s talk, what they look like and how to avoid them.

  Machine Learning Process

  Early in the talk, Ben presented a snapshot of the process for working through a machine learning problem end-to-end.

  

  [Figure: Machine Learning Process. Taken from “Machine Learning Gremlins” by Ben Hamner]

  This snapshot included 9 steps, as follows:

1. Start with a business problem
2. Source data
3. Split data
4. Select an evaluation metric
5. Perform feature extraction
6. Model training
7. Feature selection
8. Model selection
9. Production system

  He commented that the process is iterative rather than linear.

  He also commented that each step in this process can go wrong, derailing the whole project.

  Discriminating Dogs and Cats

  Ben presented a case study problem for building an automatic cat door that can let the cat in and keep the dog out. This was an instructive example as it touched on a number of key problems in working a data problem.

  

  [Figure: Discriminating Dogs and Cats. Taken from “Machine Learning Gremlins” by Ben Hamner]

  Sample Size

  The first great takeaway from this example was that he studied the accuracy of the model against the size of the data sample and showed that more samples correlated with greater accuracy.

  He then added more data until accuracy leveled off. This was a great example of how easy it can be to get an idea of the sensitivity of your system to sample size and adjust accordingly.
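  The sample-size study described above can be sketched in a few lines. This is a minimal illustration and not the code from the talk: the synthetic one-dimensional data, the nearest-centroid rule, and the names `make_sample`, `fit_threshold`, and `learning_curve` are all assumptions made for the example.

```python
import random

def make_sample(rng):
    """Draw one labeled point (assumed data): class 0 near 0.0, class 1 near 1.5."""
    label = rng.randint(0, 1)
    x = rng.gauss(1.5 * label, 1.0)
    return x, label

def fit_threshold(train):
    """Nearest-centroid rule in 1D: threshold at the midpoint of the class means."""
    means = []
    for cls in (0, 1):
        xs = [x for x, y in train if y == cls]
        # Fall back to the assumed class center if a tiny sample misses a class.
        means.append(sum(xs) / len(xs) if xs else 1.5 * cls)
    return (means[0] + means[1]) / 2

def accuracy(threshold, data):
    """Fraction of points where 'x above threshold' matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in data)
    return hits / len(data)

def learning_curve(sizes, n_test=2000, repeats=30, seed=0):
    """Mean held-out accuracy for each training-set size."""
    rng = random.Random(seed)
    test = [make_sample(rng) for _ in range(n_test)]
    curve = []
    for n in sizes:
        scores = []
        for _ in range(repeats):
            train = [make_sample(rng) for _ in range(n)]
            scores.append(accuracy(fit_threshold(train), test))
        curve.append((n, sum(scores) / repeats))
    return curve

if __name__ == "__main__":
    for n, acc in learning_curve([5, 20, 80, 320, 1280]):
        print(f"{n:5d} training samples -> mean accuracy {acc:.3f}")
```

  Plotting or printing the curve shows where extra data stops paying off. On a real problem the same loop would wrap whatever model and metric the project actually uses.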

  Wrong Problem

  The second great takeaway from this example was that the system failed: it let in every cat in the neighborhood.

  It was a clever example highlighting the importance of understanding the constraints of the problem that needs to be solved, rather than the problem that you want to solve.

  Pitfalls In Machine Learning Projects

  Ben went on to discuss four common pitfalls encountered when working on machine learning problems.

  Although these problems are common, he pointed out that they can be identified and addressed relatively easily.


  

  [Figure: Overfitting. Taken from “Machine Learning Gremlins” by Ben Hamner]

Data Leakage: Using data in the model that a production system would not have access to. This is particularly common in time series problems. It can also happen with fields such as system IDs that may indicate a class label. Run a model and look carefully at the attributes that contribute most to its success, then sanity check them and consider whether they make sense. (See the referenced paper “Leakage in Data Mining”, PDF.)
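One rough way to run the sanity check described above is to ask how well each attribute predicts the label on its own. The sketch below is an assumption-laden illustration, not a method from the talk: the field names `id_prefix` and `weekday`, the toy records, and the 0.95 threshold are all hypothetical.

```python
from collections import Counter, defaultdict

def single_feature_accuracy(rows, feature, label="label"):
    """Hold-out accuracy of predicting the label from one feature alone."""
    half = len(rows) // 2
    train, test = rows[:half], rows[half:]
    votes = defaultdict(Counter)
    for row in train:
        votes[row[feature]][row[label]] += 1
    # Most common training label, used for feature values never seen in training.
    default = Counter(row[label] for row in train).most_common(1)[0][0]
    hits = 0
    for row in test:
        seen = votes.get(row[feature])
        guess = seen.most_common(1)[0][0] if seen else default
        hits += guess == row[label]
    return hits / len(test)

def flag_suspicious(rows, features, threshold=0.95):
    """Return features that predict the label almost perfectly by themselves."""
    scores = {f: single_feature_accuracy(rows, f) for f in features}
    return {f: s for f, s in scores.items() if s >= threshold}

# Hypothetical records: the ID prefix was assigned per class, so it leaks the label.
rows = []
for i in range(200):
    label = i % 2
    rows.append({
        "id_prefix": "A" if label else "B",  # leaky identifier (assumed)
        "weekday": i % 7,                     # roughly uninformative
        "label": label,
    })

flagged = flag_suspicious(rows, ["id_prefix", "weekday"])
print(flagged)
```

A flagged attribute is not proof of leakage, only a prompt to ask whether the production system would really have that value at prediction time.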

Overfitting: Modeling the training data so closely that the model also fits the noise in the data. The result is a poor ability to generalize. This becomes more of a problem in higher dimensions with more complex class boundaries.
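A common symptom of overfitting is a large gap between training and held-out accuracy. The sketch below illustrates this with a k-nearest-neighbor classifier on synthetic noisy data; the data generator, the 20% label noise, and the choice of k = 1 versus k = 25 are assumptions for the example and do not come from the talk.

```python
import random

def make_data(n, rng, noise=0.2):
    """Label is 1 when x > 0, then flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(votes * 2 > k)

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

def train_test_accuracy(k, seed=42):
    """Return (training accuracy, held-out accuracy) for a given k."""
    rng = random.Random(seed)
    train = make_data(200, rng)
    test = make_data(1000, rng)
    return accuracy(train, train, k), accuracy(train, test, k)

if __name__ == "__main__":
    for k in (1, 25):
        train_acc, test_acc = train_test_accuracy(k)
        print(f"k={k:2d}  train={train_acc:.3f}  held-out={test_acc:.3f}")
```

With k = 1 every training point is its own nearest neighbor, so the model memorizes the flipped labels and scores perfectly on the training set while doing worse on new data. A larger k smooths over the noise and narrows the gap.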

Data Sampling and Splitting: Related to data leakage, you need to be very careful that the train/test/validation sets are indeed independent samples. Time series problems require much thought and work to ensure that you can replay data to the system chronologically and validate model accuracy.
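For time series, one simple way to keep the sets independent is to split on cutoff dates rather than at random, so that nothing from the future reaches the training set. The following is a minimal sketch under assumed names: the `date` field, the `chronological_split` helper, and the cutoff dates are hypothetical.

```python
from datetime import date, timedelta

def chronological_split(records, valid_start, test_start):
    """Train before valid_start, validate up to test_start, test from test_start on."""
    train = [r for r in records if r["date"] < valid_start]
    valid = [r for r in records if valid_start <= r["date"] < test_start]
    test = [r for r in records if r["date"] >= test_start]
    return train, valid, test

# Ten hypothetical daily observations starting on 2014-01-01.
records = [
    {"date": date(2014, 1, 1) + timedelta(days=i), "value": i}
    for i in range(10)
]

train, valid, test = chronological_split(
    records,
    valid_start=date(2014, 1, 7),
    test_start=date(2014, 1, 9),
)

# Every training date precedes every validation date, which precedes every test date.
assert max(r["date"] for r in train) < min(r["date"] for r in valid)
assert max(r["date"] for r in valid) < min(r["date"] for r in test)
print(len(train), len(valid), len(test))
```

A random shuffle of the same records would mix later observations into the training set, which is exactly the kind of leakage the chronological replay is meant to prevent.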

Data Quality: Check the consistency of your data. Ben gave an example of flight data where some aircraft were landing before taking off. Inconsistent, duplicate, and corrupt data need to be identified and explicitly handled, because they can directly hurt the modeling process and the ability of a model to generalize.
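Consistency checks like the flight example are usually cheap to write. The sketch below audits a handful of made-up flight records; the field names, flight codes, and timestamps are invented for illustration and are not the data from the talk.

```python
from datetime import datetime

# Hypothetical flight records with three deliberate problems.
flights = [
    {"flight": "XX100", "departed": datetime(2014, 2, 1, 8, 0),
     "landed": datetime(2014, 2, 1, 10, 30)},
    {"flight": "XX200", "departed": datetime(2014, 2, 1, 9, 0),
     "landed": datetime(2014, 2, 1, 7, 45)},   # lands before takeoff
    {"flight": "XX100", "departed": datetime(2014, 2, 1, 8, 0),
     "landed": datetime(2014, 2, 1, 10, 30)},  # exact duplicate of row 0
    {"flight": "XX300", "departed": datetime(2014, 2, 1, 11, 0),
     "landed": None},                          # missing timestamp
]

def audit(rows):
    """Return (row index, problem) pairs for duplicates, gaps, and impossible times."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        key = (row["flight"], row["departed"], row["landed"])
        if key in seen:
            issues.append((i, "duplicate"))
            continue
        seen.add(key)
        if row["departed"] is None or row["landed"] is None:
            issues.append((i, "missing timestamp"))
        elif row["landed"] < row["departed"]:
            issues.append((i, "lands before takeoff"))
    return issues

issues = audit(flights)
for index, problem in issues:
    print(index, problem)
```

What to do with the flagged rows (drop, repair, or keep with a marker) is a modeling decision; the point of the audit is only to make that decision explicit.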


Summary

  Ben’s talk “Machine Learning Gremlins” is a quick and practical talk.

  You will get a useful crash course in the common pitfalls we are all susceptible to when working on a data problem.

  Source: machinelearningmastery.
