Python机器学习Kaggle案例实战.pdf (Python Machine Learning: Kaggle Case Studies in Practice)

  • Uploaded by: 曲****
  • Document ID: 228928
  • Upload date: 2023-03-18
  • Format: PDF
  • Pages: 31
  • Size: 1.51 MB
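Python机器学习Kaggle案例实战 (Python Machine Learning: Kaggle Case Studies in Practice), Week 1
DATAGURU专业数据分析社区 (DATAGURU professional data analysis community)
Instructors: 黄志洪, 何翠仪

Legal notice

This video and slide deck are teaching materials for a 炼数成金 (Dataguru) online course. They may be used only within the course and must not be distributed outside it; violators may be held legally and financially liable. For course details, visit the Dataguru training website: http:

About Kaggle

https:

Kaggle: Home for Data Science. Kaggle helps you learn, work, and play.

  • Competitions: climb the world's most elite machine learning leaderboards.
  • Datasets: explore and analyze a collection of high quality public datasets.
  • Kernels: run code in the cloud and receive community feedback on your work.

Case background

Crowdflower Search Results Relevance
https:

Small online businesses currently have no good way to evaluate the performance of their search algorithms, which makes it hard for them to provide an excellent customer experience. The goal of this competition is to create an open-source model that can be used to measure the relevance of search results. Doing so helps small business owners match the experience offered by better-resourced competitors, and it gives more established businesses a model to test against. Given the queries and resulting product descriptions from leading e-commerce sites, the competition asks you to evaluate the accuracy of their search algorithms.

Dataset

The dataset was built from query-result pairings enriched on the CrowdFlower platform. CrowdFlower sponsored the competition as an investment in the open data science community, and it collected, cleaned, and labeled the data so that it can be used for supervised machine learning.

To evaluate search relevance, CrowdFlower had its crowd run searches on a handful of e-commerce websites. A total of 261 search terms were generated, and CrowdFlower paired lists of products with their corresponding search terms. Each rater was asked to score each product 1, 2, 3, or 4 against its search term, where 4 means the item completely satisfies the search query and 1 means the item does not match it.

The challenge is to predict the relevance score given the product description and product title. To make sure your algorithm is robust enough to handle the noisy HTML snippets found in the wild, the product description field is provided raw and contains information that is irrelevant to the product.

To discourage hand-labeling, CrowdFlower also added extra data to the test set that was not labeled by the crowd. This data is ignored when scores are calculated.

Dataset download: https://www.kaggle.com/c/crowdflower-search-relevance/data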
train.csv (training set)

  • id: product id
  • query: search term
  • product_title: product title
  • product_description: full text of the product description (some entries contain HTML tags)
  • median_relevance: median of the relevance scores from three raters; an integer from 1 to 4
  • relevance_variance: variance of the raters' relevance scores

test.csv (test set)

  • id: product id
  • query: search term
  • product_title: product title
  • product_description: full text of the product description (some entries contain HTML tags)

Target variable: median_relevance

Training data sample

Columns: id, query, product_title, product_description, median_relevance, relevance_variance.

  id: 1
  query: bridal shower decorations
  product_title: Accent Pillow with Heart Design - Red/Black
  product_description: Red satin accent pillow embroidered with a heart in black thread 8 x 8
  median_relevance: 1
  relevance_variance: 0

  id: 2
  query: led christmas lights
  product_title: Set of 10 Battery Operated Multi LED Train Christmas Lights - Clear Wire
  product_description: Set of 10 Battery Operated Train Christmas Lights Item #X124210 Features: Color: multi-color bulbs with matching train light covers/clear wire Multicolor consists of red, green, blue and yellow bulbs Number of bulbs on string: 10 Bulb size: micro LED Spacing between bulbs: 6 inches Lighted length: 4.5 feet Total length: 5.5 feet 12 inch lead cord Additional product features: LED lights use 90% less energy Cool to the touch If one bulb burns out, the rest will stay lit Lights are equipped with Lamp Lock feature, which makes them replaceable, interchangeable and keeps them from falling out Requires 3 "AA" batteries (not included) Convenient on/off/timer switch located on battery pack Timer function on battery pack allows for 6 hours on and 18 hours off Cannot connect multiple sets together UL listed for indoor use only Train dimensions: 1.5 H x 18 W x 5 D Material(s): plastic/wire/acrylic
  median_relevance: 4
  relevance_variance: 0

  id: 4
  query: projector
  product_title: ViewSonic Pro8200 DLP Multimedia Projector
  product_description: (empty)
  median_relevance: 4
  relevance_variance: 0.471

  id: 5
  query: wine rack
  product_title: Concept Housewares WR-44526 Solid-Wood Ceiling/Wall-Mount Wine Rack, Charcoal Grey, 6 Bottle
  product_description: Like a silent and sturdy tree, the Southern Enterprises Bird and Branch Coat Rack is an eyecatching addition to your home décor. This tree themed coat rack features strong branches with pinecone accents and a small bird perched at the top to give it a whimsical and welcoming appearance while still making it sturdy enough to hold your coats, hats, umbrellas and more. Whether it serves as a coat rack, a hat rack or a combination of the two, it'll be a great space saver that gets appreciated for its graceful appearance Number of Hooks: 10 Frame Material: Metal Hardware Material: Metal
  median_relevance: 4
  relevance_variance: 0

Note that the description attached to the wine rack listing actually describes a coat rack: the raw data is noisy.

Test data sample

Columns: id, query, product_title, product_description.

  id: 3
  query: electric griddle
  product_title: Star-Max 48 in Electric Griddle
  product_description: (empty)

  id: 6
  query: Phillips coffee maker
  product_title: Philips SENSEO HD7810 WHITE Single Serve Pod Coffee Maker Espresso Brew Machine
  product_description: (empty)

  id: 9
  query: san francisco 49ers
  product_title: 2013 San Francisco 49ers Clock
  product_description: A 2013 San Francisco 49ers clock is the ultimate way for you to show off your team spirit. This clock would be a great conversation piece for any office or bedroom and the licensed photo features some of the teams best players

  id: 11
  query: aveeno shampoo
  product_title: AVEENO 10.5FLOZ NRSH SHINE SH
  product_description: Water, Ammonium Lauryl Sulfate, Dimethicone, Sodium Cumenesulfonate, Cocamide MEA, Cetyl Alcohol, Acrylates Copolymer, Cocamidopropyl Betaine, Fragrance, Phenoxyethanol, Caprylyl Glycol, Glycol Distearate, Tetrasodium EDTA, Guar Hydroxypropyltrimonium Chloride, Triticum Vulgare (Wheat) Germ Oil, Triticum Vulgare (Wheat) Gluten, Orbignya Speciosa Kernel Oil, Glycerin, Polyquaternium-10, Astrocaryum Murumuru Seed Butter, Mauritia Flexuosa Fruit Oil, Mica, Titanium Dioxide May Also Contain: Citric Acid, Sodium Hydroxide.

  id: 12
  query: flea and tick control for dogs
  product_title: Merial Frontline Plus Flea and Tick Control for Dogs and Puppies 45-88 pound
  product_description: (empty)

  id: 14
  query: table clock
  product_title: Classy Wood Table Clock
  product_description: Watch out for this antique wood table clock which will surely give a diverse appeal to your home ambience. Made of quality wood material this table clock is durable and easy to maintain. This wood table clock is in round shape with the numbers in roman form. It has a small designer pattern around the borders The strong base will help in keeping this wood table clock firm and steady. Keep this wood table clock in your living room, bedroom or study room to add a hint of vintage feel to the decor It goes well with both modern and traditionally themed houses. This vintage wood table clock can be gifted to your near and dear ones who love similar kind of decor pieces. Hurry up and get this amazingly designed wood table

Submission format

The submission file has two columns, id and prediction. In the sample shown on the slide, every prediction is 3, for ids 3, 6, 9, 11, 12, 14, 15, 16, 18, 19, 21, 22, 23, 24, 25, 26, 27, 29, 30, 33, 34, 36, 37, and 38.

Scoring

https:

Submissions are scored with quadratic weighted kappa:

$$w_{i,j} = \frac{(i-j)^2}{(N-1)^2}, \qquad \kappa = 1 - \frac{\sum_{i,j} w_{i,j}\, O_{i,j}}{\sum_{i,j} w_{i,j}\, E_{i,j}}$$

Here N is the number of rating levels, O_{i,j} counts the samples with actual rating i and predicted rating j, and E_{i,j} is the count expected if the actual and predicted ratings were independent (the outer product of the two rating histograms, scaled to the same total as O).

Ensemble learning

Ensemble learning is currently one of the most active directions in machine learning. Put simply, it means using several classifiers to predict on a dataset so that the combined classifier generalizes better than any single one.

There are three common frameworks: bagging, boosting, and stacking.

[Figures: one diagram each for bagging, boosting, and stacking; only the labels "bagging", "boosting", "stacking", and "training set" survive.]

Bias and variance

[Figure: bias versus variance; surviving labels include "High Variance" and "Low Variance".]

Let the ensemble be F = \sum_i \gamma_i f_i, built from m base models f_i with weights \gamma_i. Assuming every base model has the same weight \gamma, mean \mu, and variance \sigma^2, and every pair of base models has correlation \rho:

$$E(F) = E\Big(\sum_{i=1}^{m} \gamma_i f_i\Big) = \sum_{i=1}^{m} \gamma_i\, E(f_i)$$

$$Var(F) = Var\Big(\sum_{i=1}^{m} \gamma_i f_i\Big) = \sum_{i=1}^{m}\sum_{j=1}^{m} \gamma_i \gamma_j\, Cov(f_i, f_j) = \sum_{i=1}^{m} \gamma_i^2\, Var(f_i) + 2\sum_{i<j} \rho\, \gamma_i \gamma_j \sqrt{Var(f_i)}\sqrt{Var(f_j)} = m^2\gamma^2\sigma^2\rho + m\gamma^2\sigma^2(1-\rho)$$

Bias and variance of bagging (equal weights, \gamma = 1/m):

$$E(F) = \gamma \sum_{i=1}^{m} E(f_i) = \frac{1}{m} \cdot m \cdot \mu = \mu$$

$$Var(F) = m^2\gamma^2\sigma^2\rho + m\gamma^2\sigma^2(1-\rho) = m^2 \frac{1}{m^2}\sigma^2\rho + m \frac{1}{m^2}\sigma^2(1-\rho) = \sigma^2\rho + \frac{\sigma^2(1-\rho)}{m}$$

Bias and variance of boosting (base models almost perfectly correlated, \rho \approx 1):

$$E(F) = \gamma \sum_{i=1}^{m} E(f_i)$$

$$Var(F) = m^2\gamma^2\sigma^2\rho + m\gamma^2\sigma^2(1-\rho) = m^2\gamma^2\sigma^2 \cdot 1 + m\gamma^2\sigma^2(1-1) = m^2\gamma^2\sigma^2$$

Base models

  • XGBoost Linear Booster
  • XGBoost Tree Booster
  • GradientBoostingRegressor
  • ExtraTreesRegressor
  • RandomForestRegressor
  • SVR
  • Ridge
  • Keras NN
  • RGF Regression

Table 7: Model Library

  Package   Model                                        Feature    Weighting
  XGBoost   gblinear (MSE, COCR, Softmax, Softkappa)     High/Low   Yes
  XGBoost   gbtree (MSE, COCR, Softmax, Softkappa)       Low        Yes
  Sklearn   GradientBoostingRegressor                    Low        Yes
  Sklearn   ExtraTreesRegressor                          Low        Yes
  Sklearn   RandomForestRegressor                        Low        Yes
  Sklearn   SVR                                          Low        Yes
  Sklearn   Ridge                                        High/Low   Yes
  Sklearn   Lasso                                        High/Low   No
  Sklearn   LogisticRegression                           High/Low   No
  Keras     NN Regression                                Low        No
  RGF       Regression                                   Low        No

The winning solution

http:/

[Flowchart: the pipeline runs from preprocessing (Dropping HTML tags, Word Replacement, Stemming) through feature extraction (Counting Features, Distance Features, TF-IDF Features, Query Id) and the nine base models listed above to Ensemble Selection, whose output is the Submission.]
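As an illustration of the quadratic weighted kappa used for scoring, here is a minimal pure-Python sketch. The function name `quadratic_weighted_kappa` and its `min_rating`/`max_rating` parameters are chosen for this example and are not taken from the course code; the 1 to 4 default follows the rating scale of this competition. Returning 1.0 when the denominator is zero is an assumed convention.

```python
def quadratic_weighted_kappa(actual, predicted, min_rating=1, max_rating=4):
    """Quadratic weighted kappa between two equally long lists of integer ratings."""
    n = max_rating - min_rating + 1

    # Observed matrix O: O[i][j] counts samples with actual rating i, predicted rating j.
    observed = [[0.0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        observed[a - min_rating][p - min_rating] += 1.0

    total = sum(sum(row) for row in observed)
    if total == 0:
        raise ValueError("at least one rating pair is required")

    # Rating histograms of the actual and the predicted ratings.
    hist_actual = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if actual and predicted ratings were independent.
            expected = hist_actual[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both lists contain one identical rating only; treated as perfect agreement here.
        return 1.0
    return 1.0 - numerator / denominator


perfect = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4])
reversed_order = quadratic_weighted_kappa([1, 2, 3, 4], [4, 3, 2, 1])
constant = quadratic_weighted_kappa([1, 2, 3, 4], [3, 3, 3, 3])
```

In this toy data, perfect agreement gives 1, fully reversed ratings give -1, and a constant prediction of 3 gives 0, which is why a submission that predicts the same score for every row earns no credit under this metric.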
Data exploration

Preprocessing

  • Dropping HTML tags: extract the text from the HTML with the bs4 library
  • Word replacement: spelling correction, synonym replacement, and other word replacements
  • Stemming

Table 1: Spelling Correction (parts of the misspellings column are illegible in the source)

  misspelling                    correction
  refrigirator                   refrigerator
  rechargable batteries          rechargeable batteries
  adidas [illegible]             adidas fragrance
  assassinss creed               assassins creed
  [illegible]                    rachael ray cookware
  [illegible] k cups             donut shop k cups
  [illegible] hardisk 500 gb     external hardisk 500 gb

Table 2: Synonym Replacement

  synonyms                                          replacement
  child, kid                                        kid
  bicycle, bike                                     bike
  refrigerator, fridge, freezer                     fridge
  fragrance, perfume, cologne, eau de toilette      perfume

Table 3: Other Replacement

  original            replacement
  nutri system        nutrisystem
  soda stream         sodastream
  playstation         ps
  ps 2                ps2
  ps 3                ps3
  ps 4                ps4
  coffeemaker         coffee maker
  k-cup               k cup
  4-ounce             4 ounce
  8-ounce             8 ounce
  12-ounce            12 ounce
  ounce               oz
  hardisk             hard drive
  hard disk           hard drive
  harley-davidson     harley davidson
  harleydavidson      harley davidson
  doctor who          dr who
  levi strauss        levis
  mac book            macbook
  micro-usb           micro usb
  video games         videogames
  game pad            gamepad
  western digital     wd

Feature extraction: counting features

In the formulas below, q_i, t_i, and d_i denote the query, product title, and product description of sample i, and ngram(x, n) denotes the n-grams of text x.

Basic counting features:

  • Count of n-gram: count of ngram(q_i, n), ngram(t_i, n), and ngram(d_i, n).
  • Count & Ratio of Digit: count and ratio of digits in q_i, t_i, and d_i.
  • Count & Ratio of Unique n-gram: count and ratio of unique ngram(q_i, n), ngram(t_i, n), and ngram(d_i, n).
  • Description Missing Indicator: binary indicator of whether d_i is empty.
  • Count & Ratio of a's n-gram in b's n-gram: computed for all combinations of a and b in {q_i, t_i, d_i} with a ≠ b.
  • Statistics of Positions of a's n-gram in b's n-gram: for the intersecting n-grams, their positions were recorded and the following statistics computed as features: minimum value (0% quantile), median value (50% quantile), maximum value (100% quantile), mean value, and standard deviation (std).
  • Statistics of Normalized Positions of a's n-gram in b's n-gram: similar to the features above, but computed using positions normalized by the length of a.

Feature extraction: distance features

The distances used are the Jaccard coefficient and the Dice distance:

$$\mathrm{JaccardCoef}(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad \mathrm{DiceDist}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

Basic distance features, with D standing for either of the two distances:

  • D(ngram(q_i, n), ngram(t_i, n))
  • D(ngram(q_i, n), ngram(d_i, n))
  • D(ngram(t_i, n), ngram(d_i, n))

Statistical distance features:

1. Group the samples by median_relevance and by (query, median_relevance):

$$G_r = \{\, i \mid r_i = r \,\} \tag{3}$$

$$G_{q,r} = \{\, i \mid q_i = q,\ r_i = r \,\} \tag{4}$$

where q ranges over all unique queries and r \in \{1, 2, 3, 4\}.

2. Compute the distance between each sample and all the samples in each median_relevance level. The current sample is excluded when computing the distance. For G_{q,r}, only the group with the same query as the current sample is considered:

$$S_{i,r,n} = \{\, D(\mathrm{ngram}(t_i, n), \mathrm{ngram}(t_j, n)) \mid j \in G_r,\ j \neq i \,\} \tag{5}$$

$$SQ_{i,r,n} = \{\, D(\mathrm{ngram}(t_i, n), \mathrm{ngram}(t_j, n)) \mid j \in G_{q_i,r},\ j \neq i \,\} \tag{6}$$

where r \in \{1, 2, 3, 4\} and D(\cdot,\cdot) \in \{\mathrm{JaccardCoef}(\cdot,\cdot), \mathrm{DiceDist}(\cdot,\cdot)\}.

3. For S_{i,r,n} and SQ_{i,r,n} respectively, compute statistics such as:

  • minimum value (0% quantile)
  • median value (50% quantile)
  • maximum value (100% quantile)
  • mean value
  • standard deviation (std)
  • more can be added, e.g. moment features and other quantiles

Feature extraction: TF-IDF features

Basic TF-IDF features:

  • TF-IDF Features
  • Basic Cosine Similarity
  • Statistical Cosine Similarity
  • SVD Reduced Features
  • Basic Cosine Similarity Based on SVD Reduced Features
  • Statistical Cosine Similarity Based on SVD Reduced Features

Cooccurrence terms are formed from:

  • query unigram/bigram and product_title unigram/bigram
  • query unigram/bigram and product_description unigram/bigram
  • query id (qid) and product_title unigram/bigram
  • query id (qid) and product_description unigram/bigram

For example, the cooccurrence terms for query unigram and product_title unigram are: silver fremada, silver sterling, silver silver, silver freeform, silver necklace, necklace fremada, necklace sterling, necklace silver, necklace freeform, necklace necklace.

The cooccurrence terms for query bigram and product_title unigram are: silver necklace fremada, silver necklace sterling, silver necklace silver, silver necklace freeform, silver necklace necklace.

Feature extraction: other features

One-hot encoding of the query. One-hot encoding, also called one-bit-effective encoding, uses an N-bit state register to encode N states: each state has its own register bit, and exactly one bit is set at any time.

  Natural state codes: 000, 001, 010, 011, 100, 101
  One-hot codes: 000001, 000010, 000100, 001000, 010000, 100000

Code

https: CrowdFlower

  • Feature extraction
  • Generate the best single model
  • Generate the model library
  • Produce the final prediction through ensemble selection
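The preprocessing and basic distance features described above can be sketched as follows. This is an illustration under stated assumptions, not the winning team's code: the slides name the bs4 library for removing HTML tags, while this sketch swaps in the standard library's `html.parser` so that it runs without extra packages; stemming is omitted; `REPLACEMENTS` holds only a few rows from Tables 2 and 3; n-grams are treated as sets; and the helper names and the sample query and title are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flushes any text still buffered by the parser
    return " ".join(" ".join(parser.parts).split())


# A few rows copied from Tables 2 and 3; the full tables are longer.
REPLACEMENTS = [
    ("child", "kid"),
    ("bicycle", "bike"),
    ("refrigerator", "fridge"),
    ("k-cup", "k cup"),
    ("hard disk", "hard drive"),
]


def preprocess(text):
    """Strip HTML, lowercase, apply word replacements, and split into tokens."""
    text = strip_html(text).lower()
    for source, target in REPLACEMENTS:
        text = re.sub(r"\b" + re.escape(source) + r"\b", target, text)
    return re.findall(r"[a-z0-9]+", text)


def ngrams(tokens, n):
    """Set of n-grams (as tuples) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_coef(a, b):
    """|A intersect B| / |A union B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice_dist(a, b):
    """2 |A intersect B| / (|A| + |B|), or 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical query and product title, invented for this example.
query = preprocess("bike for kid")
title = preprocess("<b>Child</b> Bicycle - 16 inch")
jaccard = jaccard_coef(ngrams(query, 1), ngrams(title, 1))
dice = dice_dist(ngrams(query, 1), ngrams(title, 1))
```

After synonym replacement the title tokens become kid, bike, 16, inch, so the query and title share two of five distinct unigrams: the Jaccard coefficient is 0.4 and the Dice value is 4/7. Without the replacement step the two texts would share no unigrams at all, which is the motivation for normalizing synonyms before computing these features.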