Entry | Data Analysis with Open Source Tools (基于开源工具的数据分析) |
Definition | Book information
Author: Philipp K. Janert
Publisher: Southeast University Press (东南大学出版社); 1st edition (June 1, 2011)
Original title: Data Analysis with Open Source Tools (O'Reilly Media, Inc.)
Paperback: 509 pages
Language of text: English
Format: 16mo
ISBN: 7564126744, 9787564126742
Barcode: 9787564126742
Product dimensions and weight: 23 x 17.6 x 2.4 cm; 839 g

Synopsis
Collecting data is relatively easy, but turning raw information into useful data requires knowing how to extract precisely what you want. Through the in-depth treatment in Data Analysis with Open Source Tools (reprint edition) by Philipp K. Janert, intermediate and experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You will learn how to look at data to discover what information it contains, how to capture those ideas in conceptual models, and how to feed your understanding back to your organization through business plans, precise metrics reports, and other means.
You will work through the concepts step by step in the hands-on workshops at the end of each chapter. Most importantly, you will learn how to think about the data you want to obtain, rather than relying on tools to think for you.

Editor's recommendation
Data Analysis with Open Source Tools (reprint edition) by Philipp K. Janert shows how to: use graphics to describe data with one, two, or dozens of variables; develop conceptual models using back-of-the-envelope calculations together with scaling and probability arguments; mine data with computationally intensive methods such as simulation and clustering; make your conclusions easier to understand through reports, dashboards, and other metrics programs; understand financial calculations, including the time value of money; use dimensionality reduction techniques or predictive analytics to overcome challenges in data analysis; and become familiar with different open source programming environments for data analysis.

Contents
PREFACE
1 INTRODUCTION: Data Analysis; What's in This Book; What's with the Workshops?; What's with the Math?; What You'll Need; What's Missing
PART I Graphics: Looking at Data
2 A SINGLE VARIABLE: SHAPE AND DISTRIBUTION: Dot and Jitter Plots; Histograms and Kernel Density Estimates; The Cumulative Distribution Function; Rank-Order Plots and Lift Charts; Only When Appropriate: Summary Statistics and Box Plots; Workshop: NumPy; Further Reading
3 TWO VARIABLES: ESTABLISHING RELATIONSHIPS: Scatter Plots; Conquering Noise: Smoothing; Logarithmic Plots; Banking; Linear Regression and All That; Showing What's Important; Graphical Analysis and Presentation Graphics; Workshop: matplotlib; Further Reading
4 TIME AS A VARIABLE: TIME-SERIES ANALYSIS: Examples; The Task; Smoothing; Don't Overlook the Obvious!; The Correlation Function; Optional: Filters and Convolutions; Workshop: scipy.signal; Further Reading
5 MORE THAN TWO VARIABLES: GRAPHICAL MULTIVARIATE ANALYSIS: False-Color Plots; A Lot at a Glance: Multiplots; Composition Problems; Novel Plot Types; Interactive Explorations; Workshop: Tools for Multivariate Graphics; Further Reading
6 INTERMEZZO: A DATA ANALYSIS SESSION: A Data Analysis Session; Workshop: gnuplot; Further Reading
PART II Analytics: Modeling Data
7 GUESSTIMATION AND THE BACK OF THE ENVELOPE: Principles of Guesstimation; How Good Are Those Numbers?; Optional: A Closer Look at Perturbation Theory and Error Propagation; Workshop: The Gnu Scientific Library (GSL); Further Reading
8 MODELS FROM SCALING ARGUMENTS: Models; Arguments from Scale; Mean-Field Approximations; Common Time-Evolution Scenarios; Case Study: How Many Servers Are Best?; Why Modeling?; Workshop: Sage; Further Reading
9 ARGUMENTS FROM PROBABILITY MODELS: The Binomial Distribution and Bernoulli Trials; The Gaussian Distribution and the Central Limit Theorem; Power-Law Distributions and Non-Normal Statistics; Other Distributions; Optional: Case Study--Unique Visitors over Time; Workshop: Power-Law Distributions; Further Reading
10 WHAT YOU REALLY NEED TO KNOW ABOUT CLASSICAL STATISTICS: Genesis; Statistics Defined; Statistics Explained; Controlled Experiments Versus Observational Studies; Optional: Bayesian Statistics--The Other Point of View; Workshop: R; Further Reading
11 INTERMEZZO: MYTHBUSTING--BIGFOOT, LEAST SQUARES, AND ALL THAT: How to Average Averages; The Standard Deviation; Least Squares; Further Reading
PART III Computation: Mining Data
12 SIMULATIONS: A Warm-Up Question; Monte Carlo Simulations; Resampling Methods; Workshop: Discrete Event Simulations with SimPy; Further Reading
13 FINDING CLUSTERS: What Constitutes a Cluster?; Distance and Similarity Measures; Clustering Methods; Pre- and Postprocessing; Other Thoughts; A Special Case: Market Basket Analysis; A Word of Warning; Workshop: Pycluster and the C Clustering Library; Further Reading
14 SEEING THE FOREST FOR THE TREES: FINDING IMPORTANT ATTRIBUTES: Principal Component Analysis; Visual Techniques; Kohonen Maps; Workshop: PCA with R; Further Reading
15 INTERMEZZO: WHEN MORE IS DIFFERENT: A Horror Story; Some Suggestions; What About Map/Reduce?; Workshop: Generating Permutations; Further Reading
PART IV Applications: Using Data
16 REPORTING, BUSINESS INTELLIGENCE, AND DASHBOARDS: Business Intelligence; Corporate Metrics and Dashboards; Data Quality Issues; Workshop: Berkeley DB and SQLite; Further Reading
17 FINANCIAL CALCULATIONS AND MODELING: The Time Value of Money; Uncertainty in Planning and Opportunity Costs; Cost Concepts and Depreciation; Should You Care?; Is This All That Matters?; Workshop: The Newsvendor Problem; Further Reading
18 PREDICTIVE ANALYTICS: Introduction; Some Classification Terminology; Algorithms for Classification; The Process; The Secret Sauce; The Nature of Statistical Learning; Workshop: Two Do-It-Yourself Classifiers; Further Reading
19 EPILOGUE: FACTS ARE NOT REALITY
A PROGRAMMING ENVIRONMENTS FOR SCIENTIFIC COMPUTATION AND DATA ANALYSIS: Software Tools; A Catalog of Scientific Software; Writing Your Own; Further Reading
B RESULTS FROM CALCULUS: Common Functions; Calculus; Useful Tricks; Notation and Basic Math; Where to Go from Here; Further Reading
C WORKING WITH DATA: Sources for Data; Cleaning and Conditioning; Sampling; Data File Formats; The Care and Feeding of Your Data Zoo; Skills; Terminology; Further Reading
INDEX |