Developing Analytic Talent: Becoming a Data Scientist

Learn what it takes to succeed in the most in-demand tech job

Harvard Business Review calls it the sexiest job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value. With over 15 years of big data, predictive modeling, and business analytics experience, author Vincent Granville is no stranger to data science. In this one-of-a-kind guide, he provides insight into the essential data science skills, such as statistics and visualization techniques, and covers everything from analytical recipes and data science tricks to common job interview questions, sample resumes, and source code.

The applications are endless and varied: automatically detecting spam and plagiarism, optimizing bid prices in keyword advertising, identifying new molecules to fight cancer, assessing the risk of meteorite impact. Complete with case studies, this book is a must, whether you're looking to become a data scientist or to hire one.

  • Explains the finer points of data science, the required skills, and how to acquire them, including analytical recipes, standard rules, source code, and a dictionary of terms
  • Shows what companies are looking for and how the growing importance of big data has increased the demand for data scientists
  • Features job interview questions, sample resumes, salary surveys, and examples of job ads
  • Includes case studies that explore how data science is used on Wall Street, in botnet detection, for online advertising, and in many other business-critical situations

Developing Analytic Talent: Becoming a Data Scientist is essential reading for those aspiring to this hot career choice and for employers seeking the best candidates.


Vincent Granville, Ph.D., is a data scientist with 15 years of big data, predictive modeling, and business analytics experience. He is the co-founder of Data Science Central, which offers a robust editorial platform, social interaction, forum-based technical support, coverage of the latest technology tools and trends, and industry job opportunities.


Introduction xxi

Chapter 1 What Is Data Science? 1

Real Versus Fake Data Science 2

Two Examples of Fake Data Science 5

The Face of the New University 6

The Data Scientist 9

Data Scientist Versus Data Engineer 9

Data Scientist Versus Statistician 11

Data Scientist Versus Business Analyst 12

Data Science Applications in 13 Real-World Scenarios 13

Scenario 1: DUI Arrests Decrease After End of State Monopoly on Liquor Sales 14

Scenario 2: Data Science and Intuition 15

Scenario 3: Data Glitch Turns Data Into Gibberish 18

Scenario 4: Regression in Unusual Spaces 19

Scenario 5: Analytics Versus Seduction to Boost Sales 20

Scenario 6: About Hidden Data 22

Scenario 7: High Crime Rates Caused by Gasoline Lead. Really? 23

Scenario 8: Boeing Dreamliner Problems 23

Scenario 9: Seven Tricky Sentences for NLP 24

Scenario 10: Data Scientists Dictate What We Eat? 25

Scenario 11: Increasing Sales with Better Relevancy 27

Scenario 12: Detecting Fake Profiles or Likes on Facebook 29

Scenario 13: Analytics for Restaurants 30

Data Science History, Pioneers, and Modern Trends 30

Statistics Will Experience a Renaissance 31

History and Pioneers 32

Modern Trends 34

Recent Q&A Discussions 35

Summary 39

Chapter 2 Big Data Is Different 41

Two Big Data Issues 41

The Curse of Big Data 41

When Data Flows Too Fast 45

Examples of Big Data Techniques 51

Big Data Problem Epitomizing the Challenges of Data Science 51

Clustering and Taxonomy Creation for Massive Data Sets 53

Excel with 100 Million Rows 57

What MapReduce Can’t Do 60

The Problem 61

Three Solutions 61

Conclusion: When to Use MapReduce 63

Communication Issues 63

Data Science: The End of Statistics? 65

The Eight Worst Predictive Modeling Techniques 65

Marrying Computer Science, Statistics, and Domain Expertise 67

The Big Data Ecosystem 70

Summary 71

Chapter 3 Becoming a Data Scientist 73

Key Features of Data Scientists 73

Data Scientist Roles 73

Horizontal Versus Vertical Data Scientist 75

Types of Data Scientists 78

Fake Data Scientist 78

Self-Made Data Scientist 78

Amateur Data Scientist 79

Extreme Data Scientist 80

Data Scientist Demographics 82

Training for Data Science 82

University Programs 82

Corporate and Association Training Programs 86

Free Training Programs 87

Data Scientist Career Paths 89

The Independent Consultant 89

The Entrepreneur 95

Summary 107

Chapter 4 Data Science Craftsmanship, Part I 109

New Types of Metrics 110

Metrics to Optimize Digital Marketing Campaigns 111

Metrics for Fraud Detection 112

Choosing Proper Analytics Tools 113

Analytics Software 114

Visualization Tools 115

Real-Time Products 116

Programming Languages 117

Visualization 118

Producing Data Videos with R 118

More Sophisticated Videos 122

Statistical Modeling Without Models 122

What Is a Statistical Model Without Modeling? 123

How Does the Algorithm Work? 124

Source Code to Produce the Data Sets 125

Three Classes of Metrics: Centrality, Volatility, Bumpiness 125

Relationships Among Centrality, Volatility, and Bumpiness 125

Defining Bumpiness 126

Bumpiness Computation in Excel 127

Uses of Bumpiness Coefficients 128

Statistical Clustering for Big Data 129

Correlation and R-Squared for Big Data 130

A New Family of Rank Correlations 132

Asymptotic Distribution and Normalization 134

Computational Complexity 137

Computing q(n) 137

A Theoretical Solution 140

Structured Coefficient 140

Identifying the Number of Clusters 141

Methodology 142

Example 143

Internet Topology Mapping 143

Securing Communications: Data Encoding 147

Summary 149

Chapter 5 Data Science Craftsmanship, Part II 151

Data Dictionary 152

What Is a Data Dictionary? 152

Building a Data Dictionary 152

Hidden Decision Trees 153

Implementation 155

Example: Scoring Internet Traffic 156

Conclusion 158

Model-Free Confidence Intervals 158

Methodology 158

The Analyticbridge First Theorem 159

Application 160

Source Code 160

Random Numbers 161

Four Ways to Solve a Problem 163

Intuitive Approach for Business Analysts with Great Intuitive Abilities 164

Monte Carlo Simulations Approach for Software Engineers 165

Statistical Modeling Approach for Statisticians 165

Big Data Approach for Computer Scientists 165

Causation Versus Correlation 165

How Do You Detect Causes? 166

Life Cycle of Data Science Projects 168

Predictive Modeling Mistakes 171

Logistic-Related Regressions 172

Interactions Between Variables 172

First Order Approximation 172

Second Order Approximation 174

Regression with Excel 175

Experimental Design 176

Interesting Metrics 176

Segmenting the Patient Population 176

Customized Treatments 177

Analytics as a Service and APIs 178

How It Works 179

Example of Implementation 179

Source Code for Keyword Correlation API 180

Miscellaneous Topics 183

Preserving Scores When Data Sets Change 183

Optimizing Web Crawlers 184

Hash Joins 186

Simple Source Code to Simulate Clusters 186

New Synthetic Variance for Hadoop and Big Data 187

Introduction to Hadoop/MapReduce 187

Synthetic Metrics 188

Hadoop, Numerical, and Statistical Stability 189

The Abstract Concept of Variance 189

A New Big Data Theorem 191

Transformation-Invariant Metrics 192

Implementation: Communications Versus Computational Costs 193

Final Comments 193

Summary 193

Chapter 6 Data Science Application Case Studies 195

Stock Market 195

Pattern to Boost Return by 500 Percent 195

Optimizing Statistical Trading Strategies 197

Stock Trading API: Statistical Model 200

Stock Trading API: Implementation 202

Stock Market Simulations 203

Some Mathematics 205

New Trends 208

Encryption 209

Data Science Application: Steganography 209

Solid E‑Mail Encryption 212

Captcha Hack 214

Fraud Detection 216

Click Fraud 216

Continuous Click Scores Versus Binary Fraud/Non-Fraud 218

Mathematical Model and Benchmarking 219

Bias Due to Bogus Conversions 220

A Few Misconceptions 221

Statistical Challenges 221

Click Scoring to Optimize Keyword Bids 222

Automated, Fast Feature Selection with Combinatorial Optimization 224

Predictive Power of a Feature: Cross-Validation 225

Association Rules to Detect Collusion and Botnets 228

Extreme Value Theory for Pattern Detection 229

Digital Analytics 230

Online Advertising: Formula for Reach and Frequency 231

E‑Mail Marketing: Boosting Performance by 300 Percent 231

Optimize Keyword Advertising Campaigns in 7 Days 232

Automated News Feed Optimization 234

Competitive Intelligence with 234

Measuring Return on Twitter Hashtags 237

Improving Google Search with Three Fixes 240

Improving Relevancy Algorithms 242

Ad Rotation Problem 244

Miscellaneous 245

Better Sales Forecasts with Simpler Models 245

Better Detection of Healthcare Fraud 247

Attribution Modeling 248

Forecasting Meteorite Hits 248

Data Collection at Trailhead Parking Lots 252

Other Applications of Data Science 253

Summary 253

Chapter 7 Launching Your New Data Science Career 255

Job Interview Questions 255

Questions About Your Experience 255

Technical Questions 257

General Questions 258

Questions About Data Science Projects 260

Testing Your Own Visual and Analytic Thinking 263

Detecting Patterns with the Naked Eye 263

Identifying Aberrations 266

Misleading Time Series and Random Walks 266

From Statistician to Data Scientist 268

Data Scientists Are Also Statistical Practitioners 268

Who Should Teach Statistics to Data Scientists? 269

Hiring Issues 269

Data Scientists Work Closely with Data Architects 270

Who Should Be Involved in Strategic Thinking? 270

Two Types of Statisticians 271

Using Big Data Versus Sampling 272

Taxonomy of a Data Scientist 273

Data Science’s Most Popular Skill Mixes 273

Top Data Scientists on LinkedIn 276

400 Data Scientist Job Titles 279

Salary Surveys 281

Salary Breakdown by Skill and Location 281

Create Your Own Salary Survey 285

Summary 285

Chapter 8 Data Science Resources 287

Professional Resources 287

Data Sets 288

Books 288

Conferences and Organizations 290

Websites 291

Definitions 292

Career-Building Resources 295

Companies Employing Data Scientists 296

Sample Data Science Job Ads 297

Sample Resumes 297

Summary 298

Index 299


"I strongly recommend this book for readers whose background is related to data science, statistics, information technology and management, computer science, business analytics, and so on." (Online Information Review, May 2015)