
How to start with data science?

Data science involves three sets of skills: 1) applied statistics, 2) computer science and IT (e.g., machine learning + coding + SQL), and 3) domain knowledge + what is often referred to as hacking skills! There is an interesting article by David L. Donoho (Stanford statistics professor) entitled "50 Years of Data Science", where he discusses how data science differs from traditional statistics: 1) the emphasis is not on inference but on learning (statistical learning, or machine learning), and 2) it requires some IT and big data skills (e.g., database management, SQL) + coding skills.

To become a data scientist, you can start with one of the two most popular data science languages: Python or R. I prefer Python, and it seems Python is ahead on recent developments (such as deep learning), but R is catching up. There are some key Python libraries you need to get familiar with, such as pandas & numpy (for data wrangling), matplotlib & seaborn (data visualization), and sklearn & statsmodels (machine learning and time-series analysis). Two machine learning subjects you should start with are: 1) regression, and 2) classification. Download some data sets from the UC Irvine Machine Learning Repository or from OpenML, load a data set using the pandas library, visualize the data using matplotlib and seaborn, cleanse the data (again using pandas), process the data, and then apply some classification & regression algorithms from the sklearn library.
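The workflow above can be sketched roughly as follows. This is a minimal illustration under some assumptions, not a recommended pipeline: it uses the small Iris and diabetes data sets bundled with sklearn as stand-ins for a file you would download yourself, the helper names load_clean and fit_and_score are made up for this example, and logistic regression and linear regression are simply two common starting choices.

```python
import pandas as pd
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split


def load_clean(loader):
    """Load a bundled data set as a pandas DataFrame and do basic cleansing.

    Hypothetical helper: with a downloaded CSV you would call
    pd.read_csv(...) here instead of an sklearn loader.
    """
    frame = loader(as_frame=True).frame  # features plus a "target" column
    return frame.dropna().drop_duplicates()


def fit_and_score(frame, model, metric):
    """Split into train/test sets, fit the model, and score it on the test set."""
    X = frame.drop(columns="target")
    y = frame["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    model.fit(X_train, y_train)
    return metric(y_test, model.predict(X_test))


iris = load_clean(load_iris)          # classification: predict the species
diabetes = load_clean(load_diabetes)  # regression: predict disease progression

# A first look at the data before modeling.
print(iris.describe())

accuracy = fit_and_score(iris, LogisticRegression(max_iter=1000), accuracy_score)
r2 = fit_and_score(diabetes, LinearRegression(), r2_score)
print(f"Iris classification accuracy: {accuracy:.2f}")
print(f"Diabetes regression R^2: {r2:.2f}")
```

The visualization step is left out to keep the sketch short; a call such as seaborn.pairplot(iris, hue="target") would fit naturally between loading and modeling.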

Well, you do need to learn how each regression and classification algorithm works, but to apply them you don't have to reinvent them. For the theory, you can refer to the bible of statistical learning by Hastie, Tibshirani, and Friedman: The Elements of Statistical Learning. To learn the statistics (mainly EDA, exploratory data analysis) for data science, there is an excellent book to start with: Think Stats.

After this, depending on what you want to do, you can advance in various directions: deep learning using Keras, TensorFlow, or PyTorch, interactive data visualization, Tableau, cloud computing (AWS, Microsoft Azure, Google Cloud Platform, ...), PySpark, and big data analytics.

I have no work experience. What can I do?

Well, you may need a mentor, as it's easy to get overwhelmed. Data science is very broad, and depending on the industry you want to go into, different skills may be emphasized. The industry has started to care less and less about your degree and more and more about your skills. It is always more difficult to get the first job! There are even companies that take advantage of brilliant people with no data science related work experience, revising their resumes and finding them a job for 60k per year (obviously they get a good commission for trapping a good engineer for a cheap company)! There are also many companies that develop online courses on Python, machine learning, etc. and make money through that! They don't even develop the course lectures themselves; they hire cheap talent to create those materials for them. Not sure why people still pay for online courses when the best of them are available for free (well, maybe we shouldn't underestimate people's ignorance: "it must be good, that's why it's not free")! Bootcamps are an exception. If you decide to go, go to the best of them. A bootcamp will scratch the surface for you: you learn the language you need to get the job, the employer has to deal with the depth(!), and you learn the actual stuff at work!

I have a PhD, should I get a degree in data science before getting a job?

If you have a PhD or MS in a quantitative field and you already have coding experience, I suggest that you start self-training: take some online courses and teach yourself. Remember that the best courses are available for free. The cost is your commitment. You can start, for example, with Kaggle Micro-Courses or Google Machine Learning Crash Course. Keep in mind that the best school for learning data science in practice is "at work"! Learn the fundamentals, research and review interview questions, improve your skills, and keep trying. Corporate America cares less about your degree and more about your skills. Many companies (including Google and Apple) don't have any requirement for a college degree.

If you are aware of great resources, it would be great if you let me know so I can share them here. You can contact me at mehr@stanford.edu.

How to succeed at interviews?

It depends on the industry and the role you are applying for. To get a job in the tech industry (including giant tech companies such as Apple, Amazon, or Google), you don't need a great resume (the resume is only there to get you an interview); you just need to answer the interview questions as well as possible. Interview questions are very repetitive, particularly if you are applying for a software position. It just takes time and preparation. The oil and gas sector is very different. To get a job in oil and gas when the oil price is high, you just need to be able to pronounce "oil"; otherwise, you need to do better than the other applicants (particularly at the interview).

How to form a data science team?

For companies that don't want to fall behind in this competitive market, forming a data science team is the first thought. Forming a good team may cost one to two million dollars per year, and finding competent data scientists/engineers is not easy. People are used to doing things the way they always have, whereas the new era of data science is a very dynamic environment with a particular emphasis on open source and cloud computing. I will be happy to help you find a proper strategy going forward. You can contact me at mehr@stanford.edu.

Learning to get a job with six-figure pay

I organize courses to teach data science, and I guide and mentor students to succeed at interviews. You can fill out this form or contact me directly if you are interested in participating.
