Becoming a Data Scientist – Curriculum via Metromap

Data Science, Machine Learning, Big Data Analytics, Cognitive Computing … well, all of us have been buried under an avalanche of articles, skills-demand infographics and points of view on these topics (yawn!). One thing is for sure: you cannot become a data scientist overnight. It’s a journey, and certainly a challenging one. But how do you go about becoming one? Where do you start? When do you start seeing light at the end of the tunnel? What is the learning roadmap? What tools and techniques do you need to know? How will you know when you have achieved your goal?

Given how critical visualization is for data science, it is ironic that, with a few exceptions, I could not find a pragmatic yet visual representation of what it takes to become a data scientist. So here is my modest attempt at creating a curriculum: a learning plan that one can use on the journey to becoming a data scientist. I took inspiration from metro maps and used one to depict the learning path. I organized the overall plan progressively into the following areas / domains:

  1. Fundamentals
  2. Statistics
  3. Programming
  4. Machine Learning
  5. Text Mining / Natural Language Processing
  6. Data Visualization
  7. Big Data
  8. Data Ingestion
  9. Data Munging
  10. Toolbox

Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. The idea is that you pick a line, catch a train and go through all the stations (topics) until you reach the final destination, or switch to the next line. I have numbered each station (line) 1 through 10 to indicate the order in which you travel. You can use this as an individual learning plan to identify the areas you most want to develop and then acquire the skills. By no means is this the end, but it is a solid start. Feel free to leave your comments and constructive feedback.
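For readers who would rather track the plan in text than on a wall print, here is a minimal Python sketch of the "pick a line, finish it, move to the next" idea. It uses only the ten line names listed above; the `LearningPlan` class and its `finish`, `next_line` and `progress` methods are hypothetical names invented for this illustration, and the progress figure is a simple fraction of lines finished, not the percentages drawn on the map.

```python
from dataclasses import dataclass, field

# The ten metro lines, in the order given in the post.
LINES = [
    "Fundamentals",
    "Statistics",
    "Programming",
    "Machine Learning",
    "Text Mining / Natural Language Processing",
    "Data Visualization",
    "Big Data",
    "Data Ingestion",
    "Data Munging",
    "Toolbox",
]


@dataclass
class LearningPlan:
    """Hypothetical tracker for which lines (domains) a learner has finished."""

    completed: set = field(default_factory=set)

    def finish(self, line: str) -> None:
        """Mark a whole line as done; reject names that are not on the map."""
        if line not in LINES:
            raise ValueError(f"unknown line: {line}")
        self.completed.add(line)

    def next_line(self):
        """Return the first unfinished line in map order, or None when done."""
        for line in LINES:
            if line not in self.completed:
                return line
        return None

    def progress(self) -> float:
        """Fraction of lines finished (an assumption, not the map's percentages)."""
        return len(self.completed) / len(LINES)


plan = LearningPlan()
plan.finish("Fundamentals")
plan.finish("Statistics")
print(plan.next_line())
print(plan.progress())
```

Lines can be finished in any order, so someone who starts in the middle of the map still gets pointed back to the earliest line they have not yet travelled.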

PS: I did not want to impose the use of any commercial tools in this plan. I have based this plan, for the most part, on tools and libraries available as open source. If you have access to commercial software such as IBM SPSS or SAS Enterprise Miner, by all means go for it. The plan still holds good.

PS: I originally wanted to create an interactive visualization using D3.js or InfoVis, but I wanted to get this out quickly. Maybe I will do an interactive map in the next iteration.

[Image: RoadToDataScientist]

77 thoughts on “Becoming a Data Scientist – Curriculum via Metromap”

  1. Pingback: Becoming a Data Scientist - Curriculum via Metr...

  2. Anutham Mar 28, 2014 12:24 PM

    Brilliant Swami. This will help me to be on the right track and change tracks appropriately. I will have this on my wall.

  3. Tarık Alruncu Mar 28, 2014 9:57 AM

    Hi Swami,

    I’ve heard about this map from a friend of mine. That is really a wonderful work. Thanks.

    Is it okay if I use the map in my blog, with a caption linking back to this page?

  4. Pingback: Becoming a Data Scientist - Curriculum via Metr...

  5. Anirudh Feb 17, 2014 2:24 PM

    Hi Swamy ,

    This is an eye opener. Recently I graduated to become a data scientist in my organisation and I have realized that there are multiple things that I have missed in my journey, and they do seem inevitable looking at the links here.

    This is really awesome !

    Regards
    Anirudh Kala

  6. Manash Jan 14, 2014 2:53 PM

    Swami,

    Outstanding – thought map, thanks for the nice visualization;

    Regards,
    Manash

  7. Karthik Srinivasan Jan 3, 2014 11:56 AM

    The Metro-map is superb. Quite exhaustive. But it took me some time to digest the cumulative percentages. This is my next favourite Data Science visual depiction after the Data-science Venn diagram of Drew Conway (in order of chronology only).

    Thanks for putting it up.

  8. Pingback: Kaggle now has 100K data scientists, but what’s a data scientist? | BaciNews

  9. Vikash Kodati Dec 25, 2013 11:31 PM

    This is Divine, for a person like me who needs the big picture before the dive. thanks much.

  10. Francesco Dec 20, 2013 7:59 PM

    It’d be great if each stop had a link to resources. Where can I learn Sharding?

  11. YesSQL Dec 20, 2013 1:58 PM

    Don’t know why it has suddenly become a dirty word but good old fashioned SQL is still going to be required for quite a while. Believe it or not most Data Scientists are going to run into it sooner rather than later.

  12. Boris Dec 9, 2013 3:12 PM

    The map makes sense only when the traveler knows the destination.

  13. Pingback: Carrière/ idées apprentissage | Pearltrees

  14. Pingback: Step Zero: “I have an existential map. It has ‘You are here’ written all over it.” — Stephen Wright | The Hopeful Statistician

  15. Pingback: Data Science Metromap | Bring on the Data!

  16. Pingback: Data Science Metromap | Bring on the Data!

  17. Katie Kent Oct 29, 2013 7:40 PM

    Very inventive way to convey all of this information – awesome work! It definitely shows that the pathway to becoming a data scientist is a long one, and there are many, many tools to learn.

    A great resource for those interested in learning these tools is this comprehensive post from Zipfian Academy: http://blog.zipfianacademy.com/post/46864003608/a-practical-intro-to-data-science

  18. surendar Oct 25, 2013 6:25 AM

    Nice to look at; it gives an overview of the whole process of becoming a data scientist.

  19. Ravi Oct 23, 2013 2:47 PM

    Really very good stuff…Thanks Swami

  20. Pingback: Road to Data Scientist by MetroMap | 銀返し

  21. Pingback: So you want to be a data scientist? #BigData | "It’s better to be absolutely ridiculous than absolutely boring."

  22. Mahesh Chandrasekaran Sep 22, 2013 5:24 PM

    Hi Swami,

    Great article… This is the only article I found on the internet that clearly describes what a Data Scientist should know!

    If possible can you suggest some books ?

    Thanks in Advance.
    Mahesh Chandrasekaran

  23. Pingback: Road To Become A Data Scientist | Socially Wired

  24. Eroteme Sep 14, 2013 4:58 PM

    Loved the way you have captured it along with percentage completion. Would it be ok with you if I used this pic in conveying a message? Credits will be duly given, rest assured.

  25. Pingback: A blog to share: How to become a data scientist | Peng's Blog

  26. Pingback: A blog to share: How to become a data scientist | Peng's Blog

  27. Mahesh Balija Sep 11, 2013 1:51 PM

    This is a very nice article. Really creative way of representing the BigData buzz words.

  28. Pingback: Enesdemirci | Pearltrees

  29. Pingback: A Data Science Tutorial | Data Spring

  30. Pingback: Becoming a Data Scientist - Curriculum via Metr...

  31. Pingback: Artificial Intelligence Blog · Category Theory ?

  32. Chinmoy Aug 25, 2013 9:36 AM

    This is truly awesome!!! Makes everything digestible.

  33. Pingback: New Programs Emerge to Train Big Data Scientists « EDUCATION

  34. Pingback: Becoming a Data Scientist – Curriculum via Metromap ← Pragmatic Perspectives | Biotechnology + Innovation

  35. Prasad Aug 18, 2013 10:34 AM

    Brilliant graphics. This is what I was looking for. Really helpful. Already looking forward to the next one.

  36. anp114 Aug 14, 2013 3:06 PM

    This really is helpful! I find that I started in the middle of the path (stations 4-6) and I am struggling to understand what I need from stations 1-3 and what to do to get through stations 7-10. Can you post what sources you consulted to make this? I think that would help me understand it better.

  37. SoBigData Aug 7, 2013 9:27 AM

    Hi Swami,

    Thank you for providing the map! I believe data scientists will be happy to get to know what they don’t know.
    I listed all the keywords in your map, linking with Wikipedia pages and development pages of tools.

    http://sobigdata.com/2013/08/07/long-journey-to-data-scientists/

    Thank you, again!

  38. Pingback: Long Journey to Data Scientists | So Big Data

  39. Pingback: Data Science Courses | datascienceuml

  40. Pingback: 10 Big Data Infographics « Viafoura Blog

  41. Pingback: Startups, data scientists and visualisation stuff – Vic’s midweek reading blog « OptimalHq

  42. s Jul 23, 2013 10:19 AM

    hi

    thanks for the image. its very relevant.

    is it possible to request the topics as a list or plain text ?

    thanks
    suwonsi@gmail.com

  43. Pingback: The long road to become a big data scientist – infographic | IBM Watson Cloud Computing

  44. Pingback: The long road to become a big data scientist – infographic | Hadoop 2.0 2013 and beyond

  45. Himanshu Jha Jul 18, 2013 9:44 AM

    Hello Swami

    Thanks a lot for visualizing the map to becoming a Data Scientist.

    Could you please provide us with some references for all these stations (topics)?

    Himanshu Jha

  46. Pingback: 未整理 | Pearltrees

  47. Mick Kerrigan Jul 17, 2013 5:48 AM

    Wow, this is really great. This is going to start appearing on cubicle walls very quickly!! It’s great for data scientists that want to round off their skills.

  48. Pingback: IBM 首席架構師告訴你如何進入「21 世紀最性感職業」:數據分析科學家 | TechOrange《 專訪與人物 |

  49. Pingback: Data Science Curriculum Road Map | Melissa Learns Data Science

  50. Carrie Gallagher Jul 16, 2013 2:25 PM

    This is really cool – thanks for sharing!

  51. jigyasa Jul 16, 2013 9:29 AM

    great map

  52. Narasimman Jul 15, 2013 10:26 PM

    Hi,

    Very useful and informative. The idea of using a metro map for visualization is very good.

  53. vamsi Jul 15, 2013 7:14 PM

    Excellent collection of information.

  54. Pingback: EduDeavor | 如何成为一名数据科学家

  55. TestingSaaS Jul 15, 2013 11:25 AM

    Swami,

    thanks for the nice visualization; it gives a good overview.

    Best, Cordny

  56. Rajnish Jul 13, 2013 9:48 PM

    Thanks for sharing this. It is very descriptive and motivational for me.

  57. Swami Chandrasekaran Jul 13, 2013 4:12 PM

    Thanks all for your kind words, suggestions and comments. I will incorporate these in my next version. Happy that I was able to make a small dent and difference in this (Data Science) hot and yet ambiguous area.

    Also I’m working on zooming into each of the domains & corresponding stops to provide a 1-2 sentence overview of what a particular topic means. Kind of like a visual cheatsheet.

  58. Pingback: Data Viz News [15] | Visual Loop

  59. lorinc Jul 13, 2013 3:17 PM

    Swami,

    I believe I can say this on behalf of your target audience: this is a great help, we love you, but two descriptive sentences on each stop could exponentially increase the effective usefulness of this map.

    The reason is that we do not yet know the domain, so while titles are fine mnemonics for experts, they are just not enough for a beginner to get started.

    Thank you again,
    Lorinc

  60. Wilco van Ginkel Jul 13, 2013 1:02 PM

    Thanks for sharing this journey map. A few things come to mind when exploring the map:
    1/ Not every station should be equal – you can’t be good at every skill. Data science is teamwork with different roles. I suggest a “High Lights Line” across the Main Stations of each line. These stations represent the essentials which every data science team player needs to know.
    2/ The line travelled depends on the role of the traveller in the data science team – not every line needs to be travelled by every team member. Each member travels the line which is relevant for his/her role (after he/she has first travelled the High Lights Line).
    3/ There is (currently) no line for soft skills – it is very important to communicate/present your work and understand where other people are coming from when talking about data. You can be the best data scientist out there, but if you can’t convey your message, your work will not be recognized. Social skills are pretty important in teamwork.

    KR,
    Wilco

  61. Aditya Jul 13, 2013 8:23 AM

    Swami,
    You have done great work that is definitely going to help people who are willing to explore the field of Data Science.
    I have a query: can you explain what the percentage values in the road map (shown inside yellow stars) represent?

    Thanks in advance,
    Aditya

  62. Angel Jul 13, 2013 7:52 AM

    Hi Swami, nice post. What tool did you use for this drawing? Thank you so much! :)

  63. Sridhar Jul 12, 2013 7:15 PM

    Excellent work and highly remarkable. This helps everybody who wants to pursue a career in data science.

  64. Pingback: Visualising the Road to Becoming a Data Scientist | What's The Big Data?

  65. Jonathan Jul 11, 2013 8:58 PM

    Very nice. Thanks for sharing

  66. Ravi Nalliyappa Jul 11, 2013 6:51 PM

    Very creative representation. Congrats

  67. Pingback: Kaggle now has 100K data scientists, but what’s a data scientist? | 8ballbilliard

  68. Pingback: Kaggle now has 100K data scientists, but what’s a data scientist? ← techtings

  69. Natalie Peters Jul 10, 2013 10:36 AM

    Swami-
    THANK YOU!!
    I am beginning my path to becoming a data scientist – have my undergrad in Mathematics and trying to determine where to go next and how to do it.
    In addition to the MBA I am going to begin pursuing, this map is very useful. I have been looking for this type of thing for a few days now, and yours tops everything I’ve seen.

  70. Pingback: Becoming a Data Scientist - Curriculum via Metr...

  71. Giovanni M Dall'Olio Jul 9, 2013 12:36 PM

    p.s. there is a typo in “Support VectoS machines”.

  72. Giovanni M Dall'Olio Jul 9, 2013 12:35 PM

    Nice chart!
    Maybe you can provide a printer-friendly version, inverting black and white colors. This way, people will be able to print it, and annotate the “stops” where they have already been.

  73. Gopal Jul 9, 2013 5:52 AM

    Swami, this is a great visual treat!
    In your next iteration you can consider superimposing Big Data and Visualisation as layers on top of the basic layers!

    Another iteration would be to present the info as relevant to the people in the org – starting with the developer / data miner all the way up to the CIO/CEO.

    All the best.
    Gopal
