Becoming a Data Scientist – Curriculum via Metromap

Data Science, Machine Learning, Big Data Analytics, Cognitive Computing … well, all of us have been buried under an avalanche of articles, skills-demand infographics and points of view on these topics (yawn!). One thing is for sure: you cannot become a data scientist overnight. It’s a journey, and certainly a challenging one. But how do you go about becoming one? Where do you start? When do you start seeing light at the end of the tunnel? What is the learning roadmap? What tools and techniques do you need to know? How will you know when you have achieved your goal?

Given how critical visualization is to data science, it is ironic that I could find only a few pragmatic yet visual representations of what it takes to become a data scientist. So here is my modest attempt at creating a curriculum, a learning plan that one can use on the journey to becoming a data scientist. I took inspiration from metro maps and used that format to depict the learning path. I organized the overall plan progressively into the following areas / domains:

  1. Fundamentals
  2. Statistics
  3. Programming
  4. Machine Learning
  5. Text Mining / Natural Language Processing
  6. Data Visualization
  7. Big Data
  8. Data Ingestion
  9. Data Munging
  10. Toolbox

Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. The idea is that you pick a line, catch a train and go through all the stations (topics) until you reach the final destination, or switch to the next line. I have numbered the lines 1 through 10 to indicate the order in which you travel. You can use this as an individual learning plan to identify the areas you most want to develop and then acquire those skills. This is by no means the end, but it is a solid start. Feel free to leave your comments and constructive feedback.

PS: I did not want to impose the use of any commercial tools in this plan. I have based it, for the most part, on tools/libraries available as open source. If you have access to commercial software such as IBM SPSS or SAS Enterprise Miner, by all means go for it. The plan still holds.

PS: I originally wanted to create an interactive visualization using D3.js or InfoVis, but I wanted to get this out quickly. Maybe I will do an interactive map in the next iteration.

102 Replies to “Becoming a Data Scientist – Curriculum via Metromap”

  1. Swami, this is a great visual treat!
    In your next iteration you can consider superimposing Big Data and Visualisation as layers on top of the basic layers!

    Another iteration would be to present the info as relevant to the people in the org – starting with the developer / data miner all the way up to the CIO/CEO.

    All the best.
    Gopal

  2. Nice chart!
    Maybe you can provide a printer-friendly version, inverting black and white colors. This way, people will be able to print it, and annotate the “stops” where they have already been.

    1. Nice idea!

  3. p.s. there is a typo in “Support VectoS machines”.

  4. […] Becoming a data scientist a journey; for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? What is the learning roadmap?  […]

  5. Swami-
    THANK YOU!!
    I am beginning my path to becoming a data scientist – have my undergrad in Mathematics and trying to determine where to go next and how to do it.
    In addition to the MBA I am going to begin pursuing, this map is very useful. I have been looking for this type of thing for a few days now, and yours tops everything I’ve seen.

  6. […] Swami Chandrasekaran built a great subway-style map of the optimal data scientist skillset, which you can see on his […]

  7. […] Swami Chandrasekaran built a great subway-style map of the optimal data scientist skillset, which you can see on […]

  8. Ravi Nalliyappa says:

    Very creative representation. Congrats

  9. Very nice. Thanks for sharing

  10. Excellent work and highly remarkable. This helps everybody who wants to pursue a career in data science.

  11. Hi Swami, nice post. What tool did you use for this drawing? Thank you so much! 🙂

  12. Swami,
    You have done great work that is definitely going to help people who are willing to explore the field of Data Science.
    I have a query: can you explain what the percentage values in the road map (shown inside yellow stars) represent?

    Thanks in advance,
    Aditya

  13. Wilco van Ginkel says:

    Thanks for sharing this journey map. A few things come to mind when exploring the map:
    1/ Not every station should be equal – you can’t be good at every skill. Data science is teamwork with different roles. I suggest a “High Lights Line” across the Main Stations of each line. These stations represent the essentials which every data science team player needs to know.
    2/ The line travelled depends on the role of the traveller in the data science team – Not every line needs to be travelled by every team member. Each member travels the line which is relevant for his role (after he/she has first travelled the High Lights Line).
    3/ There is (currently) no line for soft skills – very important to communicate/present your work and understand where other people are coming from when talking about data. You can be the best data scientist out there, but if you can’t convey your message, your work will not be recognized. Social skills are pretty important in teamwork.

    KR,
    Wilco

  14. Swami,

    I believe that I can say that in the name of your target audience: this is a great help, we love you, but two descriptive sentences on each stop could exponentially increase the effective usefulness of this map.

    The reason is that we do not know yet the domain, so while titles are fine mnemonics for experts, they are just not enough to get started for a beginner.

    Thank you again,
    Lorinc

  15. […] Becoming a Data Scientist – Curriculum via Metromap 12 | Pragmatic Perspectives […]

  16. Swami Chandrasekaran says:

    Thanks all for your kind words, suggestions and comments. I will incorporate these in my next version. Happy that I was able to make a small dent and difference in this (Data Science) hot and yet ambiguous area.

    Also I’m working on zooming into each of the domains & corresponding stops to provide a 1-2 sentence overview of what a particular topic means. Kind of like a visual cheatsheet.

  17. Thanks for sharing this. It is very descriptive and motivational for me.

  18. Swami,

    thanks for the nice visualization; it gives a good overview.

    Best, Cordny

  19. […] Source: Becoming a Data Scientist – Curriculum via Metromap […]

  20. Excellent collection of information.

  21. Hi,

    Very useful and informative. The idea of using a metro map for visualization is very good.

  22. great map

  23. This is really cool – thanks for sharing!

  24. Wow, this is really great. This is going to start appearing on cubicle walls very quickly!! It’s great for data scientists that want to round off their skills.

  25. […] Becoming a Data Scientist – Curriculum via Metromap ← Pragmatic Perspectives […]

  26. Himanshu Jha says:

    Hello Swami

    Thanks a lot for visualizing the Data Scientist map.

    Could you please provide us some references for all these stations (topics)?

    Himanshu Jha

  27. […] data scientist also shows the vast amount of skills required to become a big data scientist. Swami Chandrasekaran took it upon him to visualize the long road to become a big data scientist. One thing is for […]

  29. hi

    Thanks for the image. It’s very relevant.

    Is it possible to request the topics as a list or plain text?

    thanks
    suwonsi@gmail.com

  30. […] In a bit of a departure, Swami Chandrasekaran put together a great infographic on the long road to becoming a Big Data scientist. While every operation that wants to utilize Big Data may not have to go through this arduous […]

  31. […] very interesting roadmap of what it takes to be the comprehensive data […]

  32. […] Swami Chandrasekaran summarized the long journey becoming a data scientist in his post, Becoming a Data Scientist – Curriculum via Metromap. […]

  33. Hi Swami,

    Thank you for providing the map! I believe data scientists will be happy to get to know what they don’t know.
    I listed all the keywords in your map, linking with Wikipedia pages and development pages of tools.

    http://sobigdata.com/2013/08/07/long-journey-to-data-scientists/

    Thank you, again!

  34. This really is helpful! I find that I started in the middle of the path (stations 4-6) and I am struggling to understand what I need from stations 1-3 and what to do to get through stations 7-10. Can you post what sources you consulted to make this? I think that would help me understand it better.

  35. Brilliant graphics. This is what I was looking for. Really helpful. Already looking forward to the next one.

  36. […] Becoming a Data Scientist – Curriculum via Metromap ← Pragmatic Perspectives. […]

  37. […] maintains a map of the various programs across the United States. Swami Chandrasekaran built a Metromap visualization of the data scientist curriculum – covering statistics, programming, machine […]

  38. This is truly awesome!!! Makes everything digestible.

  39. […] PS:  I loved this graphic by Swami Chandrasekaran from the article “Becoming a Data Scientist“. […]

  40. […] Becoming a data scientist a journey; for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? What is the learning roadmap?  […]

  41. […] Welcome! My goal is to provide a data science primer loosely based on an incredible “Metro” chart of data science learning paths. […]

  42. […] Becoming a Data Scientist – Curriculum via Metromap ← Pragmatic Perspectives […]

  43. This is very nice article. Really creative way of representing the BigData buzz words.

  44. […] The origin: http://nirvacana.com/thoughts/becoming-a-data-scientist/ […]

  46. Loved the way you have captured it along with percentage completion. Would it be ok with you if I used this pic in conveying a message? Credits will be duly given, rest assured.

    1. Swami Chandrasekaran says:

      Sure

  47. […] 1. Becoming A Data Scientist – Curriculum via Metromap […]

  48. Mahesh Chandrasekaran says:

    Hi Swami,

    Great article… This is the only article I found on the internet which clearly describes what a Data Scientist should know!

    If possible can you suggest some books ?

    Thanks in Advance.
    Mahesh Chandrasekaran

    1. Swami Chandrasekaran says:

      – If you want to get applied skills, and if you have picked the R route, I would recommend “Data Mining with R: Learning with Case Studies (ISBN-10: 1439810184)” and go through it end to end.
      – If you have access to IBM SPSS Modeler then use the Applications Guide, ftp://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/15.0/en/ApplicationsGuide.pdf
      – There are tons of data sets that are publicly available – UCI, Amazon Review Data, data.gov etc. Use them and DIY.
      – Also recommend “Machine Learning for Hackers” (ISBN: 1449303714)

  49. […] the big picture it’s worth looking at Swami Chandrasekaran’s Tube Map like picture of the data science […]

  50. […] original blog is here. And I was actually directed there from a question in […]

  51. Really very good stuff…Thanks Swami

  52. Nice to look at; it gives an overview of the whole process of becoming a data scientist.

  53. Very inventive way to convey all of this information – awesome work! It definitely shows that the pathway to becoming a data scientist is a long one, and there are many, many tools to learn.

    A great resource for those interested in learning these tools is this comprehensive post from Zipfian Academy: http://blog.zipfianacademy.com/post/46864003608/a-practical-intro-to-data-science

  54. […] Swami Chandrasekaran created an amazing subway-style map of areas involved in data science. It’s posted (in larger form!) on his blog, nirvacana.com. […]

  55. Forwarded by D. Schwartz, here’s a sweet metro map visualization, a learning map by Swami Chandrasekaran, showing the journey to become a data scientist. The map and comments are on his blog, nirvacana.com.

  56. […] Chandrasekaran created is a “metromap” depicting a sample learning path which a budding Data Scientist could follow in order to […]

  57. […] Becoming a Data Scientist – Curriculum via Metromap ← Pragmatic Perspectives […]

  58. The map makes sense only in case when traveler knows the destination

  59. Don’t know why it has suddenly become a dirty word but good old fashioned SQL is still going to be required for quite a while. Believe it or not most Data Scientists are going to run into it sooner rather than later.

  60. It’d be great if each stop had a link to resources. Where can I learn Sharding?

  61. This is Divine, for a person like me who needs the big picture before the dive. thanks much.

  62. […] Swami Chandrasekaran built a great subway-style map of the optimal data scientist skillset, which you can see on his […]

  63. Karthik Srinivasan says:

    The Metro-map is superb. Quite exhaustive. But it took me some time to digest the cumulative percentages. This is my next favourite Data Science visual depiction after the Data-science Venn diagram of Drew Conway (in order of chronology only).

    Thanks for putting it up.

  64. Swami,

    Outstanding – thought map, thanks for the nice visualization;

    Regards,
    Manash

  65. Hi Swamy,

    This is an eye opener. I recently graduated to become a data scientist in my organisation, and I have realized that there are multiple things that I have missed in my journey; they do seem inevitable looking at the links here.

    This is really awesome !

    Regards
    Anirudh Kala

  66. […] Becoming a data scientist a journey; for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? What is the learning roadmap? What tools and techniques do I need to know? How will you know when you have achieved your goal?  […]

  67. Hi Swami,

    I’ve heard about this map from a friend of mine. That is really a wonderful work. Thanks.

    Is it okay if I use the map in my blog, with a caption linking to this page?

    1. Swami Chandrasekaran says:

      Sure go ahead and use it.

  68. Brilliant Swami. This will help me to be on the right track and change tracks appropriately. I will have this on my wall.

  69. […] “ Becoming a data scientist a journey; for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? What is the learning roadmap?”  […]

  70. Zeeshan Sabir says:

    Mr.Swami, really good work!
    This is an absolutely wonderful illustration to map high level learning abstractions into low level grains.

  71. If I just worked through the required material/resources, would I really become a data scientist, or could I get some credits at a university I enroll in so as to officially graduate?

    Thanks again Shwami

  72. Hi Swami, really impressive visual. I am studying to get into data science at the moment, so this is incredibly helpful. Any chance of emailing me a high-quality version? I would like to print it onto a large poster and put it up on the wall in my room. Thanks, Hom

  73. Outstanding!

    I came across your page while searching for a framework to train data scientists. This is the most comprehensive and cohesive one I have come across so far. The potential for extending this is enormous as pointed out in a bunch of replies. This can easily become a framework using which users can add their own weights, skills tracks, highlights, checklists, etc!

    Hoping that you get time to complete the D3 version!

    Thanks for sharing. I am hoping that the permission you granted to “Eroteme” applies to all of us.

  74. James Rissler says:

    I cannot express in words my gratitude for putting this together. As a current programmer looking to venture in data science – this has provided an excellent roadmap for continued learning / working towards my goal of being a full time data scientist.

    Thanks again,
    James

  75. John Wandeto says:

    Swami, this is a nice job.
    In programming, why did you find it necessary to have both Python and R? Can’t one of these (or any other, e.g. Octave, C++) be sufficient?
    Nice time.

  76. Nice pic!

    Only one thing: IMHO Support Vector Machines is misplaced. It should be in the Machine Learning path.

  77. Very nice. Certainly as a “learning roadmap” I find it nice & useful.

  78. I think this is a great technical roadmap, but where is the business domain tube line or would we continue the metaphor and say each city in which this tube map is used, is a business domain/context: London city = Customer, Munich = Operations/Supply Chain, New York City = Risk, Tokyo = Design. The business context is one of the most key areas for applying data science skills and adds that extra dimension that moves an academic data scientist from the one dimension mathematical/scientific into the application to business context, thus rounding out the profile.

  79. Hi

    Very useful map. Thanks.
    By the way, this was published two years back.
    Any updates you would suggest in the map??

  80. Hi, I would like to thank you for publishing such a good road map. Can you however, guide me on how to get started because I’m a fresher with absolutely no experience. Can you suggest the requisite courses to be done and skills to be learnt to successfully become a data scientist?

    Thank you again 🙂

  81. vijayakumar gopalakrishnan says:

    Great Work

  82. Hi Swami,

    Thank you for the excellent work!

    A question, please: this visualisation shows how to become a Data Scientist by getting on a different train each time and learning the required skills. But does this visual also show that these are the stages a project should follow? So, would each one of the Data Science roles be reflected in one of these lines?

    Fundamentals :
    Statistics : Statistician
    Programming : Data Engineer / Data Architecture?
    Machine Learning
    Text Mining / Natural Language Processing
    Data Visualization
    Big Data
    Data Ingestion
    Data Munging
    Toolbox

    If anyone knows, could you please fill in which job covers which part of the chart?

    Will be greatly appreciated!

    Best,
    Dorita

  83. Hi Swami,

    Great job on this! Would it be possible to get a hi res copy of this graphic? Would like to print and laminate for my team’s war room to inspire them as they become more skilled in data science.

    Please PM me if this is possible and I have your permission to print and laminate.

    Cheers,
    Neal

  84. Hi Swami
    Can you advise which institute in India is best for learning data science?
    Thanks

  85. Hi guys,

    I just created a Github to work with this beautiful roadmap.

    We could start posting tutorials, short pieces of code... everything to make it clearer for people new to data science.

    I’ll start to add some, but I can’t do this all by myself 😉

    1. Swami Chandrasekaran says:

      Great work Emeric.

  86. Hi, Swami!

    Can I use your image in my new data science book? I will give you credit for the image and include your URL.

    Thanks

    1. Swami Chandrasekaran says:

      Sure can.

  87. Excellent curriculum!

  88. Hi Swami,

    I am conducting research about using diagrams in searching and browsing. Can I use your diagram about data science? I may include it in a paper or in a demo website with a complete citation (I will include your name and url).

    Thank you.
    Hisham Benotman.

    1. Swami Chandrasekaran says:

      Yes you can. Thanks for asking.

  89. Hi, Swami
    Great job! Is the interactive version of this data science subway map already available? Can I use your map on my website, including your name and URL?
    Thank you very much.

    1. Swami Chandrasekaran says:

      Yes, you can use it. I don’t have an interactive version yet.

  90. Great content, useful for all Data Science training candidates who want to kick-start their careers in the data science field.
