Hello everybody,
I didn't use ggplot2 much until two weeks ago, which in my opinion was a big mistake on my part. I think my train of thought shifted from
"I just want to plot something, I don't directly get what's going on with this ggplot2 thing. I'll just find a quick solution"
to
"Wow! Once you get the basic idea, the world becomes your oyster!"
Personally, I blame Matlab. I'm just so used to calling a different function for each plot (do a specific thing with input data) that I did not immediately realize that a plot is basically (input data) + (do something with it).
Once you get the hang of it, it gets really fun, because you often just have to change a geom_point() to a geom_smooth() to get a completely different plot, or simply add them both in a row!
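A minimal sketch of that idea, assuming the built-in mtcars data set and its wt and mpg columns as stand-ins (neither comes from the tutorial itself), and a linear fit chosen only to keep the example simple:

```r
library(ggplot2)

# (input data): mtcars ships with R and stands in for any data frame.
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# (do something with it): same data and mapping, different geoms.
scatter <- base + geom_point()
trend <- base + geom_smooth(method = "lm", formula = y ~ x)

# "Both in a row": layers simply add up.
both <- base + geom_point() + geom_smooth(method = "lm", formula = y ~ x)

print(both)
```

Swapping the geom changes what is drawn, while the data and the aesthetic mapping stay untouched.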
Anyhow, without further ado, here is my tutorial-like pdf:
Cheers,
Hannes
Thursday, July 23, 2015
Thursday, July 9, 2015
Links I favorited about data science
I had a few weeks where I used Twitter mostly on my phone, so I started blindly favoriting tweets that could be useful. This blog post is mostly for me to curate all these data-related links. If it comes in handy for others, all the better. I tried to sort them a bit by topic... hat tips go to all the data scientists who show an immigrant like me some interesting things (too lazy to list them right now...)
General Methods and algorithms
- Data Elixir - what I am doing with this blog post, but on a much bigger scale
- Machine learning primer - I only skimmed it, but this series seems great for communicating important, central concepts to people new to the field
- Statistical learning overload - haven't watched the videos yet, but the Hastie book (freely available) and everything that comes with it are probably a good first step for any data science immigrant
- Statistical Data Mining Tutorials collection
- Machine learning cheat sheets - a great combination with the Hastie book. Use the quick cheat sheet information first and then get down to the gorier details using the book and videos. Check out this sheet, for example
- Hyper parameter selection
- A lot of data science cheat sheets (which basically force you to read more links)
- Machine Learning Visualizations (made in Python and R, hooray)
R
- R introduction plus text mining course - @StatsInTheWild is probably my favorite Twitter handle
- dplyr tutorial - I still live in a dplyr-less world, which I guess I should regret every time I write df[df[,'Stat1']>0 & df[,'Stat2']>2,] - people might call that dumb, I call it old school
- ggrepel - I can finally use ggplot2 for messy text plots
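For comparison, here is a rough sketch of the old-school indexing from the dplyr entry next to its dplyr counterpart. The data frame and its values are made up for illustration; only the column names Stat1 and Stat2 come from the snippet above.

```r
library(dplyr)

# Hypothetical data: only the column names are taken from the post.
df <- data.frame(Stat1 = c(-1, 1, 2, 3), Stat2 = c(5, 1, 3, 4))

# Base R indexing, as written in the list above.
old_school <- df[df[, "Stat1"] > 0 & df[, "Stat2"] > 2, ]

# dplyr: conditions separated by commas are combined with AND.
with_dplyr <- dplyr::filter(df, Stat1 > 0, Stat2 > 2)

print(old_school)
print(with_dplyr)
```

Both select the same rows here. One difference to keep in mind: when a condition evaluates to NA, the base R version returns a row of NAs, while filter() drops that row.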
Python
- Speed up Pandas by using categoricals
- pipe data frames through functions (seems neat)
- Sometimes you might need a horizontal boxplot
- Neural Network Implementation (HowTo) - I have no practical experience with neural networks and haven't gone in depth here yet, but I usually like to steal Sebastian Raschka's code ;) - this neural network activation cheat sheet belongs here as well
- More neural networks
- Data visualization with Python and JavaScript - presentation (side note: presentations are not the nicest format to digest after the fact, but I guess this one covers a lot of points)
- How to build Python packages
- Scikit-learn-classifiers
Other languages
- The art of command line - really need to get into this one, as my command line skills suck :D