Computing is enormously important in modern social science, but little attention is paid to teaching researchers how to do it efficiently and reliably, even in highly quantitative fields. So we are often left to our own devices. Below are some of my thoughts about the uses of various tools and some ideas about how to become proficient with them.
If you are on OSX check out my setup tutorial, which covers R, Python, and Emacs, among other things.
R is great for cleaning, manipulating, exploring, modeling, and visualizing data, which is what social scientists spend most of their time doing. There are a huge number of libraries available, many of which are very high quality. R is often quite slow (though there are things like Rcpp) and its syntax is a bit inconsistent. It is not a very good general-purpose language (it was written by statisticians for statisticians). You can be fantastically productive in it, though.
For an introduction to R with no programming background
There are a number of books that are both an introduction to statistics and an introduction to R. I haven't read any of these books, but I've heard good things about Dalgaard's book. Cosma Shalizi's statistical computing class is pretty sweet too.
If you are either not new to R, or not new to programming
One of the nice things about R is that there are so many useful things already implemented. Be sure to check out the CRAN Task Views, which are curated lists of packages organized by task (e.g. econometrics, spatial, bayes, nlp). CRANberries and R-bloggers are good ways to keep up with development news.
There are a couple of packages that are "must-haves" for data analysis:
If you don't have a pre-existing preference for a particular text editor, I'd highly recommend RStudio.
Python is a general-purpose language (as opposed to a domain-specific language) that is great for getting data in its various forms (scraping data from the web, interacting with relational databases, parsing text files, etc.), manipulating said data, doing analysis, and visualizing data. The Python data analysis stack consists of (my opinion, not canonical):
There are vastly more resources for learning Python than there are for R. For a general introduction to the language, Learn Python the Hard Way is a good place to start. Google also has a Python course. The official tutorial is also pretty good. IPython notebooks can be viewed (rendered) on nbviewer. Below are a number of other excellent resources.
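To make the "getting data in its various forms" point a bit more concrete, here is a minimal sketch that uses only the standard library: it parses some CSV text and then aggregates it with SQL in an in-memory SQLite database. The survey data, the column names, and the helper functions `load_rows` and `mean_trust_by_country` are all made up for illustration; real work would more likely read from a file or an existing database.

```python
import csv
import io
import sqlite3

# Hypothetical survey data; in practice this would come from a file or the web.
RAW_CSV = """respondent,country,trust
1,US,3
2,US,5
3,DE,4
4,DE,2
5,DE,3
"""


def load_rows(text):
    """Parse CSV text into a list of (respondent, country, trust) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(r["respondent"]), r["country"], int(r["trust"])) for r in reader]


def mean_trust_by_country(rows):
    """Load rows into an in-memory SQLite table and average trust by country."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE survey (respondent INTEGER, country TEXT, trust INTEGER)"
        )
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT country, AVG(trust) FROM survey GROUP BY country ORDER BY country"
        )
        return dict(cur.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(mean_trust_by_country(load_rows(RAW_CSV)))
```

The same pattern (parse, load, query) carries over to larger files and to other relational databases, though the connection details will differ.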
I do not recommend Emacs if you are new to programming. If you are planning on spending a lot of time coding, however (as most quantitative social scientists do), it is worth investing your time in an editor, though perhaps it is wise to keep this in mind as well.
On a Mac the best way to install it is using the OSX-specific binary or Homebrew (I have a tutorial covering the latter option). Once you've installed things, start with the built-in tutorial (C-h t, read "control h, t").
The Emacs Wiki is huge (and messy, though not as bad as it used to be). Other things to check out:
Must-have packages for social scientists include:
Git is a distributed revision control application. I've written a bit about why you should be using Git and GitHub if you are a quantitative social scientist. I have also written a tutorial that covers enough to be basically functional with Git and GitHub from the command line. GitHub is a hosting service for Git repositories that makes collaboration and a variety of other tasks a lot easier. GitHub gives free private repositories to college students/professors for 2 years.
While there aren't many people writing exclusively about Git (other than people working at GitHub and such), there is lots of good community content out there. /r/git gets a fair bit of it.
This GitHub cheatsheet is also pretty nice.