Getting Started with Arabic Text Analysis
See my workflow:
- Code, data and slides from a 5-day workshop on Arabic Text Analysis at Cairo University, Spring 2019.
Arabic Stemmer for Text Analysis
I developed an Arabic stemmer for the text analysis in my book, Deadly Clerics. I primarily use it in R, where it is publicly available as the "arabicStemR" package (example code). It is also part of the txtorg utility for text analysis workflow, and I've implemented a Python version (on the txtorg GitHub page, or upon request). The stemmer is loosely based on the light10 stemmer, but with a number of modifications. I need to improve the documentation, but here is the CRAN manual and a description of what the stemmer actually changes and removes.
If you use the stemmer, please cite: Nielsen, Richard A. 2017. Deadly Clerics: Blocked Ambition and the Paths to Jihad. Cambridge University Press.
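A minimal sketch of calling the stemmer from R, assuming the arabicStemR package is installed from CRAN. The sample phrase is made up for illustration, the call uses the package's stem() function with its default settings only, and the whole thing is guarded so it does nothing if the package is missing.

```r
# Illustrative two-word Arabic phrase, written with Unicode escapes so the
# script itself stays plain ASCII.
x <- "\u0627\u0644\u0643\u062a\u0627\u0628 \u0648\u0627\u0644\u0642\u0644\u0645"

stemmed <- NULL
if (requireNamespace("arabicStemR", quietly = TRUE)) {
  # stem() with default arguments; see the CRAN manual for the available options.
  stemmed <- arabicStemR::stem(x)
  print(stemmed)
} else {
  message("arabicStemR is not installed; try install.packages(\"arabicStemR\")")
}
```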
Arabic Unicode in R
Some example code for getting Unicode strings into and out of R.
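As a rough illustration of the idea (a sketch in base R, not the linked example code), the snippet below builds an Arabic string from Unicode escapes, writes it to a UTF-8 file, and reads it back. The temporary file path is an assumption; behavior can differ on systems whose native encoding is not UTF-8, which is why the string is written as raw UTF-8 bytes.

```r
# The word "kitab" (book) as Unicode escapes: kaf, ta, alif, ba.
x <- "\u0643\u062a\u0627\u0628"

# Inspect the code points behind the string.
utf8ToInt(x)

# Write the string out as UTF-8 bytes, without re-encoding to the native locale.
tmp <- tempfile(fileext = ".txt")
writeLines(enc2utf8(x), tmp, useBytes = TRUE)

# Read it back in, telling R the bytes are UTF-8.
y <- readLines(tmp, encoding = "UTF-8")

# The round trip should preserve the code points.
identical(utf8ToInt(x), utf8ToInt(y))
```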
Arabic Typesetting in LaTeX
Some tricks for typesetting Arabic in LaTeX (I hacked some of this together in 2010, but updated it with help from David Romney and Gary King in 2014). If you want a choice of fonts, the flexible way is through XeTeX, a LaTeX-like program that comes free with most distributions of TeX. An example is here: .zip. The ArabTeX package offers an alternative if you like the particular font it supports: .zip. You can also use Unicode to place Arabic text in R figures: .txt.
Advice: Applying to Grad School
Advice: Surviving Grad School (and beyond)
Advice: Job Market
Text Analysis Intro
Some R code that illustrates basic text analysis, along with some slides. Also, I put together an example of how to use the caret package in R for text analysis: .zip.
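For a flavor of what basic text analysis looks like, here is a small base-R sketch (not the linked code or slides) that tokenizes a few made-up English sentences and builds a document-term matrix of word counts. The documents and object names are invented for illustration.

```r
# Three toy documents (hypothetical data).
docs <- c(d1 = "The cat sat on the mat",
          d2 = "The dog sat on the log",
          d3 = "Cats and dogs")

# Lowercase, then split on anything that is not a letter.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Vocabulary: every distinct word across all documents.
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary word in each document; rows are documents.
dtm <- t(vapply(tokens,
                function(tok) as.integer(table(factor(tok, levels = vocab))),
                integer(length(vocab))))
colnames(dtm) <- vocab

dtm
```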
Web Scraping Examples
A quick example of web scraping using Python that I wrote up for someone: .zip. Not sure you'll like Python? Try this testimonial.
Hack for Updating R
A script to make it less annoying to re-install all my packages after updating R. Steps: (1) Before installing the new update, open your current version of R; (2) Open a new R Editor window and paste in the code; (3) Change the directory in line 3 to your preferred directory for saving a txt file; (4) Run the script, which produces a file called myRpacks.txt containing R code for re-installing your packages; (5) Close the old R and install the new R; (6) Open the new R and paste the contents of myRpacks.txt into the command line to install your packages.
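The general idea behind the steps above can be sketched in a few lines of base R. This is an illustrative version, not the linked script: the output directory is a placeholder (a temporary directory here), and the choice to skip base and recommended packages is my assumption.

```r
# Placeholder output directory; change it to wherever you want the file saved.
out_dir <- tempdir()

# Names of installed packages, skipping base/recommended ones that ship with R.
ip <- installed.packages()
pkgs <- unique(rownames(ip)[is.na(ip[, "Priority"])])

# Build one install.packages() call as text and save it to myRpacks.txt.
cmd <- paste0("install.packages(", paste(deparse(pkgs), collapse = ""), ")")
out_file <- file.path(out_dir, "myRpacks.txt")
writeLines(cmd, out_file)

# Show the first part of the generated command.
cat(substr(cmd, 1, 80), "\n")
```

After updating, pasting the contents of the file into the new R runs the saved install.packages() call.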
A few code snippets that I've found useful. They're posted here mostly so I can remember where to find them.
Mahalanobis distance matrix code (fast!)
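One common way to get all pairwise Mahalanobis distances quickly, shown here as a sketch that is not necessarily the linked code: whiten the data with the Cholesky factor of the covariance matrix, then take ordinary Euclidean distances. The function name mahal_dist_matrix and the simulated data are made up for illustration.

```r
# Pairwise Mahalanobis distance matrix for the rows of X.
# S is the covariance matrix to use (defaults to the sample covariance of X).
mahal_dist_matrix <- function(X, S = cov(X)) {
  X <- as.matrix(X)
  # Cholesky factor: S = t(R) %*% R, with R upper triangular.
  R <- chol(S)
  # Solve t(R) %*% z = x for every row x, so Euclidean distance between
  # whitened rows equals Mahalanobis distance between original rows.
  Z <- t(backsolve(R, t(X), transpose = TRUE))
  as.matrix(dist(Z))
}

# Small simulated example.
set.seed(1)
X <- matrix(rnorm(50), nrow = 10, ncol = 5)
D <- mahal_dist_matrix(X)
dim(D)
```

Note that stats::mahalanobis() returns squared distances, so compare against D^2 when checking results.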
Defend against R matrices turning into vectors when only one row is left:
mm <- matrix(3, 3, 3); is.matrix(mm[1, ]); is.matrix(mm[1, , drop = FALSE])  # FALSE, then TRUE
Make prettier plot axes (from Carlisle Rainey's compactr library):
axis(2, at = c(0, 250, 500), labels = FALSE, las = 2, cex.axis = .5, tck = -0.02)  # short tick marks, no labels
axis(2, at = c(0, 250, 500), labels = TRUE, las = 2, cex.axis = .5, tck = 0, line = -.08, lwd = 0)  # labels only, no axis line
Open a pdf you just made in R:
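One way to do this, as a minimal sketch rather than the original snippet: write the plot to a PDF, then hand the file to the operating system's default viewer. The helper name open_file is made up, and the viewer commands ("open" on macOS, "xdg-open" on Linux) are assumptions about what is available on your system.

```r
# Hypothetical helper: open a file with the system's default application.
open_file <- function(path) {
  path <- normalizePath(path, mustWork = TRUE)
  sysname <- Sys.info()[["sysname"]]
  if (sysname == "Windows") {
    shell.exec(path)                        # Windows-only base R function
  } else if (sysname == "Darwin") {
    system2("open", shQuote(path))          # macOS
  } else {
    system2("xdg-open", shQuote(path))      # most Linux desktops (assumption)
  }
}

# Make a small PDF in a temporary location.
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)
plot(1:10, main = "Example plot")
invisible(dev.off())

# Only try to launch a viewer when running interactively.
if (interactive()) open_file(pdf_path)
```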
Remove all but one object in your workspace:
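A sketch of the usual idiom, with made-up object names. At the console the one-liner is rm(list = setdiff(ls(), "keep_me")); the version below does the same thing inside a throwaway environment so that running it does not clear your real workspace.

```r
# Hypothetical helper: remove everything in `envir` except the names in `keep`.
keep_only <- function(keep, envir = globalenv()) {
  rm(list = setdiff(ls(envir = envir), keep), envir = envir)
}

# Demonstrate on a scratch environment instead of the global workspace.
demo <- new.env()
assign("keep_me", 1:3, envir = demo)
assign("junk1", "a", envir = demo)
assign("junk2", "b", envir = demo)

keep_only("keep_me", envir = demo)
ls(demo)
```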
Arabic Text Analysis Workshop, Cairo University, 2019.
[Code and Data]
ANU master class materials: Using Web Data for Islamic Studies