Code Switching

I don't generally consider myself to be stubborn.

I am very open to trying new things -- in fact, I love trying new things! I have no problem going off on my own to see a new place, I'll ask questions even if it feels like I should already know the answer, and I'm very open to being taught or corrected.

Figure 1: Just last week I went on a solo adventure to Heysham, where I found this ruin of an 8th century chapel and learned all about Viking gravestones from a lovely docent at a local historic church!

But, this week, my text analysis class brought out a rare burst of stubbornness in me.

We've spent the last couple weeks learning about out-of-the-box (meaning "no programming necessary") tools for corpus text analysis, like AntConc and Voyant, but now we're getting into the real meat of the course: SpaCy. SpaCy is a package for corpus text analysis in Python. It's super powerful, and the tool of choice for lots of people doing corpus linguistic analysis.

It's been three years since I last used Python in DCS109 at Bates. Since then, I've become an R aficionado, as any reader of this blog is already aware. I find R intuitive, which apparently is not a popular opinion, and R has the capability to do pretty much any kind of data analysis and visualization.

Except, apparently, the full functionality of SpaCy. There is an R wrapper for SpaCy -- the spacyr package -- but from what I've read online, it just doesn't work as well for text analysis as SpaCy itself does. This annoyed me beyond belief when I started reading up on it, and my immediate reaction was to say "I refuse to relearn Python. I guess I'll use the R version, even if it's not as powerful." -- Very unlike me!

I quickly came around to the fact that I was just being stubborn. There's no prerequisite coding knowledge for this class, so most people are actually learning to code for the first time. If everyone else can handle learning Python, I can bring myself to re-learn it alongside them. So, I am now re-learning Python in order to make full use of SpaCy, and it is so hard to be constantly switching between these two languages. It's not that their logic or structure is irreconcilably different -- it's the little differences in syntax that I keep mixing up.

As an American in the UK, I already do some amount of code switching in my spoken language. When I talk to British friends, I adopt the British terms for things -- I eat chips, crisps, and biscuits, know lots of people who play football, and wait in the queue for tea at Greggs. But when I'm on the phone with family back home in the States, I switch back to cookies and soccer.

Now my code switching extends to the actual code I write -- full credit to my mom for realizing the pun before I did. I've been working through the Programming Historian SpaCy tutorial and found it helpful to make a cheat sheet for myself to cross-reference some basic syntax between R and Python, so I thought I would share it here for anyone else who may be finding it tricky to work between the two languages.

| Task, data type, etc. | R | Python |
|---|---|---|
| Assigning value to a variable | `variable <- value` | `variable = value` |
| General format of calling a function to work on a dataframe | `function(df_name)` | `df_name.function()` |
| Loading packages | `library(package_name)` | `import package_name` |
| Loading in data from CSV | `df_name <- …` | `df_name = pandas.read_csv("spreadsheet.csv")` |
| Looking at the first few rows of a dataframe | `head(df_name)` | `df_name.head()` |
| Selecting a single column from a dataframe | `df_name$column_name` | `df_name["column_name"]` |
| Renaming columns | `colnames(df_name) <- …` | `new_names = {'Col1': 'name1', 'Col2': 'name2', 'Col3': 'name3'}` then `df_name.rename(columns=new_names, inplace=True)` |
| Concatenating strings | `new_string <- …` | `new_string = string1 + string2` |
| Booleans | `TRUE`, `FALSE` | `True`, `False` |
| Logicals | `&`, `\|` | `and`, `or` |
| List | `list <- …` | `list = ["item1", "item2", "item3"]` |
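
To see the non-dataframe rows of the cheat sheet in action, here is a minimal Python sketch that can be pasted into a notebook cell. The values and the names `uses_r`, `uses_python`, and `items` are made up for illustration and are not from the tutorial. One small departure from the table: the list is called `items` rather than `list`, because a variable named `list` works but hides Python's built-in `list()` function for the rest of the session.

```python
# Assigning a value to a variable (R would use <- here)
string1 = "code"
string2 = "switching"

# Concatenating strings: + joins them exactly as they are,
# so the space has to be added by hand
new_string = string1 + " " + string2

# Booleans: only the first letter is capitalised, unlike R's TRUE / FALSE
uses_r = True
uses_python = False

# Logicals: Python spells out `and` / `or` for single True/False values
uses_both = uses_r and uses_python
uses_either = uses_r or uses_python

# Lists use square brackets; `items` is a stand-in name chosen so that
# the built-in list() is left alone
items = ["item1", "item2", "item3"]

print(new_string)
print(uses_both, uses_either)
print(len(items))
```

The dataframe rows follow the same pattern, but they need the pandas package to be installed and imported first, so they are left out of this sketch.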