Shakespeare loses to Wu-Tang Clan in vocabulary duel

94SupraTT

Spoilers!
Donator
Joined:
Aug 20, 2012
Posts:
2,189
Liked Posts:
1,576
My favorite teams
  1. Chicago Bears
http://www.pbs.org/newshour/rundown/data-scientist-pits-shakespeare-wu-tang-clan-battle-words/


“To be, or not to be” could have been the perfect start to a rap song. Some have even called Shakespeare the original rapper, his use of iambic pentameter the first dropped beat.

So, it seems fitting that when comparing the range of vocabulary among hip hop artists, data scientist Matt Daniels would also incorporate the Bard himself, whose entire collection of work included a total of 28,829 unique word forms, with 12,493 appearing only once.

In order to include younger rappers in the data, Daniels compared the first 35,000 lyrics — or in Shakespeare’s case, the first 5,000 words of seven of his works — of 85 different rappers from Salt-n-Pepa to Drake. He then counted each unique word to determine the extent of a rapper’s vocabulary.
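
A rough sketch of the equal-sample idea described above: cut every artist down to the same number of words, then count the distinct ones. The function name, the lowercasing, and the plain whitespace split are my own assumptions for illustration, not Daniels' actual code.

```python
def unique_words_in_sample(text, sample_size=35_000):
    """Count distinct words among the first `sample_size` words of `text`."""
    # Naive whitespace split; lowercasing so "The" and "the" match is an assumption.
    words = text.lower().split()
    # Keep only the first `sample_size` words so every artist is compared on equal footing.
    sample = words[:sample_size]
    return len(set(sample))


line = "to be or not to be that is the question"
print(unique_words_in_sample(line, sample_size=6))  # looks only at "to be or not to be"
print(unique_words_in_sample(line))                 # looks at the whole line
```

Capping the sample is what lets newer artists with small catalogs sit next to ones with decades of lyrics.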



Image courtesy of Matt Daniels.
http://rappers.mdaniels.com.s3-webs...al&utm_source=twitter.com&utm_campaign=buffer

Where did Shakespeare place? That is the question.

Rapper Aesop Rock held the top spot with the use of 7,392 unique words. The Wu-Tang Clan’s GZA — whose “Dark Matter” album inspired a flood of science rap submissions from PBS NewsHour viewers last year — wasn’t too far behind with 6,426 unique words. DMX ended up in the last slot with a total of 3,214 unique words.

And Shakespeare? He landed near the middle range with a total of 5,170 unique words, right in the middle of Outkast and the Beastie Boys, among others.

See Daniels’ work in its entirety here.
http://rappers.mdaniels.com.s3-webs...al&utm_source=twitter.com&utm_campaign=buffer
 

nvanprooyen

Moderator
Staff member
Donator
CCS Hall of Fame '19
Joined:
Apr 4, 2011
Posts:
18,757
Liked Posts:
27,292
Location:
Volusia County, FL
My favorite teams
  1. Chicago Bears
I wonder if he's counting real words only... or if made-up words count too.
 

nvanprooyen

"I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses."
 
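For anyone curious, the counting described in that quote could be sketched roughly like this. It is only a guess at the approach: the function names and the regex-based tokenizer are hypothetical, and the only rules taken from the quote are that apostrophes are removed and that every distinct spelling counts as its own word.

```python
import re


def tokenize(lyrics):
    """Lowercase, strip apostrophes, and split into word tokens."""
    # Remove straight and curly apostrophes so "pimpin'" and "pimpin" match.
    cleaned = re.sub(r"['\u2019\u2018]", "", lyrics.lower())
    # Treat any run of letters or digits as one token (an assumption).
    return re.findall(r"[a-z0-9]+", cleaned)


def vocabulary_size(lyrics):
    """Number of distinct tokens; each spelling counts separately."""
    return len(set(tokenize(lyrics)))


# No stemming is applied, so these stay four separate words.
print(vocabulary_size("pimps pimp pimping pimpin"))
# With apostrophes stripped, both spellings collapse into one token.
print(vocabulary_size("pimpin' pimpin"))
```

Note that nothing here merges shorty with shawty or pimpin with pimping, which is the kind of imperfection the quote admits to.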

Ares

CCS Hall of Fame
Donator
CCS Hall of Fame '19
Joined:
Aug 21, 2012
Posts:
42,487
Liked Posts:
35,201
"I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses."

This does not seem like a fair test at all then lol.
 

malcore

Guest
What about impact tackles?



Bearin' Down

Well-known member
Joined:
Aug 20, 2012
Posts:
5,247
Liked Posts:
3,251
Location:
Chicago
"I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses."

LOL I thought you made this up and then I read the article. Wow.
 

ClydeLee

New member
Joined:
Jun 29, 2010
Posts:
14,829
Liked Posts:
4,113
Location:
The OP
This does not seem like a fair test at all then lol.

It's actually quite fair. Shakespeare made up countless words he used as well, or at least they weren't in print before, often simple words like bubble or Kate.

 

Bearin' Down

It's actually quite fair. Shakespeare made up countless words he used as well, or at least they weren't in print before, often simple words like bubble or Kate.


Yes, but let's say Shakespeare used the word pimping twice. He would have written the full word "pimping" both times, so he would not have been credited for using both "pimping" and "pimpin." In that regard, he is getting hosed.
 
