[March 13 2007]
Discussing the world
I did a little experiment today.
Out of 231 posts with comments, over 4.5 years, how many times are the following words used?
93 think, 71 do, 62 see, 54 know, 35 try, 33 need, 31 hear, 31 take, 29 want, 22 smell, 22 run, 21 feel, 19 walk, 11 taste, 10 give, 7 desire, 2 paddle, 2 swim, 1 fly
I'm not sure I realized I had such a strong desire to communicate thoughts.
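For anyone who wants to repeat the experiment, here is a minimal sketch of the counting in Python. It is not necessarily how the numbers above were produced: the `posts` list is a made-up sample, `count_words` is a hypothetical helper, and it assumes exact, case-insensitive matches only, so "thinking" would not count toward "think".

```python
import re
from collections import Counter

# Hypothetical sample; the real experiment ran over 231 posts with comments.
posts = [
    "I think I see what you mean. I think so.",
    "Do or do not; there is no try.",
]

# A few of the action words from the list above.
verbs = ["think", "do", "see", "know", "try"]

def count_words(texts, targets):
    """Count exact, case-insensitive occurrences of each target word."""
    wanted = set(targets)
    counts = Counter()
    for text in texts:
        # Split the lowercased text into runs of letters and apostrophes.
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts

counts = count_words(posts, verbs)

# Print in the same "count word" layout as the list above, most frequent first.
for word, n in counts.most_common():
    print(n, word)
```

Words that never appear are simply absent from the printout; looking them up in the `Counter` returns 0.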
Then I wondered what the distribution of words looks like for other people. Here are some results:
The rings are, in order from outside to in: this blog, Abstractions, Terminus Est, Book of Joe
It would be cool if someone wrote a real-time utility that could rank word occurrence on blogs the way Amazon does for books, and then show you sites with similar distributions, rare words, etc. I bet it could be done somehow with a Google mashup.
If I had the hacking chops to do something like this, here's how I think I'd go about it:
- Collect the distribution of words/phrases found in the set of blogs already searched by Google's blogsearch utility.
- Get rid of the most common ones (pronouns, articles, etc.)
- Use an appropriate statistical threshold to determine the list of words that occur more frequently in a particular blog than in the entire set from step 1
- Index the blogs on their rare words
- Cluster blogs based on their word frequencies
- Write a nice graphical interface to display the results and invite further exploration
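The first five steps can be sketched in a few dozen lines of Python. Treat this as a toy under loud assumptions, not a working mashup: the `blogs` dictionary is an invented three-blog corpus standing in for a real crawl, the stop list is tiny, a plain frequency ratio stands in for "an appropriate statistical threshold", and a nearest-neighbor lookup by cosine similarity stands in for real clustering. All function names are hypothetical, and the graphical interface is left out.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the set of crawled blogs.
blogs = {
    "paper": "I think folding paper cranes is fun. I fold paper every day.",
    "boats": "I paddle and swim at the lake. I paddle my canoe every day.",
    "cranes": "Folding paper cranes takes patience. I think paper is lovely.",
}

# Tiny illustrative stop list; a real one would be much longer.
STOPWORDS = {"i", "the", "a", "an", "and", "is", "at", "my", "every", "it"}

def tokenize(text):
    """Lowercase, split into words, and drop the most common ones."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

# Steps 1-2: word counts per blog and for the whole set, minus common words.
blog_counts = {name: Counter(tokenize(text)) for name, text in blogs.items()}
corpus_counts = Counter()
for c in blog_counts.values():
    corpus_counts.update(c)

def distinctive_words(name, ratio=1.5):
    """Step 3: words whose relative frequency in one blog is more than
    `ratio` times their relative frequency in the whole set.
    The ratio is an assumption standing in for a proper statistical test."""
    counts = blog_counts[name]
    blog_total = sum(counts.values())
    corpus_total = sum(corpus_counts.values())
    return sorted(
        w for w, n in counts.items()
        if n / blog_total > ratio * corpus_counts[w] / corpus_total
    )

# Step 4: index the blogs on their distinctive words.
index = defaultdict(set)
for name in blogs:
    for w in distinctive_words(name):
        index[w].add(name)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def most_similar(name):
    """Step 5, simplified: the single nearest neighbor by cosine similarity."""
    others = [n for n in blogs if n != name]
    return max(others, key=lambda n: cosine(blog_counts[name], blog_counts[n]))

print(distinctive_words("boats"))
print(most_similar("paper"))
```

With a real crawl, the ratio test would probably give way to something like a log-likelihood or chi-squared score, and the nearest-neighbor lookup to a proper clustering algorithm.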
Posted by origamifreak at 11:39 pm

[Comments count: five]

1: Very interesting. How did you cull/choose the verbs?
I notice there is no 'fold' or 'notice'.
Posted by Deborah at 12:22 pm on 03.14.07

2: You "do" twice as much as you "try"; I think that is pretty damn impressive!
Posted by sharon at 3:11 pm on 03.14.07

3: Deb: I just arbitrarily picked a bunch of action words - at first I did it with just the senses: smell, see, touch, taste, hear, and it was so interesting I expanded it.
Sharon, crack me up, you do. Very funny, you are. ;-)
Posted by anja at 8:53 pm on 03.14.07

4: I need to start thinking more than I give!
Posted by Terminus Est at 8:58 pm on 03.14.07

5: I think "go, read, write, watch, visit, help" would take pretty big chunks of your blog-word graph.
Posted by Jeni at 9:02 pm on 03.15.07