[manifolds]

[March 13 2007]

Discussing the world

I did a little experiment today.

Out of 231 posts with comments, over 4.5 years, how many times are the following words used?

93 think
71 do
62 see
54 know
35 try
33 need
31 hear
31 take
29 want
22 smell
22 run
21 feel
19 walk
11 taste
10 give
7 desire
2 paddle
2 swim
1 fly
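
For anyone who wants to try the same tally, here is a minimal sketch in Python. It is not necessarily how the numbers above were produced: the function name `count_words` and the sample `posts` are made up for illustration, and it assumes the posts are already available as plain-text strings. It counts exact, case-insensitive whole-word matches only, so "thinking" does not count toward "think".

```python
import re
from collections import Counter

def count_words(posts, words):
    """Count case-insensitive whole-word occurrences of each target word."""
    targets = {w.lower() for w in words}
    # Start every target at zero so unused words still show up in the tally.
    counts = Counter({w: 0 for w in targets})
    for text in posts:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in targets:
                counts[token] += 1
    return counts

# Hypothetical sample posts, only to show the call.
posts = [
    "I think I see what you mean. I think so, anyway.",
    "Try to see it my way; I know you will.",
]
tally = count_words(posts, ["think", "see", "know", "try", "fly"])
for word, n in tally.most_common():
    print(n, word)
```

Counting word stems instead (so that "thinks" and "thinking" also register) would need a stemmer or a looser pattern, and would change the totals.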

I'm not sure I realized I had such a strong desire to communicate thoughts.

Then I wondered what the distribution of words is for other people. Here are some results:

The rings are, in order from outside to in:
this blog
Abstractions
Terminus Est
Book of Joe

It would be cool if someone wrote a real-time utility that could rank word occurrence on blogs the way Amazon does for books, and then show you sites with similar distributions, rare words, etc.  I bet it could be done somehow with a Google mashup.

If I had the hacking chops to do something like this, here's how I think I'd go about it:

  1. Collect the distribution of words/phrases found in the set of blogs already searched by Google's blog search utility.
  2. Get rid of the most common ones (pronouns, articles, etc.).
  3. Use an appropriate statistical threshold to determine the list of words that occur more frequently in a particular blog than in the entire set from step 1.
  4. Index the blogs on their rare words.
  5. Cluster blogs based on their word frequencies.
  6. Write a nice graphical interface to display the results and invite further exploration.
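
Steps 2, 3 and 5 can be roughed out in a few lines of Python. This is only a sketch under loud assumptions: the stop list is tiny and hand-picked, a plain frequency ratio (`min_ratio`) stands in for a proper statistical threshold, and cosine similarity is just the pairwise measure a clustering step could be built on, not clustering itself. The blog names and texts are invented, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked stop list; a real one would be much longer.
STOP_WORDS = {"i", "you", "he", "she", "it", "we", "they", "a", "an", "the",
              "and", "or", "of", "to", "in", "is", "my", "on"}

def word_counts(text):
    """Step 2: tokenize and drop the most common words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def distinctive_words(blog, corpus, min_ratio=2.0):
    """Step 3 (simplified): words whose rate in one blog is at least
    min_ratio times their smoothed rate in the whole corpus."""
    blog_total = sum(blog.values())
    corpus_total = sum(corpus.values())
    vocab_size = len(corpus)
    result = {}
    for word, n in blog.items():
        blog_rate = n / blog_total
        # Add-one smoothing avoids dividing by zero for unseen words.
        corpus_rate = (corpus[word] + 1) / (corpus_total + vocab_size)
        ratio = blog_rate / corpus_rate
        if ratio >= min_ratio:
            result[word] = ratio
    return result

def cosine_similarity(a, b):
    """Step 5 building block: similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented sample data, only to show the calls.
blogs = {
    "paper": word_counts("I think I fold paper. Fold, crease, fold again."),
    "river": word_counts("We paddle and swim in the river. Paddle on!"),
    "mixed": word_counts("I think we fold paper and then swim."),
}
corpus = sum(blogs.values(), Counter())  # step 1 stand-in: all blogs pooled

for name, counts in blogs.items():
    print(name, sorted(distinctive_words(counts, corpus)))
print(cosine_similarity(blogs["paper"], blogs["mixed"]))
```

With real data, the ratio test would probably give way to something like a log-likelihood or chi-squared test, and the pairwise similarities would feed a clustering routine.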

Posted by origamifreak at 11:39 pm

[Comments count: five]

1: Very interesting. How did you cull/choose the verbs?

I notice there is no 'fold' or 'notice'.

Posted by Deborah at 12:22 pm on 03.14.07

2: You "do" twice as much as you "try". I think that is pretty damn impressive!

Posted by sharon at 3:11 pm on 03.14.07

3: Deb: I just arbitrarily picked a bunch of action words - at first I did it with just the senses: smell, see, touch, taste, hear, and it was so interesting I expanded it.

Sharon, crack me up, you do. Very funny, you are. ;-)

Posted by anja at 8:53 pm on 03.14.07

4: I need to start thinking more than I give!

Posted by Terminus Est at 8:58 pm on 03.14.07

5: I think "go, read, write, watch, visit, help" would take pretty big chunks of your blog-word graph.

Posted by Jeni at 9:02 pm on 03.15.07
