The myth of the average and so-called ‘keyword density’.

“Let’s take the average.” “Let’s analyze the keyword density of the page.” I hear these words often enough. The important thing is that in most cases (and by that I do not mean 51%, but something close to 95%) these techniques are of absolutely no use.

Let us first talk about the average, or mean. When we take the mean and expect it to be useful, we usually assume (or should assume) that the data is spread fairly evenly around its centre. The mean as a measure of central tendency breaks down when we encounter skewed data. In fact, most data in the real world is skewed, apart from textbook examples.

To bring home my point, let me tell you a joke about a statistician. It is said that there was a very tall statistician who had to cross a river with his family: a very short wife and three very small kids. He had to decide whether to cross or not. Being a mean guy (pun intended), he decided to take the mean of the heights of the whole family and compare it with the depth of the river. He found that the average height of his family just managed to top the depth of the river, so he decided to cross with his whole family. When he reached the other side he found, not surprisingly, that he was the only one who had made it across; the rest of the family had drowned.

The same happens in real-world estimation. For data like this, the average or mean is almost always a bad measure of central tendency. In fact, nature works more along the lines of what is called the Pareto distribution, or the 80-20 rule in layman’s parlance.
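To put some numbers on the joke, here is a minimal Python sketch. The heights and the river depth are made up by me for illustration (the story gives no figures); the only point is that the mean can clear the bar while almost nobody in the family does.

```python
from statistics import mean, median

# Hypothetical heights in cm: one very tall statistician,
# a short wife and three small kids (all values are assumptions).
heights_cm = [210, 120, 100, 95, 90]
river_depth_cm = 122  # assumed depth, chosen so the mean just tops it

avg = mean(heights_cm)    # pulled upwards by the single tall value
mid = median(heights_cm)  # the height of the "typical" family member

# Who is actually taller than the river is deep?
safe = [h for h in heights_cm if h > river_depth_cm]

print(f"mean height:   {avg} cm")
print(f"median height: {mid} cm")
print(f"river depth:   {river_depth_cm} cm")
print(f"taller than the river is deep: {len(safe)} of {len(heights_cm)}")
```

With these made-up numbers the mean (123 cm) tops the depth (122 cm), yet only one of the five is taller than the river is deep. The median (100 cm) would have warned him.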

Now let us talk about keyword density. Let us for the moment ignore the fact that the term does not satisfy the rigor of a mathematical definition, and is more of a buzzword than something statistically useful. The general idea is to match as many keywords as possible pertaining to the supposed subject. Let us say you are manually looking for the page most relevant to the subject ‘apple computers’, and you have at your side a list of words pertaining to ‘apple computers’. You find a document that contains the words apple, steve, steve woz, steve jobs, mac, leopard and so on, and it matches 90% of your word list. What is your conclusion? I would definitely say that this document is NOT related to apple computers, but is actually spam. So a simplistic keyword density measure spews out spam after spam and leaves you wondering what is wrong. It is not just that the keyword density technique is very easy to game; it is also inherently a mismatch with the real world. You don’t come across relevant documents with neatly placed keyword density. And to top it all, your list of relevant terms may not be complete and is likely to give lots of false positives.
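Here is a rough Python sketch of what such a simplistic scorer might look like, and how easily it is fooled. The word list, both sample documents and the function names are hypothetical, invented by me for illustration; this is not how any real search engine scores pages.

```python
import re

# Hypothetical word list for the subject 'apple computers' (an assumption).
KEYWORDS = {"apple", "steve", "jobs", "woz", "mac", "leopard",
            "macbook", "ipod", "cupertino", "osx"}

def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_coverage(text):
    """Fraction of the keyword list that appears at least once in the text."""
    return len(KEYWORDS & set(tokens(text))) / len(KEYWORDS)

def keyword_density(text):
    """Fraction of the words in the text that are on the keyword list."""
    words = tokens(text)
    return sum(w in KEYWORDS for w in words) / len(words) if words else 0.0

# Two made-up documents: one plausible, one stuffed with keywords.
genuine = "I replaced the hard drive in my old Mac last week and wrote down the steps."
spam = "apple steve jobs woz mac leopard macbook ipod cupertino cheap deals apple mac"

for name, doc in [("genuine", genuine), ("spam", spam)]:
    print(name, round(keyword_coverage(doc), 2), round(keyword_density(doc), 2))
```

The stuffed document covers 9 of the 10 listed words, while the genuine one covers a single word, so a scorer like this ranks the spam far above the page a human would actually want.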

I vote for banning these two terms from technical exchanges, average/mean and ‘keyword density’, so that we don’t fall into woolly thinking.


Book Advantage

Over the years I have learnt the importance of “really good” technical books. Just as it is very hard to write an entire blog post on this nugget of wisdom, which might run to just one line (“thou shalt realize the importance of really good technical books”), it is very difficult to write a “really good” technical book.

Let me tell you, all the university culture hoopla aside, if you want to be on par with the best universities in the world, ALL you NEED to know is which books they follow for their curricula (as far as technical education is concerned). Of course there are other factors, like what the Nobel prize winning professors say and all. But as I told you, all you NEED to know is just this: what books they follow, and THAT’S IT.

Sometimes, some of the best things may not be easy, but they are simple. Knowing which books they follow is simple, but what you do with that knowledge may not be easy. But of course it depends on how you define “easy” or “difficult”. If you enjoy going over them, reading them, experimenting, then it is not even difficult. Hence, some of the best education is both EASY and SIMPLE.

p/s: I KNOW most of you bozos will not believe me.


Many lives as a technologist.

**WARNING**- this one is about feelings.

I wish I were a technologist with multiple lives. I would devote each lifetime to one of the following:

1. Robotics. And do things like these, which this guy Hirose does.

2. Develop open source systems, and build scores of products like Mozilla.

3. Simulating life through chaos theory and other mathematical concepts, and spend a lifetime discovering the source of intelligence through computeronics.

4. Telecom and electronics.

5. Train young “minds” and gear them towards creative technology.

Well, I plan to at least touch upon all of these in this ONE life I have got. I guess I don’t have a lot of time. And yes, I am going to build a computer from scratch in this lifetime. This has always been my childhood dream, which I plan to realise.

In the end, I believe that no one deserves to do menial jobs and miss the chance of being creative in this ONE life she has got. Hence, I want to develop technology which would raise the economy and free people from menial jobs. Only robots would do menial jobs. But then some might feel sympathetic towards robots and say that robots have a right to be creative too. Well, I believe in the human race more than the robot race- so……!

Any dreams you have got that can spark mine?

Tim Berners-Lee and Web Science.

Have you heard of Tim Berners-Lee? In the past few months he has been almost ubiquitous. He is seen everywhere, talking about the Web. To begin with, I thought he was just another guy jumping on the bandwagon and capitalising on the WWW phenomenon. You can’t blame me; otherwise why would someone hold a conference on “what is web 2.0”? I mean, I am talking about another Tim: Tim O’Reilly. And then there is the claim that the term was originally coined by him in a brainstorming session with MediaLive. All these claims and conferences, and the hoopla surrounding them, were too corny for me to fish out any genuine value. Don’t mistake me, I have benefited from the books he publishes, to give credit where it is due. But when people blow a lot of hot air, I am sensitive enough to catch the whiff.

But, as I have come to realise, this Tim is way different. Tim Berners-Lee, that is. What he says strikes a chord with me. He has also written a book, “Weaving the Web” (if you want, you can gift it to me). He also talks about Web Science, which would be different from traditional computer science. It would be a holistic science, taking into consideration the dynamics of society in the virtual world, the impact of the internet on people and business, and the challenges and opportunities. Well, actually I don’t have much idea what Tim (the good one) meant by it (you can gift me his book, remember?), but that is what I think he must mean by it.

p/s: I just noticed that WordPress does have the facility to import blogs from other sites (sadly not from Xanga). Now did this facility exist before MY writing about it here? Should I also advertise that this idea actually came from me in a brainstorming session with MYSELF, or what? (Just like Tim O’Really.) Time to hold a conference, then get media coverage, and then perhaps go on a worldwide tour to promote the idea that it was actually me who mooted this. I am short of funds, actually. If someone can give me the funds, I would just slip in the FACT that it was actually in a brainstorming session with him that I (we) came upon this idea.

import blog.xanga.*



I have recently switched to this WordPress blog from Xanga. It was easy, as I had written just one post in my previous technology blog, so all I had to do was copy and paste it here. But with my personal blog it is a different story. I have written so much in it that I can’t manually copy, paste and format it all on a new blog site. I wish I had software which would do this for me. Blogs are growing day by day, and people will change blog sites more and more as better ones come up. Hence demand for such software will be high. Any blog site which, apart from being attractive, also has a feature to import blog entries from other blog sites the writer owns would have a big advantage. So look out for it!




Technology and Mathematics: an Introduction (MUST read)

Now why this technology blog when I already have one[here]? Well, to begin with, technology is an interest of mine which deserves a category of its own. Hence, this blog. Now why only this technology blog? Well, first of all, this blog will not be talking about gadgets, or how to fine-tune and speed up your computer [or maybe it will, when I have nothing else to write about!] or other such tidbits; it will talk about two guiding phenomena of technology.

(1) Underlying philosophy in technological advancement.

(2) The backbone of technology: mathematics.

To make matters less ambiguous, let me state that when I refer to technology I usually mean the net and the IT revolution. So technology will mainly mean this, unless specifically stated otherwise.

Also, when I use the term mathematics I mean higher mathematics, and not the everyday sense, as in: hey, count nicely, do your maths.

As the complexity of cutting-edge technology grows, it will become more and more dependent on mathematics. To many it would seem that this is the era of technology. Well, it is not. The era of technology is yet to come. Let me give you an analogy to illustrate my point. At the time of Aristotle and Archimedes [of Eureka fame!] the era of physics had begun, but it had not yet reached its mathematical finesse. There was a lot of courting and dating between mathematics and physics, but the marriage took place with Newton acting as the head priest, and the bond between the two was made stronger still by Einstein. You can call it an age of physics if you want, but certainly not of technology. Technology is still in its ‘tinkering’ stage, with now-and-then dates with mathematics. There have been many who have used the underlying power of mathematics to change the face of technology. To give you an example I choose the most popular technology [if not the best example of the relation between mathematics and technology]: Google. It used mathematics extensively to harness the power of the net in its new search engine. For further reading you can have a look at the paper written by Larry Page and Sergey Brin, co-founders of Google, here.

Let me finish this introduction by paying tribute to the beauty of mathematics with Euclid’s proof of the infinitude of primes.




There are infinitely many primes.



Suppose that $p_1 = 2 < p_2 = 3 < \dots < p_r$ are all of the primes. Let $P = p_1 p_2 \cdots p_r + 1$ and let $p$ be a prime dividing $P$; then $p$ cannot be any of $p_1, p_2, \dots, p_r$, otherwise $p$ would divide the difference $P - p_1 p_2 \cdots p_r = 1$, which is impossible. Hence either $P$ itself is a prime number, or $P$ is divisible by some prime other than $p_1, p_2, \dots, p_r$. Now if $P$ is a prime then we get a contradiction, since $P > p_r$ and we assumed that $p_r$ is the largest prime. If there is any other prime number which divides $P$, say $Q$, then $Q$ is other than $p_1, p_2, \dots, p_r$ and hence greater than $p_r$, because all the primes up to $p_r$ [namely $p_1, p_2, \dots, p_r$] are already excluded. Hence $Q$ is a prime greater than $p_r$, again a contradiction. Hence there are infinitely many primes.
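If you like to see a proof move, the construction can be tried out in a few lines of Python. This is only a sketch of the idea, not part of Euclid's argument: the function names are mine, and the trial-division helper is a simple assumption that is good enough for small inputs.

```python
from math import prod

def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime

def euclid_witness(primes):
    """Given a finite list of primes, return (P, q) where P is their
    product plus one and q is a prime dividing P that is not in the list."""
    P = prod(primes) + 1
    q = smallest_prime_factor(P)
    # q cannot be in the list, or it would divide P - product = 1.
    assert q not in primes
    return P, q

print(euclid_witness([2, 3, 5]))
print(euclid_witness([2, 3, 5, 7, 11, 13]))
```

The first call gives P = 31, which is itself prime. The second gives P = 30031 = 59 × 509, which is not prime, but its prime factors are still missing from the list. Either way, any finite list of primes you feed in comes back with a prime it forgot.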

On a personal note, my love affair with mathematics began with seeing the power and beauty of this particular proof. If mathematics could be represented by one theorem, it would be Euclid’s theorem, which reflects the force as well as the beauty of mathematics in a delicate balance.

Happy thoughts!


p/s: Another area of interest which deserves a blog of its own is economics. I will start an economics blog soon.

Goodchild: There must be. Your way, the way of intelligence is the way to destruction. You made man make that choice, you made him violate the natural law.
Sky: No, human life has scorned both of us. The chaos to come is of man’s own making. Neither you or I can help him now.

–From movie: Sky(1976)

