Thursday, July 15, 2010

Wolfram|Alpha

From the heart...

Thanks to the Wolfram|Alpha team.

It is a wonderful project, and I have been a big fan since I learnt about it. It has been a great source of inspiration on how humankind has matured enough to boldly undertake such a vast project.

I bow to the team.

Their results have been useful for backing up my points at work.

Disclaimer:

(The feedback below is a free-flowing hypothesis: not researched, and no evaluation tests carried out.)

Feedback:
I searched for "invictus" and landed on only one result, with no alternate interpretations.

I was hoping to be linked to Henley's poem of the same title.
Wikipedia has the expected entry. Google's results put it in second place, based on simple popularity I would assume (to be 100% accurate: I don't know how Google search works).
That is because the movie has been very popular in recent years, and curious googlers like me search for the word's meaning, fail to find it in Webster's, end up on google.com, and there you go:
1. the IMDb movie entry.
2. Wikipedia for the poem.
3. Wikipedia for the movie, tied to the poem as its parent. The person who wrote the movie entry "knew" that the movie is actually inspired by the poem.

This brings us back to the valuation system used to add information to the knowledge base, the source of that information, the ranking used to bring up search results, and many more factors that specialists in this area will actually know about.

Suggestion:
(Disclaimer:
note that I am a programmer (my current software project is a user login for a web service) and only a hobbyist, not a professional in this subject matter. The suggestion should be weighed with this context in mind.)

Looking only from the search perspective, the default result could exclude content beyond a certain level of particularity in the data.

For that, the particularity factor needs to be designed first.

Let us try to model particularity in a mathematical form.

Having no base algorithm to work from, I start from scratch here, from the actual search.

Every result from the knowledge base must be assigned a score.
The scoring system defines the particularity factor.

Factors that could be variables in the definition are:

1. input interpretation
2. source
3. source valuation system
4. number of sources
5. syntactical construction of the result entry in the language system of the result.
6. particularity factor
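A minimal Python sketch of such a scoring system, just to make the idea concrete. The factor names, the 0-to-1 scale and the weights are my own hypothetical choices, not anything Wolfram|Alpha is known to use. Factors 1 to 5 are combined into factor 6, and the default result keeps only entries whose particularity stays under a threshold, as suggested above.

```python
from dataclasses import dataclass

# Hypothetical weights for factors 1-5. Each factor is assumed to be scaled
# to 0..1, where a higher value means the entry is more specific ("particular").
WEIGHTS = {
    "interpretation": 0.3,  # 1. input interpretation
    "source": 0.2,          # 2. source
    "valuation": 0.2,       # 3. source valuation system
    "source_count": 0.2,    # 4. number of sources
    "syntax": 0.1,          # 5. syntactical construction
}


@dataclass
class ScoredResult:
    label: str
    factors: dict  # factor name -> value in 0..1 (missing factors count as 0)

    def particularity(self) -> float:
        # 6. particularity factor: a weighted sum of factors 1-5
        return sum(w * self.factors.get(name, 0.0) for name, w in WEIGHTS.items())


def default_results(results, max_particularity):
    """Keep only results that are not more particular than the threshold."""
    return [r for r in results if r.particularity() <= max_particularity]


fruit = ScoredResult("apple (fruit)", {"interpretation": 0.1, "source": 0.1,
                                       "valuation": 0.0, "source_count": 0.2,
                                       "syntax": 0.0})
company = ScoredResult("Apple (company)", {name: 0.8 for name in WEIGHTS})
print([r.label for r in default_results([fruit, company], max_particularity=0.5)])
# ['apple (fruit)']
```

Picking the weights is the hard part; a fixed weighted sum is only the simplest possible starting point.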

The type of source is a key ingredient:

IMDb: a movie database.
A movie database's valuation system is based on hard content published by a particular publisher.

Wikipedia: Wikipedia may be classified as a dynamic knowledge base with a keyword search engine. There you get the knowledge of whoever came across the word in their life and chose to write it down in the wiki.

Latin dictionary: a fundamental vocabulary knowledge base.
Example: a word from a language dictionary should be given a higher score than the same word used as a proper noun (a movie name or a poem title).

From a knowledge-search perspective, the default interpretation of "invictus" should draw first on the Latin dictionary, then Wikipedia, then IMDb.
If the word appears in both the English and the Latin dictionary, the dictionary in which the word has no root word of its own (its language of origin) gets priority.
This means a language evolution tree needs to be added as a valuation system for the dictionary source.
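Here is a small Python sketch of that ordering for "invictus". The priority table, the hit records and the `origin_of` lookup (a stand-in for the language evolution tree) are all hypothetical, and I read the root-word rule as "the dictionary of the word's language of origin wins".

```python
# Hypothetical priority per source type: lower means more fundamental.
SOURCE_PRIORITY = {"dictionary": 0, "wiki": 1, "movie_db": 2}


def root_rank(word, hit, origin_of):
    """0 if this dictionary's language is where the word originates, else 1.

    origin_of stands in for the language evolution tree (word -> language code).
    Non-dictionary sources all get 0; only dictionaries are compared by root.
    """
    if hit["type"] != "dictionary":
        return 0
    return 0 if origin_of.get(word) == hit["language"] else 1


def order_hits(word, hits, origin_of):
    # Sort by source type first, then by the root-word tie-break.
    return sorted(hits, key=lambda h: (SOURCE_PRIORITY[h["type"]],
                                       root_rank(word, h, origin_of)))


hits = [
    {"name": "IMDb", "type": "movie_db", "language": None},
    {"name": "Wikipedia", "type": "wiki", "language": None},
    {"name": "English dictionary", "type": "dictionary", "language": "en"},
    {"name": "Latin dictionary", "type": "dictionary", "language": "la"},
]
origin_of = {"invictus": "la"}  # hypothetical etymology lookup
print([h["name"] for h in order_hits("invictus", hits, origin_of)])
# ['Latin dictionary', 'English dictionary', 'Wikipedia', 'IMDb']
```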

Wikipedia is a complex entity in itself; its value system has two important factors:
A. the number of references and citations used in the entry
B. which entry is considered the child and which the parent, along with their scores from factor A.
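One way to combine factors A and B, sketched in Python: parents always come before their children, and within the same level the entry with more references wins. The entry titles and reference counts below are made-up numbers for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WikiEntry:
    title: str
    references: int               # factor A: references and citations in the entry
    parent: Optional[str] = None  # factor B: title of the parent entry, if any


def depth(entry: WikiEntry, by_title: Dict[str, WikiEntry]) -> int:
    """Number of parent hops up to a root entry (assumes no cycles)."""
    hops = 0
    while entry.parent is not None:
        entry = by_title[entry.parent]
        hops += 1
    return hops


def rank_wiki(entries: List[WikiEntry]) -> List[WikiEntry]:
    by_title = {e.title: e for e in entries}
    # Parents before children; within the same level, more references first.
    return sorted(entries, key=lambda e: (depth(e, by_title), -e.references))


entries = [
    WikiEntry("Invictus (film)", references=40, parent="Invictus (poem)"),
    WikiEntry("Invictus (poem)", references=12),
]
print([e.title for e in rank_wiki(entries)])
# ['Invictus (poem)', 'Invictus (film)']
```

Note that the poem wins even with fewer references, because factor B (the parent/child relation) is compared before factor A.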

1. Input interpretation depends on the source. Every source has a primary objective: the yellow pages, for example, list businesses in a particular district.
2. A source's worth depends on its valuation system, i.e. the accuracy of its data. A patents database's valuation system, for example, involves personal research and investigation into the patented work.
3. Besides directly giving the base number of possible interpretations, the number of sources in which a particular input is found brings in a new dimension: links and hierarchies among the sources feed into the input interpretation system, effectively increasing or decreasing the number of possible interpretations from the base number.
4. Language semantics, in terms of parts of speech, subject pronouns, simple and complex sentences, conjunctions, the assertive nature of a sentence, etc., all contribute to the effective correctness of the knowledge. But this calls for a dedicated knowledge base of its own, so I avoid diving into this ocean for the suggestion model.
5. Particularity depends on all of the above derived factors.

Let's have a hypothetical knowledge base with 3 data sets:

1. Companies
2. Scientists
3. Fruits

This knowledge base is now searched with the keyword "Apple".

The first choice of result should be the fruit, then the company, and then Sir Isaac Newton.

Now, how does that work?
Easy to say but difficult to do.

1. Apple is found in all 3 sources.
Base interpretations: 3, one for each source's primary objective.
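Step 1 as a Python sketch: the three data sets become a hypothetical in-memory dictionary of records, and the base number of interpretations is simply the number of sources in which the keyword appears anywhere. The record contents are illustrative, loosely based on the example entries in this post.

```python
# Hypothetical knowledge base: source name -> list of records (attribute -> text).
KB = {
    "Companies": [
        {"Company Name": "Apple", "Story": "The founders settled on the fruit's name."},
    ],
    "Scientists": [
        {"Scientist Name": "Newton",
         "Story": "An apple falling from a tree inspired his work on gravity."},
    ],
    "Fruits": [
        {"Fruit Name": "Apple", "Botanical Name": "Malus domestica"},
    ],
}


def sources_containing(keyword, kb):
    """Names of sources where the keyword appears in any attribute value (case-insensitive)."""
    needle = keyword.lower()
    return [
        source
        for source, records in kb.items()
        if any(needle in str(value).lower()
               for record in records for value in record.values())
    ]


hits = sources_containing("Apple", KB)
base_interpretations = len(hits)  # one per source's primary objective
print(hits, base_interpretations)
# ['Companies', 'Scientists', 'Fruits'] 3
```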


2. Type:
The companies db holds the history and profile of these social organisations of varying sizes and their respective objectives.
Companies are made by people.
Companies depend on people.

The scientists db has data about key personalities and carries their biographies.
People are absolute entities.
People depend on fruits.

The fruits db has full information about each fruit, from its botanical name to its nutritional value.
A fruit is an absolute entity of nature.

3. Value system:

This hypothetical companies and scientists knowledge base is pretty accurate. Imagine I wrote it in pencil on paper after personally visiting every one of them 8^).

Fruits are absolute entities of nature.

4. Inter-source linkages:

a. Companies employ scientists, and some companies are named by them (sometimes they are owned by a scientist; mathematically it is all the same).
b. Scientists work in companies (sometimes they own them; mathematically it is all the same).
c. Some companies have full or partial names found in the fruits database.
d. Some companies' objectives are fruits, Tang for instance.
e. Some scientists have fruits as a subject of their work.
Considering the above conditions, the 3 sources are interlinked: companies and scientists are siblings with a mutual dependency, and both depend on the parent.

Example Database Entry in Companies:

Apple's company name was chosen when Steve's friends didn't have a better name and settled on the fruit's name. Scientists from Apple R&D bring in a revolutionary capacitive touch screen.

Source Data:
Apple is a value in the "Company Name" attribute.
Apple is found in the "Story" attribute, a descriptive field.
Apple's employee list includes "Scientists".

Another Example Database Entry in Scientists:

Newton explained gravity, inspired by the fruit falling from a tree. Thus the fruit is the subject of the study.
Newton is employed at Cambridge University.


Source Data:
Newton is a value in the "Scientist Name" attribute.
Apple is found in the "Story" attribute, a descriptive field.
Newton is a scientist in a company.

Another Example Database Entry in Fruits:

Apple is a value in the "Fruit Name" attribute.
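The "Source Data" notes above suggest two kinds of match: a full match when the keyword is the value of a name attribute, and a relative match when it only shows up inside a descriptive field such as "Story". A minimal Python sketch of that classification, assuming hypothetical record contents and treating the three "... Name" attributes as the name fields:

```python
NAME_ATTRIBUTES = {"Company Name", "Scientist Name", "Fruit Name"}


def match_type(keyword, record):
    """'full' if the keyword equals a name attribute, 'relative' if it only
    appears inside another (descriptive) field, None if absent."""
    needle = keyword.lower()
    if any(record.get(attr, "").lower() == needle for attr in NAME_ATTRIBUTES):
        return "full"
    if any(needle in str(value).lower()
           for attr, value in record.items() if attr not in NAME_ATTRIBUTES):
        return "relative"
    return None


records = {
    "Companies": {"Company Name": "Apple",
                  "Story": "Named after the fruit; R&D scientists built a capacitive touch screen."},
    "Scientists": {"Scientist Name": "Newton",
                   "Story": "Explained gravity, inspired by an apple falling from a tree."},
    "Fruits": {"Fruit Name": "Apple"},
}
print({source: match_type("Apple", rec) for source, rec in records.items()})
# {'Companies': 'full', 'Scientists': 'relative', 'Fruits': 'full'}
```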

From these entries:

Inference:
Fruits is the parent.
Companies and Scientists are peers and children of Fruits.
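The same inference can be sketched in Python as a tiny "depends on" graph. The edges are my hypothetical reading of the type and inter-source linkage notes above (companies and scientists depend on each other, and both depend on fruits). A parent is a source that depends on no other source; peers are sources that depend on each other.

```python
# Hypothetical "depends on" edges between the three sources.
DEPENDS_ON = {
    "Companies": {"Scientists", "Fruits"},
    "Scientists": {"Companies", "Fruits"},
    "Fruits": set(),
}


def parents(graph):
    """Sources that depend on no other source."""
    return sorted(s for s, deps in graph.items() if not deps)


def peer_pairs(graph):
    """Pairs of sources that depend on each other (each pair listed once)."""
    return sorted((a, b) for a in graph for b in graph[a] if a < b and a in graph[b])


def children_of(parent, graph):
    """Sources that directly depend on the given parent."""
    return sorted(s for s, deps in graph.items() if parent in deps)


print(parents(DEPENDS_ON))                # ['Fruits']
print(peer_pairs(DEPENDS_ON))             # [('Companies', 'Scientists')]
print(children_of("Fruits", DEPENDS_ON))  # ['Companies', 'Scientists']
```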


5. Language semantics: skipped for simplicity in this write-up; besides, my own knowledge base in that area is poorly populated 8^)

6. Particularity:
The parent, Fruits, thus gets a lower particularity factor than the other two, owing to the abstract nature of the entities it stores.
The peers, Companies and Scientists, depend on the parent and are thus less abstract and more particular.
Apple is a full match in 2 databases and a relative match in 1 database.
Of the two databases where it is a full match, one is the parent and the other is a child.

Thus:

Apple as a fruit is the most fundamentally important for a knowledge base search. Reason: parent, full match.
Apple as a company is the next most fundamentally important. Reason: child, full match.
Apple as a scientist's story comes after that. Reason: child, relative match.

So the result interpretation order should be:

Apple as a fruit: the default interpretation, with its search result shown.
The alternate interpretation links should list the other two in that order, one below the other, for an easy reading experience.
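Putting the particularity rule into a short Python sketch: each candidate carries its hierarchy depth (parent = 0, child = 1) and its match type, and sorting on those two values reproduces the order above. The tuple layout and the `MATCH_RANK` table are hypothetical choices for illustration.

```python
# Lower sort key means less particular: hierarchy depth first (parent = 0,
# child = 1), then match type (full before relative).
MATCH_RANK = {"full": 0, "relative": 1}


def interpretation_order(candidates):
    """candidates: list of (label, depth, match) tuples.

    Returns (default interpretation, list of alternates in display order).
    """
    ordered = sorted(candidates, key=lambda c: (c[1], MATCH_RANK[c[2]]))
    labels = [label for label, _, _ in ordered]
    return labels[0], labels[1:]


default, alternates = interpretation_order([
    ("Apple as a scientist's story", 1, "relative"),
    ("Apple as a company", 1, "full"),
    ("Apple as a fruit", 0, "full"),
])
print(default)     # Apple as a fruit
print(alternates)  # ['Apple as a company', "Apple as a scientist's story"]
```

Sorting on a tuple keeps the rule readable: the parent/child relation always dominates, and the match type only breaks ties within the same level.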

A search for apple on Wolfram|Alpha returns the company first, with the fruit as an alternative. No sign of Newton, though, which is fine: for an internet knowledge search with the keyword "apple", the probability of someone wanting Newton's gravity story is very low.

All said... Wolfram Alpha is wonderful.
Try it out here:
http://www.wolframalpha.com


Edit: I have removed the CC license. I wanted to keep it simple. If you like the above thought process, use it wherever you like. Forget attribution etc. Just use it.

I searched for invictus again today and found that the Wolfram Alpha team have changed a lot: the interface is new, with many more features, and I am now able to improve on my research. I love their improvement to input interpretation. Still no option for Henley's poem, but today's search for invictus headed in the right direction. They never cease to impress me.

integrity

When made with cunning and thievery, knowledge and skill lack wisdom, dignity and honour.