Sometime in 2007 I had a very strange dream. A man in a shiny, white robe came to me and asked me to follow him. This wasn't a white robe like you would expect to see in a Jesus painting. It was more like something you would expect to see in a Buck Rogers / Quantum Leap / Logan's Run / Battlestar Galactica episode from the 1970s. It was made from an iridescent white taffeta that gave off a shimmery glow, the kind of glow you get from a soft-focus lens or a Vaseline filter. He ushered me into a room full of computers. Not computers the way we are used to seeing them today. These looked more like a '70s-era workstation made by Commodore Business Machines. Two monitors were embedded into a "hump" of white plastic at the back of a single-piece tabletop. The keyboard was also embedded into the tabletop, and the keycaps looked like they were smoky onyx or monochromatic gray mock tortoiseshell. But they felt like plastic, and were warm to the touch. Definitely not carved from some kind of gemstone or anything like that! The man sat me down in front of a computer terminal, at a table where the other terminal was vacant. The chairs, of course, were a blast from the past. A single-piece, thermoformed, white plastic (or fiberglass) swivel chair on a round, white pedestal base. The pivot joint was hidden just under the seat, which was shaped like part of an eggshell. Not with ragged edges, but the same kind of oval shape and the same smooth, semi-gloss white finish.

He didn't sit at the terminal beside me, however. Instead he stood behind me and placed his left hand on my shoulder. With his right hand, he pointed to various parts of the computer screen and told me how it works. Not how to use it, mind you, but how the computer works. Oddly enough, the man never seemed to use words as he conveyed these concepts to me. I looked at whatever he was pointing at, and I immediately understood what he was trying to show me. The dream happened again the next night, and then again the night after that. It took me an entire week of repetition before I felt like I fully understood the computer program the guy was teaching me to use. I remember telling him repeatedly that he had the wrong guy, and that I could never build a computer that does any of these things. But he didn't dignify my protestation with anything other than a grimace and a smile. A sort of apologetic, smiling grimace, if you know what I mean. As if to say, "Sorry, bud, you're the one they chose."

In the years after the dreams took place, my world flew apart at the seams. I lost my wife, my kids, my dog, my job, my respectability, and my ability to depend upon others for handling any of the essential aspects of daily life. It's all me now. That's a terrifying position in which to find yourself when you are a broken person in the first place. Through it all, I have attempted to comply with the assignment I was given by this seventies Jor-El lookalike dude (whoever he was). I was the kid who followed the programming guide that came with my Commodore 64 computer, step by step, but could not even get it to flash the words "Hello World" (tutorial number one). I am an idiot, and completely worthless as a programmer. But for some strange reason, everything the guy in the robe taught me has borne tremendous fruit. Each tangent I have gone on these past five or six years has followed the same pattern. I tried to imagine a way to accomplish one of the objectives that the man explained to me.
I started to build a tool that might serve as a "proof of concept" for the project at hand. Then, just as I was within reach of the finish line, I would discover information that I wished I'd had from the outset. Every project has turned out to be something that somebody somewhere has already done before me. Had I known the proper terminology, I could have avoided a LOT of duplicated effort. For some reason it has been important that I begin to build each tinkertoy by myself, before I could invoke the code libraries that others had already built.

I read with great interest about the life of Henry Eyring, the chemist who practically invented the modern study of chemical reaction rates. As an undergrad student, Eyring never accepted the math equations in his textbook at face value. He had to create the proofs for each and every math problem. It didn't matter that he was reinventing the wheel; he needed to know these things for himself. He later remarked that he had done something that none of his peers saw any value in doing. But he insisted that by climbing those peaks the hard way, he enjoyed vistas that nobody else was likely to have ever seen. Indeed, it is said of Dr. Eyring that he saw chemical chain reactions in his head that we struggle to model with high-speed computers to this very day. I've never taken chemistry, so I'll just take that at face value. The point is, the dude was smart. And he got that way by doing his own math, and refusing to take somebody else's word for it. So perhaps I am in good company, even though it's aggravating that I've had to do everything the hard way. In every instance, I have found some corner of the computer universe where somebody else has assembled the code as part of his doctoral dissertation. Or where an organization has been working for several years to hammer out a standard way of doing exactly what I envisioned. It's humbling, and frankly incomprehensible to me, that this would occur on so many occasions.

But what came next was even more astonishing. I have sought advice from every computer programmer I could think of. One of them listened to my vision, and told me I had officially blown his mind. He spoke to several of his colleagues, and they all agreed that my vision was a perfect solution to several problems they were dealing with. At length, I was given the opportunity to share my entire vision with Bill Barrett, a computer genius I have known for much of my life, but whom I lost touch with a decade or so ago. Dr. Barrett listened intently, and then explained with his usual nonchalance that I had merely reinvented the wheel. The algorithm I proposed to him was exactly the same as the artificial intelligence idea that formed the basis of the LISP programming language back in the 1960s. In other words, I had just reinvented artificial intelligence. That may inspire a collective yawn on the part of most of my friends, but I was stunned by this revelation. A guy like me, who never took a programming class in his life! Dr. Barrett then encouraged me to narrow the scope of my project, and focus on one little aspect of it for now. He seemed to like the obituary generator idea that I had described at one point. So he told me to focus on that, and let the rest of the world continue to work on the "thinking" machines that I saw in my dreams. And anyway, the obituary idea has direct application in the everyday life of humankind.
In order to describe the Obit Generator to you, I'll have to first ask you to focus on the three fundamental aspects of genealogical research. Yes, there is a time and place for photographs and stories and so on. But at the end of the day, genealogists are primarily concerned with three basic types of information: the names of people, the dates that certain events took place, and the names of the places where those events occurred. Everything else keys off of those three data sets. I don't want to bore you completely out of your skull, but there are many aspects of this that I fixated on for the better part of a year.

It's important to establish what is meant by the words you use to describe the timing of an event. A specific group of humans might understand what you mean when you say Mittwoch, or Wednesday, or Suiyōbi, or whatever. But computers don't instinctively understand any of these words. Even the slightest variation in how you describe the middle day of the week can throw the computer completely off track. When you say the funeral service will be held on Tuesday morning at ten o'clock, do you mean 10:00 GMT, or are you talking about Mountain Standard Time (MST)? At length I concluded that we need to store the timing of an event in a single, super-standard format. What you type into your computer database would need to be translated into code. Eventually, I came upon a web site that argues in favor of all computer dates being stored in a format that has been used primarily by astronomers up until now. It is a decimal date format that can go millions of years into the future, or trillions of years into the past, without any difficulty at all. If you need more precision, you need only add an additional decimal place. Whether you're describing an epoch or a nanosecond, the Julian day date format is a really good choice.

Once you have stored the date in your database, it ought to be possible to display that date in whatever format a person chooses to view it in. If you want the day of an event to be displayed as a decimal number in the Julian day date format, fine. But if you want something else, say the ancient Hebrew lunar calendar, it is possible to create a software algorithm that converts the decimal number into the format you choose. 10/15/1983. Saturday, the fifteenth day of October, in the year of our Lord Nineteen Hundred and Eighty-Three. 15 Oct 1983. Whether you want the date displayed in Hebrew, or in Chinese characters, this should be doable. After working on this idea for a time, I discovered a web site that converts the Julian day date into any number of different formats, including the Gregorian calendar and several lunar calendars used by our ancient ancestors. The source code is free for the taking, and is offered as a gift to the world of programmers on the Fourmilab web site. I found a section of the Omniglot web site that contains common words and phrases in dozens of languages. Included in the tables of common words are the days of the week and the months of the year. Eventually I discovered an XML schema for identifying "temporal annotations" with what are called "TimeX wrappers". These so-called wrappers are simple bits of code that signify to computers that the words between the two "metatags" are indications of a time (or other temporal event). Yes, the TimeX tags can be used to signify more than just the Julian day date.
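To make the date math concrete, here is a minimal sketch of the standard integer arithmetic for converting between a Gregorian calendar date and a Julian day number. The function names are mine, and a real tool would lean on an existing library (such as the converter code offered on the Fourmilab site); astronomers start each Julian day at noon, and the time of day rides along as a fraction of a day.

def gregorian_to_jdn(year, month, day):
    """Gregorian calendar date -> Julian day number (an integer, counted from noon)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

def jdn_to_gregorian(jdn):
    """Julian day number -> (year, month, day) in the Gregorian calendar."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day

print(gregorian_to_jdn(1983, 10, 15))   # -> 2445623
print(jdn_to_gregorian(2445623))        # -> (1983, 10, 15)

Store 2445623 once, and the same number can be rendered as 10/15/1983, as 15 Oct 1983, or as the equivalent date on a lunar calendar.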
The TimeX tags are also used to mark up words like "yesterday", so that the computer will understand that the word signifies the time of an event. But that's maybe a detail you don't want to worry about now. If we are going to abstract temporal annotations in this way (that is, store them as decimal numbers, but display them as words), why not do the same thing with the other two types of data? Why not store geo-spatial annotations as latitude/longitude coordinates? There are MANY ways that place names can be indicated, and there are good reasons to store the data in as unambiguous a manner as possible. Geopolitical boundaries change all the time. The Japanese call their home "Nihon", not "Japan" or "Japon". If you store the place of an event as lat/lon coordinates, you can then display the place name according to historical boundaries, modern boundaries, or even in different languages.

Finally, there ought to be a way to store proper nouns in a format that can easily be translated between different writing systems. There are five ways to correctly display a Japanese name using Roman characters. The difference between Hepburn and Wapuro Romaji does not become apparent unless the Japanese word contains either a double vowel or a doubled consonant. But that's just the beginning of the problem. Whether you spell it Satoh, Satou, Satoo, Sato, or Satō, even the most common surname in Japan can be spelled with dozens of different sets of Kanji. One thing is for sure: you can search all day for Olaf Aagaard, but the computer will not instinctively try to find Ølaf Ågård. Computers are dumb like that. You have to tell them every stupid little thing you want them to do. Lucky for us, the heaviest lifting has already been done. First of all, there is a set of meta tags that has been designated for use in describing proper nouns. It's called the ENAMEX standard. Like TIMEX2, which is used for temporal annotations, and SWING, used for geospatial annotations, ENAMEX is just a set of "XML wrappers" which specify what kind of data resides between a pair of meta tags. This is only part of the equation for dealing with proper nouns. The second part of the problem, the bigger part, lies in abstracting the proper noun into a decimal number that the computer can use to keep track of a person's name. If possible, you want to build some kind of thesaurus, and not just a list of names.

Because the most difficult names to deal with come from Japan and Thailand, I have been operating under the assumption that solving the problem for Japanese names would simultaneously solve the problem for everybody else. As it turns out, there were several attempts made at solving this problem in the early days of the US Census. The census was tabulated by hand, using a hashing scheme called "Soundex". Soundex begat Metaphone, which begat Double Metaphone, and a variety of other hashing systems. Essentially, these systems were designed to compensate for spelling variations like Aagard, Agard, Agaard, and so forth. These early efforts point in the right direction, but don't do anything for non-Roman scripts. Jim Breen at Monash University created a computer resource many years ago that has been used by every major software title that deals with Japanese Kanji, but that caters primarily to native English speakers. Mr. Breen told me that Japanese commoners did not have surnames until the late 1800s.
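As an aside, here is a minimal sketch of the classic American Soundex hash mentioned above. It is only illustrative; a production tool would reach for one of the many existing phonetic-matching libraries, or for a newer scheme like Double Metaphone.

def soundex(name):
    """Classic American Soundex: first letter plus three digits."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = codes.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":               # H and W are skipped and do not break a run
            continue
        code = codes.get(ch, "")     # vowels map to "" and do break a run
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]

# Spelling variants hash to the same code, which is the whole point:
print(soundex("Aagaard"), soundex("Agard"), soundex("Agaard"))   # A263 A263 A263

Soundex only helps with Roman-alphabet spelling drift, which is why a name thesaurus is still needed for everything else.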
In the years since Japanese commoners first took surnames, the number of proper nouns in Japan has grown until it dwarfs the number of European surnames. Breen had to break apart his dictionary, and create a whole new dictionary of proper nouns called ENAMDICT. When I asked Dr. Breen if I could use his ENAMDICT as source material, he politely declined. He then told me about a guy who is already doing exactly what I had in mind. Jack Halpern used Dr. Breen's ENAMDICT as the basis of a thesaurus of Chinese, Japanese, and Korean names. Halpern took ENAMDICT and lexicalized it, and then added all of the romanized variations for each Japanese proper name. This phonetic indexing of the CJK dictionary has formed the basis for a few larger databases that seek to index all proper names in the world. The best of these, as far as I can tell, is called the Nomino database. The CIA uses this database to track people who hop around the globe, and whose names appear in a variety of different scripts that are specific to various regions of the world. Apparently a news feed published in Farsi will display a terrorist's name differently than a news blog written in Sanskrit would. Nomino makes it possible to search for a name using one spelling variant, and then ask the computer to try all the other variations as well.

Halpern's approach to this problem is the easiest to understand, so I'll explain it to you using the methodology he describes on his site. Imagine a spreadsheet divided into the usual rows and columns. Each row represents a category of names. Think of a single entry in a thesaurus, and you'll get what I mean. The row might be titled "Jonathan", for example. Along the row you might find a lot of familiar variants such as John, Johan, Johannes, Johnny, and so forth. I would propose that the proper way to store name information would be to place two different things between the ENAMEX wrappers. First, the row heading: the number we assign to the category of "Jonathan-like names". Second, the column in which the specific name variant resides. In other words, the person we are looking for had a Jonathan-like name, and he mostly went by "John". The numeric value of 123 invokes the one hundred and twenty-third row on the spreadsheet that lists all the names in the world. Which corresponds, of course, to Jonathan-like names. The value 123A means that this person specifically used the name in column A of row 123, which is spelled "John". So now we have a foundation upon which we can build a wide array of software, starting with a tool for generating online obituaries. First, you tell your computer that the forename of the deceased is John. The computer goes out and finds the name "John" in the Nomino database, and then records the numeric value for this name: 123A. It then places that numeric value between the ENAMEX wrappers that are used to identify a forename. Then you type in his last name, "Aagaard", and the computer goes out in search of the category for Ågård-like names.

I've often found that computer programmers don't always think about computers in the way that makes the most efficient use of my time. Inevitably they are going to ask me to fill out the address of the funeral home in the same way that my grandpa might've handwritten the address on a letter: street address, city, state, and zip. How many web forms have you had to fill out that ask for your mailing address in this exact same format? Doesn't it just make your eyes bleed?
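Before getting back to the web-form rant, here is a toy sketch of that row-and-column lookup. The row numbers, column order, and spellings are invented for illustration; the real table would come from something like Nomino.

# Toy name thesaurus: each row is a category of related names, and each column
# letter is one spelling within that category. Row IDs and contents are made up.
NAME_THESAURUS = {
    123: ["John", "Jonathan", "Johan", "Johannes", "Johnny"],   # "Jonathan-like" names
    456: ["Aagaard", "Agard", "Agaard", "Ågård"],               # "Aagaard-like" names
}

def encode_name(name):
    """Return a row-plus-column code such as '123A' for a known spelling."""
    for row_id, variants in NAME_THESAURUS.items():
        for col, variant in enumerate(variants):
            if variant.casefold() == name.casefold():
                return f"{row_id}{chr(ord('A') + col)}"
    return None   # unknown name: a real tool would prompt the user to add it

def decode_name(code):
    """Turn '123A' back into the stored spelling."""
    row_id, col = int(code[:-1]), ord(code[-1]) - ord("A")
    return NAME_THESAURUS[row_id][col]

print(encode_name("John"))     # -> 123A
print(decode_name("456D"))     # -> Ågård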
After all, these forms are feeding a computer, for heaven's sake, not a letter carrier! The obit generator that I envision would ask you first for the zip code of the funeral home. If I can tell it the zip code, then it can find the city, state, and country information for me automatically. Right? In fact, it might go out online and search for funeral homes in that zip code, and ask me to click on the funeral home I'm looking for. Instead of typing dozens of characters, I've only had to type in five. Oh, yeah. I also had to click on the entry for the funeral home in question. Five keystrokes and a click. The computer can do the rest. Now if the computer knows the zip code of the funeral home, it can also potentially save me several more keystrokes as I continue to build my database. If I type in the time of the funeral service, then my computer could probably figure out what time zone the service is being held in, without asking me for ANY additional input! Right? It can store the information in Julian day date format, with the whole number carrying the date and the decimal places carrying the hour and the minutes. The service will be held at 11:30 am on the 25th of June. I could click on a picture of a calendar, and then select the hours on a time readout that looks like a digital clock. And with those few mouse clicks I have indicated when the funeral will be. In fact, I have now told the computer who died, and where and when the service will be held. But that's not much of an obituary. Right? Well guess what? We're not through yet.

Most obituaries include a little blurb about the relatives of the deceased. They list the name of his father, the maiden name of his mother, the names of his siblings, the name of his spouse, the names of each of his children (with the son-or-daughter-in-law listed in parentheses), and the names of his grandchildren. All of this is information that might very well exist in a genealogy database that the family has already created. And if they haven't already created it, the best genealogy database software titles are available as a free download for limited use. It's a no-brainer that you should spend the time typing the data into one of those software apps, rather than typing it all into a single-purpose obit generator that you're only going to use once. Go make yourself a pedigree file in RootsMagic, Family Tree Maker, Ancestral Quest, The Master Genealogist, or Legacy Family Tree. Then export your data as a GEDCOM so you can import it into the obit generator. If you're using RootsMagic, you can automatically import the details about many of your deceased ancestors from FamilySearch, the free web site maintained by the LDS Church. If you're a subscriber to Ancestry, you can use Family Tree Maker to import data that exists on that site in addition to the data that the LDS Church has provided to them.

Assuming that the data in your GEDCOM file has been properly "normalized", the Obit Generator ought to be able to replace all of the temporal annotations with decimal numbers (in the Julian day date format). It ought to be able to replace all of the geo-spatial annotations with lat/lon coordinates. It ought to be able to replace all of the proper nouns with reference numbers from Nomino, or in other words, "the name thesaurus". Once the GEDCOM has been imported, normalized, and properly encoded, there is one more step to this process. Inevitably there will be members of the immediate family who are no longer a part of the family. Obviously, divorce is an issue.
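Backing up to the date-and-time encoding for a moment, here is a minimal sketch of how one decimal number can carry the whole "when", assuming the software has already looked up the funeral home's UTC offset from its zip code. The year and the time zone in the example are my own assumptions; the text only specifies 11:30 am on the 25th of June.

from datetime import datetime, timezone, timedelta

def to_julian_date(moment: datetime) -> float:
    """Convert a timezone-aware datetime to a fractional Julian date."""
    # The Unix epoch (1970-01-01 00:00 UTC) falls on Julian date 2440587.5.
    return moment.timestamp() / 86400.0 + 2440587.5

# Funeral at 11:30 am local time on the 25th of June. Year 2013 and a UTC-6
# offset (Mountain Daylight Time) are assumed here purely for illustration.
mdt = timezone(timedelta(hours=-6))
service = datetime(2013, 6, 25, 11, 30, tzinfo=mdt)
print(to_julian_date(service))   # one number encodes the date, the time, and the zone

Rendering that number back out as a human-readable date and time, in whatever calendar and language the reader prefers, is then purely a formatting problem.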
Divorce is the obvious case: you probably want your GEDCOM to include the information about the mother of your children, but you may not wish to mention her in the obituary. More to the point, you can't say that John Aagaard is survived by his brother Alan, unless Alan is still living. Right? The Obit Generator will have to display a list of imported names, and then provide a checkbox next to each of them. If for some reason you don't want to include Matilda Epperson in the obituary, check the box next to her name. It ought to be that simple.

Now that the Obit Generator knows who, when, and where, we need to format the information in a way that people can read it. These decimal numbers aren't gonna mean a dang thing to anybody that isn't a computer. And as much as I know you don't want your obituary to look like everybody else's, there are very good reasons that at least part of the obit should be made with a cookie-cutter (i.e., template-driven) approach. Imagine if our Obit Generator had a catalog of set phrases that show up in pretty much every obituary you have ever read. Perhaps I could be given a choice to click on one of a dozen different "boilerplate" obits, and use that as my starting place. Theoretically, the Obit Generator could import the details I have already provided, and then format them into a generic obituary. I wouldn't have to do much thinking about it; I could just choose "style number seventeen" from the list of available templates. Poof! My obit is already somewhat presentable. But here's the kicker. Here's why we went through all the trouble that I have detailed for you so far: the obit generator in my dream was able to translate my obit into dozens of languages with the simple click of a mouse. The "set phrases" in my boilerplate obituary have already been painstakingly translated by experts into a variety of common languages. Inevitably somebody is going to want an obit in Klingon, or ancient Scots Gaelic, or Egyptian hieroglyphics. And so it might make sense to create an API for developers to use that allows them to create new templates for the Obit Generator. But let's start with the basics: English, French, German, Italian, Japanese, Portuguese, etc. Because we have stored only the Nomino reference ID instead of the name, we can now ask the software to display the name of the deceased in Hebrew, or in Korean, or whatever. If you want to display a non-Japanese name in Japanese, it would be displayed in Katakana. Because Katakana is a phonetic syllabary, there are already standard spellings for most popular western names. This is not a feature of Nomino, as far as I know, but a list of Katakana spellings for western names could be built up over time. There is a community of tattoo artists who have already begun to build such a database on their own, and so we might not have to start from scratch.

Now let's look at a difficult example, which illustrates where the automation won't get us from here to there. Let's say a person with Japanese heritage dies in America, but her family wants the obit to be published in Japanese so her overseas relatives can read it. In addition to selecting the American spelling of her name, you would also need to specify the correct spelling of her surname in Japanese (Kanji). This will require some homework, and the Obit Generator might not be able to get you all the way to your intended destination. I have had success using Jim Breen's free online resources for sleuthing out the correct spelling of many Japanese names.
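Circling back to the boilerplate templates for a moment: here is a toy sketch of how set phrases stored by ID could be rendered in whichever language the family chooses. The phrase IDs, the wording, and the translations below are all invented for illustration.

# Set phrases keyed by ID, each with translations on file. Everything here
# is invented as an example; real templates would be translated by experts.
PHRASES = {
    "survived_by": {
        "en": "{name} is survived by {relatives}.",
        "de": "{name} wird von {relatives} überlebt.",
        "ja": "{name}の遺族は{relatives}です。",
    },
    "service_at": {
        "en": "Funeral services will be held at {place} on {date}.",
        "de": "Die Trauerfeier findet am {date} in {place} statt.",
        "ja": "葬儀は{date}に{place}で行われます。",
    },
}

def render(phrase_id, lang, **fields):
    """Fill one set phrase with the encoded names, dates, and places."""
    return PHRASES[phrase_id][lang].format(**fields)

print(render("survived_by", "en", name="John Aagaard", relatives="his brother Alan"))
print(render("service_at", "de", place="Tomahawk", date="25. Juni"))

Each template would then simply be an ordered list of phrase IDs, with the names, dates, and places filled in from the encoded data.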
But the thing is, I already speak Japanese, so I have a leg up on many end users who might want to use the Obit Generator. There may eventually be the need for a community of translators, and of genealogists who are willing to provide the service of looking up the correct spelling of a Japanese name. Again, I'm assuming that Japanese is the worst-case scenario, so feel free to extrapolate on this basic premise. This community of professional genealogists would almost inevitably be able to speak Japanese (or whatever other language) fluently. So hiring them to translate any freeform data into Japanese might also be an option.

Speaking of freeform data, there are other sorts of information that you might want to include in an obituary which shouldn't require you to deviate far from the template-driven approach. Facebook provides me with a list of schools when I enter the name of my alma mater. LinkedIn does too. There are lots of GPS devices that seek to provide driving directions to major landmarks. Surely there are code libraries and web sites that provide hooks a developer could use to link into their data. In other words, when I get to the part about where the deceased attended high school, I ought to be able to type in three or four letters and then click on the name of the high school from a list of potential matches. Tommy attended Tomahawk High, where he graduated with honors. Tommy and Doris used to go to Farfunkel, a local club, and dance the night away. Over the years I've discovered several dozen web sites that catalog information about pop culture stuff, stuff that might be relevant to the casual obituary writer. Wikipedia is loaded with background information about all manner of obscure things. Perhaps even the grocery store that Tommy worked at when he was a teenager. The IMDb is loaded with information that might prove relevant. The date Tommy's favorite movie came out. The name of a movie star that was born on his birthday. The title and the release date of the movie that Tommy and Doris went to see on their first date. Billboard magazine might have some data about the most popular songs when Tommy was a teenager. For the sake of brevity, I'll leave it at that. But I hope this thought process is one you can wrap your head around quickly, so we can start working out the details.

Even a freeform obituary might be written in such a way that it hyperlinks to information about specific schools, restaurants, theaters, favorite films, musical acts, dance halls, scout troops, or even military service (e.g., Camp Pendleton, the such-and-such brigade/battalion/platoon, etc.). Given the right formula, the Obit Generator could potentially generate an XML document that can be printed on paper and read by ordinary humans with little or no difficulty. But that same document could also be published online with all of the metadata you could possibly ever want or need. The resulting online obituary could potentially be published in both English and Hebrew. All of the proper nouns could be displayed in Hebrew, the dates could be displayed using the Hebrew calendar, and the day of the week that the funeral will take place could be given its Hebrew name. The time of the funeral could even be displayed in Tel Aviv time. Even if the obit contains a lot of freeform data, it could still contain key phrases that can be easily "localized" into other languages. My dream went into a lot more detail than just the means by which a simple obituary might be published.
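Before moving on, here is a toy sketch of the type-a-few-letters-and-click lookup described above. The school list is invented; a real tool would query a shared directory the way Facebook or LinkedIn do.

# Suggest completions from a shared list after only a few keystrokes.
# The list below is invented purely for illustration.
SCHOOLS = [
    "Tomahawk High School",
    "Timpview High School",
    "Toms River High School North",
]

def suggest(prefix, choices, limit=10):
    """Return up to `limit` choices that start with the typed prefix."""
    prefix = prefix.casefold()
    return [c for c in choices if c.casefold().startswith(prefix)][:limit]

print(suggest("Tom", SCHOOLS))   # -> ['Tomahawk High School', 'Toms River High School North']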
In the dream, I was made to understand that every word on every web page could eventually be identified in a way that a computer might understand its exact meaning. How this is accomplished was a marvelous thing to contemplate. The premise is rather simple. I may not know how to read Portuguese, but I know how to read and write English. In theory, I could make a big difference in the ability of computers to perform accurate machine translation of a block of text.

The computer could first do an automatic scan of the document, and identify the individual words. I have since learned that this is a standard process called "tokenization". It works very well in languages that put spaces between words, but not in Chinese, where there are no spaces at all. A group of computer linguists have published some interesting papers on how to automate the tokenization of Chinese text. So far, that's the only problem I've been able to detect with tokenization. Once the computer has identified individual words (tokens), it could then ask me to help it define each of the words. I tell the machine what language the document was written in, and then the computer scans the document for identifiable words in that language. Each individual word would be given a meta tag that links it to a similar word in an electronic dictionary. For our purposes, this electronic dictionary would have to assign a unique identifier to each word, and an additional identifier for each of the available definitions. On the first pass, the program would only find the identifier for the word itself. After identifying each word and attaching a reference ID, the program would then attempt to diagram each sentence in the document. Eventually I learned that this process is called POS tagging (POS meaning "parts of speech"). As I surveyed the available online literature about POS tagging, I got the impression that machines do a less than perfect job of this. But I have a solution in mind, and this was something that I was told about in my dream. Let the machine do the best it can, and then let the human operator finish the task. It would be nice if the computer could do a perfect job of this, but language is squishy in ways that computers will never be perfectly able to parse.

It was my understanding that the POS tagging would have to be done in multiple passes. Identifying the words in the dictionary is not the same thing as parsing the words in the document. Yeah, the computer can find a word in the dictionary that is spelled the same way as the word in the document. But that is not enough for the computer to actually "understand" what that word means. Thus far, we only have a tag that indicates there is a correlation of some kind. We have a link between the two matching words. Some words have only one possible meaning, and the computer should have no difficulty parsing those. But there are lots of other words with more than one definition. It seems to me that a native speaker of the language in question ought to have an easy time recognizing that the word in the document most closely resembles definition three in the electronic dictionary. A native speaker should also have an easy time identifying synonyms that could have been properly used in place of the word that the author chose. In this case I am imagining a simple point-and-click procedure. Definition three; synonyms 4, 5, and 12. Click, click, click, click, and then on to the next word. Why are we looking for synonyms, you might ask? Because we are not really looking for words. We are looking for cognates.
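Here is a heavily simplified sketch of that pipeline: tokenize, look each word up by ID, then leave the sense selection to a human reviewer. The dictionary, the IDs, and the senses are all invented for illustration; a real system would sit on top of a resource like WordNet.

import re

# A toy sense dictionary: each word gets a numeric ID and numbered definitions.
# Every ID and definition below is invented purely for illustration.
DICTIONARY = {
    "note": {"id": 5120, "senses": {1: "a short written message", 2: "a musical tone"}},
    "bank": {"id": 4871, "senses": {1: "the edge of a river", 2: "a financial institution"}},
}

def tokenize(text):
    """First pass: split the text into word tokens (fine for space-delimited languages)."""
    return re.findall(r"[A-Za-z']+", text.lower())

def first_pass(text):
    """Attach dictionary IDs where possible; leave sense selection blank for a human."""
    tagged = []
    for token in tokenize(text):
        entry = DICTIONARY.get(token)
        tagged.append({
            "token": token,
            "word_id": entry["id"] if entry else None,
            "sense": None,   # a native speaker clicks the right definition later
        })
    return tagged

annotated = first_pass("She left a note at the bank")
# The human operator's clicks, recorded against the machine's first pass:
annotated[3]["sense"] = 1   # "note" -> a short written message
annotated[6]["sense"] = 2   # "bank" -> a financial institution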
We don't want to know that the baseball game happened in Springfield; we want to know the latitude and longitude coordinates with as much precision as is available, and what time it was in that specific time zone when the sporting competition took place. The computer wants decimal numbers, not words, so it can perform a variety of useful and interesting computations. In various subsequent passes, the machine could possibly identify groups of words that form a cohesive phrase. I can't recall what computer linguists call this, but it seems like a valuable step to me. If a common phrase could be flagged as "common phrase number 2487203023877383", then you could potentially delete a lot of the markup language that identifies each token, each part of speech, etc. This is stuff that the end user need not worry about. It can all happen behind the scenes. Once the markup of a freeform document has been completed, it is theoretically possible for a computer to possess a perfectly unambiguous rendering of the information contained within a document. The document is now what linguists call a CODEX. Every detail has been boiled down to its bare essence, and then codified into something that looks like simple math. Imagine if the English translation of a document was codified in this manner, and the Russian translation was codified using the identical process. It would now be possible to connect each cognate together, and create a Russian-English, English-Russian dictionary where definition 3 of the English word links to definition 5 of the Russian word. Now we are really getting somewhere.

Given enough computing horsepower, and enough documents, and enough time, eventually the machine could begin to learn from all the data it has processed and compiled. While this all sounded like a crazy pipe dream in the 1960s, it is now easy to conceive of a way to create an electronic dictionary that correlates each cognate between dozens of languages. You see, it's not that we need one giant supercomputer. It's that we need a giant cluster-computing environment, such as the World Wide Web. Machine translation could get pretty good if the machine can "learn" from each document it tries to parse. The LDS Church owns hundreds of books and other documents that have been meticulously translated into dozens of languages. Imagine if the machine were allowed to learn from all of those documents, and identify key phrases that get used over and over again. Granted, there would be a great many phrases in the LDS Codex that most people find odd, or that are dripping with religious overtones that are anathema to mainstream scientists. But after all, the Holy Bible was used as a Rosetta Stone by linguists long before the personal computer was invented.

Having taught myself the basics of computational linguistics, artificial intelligence, "localization", tokenization, POS tagging, and so forth, I thought I was pretty smart. I thought I had invented the future of computing. Then one day I contacted the founder of Wikipedia. He was quick to burst my bubble. "David, I have two words for you: Semantic Web." It turns out that Tim Berners-Lee had thought of all this stuff a long time ago. And in his role at the W3C (World Wide Web Consortium), he has been trying for more than a decade to coax and cajole web developers to help him build the semantic web. I had never heard the word "ontology" before I started reading about Mr. Berners-Lee and his Semantic Web initiative.
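Backing up to the Russian-English example for a moment, here is a minimal sketch of that cross-language sense linking, continuing the toy IDs from the earlier dictionary example; every pairing here is invented for illustration.

# Once two documents have been reduced to (word_id, sense_number) pairs,
# building a bilingual dictionary is just linking numbers to numbers.
# All IDs and pairings are invented for illustration.
EN_RU_SENSE_LINKS = {
    ("en:4871", 2): ("ru:9034", 1),   # English "bank" sense 2 <-> Russian "банк" sense 1
    ("en:5120", 1): ("ru:7711", 3),   # English "note" sense 1 <-> Russian "записка" sense 3
}

def russian_equivalent(word_id, sense):
    """Look up the Russian word sense linked to an English word sense, if any."""
    return EN_RU_SENSE_LINKS.get((word_id, sense))

print(russian_equivalent("en:4871", 2))   # -> ('ru:9034', 1)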
Berners-Lee has been pushing for people to adopt a markup language called "OWL", which stands (a little confusingly) for the Web Ontology Language. The W3C is the standards body that oversees the core standards of the web. But the problem Mr. Berners-Lee faces, as I understand it, is that his organization doesn't have any teeth. They can create the best platform on the planet, but if developers want to stick with what they know, there's nothing anybody can do to stop them. Web pages that were optimized for Netscape Navigator, or even Mosaic, are still out there floating around somewhere. And in many ways I am glad that the information has not been discarded for reasons that have only to do with formatting.

But here's my take on things. The W3C may not have teeth, but I know about a carrot that would motivate even the stubbornest donkey in the stable. It's called SEO: search engine optimization. Every corporation on the globe is spending money on finding ways to get higher placement on the most popular search engines. It makes good business sense to spend money on this. Imagine if one of the search engines were to hold a press conference tomorrow, announcing that they would award higher rankings to all web pages that comply with the W3C's OWL initiative. Do you think it would take a century for people to start reading up on OWL? Or do you think the web would change overnight? Depending on which search engine, I'm pretty sure it would happen overnight. But I don't have the ear of any bigwig over at Google. I'm just a guy who had a series of strange dreams. And as much as I wish I could create an Obit Generator that is OWL-compliant, for now I will just have to sit here and wish for better days. Peace.
Posted on: Mon, 09 Sep 2013 11:42:52 +0000
