Response to: Where's the Killer Semantic Web Application (Update #2)

As is often the case these days, it's much easier to drop a blog post than it is to make a simple comment in an "old media" style data space :-(

My use of "old media" implies: a place that still seeks subscriber data (no OpenID etc..), for the umpteenth time, as the toll fee for discourse development and participation on the Web.

Anyway, here is what I attempted to post as a comment to Dan Grigorovici's post titled: Where is the Semantic Web Killer App?

Dan,

An intriguing post to say the least :-)

"Linked Data" and "Semantic Web" aren't synonymous, they are simply connected, infrastructure DNA-wise. You can have "Semantic Web" style graphs (i.e RDF Data) and not have "Linked Data" as per Linked Data deployment tenets and best practices, a very important point.

I've stated repeatedly, the "Linked Data" emphasis has more to do with focusing on a point of crystallization within the larger "Semantic Web" vision, so here is a quick recap:

What is Linked Data?

A term coined by TimBL that describes an application of HTTP to the time-tested process of "Data Access by Reference". "Linked Data" adds vital items to the "Data Access by Reference" pattern that have been erstwhile unattainable:

  • The use of a Data Source Naming scoped to Database / Data Container Records as opposed to Tables, Views, Stored Procedures, Databases, and other Record Container tuple collections. Example: in ODBC / JDBC, a Data Source Name's scope stops at the Table / View level. In the Linked Data realm you get an added layer of granularity due to record level name scope
  • Incorporation of HTTP into the Data Source Naming scheme, which injects the expanse of the Web into the Data Access Range of the Data Source Name (i.e. a Named Record); so you can reference a record's description directly via HTTP which is simply a major deal (to put things mildly).

So we have HTTP based URIs as the Data Sources Names for a "Linked Data Web" i.e a Web of inter-connected Data Source Names that de-emphasize the importance of their host containers (Compound Documents / Information Resources).

The business case or value proposition of "Linked Data" is synonymous with the value proposition of data access technologies such as ODBC, JDBC. ADO.NET, OLE-DB, XMLA, and others (enterprise or consumer) in relation to the Individual and Enterprise pursuit of agility; in a realm where data is growing exponentially, and the maximum processing time in a single day remains 24 hrs. Data Access & Data Integration are timeless challenges due to the following constants:

  • Structured Data Schema Heterogeneity - we will always model the same things differently
  • Dirtiness of Data within Structured Data Containers - we are error prone due to laziness / sloppiness, time constraints, and the inherent limitation of our DNA based CPUs when dealing with large volumes of data.

Note: The line between the Enterprise & Individuals continue to blur by the second, this is something I covered during my Linked Data Planet keynote, which is like most things I put on the Web (via this blog data space), is a live and practical demonstration of the virtues of Linked Data courtesy of RDFa, the Bibliographic Ontology, and dereferencable URIs (i.e. HTTP based Data Source Names for Documents and the Entities they host).

Related

My Linked Data Planet Keynote (Updated with missing link)

I've finally found a second to drop a note about my keynote.

The keynote: Creating, Deploying, and Exploiting Linked Data, sought to achieve the fundamental goal of: Demystify the concept of "Linked Data" using anecdotal material that resonates with enterprise decision makers.

To my pleasure, 90% of the audience members confirmed familiarization with the "Data Source Name" concept of Open Database Connectivity (ODBC). Thus, all I had to do was map "Linked Data" to ODBC, and then unveil the fundamental add-ons that "Linked Data" delivers:

  • The ability to give database records names (Identifiers)
  • The use of HTTP in the database record naming mechanism - which expands a named database record's reference scope via the expanse of the Web (i.e HTTP based Identifiers called URIs).

I believe a majority of attendees came to realize that the combination above injects a new Web interaction dynamic: access to "Subject matter Concepts" and Named Entities contained within a page via HTTP base Data Source Names (URIs).

BTW - My presentation is a Linked Data Space in it's own right courtesy of the Bibliographic Ontology (which provides slide show modeling) and RDFa that allows me to embed annotations into my Slidy based presentation :-)

Related

Missing Bits from semanticweb.com Interview

Yikes! I've just discovered that the final part of the semanticweb.com's interview with Jim Hendler and I, includes critical paragraphs that omit my example links :-( As you can imagine, this is a quite excruciating, bearing in mind that "Literals" are of marginal value in a Linked Data world.

Anyway, thanks to the Blogosphere, I can attempt to fix this problem myself -- via this post :-)

Q. If you wanted to provide a bewildered but still curious novice a public example of Linked Data at work in their everyday life, what would it be?

Kingsley Idehen: Any one of the following:

My Linking Open Data community Profile Page - the Linked Data integration is exposed via the "Explore Data" Tab My Linked Data Space - viewed via OpenLink's AJAR (Asynchronous Javascript and RDF) based Linked Data Brower My Events Calendar Tag Cloud - a Linked Data view of my Calendar Space using an RDF-aware browser In all cases, you have the ability to explore my data spaces by simply clicking on the links, which on the surface appear to be standard hypertext links, although in reality you are dealing with hyperdata links (i.e., links to entities that result in the generation of entity description pages that expose entity properties via hyperdata links). Thus, you have a single page that describes me in a very rich way since it encompasses all data associated with me, covering: personal profile, blog posts, bookmarks, tag clouds, social networks etc.

Q. What would you show the CEO or CTO of a company outside the tech industry?

Kingsley Idehen: A link to the Entity ALFKI, from the popular Northwind Database associated with Microsoft Access and SQL Server database installations. This particular link exposes a typical enterprise data space (orders, customers, employees, suppliers ...) in a single page. The hyperdata links represent intricate data relationships common to most business systems that will ultimately seek to repurpose existing legacy data sources and SOA services as Linked Data. Alternatively, I would show the same links via the Zitgist Data Viewer (another Linked Data-aware browser). In both cases, I am exploiting direct access to entities via HTTP due to the protocols incorporation into the Data Source Naming scheme.

Linked Data in Action: Library of Congress

As I start my countdown to the upcoming Linked Data Planet conference, here is the first of a series of posts geared towards showcasing practical use of the burgeoning Linked Data Web.

First up, the Library of Congress, take a look at the following pages which are "Human" and machine based "User Agent" friendly:

Key point: The pages above are served up in line with Linked Data deployment and publishing tenets espoused by the Linking Open Data Community (LOD) which include (in my preferred terminology):

  • Giving "Names" to things you observe (aka Data Source Names or "DSNs" for short)
  • Use HTTP URLs in your data source naming scheme so that "access by reference" to your data sources exploits the expanse of the HTTP driven Web i.e make your DSNs "Linked Data Source Names" (LDNS)
  • Remember that Documents / Pages are compound in nature, and they aren't the only data sources we would want to name; a document's LDSN must be distinct from the LDSNs used for the subject matter concepts and/or named entities associated with a document
  • Use the RDF Data Model to express structure within your data source(s)
  • Use LDSNs when constructing statements/claims/assertions/records (triples) inside your structured data sources
  • When publishing Web Pages related to your data sources; use at least one of the following to methods to guide user agents to data sources associated with your published page; the HTML LINK tag, RDFa, GRDDL, or Content Negotiation.

The items above are features that users and decision makers should start to hone into when seeking, and evaluating, platforms that facilitate cost-effective exploitation of the Linked Data Web.

Reasoning Matters Contd

I just stumbled across a post titled: Why Reasoning Matters: Consistency Checking from Clark and Parsia

As you can see from my recent post about how we've started the process of inoculating DBpedia against the potential dangers of "contextual incoherence", we are entering a newer era in the Semantic Web's evolution. My post and the one from Clark & Parsia both touch different aspects of the "Data Dictionary" for the Semantic Web issue.

Note: in my universe of discourse, a Data Dictionary manifests when the constraints and class hierarchies defined in an ontology (e.g. a web accessible shared ontology) are functionally bound to a data manager. Interestingly the binding can take the following forms:

  • Engine Hosted - which is what you get with Virtuoso's in-built Inference Engine
  • External - which is what you get when the Inference Engine is a distinct component from the data manager (example: Owlgres which can sit in front of 3rd party SPARQL endpoints via ARQ)

The classification terminology I use above is very much off-the-cuff, its sole purpose is architectural distinction.

Anyway, it's really nice to see that we are entering an era re. the Semantic Web vision, where the virtues of reasoning are getting simpler to demonstrate and articulate.

In a nutshell, the point-point data integration era is coming to an end! The era of intelligent ontology based enterprise data integration is nigh!

Of course, there is much more to come on the practical utility front, so stay tuned as we work our way through the DBpedia inoculation program.

DBpedia receives shot #1 of CLASSiness vaccine

The current live instance of DBpedia has just received dose #1 of a series of planned "Context" oriented booster shots. These shots seek to to protect DBpedia from contextual incoherence as it grows in data set expanse and popularity. Dose #1 (vaccine label: Yago) equips DBpedia with a functional (albeit non exclusive) Data Dictionary component courtesy of the Yago Class Hierarchy .

When the DBpedia & Yago integration took place last year (around WWW2007, Banff) there was a little, but costly omission that occurred: nobody sought to load the Yago Class Hierarchy into the Virtuoso's Inference Engine :-(

Anyway, the Class Hierarchy has now been loaded into the Virtuoso's inference engine (as Virtuoso Inference Rules) and the following queries are now feasible using the live Virtuoso based DBpedia instance hosted by OpenLink Software:

define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?s
FROM <http://dbpedia.org>
WHERE {
?s a yago:Publication106589574 .
?s dbpedia:name "The Lord of the Rings"@en .
}

-- Variant of query with Virtuoso's Full Text Index extension: bif:contains

define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?s ?n
FROM <http://dbpedia.org>
WHERE {
?s a yago:Publication106589574 .
?s dbpedia:name ?n .
?n bif:contains 'Lord and Rings'
}

-- Retrieve all individuals instances of the Book Class
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?s ?n
FROM <http://dbpedia.org>
WHERE {
?s a yago:Book106410904 .
?s dbpedia:name ?n .
}

-- Retrieve all individuals instances of Publication Class which should include all Books.
define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?s ?n
FROM <http://dbpedia.org>
WHERE {
?s a yago:Publication106589574 .
?s dbpedia:name ?n .
}

Note: you can also move the inference pragmas to the Virtuoso Sever side i.e place the inference rules in a server instance config file, thereby negating the need to place "define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'" pragmas directly in your SPARQL queries.

Related

State of the Semantic Web Presentation

Unfortunately a number of Linking Open Data (LOD) community / Linked Data tribe members (myself included) aren't at the Semantic Web Technologies conference in San Jose (we are in a busy period for Semantic Web Technology related Conferences). But all isn't lost as Ivan Herman (W3C Semantic Web Activity Lead) , LOD member, and SWEO colleague has carried the banner with aplomb.

Ivan's presentation titled: State of the Semantic Web, is a must view for those who need a quick update on where things are re. the Semantic Web in general.

I also liked the fact that in proper "Lead by example" manner, his presentation isn't PDF or PPT based, it's a Web Document :-)

Hint: as per usual, this post contains a Linked Data demo nugget. This time around, it's in the form of a shared calendar covering a large number of Semantic Web Technology events. All I had to do was subscribe to a number of WebDAV accessible iCal files from my Calendar Data Space and the platform did the rest i.e. produce Linked Data Objects for events associated with a plethora of conferences.

If you assimilate Ivan's presentation properly, you will note I've just generated, and shared, a large number of URIs covering a range of conference events. Thus, you can extend my contributions (thereby enriching the GGG) by simply associating additional data from your Linked Data Space with mine. All you have to do is use my calendar data objects URIs in your statements.

Context, Tagging, Semantic Web, and Linked Data (Updated)

Courtesy of Nova Spivack's post titled: Tagging and the Semantic Web: Tags as Objects, I stumbled across a related post by John Clarke titled: Tagging and the Semantic Web. Both of these posts use the common practice of tagging to shed light on the increasing realization that "The Pursuit of Context" is the fusion point between the current Web and its evolution into a structured Web of Linked Data.

How Semantic Tagging Works (from a 1000 feet)

When tagging a document, the semantic tagging service passes the content of a target document through a processing pipeline (a distillation process of sorts) that results in automagic extraction of the following:

Once the extraction phase is completed, a user is presented with a list of "suggested tags" using a variety of user interaction techniques. The literal values of elected Tags are then associated with one or more Tag and Tag Meaning Data Objects, with each Object type endowed with a unique Identifier.

Issues to Note

Broad acceptance that: "Context is king", is gradually taking shape. That said, "Context" landlocked within Literal values offers little over what we have right now (e.g. at Del.icio.us or Technorati), long term. By this I mean: if the end product of semantically enhanced tagging leaves us with: Literal Tag values only, Tags associated with Tag Data Objects endowed with platform specific Identifiers, or Tag Data Objects with any other Identity scheme that excludes HTTP, the ability of Web users to discern or derive multiple perspectives from the base Context (exposed by semantically enhanced Tags) will be lost, or severely impeded at best.

The shape, form, and quality of the lookup substrate that underlies semantic tagging services, ultimately affects "context fidelity" matters such as Entity Disambiguation. The importance of quality lookup infrastructure on the burgeoning Linked Data Web is the reason why OpenLink Software is intimately involved with the DBpedia and UMBEL projects.

Conclusions

I am immensely happy to see that the Web 2.0 and Semantic Web communities are beginning to coalesce around the issue of "Context". This was the case at the WWW2008 Linked Data Workshop, I am feeling a similar vibe emerging from the Semantic Web Technologies conference currently nearing completion in San Jose. Of course, I will be talking about, and demonstrating practical utility of all of this, at the upcoming Linked Data Planet conference.

Related

ODBC & WODBC Comparison

ODBC delivers open data access (by reference) to a broad range of enterprise databases via a 'C' based API. Thanks to the iODBC and unixODBC projects, ODBC is available across broad range of platforms beyond Windows.

ODBC identifies data sources using Data Source Names (DSNs).

WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme: URI or IRI, is HTTP based thereby enabling data access by reference via the Web.

ODBC DSNs bind ODBC client applications to Tables, Views, Stored Procedures.

WODBC DSNs bind you to a Data Space (e.g. my FOAF based Profile Page where you can use the "Explore Data Tab" to look around if you are a human visitor) or a specific Entity within a Data Space (i.e Person Entity Me).

ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can chose not to release an API, or do so selectivity, for competitive advantage or market disruption purposes (it's happened!).

WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness, you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!

So we have come full circle (or cycle), the Web is becoming more of a structured database everyday! What's new is old, and what's old is new!

Data Access is everything, without "Data" there is no information or knowledge. Without "Data" there's not notion of vitality, purpose, or value.

URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.

I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their respective programming languages and frameworks specificity. None of these mechanisms match the platform availability breadth of ODBC.

The Web as a true M-V-C pattern is now crystalizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and "Semantic Web" vision.

By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)

Commercializing the Semantic Web


Unfortunately, I could only spend 4 days at the recent WWW2008 event in Beijing (I departed the morning following the Linked Data Workshop), so I couldn't take my slot on the "Commercializing the Semantic Web panel" etc.. Anyway, thanks to the Web I can still inject my points of view in the broad Web based discourse. Well so I hoped, when I attempted to post a comment to Paul Miller's ZDNet domain hosted blog thread titled: Commercialising the Semantic Web.

Unfortunately, the cost of completing ZDNet's unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I'll settle for a trackback ping instead.

What follows is the cut and paste of my intended comment contributions to Paul's post.

Paul,

As discussed earlier this week during our podcast session, commercialization of Semantic Web technology shouldn't be a mercurial matter at this stage in the game :-) It's all about looking at how it provides value :-)

From the Linked Data angle, the ability to produce, dispatch, and exploit "Context" across an array of "Perspectives" from a plethora of disparate data sources on the Web and/or behind corporate firewalls, offers immense commercial value.

Yahoo's Searchmonkey effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as "value consumption tickets" (Data Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to data encountered on the Web. Yahoo! is about to put this light on in a big way (imho).

The "self annotating" nature of the Web is what ultimately drives the manifestation of the long awaited Semantic Web. I believe I postulated about "Self Annotation & the Semantic Web" in a number of prior posts which, by the way, should be DataRSS compatible right now due to Yahoo's support of OpenSearch Data Providers (which this Blog Space has been for eons).

Today, have many communities adding strucuture to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, Tag, Weblog, Shared Bookmark, Wikiword, Microformat, Microformat++ (eRDF or RDFa), GRDDL stylesheet, and RDFizer etc.. is a piece of structured data.

Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)

Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data in a container (information resource), and then you add Structure to the information resource (RSS, Atom, microformats, RDFa, eRDF, SIOC, FOAF, etc.), once you have Structure RDFization (i.e. transformation to Linked Data) is a synch thanks to RDF Middleware (as per earlier RDF middleware posts).