
Clearing Up RDF misrepresentation once again!

Daniel Lewis has penned a post titled: Clearing up some misconceptions..again, in response to Ben Werdmuller's post titled: Introducing the Open Data Definition.

The great thing about the Linked Data Web is that it's much easier to discover and respond to these points of view before the ink dries :-) Ben certainly needs to take a look at the Semantic Web FAQ, either before or after digesting Daniel's response.

Linked Data enters a state of Evoluation

During a brief chat with Michael Hausenblas about LForum, a new Linked Data project he is championing, I made a Freudian slip: I typed "Evoluation" when I meant "Evolution". We had a chuckle and realized we were on to something, so I proceeded to formalize the definition:

Evoluation is evolution devoid of the randomness of mutation. A state of being in which it is possible to evaluate and choose evolutionary paths.

Evoluation actually describes where we are today in relation to the World Wide Web: to the Linking Open Data (LOD) community, it's taking the path towards becoming a Giant Global Graph of Linked Data; to the Web 2.0 community, it's simply a collection of Web Services and associated APIs; and to many others, it remains an opaque collection of interlinked documents.

The great thing about the Web is that it allows netizens to explore a plethora of paths without adversely affecting the paths of others. That said, controlling one's path may take mutation out of evolution, but we are still left with the requirement to adapt and eventually survive in a competitive environment. Thus, although we can evaluate and choose from the many paths the Web's evolution offers us, the path that delivers the most benefits ultimately dominates. :-)

Linked Data Trip Report - Part 1 (Update 2)

Typo cleansed edition :-)

Objectives

  • Meet LOD Community Members
  • Participate in Workshop

Meeting LOD Community Members

Although the Web continues to shrink the planet by removing the restrictions of geographic location, meeting people face-to-face remains invaluable (*priceless, in Mastercard ad speak*). Naturally, meeting and chatting with as many LOD community members as possible was high on my agenda.

Participate in Workshop

As one of the co-chairs of the Linking Open Data Workshop (LODW), I had a 5-minute workshop opening slot during which I spoke about the following:

Where we are today:

We have DBpedia as a major hub of the burgeoning Linked Data Web. When OpenLink offered to host DBpedia (a combination of Virtuoso DBMS software and sizable backend hardware infrastructure), it did so knowing that such an effort would emphatically address the "chicken and egg" conundrum that, prior to this undertaking, had stifled the ability to demonstrate the practical utility of HTTP-based Linked Data.

Today, the Linked Data bootstrap mission has been accomplished.

Where we go next:

Although DBpedia is a hub (ground zero of Linked Data), we have to put it into perspective in relation to a new set of needs and expectations moving forward. Today, DBpedia is a Sun at the heart of a Solar System within the Linked Data Galaxy. But unlike Space as we know it, Cyberspace allows connectivity and collaboration across Solar Systems: life exists elsewhere, and we are part of a collaborative collective unimpeded by the constraints of space travel. Thus, expect to see the emergence of other Solar Systems accessible to DBpedia and its collection of planets (see the LOD diagram). Examples underway include UMBEL, which will serve the Linked Data planets from OpenCyc (Subject Matter Concepts), Yago (Named Entities), and Bio2RDF (a powerful bioinformatics Linked Data planet).

I urged the community to veer more aggressively towards developing and demonstrating practical Linked Data-driven solutions aligned to well-known problems. Of course, I encouraged all presenters to make this an integral part of their presentations :-)

Workshop Summary:

The workshop was well attended and I found all the presentations engaging and full of enthusiasm.

As the sessions progressed, it became clear from a number of the accompanying Q&A sessions that a new Linked Data exploitation frontier is emerging. The frontier in question takes the form of a Linked Data substrate capable of addressing the taxonomic needs of solutions aimed at automated Named Entity Extraction, Disambiguation, and Subject Matter Concept alignment, transparently integrated with existing Web Content. Thus, we are moving beyond the minting and deployment of dereferenceable URIs and RDF data sets to automagically associating existing Web Content with Named Entities (People, Organizations, Places, Events, etc.) and Subject Matter Concepts (Politics, Music, Sports, and others), while remaining true to the Linking Open Data community creed, i.e., ensuring the Named Entity and Subject Matter Concept URIs are available to user agents or users seeking to produce alternative data views (i.e., mesh-ups).
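To make "associating existing Web Content with Named Entities" concrete, here is a deliberately naive sketch, assuming a small hand-built gazetteer that maps surface names to DBpedia-style entity URIs. The gazetteer entries and the function name `link_entities` are hypothetical; it does plain substring matching with no real disambiguation, which is exactly the hard part the workshop discussions were about.

```python
# Hypothetical gazetteer: surface names -> DBpedia-style entity URIs.
# Entries are illustrative only; a real substrate would be far larger.
GAZETTEER = {
    "Berlin": "http://dbpedia.org/resource/Berlin",
    "Tim Berners-Lee": "http://dbpedia.org/resource/Tim_Berners-Lee",
    "World Wide Web Consortium": "http://dbpedia.org/resource/World_Wide_Web_Consortium",
}


def link_entities(text: str, gazetteer: dict[str, str]) -> list[tuple[str, str]]:
    """Return (surface name, URI) pairs found in text, in order of appearance.

    Plain substring matching: longer names are tried first so a long name
    claims its characters before any shorter name it contains. No word
    boundaries and no disambiguation are applied.
    """
    found = []
    consumed = [False] * len(text)
    for name in sorted(gazetteer, key=len, reverse=True):
        start = text.find(name)
        while start != -1:
            end = start + len(name)
            if not any(consumed[start:end]):
                found.append((start, name, gazetteer[name]))
                for i in range(start, end):
                    consumed[i] = True
            start = text.find(name, end)
    return [(name, uri) for _, name, uri in sorted(found)]
```

For example, `link_entities("Tim Berners-Lee spoke in Berlin.", GAZETTEER)` pairs each mention with its URI, which a user agent could then use to build an alternative view of the same text.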

I will get to part 2 of this report once the actual workshop session slides go live (*these are different from the pre-event PDF links*).

Linked Data Illustrated and a Virtuoso Functionality Reminder

Daniel Lewis has put together a nice collection of Linked Data related posts that illustrate the fundamentals of the Linked Data Web and the vital role that Virtuoso plays as a deployment platform. Remember, Virtuoso was architected in 1998 (see Virtuoso History) in anticipation of the eventual Internet, Intranet, and Extranet level requirements for a different kind of Server. At the time of Virtuoso's inception, many thought our desire to build a multi-protocol, multi-model, and multi-purpose, virtual and native data server was sheer craziness, but we pressed on (courtesy of our vision and technical capabilities). Today, we have a very sophisticated Universal Server Platform (in Open Source and Commercial forms) that is naturally equipped to do the following via very simple interfaces:
    - Provide highly scalable RDF Data Management via a Quad Store (DBpedia is an example of a live demonstration)
    - Powerful WebDAV innovations that simplify read-write mode interaction with Linked Data
    - More...
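A quad store keeps each RDF triple together with the named graph it belongs to. The sketch below is a toy, unindexed, in-memory illustration of that data model only; the class name `ToyQuadStore` is hypothetical and says nothing about how Virtuoso actually stores, indexes, or scales quads.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One RDF statement plus the named graph that contains it."""
    graph: str
    subject: str
    predicate: str
    obj: str


class ToyQuadStore:
    """Unindexed, in-memory quad store; None acts as a wildcard in match()."""

    def __init__(self) -> None:
        self._quads: set[Quad] = set()

    def add(self, graph: str, subject: str, predicate: str, obj: str) -> None:
        self._quads.add(Quad(graph, subject, predicate, obj))

    def match(self, graph: Optional[str] = None, subject: Optional[str] = None,
              predicate: Optional[str] = None, obj: Optional[str] = None) -> list[Quad]:
        """Return all quads agreeing with every non-None position, sorted."""
        pattern = (graph, subject, predicate, obj)
        return sorted(
            q for q in self._quads
            if all(want is None or want == have for want, have in zip(pattern, q))
        )
```

Because the graph is part of every statement, the same triple can be asserted in two graphs (for instance, by two different data sources) and still be told apart by querying with `graph=...`.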

Linked Data enabling PHP Applications

Daniel Lewis has penned a variation of my post about Linked Data enabling PHP applications such as WordPress, phpBB3, MediaWiki, etc.

Daniel simplifies my post by using diagrams to depict the different paths by which PHP-based applications can expose Linked Data, especially those applications that already provide a significant amount of the content that drives Web 2.0.

If all the content in Web 2.0 information resources is distillable into discrete data objects endowed with HTTP-based IDs (URIs), with zero "RDF handcrafting tax", what do we end up with? A Giant Global Graph of Linked Data: the Web as a Database.

So, what used to apply exclusively within enterprise settings (re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress OpenEdge, Firebird, and others) now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
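Accessing a record "by reference" via HTTP amounts to dereferencing its URI while telling the server which representation you want. The sketch below only builds such a request with Python's standard library and does not send it; the specific Accept preference list and the function name `rdf_request` are assumptions for illustration.

```python
import urllib.request


def rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that prefers RDF over HTML.

    The Accept header ranks RDF/XML first, Turtle second, and HTML last,
    so a Linked Data server can return machine-readable data for the
    same URI a browser would render as a Web page.
    """
    return urllib.request.Request(
        uri,
        headers={"Accept": "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.1"},
        method="GET",
    )
```

Sending it is then a matter of `urllib.request.urlopen(rdf_request("http://dbpedia.org/resource/Berlin"))`; following any redirects to the actual data document is left to `urlopen`.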

As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.

In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity (ODBC).

In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services-based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), a Query Language (the SPARQL Query Language), and a Wire Protocol (the HTTP-based SPARQL Protocol), delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, or ASP.NET developer is the enterprise 4GL developer of yore, without the enterprise confinement. We could even be talking about 5GL development once Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from (solely) Closed World to a mesh of Closed and Open World view schemas.
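The "wire protocol" half of this analogy is simple in practice: under the SPARQL Protocol, a query can be sent as an HTTP GET with the query text in a `query` parameter, optionally scoped with `default-graph-uri`. The sketch below just assembles that URL; the endpoint `http://dbpedia.org/sparql` is used as an example, and the helper name `sparql_get_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def sparql_get_url(endpoint: str, query: str, default_graph: Optional[str] = None) -> str:
    """Build a SPARQL Protocol GET URL: the query text goes in `query`,
    and an optional dataset restriction goes in `default-graph-uri`."""
    params = [("query", query)]
    if default_graph is not None:
        params.append(("default-graph-uri", default_graph))
    # urlencode percent-encodes '?', '{', '<', etc. and turns spaces into '+'.
    return endpoint + "?" + urlencode(params)
```

Usage: `sparql_get_url("http://dbpedia.org/sparql", "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Berlin> ?p ?o } LIMIT 10")` yields a URL any HTTP client can fetch, much as an ODBC client hands SQL text to a driver.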

Adding WordPress Blogs into the Linked Data Web using Virtuoso

WordPress is a Weblog platform comprised of the following:

  1. User Interface - PHP
  2. Application Logic - PHP
  3. Data Storage (SQL RDBMS) - MySQL via PHP-MySQL
  4. Application Server - Apache

In the form above (the norm), WordPress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.

Another route to Linked Data exposure is via Virtuoso's Meta Schema Language for producing RDF Views over ODBC/JDBC-accessible data sources, which enables the following setup:

  1. User Interface - PHP
  2. Application Logic - PHP
  3. Data Storage (SQL RDBMS) - MySQL via the PHP-MySQL data access interface
  4. Virtual Database linkage of MySQL Tables into Virtuoso
  5. RDF View generated over the Virtual SQL Tables
  6. Application Server - Virtuoso which provides Linked Data Deployment such that RDF Linked Data is exposed when requested by Web User Agents.
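"RDF Linked Data is exposed when requested by Web User Agents" boils down to content negotiation: the same URI yields HTML for a browser and RDF for a Linked Data client. Virtuoso handles this through its own server configuration; the sketch below only illustrates the decision, assuming exactly two representations and a simplified Accept-header parser (wildcards such as `*/*` are ignored). The function name `choose_representation` is hypothetical.

```python
RDF_TYPES = ("application/rdf+xml", "text/turtle")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def choose_representation(accept: str) -> str:
    """Return 'rdf' if the Accept header ranks an RDF type above HTML, else 'html'."""
    best_rdf, best_html = 0.0, 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0  # default quality when no q= parameter is present
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        if media in RDF_TYPES:
            best_rdf = max(best_rdf, q)
        elif media in HTML_TYPES:
            best_html = max(best_html, q)
    # Ties and empty headers fall back to HTML, the browser-friendly default.
    return "rdf" if best_rdf > best_html else "html"
```

Defaulting to HTML on ties keeps ordinary browsers unaffected, while clients that explicitly ask for RDF get the RDF View of the same records.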

Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:

  1. User Interface - PHP
  2. Application Logic - PHP
  3. Data Storage (SQL RDBMS) - Virtuoso via PHP-ODBC data access interface (* ODBC is Virtuoso's native SQL CLI/API *)
  4. RDF View generated over the Native SQL Tables
  5. Application Server - Virtuoso which provides Linked Data Deployment such that RDF Linked Data is exposed when requested by Web User Agents (e.g. OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, and Tabulator).

Benefits?

  • Each user account gets a proper Linked Data URI (ID) that can be meshed/smushed with other IDs (so you can add data from this new blog space to other Linked Data sources associated with your other URIs/IDs)
  • Each post gets a proper URI
  • All data is now query-able via SPARQL
  • Discoverability increases exponentially (without a drop in relevance in either direction, i.e., discovering or being discovered)
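"Smushing" IDs usually means treating URIs linked by `owl:sameAs` as denoting the same thing. A minimal sketch, assuming the sameAs links have already been extracted as URI pairs, groups them with a union-find structure; the function name `smush` and all example URIs are hypothetical.

```python
from typing import Iterable


def smush(same_as_pairs: Iterable[tuple[str, str]]) -> list[list[str]]:
    """Group URIs connected (directly or transitively) by sameAs links.

    Uses union-find with path halving; returns each group as a sorted
    list, and the groups themselves in sorted order.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic root choice

    groups: dict[str, set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())
```

Because sameAs is transitive, a blog account URI linked to a FOAF URI that is in turn linked to a third profile ends up in one group, so data attached to any of them can be merged.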

How Do I map the WordPress SQL Schema to RDF using Virtuoso?

  • Determine the RDF Schema or Ontologies that define the Classes for which you will be producing instance data (e.g. SIOC and FOAF)
  • Declare URI/IRI generator functions (*special Virtuoso functions*)
  • Use SPARQL graph patterns to apply URI/IRI generator functions to Tables, Views, Table-valued Stored Procedures, and Query Resultsets as part of the RDBMS-to-RDF mapping
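The three steps above can be mirrored in plain Python to show what the mapping produces. This is not Virtuoso's Meta Schema Language, just a sketch under stated assumptions: the host `http://blog.example.com` and the URI layout are hypothetical, the rows are shaped like WordPress `wp_users` and `wp_posts` records (`ID`, `display_name`, `post_author`, `post_title`), and SIOC plus Dublin Core terms are used as the target vocabulary.

```python
# Step 1: choose vocabularies (namespace IRIs for SIOC, Dublin Core terms, RDF).
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://blog.example.com"  # hypothetical blog host


# Step 2: IRI generator functions turn primary keys into stable URIs.
def user_iri(user_id: int) -> str:
    return f"{BASE}/user/{user_id}#this"


def post_iri(post_id: int) -> str:
    return f"{BASE}/post/{post_id}#this"


# Step 3: apply the generators to table rows to emit instance triples.
def map_rows(users: list[dict], posts: list[dict]) -> list[tuple[str, str, str]]:
    triples = []
    for u in users:
        s = user_iri(u["ID"])
        triples.append((s, RDF_TYPE, SIOC + "UserAccount"))
        triples.append((s, SIOC + "name", u["display_name"]))
    for p in posts:
        s = post_iri(p["ID"])
        triples.append((s, RDF_TYPE, SIOC + "Post"))
        triples.append((s, DCTERMS + "title", p["post_title"]))
        # The foreign key post_author becomes a link between two URIs.
        triples.append((s, SIOC + "has_creator", user_iri(p["post_author"])))
    return triples
```

The design point is that the IRI generators are the only place URIs are minted, so every table that references a user ID yields links to the same user URI.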

Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:

Live Demos?