FOAF-ing Linked Data is quite SIOC-ing

The title of this post is a "Tongue in cheek" expression of euphoria now that I have FOAF and SIOC (pronounced SHOCK) based data spaces exposed via my FOAF and my SIOC information resource (RDF files) URIs.

If you want to explore who I know, what I read, and what I've tagged (amongst other things), all you have to do is:

  1. Beam a SPARQL query down my data space URIs which expose FOAF or SIOC based interconnected Linked Data graphs.
  2. Walk through the data using an RDF Browser until you reach a beachhead, then beam your SPARQL from there (remember, you only need the URI of the RDF Data Source, and in my Data Space every data item has a proper URI).

Some Tools that help you comprehend what I am saying:

Browsers

    Zitgist Data Viewer (SIOC and FOAF data spaces)
    OpenLink RDF Browser (SIOC and FOAF data spaces)

Query Tools

SPARQL based RDF Store Benchmarks via DBpedia

Christian Becker has delivered the final cut of an initial iteration of his DBpedia based RDF Data Stores benchmark. This particular exercise brought some very interesting things to our attention re. Virtuoso's default mode of operation:

  1. Virtuoso is a Quad Store in a Triple Store world -- it supports RDF data set storage partitioning via Named Graphs, and it requires the SPARQL FROM clause to scope query patterns to the appropriate data sets. Otherwise, it looks across all hosted data sets for matching patterns.
  2. We should be able to use our server side configuration settings to make the Quad Store behave like a Triple Store (meaning we set the list of applicable named graphs as part of the session configuration).
  3. We should provide hints to users about missing POGS, PSOG, and SOPG bitmap indexes when SPARQL query patterns received by the server are deemed suboptimal (we do know the execution costs of each query).
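
To make the first two points concrete, here is a minimal Python sketch of how a quad store scopes pattern matching. It is a toy in-memory model, not Virtuoso's implementation; the graph URIs, the `ex:` names, and the `match` helper are all made up for illustration.

```python
# Toy quads: (graph, subject, predicate, object). All names are hypothetical.
QUADS = [
    ("http://example.org/graph/a", "ex:alice", "foaf:knows", "ex:bob"),
    ("http://example.org/graph/b", "ex:alice", "foaf:knows", "ex:carol"),
    ("http://example.org/graph/b", "ex:bob", "foaf:name", "Bob"),
]


def match(quads, s=None, p=None, o=None, from_graphs=None):
    """Return (s, p, o) triples matching the pattern; None is a wildcard.

    from_graphs=None mimics a query with no FROM clause: every hosted
    graph is searched. Passing a set of graph URIs mimics FROM scoping.
    """
    hits = []
    for g, qs, qp, qo in quads:
        if from_graphs is not None and g not in from_graphs:
            continue  # quad lives in a graph outside the query's scope
        if (s is None or s == qs) and (p is None or p == qp) and (o is None or o == qo):
            hits.append((qs, qp, qo))
    return hits


# No scoping: matches are drawn from both graphs.
everywhere = match(QUADS, p="foaf:knows")
# Scoped to one graph, which is how a Triple Store style query behaves.
scoped = match(QUADS, p="foaf:knows", from_graphs={"http://example.org/graph/a"})
```

Setting a fixed `from_graphs` once per session, rather than per query, is roughly what point 2 describes.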

How Do I create the missing Bitmap Indexes?

Go to the HTML based Virtuoso Conductor, iSQL command line interface, or an ODBC / JDBC / ADO.NET / OLE DB client and execute:

CREATE BITMAP INDEX RDF_QUAD_POGS ON DB.DBA.RDF_QUAD (P,O,G,S);
CREATE BITMAP INDEX RDF_QUAD_PSOG ON DB.DBA.RDF_QUAD (P,S,O,G);
CREATE BITMAP INDEX RDF_QUAD_SOPG ON DB.DBA.RDF_QUAD (S,O,P,G);
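
Why does the column order of these indexes matter? A rough way to see it: an index can only narrow a lookup by its leading columns, so a query pattern is best served by the index whose prefix it binds. The Python sketch below is a toy model of that idea under that assumption; it is not Virtuoso's cost-based optimizer, and the `usable_prefix` and `best_index` helpers are hypothetical.

```python
# Column orders of the three bitmap indexes created above.
INDEXES = {
    "RDF_QUAD_POGS": "POGS",
    "RDF_QUAD_PSOG": "PSOG",
    "RDF_QUAD_SOPG": "SOPG",
}


def usable_prefix(order, bound):
    """Count the leading index columns that the query pattern binds."""
    n = 0
    for col in order:
        if col not in bound:
            break
        n += 1
    return n


def best_index(bound):
    """Pick the index with the longest bound prefix.

    Ties go to the alphabetically first index name, because max() keeps
    the first maximal element of the sorted sequence.
    """
    scored = [(name, usable_prefix(order, bound)) for name, order in sorted(INDEXES.items())]
    return max(scored, key=lambda pair: pair[1])
```

For example, a pattern such as `?s foaf:knows <someone>` binds P and O, so `best_index({"P", "O"})` picks the POGS index in this model.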

Related

Semantic Data Web Epiphanies: One Node at a Time

In 2006, I stumbled across Jason Kolb (online) via a 4-part series of posts titled: Reinventing the Internet. At the time, I realized that Jason was postulating about what is popularly known today as "Data Portability", so I made contact with him (blogosphere style) via a post of my own titled: Data Spaces, Internet Reinvention, and the Semantic Web. Naturally, I tried to unveil to Jason the connection between his vision and the essence of the Semantic Web. Of course, he was skeptical :-)

Jason recently moved to Massachusetts, which led me to ping him about our earlier blogosphere encounter and the emergence of a Data Portability Community. I also informed him that TimBL, a number of other Semantic Web technology enthusiasts, and I frequently meet on the 2nd Tuesday of each month at the MIT hosted Cambridge Semantic Web Gatherings to discuss, demonstrate, and debate all aspects of the Semantic Web. Luckily (for both of us), Jason attended the last event, and we got to meet each other in person.

Following our face to face meeting in Cambridge, a number of follow-on conversations ensued covering Linked Data and practical applications of the Semantic Web vision. Jason writes about our exchanges in a recent post titled: The Semantic Web. His passion for Data Portability enabled me to use OpenID and FOAF integration to connect the Semantic Web and Data Portability via the Linked Data concept.

During our conversations, Jason also alluded to the fact that he had already encountered OpenLink Software while working with our ODBC Drivers (part of our UDA product family) for IBM Informix (Single-Tier or Multi-Tier Editions) a few years ago (interesting random connection).

As I've stated in the past, I've always felt that the Semantic Web vision will materialize by way of a global epiphany. The countdown to this inevitable event started, ironically, at the birth of the blogosphere, and accelerated more recently through the emergence of Web 2.0 and Social Networking, even more ironically :-)

The blogosphere started the process of Data Space coalescence via RSS/Atom based semi-structured data enclaves; Web 2.0 propagated Web Service usage en route to creating service provider controlled data and information silos; and Social Networking brought attention to the fact that User Generated Data wasn't actually owned or controlled by the Data Creators.

The emergence of "Data Portability" has created a palatable moniker for a clearly defined, and slightly easier to understand, problem: the meshing of Data and Identity in cyberspace i.e. individual points of presence in cyberspace, in the form of "Personal Data Spaces in the Clouds" (think: doing really powerful stuff with .name domains). In a sense, this is the critical inflection point between the document centric "Web of Linked Documents" and the data centric "Web of Linked Data". There is absolutely no other way to solve this problem in a manner that alleviates the imminent challenges presented by information overload -- resulting from the exponential growth of user generated data across the Internet and enterprise Intranets.

W3C's SPARQLing Data Access Ingenuity

The W3C officially unveiled the SPARQL Query Language today via a press release titled: W3C Opens Data on the Web with SPARQL.

What is SPARQL?

A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL, for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.

It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.

In addition, it's also a Query Results Serialization format that includes XML and JSON support.
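
As a small illustration of the results serialization side, here is a Python sketch that flattens a result set in the SPARQL JSON results layout (a `head` listing the variables and `results.bindings` holding one object per solution). The sample document and the `rows` helper are made up for this example; no real endpoint was queried.

```python
import json

# Made-up sample in the SPARQL JSON results layout.
SAMPLE = """
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {
    "bindings": [
      {
        "s": {"type": "uri", "value": "http://example.org/person/a#this"},
        "p": {"type": "uri", "value": "http://xmlns.com/foaf/0.1/name"},
        "o": {"type": "literal", "value": "Example Person"}
      }
    ]
  }
}
"""


def rows(doc):
    """Flatten a SPARQL JSON result set into one plain dict per solution."""
    data = json.loads(doc)
    variables = data["head"]["vars"]
    out = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a given solution, so skip missing keys.
        out.append({v: binding[v]["value"] for v in variables if v in binding})
    return out


result_rows = rows(SAMPLE)
```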

Why is it Important?

It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into a bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit different syntax) as you would one or more SQL Tables.

Example:

# SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file
SELECT DISTINCT ?s ?p ?o
FROM <http://myopenlink.net/dataspace/person/kidehen>
WHERE {?s ?p ?o}

# SPARQL against my social network
# Note: My SPARQL will be beamed across all of the contacts in the social networks of my contacts, as long as they are all HTTP URI based within each data space
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?Person
FROM <http://myopenlink.net/dataspace/person/kidehen>
WHERE {?s a foaf:Person; foaf:knows ?Person}

Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.
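
If the shorthand in the second query is unfamiliar: `?s a foaf:Person; foaf:knows ?Person` stands for two triple patterns that share the subject `?s`, with `a` abbreviating `rdf:type`. The Python sketch below evaluates those two patterns over a handful of made-up triples to show the join. It is a toy matcher for illustration, not how a real SPARQL engine is built, and the `ex:` names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Made-up triples standing in for a FOAF file.
TRIPLES = [
    ("ex:me", RDF_TYPE, FOAF + "Person"),
    ("ex:me", FOAF + "knows", "ex:alice"),
    ("ex:me", FOAF + "knows", "ex:bob"),
    ("ex:doc", FOAF + "knows", "ex:carol"),  # subject is not typed as a Person
]


def match(pattern, triple, env):
    """Extend the bindings in env so pattern equals triple, or return None."""
    env = dict(env)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if env.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return env


def solve(patterns, triples):
    """Join the triple patterns left to right over shared variables."""
    envs = [{}]
    for pattern in patterns:
        next_envs = []
        for env in envs:
            for triple in triples:
                extended = match(pattern, triple, env)
                if extended is not None:
                    next_envs.append(extended)
        envs = next_envs
    return envs


# The two patterns behind "?s a foaf:Person; foaf:knows ?Person".
PATTERNS = [
    ("?s", RDF_TYPE, FOAF + "Person"),
    ("?s", FOAF + "knows", "?Person"),
]

# The set plays the role of SELECT DISTINCT ?Person.
people = sorted({env["?Person"] for env in solve(PATTERNS, TRIPLES)})
```

Note that `ex:carol` is left out, because the subject that knows her is never declared a `foaf:Person`.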

How Do I use It?

SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.

Where are its implementations?

A SPARQL implementors' Testimonial page accompanies the SPARQL press release. In addition, there is a growing collection of implementations on the ESW Wiki Page for SPARQL compliant RDF Triple & Quad Stores.

Is this really a big deal?

Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of natural extension of the existing Document Web, i.e. these Web enclaves co-exist in symbiotic fashion.

As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shared Bookmarks Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries and other Web accessible Data Sources (Data Spaces).

Related items

    Detailed SPARQL Query Examples using SIOC Data Spaces

Politics, Old Media, and Linked Data

According to current media:

Senator Barack Obama is a beacon of change within the Democratic Party while Senator Hillary Clinton is status quo.

According to the data in the GovTrack.us data space:

Senator Barack Obama is a rank-and-file Democrat according to GovTrack's analysis of his track record in Congress, whereas Senator Hillary Clinton is a radical Democrat according to the same GovTrack analysis of her track record in Congress.

Who do we believe? The GovTrack.us performance data, old media pundits, or postulations of the candidates? GovTrack.us is a new approach to candidate vetting. It provides data in traditional Document Web and Linked Data Web forms, placing analytic power in the hands of the citizen.

Here are insights into the track records of Senators Hillary Clinton and Barack Obama via the Zitgist Linked Data Viewer:

  1. Senator Hillary Clinton
  2. Senator Barack Obama

Note: I am not aligned with any political party or candidate; this is just a demonstration of Linked Data that has a high degree of poignancy relative to the US primary elections.

2008, Facebook Data Portability, and the Giant Global Graph of Linked Data

As 2007 came to a close I repeatedly mulled over the idea of putting together the usual "year in review" and a set of predictions for the coming year. Anyway, the more I pondered, the smaller the list became. While pondering (as 2008 rolled around), the Blogosphere was set ablaze by Robert Scoble's announcement of his account suspension by Facebook. Of course, many chimed in expressing views on either side of the ensuing debate: who is right -- Scoble or Facebook? The more I assimilated the views expressed about this event, the more ironic I found the general discourse, for the following reasons:

  1. Web 2.0 is fundamentally about Web Services as the prime vehicle for interactions across "points of Web presence"
  2. Facebook is a Web 2.0 hosted service for social networking that provides Web Services APIs for accessing data in the Facebook data space. You have to do so "on the fly" within clearly defined constraints, i.e. you can interact with data across your social network via Facebook APIs, but you cannot cache the data (perform an export style dump of the data)
  3. Facebook is a main driver of the term: "social graph", but their underlying data model is relational and the Web Services response (data you get back) doesn't return a data graph; instead, it returns a tree (i.e. XML)
  4. Scoble's had a number of close encounters with Linked Data Web | Semantic Data Web | Web 3.0 aficionados in various forms throughout 2007, but still doesn't quite make the connection between Web Services APIs as part of a processing pipeline that includes structured data extraction from XML data en route to producing Data Graphs comprised of Data Objects (Entities) endowed with: Unique Identifiers, Classification or Categorization schemes, Attributes, and Relationships prescribed by one or more shared Data Dictionaries/Schemas/Ontologies
  5. A global information bus that exposes a Linked Data mesh comprised of Data Objects, Object Attributes, and Object Relationships across "points of Web presence" is what TimBL described in 1998 (Semantic Web Roadmap) and more recently in 2007 (Giant Global Graph)
  6. The Linked Data mesh (i.e Linked Data Web or GGG) is anchored by the use of HTTP to mint Location, Structure, and Value independent Object Identifiers called URIs or IRIs. In addition, the Linked Data Web is also equipped with a query language, protocol, and results serialization format for XML and JSON called: SPARQL.

So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside Facebook realm) by doing the following:

  1. Use an RDFizer for Facebook to convert XML response data from Facebook Web Services into RDF "on the fly"
  2. Ensure that my RDF is comprised of Object Identifiers that are HTTP based and thereby dereferenceable (i.e. I can use SPARQL to unravel the Linked Data Graph in my Facebook data space)
  3. The act of data dereferencing enables me to expose my Facebook Data as Linked Data associated with my Personal URI
  4. This interaction only occurs via my data space and in all cases the interactions with data work via my RDFizer middleware (e.g. the Virtuoso Sponger) that talks directly to Facebook Web Services.
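
To give a feel for what "RDFizing" an XML response involves, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the XML shape is invented (it is not the real Facebook API payload), the base URI is a placeholder, and the `rdfize` helper is hypothetical and unrelated to how the Virtuoso Sponger works internally.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical response shape; not the real Facebook API payload.
SAMPLE_XML = """
<friends owner="100">
  <user><uid>200</uid><name>Alice Example</name></user>
  <user><uid>300</uid><name>Bob Example</name></user>
</friends>
"""


def rdfize(xml_text, base="http://example.org/dataspace/person/"):
    """Turn the XML tree into (subject, predicate, object) triples.

    Every entity gets an HTTP URI under `base` (a placeholder) so that it
    can be dereferenced and linked to from other graphs.
    """
    root = ET.fromstring(xml_text.strip())
    owner = f"{base}{root.get('owner')}#this"
    triples = [(owner, RDF_TYPE, FOAF + "Person")]
    for user in root.findall("user"):
        friend = f"{base}{user.findtext('uid')}#this"
        triples.append((friend, RDF_TYPE, FOAF + "Person"))
        triples.append((friend, FOAF + "name", user.findtext("name")))
        # The tree's parent/child nesting becomes an explicit graph edge.
        triples.append((owner, FOAF + "knows", friend))
    return triples


triples = rdfize(SAMPLE_XML)
```

The point of the exercise is the last `append`: the relationship that was only implied by XML nesting becomes a named, queryable link between two URIs.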

In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and in some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.

Here are my URIs that provide different paths to my Facebook Data Space:

To conclude, 2008 is clearly the inflection year during which we will finally unshackle Data and Identity from the confines of "Web Data Silos" by leveraging the HTTP, SPARQL, and RDF induced virtues of Linked Data.

Related Posts:

  1. 2008 and the Rise of Linked Data
  2. Scoble Right, Wrong, and Beyond
  3. Scoble interviewing TimBL (note to Scoble: re-watch your interview since he made some specific points about Linked Data and URIs that you need to grasp)
  4. Prior blog posts from this Blog Data Space that include the literal patterns: Scoble Semantic Web

OpenOffice.org, SPARQL, and the Linked Data Web

Question posed by Dan Brickley via a blog post: SQL, OpenOffice: would a JDBC driver for SPARQL protocol make sense?

Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bona fide data sources within its "Pivot Table" functionality realm. All that would then be required is a SPARQL SELECT Query transported via the SPARQL Protocol, with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).
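
For the curious, here is a minimal Python sketch of what "a single SPARQL Protocol URL" looks like when assembled by hand. The `query` and `default-graph-uri` parameter names come from the SPARQL Protocol; the endpoint address is a placeholder and the `sparql_protocol_url` helper is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_protocol_url(endpoint, query, default_graph=None):
    """Pack a SPARQL query into a single GET URL for a SPARQL endpoint."""
    params = [("query", query)]
    if default_graph:
        params.append(("default-graph-uri", default_graph))
    # urlencode percent-escapes the query text so it survives inside a URL.
    return endpoint + "?" + urlencode(params)


url = sparql_protocol_url(
    "http://example.org/sparql",  # placeholder endpoint, not a real service
    "SELECT DISTINCT ?s ?p ?o WHERE { ?s ?p ?o }",
    default_graph="http://myopenlink.net/dataspace/person/kidehen",
)

# Round-trip check: the query text can be recovered from the URL.
decoded = parse_qs(urlsplit(url).query)
```

A client would then fetch that URL and ask for the XML results serialization, typically by sending an `Accept: application/sparql-results+xml` header.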

Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).

Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) using a DBMS engine that supports both SQL and SPARQL e.g. Virtuoso.

Virtuoso SPASQL support is exposed via its ODBC and/or JDBC Drivers. Thus you can do things such as:

  1. Use a SPARQL Query in the FROM CLAUSE of a SQL statement
  2. Execute SPARQL via the SQL processor by prepending the SPARQL query text with the literal "sparql"
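
The two options above amount to simple text wrapping on the client side, which the Python sketch below illustrates. Treat it as a rough sketch: the helper names are hypothetical, and the exact derived-table wrapper (`SELECT * FROM (...) AS t`) is my assumption about the shape of such a statement, so check it against your server before relying on it. The resulting strings would be sent through any ODBC or JDBC client.

```python
def as_spasql(sparql_text):
    """Prefix SPARQL text with the "sparql" keyword for the SQL processor."""
    return "SPARQL " + sparql_text.strip()


def as_derived_table(sparql_text, alias="t"):
    """Wrap a SPARQL query as a derived table in a SQL FROM clause.

    The wrapper syntax here is an assumption made for illustration.
    """
    return f"SELECT * FROM ({as_spasql(sparql_text)}) AS {alias}"


QUERY = "SELECT ?s WHERE { ?s ?p ?o }"
direct = as_spasql(QUERY)
nested = as_derived_table(QUERY)
```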

BTW - My New Year's Resolution: get my act together and shrink the ever increasing list of "simple & practical Virtuoso use case demos" on my todo list, which now spans all the way back to 2006 :-(