« August 2007 | Main | October 2007 »

Fourth Platform: Data Spaces in The Cloud (Update)

I've written extensively on the subject of Data Spaces in relation to the Data Web for while. I've also written sparingly about OpenLink Data Spaces (a Data Web Platform that build using Virtuoso). On the other hand, I haven't shed much light on installation and deployment of OpenLink Data Spaces.

Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and I as demonstrated last year during our podcast; the "Fourth" in his Innovators Podcast series).

The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner with the end product called: "Cloud based Data Spaces".

As I write, we are releasing a Virtuoso AMI (Amazon Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces Layer and all of the OAT applications we've been developing for a while.

What Benefits Does this offer?

  1. Personal Data Spaces in the Cloud - a place where you can control and consolidate data across your Blogs, Wikis, RSS/Atom Feed Subscriptions, Shared Bookmarks, Shared Calendars, Discussion Threads, Photo Galleries etc
  2. All the data in your Data Space is SPARQL or GData accessible.
  3. All of the data in your Personal Data Space is Linked Data from the get go. Each Item of data is URI addressable
  4. SIOC support - your Blogs, Wikis, Bookmarks etc.. are based on the SIOC ontology for Semantically Interlinking Online Communities (think: Open social-graph++)
  5. FOAF support - your FOAF Profile page provides a URI that is an in-road to all Data in your Data Space.
  6. OpenID support - your Personal Data Space ID is usable wherever OpenID is supported. OpenID and FOAF are integrated as per latest FOAF specs
  7. Integration with Facebook - You can access your Data Space from Facebook or access Facebook from your Data Space
  8. Unified Storage - The WebDAV based filesystem provides Cloud Storage that's integrated with Amazon S3; It also exposes all of your Data Space data via a traditional filesystem UI (think virtual Spotlight); You can also mount this drive to your local filesystem via your native operating system's WebDAV support
  9. SyncML - you can sync calendar and contact details with your Data Space in the cloud from your Mobile phone.
  10. A practical Semantic Data Web solution - based on Web Infrastructure and doesn't require you to do anything beyond exposing URIs for data in your Data Spaces.

EC2-AMI Details:

    AMI ID: ami-e2ca2f8b
    Manifest file: virtuoso-images/virtuoso-dataspace-server.manifest.xml

Installation Guide:

  1. Get an Amazon Web Services (AWS) account
  2. Signup for S3 and EC2 services
  3. Install the EC2 plugin for Firefox
  4. Start the EC2 plugin
  5. Locate the row containing ami-e2ca2f8b Manifest virtuoso-images/virtuoso-dataspace-server.manifest.xml (sort using the AMI ID or Manifest Columns)
  6. Create a Virtuoso Data Space Server AMI instance
  7. Wait 4-5 minutes (*take a few minutes to create the pre-configured Linux Image*)
  8. Connect to: http://your-ec2-instance-cname:8890/ and Log in with user/password dba/dba
  9. Go to the Admin UI (Virtuoso Conductor) and change the PWDs for the 'dba' and 'dav' accounts (*Important!*)
  10. Give the "SPARQL" user "SPARQL_UPDATE" privileges (required if you want to exploit the in-built Sponger Middleware)
  11. Click on the ODS (OpenLink Data Spaces) link to start an Personal Editon of OpenLink Data Spaces (or go to: http://your-ec2-instance-cname/dataspace/ods/index.html)
  12. Log-in using the username and password credentials for the 'dav' account (or register a new user note: OpenID is an option here also) Create an Data Space Application Instance by clicking on a Data Space App. Tab
  13. Import data from your existing Web 2.0 style applications into OpenLink Data Spaces e.g. subscribe to a few RSS/Atom feeds via the "Feeds Manager" application or import some Bookmarks using the "Bookmarks" application
  14. Then look at the imported data in Linked Data form via your ODS generated URIs based on the patterns: http://your-ec2-instance-cname/dataspace/person/your-ods-id#this (URI for You the Person), http://your-ec2-instance-cname/dataspace/person/your-ods-id (FOAF File URI), http://your-ec2-instance-cname/dataspace/your-ods-id (SIOC File URI)

(OAT) from your Data Space instance

Install the OAT VAD package via the Admin UI and then apply the URI patterns below within your browser:
  1. http://:8890/oatdemo - Entire OAT Demo Collection
  2. http://:8890/rdfbrowser - RDF Browser
  3. http://:8890/isparql - SPARQL Query Builder (iSPARQL)
  4. http://:8890/qbe - SQL Query Builder (iSQL)
  5. http://:8890/formdesigner - Forms Builder (for building Meshups based on RDF, SQL, or Web Servives Data Souces)
  6. http://:8890/dbdesigner - SQL DB Schema Designer (note a Visual SQL-RDF Mapper is also on it's way
  7. http://:8890/DAV/JS/ - To view the OAT Tree (there are some experimental demos that are missing from the main demo app etc..)

There's more to come!

Nice Collection of Plain English Demos (Wikis, RSS, Social-Networking)

Nice Collection of plain english demos:

  1. Wikis
  2. RSS
  3. Social-Networking

The Social-Networking demo is explained using a Social-Graph visual. Interesting, the demo validates Tim Finn's point of view as expressed in his post titled: Is it a social network or a social graph :-)

Data Spaces & The Fourth Platform

I've written extensively on the subject of Data Spaces in relation to the Data Web for while. I've also written sparingly about OpenLink Data Spaces (a Data Web Platform that build using Virtuoso). On the other hand, I haven't shed much light on installation and deployment of OpenLink Data Spaces.

Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and I as demonstrated last year during our podcast; the "Fourth" in his Innovators Podcast series).

The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner with the end product called: "Cloud based Data Spaces".

As I write, we are releasing a Virtuoso AMI (Amazon Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces Layer and all of the OAT applications we've been developing for a while.

What Benefits Does this offer?

  1. Personal Data Spaces in the Cloud - a place where you can control and consolidate data across your Blogs, Wikis, RSS/Atom Feed Subscriptions, Shared Bookmarks, Shared Calendars, Discussion Threads, Photo Galleries etc
  2. All the data in your Data Space is SPARQL or GData accessible.
  3. All of the data in your Personal Data Space is Linked Data from the get go. Each Item of data is URI addressable
  4. SIOC support - your Blogs, Wikis, Bookmarks etc.. are based on the SIOC ontology for Semantically Interlinking Online Communities (think: Open social-graph++)
  5. FOAF support - your FOAF Profile page provides a URI that is an in-road to all Data in your Data Space.
  6. OpenID support - your Personal Data Space ID is usable wherever OpenID is supported. OpenID and FOAF are integrated as per latest FOAF specs
  7. Two Integration with Facebook - You can access your Data Space from Facebook or access Facebook from your Data Space
  8. Unified Storage - The WebDAV based filesystem provides Cloud Storage that's integrated with Amazon S3; It also exposes all of your Data Space data via a traditional filesystem UI (think virtual Spotlight); You can also mount this drive to your local filesystem via your native operating system's WebDAV support
  9. SyncML - you can sync calendar and contact details with your Data Space in the cloud from your Mobile phone.
  10. A practical Semantic Data Web solution - based on Web Infrastructure and doesn't require you to do anything beyond exposing URIs for data in your Data Spaces.

EC2-AMI Details:

    AMI ID: ami-e2ca2f8b
    Manifest file: virtuoso-images/virtuoso-dataspace-server.manifest.xml

Installation Guide:

  1. Get an Amazon Web Services (AWS) account
  2. Signup for S3 and EC2 services
  3. Install the EC2 plugin for Firefox
  4. Start the EC2 plugin
  5. Locate the row containing ami-9df316f4 Manifest virtuoso-images/virtuoso-dataspace-server.manifest.xml (sort using the AMI ID or Manifest Columns
  6. Start the Virtuoso Data Space Server AMI
  7. Wait 4-5 minutes (*take a few minutes to create the pre-configured Linux Image*)
  8. Connect to http://:8890/ Log in with user/password dba/dba
  9. Go to the Admin UI (Virtuoso Conductor) and change the PWDs for the 'dba' and 'dav' accounts (*Important!*)
  10. Give the "SPARQL" user "SPARQL_UPDATE" privileges (required if you want to exploit the in-built Sponger Middleware)
  11. Click on the ODS (OpenLink Data Spaces) link to start an Personal Editon of OpenLink Data Spaces (or go to: http://your-ec2-instance-cname/dataspace/ods/index.html)
  12. Log-in using the username and password credentials for the 'dav' account (or register a new user note: OpenID is an option here also) Create an Data Space Application Instance by clicking on a Data Space App. Tab
  13. Import data from your existing Web 2.0 style applications into OpenLink Data Spaces e.g. subscribe to a few RSS/Atom feeds via the "Feeds Manager" application or import some Bookmarks using the "Bookmarks" application
  14. Then look at the imported data in Linked Data form via your ODS generated URIs based on the patterns: http://your-ec2-instance-cname/dataspace/person/your-ods-id#this (URI for You the Person), http://your-ec2-instance-cname/dataspace/person/your-ods-id (FOAF File URI), http://your-ec2-instance-cname/dataspace/your-ods-id (SIOC File URI)

(OAT) from your Data Space instance

Install the OAT VAD package via the Admin UI and then apply the URI patterns below within your browser:
  1. http://:8890/oatdemo - Entire OAT Demo Collection
  2. http://:8890/rdfbrowser - RDF Browser
  3. http://:8890/isparql - SPARQL Query Builder (iSPARQL)
  4. http://:8890/qbe - SQL Query Builder (iSQL)
  5. http://:8890/formdesigner - Forms Builder (for building Meshups based on RDF, SQL, or Web Servives Data Souces)
  6. http://:8890/dbdesigner - SQL DB Schema Designer (note a Visual SQL-RDF Mapper is also on it's way
  7. http://:8890/DAV/JS/ - To view the OAT Tree (there are some experimental demos that are missing from the main demo app etc..)

There's more to come!

Semantic Web Value Proposition

The motivation behind this post is a response to the Read/WriteWeb post titled: Semantic Web: Difficulties with the Classic Approach.

First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum) as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.

Situation Analysis

We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacum! Likewise, you can produce Information in a vacum, you need Data.

The Semantic Data Web's value to Individuals

Problem:

Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars and the like, have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools, are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos).

The reality expressed above is a recipe for "Information Overload" and complete annihilation of ones effective pursuit and exploitation of knowledge due "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case for instance, I was actively subscribed to over 500+ RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Then add to that, all of the Discussions I track across Blogs, wikis, message boards, mailing lists, traditional usnet discussion forumns, and the like, and I think you get the picture.

Beyond information overload, Web 2.0 data is "Semi-Structured" by way of it's dominant data containers ((X)HTML, RSS, Atom documents and data streams etc.) lacking semantics that formally expose individual data items as distinct entities, endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).

Solution:

Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS.

Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exists on the Web such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines).

Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities i.e. a query language.

Semantic Data Web Technologies that facilitate the solution described above include:

Structured Data Standards:
    RDF - Data Model for structured data
    RDF/XML - A serialization format for RDF based structured data
    N3 / Turtle - more human friendly serialization formats for RDF based structured data
Entity Exposure & Generation:
    GRDDL - enables association between XHTML pages and XSLT stylesheets that facilitates loosely coupled "on the fly" extraction of RDF from non RDF documents
    RDFa - enables document publishers or viewers (i.e those repurposing or annotating) to embed structured data into existing XHTML documents
    eRDF - another option for embedding structured RDF data within (X)HTML documents
    RDF Middleware - typically incorporating GRDDL, RDFa, eRDF, and custom extraction and mapping as part of a structured data production pipeline
. Entity Naming & Identification:

Use of URIs or IRIs for uniquely identifying physical (HTML Documents, Image Files, Multimedia Files etc..) and abstract (People, Places, Music, and other abstract things).

Entity Access & Querying:

    SPARQL Query Language - the SQL analog of the Semantic Data Web that enables query constructs that target named entities, entity attributes, and entity relationships

The Semantic Data Web's value to Organizations

Problem:

Organizations are rife with a plethora of business systems that are built atop a myriad of database engines, sourced from a variety of DBMS vendors. A typical organization would have a different database engine, from a specific DBMS vendor, underlying critical business applications such as: Human Resource Management (HR), Customer Relationship Management (CRM), Accounting, Supply Chain Management etc. In a nutshell, you have DBMS Engines, and DBMS Schema heterogeneity permeating the IT infrastructure of organizations on a global scale, making Data & Information Integration the biggest headache across all IT driven organizations.

Solution:

Alleviation of the pain (costs) associated with Data & Information Integration.

Semantic Data Web offerings:

A dexterous data model (RDF) that enables the construction of conceptual views of disparate data sources across an organization based on existing web architecture components such as HTTP and URIs.

Existing middleware solutions that facilitate the exposure of SQL DBMS data as RDF based Structured Data include:

BTW - There is an upcoming W3C Workshop covering the integration of SQL and RDF data.

Conclusion

The Semantic Data Web is here, it's value delivery vehicle is the URI. The URI is a conduit to Interlinked Structured Data (RDF based Linked Data) derived from existing data sources on the World Wide Web alongside data continuously injected into the Web by organizations world wide. Ironically, the Semantic Data Web only platform that crystallizes the: Information at Your Fingertips vision, without development environment, operating system, application, or database lock-in. You simply click on a Linked Data URI and the serendipitous exploration and discovery of data commences.

The unobtrusive emergence of the Semantic Data Web is a reflection of the soundness of the underlying Semantic Web vision.

If you are excited about Mash-ups then your are a Semantic Web enthusiast and benefactor in the making, because you only "Mash" (brute force data extraction and interlinking) because you can't "Mesh" (natural data extraction and interlinking). Likewise, if you are a social-networking, open social-graph, or portable social-network enthusiast, then you are also a Semantic Data Web benefactor and enthusiasts, because your "values" (yes, the values associated with the properties that define you e.g your interests etc) are the fundamental basis for portable, open, social-networking, which is what the Semantic Data Web hands to you on a platter without compromise (i.e. data lock-in or loss of data ownership).

Some practical examples of Semantic Data Web prowess:
    DBpedia (*note: I deliberately use DBpedia URIs in my posts where I would otherwise have used a Wikipedia article URI*)

RDF Browser View of My Hyperdata & Linked Data Post

Bearing in mind we are all time challenged, here are links to OpenLink and Zitgist RDF Browser views of my earlier blog post re. Hyperdata & Linked Data.

Both browsers should lead you to the posts from Danny, Nova, and Tim. In both cases the URI < xmlns="http" www.openlinksw.com="www.openlinksw.com" dataspace="dataspace" kidehen="kidehen" openlinksw.com="openlinksw.com" weblog="weblog" s="s" blog="blog" b127="b127" d="d"> is a pointer to structured data (in my Blog Data Space) if your user agent (browser or other Web Client) requests an RDF representation of this post via its HTTP request payload (what the Browser are doing via the "Accept:" headers).</>

As you can see the Data Web is actually here! Without RDF generation upheaval (or Tax).

Web of Linked Data & Hyperdata

I've just read the extensive post by Nova Spivack titled: The Semantic Web, Collective Intelligence and Hyperdata, courtesy of a post by Danny Ayres titled: Confused about the Semantic Web , in response to a post by Tim O'Reilly titled: Economist Confused About the Semantic Web? .

My Comments:

Hyperdata is short for HyperLinked Data :-) The same applies to Linked Data. Thus, we have two literal labels for the same core Concept. HTTP is the enabling protocol for "Hyper-linking" Documents and associated Structured Data via the World Wide Web (Web for short). Data Links associated with Structured Data contained in, or hosted by, Documents on the Web.

RDFa, eRDF, GRDDL, SPARQL Query Language, SPARQL Protocol (SOAP or REST service), SPARQL Results Serializations (XML or JSON) collectively provide a myriad of unobtrusive routes to structured data embedded within, or associated with, existing Web Documents.

As Danny already states, ontologies are not prerequisites for producing structured data using the RDF Data Model. They simply aid the ability to express one's self clearly (i.e. no repetition or ambiguity) across a broad audience of machines (directly) and their human masters (indirectly).

Using the crux of this post as the anecdote: The Semantic Data Web would simplify the process of claiming and/or proving that Linked Data and Hyperdata describe the same concept. It achieves this by using Triples (Subject, Predicate, Object) expressed in various forms (N3, Turtle, RDF/XML etc.) to formalize claims in a form palatable to electronic agents (machines) operating on behalf of Humans. In a nutshell, this increases human productive by completely obliterates the erstwhile exponential costs of discovering data, information, and knowledge.

BTW - for full effect, view this post (i.e. cut and paste the Permalink URI of this post, below) into an RDF Browser such as:

Yet Another RDFa Demo

Ivan Herman just posted another nice example of practical RDFa usage in a blog post titled: Yet Another RDFa Proccessor. In his post, Ivan exposes a URI for his FOAF-in-RDFa file.

Since I am aggressively tracking RDFa developments, I decided to quickly view Ivan's FOAF-in-RDFa file via the OpenLink RDF Browser. The full implications are best understood when you click on each of the Browser's Tabs -- each providing a different perspective on this interesting addition to the Semantic Data Web (note: the Fresnel Tab which demonstrates declarative UI templating using N3).

What's Going on Here?

The OpenLink RDF Browser is a Rich Internet Application built using OAT (OpenLink Ajax Toolkit). In my case, I am deploying the RDF Browser from a Virtuoso instance, which implies that the Browser is able to use the Virtuoso Sponger Middleware (exposed as a REST Service at the Virtuoso instance endpoint: /proxy); which includes an RDFa Cartridge comprised of a metadata extractor and an RDF Schema / OWL Ontology mapper. That's it!