Kingsley Idehen's Typepad

The URI, URL, and Linked Data Meme's Generic HTTP URI (Updated)

Situation Analysis

As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this:

"Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI" etc..

And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object etc..

What's up with that?

Well, our everyday use of the Web is an unfortunate conflation of two distinct things that have Identity: Real World Objects (RWOs) & Address/Location of Documents (Information bearing Resources).

The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, it's about realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.

What is a Real World Object?

People, Places, Music, Books, Cars, Ideas, Emotions etc..

What is a URI?

A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification.

URI Generic Syntax

The constituent parts of a URI (from URI Generic Syntax RFC) are depicted below:
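As a rough illustration of those parts, Python's standard `urllib.parse` module can split a URI into the five generic components (scheme, authority, path, query, fragment). The URI used here is hypothetical and chosen only because it exercises every component.

```python
from urllib.parse import urlsplit

# Hypothetical URI, used only to show the five generic components.
uri = "http://example.org:8080/people/alice?format=rdf#this"

parts = urlsplit(uri)
print(parts.scheme)    # http
print(parts.netloc)    # example.org:8080  (the "authority")
print(parts.path)      # /people/alice
print(parts.query)     # format=rdf
print(parts.fragment)  # this
```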

What is a URL?

A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers:

  1. Resource Address/Location Identifier
  2. Data Access mechanism for an Information bearing Resource (Document, File etc..)

So far so good!

What is an HTTP based URI?

The kind of URI Linked Data aficionados mean when they use the term: URI.

An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWO). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following:

  1. RWO Identifier/Name
  2. RWO Metadata document Locator (courtesy of URL aspect)
  3. Negotiable Representation of the Located Document (courtesy of HTTP's content negotiation feature).
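The three roles above can be sketched with nothing but the Python standard library. This is a minimal sketch that assumes a hash-style URI (one common pattern, not the only one); the `example.org` address is hypothetical, and no request is actually sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# Hypothetical name for a real-world object (a person), not a real resource.
rwo_uri = "http://example.org/people/alice#this"

# 1. The full URI acts as the name.
# 2. Dropping the fragment yields the address of the document
#    that describes the named thing.
doc_url, fragment = urldefrag(rwo_uri)

# 3. The same document address can be requested in different representations
#    by varying the Accept header (the requests are only constructed here).
as_html = Request(doc_url, headers={"Accept": "text/html"})
as_turtle = Request(doc_url, headers={"Accept": "text/turtle"})

print(doc_url)                          # http://example.org/people/alice
print(as_turtle.get_header("Accept"))   # text/turtle
```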

What is Metadata?

Data about Data. Put differently, data that describes other data in a structured manner.

How Do we Model Metadata?

The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).

What about RDF?

The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, it's a framework for adding Metadata bearing Information Resources to the current Web. It comprises:

  1. Entity-Attribute-Value (aka. Subject-Predicate-Object) plus Classes & Relationships (Data Dictionaries e.g., OWL) metadata model
  2. A plethora of instance data representation formats that include: RDFa (when doing so within (X)HTML docs), Turtle, N3, TriX, RDF/XML etc.
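To make the Entity-Attribute-Value idea concrete, here is a small sketch that holds statements as plain Python tuples and renders them as N-Triples lines. The `example.org` identifiers are hypothetical (the FOAF terms are real vocabulary URIs), and treating any string that starts with `http://` as a URI is a simplifying assumption of this sketch.

```python
# Hypothetical identifiers under example.org; the FOAF terms are real vocabulary URIs.
ALICE = "http://example.org/people/alice#this"
BOB = "http://example.org/people/bob#this"
FOAF = "http://xmlns.com/foaf/0.1/"

# Each statement is an (entity, attribute, value) triple,
# i.e. (subject, predicate, object) in RDF terms.
triples = [
    (ALICE, FOAF + "name", "Alice"),   # value is a plain literal
    (ALICE, FOAF + "knows", BOB),      # value is another entity's URI
]

def to_ntriples(subject, predicate, obj):
    """Render one triple as an N-Triples line; URIs go in <>, literals in quotes."""
    if obj.startswith(("http://", "https://")):
        rendered = f"<{obj}>"
    else:
        rendered = '"' + obj.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"<{subject}> <{predicate}> {rendered} ."

for t in triples:
    print(to_ntriples(*t))
```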

What's the Problem Today?

The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents. URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective.

Note: Information is "data in context", it isn't the same thing as "Raw Data". Thus, if we can link to Information via the Web, why shouldn't we be able to do the same for "Raw Data"?


How Does the Linked Data meme solve the problem?

The meme simply provides a set of guidelines (best practices) for producing Web architecture friendly metadata. Meaning: when producing EAV/CR model based metadata, endow Subjects, their Attributes, and Attribute Values (optionally) with HTTP URIs. By doing so, a new level of Link Abstraction on the Web is possible i.e., "Data Item to Data Item" level links (aka hyperdata links). Even better, when you de-reference a RWO hyperdata link you end up with a negotiated representation of its metadata.

Conclusion

Linked Data is ultimately about an HTTP URI for each item in the Data Organization Hierarchy :-)

Related

  1. History of how "Resource" became part of URI - historic account by TimBL
  2. Linked Data Design Issues Document - TimBL's initial Linked Data Guide
  3. Linked Data Rules Simplified - My attempt at simplifying the Linked Data Meme without SPARQL & RDF distraction
  4. Linked Data & Identity - another related post
  5. The Linked Data Meme's Value Proposition
  6. My Del.icio.us hosted Bookmark Data Space for Identity Schemes
  7. TimBL's Ted Talk re. "Raw Linked Data".

rdf xml linked_data semanticweb sparql history semantic_web DataSpace

02:34 PM

Exploring the Value Proposition of Linked Data

What is Linked Data?

The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web).

There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.

What's Special about HTTP URIs?

They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following:

  1. Identify or Name Anything of Interest
  2. Describe Anything of Interest by associating the Description Subject's Identity with a constellation of Attribute and Value pairs (technically: an Entity-Attribute-Value or Subject-Predicate-Object graph)
  3. Make the Description of Named Things of Interest discoverable on the Web by implicitly binding the aforementioned to Documents that hold their descriptions (technically: metadata documents or information resources)

What's the basic value proposition of the Linked Data meme?

Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types.

Note: Hyperdata Linking is simply what an HTTP URI facilitates.

Example problems solved by injecting Linked Data into the Web:

  1. Federated Identity by enabling Individuals to unambiguously Identify themselves (Profiles++) courtesy of existing Internet and Web protocols (e.g., FOAF+SSL's WebIDs which combine Personal Identity with X.509 certificates and HTTPS based client side certification)
  2. Security and Privacy challenge alleviation by delivering a mechanism for policy based data access that feeds off federated individual identity and social network (graph) traversal
  3. Spam Busting via the above
  4. Increasing the Serendipitous Discovery Quotient (SDQ) of Web accessible resources by embedding Rich Metadata into (X)HTML Documents e.g., structured descriptions of your "WishLists" and "OfferLists" via a common set of terms offered by vocabularies such as GoodRelations and SIOC
  5. Coherent integration of disparate data across the Web and/or within the Enterprise via "Data Meshing" rather than "Data Mashing"
  6. Moving beyond imprecise statistically driven "Keyword Search" (e.g. Page Rank) to "Precision Find" driven by typed link based Entity Rank plus Entity Type and Entity Property filters.

Conclusion

If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- that goes beyond data containers (documents or files).

The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true, because the Linked Data meme is ultimately about an enhancement of the current Web; achieved by reintroducing its architectural essence -- in new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs.

As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferLists", an Idea, etc., in addition to links for Properties (Attributes & Values) of the aforementioned. Ultimately, all of these links will be indexed in a myriad of ways, providing the substrate for the next major period of Internet & Web driven innovation, within our larger human-ingenuity driven innovation continuum.

Related

  • Recipes for Describing Your Business and its Offerings using the GoodRelations Vocabulary / Schema
  • Solving Real Problems with RDF based Linked Data
  • Other Linked Data Posts from this Blog oriented Linked Data Space (goes back a few years!)
  • Various practical Linked Data demo links from my Del.icio.us Bookmark oriented Data Space
  • My personal WebID, which is a conduit to a Linked Data mesh covering a vast variety of things I've opted to share with others via the Web (best viewed using a Linked Data aware User Agent like ODE).

rdf linked_data semanticweb foaf sioc socialnetworking DataSpace

08:17 PM

Important Things to Note about the World Wide Web

Based on the prevalence of confusion re. the Linked Data meme, here are a few important points to remember about the World Wide Web.

  1. It's an HTTP based Network Cluster within the Internet (remember: Networks are about meshes of Nodes connected by Links)
  2. Its underlying data model is that of a Network (we've had Network Data models for eons. EAV/CR is an example)
  3. Links are facilitated via URIs
  4. Until recently the granularity of Networking on the Web was scoped to Data Containers (documents), due to the prevalence of URL style links
  5. The Linked Data meme adds Data Item (Datum) level granularity to World Wide Web networking via HTTP URIs
  6. Data Items become Web Reference-able when you Identify/Name them using HTTP based URIs
  7. An HTTP URI implicitly binds a Web Reference-able Data Item (Entity, Datum, Data Object, Resource) to its Web Accessible Metadata
  8. Web Accessible Metadata resides within Data Containers (documents or information resources)
  9. The representation of a Web Accessible Metadata container is negotiable
  10. I am able to write and dispatch this blog post courtesy of the Web features listed above
  11. You are able to explore the many dimensions to data exposed by this blog should you decide to explore the Linked Data mesh exposed by this post's HTTP URI (via its permalink)
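Point 9 in the list above (negotiable representation) is the piece people tend to find most abstract, so here is a minimal sketch of how a server might pick a representation from an HTTP `Accept` header. The function name and the list of available formats are assumptions made for illustration, and the matching is deliberately simplified: it handles exact media types and `*/*` only.

```python
def negotiate(accept_header, available):
    """Pick the available media type with the highest q-value in an Accept header.

    Simplified: handles exact types and "*/*" only, not "type/*" ranges.
    Returns None when nothing acceptable is available.
    """
    preferences = []
    for item in accept_header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        preferences.append((media_type, q))

    best, best_q = None, 0.0
    for candidate in available:
        for media_type, q in preferences:
            if media_type in (candidate, "*/*") and q > best_q:
                best, best_q = candidate, q
    return best

# Hypothetical set of representations a metadata document might offer.
formats = ["text/html", "text/turtle", "application/rdf+xml"]
print(negotiate("text/turtle, text/html;q=0.5", formats))  # text/turtle
print(negotiate("application/json", formats))              # None
```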

The HTTP URI is the secret sauce of the Web that is powerfully and unobtrusively reintroduced via the Linked Data meme (classic back to the future act). This powerful sauce possesses a unique power courtesy of its inherent duality i.e., how it uniquely combines Data Item Identity (think keys in traditional DBMS parlance) with Data Access (e.g. access to negotiable representations of associated metadata).

As you can see, I've made no mention of RDF or SPARQL, and I can still articulate the inherent value of the "Linked Data" dimension that the "Linked Data" meme adds to the World Wide Web.

As per usual this post is a live demonstration of Linked Data (dog-food style) :-)

Related

  • Greg Boutin's post about Linked Data Brand Management
  • Ian Davis' "Linked Data Brand" post
  • Paul Miller's "Does Linked Data need RDF" post

rdf linked_data semanticweb sparql

09:27 AM

Linked Data Rules Simplified

As a complement to the most recent Linked Data Design Issues note by TimBL, I would like to add this subtle tweak to the enumerated rules:

  1. Identify or Name things using HTTP URIs
  2. Describe things using the RDF metadata model
  3. Increase Linked Data mesh density on the Web by linking (referring) to things in other data spaces using their HTTP URIs.

If you perform the steps above, on any HTTP network (e.g. World Wide Web), you implicitly bind the Names/Identifiers of things to negotiable representations of their metadata (description) bearing documents.

Also note, you can create and deploy the resulting RDF metadata using any of the following approaches:

  1. RDFa within (X)HTML documents
  2. N3, Turtle, TriX, RDF/XML etc. based documents
  3. Programmatically generated variants of 1&2.

Related

  • What is the Linked Data meme about?
  • Simple Explanation of RDF and Linked Data Dynamics
rdf xml linked_data semanticweb DataSpace

10:49 AM

BBC Linked Data Meshup In 3 Steps

Situation Analysis:

Dr. Dre is one of the artists in the Linked Data Space we host for the BBC. He is also referenced in music oriented data spaces such as DBpedia, MusicBrainz and Last.FM (to name a few).

Challenge:

How do I obtain a holistic view of the entity "Dr. Dre" across the BBC, MusicBrainz, and Last.FM data spaces? We know the BBC publishes Linked Data, but what about Last.FM and MusicBrainz? Both of these data spaces only expose XML or JSON data via REST APIs.

Solution:

Simple 3 step Linked Data Meshup courtesy of Virtuoso's in-built RDFizer Middleware "the Sponger" (think ODBC Driver Manager for the Linked Data Web) and its numerous Cartridges (think ODBC Drivers for the Linked Data Web).

Steps:

  1. Go to Last.FM and search using pattern: Dr. Dre (you will end up with this URL: http://www.last.fm/music/Dr.+Dre)
  2. Go to the Virtuoso powered BBC Linked Data Space home page and enter: http://bbc.openlinksw.com/about/html/http://www.last.fm/music/Dr.+Dre
  3. Go to the BBC Linked Data Space home page and type full text pattern (using default tab): Dr. Dre, then view Dr. Dre's metadata via the Statistics Link.
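Step 2 boils down to prefixing the source page's URL with the proxy address. A tiny sketch of that composition follows; the prefix is copied from step 2, but treating it as a reusable prefix for other source URLs is my assumption here, and `sponger_url` is a name made up for this example.

```python
# Proxy prefix taken from step 2 above; reusing it for arbitrary
# source URLs is an assumption of this sketch.
SPONGER_PREFIX = "http://bbc.openlinksw.com/about/html/"

def sponger_url(source_url):
    """Prepend the proxy prefix to the URL of the page to be RDFized."""
    return SPONGER_PREFIX + source_url

print(sponger_url("http://www.last.fm/music/Dr.+Dre"))
# http://bbc.openlinksw.com/about/html/http://www.last.fm/music/Dr.+Dre
```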

What Happened?

The following took place:

  1. The Virtuoso Sponger sent an HTTP GET to Last.FM
  2. It distilled the "Artist" entity "Dr. Dre" from the page and made a Linked Data graph
  3. Inverse Functional Property and sameAs reasoning handled the Meshup (augmented graph from a conjunctive query processing pipeline)
  4. The result: links for "Dr. Dre" across BBC (sameAs) and Last.FM (seeAlso), via the DBpedia URI.
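For a feel of what sameAs reasoning does in step 3, here is a toy sketch that merges statements about co-referent identifiers using a small union-find. It is not how Virtuoso implements it; all `example.org` URIs and vocabulary terms are hypothetical stand-ins, and only the Last.FM page URL comes from the steps above.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs standing in for the same artist in three data spaces.
BBC = "http://example.org/bbc/dre#artist"
LASTFM = "http://example.org/lastfm/dre#this"
DBPEDIA = "http://example.org/dbpedia/Dr._Dre"

triples = [
    (BBC, OWL_SAME_AS, DBPEDIA),
    (LASTFM, OWL_SAME_AS, DBPEDIA),
    (BBC, "http://example.org/vocab/label", "Dr. Dre"),
    (LASTFM, "http://example.org/vocab/page", "http://www.last.fm/music/Dr.+Dre"),
]

parent = {}

def find(uri):
    """Return the representative URI of the co-reference set containing uri."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving
        uri = parent[uri]
    return uri

# Union every pair of URIs connected by owl:sameAs.
for s, p, o in triples:
    if p == OWL_SAME_AS:
        parent[find(s)] = find(o)

# Merge the remaining statements under one representative per entity.
merged = {}
for s, p, o in triples:
    if p != OWL_SAME_AS:
        merged.setdefault(find(s), {})[p] = o

print(len(merged))  # 1
```

All statements end up attached to a single representative, which is the "holistic view" in miniature.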

The new enhanced URI for Dr. Dre now provides a rich holistic view of the aforementioned "Artist" entity. This URI is usable anywhere on the Web for Linked Data Conduction :-)


Related (as in NearBy)




  • Augmenting Last.fm Data with BBC data on the Talis Platform

rdf xml odbc sql linked_data semanticweb web30 virtuoso DataSpace

02:09 PM

Understanding the BBC's Virtuoso Powered Linked Data Space

The BBC's recently announced Linked Data space for Programmes and Music data, joins a growing list of immediately useful "Virtuoso Powered" linked data spaces, driving the burgeoning Web of Linked Data. Others include: DBpedia, Bio2RDF, NeuroCommons etc (the click friendly version of the LOD-Cloud diagram reveals a snapshot of other Virtuoso driven linked data spaces).

Why is it important?

As a leading media organization, the BBC's use of Linked Data provides a clear beacon to other media players re. the imminence of a serious Linked Data induced sector inflection. In a nutshell, every Web Site has to evolve into a Linked Data Space: a location on the Web that provides granular access to discrete data items in line with the core principles of the Linked Data meme.

Remember, the essence of the Linked Data meme is simply this: you reference data items and access their metadata, in a variety of formats, via a single HTTP based URI. This approach to Web data publishing is compatible with any HTTP aware user agent (e.g., your Web Browser or tools & applications that provide abstracted access to HTTP).

How Do I use it?

There are a number of very powerful things available to end-users and developers alike.

End-Users:

The most powerful feature of our variant of the BBC's Linked Data Space is the exposure of Faceted Find (think Search++ and beyond). Thus, you can go to the home page of the service and commence data discovery and exploration via any of the following interfaces:

  • Full Text Search Tab -- type in a full text pattern and then experience Linked Data Entity Ranking as opposed to Page Ranking
  • URI Lookup (By Label) Tab -- type in part of a URI and let the system auto-complete by looking up Entity Labels
  • URI Lookup (Raw String Pattern) Tab -- type in part of a URI and let the system auto-complete by looking up the raw URI
  • OpenLink Data Explorer Service -- "deceptively simple" Linked Data explorer and Data Mesher (simply type in a URI or Text pattern, then view the data via a myriad of entity type specific viewer tabs).

Once you are comfortable with at least one of the items above, you can exploit the system further by performing any of the following:

  • Explore the Linked Data Space via Data Dictionary -- click on a Named Data Set URI and then explore Class instances (rdf:type property values)
  • Explore Entity Metadata -- currently labeled "Statistics" but really is "Metadata" that describes data about an Entity (how you discern identifier co-reference, indirect identifiers, references from other data sets, and provenance/source graphs).

Information Architects & Developers

  • Bare bones SPARQL Endpoint -- usable by SPARQL aware user agents
  • SPARQL Query Tool -- type in SPARQL and interact with result pages that enable URI navigation (de-referencing)
  • iSPARQL Query By Example -- paint your SPARQL Query and Learn SPARQL by Example (just take defaults and then click "OK" to get in)
  • Virtuoso Facets API - REST API for Faceted Browsing & Navigation across Linked Data Set Dimensions.
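For developers who haven't talked to a SPARQL endpoint before, here is a rough sketch of how a query is packaged into an HTTP GET URL via the `query` parameter defined by the SPARQL protocol. The endpoint address is hypothetical (the post doesn't spell out the real one), the query is just a generic "list some entity types" example, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address; substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Generic illustrative query: list up to ten distinct entity types.
query = """
SELECT DISTINCT ?type
WHERE { ?entity a ?type }
LIMIT 10
"""

# The SPARQL protocol accepts the query text in a "query" parameter of a GET request.
request_url = ENDPOINT + "?" + urlencode({"query": query.strip()})
print(request_url)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(request_url).query)["query"][0]
print(decoded == query.strip())  # True
```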


Disambiguated Search (aka. Search++ or Find)



In line with the time-tested "embrace and extend" pattern, we provide Full Text search capability, but unlike Google, Yahoo!, Bing and other search engines, we don't use a "Page Rank" algorithm to sort results; instead, we use an "Entity Rank" algorithm since we are dealing with an RDF based Graph model DBMS where links exist between entities across instance data and data dictionary (vocabularies, schemas, ontologies) boundaries. In addition, when you get results (by clicking "show values" or "show values with distinct counts") that list entities associated with a full text search pattern, we take a quantum leap beyond search engines by allowing you to use "Entity Type" and/or "Entity Properties" (all of these have HTTP URIs too) to set your own context for what you seek.
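The "set your own context" part can be pictured with a toy sketch: match a text pattern against labels, then narrow the hits by Entity Type. This says nothing about how Entity Rank is computed, and the entities, types, and the `find_entities` helper are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical entities and types under example.org.
triples = [
    ("http://example.org/id/1", RDF_TYPE, "http://example.org/types/Artist"),
    ("http://example.org/id/1", LABEL, "Dr. Dre"),
    ("http://example.org/id/2", RDF_TYPE, "http://example.org/types/Programme"),
    ("http://example.org/id/2", LABEL, "Dr. Dre in concert"),
]

def find_entities(pattern, entity_type=None):
    """Entities with a label containing pattern, optionally narrowed by type."""
    hits = {s for s, p, o in triples if p == LABEL and pattern.lower() in o.lower()}
    if entity_type is not None:
        typed = {s for s, p, o in triples if p == RDF_TYPE and o == entity_type}
        hits &= typed
    return sorted(hits)

print(find_entities("dr. dre"))
# ['http://example.org/id/1', 'http://example.org/id/2']
print(find_entities("dr. dre", "http://example.org/types/Artist"))
# ['http://example.org/id/1']
```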

Much more to come in the form of BBC specific demo queries and tutorials :-)

Related

  • Live LOD Cloud Cache instance that combines BBC data with other data sets from the LOD Cloud (in a single Virtuoso RDF DBMS hosting 5 Billion+ triples & counting)
rdf linked_data semanticweb sparql openlink virtuoso DataSpace

05:59 PM

The Time for RDBMS Primacy Downgrade is Nigh!

As the world works its way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS from its pivotal position at the apex of the data access and data management pyramid is nigh.

What is the Data Access, and Data Management Value Pyramid?

As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.

Image

The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.

In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operational excellence, or customer intimacy.

Why has RDBMS Primacy Endured?

Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.

Image

For more than 10 years -- at the very least -- the limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:

"Future of Database Research is excellent, but what is the future of data?"

"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."

-- Dr. Anant Jhingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.

"One size fits all: A concept whose time has come and gone"

  1. They are direct descendants of System R and Ingres and were architected more than 25 years ago
  2. They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.

-- Prof. Michael Stonebraker, one of the founding fathers of the RDBMS industry.

Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has retained its position of primacy, albeit on a "one size fits all" basis.

Circumstantial Pain

As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in the era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).

Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:

Government (Globally) -

Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.

Enterprises -

Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the variant that is inherently dynamic. For example, a tech company's intellectual capital far exceeds the value of fixtures, fittings, and buildings, but you'd be amazed to find that in most cases this vital asset has no significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.

In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations were made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a minuscule number of executives dare fantasize about being anywhere within distance of the "relevant information at your fingertips" vision.

Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, services (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yet even today "rip and replace" is still the norm pushed by most vendors, pitting one monoculture against another as exemplified by irrelevances such as: FOSS/LAMP vs Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues are not recognized, let alone addressed (see: Applications are Like Fish and Data Like Wine).

Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.

Technology

There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:

  1. Query language standardization - nothing close to SQL standardization
  2. Data Access API standardization - nothing close to ODBC, JDBC, OLE-DB, or ADO.NET
  3. Wire protocol standardization - nothing close to HTTP
  4. Distributed Identity infrastructure - nothing close to the non-repudiable digital Identity that foaf+ssl accords
  5. Use of Identifiers as network based pointers to data sources - nothing close to RDF based Linked Data
  6. Negotiable data representation - nothing close to Mime and HTTP based Content Negotiation
  7. Scalability especially in the era of Internet & Web scale.

Entity-Attribute-Value with Classes & Relationships (EAV/CR) data models

A common characteristic shared by all post-relational database management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.

What Comes Next?

The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:

  1. The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore e.g., the current global economic crisis remains centered on the inability to connect dots across "Open World" and "Closed World" data frontiers
  2. Entity-Attribute-Value with Classes & Relationships (EAV/CR) based DBMS models are more effective when dealing with disparate data associated with disparate schemas, across disparate DBMS engines, host operating systems, and networks.

Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:

  1. Every item of data (Datum/Entity/Object/Resource) has Identity
  2. Identity is achieved via Identifiers that aren't locked at the DBMS, OS, Network, or Application levels
  3. Object Identifiers and Object values are independent (extricably linked by association)
  4. Object values should be de-referencable via Object Identifier
  5. Representation of de-referenced value graph (entity, attributes, and values mesh) must be negotiable (i.e. content negotiation)
  6. Structured query language must provide mechanism for Creation, Deletion, Updates, and Querying of data objects
  7. Performance & Scalability across "Closed World" (enterprise) and "Open World" (Internet & Web) realms.
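A few of the characteristics above (identity via identifiers, values de-referencable by identifier, and create/update/delete/query operations) can be pictured with a toy in-memory store. This is only an illustration of the shape of the idea: the `EntityStore` class and the identifiers are hypothetical, and it ignores content negotiation, query languages, and scalability entirely.

```python
class EntityStore:
    """Toy in-memory EAV store keyed by identifier (illustration only)."""

    def __init__(self):
        self._data = {}  # identifier -> {attribute: value}

    def create(self, identifier, **attributes):
        self._data[identifier] = dict(attributes)

    def update(self, identifier, attribute, value):
        self._data[identifier][attribute] = value

    def delete(self, identifier):
        self._data.pop(identifier, None)

    def dereference(self, identifier):
        """Return a copy of the attribute/value graph for an identifier, or None."""
        record = self._data.get(identifier)
        return dict(record) if record is not None else None

    def query(self, attribute, value):
        """Identifiers of entities whose attribute equals value."""
        return sorted(i for i, r in self._data.items() if r.get(attribute) == value)

# Hypothetical identifiers; they are not tied to any DBMS, OS, or application.
store = EntityStore()
store.create("http://example.org/id/alice#this", name="Alice", role="analyst")
store.create("http://example.org/id/bob#this", name="Bob", role="analyst")
store.update("http://example.org/id/bob#this", "role", "engineer")

print(store.dereference("http://example.org/id/alice#this"))
# {'name': 'Alice', 'role': 'analyst'}
print(store.query("role", "analyst"))
# ['http://example.org/id/alice#this']
```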

Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.

The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS which is rooted in "Closed World" assumptions re., data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc.

It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.

EAV/CR Oriented Data Access & Management Technology

Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:

  • Resource Description Framework (RDF) - an EAV/CR based framework
  • RDF Linked Data - EAV/CR based framework that mandates de-referencable HTTP based Identifiers
  • ADO.NET Entity Frameworks - Microsoft .NET based EAV/CR framework
  • Core Data Services - Mac OS X based EAV/CR framework that evolved from NeXT's Enterprise Object Frameworks (EOF).

The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.

Related

  • The Semantic Way - Alan Cho's Summary of PwC 2009 tech forecast report on the Semantic Web
  • Is the RDBMS Doomed - ReadWriteWeb Article
  • Anti-RDBMS: a list of Distributed Key-Value Stores - by Richard Jones (CTO Last.FM)
  • How & Why Glue is Using Amazon SimpleDB
  • Object Database Manifesto (Identity excerpt)
  • Database Models Overview
  • Ted Nelson Explaining Irregularity and Idiosyncrasy of Data Structures - ZigZag Demo

Library of Congress & Reasonable Linked Data

While exploring the Subject Headings Linked Data Space (LCSH) recently unveiled by the Library of Congress, I noticed that the URI for the subject heading: World Wide Web, exposes an "owl:sameAs" link to resource URI: "info:lc/authorities/sh95000541" -- in fact, a URI.URN that isn't HTTP protocol scheme based.

The observations above triggered a discussion thread on Twitter that involved: @edsu, @iand, and moi. Naturally, it morphed into a live demonstration of: human vs machine, interpretation of claims expressed in the RDF graph.

What makes this whole thing interesting?

It showcases (in Man vs Machine style) the issue of unambiguously discerning the meaning of the owl:sameAs claim expressed in the LCSH Linked Data Space.

Perspectives & Potential Confusion

From the Linked Data perspective, it may spook a few people to see owl:sameAs values such as: "info:lc/authorities/sh95000541", that cannot be de-referenced using HTTP.

It may confuse a few people or user agents that see URI de-referencing as not necessarily HTTP specific, thereby attempting to de-reference the URI.URN on the assumption that it's associated with a "handle system", for instance.

It may even confuse RDFizer / RDFization middleware that use owl:sameAs as a data provider attribution mechanism via hint/nudge URI values derived from original content / data URI.URLs that de-reference to nothing e.g., an original resource URI.URL plus "#this" which produces URI.URN-URL -- think of this pattern as "owl:shameAs" in a sense :-)

Unambiguously Discerning Meaning

Simply bring OWL reasoning (inference rules and reasoners) into the mix, thereby negating human dialogue about interpretation which ultimately unveils a mesh of orthogonal view points. Remember, OWL is all about infrastructure that ultimately enables you to express yourself clearly i.e., say what you mean, and mean what you say.

Path to Clarity (using Virtuoso, its in-built Sponger Middleware, and Inference Engine):

  1. GET the data into the Virtuoso Quad store -- what the sponger does via its URIBurner Service (while following designated predicates such as owl:sameAs in case they point to other mesh-able data sources)
  2. Query the data in Quad Store with "owl:sameAs" inference rules enabled
  3. Repeat the last step with the inference rules excluded.

Actual SPARQL Queries:

  • SPARQL Query against the HTTP based Subject Heading URI for WWW
  • SPARQL Query (with reasoning via inference rule for owl:sameAs) against the URN based Subject Heading URI for WWW
  • SPARQL Query (*without* reasoning via inference rule for owl:sameAs) against the URN based Subject Heading URI for WWW
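For those who cannot follow the query links, the pair of URN based queries can be approximated as follows. This is only a sketch: the helper name describe_query is hypothetical, the exact text of the linked queries may differ, and it assumes Virtuoso's `DEFINE input:same-as "yes"` pragma is what switches the owl:sameAs inference rule on.

```python
# URN based Subject Heading URI for "World Wide Web", as quoted in this post.
URN_SUBJECT = "info:lc/authorities/sh95000541"

def describe_query(subject_uri, same_as=False):
    """Build a SPARQL query listing every predicate/object pair of a subject.

    When same_as is True the query is prefixed with the Virtuoso pragma
    that (by assumption here) enables owl:sameAs inference.
    """
    pragma = 'DEFINE input:same-as "yes"\n' if same_as else ""
    return f"{pragma}SELECT ?p ?o WHERE {{ <{subject_uri}> ?p ?o }}"

with_reasoning = describe_query(URN_SUBJECT, same_as=True)
without_reasoning = describe_query(URN_SUBJECT)
```

The two strings differ only in the leading pragma line, which is the whole point: same data, same question, with and without the inference rule.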

Observations:

The SPARQL queries against the Graph generated and automatically populated by the Sponger reveal -- without human intervention -- that "info:lc/authorities/sh95000541" is just an alternative name for <http://id.loc.gov/authorities/sh95000541#concept>, and that the graph produced by LCSH is self-describing enough for an OWL reasoner to figure this all out courtesy of the owl:sameAs property :-).
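What the reasoner does here can be imitated in a few lines of plain Python. This is a toy illustration of owl:sameAs handling, not Virtuoso's inference engine; the two-triple graph and the skos:prefLabel predicate are assumptions chosen to mirror the World Wide Web subject heading.

```python
SAME_AS = "owl:sameAs"
HTTP_URI = "http://id.loc.gov/authorities/sh95000541#concept"
URN_URI = "info:lc/authorities/sh95000541"

# Illustrative graph: one label plus the owl:sameAs claim discussed above.
triples = {
    (HTTP_URI, "skos:prefLabel", "World Wide Web"),
    (HTTP_URI, SAME_AS, URN_URI),
}

def aliases(uri, triples):
    """Return every name reachable from uri via owl:sameAs, in either direction."""
    seen, todo = {uri}, [uri]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            # owl:sameAs is symmetric, so follow the link both ways.
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen

def properties(uri, triples, same_as=False):
    """List (predicate, object) pairs for uri, optionally merging its aliases."""
    names = aliases(uri, triples) if same_as else {uri}
    return sorted((p, o) for s, p, o in triples if s in names and p != SAME_AS)
```

Asking for the properties of the URN with same_as=False yields nothing, because no triple uses the URN as its subject. With same_as=True the URN inherits the label recorded against the HTTP URI.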

Hopefully, this post also provides a simple example of how OWL facilitates "Reasonable Linked Data".

Related

  • State of the Linked Data Web
  • Making Linked Data Reasonable Using Description Logics Series - post by Mike Bergman

Linked Data & Identity

A person, organization, place, idea, subject matter topic/heading, and other real world things possess "identity" -- that is, a constellation of characteristics that distinguish them from any other identity. Associated with this abstraction can be a label used as a reference, or "identifier". This is the distinction between a thing and the name of the thing.

(Section from the IETF's DomainKeys spec, paraphrased by me.)

The Linked Data meme is based on the use of HTTP based URIs as reference / identifier labels associated with the "identity abstraction" referred to above. Thus, when you de-reference (request information about) an HTTP based URI you ultimately end up with a resource URL that exposes the "constellation of characteristics" mentioned above, in a representation negotiated at request time -- between an HTTP client and server e.g., (X)HTML, JSON, XML, RDF/XML, N3, Turtle, TriX, others :-)
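On the client side, "negotiated at request time" boils down to the Accept header. Here is a minimal sketch using Python's standard library; the function name describe_request is made up, the DBpedia URI is used purely as an example identifier, and the request is prepared but deliberately not sent.

```python
from urllib.request import Request

def describe_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type}, method="GET")

# Same identifier, two different requested representations.
turtle_request = describe_request("http://dbpedia.org/resource/Linked_Data")
html_request = describe_request("http://dbpedia.org/resource/Linked_Data", "text/html")
```

Passing either object to urllib.request.urlopen would perform the actual de-reference. Which representation comes back, and from which URL, is up to the server.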

Related

  • What is the Linked Data meme About?
  • Simple Explanation of RDF & Linked Data Dynamics.
  • Handle -- Internet wide Identity Scheme and Resolution System

What is the Linked Data Meme about?

The act of using URIs to "refer to" (reference) Web addressable data objects. It's also the act of using the same URI to de-reference the description of a referenced data object; in this case, the representation of the description is negotiated by a Web client and/or Web server. Thus, you can access the description of a data object via data representation formats such as: JSON, XML, (X)HTML, RDF/XML, N3, Turtle, TriX etc.

Note: In proper Web parlance, a data object is referred to as a resource.

Simple example (using DBpedia)

In the Linked Data realm, if you want to make a reference to the Linked Data meme in a blog post, you are better off using the resource URI: http://dbpedia.org/resource/Linked_Data, instead of the Web page URL: http://dbpedia.org/page/Linked_Data, which is the address of a physical document (an information conveying artifact) that at best visually presents the negotiated representation of a resource description.
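The relationship between the one resource URI and its several document URLs can be sketched as a small function. This is a simplification of what a server such as DBpedia does when it redirects a request: the /page/ address comes from the example above, while the /data/ address for non-HTML representations and the crude "html in accept" test are assumptions made for illustration.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"

def document_url(resource_uri, accept="text/html"):
    """Map a resource URI to the document URL that would describe it."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a DBpedia resource URI")
    name = resource_uri[len(RESOURCE_PREFIX):]
    # HTML goes to the human-readable page; anything else to the data document.
    section = "page" if "html" in accept else "data"
    return f"http://dbpedia.org/{section}/{name}"
```

The document URLs vary with the requested format; the resource URI you cite in your blog post does not.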

Why is this valuable?

In the simplest sense, you only have one focal point for referencing (referring to) and de-referencing (retrieving data about) a given Web resource. It protects you from the impact of Web document location changes (amongst many other things).

Remember, a single URI is a conduit into a realm where the identity, access, representation, presentation, and storage of a resource (data object) are completely distinct. It's the mechanism for conducting data across network, machine, operating system, dbms engine, application, and service (API) boundaries. Thus, without "linked data meme" prescribed URI referencing and de-referencing, we are simply back to "business as usual" re. the industry at large, where networks, operating systems, dbms engines, applications, and services (APIs) become the basis for "data lock-in" and silo construction.

Going forward

Take a second to think about the profound virtues of the ubiquitous Web of Linked Document URLs that we have today, and then apply that thinking to the burgeoning Web of Linked Data URIs, which has just turned the corner and is heading in everyone's direction at full blast.

Note to "Social Media" players: Who you know isn't the canonical object of sociality. What you are i.e., your description and the data objects it exposes, are real objects of your sociality :-)

Related

  • Other posts in this Blog Data Space associated with "Linked Data".
