Finally! That's all I can say re. Freebase :-) They've now plugged their database and their community driven data curation efforts into the burgeoning Linked DataWeb.
Here are some examples of how we distill Entities (People, Places, Music, and other things) from Freebase (X)HTML pages (meaning: we don't have to start from RDF information resources as data sources for the eventual RDF Linked Data we generate):
Tip: Install our OpenLink Data Explorer extension for Firefox. Once installed, simply browse through Freebase, and whenever you encounter a page about something of interest, simply use the following sequences to distill (via the Page Description feature) the entities from the page you are reading:
As you explore the Linked Data graph exposed via our product portfolio, I expect you to experience, or at least spot, the virtuous potential of high SDQ (Serendipitous Discovery Quotient) courtesy of Linked Data, which is Web 3.0's answer to SEO. For instance, how Database, Operating System, and Processor family paths in the product portfolio graph (data network) unveil a lot more about OpenLink Software than meets the proverbial "eye" :-)
Runtime hosting is functionality realm of Virtuoso that is sometimes easily overlooked. In this post I want to provide a simple no-hassles HOWTO guide for installing Virtuoso on Windows (32 or 64 Bit), Mac OS X (Universal or Native 64 Bit), and Linux (32 or 64 Bit). The installation guide also covers the instantiation of phpBB3 as verification of the Virtuoso hosted PHP 3.5 runtime.
What are the benefits of PHP Runtime Hosting?
Like Apache, Virtuoso is a bona-fide WebApplication Server for PHP based applications. Unlike Apache, Virtuoso is also the following:
a Hybrid Native DBMS Engine (Relational, RDF-Graph, and Document models) that is accessible via industry standard interfaces (solely)
a Virtual DBMS or Master Data Manager (MDM) that virtualizes heterogeneous data sources (ODBC, JDBC, Web Services, Hypermedia Resources, Non Hypermedia Resources)
an RDF Middleware solution for RDF-zation of non RDF resources across the Web and enterprise Intranets and/or Extranets (in the form of Cartridges for data exposed via REST or SOA oriented SOAP interfaces)
an RDF Linked Data Server (meaning it can deploy RDF Linked Data based on its native and/or virtualized data)
As result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:
Use of PHP-iODBC for in-process communication with Virtuoso
Easy generation of RDF Linked Data Views atop the SQL schemas of PHP applications
Easy deployment of RDF Linked Data from virtualized data sources
Less LAMP monoculture (*there is no such thing as virtuous monoculture*) when dealing with PHP based Web applications.
As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP based content managers, should simply come down to RDF Views over the SQL Schemas and deployment / publishing of the RDF Views in RDF Linked data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre packaged VADs (Virtuoso Application Distribution packages), for popular PHP based applications such as: phpBB3, Drupal, WordPress, and MediaWiki.
In addition, to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going, simply install Virtuoso, and then select a VAD package for the relevant application and you're set. If the application of choice isn't pre packaged by us, simply install as you would when using Apache, which comes dow to situating the PHP files in your Web structure under the Web Application's root directory.
Installation Guide
Download the Virtuoso installer for Windows (32 Bit msi file or 64 Bit msi file), Mac OS X (Universal Binary dmg file), or instantiate the Virtuoso EC2 AMI (*search for pattern: "Virtuoso when using the Firefox extension for EC2 as the AMI ID is currently: ami-7c31d515 and name: virtuoso-test/virtuoso-cloud-beta-9-i386.manifest.xml, for latest cut*)
Run the installer (or download the movies using the links in the related section below)
Go to the Virtuoso Conductor (*which will show up at the end of the installation process* or go to http://localhost:8890/conductor)
Go to the "Admin" tab within the (X)HTML based UI and select the "Packages" sub-menu item (a Tab)
Pick phpBB3 (or any other pre-packaged PHP app) and then click on "Install/Upgrase"
The watch one of my silent movies or read the initial startup guides for Virtuoso hosted phpBB3, Drupal, Wordpress, MediaWiki.
Related
At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)
Typically, Orri's post are targeted at the hard core RDF and SQL DBMS audiences, but in this particular post, he shoots straight at the business community revealing "Opportunity Cost" containment as the invisible driver behind the business aspects of any market inflection.
Remember, the Web isn't ubiquitous because its users mastered the mechanics and virtues of HTML and/or HTTP. Web ubiquity is a function of the opportunity cost of not being on the Web, courtesy of the network effects of hyperlinked documents -- i.e., the instant gratification of traversing documents on the Web via a single click action. In similar fashion, the Linked Data Web's ubiquity will simply come down to the opportunity cost of not being "inside the Web", courtesy of the network effects of hyperlinked entities (documents, people, music, books, and other "Things").
Here are some excerpts from Orri's post:
Every time there is a major shift in technology, this shift needs to be motivated by addressing a new class of problem. This means doing something that could not be done before. The last time this happened was when the relational database became the dominant IT technology. At that time, the questions involved putting the enterprise in the database and building a cluster of line of business applications around the database. The argument for the RDBMS was that you did not have to constrain the set of queries that might later be made, when designing the database. In other words, it was making things more ad hoc. This was opposed then on grounds of being less efficient than the hierarchical and network databases which the relational eventually replaced.
Today, the point of the Data Web is that you do not have to constrain what your data can join or integrate with, when you design your database. The counter-argument is that this is slow and geeky and not scalable. See the similarity?
A difference is that we are not specifically aiming at replacing the RDBMS. In fact, if you know exactly what you will query and have a well defined workload, a relational representation optimized for the workload will give you about 10x the performance of the equivalent RDF warehouse. OLTP remains a relational-only domain.
However, when we are talking about doing queries and analytics against the Web, or even against more than a handful of relational systems, the things which make RDBMS good become problematic.
If we think about Web 1.0 as a period where the distinguishing noun was: "Author", and Web 2.0 the noun: "Journalist", we should be able to see that what comes next is the noun: "Analyst". This new generation analyst would be equipped with de-referencable Web Identity courtesy of their Person EntityURI. The analyst's URI would also be the critical component of Web based low cost attribution ecosystem; one that ultimately turns the URI into the analyst's brand emblem / imprint.
You will control your data in the Web 3.0 realm. If somehow this remains somewhat incomprehensible and nebulous (as is typical in this emerging realm) then simply think about this as: The Magic of You!
Remember, "You" was the Times person of the year as an acknowledgement of the Web 2.0 phenomenon, and maybe this time next year it would simply be the "Magic of Being You" that's the person of the year :-)
Web 3.0 brings databasing to the Web (as a feature). The single most important action item at this stage is the act of creating a record for yourself, in this new distributed database held together by an HTTP based Network (e.g., the World Wide Web).
Now, if re-labeling can confuse me when applied to a realm I've been intimately involved with for eons (internet time). I don't want to imagine what it does for others who aren't that intimately involved with the important data access and data integration realms.
On the more refreshing side, the article does shed some light on the potency of RDF and OWL when applied to the construction of conceptual views of heterogeneous data sources.
"How do you know that data coming from one place calculates net revenue the same way that data coming from another place does? You’ve got people using the same term for different things and different terms for the same things. How do you reconcile all of that? That’s really what semantic integration is about."
BTW - I discovered this article via another titled: Understanding Integration And How It Can Help with SOA, that covers SOA and Integration matters. Again, in this piece I feel the gradual realization of the virtues that RDF, OWL, and RDF Linked Data bring to bear in the vital realm of data integration across heterogeneous data silos.
Conclusion
A number of events, at the micro and macro economic levels, are forcing attention back to the issue of productive use of existing IT resources. The trouble with the aforementioned quest is that it ultimately unveils the global IT affliction known as: heterogeneous data silos, and the challenges of pain alleviation, that have been ignored forever or approached inadequately as clearly shown by the rapid build up of SOA horror stories in the data integration realm.
Data Integration via conceptualization of heterogenous data sources, that result in concrete conceptual layer data access and management, remains the greatest and most potent application of technologies associated with the "Semantic Web" and/or "Linked Data" monikers.
As more Linked Data is injected into the Web from the Linking Open Data community and other initiatives, it's important to note that "Linked Data" is available in a variety of forms such as:
Data Model Definition oriented Linked Data (aka. Data Dictionary)
Data Model Instance Data (aka. Instance Data)
Linked Data oriented solutions that leverage the smart data substrate that Models and Instance Data meshes deliver.
Note: The common glue across the different types of Linked Data remains the commitment to data object (entity) identification and access via de-referencable URIs (aka. record / entity level data source names).
As stated in my recent post titled: Semantic Web: Travails to Harmony Illustrated. Harmonious intersections of instance data, data dictionaries (schemas, ontologies, rules etc.) provide a powerful substrate (smart data) for the development and deployment of "People" and/or "Machine" oriented solutions. Of course, others have commented on these matters and expressed similar views (see related section below).
The clickable venn diagram below, provides a simple exploration path that exposes the linkage that already exists, across the different Linked Data types, within the burgeoning Linked Data Web.
Now that the virtues of dynamic generation of RDF based Linked Data are becoming clearer, I guess it's time to unveil the VirtuosoSponger driven Dynamic Linked Data constellation diagram.
Our diagram depicts the myriad of data sources from which RDF Linked Data is generated "on the fly" via our data source specific RDF-zation cartridges/drivers. It also unveils how the sponger leverages the Linked Data constellations of UMBEL, DBpedia, Bio2Rdf, and others for lookups.
RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.
If the RDF generated, results in an entity-to-entity level network (graph) in which each entity is endowed with a de-referencable HTTP based ID (a URI), we end up with an enhancement to the Web that adds Hyperdata linking across extracted entities, to the existing Hypertext based Web of linked documents (pages, images, and other information resource types). Thus, I can use the same URL linking mechanism to reference a broader range of "Things" i.e., documents, things that documents are about, or things loosely associated with documents.
The VirtuosoSponger is an example of an RDF Middleware solution from OpenLink Software. It's an in-built component of the Virtuoso Universal Server, and deployable in many forms e.g., Software as Service (SaaS) or traditional software installation. It delivers RDF-ization services via a collection of Web information resource specific Cartridges/Providers/Drivers covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary, Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many more.
RDF-ization alone doesn't ensure valuable RDF based Linked Data on the Web. The process of producing RDF Linked Data is ultimately about the art of effectively describing resources with an eye for context.
Linked Data Cloud Lookups (e.g., perform UMBEL lookup to add "isAbout" fidelity to graph and then lookup DBpedia and other LOD instance data enclaves for Identical individuals and connect via "owl:sameAs")
RDF Linked Data Graph projection that uses the description of the container information resource to expose the URIs of the distilled entities.
The animation that follows illustrates the process (5,000 feet view), from grabbing resources via HTTP GET, to injecting RDF Linked Data back into the Web cloud:
Note: the Shredder is a Generic Cartridge, so you would have one of these per data source type (information resource type).
Generation of RDF Linked Data Views of SQL, XML, and Web Services in general
Deployment of RDF Linked Data
"On the Fly" generation of RDF Linked Data from Document Web information resources (i.e. distillation of entities from their containers e.g. Web pages) via Cartridges / Drivers
SPARQL extensions that bring SPARQL closer to SQL e.g Aggregates, Update, Insert, Delete
Named Graph support (i.e. use of logical names to partition RDF data within Virtuoso's multi-model dbms engine)
Inference Engine (currently in use re. DBpedia via Yago and UMBEL)
Binds all your data sources (blogs, wikis, bookmarks, photos, calendar items etc. ) to your URI so can "Find" things by only remembering your URI
Makes your profile page and personal URI the focal point of Linked Data Web presence
Delivers Data Portability (using data access by value or data access by reference) across data silos (e.g. Web 2.0 style social networks)
Allows you make annotations about anything in your own Data Space(s) on the Web without exposure to RDF markup
A Briefcase feature that provides a WebDAV driven RDF Linked Data variant of functionality seen in Mac OS X Spotlight and WinFS with the addition of SPARQL compliance
Blog, Wiki, WebDAV File Server, Shared Bookmarks, Calendar, and other applications that look and feel like Web 2.0 counterparts but emitt RDF Linked Data amongst a plethora of data exchange formats
Available as an EC2 AMI
etc..
OpenLink Ajax Toolkit functionality summary:
Provides binding to SQL, RDF, XML, and Web Services via Ajax Database Connectivity Layer (you only need an ODBC, JDBC, OLE-DB, ADO.NET, XMLA Driver, or Web Service on the backend for dynamic data access from Javascript)
All controls are Ajax Database Connectivity bound (widgets get their data from Ajax Database Connectivity data sources)
Recent Comments