Python4Spongers
RDF "Spongers" are a powerful middleware architecture, developed by OpenLink Software (the makers of Virtuoso), for creating rich RDF meta-data on demand.
The key idea is that the middleware consults public APIs or other data sources to collate the relevant RDF meta-data for a given URI.
Unfortunately, developing such sponger components is still difficult for many programmers. On this page, I propose a simple skeleton for coding the core transformation in Python.
The skeleton still needs a wrapper before it can be used in a Virtuoso environment, but that should be doable.
If you have any questions or suggestions, please contact me at mheppATcomputerDOTorg.
#!/usr/bin/env python
# encoding: utf-8
"""
py4spongers.py

Example of how the principle of OpenLink "Sponger" technology can be implemented in Python

Created by Martin Hepp on 2010-03-22.
http://www.heppnetz.de/

This software is free software under the LGPL.
"""

import re

from rdflib import Graph, Literal, Namespace


def rdf4uri(uri="", base_uri="http://example.com/uriburner/"):
    """
    Return the available RDF meta-data for the Web page identified by <uri>
    as a string containing RDF/XML.

    Input parameters:
        uri      : URI of the page
        base_uri : Base URI to be used for the RDF model

    Output:
        a string containing RDF/XML
    """
    # Step 1: Fetch entity identifier from URI
    # Amazon example:
    # http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/ref=sr_1_9?ie=UTF8&s=electronics&qid=1269264339&sr=8-9
    # http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/
    #
    # We use a simple regex to extract the ID from the URI
    # (needs to be adapted for each sponger)
    p = re.compile(r".*/dp/(\w+)/.*")
    m = p.match(uri)
    if m is None:
        # Without this check, m.group(1) fails with an AttributeError
        raise ValueError("No product identifier found in URI: %s" % uri)
    identifier = m.group(1)

    # Step 2: Fetch meta-data for that data entity, e.g. via the Amazon API
    # contact API --> omitted in this example
    # In this example, we simply return static data

    # Step 3: Compile RDF graph
    NS = Namespace(base_uri)
    GR = Namespace("http://purl.org/goodrelations/v1#")
    RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")
    RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
    g = Graph()
    # static dummy data, to be replaced by real content from the API
    g.add((NS[identifier + "#Product"], RDF["type"], GR["ProductOrServicesSomeInstancesPlaceholder"]))
    g.add((NS[identifier + "#Product"], RDFS["label"], Literal("SampleProduct")))

    # Step 4: Return graph as RDF/XML
    # (the format is given explicitly because newer rdflib releases default to Turtle)
    return g.serialize(format="xml")


if __name__ == "__main__":
    rdf_xml = rdf4uri(uri="http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/")
    print(rdf_xml)
In its current form, the script returns the same static pattern for every valid Amazon product URI, with only the product identifier changing:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.com/uriburner/B002M3SOBU#Product">
    <rdf:type rdf:resource="http://purl.org/goodrelations/v1#ProductOrServicesSomeInstancesPlaceholder"/>
    <rdfs:label>SampleProduct</rdfs:label>
  </rdf:Description>
</rdf:RDF>
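Step 1 of the skeleton relies on a regex that expects a slash after the product identifier, so a URI that ends directly after the ID, or continues with a query string, would not match. The following is a minimal sketch of a more tolerant extraction step, using only the standard library. The helper name `extract_identifier` and the pattern are hypothetical and not part of py4spongers.py, and the sketch assumes that the identifier always follows a `/dp/` path segment, as in the two example URIs above.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper, not part of py4spongers.py.
# Assumption: the product identifier is the path segment right after "/dp/".
_DP_PATTERN = re.compile(r"/dp/(\w+)(?:/|$)")


def extract_identifier(uri):
    """Return the identifier following /dp/ in the URI path, or None."""
    # urlparse separates the path from the query string and fragment,
    # so trailing parameters such as "?ie=UTF8" cannot disturb the match.
    path = urlparse(uri).path
    match = _DP_PATTERN.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    examples = [
        "http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/ref=sr_1_9?ie=UTF8&s=electronics",
        "http://www.amazon.com/dp/B002M3SOBU?tag=example",
        "http://www.amazon.com/some/other/page",
    ]
    for example in examples:
        print(extract_identifier(example))
```

Returning `None` instead of raising leaves it to the caller to decide whether an unrecognized URI is an error or simply a page this sponger does not handle. Other data sources would need their own pattern.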