Python4Spongers

From Wiki of the E-Business and Web Science Research Group

RDF "Spongers" are a powerful middleware architecture, developed by OpenLink Software (the makers of Virtuoso), for creating rich RDF meta-data on demand.


The key idea is that the middleware consults public APIs or other data sources to collate relevant RDF meta-data for a given URI.
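The dispatch idea can be sketched in plain Python: a registry maps URI patterns to handler functions, and the first handler whose pattern matches a URI produces the meta-data. This is a minimal sketch under stated assumptions. The registry and the names `sponger`, `sponge` and `amazon_product` are hypothetical and not part of OpenLink's software, and triples are plain string tuples rather than rdflib objects.

```python
import re

# Hypothetical registry: each entry pairs a compiled URI pattern with a
# handler that turns the regex match into (subject, predicate, object) triples.
SPONGERS = []


def sponger(pattern):
    """Decorator that registers a handler for URIs matching `pattern`."""
    def register(handler):
        SPONGERS.append((re.compile(pattern), handler))
        return handler
    return register


@sponger(r"^https?://www\.amazon\.com/.*/dp/(\w+)")
def amazon_product(match):
    # Placeholder data; a real handler would query the source's API here.
    subject = "http://example.com/uriburner/%s#Product" % match.group(1)
    label = "http://www.w3.org/2000/01/rdf-schema#label"
    return [(subject, label, "SampleProduct")]


def sponge(uri):
    """Return the triples of the first handler whose pattern matches `uri`."""
    for pattern, handler in SPONGERS:
        match = pattern.search(uri)
        if match:
            return handler(match)
    return []  # no registered sponger knows this kind of URI


print(sponge("http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/"))
```

Adding support for another data source then means writing one more decorated handler; URIs that no pattern recognises simply yield no triples.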

Unfortunately, the development of such sponger components is still difficult for many programmers. On this page, I propose a simple skeleton for coding the core transformation in Python.


This skeleton still needs a wrapper before it can be used in a Virtuoso environment, but writing one should be doable.
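One generic way to make the transformation reachable from other components is to expose it over HTTP. The sketch below uses Python's standard WSGI interface. It is not how Virtuoso itself integrates sponger code, and the `uri` query parameter, the chosen status codes, and the stand-in `rdf4uri` (which only mimics the real function's contract of returning RDF/XML or raising `ValueError`) are assumptions for illustration.

```python
from urllib.parse import parse_qs


def rdf4uri(uri):
    # Hypothetical stand-in for the rdf4uri() function on this page.
    if "/dp/" not in uri:
        raise ValueError("no product identifier found in %r" % uri)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>\n')


def app(environ, start_response):
    """WSGI app: GET /?uri=<page URI> returns RDF/XML for that page."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    uri = params.get("uri", [""])[0]
    if not uri:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing 'uri' parameter\n"]
    try:
        body = rdf4uri(uri).encode("utf-8")
    except ValueError as exc:  # the URI is not one this sponger understands
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [str(exc).encode("utf-8")]
    start_response("200 OK", [("Content-Type", "application/rdf+xml"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `make_server("localhost", 8000, app).serve_forever()` from `wsgiref.simple_server` would serve it; the port is arbitrary.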


If you have any questions or suggestions, please contact me at mheppATcomputerDOTorg.


Example:
 
#!/usr/bin/env python3
# encoding: utf-8
"""
py4spongers.py
 
Example of how the principle of OpenLink "Sponger" technology can be implemented in Python
 
Created by Martin Hepp on 2010-03-22.
http://www.heppnetz.de/
 
This software is free software under the LGPL.
 
"""

import re

from rdflib import Graph, Literal, Namespace


def rdf4uri(uri="", base_uri="http://example.com/uriburner/"):
    """
    Return the available RDF meta-data for the Web page identified by <uri>
    as a string containing RDF/XML.

    Input parameters:
      uri      : URI of the page
      base_uri : base URI to be used for the RDF model

    Returns:
      a string containing RDF/XML

    Raises:
      ValueError if no entity identifier can be found in <uri>
    """
    # Step 1: Fetch entity identifier from URI

    # Amazon example:
    # http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/ref=sr_1_9?ie=UTF8&s=electronics&qid=1269264339&sr=8-9
    # http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/
    #
    # We use a simple regex to extract the ID from the URI (needs to be adapted for each sponger)
    m = re.search(r"/dp/(\w+)", uri)
    if m is None:
        raise ValueError("no product identifier found in %r" % uri)
    identifier = m.group(1)

    # Step 2: Fetch meta-data for that data entity, e.g. via the Amazon API
    # Contacting the API is omitted; this example simply returns static data.

    # Step 3: Compile RDF graph
    NS = Namespace(base_uri)
    GR = Namespace("http://purl.org/goodrelations/v1#")
    RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")
    RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")

    g = Graph()
    product = NS[identifier + "#Product"]
    # Static dummy data, to be replaced by real content from the API
    g.add((product, RDF["type"], GR["ProductOrServicesSomeInstancesPlaceholder"]))
    g.add((product, RDFS["label"], Literal("SampleProduct")))

    # Step 4: Return graph as RDF/XML. The format is given explicitly because
    # rdflib 6 and later default to Turtle (and return str; older versions
    # return bytes).
    return g.serialize(format="xml")


if __name__ == "__main__":
    rdf_xml = rdf4uri(uri="http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/")
    print(rdf_xml)
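Steps 2 and 3 are where real content would replace the static dummy data. A minimal sketch of that hand-off, assuming the API response has already been decoded into a plain dict: `fetch_metadata`, the field names `title` and `description`, and the field-to-property mapping are all hypothetical. To stay dependency-free it writes N-Triples strings directly; in the rdflib script, each line would instead become a `g.add(...)` call on the graph, like the `rdfs:label` triple in Step 3.

```python
# Hypothetical stand-in for Step 2: a real sponger would call the data
# source's API here and decode its response into a dict like this one.
def fetch_metadata(identifier):
    return {"title": "SampleProduct", "description": "Static placeholder text"}


RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed mapping from API field names to RDF properties (adapt per source).
FIELD_TO_PROPERTY = {
    "title": RDFS + "label",
    "description": RDFS + "comment",
}


def escape(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def metadata_to_ntriples(identifier, metadata,
                         base_uri="http://example.com/uriburner/"):
    """Step 3: turn one metadata record into N-Triples lines."""
    subject = "<%s%s#Product>" % (base_uri, identifier)
    lines = []
    for field, value in sorted(metadata.items()):
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # unmapped fields are skipped rather than guessed
        lines.append('%s <%s> "%s" .' % (subject, prop, escape(str(value))))
    return lines


for line in metadata_to_ntriples("B002M3SOBU", fetch_metadata("B002M3SOBU")):
    print(line)
```

Keeping the mapping in a table means that adapting the sponger to a new API is mostly a matter of editing `FIELD_TO_PROPERTY`.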
 
 


In its current form, the script returns the same static pattern for every valid Amazon product URI; only the identifier in the subject URI changes:

 
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.com/uriburner/B002M3SOBU#Product">
    <rdf:type rdf:resource="http://purl.org/goodrelations/v1#ProductOrServicesSomeInstancesPlaceholder"/>
    <rdfs:label>SampleProduct</rdfs:label>
  </rdf:Description>
</rdf:RDF>
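To see which statements the RDF/XML above actually makes, it can help to flatten it into N-Triples, one statement per line. The following is a minimal sketch using only the standard library. It handles just this flat shape (`rdf:about` subjects, property elements carrying either `rdf:resource` or plain text) and is not a general RDF/XML parser; `flat_rdfxml_to_ntriples` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The RDF/XML output shown above, copied verbatim.
OUTPUT = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.com/uriburner/B002M3SOBU#Product">
    <rdf:type rdf:resource="http://purl.org/goodrelations/v1#ProductOrServicesSomeInstancesPlaceholder"/>
    <rdfs:label>SampleProduct</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def flat_rdfxml_to_ntriples(xml_text):
    """List the triples of flat rdf:Description blocks as N-Triples lines."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    lines = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; in RDF/XML the
            # predicate IRI is the namespace followed by the local name.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get("{%s}resource" % RDF_NS)
            if resource is not None:
                obj = "<%s>" % resource
            else:
                text = (prop.text or "").replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
                obj = '"%s"' % text
            lines.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return lines


for line in flat_rdfxml_to_ntriples(OUTPUT):
    print(line)
```

This shows that the output contains exactly two statements about the product: its GoodRelations type and its label. For real data, a full RDF parser such as rdflib's is the safer choice.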
 
 