PTSW4Sitemaps

From Wiki of the E-Business and Web Science Research Group


PingTheSemanticWeb for Semantic Sitemaps

PingTheSemanticWeb is a valuable service for the Linked Data initiative.

Unfortunately, it does not currently accept bulk notifications for datasets for which a semantic sitemap is already available.

Below, we provide a free Python script (written for Python 2) that notifies PingTheSemanticWeb of every data dump location listed in a given semantic sitemap.

Script

 
#!/usr/bin/env python
# encoding: utf-8
"""
pingTheSemanticWeb.py
 
Bulk Notification of PingTheSemanticWeb using a Semantic Sitemap
 
Created by Martin Hepp on 2009-06-14.
This software is free software under the LGPL.
 
Acknowledgements: Some inspiration on using urllib came from Doug Hellmann,
http://www.doughellmann.com/PyMOTW/urllib/
"""
 
import urllib
from xml.dom import minidom
 
SITEMAP_URI = 'http://rdf4ecommerce.esolda.com/sitemap.xml' # insert URI of your sitemap here
PTSW_URL = 'http://pingthesemanticweb.com/ping.php?url='
SEMSITEMAP_NS = 'http://sw.deri.org/2007/07/sitemapextension/scschema.xsd'
 
def get_listof_URIs(sitemapURI):
	"""Extract a list of all data dump locations from the semantic sitemap at <sitemapURI>"""
	sitemap = urllib.urlopen(sitemapURI)
	dom = minidom.parse(sitemap)
	resources = []
	elements = dom.getElementsByTagNameNS(SEMSITEMAP_NS,'dataDumpLocation')
	for location in elements:
		resources.append(str(location.firstChild.data))
	return resources
 
def call_ptsw(address):
	"""Notify PingTheSemanticWeb via http GET of the RDF content at <address>"""
	p = urllib.quote(address)
	url = PTSW_URL + p
	print "PTSW:", url
	response = urllib.urlopen(url)	
	headers = response.info()
	print 'DATE    :', headers['date']
	print 'HEADERS :'
	print headers
 
# Main: ping PTSW once for each data dump location listed in the sitemap
uri_list = get_listof_URIs(SITEMAP_URI)
total = len(uri_list)
for counter, address in enumerate(uri_list, 1):
	print "URI %d of %d: %s" % (counter, total, address)
	call_ptsw(address)
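
The script above uses Python 2 syntax (print statements, urllib.urlopen). For readers on Python 3, its two core steps can be sketched roughly as follows. This is only an illustration and not part of the original script: the function names extract_dump_locations and build_ping_url, the sample sitemap, and the example.org URLs are all assumptions made for the example. The sketch parses the sitemap from a string and only prints the ping URLs, so it runs without network access.

```python
# Python 3 sketch of the script's two core steps. The sample sitemap and
# the example.org URLs are made up for illustration.
from urllib.parse import quote
from xml.dom import minidom

PTSW_URL = 'http://pingthesemanticweb.com/ping.php?url='
SEMSITEMAP_NS = 'http://sw.deri.org/2007/07/sitemapextension/scschema.xsd'

# Hypothetical semantic sitemap listing two data dumps
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:dataDumpLocation>http://example.org/dump/part1.rdf</sc:dataDumpLocation>
    <sc:dataDumpLocation>http://example.org/dump/part2.rdf</sc:dataDumpLocation>
  </sc:dataset>
</urlset>"""

def extract_dump_locations(sitemap_xml):
    """Return the text of every sc:dataDumpLocation element in the XML string."""
    dom = minidom.parseString(sitemap_xml)
    elements = dom.getElementsByTagNameNS(SEMSITEMAP_NS, 'dataDumpLocation')
    # Skip empty elements and trim surrounding whitespace
    return [el.firstChild.data.strip() for el in elements if el.firstChild]

def build_ping_url(address):
    """Build the GET URL for one address, quoting it as the script does."""
    return PTSW_URL + quote(address)

if __name__ == '__main__':
    for address in extract_dump_locations(SAMPLE_SITEMAP):
        print(build_ping_url(address))  # print only; no request is sent
```

To actually send the notifications under Python 3, each URL returned by build_ping_url could be passed to urllib.request.urlopen, which replaces the Python 2 call urllib.urlopen used above.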
 


Usage

Simply replace

'http://rdf4ecommerce.esolda.com/sitemap.xml'

with the URI of your sitemap.xml file in the following line:

SITEMAP_URI = 'http://rdf4ecommerce.esolda.com/sitemap.xml'
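
As an alternative to editing the source, the sitemap URI could be read from the command line. The following is a minimal sketch of that idea, not a feature of the original script; the helper name sitemap_uri_from_args and the constant DEFAULT_SITEMAP_URI are hypothetical. It is written so that it runs under both Python 2 and Python 3.

```python
import sys

# Fallback used when no argument is given (the URI from the script above)
DEFAULT_SITEMAP_URI = 'http://rdf4ecommerce.esolda.com/sitemap.xml'

def sitemap_uri_from_args(argv, default=DEFAULT_SITEMAP_URI):
    """Return the first command-line argument, or the default if none was given."""
    return argv[1] if len(argv) > 1 else default

if __name__ == '__main__':
    SITEMAP_URI = sitemap_uri_from_args(sys.argv)
    print('Using sitemap: %s' % SITEMAP_URI)
```

With such a helper in place, the script could be started as python pingTheSemanticWeb.py followed by the URI of your sitemap.xml file, and would fall back to the built-in URI when no argument is given.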