PCS2OWL Evaluation Website

This page accompanies the evaluation of a scientific paper entitled "PCS2OWL: An Approach for Deriving Web Ontologies from Product Classification Systems". On this page we present the source code and queries used to demonstrate the conceptual correctness of the ontology conversion of the Google product taxonomy, one of the product classification systems currently supported by the PCS2OWL tool. To learn more about the project and the other product classification systems that have been converted so far, please refer to the project landing page at

http://www.ebusiness-unibw.org/ontologies/pcs2owl/

Reverse-Engineering the Google Product Taxonomy

One step in the evaluation for our paper was to reverse-engineer the original Google product taxonomy file from the OWL ontology file generated by the PCS2OWL converter. The first lines of the original file look as follows:

Animals & Pet Supplies
Animals & Pet Supplies > Live Animals
Animals & Pet Supplies > Pet Supplies
Animals & Pet Supplies > Pet Supplies > Bird Supplies
Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cages & Stands
Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Food
Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Ladders & Perches
Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Toys
Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Treats
Animals & Pet Supplies > Pet Supplies > Cat Supplies
...

The live SPARQL query examples given below are executed against a SPARQL endpoint that contains the product ontology derived from the Google product taxonomy, stored under the graph name "urn:google". In the following, we first give a step-by-step example that presents the necessary source code snippets together with the corresponding SPARQL queries. Afterwards we show the complete source code that we used for our evaluation.

This document is split into three main sections: Step-by-Step Example, Results, and Full Example.

Step-by-Step Example

In this section we describe the details of our approach to reverse-engineering the Google product taxonomy. As you will see shortly, each line is rebuilt from the class hierarchy in the RDF graph: SPARQL queries retrieve the labels along a path in the hierarchy, and the labels are concatenated using the same "right angle bracket" delimiter (">") that is found in the source file (see the file contents outlined above).

Step 1

First, we define constants for the endpoint URI, the product classification standard (google), the language of the source file (en), and the graph identifier (urn:google).
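A minimal sketch of these constants in Python could look as follows. Only the values google, en, and urn:google come from the text above; the endpoint address and all constant names are placeholders chosen for illustration.

```python
# Placeholder endpoint: the actual endpoint URI is not given in this document.
ENDPOINT_URI = "http://localhost:8890/sparql"

# Values named in the text.
PCS = "google"        # product classification system
LANG = "en"           # language of the source file
GRAPH = "urn:google"  # named graph that holds the derived ontology
```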

Step 2

Query 1 fetches all taxonomic classes from the Google taxonomy. Without a limit on the number of results, the query returns 5,508 such classes in total. (Note that the text area for the SPARQL query is editable.)

The source code for retrieving all classes first prepares the SPARQL query, then sets up the endpoint, executes the query on that endpoint, and finally processes the results, which are encoded in JSON.
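The four stages just described (prepare, set up, execute, process) can be sketched with the Python standard library alone. This is an illustration under assumptions, not the listing used in the evaluation: the query text assumes that every taxonomy class is typed as owl:Class, and the function names run_select and extract_values are hypothetical.

```python
import json
import urllib.parse
import urllib.request

GRAPH = "urn:google"  # graph name taken from the text

# Assumed shape of query 1: every owl:Class in the graph is a taxonomy class.
QUERY_1 = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?c
FROM <{graph}>
WHERE {{ ?c a owl:Class }}
""".format(graph=GRAPH)


def run_select(endpoint, query):
    """Send a SELECT query via HTTP GET and return the parsed JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_values(results, variable):
    """Collect the values bound to one variable in a SPARQL JSON result set."""
    return [binding[variable]["value"]
            for binding in results["results"]["bindings"]
            if variable in binding]
```

With an endpoint address at hand, the list of class URIs would be obtained with extract_values(run_select(endpoint, QUERY_1), "c").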

Step 3

In step 3 we read the source file of the Google product taxonomy and split it into a list of lines, which are later used to assess the quality of the conversion. In addition, a print statement checks whether the number of lines equals the number of URIs returned by the SPARQL query. The two numbers must be identical; otherwise the conversion is faulty.
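A sketch of this step, assuming a UTF-8 encoded file with one category path per line. Skipping blank lines and lines that start with "#" is an assumption of this sketch, and the function names are hypothetical.

```python
def parse_taxonomy(text):
    """Split the taxonomy file contents into a list of category lines."""
    lines = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are not categories.
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def read_taxonomy(path):
    """Read the taxonomy file from disk (UTF-8 is assumed)."""
    with open(path, encoding="utf-8") as handle:
        return parse_taxonomy(handle.read())


def counts_match(lines, uris):
    """True if the file has exactly as many lines as the query returned URIs."""
    return len(lines) == len(uris)
```

Printing counts_match(lines, uris) then corresponds to the check described above.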

Step 4

Query 2 selects all root nodes in the RDF graph, i.e. classes that have no parent class in the taxonomy.

Executing the above SPARQL query yields all root nodes in the RDF graph. The results are added to the result list, which was empty so far.
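One way such a root-node query and the population of the result list might look is sketched below. The query text is an assumption (classes typed as owl:Class, labels stored in rdfs:label, roots having no rdfs:subClassOf statement at all), and storing the root labels as the first lines of the result list is likewise an assumption of this sketch.

```python
GRAPH = "urn:google"  # graph name taken from the text
LANG = "en"           # language taken from the text

# Assumed shape of query 2: classes without any rdfs:subClassOf statement.
QUERY_2 = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?c ?c_label
FROM <{graph}>
WHERE {{
  ?c a owl:Class ;
     rdfs:label ?c_label .
  FILTER (lang(?c_label) = "{lang}")
  FILTER NOT EXISTS {{ ?c rdfs:subClassOf ?parent }}
}}
""".format(graph=GRAPH, lang=LANG)


def root_lines(results):
    """Turn the JSON bindings of query 2 into the first reconstructed lines.

    A root category has no ancestors, so its line is just its own label.
    """
    return [binding["c_label"]["value"]
            for binding in results["results"]["bindings"]]
```

If the converted ontology attaches root classes to a generic superclass, the FILTER NOT EXISTS clause would have to be adapted accordingly.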

Step 5

The third SPARQL query retrieves the label and hierarchy code of a given class and of its parent class.

Query 3 is applied in a loop over all URIs obtained from query 1. Furthermore, it is repeated for different hierarchical depths using the SPARQL 1.1 property path feature: rdfs:subClassOf{1} (= rdfs:subClassOf), rdfs:subClassOf{2}, etc. This makes it possible to build up every path in the taxonomy and thus to cover the whole taxonomy. At the same time, a list of keys keeps track of the depth of the taxonomy; it is needed later to iterate over the nodes of the taxonomy.
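The depth-dependent query can be sketched as a small query builder. Two things differ from the description above and are assumptions of this sketch: the {n} quantifier is expanded into an equivalent sequence path (rdfs:subClassOf/rdfs:subClassOf/...), because the {n} form is not part of the final SPARQL 1.1 Recommendation and not every endpoint accepts it, and only labels are fetched, since the property that holds the hierarchy code is not named here. The maximum depth of 6 is inferred from the sc6_label column of the sample table; all function names are hypothetical.

```python
GRAPH = "urn:google"  # graph name taken from the text
LANG = "en"           # language taken from the text
MAX_DEPTH = 6         # assumption, inferred from the sc6_label column


def ancestor_path(depth):
    """Expand rdfs:subClassOf{depth} into an equivalent sequence path."""
    return "/".join(["rdfs:subClassOf"] * depth)


def build_query_3(uri, depth):
    """Build a query for the label of a class and of its depth-th ancestor."""
    key = "sc%d_label" % depth
    return """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?c_label ?{key}
FROM <{graph}>
WHERE {{
  <{uri}> rdfs:label ?c_label .
  <{uri}> {path} ?ancestor .
  ?ancestor rdfs:label ?{key} .
  FILTER (lang(?c_label) = "{lang}" && lang(?{key}) = "{lang}")
}}
""".format(key=key, graph=GRAPH, uri=uri, path=ancestor_path(depth), lang=LANG)


def label_keys(max_depth=MAX_DEPTH):
    """Column keys ordered from the most distant ancestor to the class itself."""
    return ["sc%d_label" % d for d in range(max_depth, 0, -1)] + ["c_label"]
```

Looping over all URIs and over the depths 1 to MAX_DEPTH, and merging the bindings per URI into one dictionary, gives one row per class with the keys returned by label_keys().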

If we chose to print the aggregated result of the sequential execution of query 3 populated with results from query 1 as shown above, we would obtain something similar to the following table (a more extensive, yet incomplete table based on a random sample is available from here).

| No. | sc6_label | sc5_label | sc4_label | sc3_label | sc2_label | sc1_label | c_label |
| 1. | | | | Apparel & Accessories | Clothing | Activewear | Boxing Shorts |
| 2. | | | | Hardware | Tools | Masonry Tools | Floats |
| 3. | | | | | Hardware | Hardware Accessories | Lubricants |
| 4. | | | | Food, Beverages & Tobacco | Food Items | Prepared Foods | Sushi |
| 5. | | | | Electronics | Circuit Components | Semiconductors | Transistors |
| 6. | | | Home & Garden | Kitchen & Dining | Kitchen Tools & Utensils | Scoops | Ice Cream Scoops |
| 7. | | | | | Hardware | Electrical Supplies | Wall Plates |
| 8. | | | Vehicles & Parts | Vehicle Parts & Accessories | Motor Vehicle Care | Vehicle Fluids | Brake Fluid |
| 9. | | | | | Sporting Goods | Exercise & Fitness | Weightlifting Belts |
| 10. | | | Vehicles & Parts | Vehicle Parts & Accessories | Motor Vehicle Parts | Motor Vehicle Exhaust | Catalytic Converters |

Step 6

Next, we iterate row by row and column by column over a table like the one shown above, concatenating the labels of the class nodes and separating them by a "right angle bracket" (">") as in the original Google product taxonomy file. Furthermore, a check step looks up every constructed line in the list of lines obtained from the original file. For every matching or non-matching line, the respective counter is increased.

Each check step is logged with a message. "[YES]" in front of a line indicates that a matching line was found in the list derived from the original file. Similarly, "[NO]" denotes that no matching line was found.
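The concatenation and the check step can be sketched as follows, assuming each table row is a dictionary keyed by sc6_label ... sc1_label and c_label, with missing ancestors simply absent or empty. The function names and the maximum depth of 6 are assumptions of this sketch.

```python
def build_line(row, max_depth=6):
    """Join the labels of one row, root first, with the ' > ' delimiter."""
    keys = ["sc%d_label" % d for d in range(max_depth, 0, -1)] + ["c_label"]
    return " > ".join(row[key] for key in keys if row.get(key))


def check_lines(rows, original_lines):
    """Look up every reconstructed line in the original file's lines.

    Returns the YES counter, the NO counter, and the log messages.
    """
    original = set(original_lines)  # set membership keeps each lookup cheap
    yes = no = 0
    log = []
    for row in rows:
        line = build_line(row)
        if line in original:
            yes += 1
            log.append("[YES]\t" + line)
        else:
            no += 1
            log.append("[NO]\t" + line)
    return yes, no, log
```

For a faithful conversion, the NO counter stays at zero and the YES counter equals the number of lines in the original file.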

Step 7

The last step prepends a short summary to the logged output and writes the result to a file.
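A sketch of this final step is given below. The layout of the summary is loosely modelled on the result file shown in the Results section and is not guaranteed to reproduce it character by character; the function names and the output path passed to write_report are hypothetical.

```python
def format_report(items_match, yes, no, log):
    """Prepend a short summary to the list of logged check messages."""
    rule = "-" * 14
    summary = [
        "=" * 14, "Short Summary:", "=" * 14, "",
        "No. items match?", rule, str(items_match), rule, "",
        "Consistent?", rule, "YES:\t%d" % yes, rule, "NO:\t%d" % no, rule, "",
    ]
    return "\n".join(summary + log) + "\n"


def write_report(path, report):
    """Write the finished report to a text file (UTF-8 is assumed)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(report)
```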

Results

The snippet below shows the output written to the result file. The first part is a short summary stating that the number of URIs equals the number of lines in the original file. It also reports the YES/NO counters: 5,508 checks succeeded and not a single check failed. The detailed list of check steps follows the summary.

==============
Short Summary:
==============

No. items match?
--------------
True
--------------

Consistent?
--------------
YES:	  5508
--------------
NO:	      
--------------

[YES]	Home & Garden
[YES]	Baby & Toddler
[YES]	Toys & Games
[YES]	Religious & Ceremonial
[YES]	Vehicles & Parts
[YES]	Cameras & Optics
[YES]	Apparel & Accessories
[YES]	Mature
[YES]	Food, Beverages & Tobacco
[YES]	Health & Beauty
[YES]	Furniture
[YES]	Hardware
[YES]	Office Supplies
[YES]	Electronics
[YES]	Luggage & Bags
[YES]	Animals & Pet Supplies
[YES]	Media
[YES]	Arts & Entertainment
[YES]	Software
[YES]	Business & Industrial
[YES]	Sporting Goods
[YES]	Electronics > Audio > Audio Accessories > Satellite Radio Accessories
[YES]	Hardware > Countertops > Stone Countertops
[YES]	Sporting Goods > Water Sports > Boating > Rowing > Rowing Seat Pads
[YES]	Sporting Goods > Exercise & Fitness > Exercise Balls
[YES]	Hardware > Tools > Measuring Tools & Sensors > Distance Meters
[YES]	Furniture > Tables > Sewing Machine Tables
[YES]	Sporting Goods > Combat Sports > Fencing > Fencing Protective Gear
...

The complete file with all the results can be downloaded from here.

Full Example

Below is the full code example that combines the individual steps from above (note that the example relies on some Python helper modules that, for the sake of simplicity, we do not discuss here):