
Using pycorese

This notebook demonstrates how to use the pycorese package:

  • to load a knowledge graph

  • to perform a SPARQL query

  • to validate a graph against SHACL shapes

  • to access the classes of the Corese Java API

Install pycorese

Java Runtime Environment (JRE) 11 or higher is required to run pycorese.

If you don’t have Java installed, please refer to the official website to download and install it.

!java -version
openjdk version "11.0.25" 2024-10-15
OpenJDK Runtime Environment (build 11.0.25+9-post-Ubuntu-1ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.25+9-post-Ubuntu-1ubuntu122.04, mixed mode, sharing)
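The !java -version cell relies on an IPython shell escape. If you would rather check the requirement from plain Python, the sketch below parses the version banner. The helper names parse_java_major and java_is_supported are hypothetical, not part of pycorese. It handles both the modern numbering ("11.0.25") and the legacy "1.8.0" style used by Java 8 and earlier.

```python
import re
import shutil
import subprocess


def parse_java_major(banner: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError(f"Unrecognized Java version banner: {banner!r}")
    major = int(match.group(1))
    # Java 8 and earlier report themselves as "1.x", so use the second number
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def java_is_supported(minimum: int = 11) -> bool:
    """Return True if a `java` executable on the PATH is at least `minimum`."""
    java = shutil.which("java")
    if java is None:
        return False
    # `java -version` writes its banner to stderr, not stdout
    completed = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(completed.stderr) >= minimum


print(parse_java_major('openjdk version "11.0.25" 2024-10-15'))  # 11
```

Calling java_is_supported() returns False when no java executable is found on the PATH, which makes it usable as a guard before connecting to Corese.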

pycorese is available on PyPI and can be installed using pip:

!pip install pycorese

Download the data files from the GitHub repository:

import os
import sys
if not os.path.exists('./data/beatles.rdf'):
    print('Downloading the data files...')
    !mkdir -p ./data
    !wget https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/beatles.rdf -O ./data/beatles.rdf
    !wget https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/beatles-validator.ttl -O ./data/beatles-validator.ttl

if sys.platform == 'win32':
    !dir /b .\data\*.*
else:
    !ls ./data
beatles.rdf  beatles-validator.ttl
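The download cell mixes shell escapes (!mkdir -p, !wget) that may not be available in a Windows command shell, even though the listing step checks for Windows. A portable alternative using only the standard library is sketched below. It uses the same raw GitHub URLs as the cell above; download_if_missing and fetch_example_data are hypothetical names. Unlike the original cell, it checks each file individually instead of only beatles.rdf.

```python
from pathlib import Path
from typing import List
from urllib.request import urlretrieve

# Same location as the wget commands above
BASE_URL = "https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/"
DATA_FILES = ["beatles.rdf", "beatles-validator.ttl"]


def download_if_missing(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless it already exists; return True if downloaded."""
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)  # portable equivalent of `mkdir -p`
    urlretrieve(url, str(dest))
    return True


def fetch_example_data(data_dir: Path = Path("data")) -> List[str]:
    """Make sure every example file is present and return the sorted directory listing."""
    for name in DATA_FILES:
        download_if_missing(BASE_URL + name, data_dir / name)
    return sorted(p.name for p in data_dir.iterdir())
```

Calling fetch_example_data() from the notebook directory downloads whatever is missing into ./data and returns the file names, which replaces both the download and the platform-specific listing step.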

Connect to the Corese API

This section demonstrates loading and querying data with CoreseAPI, which connects to Java through either the Py4J or the JPype package. If you don’t specify the Java bridge type, Py4J is used by default.

from pycorese.api import CoreseAPI

python_to_java_bridge = 'py4j'
corese = CoreseAPI(java_bridge=python_to_java_bridge)
corese.loadCorese()

High-level API

Run SELECT query

import os
data_path = os.path.abspath('./data/beatles.rdf')

query = '''
SELECT *
WHERE {?subject ?p ?o} LIMIT 5'''

graph = corese.loadRDF(data_path)
results = corese.sparqlSelect(graph, query=query, return_dataframe=True)

results
subject p o
0 http://example.com/Please_Please_Me http://example.com/artist http://example.com/The_Beatles
1 http://example.com/McCartney http://example.com/artist http://example.com/Paul_McCartney
2 http://example.com/Imagine http://example.com/artist http://example.com/John_Lennon
3 http://example.com/Please_Please_Me http://example.com/date 1963-03-22
4 http://example.com/McCartney http://example.com/date 1970-04-17
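The query above matches a single triple pattern in which every position is a variable. To make the semantics concrete, here is a toy pure-Python sketch of matching one triple pattern against a list of triples with an optional LIMIT. It is not how Corese evaluates queries; the sample triples are copied from the output above, and match_pattern is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.com/"


def match_pattern(pattern: Triple, triples: List[Triple],
                  limit: Optional[int] = None) -> List[Dict[str, str]]:
    """Return one binding dict per triple that matches `pattern`.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    solutions = []
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to the same value
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
            if limit is not None and len(solutions) == limit:
                break  # LIMIT stops as soon as enough solutions are found
    return solutions


sample = [
    (EX + "Please_Please_Me", EX + "artist", EX + "The_Beatles"),
    (EX + "McCartney", EX + "artist", EX + "Paul_McCartney"),
    (EX + "Please_Please_Me", EX + "date", "1963-03-22"),
]

for row in match_pattern(("?subject", EX + "artist", "?o"), sample, limit=5):
    print(row["?subject"], row["?o"])
```

Because SPARQL does not define a solution order without ORDER BY, a LIMIT on its own may return a different subset of triples from one engine or run to another.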

Load inference rules

corese.resetRuleEngine(graph)
query = "select * where {?s a ?type} order by ?type"
print(corese.sparqlSelect(graph, query=query))
print("Graph size: ", graph.graphSize())
                                     s                           type
0  http://example.com/Please_Please_Me       http://example.com/Album
1         http://example.com/McCartney       http://example.com/Album
2           http://example.com/Imagine       http://example.com/Album
3       http://example.com/The_Beatles        http://example.com/Band
4       http://example.com/John_Lennon  http://example.com/SoloArtist
5    http://example.com/Paul_McCartney  http://example.com/SoloArtist
6       http://example.com/Ringo_Starr  http://example.com/SoloArtist
7   http://example.com/George_Harrison  http://example.com/SoloArtist
8        http://example.com/Love_Me_Do        http://example.com/Song
Graph size:  29

Loading the RDFS inference rules into the Corese engine should add inferred triples to the graph and therefore change the query results.

corese.loadRuleEngine(graph, profile=corese.RuleEngine.Profile.RDFS)
print("Graph size: ", graph.graphSize())
Graph size:  33

Let’s see what was added.

query = "select * where {?s a ?type} order by ?type"
print(corese.sparqlSelect(graph, query=query))
print("Graph size: ", graph.graphSize())
                                      s                           type
0   http://example.com/Please_Please_Me       http://example.com/Album
1          http://example.com/McCartney       http://example.com/Album
2            http://example.com/Imagine       http://example.com/Album
3        http://example.com/The_Beatles        http://example.com/Band
4        http://example.com/John_Lennon      http://example.com/Person
5     http://example.com/Paul_McCartney      http://example.com/Person
6        http://example.com/Ringo_Starr      http://example.com/Person
7    http://example.com/George_Harrison      http://example.com/Person
8        http://example.com/John_Lennon  http://example.com/SoloArtist
9     http://example.com/Paul_McCartney  http://example.com/SoloArtist
10       http://example.com/Ringo_Starr  http://example.com/SoloArtist
11   http://example.com/George_Harrison  http://example.com/SoloArtist
12        http://example.com/Love_Me_Do        http://example.com/Song
Graph size:  33

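The rule engine inferred that each solo artist is also a person, although this was not explicitly stated in the data. These four new rdf:type triples account for the growth of the graph from 29 to 33 triples.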
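A minimal forward-chaining sketch of the kind of RDFS rule that produces such triples is shown below. It applies only rdfs9 (type propagation along rdfs:subClassOf) and rdfs11 (transitivity of rdfs:subClassOf) until nothing new appears. It assumes that beatles.rdf declares ex:SoloArtist rdfs:subClassOf ex:Person; the notebook does not show which axiom the data actually uses, and Corese’s RDFS profile covers more rules than these two.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.com/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def rdfs_subclass_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS rules until a fixpoint is reached:
    rdfs9:  x rdf:type C, C rdfs:subClassOf D  =>  x rdf:type D
    rdfs11: C rdfs:subClassOf D, D rdfs:subClassOf E  =>  C rdfs:subClassOf E
    """
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == RDFS_SUBCLASS_OF}
        inferred = {(c, RDFS_SUBCLASS_OF, e) for c, d in sub for d2, e in sub if d == d2}
        inferred |= {(x, RDF_TYPE, d)
                     for x, p, c in closed if p == RDF_TYPE
                     for c2, d in sub if c == c2}
        new = inferred - closed
        if not new:
            return closed
        closed |= new


# Hypothetical input: the four solo artists plus the assumed schema axiom
data = {(EX + name, RDF_TYPE, EX + "SoloArtist")
        for name in ("John_Lennon", "Paul_McCartney", "Ringo_Starr", "George_Harrison")}
data.add((EX + "SoloArtist", RDFS_SUBCLASS_OF, EX + "Person"))

closed = rdfs_subclass_closure(data)
print(len(data), "->", len(closed))  # 5 -> 9
```

As in the Corese run, the closure adds exactly one rdf:type ex:Person triple per solo artist.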

Run CONSTRUCT query

prefixes = '@prefix ex: <http://example.com/>'
construct = '''CONSTRUCT {?A_Beatle a ex:BandMember }
              WHERE { ex:The_Beatles ex:member ?A_Beatle}'''

results = corese.sparqlConstruct(graph, prefixes=prefixes, query=construct)

print(results)
<?xml version="1.0"?>
<rdf:RDF
  xmlns:ex='http://example.com/'
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

  <ex:BandMember rdf:about='http://example.com/Ringo_Starr'>
  </ex:BandMember>

  <ex:BandMember rdf:about='http://example.com/John_Lennon'>
  </ex:BandMember>

  <ex:BandMember rdf:about='http://example.com/George_Harrison'>
  </ex:BandMember>

  <ex:BandMember rdf:about='http://example.com/Paul_McCartney'>
  </ex:BandMember>

</rdf:RDF>

By default, the CONSTRUCT results are returned in RDF/XML. For a more concise format, convert them to Turtle.

ttl = corese.toTurtle(results)

print(ttl)
<http://example.com/George_Harrison> a <http://example.com/BandMember> .

<http://example.com/John_Lennon> a <http://example.com/BandMember> .

<http://example.com/Paul_McCartney> a <http://example.com/BandMember> .

<http://example.com/Ringo_Starr> a <http://example.com/BandMember> .
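Both serializations carry the same four triples. To see how the RDF/XML maps to triples, the stdlib-only sketch below reads the simple shape printed above (empty typed node elements with rdf:about) and emits one N-Triples line per triple. It deliberately ignores property elements, rdf:Description, blank nodes and literals, so for real data prefer corese.toTurtle or a full RDF parser; typed_nodes_to_ntriples is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def typed_nodes_to_ntriples(rdf_xml: str) -> List[str]:
    """Turn empty typed node elements (<ns:Class rdf:about="..."/>) into N-Triples lines."""
    root = ET.fromstring(rdf_xml)
    lines = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # ElementTree reports qualified tags as "{namespace}localname"
        namespace, local_name = node.tag[1:].split("}", 1)
        lines.append(f"<{subject}> <{RDF_NS}type> <{namespace}{local_name}> .")
    return sorted(lines)


# Two of the four nodes from the CONSTRUCT output above
sample = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:ex='http://example.com/'
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <ex:BandMember rdf:about='http://example.com/Ringo_Starr'>
  </ex:BandMember>
  <ex:BandMember rdf:about='http://example.com/John_Lennon'>
  </ex:BandMember>
</rdf:RDF>"""

for line in typed_nodes_to_ntriples(sample):
    print(line)
```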

Run SHACL validation

In the example below, we use a SHACL shapes file to check that the Beatles graph follows these rules:

  • A band has exactly one name and at least one member, and every member is a solo artist

  • An album has exactly one name, one release date and one artist

  • A song has exactly one name, one duration (length), at least one writer and at least one performer

The validation is expected to fail because the Beatles graph is missing some of the required information.
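Most of these rules are cardinality constraints (sh:minCount and sh:maxCount). The toy sketch below shows what such a check does: it collects the focus nodes of a target class and counts the values of one property path for each. It is not Corese’s validator; it ignores subclass targets and every other SHACL constraint, and both check_counts and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.com/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def check_counts(triples: Set[Triple], target_class: str, path: str,
                 min_count: int = 0, max_count: Optional[int] = None) -> List[Dict[str, object]]:
    """Report focus nodes of `target_class` whose number of `path` values is out of range."""
    focus_nodes = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    counts: Dict[str, int] = defaultdict(int)
    for s, p, o in triples:
        if s in focus_nodes and p == path:
            counts[s] += 1  # triples form a set, so each distinct value counts once
    violations = []
    for node in sorted(focus_nodes):
        count = counts[node]
        if count < min_count or (max_count is not None and count > max_count):
            violations.append({"focusNode": node, "resultPath": path, "count": count})
    return violations


# Hypothetical data: a song with a name and writers but no performer
data = {
    (EX + "Love_Me_Do", RDF_TYPE, EX + "Song"),
    (EX + "Love_Me_Do", EX + "name", "Love Me Do"),
    (EX + "Love_Me_Do", EX + "writer", EX + "Paul_McCartney"),
    (EX + "Love_Me_Do", EX + "writer", EX + "John_Lennon"),
}

print(check_counts(data, EX + "Song", EX + "performer", min_count=1))
print(check_counts(data, EX + "Song", EX + "name", min_count=1, max_count=1))
```

The first call reports Love_Me_Do with a count of 0 for ex:performer, which mirrors the sh:value 0 in the Corese report further down.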

data_shape_path = os.path.abspath('./data/beatles-validator.ttl')

with open(data_shape_path, 'r') as file:
    data_shape = file.read()
    print(data_shape)
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.com/>

# Shape for Bands
ex:BandShape a sh:NodeShape ;
    sh:targetClass ex:Band ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:member ;
        sh:class ex:SoloArtist ;
        sh:minCount 1 ;
    ] .

# Shape for Solo Artists
ex:SoloArtistShape a sh:NodeShape ;
    sh:targetClass ex:SoloArtist .

# Shape for Albums
ex:AlbumShape a sh:NodeShape ;
    sh:targetClass ex:Album ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:date ;
        sh:datatype xsd:date ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:artist ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .

# Shape for Songs
ex:SongShape a sh:NodeShape ;
    sh:targetClass ex:Song ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:length ;
        sh:datatype xsd:integer ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
        sh:property [
        sh:path ex:performer ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
    ] ;
    sh:property [
        sh:path ex:writer ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
    ] .
prefixes = '@prefix ex: <http://example.com/>'
report = corese.shaclValidate(graph, shacl_shape_ttl=data_shape_path, prefixes=prefixes)

print(report)
@prefix xsh: <http://www.w3.org/ns/shacl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

<urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11> a sh:ValidationResult ;
  sh:focusNode <http://example.com/Love_Me_Do> ;
  sh:resultMessage "Fail at: [sh:minCount 1 ;\n  sh:nodeKind sh:IRI ;\n  sh:path <http://example.com/performer>]" ;
  sh:resultPath <http://example.com/performer> ;
  sh:resultSeverity sh:Violation ;
  sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
  sh:sourceShape _:b7 ;
  sh:value 0 .

[a sh:ValidationReport ;
  sh:conforms false ;
  sh:result <urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11>] .

The SHACL validation report is verbose and can be reshaped into a DataFrame for readability.

report_dataframe = corese.shaclReportToDataFrame(report)

report_dataframe
o                          urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11
type                       http://www.w3.org/ns/shacl#ValidationResult
focusNode                  http://example.com/Love_Me_Do
resultMessage              Fail at: [sh:minCount 1 ; sh:nodeKind sh:IRI...
resultPath                 http://example.com/performer
resultSeverity             http://www.w3.org/ns/shacl#Violation
sourceConstraintComponent  http://www.w3.org/ns/shacl#MinCountConstraintC...
sourceShape                _:b9
value                      0

The report shows that no performer is specified for the song Love Me Do, which violates the sh:minCount 1 constraint on ex:performer in the song shape.

Low-level API

Adding triples manually to the graph

# Namespace
ex = "http://example.com/"

# Get the graph from either Graph or DataManager objects
graph = graph.getGraph()

# Create and add statements: Help! is an album
new_album_IRI = graph.addResource(ex + "Help")
rdf_Type_Property = graph.addProperty(corese.Namespaces.RDF + 'type')
album_type_IRI = graph.addResource(ex + "Album")

graph.addEdge(new_album_IRI, rdf_Type_Property, album_type_IRI)
JavaObject id=o37

Let’s see what was added.

query = f'''@prefix ex: <{ex}>
            SELECT *
            where {{?album a ex:Album }}'''

# Avoid shadowing Python's built-in exec()
query_process = corese.QueryProcess.create(graph)

results = query_process.query(query)

print(results)
01 ?album = <http://example.com/Please_Please_Me>; 
02 ?album = <http://example.com/McCartney>; 
03 ?album = <http://example.com/Imagine>; 
04 ?album = <http://example.com/Help>; 

The new triple (Help is an album) was added to the graph and now appears in the results.

We can add more details about the album Help! and check what was added.

# Create and add statement: The name of the album is actually Help!
name_property_IRI = graph.addProperty(ex + "name")
name_literal = graph.addLiteral("Help!")

graph.addEdge(new_album_IRI, name_property_IRI, name_literal)

# Create and add statement: The new album was released on 6 August 1965
xsd = "http://www.w3.org/2001/XMLSchema#"
release_property_IRI = graph.addProperty(ex + "date")
# An xsd:date literal needs the full YYYY-MM-DD form; a bare year is ill-typed
release_literal = graph.addLiteral("1965-08-06", xsd + 'date')

graph.addEdge(new_album_IRI, release_property_IRI, release_literal)

# Create and add statement: The Beatles is the creator of the album Help
artist_property_IRI = graph.addProperty(ex + "artist")
# Use addResource (an IRI), not addLiteral: the album shape requires ex:artist to be an IRI
artist_IRI = graph.addResource(ex + "The_Beatles")
graph.addEdge(new_album_IRI, artist_property_IRI, artist_IRI)
JavaObject id=o46
query = f'''@prefix ex: <{ex}>
            CONSTRUCT {{ ?album ?p ?o }}
            WHERE {{
                VALUES ?album {{ ex:Help }}
                ?album ?p ?o}} '''

# Avoid shadowing Python's built-in exec()
query_process = corese.QueryProcess.create(graph)

results = query_process.query(query)

results_ttl = corese.ResultFormat.create(results, corese.ResultFormat.TURTLE_FORMAT)

print(results_ttl)
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.com/> .

ex:Help ex:artist ex:The_Beatles ;
  ex:date "1965-08-06"^^xsd:date ;
  ex:name "Help!" ;
  a ex:Album .
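Adding a typed literal does not by itself guarantee that its lexical form is valid for the datatype. The lexical form of an xsd:date is YYYY-MM-DD with an optional timezone, so a bare year such as "1965" is not a valid xsd:date, and per the SHACL specification an ill-formed literal fails an sh:datatype constraint like the one in the album shape. The simplified checker below (is_valid_xsd_date is a hypothetical helper) can catch such values before they are added; it does not handle negative or five-digit years, which XML Schema also permits.

```python
import re
from datetime import date

# YYYY-MM-DD followed by an optional timezone (Z, +hh:mm or -hh:mm)
XSD_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")


def is_valid_xsd_date(lexical: str) -> bool:
    """Return True if `lexical` is a (simplified) valid xsd:date lexical form."""
    match = XSD_DATE.fullmatch(lexical)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups()[:3])
    try:
        date(year, month, day)  # rejects impossible dates such as 1965-02-30
    except ValueError:
        return False
    return True


print(is_valid_xsd_date("1965"))        # False
print(is_valid_xsd_date("1965-08-06"))  # True
```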